Question
I am working on a deep learning model using Google's TensorFlow. The model should be used to segment and label scenes.
- I am using the SiftFlow dataset, which has 33 semantic classes and images of 256x256 pixels.
- As a result, at my final layer, using convolution and deconvolution, I arrive at the following tensor (array): [256, 256, 33].
- Next I would like to apply Softmax and compare the results to a semantic label of size [256, 256].
Question: Should I apply mean averaging or argmax to my final layer, so its shape becomes [256, 256, 1], and then loop through each pixel and classify as if I were classifying 256x256 instances? If the answer is yes, how? If not, what other options are there?
Answer
To apply softmax and use a cross-entropy loss, you have to keep the final output of your network intact, with size batch_size x 256 x 256 x 33. Therefore you cannot use mean averaging or argmax, because they would destroy the output probabilities of your network.
You have to loop through all the batch_size x 256 x 256 pixels and apply a cross-entropy loss to your prediction for each pixel. This is easy with the built-in function tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels).
Some warnings from the documentation before applying the code below:
- WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
- logits must have the shape [batch_size, num_classes] and dtype float32 or float64.
- labels must have the shape [batch_size] and the dtype int64.
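As a sanity check on what the op computes, here is a small numpy sketch (toy shapes; `sparse_softmax_cross_entropy` is my own illustrative helper, not TensorFlow code) of the per-example math: a softmax over the class axis, followed by the negative log-probability of the true class. This is also why the op must receive unscaled logits — it applies the softmax itself.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example loss: softmax over the last axis, then the
    negative log-probability of the true class."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each example's true label.
    return -log_probs[np.arange(len(labels)), labels]

# Toy batch: 2 "pixels", 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 3.0, 0.2]])
labels = np.array([0, 1])
losses = sparse_softmax_cross_entropy(logits, labels)  # one loss per pixel
```

The more confident the network is in the correct class, the smaller that pixel's loss, which is exactly the signal backpropagation needs.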
The trick is to use batch_size * 256 * 256 as the batch size required by the function. We will reshape logits and labels to this format. Here is the code I use:
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])  # input images
logits = inference(inputs)  # your outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256])  # your labels of shape [batch_size, 256, 256] and dtype int64
reshaped_logits = tf.reshape(logits, [-1, 33])  # shape [batch_size*256*256, 33]
reshaped_labels = tf.reshape(labels, [-1])  # shape [batch_size*256*256]
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=reshaped_logits, labels=reshaped_labels)
You can then apply your optimizer on that loss.
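One detail worth spelling out: the op returns one loss value per pixel, so before handing it to an optimizer you typically collapse it to a scalar, e.g. with tf.reduce_mean. A numpy sketch (toy shapes, not the author's code) showing that the mean is the same whether taken over the flat vector or the original batch layout:

```python
import numpy as np

# Toy per-pixel losses: batch of 2 images, 4x4 pixels each, already
# flattened to [batch_size * height * width] as in the snippet above.
rng = np.random.default_rng(0)
per_pixel = rng.random(2 * 4 * 4)

# The scalar the optimizer minimizes; reshaping back to
# [batch, H, W] does not change the mean.
scalar_loss = per_pixel.mean()
```

So it does not matter that the pixels of all images were mixed into one big "batch" by the reshape: the averaged loss is identical.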
The documentation of tf.nn.sparse_softmax_cross_entropy_with_logits shows that it now accepts any shape for logits, so there is no need to reshape the tensors (thanks @chillinger):
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])  # input images
logits = inference(inputs)  # your outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256])  # your labels of shape [batch_size, 256, 256] and dtype int64
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
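To come back to the argmax from the question: it is still useful, just at inference time rather than inside the loss. Once training is done, taking the argmax over the class axis turns the [256, 256, 33] logits into the [256, 256] label map you compare against ground truth. A numpy sketch with toy shapes (the TensorFlow equivalent would be tf.argmax(logits, axis=-1)):

```python
import numpy as np

# Toy logits for a 2x2 "image" with 3 classes, shaped [H, W, C].
logits = np.array([[[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]],
                   [[0.0, 0.1, 3.0], [0.4, 0.3, 0.2]]])

# One class id per pixel: the predicted [H, W] segmentation map.
label_map = logits.argmax(axis=-1)  # -> [[1, 0], [2, 0]]
```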