Question
I am working on a deep learning model using Google's TensorFlow. The model should be used to segment and label scenes.
- I am using the SiftFlow dataset, which has 33 semantic classes and images of 256x256 pixels.
- As a result, at my final layer, using convolution and deconvolution, I arrive at the following tensor (array): [256, 256, 33].
- Next I would like to apply Softmax and compare the results to a semantic label of size [256, 256].
Question: Should I apply mean averaging or argmax to my final layer, so its shape becomes [256, 256, 1], and then loop through each pixel and classify as if I were classifying 256x256 instances? If the answer is yes, how? If not, what other options are there?
Answer
To apply softmax and use a cross-entropy loss, you have to keep the final output of your network intact, with size batch_size x 256 x 256 x 33. Therefore you cannot use mean averaging or argmax, because they would destroy the output probabilities of your network.
You have to loop through all the batch_size x 256 x 256 pixels and apply a cross-entropy loss to your prediction for each pixel. This is easy with the built-in function tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels).
Some warnings from the documentation before applying the code below:
- WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
- logits must have the shape [batch_size, num_classes] and dtype float32 or float64.
- labels must have the shape [batch_size] and the dtype int64.
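As a sanity check on what the op computes, here is a small numpy sketch (toy shapes; `sparse_softmax_cross_entropy` is my own illustrative helper, not TensorFlow code) of the per-example math: a softmax over the class axis, followed by the negative log-probability of the true class. This is also why the op must receive unscaled logits — it applies the softmax itself.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example loss: softmax over the last axis, then the
    negative log-probability of the true class."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each example's true label.
    return -log_probs[np.arange(len(labels)), labels]

# Toy batch: 2 "pixels", 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 3.0, 0.2]])
labels = np.array([0, 1])
losses = sparse_softmax_cross_entropy(logits, labels)  # one loss per pixel
```

The more confident the network is in the correct class, the smaller that pixel's loss, which is exactly the signal backpropagation needs.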
The trick is to use batch_size * 256 * 256 as the batch size required by the function. We will reshape logits and labels to this format. Here is the code I use:
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])  # input images
logits = inference(inputs)  # your outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256])  # your labels of shape [batch_size, 256, 256] and dtype int64
reshaped_logits = tf.reshape(logits, [-1, 33])  # shape [batch_size*256*256, 33]
reshaped_labels = tf.reshape(labels, [-1])  # shape [batch_size*256*256]
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=reshaped_logits, labels=reshaped_labels)
You can then apply your optimizer on that loss.
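One detail worth spelling out: the op returns one loss value per pixel, so before handing it to an optimizer you typically collapse it to a scalar, e.g. with tf.reduce_mean. A numpy sketch (toy shapes, not the author's code) showing that the mean is the same whether taken over the flat vector or the original batch layout:

```python
import numpy as np

# Toy per-pixel losses: batch of 2 images, 4x4 pixels each, already
# flattened to [batch_size * height * width] as in the snippet above.
rng = np.random.default_rng(0)
per_pixel = rng.random(2 * 4 * 4)

# The scalar the optimizer minimizes; reshaping back to
# [batch, H, W] does not change the mean.
scalar_loss = per_pixel.mean()
```

So it does not matter that the pixels of all images were mixed into one big "batch" by the reshape: the averaged loss is identical.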
The documentation of tf.nn.sparse_softmax_cross_entropy_with_logits shows that it now accepts any shape for logits, so there is no need to reshape the tensors (thanks @chillinger):
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])  # input images
logits = inference(inputs)  # your outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256])  # your labels of shape [batch_size, 256, 256] and dtype int64
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
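To come back to the argmax from the question: it is still useful, just at inference time rather than inside the loss. Once training is done, taking the argmax over the class axis turns the [256, 256, 33] logits into the [256, 256] label map you compare against ground truth. A numpy sketch with toy shapes (the TensorFlow equivalent would be tf.argmax(logits, axis=-1)):

```python
import numpy as np

# Toy logits for a 2x2 "image" with 3 classes, shaped [H, W, C].
logits = np.array([[[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]],
                   [[0.0, 0.1, 3.0], [0.4, 0.3, 0.2]]])

# One class id per pixel: the predicted [H, W] segmentation map.
label_map = logits.argmax(axis=-1)  # -> [[1, 0], [2, 0]]
```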