python - CNN in TensorFlow: loss remains constant

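I have only just started with machine learning and wanted to build a simple CNN to classify leaves from 2 different tree species. Before collecting a large set of leaf photos, I decided to build a very small, simple CNN in TensorFlow and train it on a single image to check that the code is correct. I normalized my 256x256 (x 3 channels) photo to the range [0, 1] and created a 4-layer network (2 conv and 2 dense layers). Unfortunately, the loss almost always settles at some constant value (usually an integer) right from the start. I suspected something was wrong with the picture, so I replaced it with a random numpy array of the same dimensions. The loss was still constant. Sometimes the network does seem to learn, because the loss decreases, but most of the time it is constant from the very beginning. Can anyone explain why? I have read that training on a single example is the best way to check code for bugs, but the longer I struggle with this, the less I see.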

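Here is my code (based on a TensorFlow tutorial). I use exponential linear units (ELU) because I thought my problem was caused by zero gradients in badly initialized ReLUs.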

import matplotlib.pyplot as plt
import numpy as np
from numpy import random
from sklearn import utils
import tensorflow as tf

#original dataset of 6 leaves
# input = [ndimage.imread("E:\leaves\dab1.jpg"),
#         ndimage.imread("E:\leaves\dab2.jpg"),
#        ndimage.imread("E:\leaves\dab3.jpg"),
#        ndimage.imread("E:\leaves\klon1.jpg"),
#        ndimage.imread("E:\leaves\klon2.jpg"),
#        ndimage.imread("E:\leaves\klon3.jpg")]

#normalize each image (originally uint8)
#input=[img/255 for img in input]

#temporary testing dataset, mimicking 6 images, each 3-channel, of dimension 256x256
input=[random.randn(256,256,3)]
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3)]

#each image belong to one of two classes
labels=[[1]]#,[1,0],[1,0],[0,1],[0,1],[0,1]]


def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.truncated_normal(shape, stddev=.1)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

x = tf.placeholder(tf.float32, shape=[None, 256,256,3])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

x_image = tf.reshape(x, [-1,256,256,3])

#first conv layer
W_conv1 = weight_variable([5,5, 3,8])
b_conv1 = bias_variable([8])
h_conv1 = tf.nn.elu(conv2d(x_image, W_conv1) + b_conv1)

#second conv layer
W_conv2 = weight_variable([5,5, 8,16])
b_conv2 = bias_variable([16])
h_conv2 = tf.nn.elu(conv2d(h_conv1, W_conv2) + b_conv2)

#first dense layer
W_fc1 = weight_variable([256*256*16, 10])
b_fc1 = bias_variable([10])
out_flat = tf.reshape(h_conv2, [-1, 256*256*16])
h_fc1 = tf.nn.elu(tf.matmul(out_flat, W_fc1) + b_fc1)

#second dense layer
W_fc2 = weight_variable([10, 1])
b_fc2 = bias_variable([1])
h_fc2 = tf.nn.elu(tf.matmul(h_fc1, W_fc2) + b_fc2)

#tried also with softmax with logits
cross_entropy=tf.losses.mean_squared_error(predictions=h_fc2, labels=y_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)

print("h2", h_fc2.shape)
print("y", y_.shape)

sess=tf.Session()
sess.run(tf.global_variables_initializer())
loss = []
for i in range(10):
    sess.run(train_step, feed_dict={x:input, y_:labels})
    input, labels = utils.shuffle(input, labels)
    loss.append(sess.run(cross_entropy, feed_dict={x:input, y_:labels}))
    print(i, " LOSS: ", loss[-1])

np.set_printoptions(precision=3, suppress=True)
for i in range(len(input)):
    print(labels[i], sess.run(h_fc2, feed_dict={x:[input[i]], y_:[labels[i]]}))

plt.plot(loss)
plt.show()

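Here is a list of what I tried:

The basic code above gives a loss that is almost always exactly 4.0.
I extended training to 100 epochs. This turned out to increase the chance of getting a constant loss. That is strange, because in my opinion the number of epochs should not change anything in the early stage of training.
I changed the number of feature maps to 32 in layer I and 64 in layer II, and used 100 neurons in the dense layer.
Because my output is binary, I initially used a single output. I changed it to 2 mutually exclusive outputs. The loss changed to 2.5. It turned out that my output tends to be [-1, -1], while the label was [1, 0].
I tried various learning rates, from 0.001 down to 0.00005.
I initialized the weights and biases with a standard deviation of 2 instead of 0.1. The loss seemed to decrease, but it reached very high values such as 1e10. So I changed the number of epochs from 10 to 100, and once again the loss was 2.5 from the very beginning. After going back to 10 epochs, the loss stayed at 2.5.
I extended the dataset to 6 elements. The loss was the same as before.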
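The two constant values reported in the list above are what a mean squared error gives when every output is stuck at -1. The short check below is only an illustration: the helper `mse` is a hypothetical stand-in written for this note, and it assumes the loss averages the squared error over all output elements, which is the default behaviour of `tf.losses.mean_squared_error`.

```python
def mse(predictions, labels):
    """Mean squared error averaged over all output elements (hypothetical helper)."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)

# Single output stuck at -1 with label 1: (-1 - 1)^2 / 1
single_output_loss = mse([-1.0], [1.0])

# Two outputs stuck at [-1, -1] with label [1, 0]: ((-2)^2 + (-1)^2) / 2
two_output_loss = mse([-1.0, -1.0], [1.0, 0.0])

print(single_output_loss)  # 4.0
print(two_output_loss)     # 2.5
```

Both numbers match the observations, which suggests the network output is pinned at -1 rather than the optimizer failing in some more subtle way.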

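Does anyone know why this happens? As far as I know, if a network cannot generalize, the loss does not decrease but increases or oscillates; it should not stay constant, should it?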

Best answer

I found the answer. The problem was caused by the following line:

h_fc2 = tf.nn.elu(tf.matmul(h_fc1, W_fc2) + b_fc2)

I don't know why, but it makes the output equal to -1. When I changed it to

h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2

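it works like a charm and the loss starts to decrease. Can anyone explain why we should avoid an activation function in the last layer (I saw the same thing in the TensorFlow tutorial mentioned earlier)? I don't understand; I thought every layer should have its own activation function?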
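One plausible explanation for the -1 output, offered as a sketch and not as the author's own analysis: ELU is bounded below by -1 (`tf.nn.elu` computes `exp(z) - 1` for negative `z`), and its gradient for negative inputs is `exp(z)`, which vanishes quickly as `z` becomes more negative. If the pre-activation of the output unit starts out strongly negative, the output sits at about -1 and almost no gradient flows back, so the loss stays constant. This may be more likely here because the first dense layer sums over 256*256*16 inputs, which can produce large activations. The functions `elu` and `elu_grad` below are hypothetical pure-Python helpers, and the pre-activation values are made up for illustration.

```python
import math

def elu(z, alpha=1.0):
    """ELU activation: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def elu_grad(z, alpha=1.0):
    """Derivative of ELU with respect to its input z."""
    return 1.0 if z > 0 else alpha * math.exp(z)

# Hypothetical pre-activation values for the output unit.
for z in (2.0, -1.0, -10.0, -50.0):
    print(f"z={z:6.1f}  elu={elu(z):+.6f}  grad={elu_grad(z):.3e}")
```

For positive pre-activations ELU is the identity and training proceeds normally, which would fit the observation that the network sometimes does learn. Leaving the last layer linear removes the lower bound, so the output can reach any target. For classification, the usual pattern is to pass the raw last-layer values (logits) to a sigmoid or softmax cross-entropy loss, which applies the final nonlinearity itself.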

The original question, "python - CNN in TensorFlow: loss remains constant", can be found on Stack Overflow: https://stackoverflow.com/questions/45577747/