This article looks at a convolutional neural network that outputs equal probabilities for all labels and how to fix it; it should be a useful reference for anyone running into the same problem.

Problem description

I am currently training a CNN on MNIST, and as training goes on the output probabilities (softmax) converge to [0.1, 0.1, ..., 0.1]. The initial values aren't uniform, so I can't figure out whether I'm doing something stupid here.

I'm only training for 15 steps, just to see how training progresses; even though that's a low number, I don't think it should result in uniform predictions.

import numpy as np
import tensorflow as tf
import imageio

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

# Getting data

from sklearn.model_selection import train_test_split
def one_hot_encode(data):
    new_ = []
    for i in range(len(data)):
        _ = np.zeros([10],dtype=np.float32)
        _[int(data[i])] = 1.0
        new_.append(np.asarray(_))
    return new_

data = np.asarray(mnist["data"],dtype=np.float32)
labels = np.asarray(mnist["target"],dtype=np.float32)
labels = one_hot_encode(labels)
tr_data,test_data,tr_labels,test_labels = train_test_split(data,labels,test_size = 0.1)
tr_data = np.asarray(tr_data)
tr_data = np.reshape(tr_data,[len(tr_data),28,28,1])
test_data = np.asarray(test_data)
test_data = np.reshape(test_data,[len(test_data),28,28,1])
tr_labels = np.asarray(tr_labels)
test_labels = np.asarray(test_labels)

def get_conv(x,shape):
    weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
    biases = tf.Variable(tf.random_normal([shape[-1]],stddev=0.05))
    conv = tf.nn.conv2d(x,weights,[1,1,1,1],padding="SAME")
    return tf.nn.relu(tf.nn.bias_add(conv,biases))

def get_pool(x,shape):
    return tf.nn.max_pool(x,ksize=shape,strides=shape,padding="SAME")

def get_fc(x,shape):
    sh = x.get_shape().as_list()
    dim = 1
    for i in sh[1:]:
        dim *= i
    x = tf.reshape(x,[-1,dim])
    weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
    return tf.nn.relu(tf.matmul(x,weights) + tf.Variable(tf.random_normal([shape[1]],stddev=0.05)))

# Creating model

x = tf.placeholder(tf.float32,shape=[None,28,28,1])
y = tf.placeholder(tf.float32,shape=[None,10])

conv1_1 = get_conv(x,[3,3,1,128])
conv1_2 = get_conv(conv1_1,[3,3,128,128])
pool1 = get_pool(conv1_2,[1,2,2,1])

conv2_1 = get_conv(pool1,[3,3,128,512])
conv2_2 = get_conv(conv2_1,[3,3,512,512])
pool2 = get_pool(conv2_2,[1,2,2,1])

conv3_1 = get_conv(pool2,[3,3,512,1024])
conv3_2 = get_conv(conv3_1,[3,3,1024,1024])
conv3_3 = get_conv(conv3_2,[3,3,1024,1024])
conv3_4 = get_conv(conv3_3,[3,3,1024,1024])
pool3 = get_pool(conv3_4,[1,3,3,1])

fc1 = get_fc(pool3,[9216,1024])
fc2 = get_fc(fc1,[1024,10])

softmax = tf.nn.softmax(fc2)
loss = tf.losses.softmax_cross_entropy(logits=fc2,onehot_labels=y)
train_step = tf.train.AdamOptimizer().minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(15):
    print(i)
    indices = np.random.randint(len(tr_data),size=[200])
    batch_data = tr_data[indices]
    batch_labels = tr_labels[indices]
    sess.run(train_step,feed_dict={x:batch_data,y:batch_labels})

Thank you very much.

Recommended answer

There are several issues with your code, including elementary ones. I strongly suggest you first go through the TensorFlow step-by-step tutorials for MNIST, MNIST For ML Beginners and Deep MNIST for Experts.

In short, regarding your code:

First, your final layer fc2 should not have a ReLU activation.
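
Here is a minimal sketch of one way to fix that while keeping the style of the code above; get_fc_linear is a hypothetical helper name, not something from the original code, and it simply omits the ReLU so that fc2 holds raw logits:

def get_fc_linear(x, shape):
    # Same as get_fc, but returns raw logits (no ReLU), which is what
    # tf.losses.softmax_cross_entropy and tf.nn.softmax expect as input.
    sh = x.get_shape().as_list()
    dim = 1
    for i in sh[1:]:
        dim *= i
    x = tf.reshape(x, [-1, dim])
    weights = tf.Variable(tf.random_normal(shape, stddev=0.05))
    biases = tf.Variable(tf.random_normal([shape[1]], stddev=0.05))
    return tf.matmul(x, weights) + biases

fc2 = get_fc_linear(fc1, [1024, 10])  # raw logits for the loss and the softmax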

Second, the way you build your batches, i.e.

indices = np.random.randint(len(tr_data),size=[200])

is by just grabbing random samples in each iteration, which is far from the correct way of doing so...
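
A more conventional scheme is to shuffle the training set once per epoch and then walk through it in consecutive slices, so every sample is seen exactly once per epoch. A minimal sketch, assuming the same tr_data/tr_labels arrays as above (the batch size and epoch count are illustrative):

num_epochs = 5      # illustrative value
batch_size = 200

for epoch in range(num_epochs):
    perm = np.random.permutation(len(tr_data))  # reshuffle once per epoch
    for start in range(0, len(tr_data), batch_size):
        idx = perm[start:start + batch_size]
        sess.run(train_step, feed_dict={x: tr_data[idx], y: tr_labels[idx]})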

Third, the data you feed into the network are not normalized to [0, 1], as they should be:

np.max(tr_data[0]) # get the max value of your first training sample
# 255.0
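
A minimal fix, assuming the arrays defined above, is to rescale the pixel values before training:

tr_data = tr_data / 255.0     # scale pixels from [0, 255] to [0, 1]
test_data = test_data / 255.0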


The third point was initially puzzling for me too, since in the aforementioned TensorFlow tutorials they don't seem to normalize the data either. Close inspection revealed the reason: if you import the MNIST data through the TensorFlow-provided utility functions (instead of the scikit-learn ones, as you do here), they come already normalized to [0, 1], something that is nowhere hinted at:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
np.max(mnist.train.images[0])
# 0.99607849

This is an admittedly strange design decision: as far as I am aware, in all other similar cases/tutorials normalizing the input data is an explicit part of the pipeline (see e.g. the Keras example), and with good reason (it is something you will certainly be expected to do yourself later, when using your own data).

That concludes this article on a convolutional neural network outputting equal probabilities for all labels; hopefully the recommended answer above is helpful.
