This article explains how to deal with the Torch "size mismatch" error raised during StochasticGradient training; hopefully it serves as a useful reference for anyone hitting the same problem.

Problem description

I'm implementing a deep neural network in Torch7 with a dataset made of two torch.Tensor() objects. The first consists of 12 elements (completeTable), the other of a single element (presentValue). Each dataset row is an array of these two tensors:

dataset[p] = {torch.Tensor(completeTable[p]), torch.Tensor(presentValue)};
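
For reference, nn.StochasticGradient only accepts datasets in a specific shape: dataset[i] must return an {input, target} pair, and the table must expose a size() method. Below is a minimal sketch of how the full dataset above might be assembled; the loop and the size() helper are assumptions, not shown in the original:

require "torch"

-- Hypothetical sketch: build the dataset in the format nn.StochasticGradient expects.
local dataset = {}
for p = 1, #completeTable do
    dataset[p] = {torch.Tensor(completeTable[p]),  -- 12-element input tensor
                  torch.Tensor(presentValue)}      -- 1-element target tensor
end
function dataset:size() return #self end           -- required by trainer:train(dataset)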

Everything works for the neural network training and testing. But now I want to switch and use only half of the 12 elements of completeTable, that is, only 6 elements (firstChromRegionProfile):

dataset_firstChromRegion[p] = {torch.Tensor(firstChromRegionProfile), torch.Tensor(presentValue)};

If I run the same neural network architecture with this new dataset, it does not work: the trainer:train(dataset_firstChromRegion) call fails with a "size mismatch" error.

Here's my neural network function:

-- Neural network application
function neuralNetworkApplication(input_number, output_number, datasetTrain, datasetTest, dropOutFlag, hiddenUnits, hiddenLayers)

    require "nn"
    -- act_function = nn.Sigmoid();
    act_function = nn.Tanh();

    print('input_number '.. input_number);
    print('output_number '.. output_number);

    -- NEURAL NETWORK CREATION - <START>

    perceptron = nn.Sequential();  -- make a multi-layer perceptron
    perceptron:add(nn.Linear(input_number, hiddenUnits));
    perceptron:add(act_function);
    if dropOutFlag==TRUE then perceptron:add(nn.Dropout()) end  -- DROPOUT

    -- we add w layers DEEP LEARNING
    for w=0, hiddenLayers do
        perceptron:add(nn.Linear(hiddenUnits, hiddenUnits)) -- DEEP LEARNING layer
        perceptron:add(act_function); -- DEEP LEARNING
        if dropOutFlag==TRUE then
            perceptron:add(nn.Dropout())  -- DROPOUT
        end
    end

    print('\n#datasetTrain '.. #datasetTrain);
    print('#datasetTrain[1] '.. #datasetTrain[1]);
    print('(#datasetTrain[1][1])[1] '..(#datasetTrain[1][1])[1]);
    print('\n#datasetTest '.. #datasetTest);
    print('#datasetTest[1] '.. #datasetTest[1]);
    print('(#datasetTest[1][1])[1] '..(#datasetTest[1][1])[1]);

    perceptron:add(nn.Linear(hiddenUnits, output_number));
    perceptron:add(act_function);

    criterion = nn.MSECriterion();  -- MSE: Mean Square Error
    trainer = nn.StochasticGradient(perceptron, criterion)
    trainer.learningRate = LEARNING_RATE_CONST;
    trainer:train(datasetTrain);

    idp=3;
    predValueVector={}
    for i=1,(#datasetTest) do
        pred = perceptron:forward(datasetTest[i][1]);  -- get the prediction of the perceptron
        predValueVector[i] = pred[1];
    end

    -- NEURAL NETWORK CREATION - <END>

    return predValueVector;

end

Here's the error log:

input_number 6
output_number 1

#datasetTrain 13416
#datasetTrain[1] 2
(#datasetTrain[1][1])[1] 6

#datasetTest 3354
#datasetTest[1] 2
(#datasetTest[1][1])[1] 6
# StochasticGradient: training
/mnt/work1/software/torch/7/bin/luajit: /mnt/work1/software/torch/7/share/lua/5.1/nn/Linear.lua:71: size mismatch
stack traceback:
    [C]: in function 'addmv'
    /mnt/work1/software/torch/7/share/lua/5.1/nn/Linear.lua:71: in function 'updateGradInput'
    /mnt/work1/software/torch/7/share/lua/5.1/nn/Sequential.lua:36: in function 'updateGradInput'
    ...software/torch/7/share/lua/5.1/nn/StochasticGradient.lua:37: in function 'train'
    siamese_neural_network.lua:278: in function 'neuralNetworkApplication'
    siamese_neural_network.lua:223: in function 'kfold_cross_validation_separate'
    siamese_neural_network.lua:753: in main chunk
    [C]: in function 'dofile'
    ...1/software/torch/7/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x004057d0

Recommended answer

All of your activation layers share the same nn.Tanh() object. That is the problem. Try something like this instead:

act_function = nn.Tanh
perceptron:add( act_function() )
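
In other words, store the constructor and call it once per layer, so every activation layer gets its own module instance. Here is a minimal sketch of the rebuilt network under this fix, reusing the variable names from the function above:

act_function = nn.Tanh  -- the constructor, not a shared instance

perceptron = nn.Sequential()
perceptron:add(nn.Linear(input_number, hiddenUnits))
perceptron:add(act_function())                      -- fresh nn.Tanh() here
for w = 0, hiddenLayers do
    perceptron:add(nn.Linear(hiddenUnits, hiddenUnits))
    perceptron:add(act_function())                  -- and a fresh one per hidden layer
end
perceptron:add(nn.Linear(hiddenUnits, output_number))
perceptron:add(act_function())                      -- the output activation gets its own too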

Why?

To perform a backward propagation step, we have to compute the gradient of the layer w.r.t. its input. In our case, for tanh:

gradInput = gradOutput * (1 - tanh(input)^2)

One can notice that tanh(input) = output of the layer's forward step. You can store this output inside the layer and use it during the backward pass to speed up training. This is exactly what happens inside the nn library:

// torch/nn/generic/Tanh.c/Tanh_updateGradInput:

for (i = 0; i < THTensor_(nElement)(gradInput); i++)
{
    real z = ptr_output[i];
    ptr_gradInput[i] = ptr_gradOutput[i] * (1. - z*z);
}
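
A short Lua sketch (with hypothetical sizes, just for illustration) of why a single shared nn.Tanh() breaks this caching: each forward call overwrites the one shared output buffer, so the backward pass of every layer except the last reads a stale, possibly wrongly-sized tensor:

require "nn"

local shared = nn.Tanh()
local a = shared:forward(torch.randn(6))  -- "hidden layer": shared.output now has 6 elements
local b = shared:forward(torch.randn(1))  -- "output layer": shared.output resized to 1 element
-- a, b and shared.output are all the same tensor; backpropagating through the
-- hidden layer would now use this 1-element buffer instead of the cached
-- 6-element activation -> wrong gradients and the "size mismatch" above.
print(shared.output:size(1))  -- prints 1, not 6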

The output sizes of your activation layers don't match, so the error occurs. Even if they did match, it would still lead to a wrong result.

Sorry for my English.

This concludes the article on the Torch "size mismatch" error during StochasticGradient training; hopefully the recommended answer above helps.
