How to calculate the number of parameters of convolutional neural networks?

Problem description

I can't arrive at the correct number of parameters for AlexNet or VGG Net.

For example, to calculate the number of parameters of a conv3-256 layer of VGG Net, I get 0.59M = (3*3)*(256*256), that is (kernel size) * (product of the channel counts of the two adjacent layers). Counting this way, however, I can't reach the 138M parameters.

So could you please show me what is wrong with my calculation, or show me the right calculation procedure?

Solution

If you are referring to the 16-layer VGG Net (Table 1, column D), then 138M is the total number of parameters of this network, i.e. it includes all the convolutional layers as well as the fully connected ones.

Looking at the 3rd convolutional stage composed of 3 x conv3-256 layers:

the first one has N=128 input planes and F=256 output planes,
the other two each have N=256 input planes and F=256 output planes.

The convolution kernel is 3x3 for each of these layers. In terms of parameters this gives:

128x3x3x256 (weights) + 256 (biases) = 295,168 parameters for the first one,
256x3x3x256 (weights) + 256 (biases) = 590,080 parameters for each of the other two.
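
The per-layer count above can be sketched in a few lines of Python. This is only an illustration: the helper name conv_params is made up for this example, and it assumes square kernels and one bias per output plane, as in the figures above.

```python
def conv_params(in_planes: int, out_planes: int, kernel_size: int = 3) -> int:
    """Learnable parameters of one conv layer (hypothetical helper).

    Assumes a square kernel and one bias per output plane.
    """
    weights = in_planes * kernel_size * kernel_size * out_planes
    biases = out_planes
    return weights + biases


first = conv_params(128, 256)   # first conv3-256 layer: 295168
other = conv_params(256, 256)   # each of the other two: 590080
stage_total = first + 2 * other  # whole conv3-256 stage: 1475328
print(first, other, stage_total)
```

Note that the 0.59M figure in the question is the weight count of a single 256-to-256 layer (3*3*256*256 = 589,824); it leaves out the biases and every other layer.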

As explained above, you have to do this for all layers, including the fully connected ones, and sum these values to obtain the final 138M figure.

UPDATE: the breakdown among layers gives:

conv3-64  x 2       : 38,720
conv3-128 x 2       : 221,440
conv3-256 x 3       : 1,475,328
conv3-512 x 3       : 5,899,776
conv3-512 x 3       : 7,079,424
fc1                 : 102,764,544
fc2                 : 16,781,312
fc3                 : 4,097,000
TOTAL               : 138,357,544
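
As a cross-check, the breakdown can be reproduced with a short script. This is a minimal sketch, not code from the paper: the list VGG16_CONFIG and the function name are invented here, and it assumes a 3-channel (RGB) input, 3x3 kernels with biases everywhere, and a 7x7 spatial size before the first fully connected layer.

```python
# Configuration D written as a list: integers are the output planes of a
# conv3 layer, "M" marks a max-pooling layer (which has no parameters).
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def vgg16_param_breakdown(input_planes=3, final_spatial=7):
    """Return (per-stage conv totals, per-layer fc totals)."""
    stage_totals, current, planes = [], 0, input_planes
    for item in VGG16_CONFIG:
        if item == "M":
            # A pooling layer closes the current stage.
            stage_totals.append(current)
            current = 0
        else:
            current += planes * 3 * 3 * item + item  # weights + biases
            planes = item
    fc_totals = []
    features = planes * final_spatial * final_spatial  # 512 * 7 * 7
    for size in FC_SIZES:
        fc_totals.append(features * size + size)  # weights + biases
        features = size
    return stage_totals, fc_totals


stages, fcs = vgg16_param_breakdown()
print(stages)  # [38720, 221440, 1475328, 5899776, 7079424]
print(fcs)     # [102764544, 16781312, 4097000]
print(sum(stages) + sum(fcs))  # 138357544
```

The three fully connected layers account for roughly 124M of the 138M parameters, which is why counting only the convolutional layers falls so far short.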

In particular for the fully-connected layers (fc):

 fc1 (x): (512x7x7)x4,096 (weights) + 4,096 (biases)
 fc2    : 4,096x4,096     (weights) + 4,096 (biases)
 fc3    : 4,096x1,000     (weights) + 1,000 (biases)

(x) See section 3.2 of the paper: the fully connected layers are first converted to convolutional layers (the first FC layer to a 7 × 7 conv. layer, the last two FC layers to 1 × 1 conv. layers).
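
This conversion does not change the parameter count, which the following sketch illustrates. The helper names are hypothetical, and the snippet only compares counts; it does not perform the conversion itself.

```python
def fc_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(in_planes: int, out_planes: int, kernel_size: int) -> int:
    """Weights plus biases of a conv layer with a square kernel."""
    return in_planes * kernel_size * kernel_size * out_planes + out_planes


# fc1 viewed as a 7x7 convolution over the 512x7x7 feature map
assert fc_params(512 * 7 * 7, 4096) == conv_params(512, 4096, 7)
# fc2 and fc3 viewed as 1x1 convolutions
assert fc_params(4096, 4096) == conv_params(4096, 4096, 1)
assert fc_params(4096, 1000) == conv_params(4096, 1000, 1)
print(fc_params(512 * 7 * 7, 4096))  # 102764544
```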

Details about fc1

As indicated above, the spatial resolution right before the fully connected layers is 7x7 pixels. This is because this VGG Net uses spatial padding before convolutions, as detailed in section 2.1 of the paper:

[...] the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers.

With such padding, the convolutions leave the resolution unchanged and only the pooling layer that ends each stage reduces it. Starting from a 224x224 pixel input image, the resolution decreases as follows through the stages: 112x112, 56x56, 28x28, 14x14 and 7x7 after the last convolution/pooling stage, which has 512 feature maps.

This gives a feature vector passed to fc1 with dimension: 512x7x7.
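
The resolution sequence can be traced with the usual output-size arithmetic. This sketch assumes 3x3 convolutions with stride 1 and padding 1 (as quoted above) and 2x2 max-pooling with stride 2 after each stage, which is how the paper describes the pooling; the function and variable names are made up for the example.

```python
def conv_out(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Output width/height of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Output width/height of a max-pooling layer along one spatial axis."""
    return (size - window) // stride + 1


convs_per_stage = [2, 2, 3, 3, 3]  # configuration D
size = 224
resolutions = []
for n_convs in convs_per_stage:
    for _ in range(n_convs):
        size = conv_out(size)  # padding of 1 keeps the resolution
    size = pool_out(size)      # pooling halves it
    resolutions.append(size)

print(resolutions)      # [112, 56, 28, 14, 7]
print(512 * size * size)  # 25088 inputs to fc1
```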
