Question
I am reading through the residual learning paper, and I have a question. What is the "linear projection" mentioned in section 3.2? It looks pretty simple once you get it, but I could not get the idea...
I am basically not a computer science person, so I would appreciate it if someone could provide a simple example.
Answer
First up, it's important to understand what x, y and F are and why they need any projection at all. I'll try to explain in simple terms, but a basic understanding of ConvNets is required.
x is the input data of the layer (called a tensor); in the case of ConvNets its rank is 4, so you can think of it as a 4-dimensional array. F is usually a conv layer (conv+relu+batchnorm in this paper), and y combines the two together (forming the output channel). The result of F is also of rank 4, and most of its dimensions are the same as in x, except for one. That's exactly what the transformation should patch.
For example, the shape of x might be (64, 32, 32, 3), where 64 is the batch size, 32x32 is the image size, and 3 stands for the (R, G, B) color channels. F(x) might be (64, 32, 32, 16): the batch size never changes and, for simplicity, a ResNet conv layer doesn't change the image size either, but it will likely use a different number of filters, e.g. 16.
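A minimal sketch of that shape change, assuming TensorFlow/Keras (the layer configuration below is illustrative, not the exact one from the paper):

import tensorflow as tf

# A stand-in for F: a conv layer that keeps the 32x32 spatial size
# (padding="same") but changes the channel count from 3 to 16.
conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding="same")

x = tf.zeros([64, 32, 32, 3])   # (batch, height, width, channels)
fx = conv(x)

print(x.shape)   # (64, 32, 32, 3)
print(fx.shape)  # (64, 32, 32, 16)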
So, in order for y = F(x) + x to be a valid operation, x must be "reshaped" from (64, 32, 32, 3) to (64, 32, 32, 16).
I'd like to stress here that "reshaping" is not what numpy.reshape does.
Instead, x[3] is padded with 13 zeros, like this:
pad(x=[1, 2, 3], padding=[7, 6]) = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0]
If you think about it, this is a projection of a 3-dimensional vector onto 16 dimensions. In other words, we start to think of our vector as the same, just with 13 more dimensions out there. None of the other dimensions of x are changed.
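A minimal runnable sketch of that zero-padding projection, assuming NumPy (the 7/6 split of the 13 extra zeros mirrors the example above):

import numpy as np

x = np.zeros((64, 32, 32, 3))

# Zero-pad only the last (channel) axis: 7 zeros before the 3
# original channels and 6 after, giving 7 + 3 + 6 = 16 channels.
# np.pad defaults to constant zero padding.
x_proj = np.pad(x, [(0, 0), (0, 0), (0, 0), (7, 6)])

print(x_proj.shape)  # (64, 32, 32, 16)

Now x_proj has the same shape as F(x), so y = F(x) + x_proj is a valid element-wise addition, and the original 3 channels pass through the shortcut unchanged.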
Here's the link to the code in TensorFlow that does this.