This article explains the "linear projection" that comes up in convolutional neural networks. The question and answer below should be a useful reference for anyone who runs into the same problem.

Problem description

I am reading through the Residual Learning paper, and I have a question: what is the "linear projection" mentioned in section 3.2? It looks pretty simple once you get it, but I could not get the idea...

I am basically not a computer science person, so I would very much appreciate it if someone could provide a simple example.

Recommended answer

First up, it's important to understand what x, y and F are and why they need any projection at all. I'll try to explain in simple terms, but a basic understanding of ConvNets is required.

x is the input data of the layer (called a tensor); in the case of ConvNets, its rank is 4. You can think of it as a 4-dimensional array. F is usually a conv layer (conv+relu+batchnorm in this paper), and y combines the two together (forming the output channel). The result of F is also of rank 4, and most of its dimensions are the same as in x, except for one. That's exactly what the transformation should patch.
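To make those roles concrete, here is a minimal sketch of such a block in TensorFlow/Keras (the function name residual_block and the exact layer ordering are my own simplification, not the paper's reference code):

import tensorflow as tf

def residual_block(x, filters):
    # F(x): conv + batchnorm + relu, as described above
    f = tf.keras.layers.Conv2D(filters, kernel_size=3, padding='same')(x)
    f = tf.keras.layers.BatchNormalization()(f)
    f = tf.keras.layers.ReLU()(f)
    # y = F(x) + x -- only valid when f and x have the same shape
    return f + x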

For example, the shape of x might be (64, 32, 32, 3), where 64 is the batch size, 32x32 is the image size, and 3 stands for the (R, G, B) color channels. F(x) might be (64, 32, 32, 16): the batch size never changes and, for simplicity, a ResNet conv layer doesn't change the image size either, but it will likely use a different number of filters, say 16.
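A quick way to see the mismatch, sketched with NumPy using the shapes from the example above:

import numpy as np

x = np.zeros((64, 32, 32, 3))    # batch, height, width, channels
f = np.zeros((64, 32, 32, 16))   # output of a conv layer with 16 filters
# f + x raises a ValueError: the channel dimensions (3 vs. 16) do not match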

So, in order for y = F(x) + x to be a valid operation, x must be "reshaped" from (64, 32, 32, 3) to (64, 32, 32, 16).
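For reference, section 3.2 of the paper writes this case as y = F(x, {W_i}) + W_s x (its Eq. 2), where W_s is exactly the linear projection used to match the dimensions; the zero-padding described below is the simplest, parameter-free way to realize it.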

I'd like to stress that "reshaping" here is not what numpy.reshape does.

Instead, x[3] (the channel dimension) is padded with 13 zeros, like this:

pad(x=[1, 2, 3], padding=[7, 6]) = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0]
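The same idea applied to the rank-4 tensor from the example, sketched with tf.pad (the [7, 6] split of the 13 zero channels just mirrors the 1-D example above; any split that adds up to 13 works the same way):

import tensorflow as tf

x = tf.zeros((64, 32, 32, 3))
# pad nothing on the batch/height/width axes; add 7 + 6 = 13 zero channels on the last axis
x_padded = tf.pad(x, paddings=[[0, 0], [0, 0], [0, 0], [7, 6]])
print(x_padded.shape)   # (64, 32, 32, 16)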

If you think about it, this is a projection of a 3-dimensional vector onto 16 dimensions. In other words, we start to think of our vector as the same one, just living in 13 more dimensions. None of the other dimensions of x are changed.
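One way to see why this counts as a linear projection: the padding is equivalent to multiplying each 3-dimensional channel vector by a fixed 16x3 matrix whose only nonzero entries form an identity block. A small NumPy illustration (the row positions follow the [7, 6] padding above):

import numpy as np

v = np.array([1, 2, 3])
W = np.zeros((16, 3))
W[7:10] = np.eye(3)   # rows 7..9 copy the input; every other row stays zero
print(W @ v)          # the same 16-vector as the pad() example above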

Here's the link to the code in TensorFlow that does this.

This concludes the article on what "linear projection" means in convolutional neural networks. We hope the answer above is helpful.
