Problem description
I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.
ValueError: Expected input batch_size (128) to match target batch_size (32).
The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].
The code is listed here for completeness.
import torch.nn as nn


class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()
        # 3x3 convolution with stride 1 and padding 1 keeps height and width unchanged
        self.conv = nn.Conv2d(in_channels=in_channels, kernel_size=3, out_channels=out_channels, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output


class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.unit1 = Unit(in_channels=3, out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)
        self.pool1 = nn.MaxPool2d(kernel_size=2)
        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)
        self.pool2 = nn.MaxPool2d(kernel_size=2)
        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)
        self.pool3 = nn.MaxPool2d(kernel_size=2)
        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)
        self.avgpool = nn.AvgPool2d(kernel_size=4)
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6,
                                 self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128)
        output = self.fc(output)
        return output
Any ideas on how to modify this CNN to accept 64 x 64 images and return correctly shaped outputs?
Recommended answer
The problem is an incompatible reshape (view) at the end.
You're using a sort of "flattening" at the end, which is different from a "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.
In your case, with a flatten, you need to keep track of all image dimensions in order to know how to reshape at the end.
So:
- Input at 64x64
- Pool1 to 32x32
- Pool2 to 16x16
- Pool3 to 8x8
- AvgPool to 2x2
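The sizes in the list above can be reproduced with a few lines of plain Python. This is only a sketch: `trace_spatial_size` is a hypothetical helper, not part of PyTorch, and it assumes the pooling layers use their default stride (equal to the kernel size) and no padding, as in the model shown in the question.

```python
def trace_spatial_size(size, pool_kernels=(2, 2, 2, 4)):
    """Return the feature-map side length before and after each pooling layer.

    The 3x3 convolutions use stride=1 and padding=1, so they leave the
    spatial size unchanged; only the pooling layers shrink it.
    """
    sizes = [size]
    for kernel in pool_kernels:
        # With the default stride (= kernel size) and no padding, each
        # pool divides the side length by its kernel size, rounding down.
        size = size // kernel
        sizes.append(size)
    return sizes


print(trace_spatial_size(64))  # [64, 32, 16, 8, 2]
print(trace_spatial_size(32))  # [32, 16, 8, 4, 1]
```

For a 32x32 input the last entry is 1, which is why view(-1, 128) happened to work for the original image size.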
Then, at the end you've got a shape of (batch, 128, 2, 2). That is four times as many values per sample as you would get with a 32x32 image, where the shape is (batch, 128, 1, 1).
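This factor of four also accounts for the 128 in the error message. The arithmetic can be checked without PyTorch; `rows_after_view` below is a hypothetical helper that only mimics how view(-1, 128) chooses the size of the first dimension.

```python
def rows_after_view(batch, channels, height, width, row_length=128):
    """Number of rows that tensor.view(-1, row_length) would produce."""
    total = batch * channels * height * width
    # view(-1, n) only needs the element count to be divisible by n;
    # it does not check that the first dimension still equals the batch size.
    assert total % row_length == 0
    return total // row_length


print(rows_after_view(32, 128, 1, 1))  # 32  (32x32 input: matches the labels)
print(rows_after_view(32, 128, 2, 2))  # 128 (64x64 input: the reported mismatch)
```

With a 64x64 input, each sample is silently split into four rows, so the loss function sees 128 "samples" against 32 labels.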
Then, your final reshape should be output = output.view(-1, 128*2*2).
This is a different net with a different classification layer, though, because the final linear layer now needs in_features=512 instead of 128.
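As a rough illustration of the flatten variant, the sketch below uses a random tensor as a stand-in for the output of self.net on a batch of 32 images of size 64x64; the 500 output classes come from the question, and the variable names are chosen here for illustration only.

```python
import torch
import torch.nn as nn

# Stand-in for the output of self.net on 32 images of 64x64 pixels.
features = torch.randn(32, 128, 2, 2)

# Flatten every sample to 128 * 2 * 2 = 512 values.
flat = features.view(-1, 128 * 2 * 2)

# The classifier has to be rebuilt with in_features=512 to match.
fc = nn.Linear(in_features=128 * 2 * 2, out_features=500)
outputs = fc(flat)

print(flat.shape)     # torch.Size([32, 512])
print(outputs.shape)  # torch.Size([32, 500])
```

The batch dimension is 32 again, so it lines up with the labels.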
On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:
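def flatChannels(x):
    # Collapse height and width into one axis: (batch, channels, H*W)
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max(dim=...) returns a (values, indices) pair; keep only the values
    return flatChannels(x).max(dim=-1)[0]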
The end of the model:
        # removed the pool from here to put it in forward
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output)  # or globalMaxPool2D
        output = self.fc(output)
        return output
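As a quick sanity check, the sketch below compares a hand-written global average pool with PyTorch's built-in nn.AdaptiveAvgPool2d(1), which could be used instead of the helper functions. `global_avg_pool_2d` is a name chosen here for illustration, and the random tensors stand in for the output of self.net (4x4 feature maps for 32x32 inputs, 8x8 for 64x64 inputs).

```python
import torch
import torch.nn as nn


def global_avg_pool_2d(x):
    # Same idea as globalAvgPool2D: average over all spatial positions,
    # turning (batch, channels, H, W) into (batch, channels).
    return x.flatten(start_dim=2).mean(dim=-1)


builtin_pool = nn.AdaptiveAvgPool2d(1)  # built-in layer with a 1x1 output

for side in (4, 8):
    features = torch.randn(32, 128, side, side)
    manual = global_avg_pool_2d(features)
    # The built-in layer returns (batch, channels, 1, 1); drop the 1x1 axes.
    builtin = builtin_pool(features).flatten(start_dim=1)
    print(side, tuple(manual.shape), torch.allclose(manual, builtin, atol=1e-5))
```

Both feature-map sizes give a (32, 128) result, so the same nn.Linear(in_features=128, ...) layer fits either input size.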