How to modify this PyTorch convolutional neural network to accept 64 x 64 images and properly output predictions?

Problem description

I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.


ValueError: Expected input batch_size (128) to match target batch_size (32).

The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].

The code is listed here for completeness.

import torch.nn as nn

class Unit(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Unit,self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels,kernel_size=3,out_channels=out_channels,stride=1,padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self,input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()

        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6
                                 ,self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output
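
For reference, a minimal sketch that reproduces the mismatch, assuming a cross-entropy loss and using random tensors as stand-ins for the real data:

import torch
import torch.nn as nn

model = SimpleNet(num_classes=500)            # the class defined above
images = torch.randn(32, 3, 64, 64)           # batch of 32 RGB images, 64 x 64
labels = torch.randint(0, 500, (32,))         # one class index per image

outputs = model(images)
print(outputs.shape)                          # torch.Size([128, 500]), not [32, 500]

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)               # ValueError: Expected input batch_size (128) to match target batch_size (32)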

Any ideas on how to modify this CNN to accept 64 x 64 images and properly return the output predictions?

Recommended answer

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.

In your case, with flattening, you need to keep track of all the image dimensions in order to know how to reshape at the end.

So, following the spatial size through the pools (a quick check follows the list):

  • Input: 64x64
  • Pool1 to 32x32
  • Pool2 to 16x16
  • Pool3 to 8x8
  • AvgPool (kernel size 4) to 2x2
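
A quick way to confirm those numbers, assuming the SimpleNet class from the question is in scope and using a dummy tensor:

import torch

model = SimpleNet(num_classes=500)
feats = model.net(torch.randn(1, 3, 64, 64))  # run only the convolutional part
print(feats.shape)                            # torch.Size([1, 128, 2, 2])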

Then, at the end you've got a shape of (batch, 128, 2, 2): four times as many values as you would get if the images were 32x32.

Then, your final reshape should be output = output.view(-1,128*2*2).

This is a different net with a different classification layer, though, because in_features=512.
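
As a sketch of that flatten-based fix, one could subclass the question's SimpleNet and override only these two pieces (SimpleNet64 is a hypothetical name used here for illustration):

import torch.nn as nn

class SimpleNet64(SimpleNet):                 # hypothetical name; reuses the question's SimpleNet
    def __init__(self, num_classes=500):
        super().__init__(num_classes=num_classes)
        # the classifier now expects 128*2*2 = 512 features
        self.fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128 * 2 * 2) # (batch, 512) for 64 x 64 inputs
        output = self.fc(output)
        return output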

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

def flatChannels(x):
    # (batch, channels, H, W) -> (batch, channels, H*W)
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    # average over all spatial positions -> (batch, channels)
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max over all spatial positions -> (batch, channels)
    # note: torch.max(dim=...) returns (values, indices), so keep only the values
    return flatChannels(x).max(dim=-1).values
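
A quick usage example, with a random tensor standing in for the feature maps coming out of the convolutional part:

import torch

x = torch.randn(32, 128, 8, 8)       # e.g. what a 64 x 64 input leaves after the last MaxPool2d
print(globalAvgPool2D(x).shape)      # torch.Size([32, 128]), regardless of the spatial size
print(globalMaxPool2D(x).shape)      # torch.Size([32, 128])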

The end of the model:

        # the final AvgPool2d was removed from the Sequential;
        # global pooling is applied in forward instead
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)

        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output)  # or globalMaxPool2D
        output = self.fc(output)
        return output
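
With that change, the same layers and weights handle different input resolutions; a quick sanity check with random tensors, assuming the modified SimpleNet above:

import torch

model = SimpleNet(num_classes=500)            # the global-pooling version above
for size in (32, 64, 128):
    out = model(torch.randn(4, 3, size, size))
    print(size, out.shape)                    # torch.Size([4, 500]) for every size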
