How to modify this PyTorch convolutional neural network to accept 64 x 64 images and properly output predictions?

Problem description

I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.


ValueError: Expected input batch_size (128) to match target batch_size (32).

The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].

The code is listed here for completeness.

import torch.nn as nn

class Unit(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Unit,self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels,kernel_size=3,out_channels=out_channels,stride=1,padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self,input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()

        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6
                                 ,self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output
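
For reference, a minimal sketch that reproduces the mismatch, assuming a cross-entropy loss and using random tensors as stand-ins for the real data:

import torch
import torch.nn as nn

model = SimpleNet(num_classes=500)            # the class defined above
images = torch.randn(32, 3, 64, 64)           # batch of 32 RGB images, 64 x 64
labels = torch.randint(0, 500, (32,))         # one class index per image

outputs = model(images)
print(outputs.shape)                          # torch.Size([128, 500]), not [32, 500]

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)               # ValueError: Expected input batch_size (128) to match target batch_size (32)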

Any ideas on how to modify this CNN to accept 64 x 64 images and properly return the output predictions?

Recommended answer

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.

In your case, with flattening, you need to keep track of all the image dimensions in order to know how to reshape at the end.

So, following the spatial size through the pools (a quick check follows the list):

  • Input: 64x64
  • Pool1 to 32x32
  • Pool2 to 16x16
  • Pool3 to 8x8
  • AvgPool (kernel size 4) to 2x2
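
A quick way to confirm those numbers, assuming the SimpleNet class from the question is in scope and using a dummy tensor:

import torch

model = SimpleNet(num_classes=500)
feats = model.net(torch.randn(1, 3, 64, 64))  # run only the convolutional part
print(feats.shape)                            # torch.Size([1, 128, 2, 2])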

Then, at the end you've got a shape of (batch, 128, 2, 2): four times as many values as you would get if the images were 32x32.

Then, your final reshape should be output = output.view(-1,128*2*2).

This is a different net with a different classification layer, though, because in_features=512.
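
As a sketch of that flatten-based fix, one could subclass the question's SimpleNet and override only these two pieces (SimpleNet64 is a hypothetical name used here for illustration):

import torch.nn as nn

class SimpleNet64(SimpleNet):                 # hypothetical name; reuses the question's SimpleNet
    def __init__(self, num_classes=500):
        super().__init__(num_classes=num_classes)
        # the classifier now expects 128*2*2 = 512 features
        self.fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128 * 2 * 2) # (batch, 512) for 64 x 64 inputs
        output = self.fc(output)
        return output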

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

def flatChannels(x):
    # (batch, channels, H, W) -> (batch, channels, H*W)
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    # average over all spatial positions -> (batch, channels)
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max over all spatial positions -> (batch, channels)
    # note: torch.max(dim=...) returns (values, indices), so keep only the values
    return flatChannels(x).max(dim=-1).values
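
A quick usage example, with a random tensor standing in for the feature maps coming out of the convolutional part:

import torch

x = torch.randn(32, 128, 8, 8)       # e.g. what a 64 x 64 input leaves after the last MaxPool2d
print(globalAvgPool2D(x).shape)      # torch.Size([32, 128]), regardless of the spatial size
print(globalMaxPool2D(x).shape)      # torch.Size([32, 128])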

The end of the model:

        # the final AvgPool2d was removed from the Sequential;
        # global pooling is applied in forward instead
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)

        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output)  # or globalMaxPool2D
        output = self.fc(output)
        return output
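
With that change, the same layers and weights handle different input resolutions; a quick sanity check with random tensors, assuming the modified SimpleNet above:

import torch

model = SimpleNet(num_classes=500)            # the global-pooling version above
for size in (32, 64, 128):
    out = model(torch.randn(4, 3, size, size))
    print(size, out.shape)                    # torch.Size([4, 500]) for every size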
