摘要

本文介绍我自研的下采样模块。本次改进的下采样模块是一种通用的改进方法,你可以用分类任务的主干网络中,也可以用在分割和超分的任务中。已经有粉丝用来改进ConvNext模型,取得了非常好的效果,配合一些其他的改进,发一篇CVPR、ECCV之类的顶会完全没有问题。

本次我将这个模块用来改进YoloV9,实现大幅度涨点。

自研下采样模块及其变种

第一种改进方法

将输入分成两个分支,一个分支用卷积,一个分支分成两部分,一部分用MaxPool,一部分用AvgPool。然后,在最后合并起来。代码如下:

import torch
import torch.nn as nn

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Perform transposed convolution of 2D data."""
        return self.act(self.conv(x))

class DownSimper(nn.Module):
    """DownSimper."""

    def __init__(self, c1, c2):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = Conv(c1, self.c, 3, 2, d=3)
        self.cv2 = Conv(c1, self.c, 1, 1, 0)

    def forward(self, x):
        x1 = self.cv1(x)
        x = self.cv2(x)
        x2, x3 = x.chunk(2, 1)
        x2 = torch.nn.functional.max_pool2d(x2, 3, 2, 1)
        x3 = torch.nn.functional.avg_pool2d(x3, 3, 2, 1)
        return torch.cat((x1, x2, x3), 1)

结构图:
YoloV9改进策略:下采样改进|自研下采样模块(独家改进)|疯狂涨点|附结构图-LMLPHP
左侧卷积中d=3,代表使用空洞卷积或者是膨胀卷积,可以带来更大的感受野。d=3,k=3等同卷积核为9.

YoloV9官方测试结果

yolov9 summary: 580 layers, 60567520 parameters, 0 gradients, 264.3 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 15/15 00:02
                   all        230       1412      0.878      0.991      0.989      0.732
                   c17        230        131       0.92      0.992      0.994      0.797
                    c5        230         68      0.828          1      0.992      0.807
            helicopter        230         43      0.895      0.977      0.969      0.634
                  c130        230         85      0.955      0.999      0.994      0.684
                   f16        230         57      0.839      0.965      0.966      0.689
                    b2        230          2          1      0.978      0.995      0.647
                 other        230         86       0.91      0.942      0.957      0.525
                   b52        230         70      0.917      0.971      0.979      0.806
                  kc10        230         62      0.958      0.984      0.987      0.826
               command        230         40      0.964          1      0.995      0.815
                   f15        230        123      0.939      0.995      0.995      0.702
                 kc135        230         91      0.949      0.989      0.978      0.691
                   a10        230         27      0.863      0.963      0.982      0.458
                    b1        230         20      0.926          1      0.995      0.712
                   aew        230         25      0.929          1      0.993      0.812
                   f22        230         17      0.835          1      0.995      0.706
                    p3        230        105       0.97          1      0.995      0.804
                    p8        230          1      0.566          1      0.995      0.697
                   f35        230         32      0.908          1      0.995      0.547
                   f18        230        125      0.956      0.992      0.993      0.828
                   v22        230         41      0.921          1      0.995      0.682
                 su-27        230         31      0.925          1      0.994      0.832
                 il-38        230         27      0.899          1      0.995      0.816
                tu-134        230          1      0.346          1      0.995      0.895
                 su-33        230          2       0.96          1      0.995      0.747
                 an-70        230          2      0.718          1      0.995      0.796
                 tu-22        230         98      0.912          1      0.995      0.804

改进方法

测试结果

yolov9 summary: 595 layers, 58708576 parameters, 0 gradients, 274.0 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 15/15 00:35
                   all        230       1412      0.952      0.974       0.99      0.738
                   c17        230        131      0.981      0.992      0.995      0.832
                    c5        230         68      0.963      0.985      0.995      0.847
            helicopter        230         43      0.968       0.93      0.972      0.635
                  c130        230         85      0.988      0.996      0.995      0.669
                   f16        230         57      0.976      0.947      0.975      0.687
                    b2        230          2      0.767          1      0.995      0.516
                 other        230         86      0.981      0.907      0.968      0.573
                   b52        230         70      0.969      0.971      0.985      0.812
                  kc10        230         62      0.986      0.984      0.989      0.835
               command        230         40      0.988          1      0.995       0.82
                   f15        230        123      0.965      0.992      0.989      0.697
                 kc135        230         91      0.984      0.989      0.981      0.725
                   a10        230         27          1      0.794      0.976      0.495
                    b1        230         20      0.979          1      0.995      0.682
                   aew        230         25      0.944          1      0.995      0.802
                   f22        230         17          1      0.881      0.992      0.717
                    p3        230        105      0.981      0.992      0.995       0.81
                    p8        230          1      0.756          1      0.995      0.697
                   f35        230         32       0.99      0.938      0.982       0.55
                   f18        230        125      0.981      0.992      0.991      0.829
                   v22        230         41       0.99          1      0.995      0.684
                 su-27        230         31      0.985          1      0.995      0.849
                 il-38        230         27      0.987          1      0.995       0.84
                tu-134        230          1      0.756          1      0.995      0.895
                 su-33        230          2       0.99          1      0.995      0.747
                 an-70        230          2      0.838          1      0.995      0.848
                 tu-22        230         98          1          1      0.995      0.832

04-19 06:50