摘要

本文讲解如何实现VGGNet的剪枝操作。剪枝的原理:在BN层网络中加入稀疏因子,训练使得BN层稀疏化,对稀疏训练的后的模型中所有BN层权重进行统计排序,获取指定保留BN层数量即取得排序后权重阈值thres。遍历模型中的BN层权重,制作各层mask(权重>thres值为1,权重<thres值为0)。剪枝操作,根据各层的mask构建新模型结构(各层保留的通道数),获取BN层权重mask非零值的索引,非零索引对应的原始conv层、BN层、linear层各通道的权重、偏置等值赋值给新模型各层。加载剪枝后模型,进行fine-tune。

通过本文你可以学到:
1、如何使用VGGNet训练模型。
2、如何使用VGGNet稀疏训练模型。
3、如何实现剪枝,已及保存剪枝模型和使用剪枝模型预测等操作。
4、如何微调剪枝模型。

剪枝流程分为:
第一步、使用VGGNet训练模型。保存训练结果,方便将来的比对!
第二步、在BN层网络中加入稀疏因子,训练模型。
第三步、剪枝操作。
第四步、fine-tune模型,提高模型的ACC。

接下来,我们一起实现对VGGNet的剪枝。

项目结构

Slimming_Demo
 ├─checkpoints
 │  ├─vgg
 │  ├─vgg_pruned
 │  └─vgg_sp
 ├─data
 │  ├─train
 │  │  ├─Black-grass
 │  │  ├─Charlock
 │  │  ├─Cleavers
 │  │  ├─Common Chickweed
 │  │  ├─Common wheat
 │  │  ├─Fat Hen
 │  │  ├─Loose Silky-bent
 │  │  ├─Maize
 │  │  ├─Scentless Mayweed
 │  │  ├─Shepherds Purse
 │  │  ├─Small-flowered Cranesbill
 │  │  └─Sugar beet
 │  └─val
 │      ├─Black-grass
 │      ├─Charlock
 │      ├─Cleavers
 │      ├─Common Chickweed
 │      ├─Common wheat
 │      ├─Fat Hen
 │      ├─Loose Silky-bent
 │      ├─Maize
 │      ├─Scentless Mayweed
 │      └─Shepherds Purse
 ├─vgg.py
 ├─train.py
 ├─train_sp.py
 ├─prune.py
 └─train_prune.py

train.py:训练脚本,训练VGGNet原始模型
vgg.py:模型脚本
train_sp.py:稀疏训练脚本。
prune.py:模型剪枝脚本。
train_prune.py:微调模型脚本。

测试结果

VGGNet模型

VGGNet在原有的基础做了修改,增加了模型的cfg,定义了每层卷积的channel,“M”代表池化。

import torch
import torch.nn as nn
from torch.autograd import Variable
import math  # init


class VGG(nn.Module):

    def __init__(self, num_classes, init_weights=True, cfg=None):
        super(VGG, self).__init__()
        if cfg is None:
            cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 'M']
        self.feature = self.make_layers(cfg, True)
        print(self.feature)
        self.classifier = nn.Linear(cfg[-2], num_classes)
        if init_weights:
            self._initialize_weights()

    def make_layers(self, cfg, batch_norm=False):
        layers = []
        in_channels = 3
        for v in cfg:
            if v == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1, bias=False)
                if batch_norm:
                    layers += [conv2d, nn.BatchNorm2d(v), nn.LeakyReLU(inplace=True)]
                else:
                    layers += [conv2d, nn.LeakyReLU(inplace=True)]
                in_channels = v
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.feature(x)
        x = nn.AdaptiveAvgPool2d(1)(x)
        x = x.view(x.size(0), -1)
        y = self.classifier(x)
        return y

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(0.5)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


if __name__ == '__main__':
    net = VGG(12)
    print(net)
    x = Variable(torch.FloatTensor(1, 3, 224, 224))
    y = net(x)
    print(y.data.shape)

需要安装的库

1、tim库

pip install timm

2、sklearn

pip install -U scikit-learn

3、tensorboard

pip install tensorboard

训练VGGNet

新建train.py脚本。接下来,详解train.py脚本中的代码。

导入项目所需要的库

import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from sklearn.metrics import classification_report
from vgg import VGG
import os
from torchvision import datasets
import json
import matplotlib.pyplot as plt
from timm.data.mixup import Mixup
from timm.loss import SoftTargetCrossEntropy
import warnings
from torch.utils.tensorboard import SummaryWriter
warnings.filterwarnings("ignore")

设置随机因子

设置随机因子后,再次训练时,图像的加载顺序不会改变,能够更好的复现训练结果。代码如下:

def seed_everything(seed=42):
    os.environ['PYHTONHASHSEED'] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True

BN层可视化

使用tensorboard 可视化BN的状态,方便对比!


writer = SummaryWriter(comment='vgg')

def showBN(model):
    # =============== show bn weights ===================== #
    module_list = []
    module_bias_list = []
    for i, layer in model.named_modules():
        if isinstance(layer, nn.BatchNorm2d) :
            bnw = layer.state_dict()['weight']
            bnb = layer.state_dict()['bias']
            module_list.append(bnw)
            module_bias_list.append(bnb)
    size_list = [idx.data.shape[0] for idx in module_list]
    bn_weights = torch.zeros(sum(size_list))
    bnb_weights = torch.zeros(sum(size_list))
    index = 0
    for idx, size in enumerate(size_list):
        bn_weights[index:(index + size)] = module_list[idx].data.abs().clone()
        bnb_weights[index:(index + size)] = module_bias_list[idx].data.abs().clone()
        index += size
    print("bn_weights:", torch.sort(bn_weights))
    print("bn_bias:", torch.sort(bnb_weights))
    writer.add_histogram('bn_weights/hist', bn_weights.numpy(), epoch, bins='doane')
    writer.add_histogram('bn_bias/hist', bnb_weights.numpy(), epoch, bins='doane')

定义全局参数

全局参数包括学习率、批大小、轮数、类别数量等一些模型用到的超参数。

if __name__ == '__main__':
    # 创建保存模型的文件夹
    file_dir = 'checkpoints/vgg'
    if os.path.exists(file_dir):
        print('true')
        os.makedirs(file_dir, exist_ok=True)
    else:
        os.makedirs(file_dir)
    # 设置全局参数
    model_lr = 1e-4
    BATCH_SIZE = 16
    EPOCHS = 300
    DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    classes = 12
    resume = None
    Best_ACC = 0  # 记录最高得分
    SEED = 42
    seed_everything(42)
    start_epoch = 1

数据增强

  t = [transforms.CenterCrop((224, 224)), transforms.GaussianBlur(kernel_size=(5, 5), sigma=(0.1, 3.0)),
         transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5)]
    # 数据预处理7
    transform = transforms.Compose([
        transforms.AutoAugment(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.48214436, 0.42969334, 0.33318862], std=[0.2642221, 0.23746745, 0.21696019])

    ])
    transform_test = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.48214436, 0.42969334, 0.33318862], std=[0.2642221, 0.23746745, 0.21696019])
    ])
    mixup_fn = Mixup(
        mixup_alpha=0.8, cutmix_alpha=1.0, cutmix_minmax=None,
        prob=0.1, switch_prob=0.5, mode='batch',
        label_smoothing=0.1, num_classes=classes)

增强选用了AutoAugment,自动增强,默认是ImageNet的增强。在实际的项目中,需要选用适合的方式来增强图像数据。过多增强或带来负面的影响。

声明MIxUp函数,Mix是一种非常有效的增强方法。加入之后不仅可以提高ACC,对泛化性也有很大的提高。

这里注意下Resize的大小,由于选用的ResNet模型输入是224×224的大小,所以要Resize为224×224。

加载数据

 # 读取数据
    dataset_train = datasets.ImageFolder('data/train', transform=transform)
    dataset_test = datasets.ImageFolder("data/val", transform=transform_test)
    with open('class.txt', 'w',encoding='utf-8') as file:
        file.write(str(dataset_train.class_to_idx))
    with open('class.json', 'w', encoding='utf-8') as file:
        file.write(json.dumps(dataset_train.class_to_idx))
        # 导入数据
    train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, pin_memory=True, shuffle=True,
                                                   drop_last=True)
    test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, pin_memory=True, shuffle=False)

加载训练集和验证集的数据,并将class_to_idx保存。然后分别声明训练和验证的DataLoader。

定义Loss

多类别分类的loss一般使用交叉熵。代码如下:

 # 实例化模型并且移动到GPU
    criterion_train = SoftTargetCrossEntropy().cuda()
    criterion_val = torch.nn.CrossEntropyLoss().cuda()

SoftTargetCrossEntropy,成为软交叉熵,当Label做了平滑之后,使用SoftTargetCrossEntropy。

定义模型、优化器以及学习率调整策略

    # 设置模型
    model_ft = VGG(classes,True)
    if resume:
        model = torch.load(resume)
        model_ft.load_state_dict(model['state_dict'])
        Best_ACC = model['Best_ACC']
        start_epoch = model['epoch'] + 1
    model_ft.to(DEVICE)
    print(model_ft)
    # 选择简单暴力的Adam优化器,学习率调低
    optimizer = optim.AdamW(model_ft.parameters(), lr=model_lr)
    cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-6)

模型就选用前面定义的VGG模型。
优化器选用AdamW。
学习率调整策略选用CosineAnnealingLR。

训练函数

# 定义训练过程
def train(model, device, train_loader, optimizer, epoch, criterion,epochs):
    model.train()
    sum_loss = 0
    correct = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
        samples, targets = mixup_fn(data, target)
        output = model(samples)
        loss = criterion(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        _, pred = torch.max(output.data, 1)
        correct += torch.sum(pred == target)
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    correct = correct.data.item()
    acc = correct / total_num
    print('epoch:{},loss:{}'.format(epoch, ave_loss))
    return ave_loss, acc

训练的主要步骤:

等待一个epoch训练完成后,计算平均loss。然后将其打印出来。并返回loss和acc。

验证函数

验证过程增加了对预测数据和Label数据的保存,所以,需要定义两个list保存,然后将其返回!

# 验证过程
def val(model, device, test_loader,criterion):
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    val_list = []
    pred_list = []
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
            for t in target:
                val_list.append(t.data.item())
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            for p in pred:
                pred_list.append(p.data.item())
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
    return val_list, pred_list, avgloss, acc

验证的过程和训练的过程大致相似,主要步骤:

5、acc = correct / total_num,计算出acc。 avgloss = test_loss / len(test_loader)计算loss。 最后返回val_list, pred_list,loss,acc

训练、验证、保存模型

    # 训练
    log_dir = {}
    train_loss_list, val_loss_list, train_acc_list, val_acc_list, epoch_list = [], [], [], [], []
    if resume and os.path.isfile("result.json"):
        with open('result.json', 'r', encoding='utf-8') as file:
            logs = json.load(file)
            train_acc_list = logs['train_acc']
            train_loss_list = logs['train_loss']
            val_acc_list = logs['val_acc']
            val_loss_list = logs['val_loss']
            epoch_list = logs['epoch_list']
    for epoch in range(start_epoch, EPOCHS + 1):
        epoch_list.append(epoch)
        log_dir['epoch_list'] = epoch_list
        train_loss, train_acc = train(model_ft, DEVICE, train_loader, optimizer, epoch, criterion_train,epochs=EPOCHS)
        showBN(model_ft)
        train_loss_list.append(train_loss)
        train_acc_list.append(train_acc)
        log_dir['train_acc'] = train_acc_list
        log_dir['train_loss'] = train_loss_list
        val_list, pred_list, val_loss, val_acc = val(model_ft, DEVICE, test_loader, criterion_val)
        print(classification_report(val_list, pred_list, target_names=dataset_train.class_to_idx))
        val_loss_list.append(val_loss)
        val_acc_list.append(val_acc)
        log_dir['val_acc'] = val_acc_list
        log_dir['val_loss'] = val_loss_list
        log_dir['best_acc'] = Best_ACC
        with open('result.json', 'w', encoding='utf-8') as file:
            file.write(json.dumps(log_dir))
        if val_acc >= Best_ACC:
            Best_ACC = val_acc
            torch.save(model_ft, file_dir + "/" + 'best.pth')
        state = {
            'epoch': epoch,
            'state_dict': model_ft.state_dict(),
            'Best_ACC': Best_ACC
        }
        torch.save(state, file_dir + "/" + 'model_' + str(epoch) + '_' + str(round(val_acc, 3)) + '.pth')
        cosine_schedule.step()
        fig = plt.figure(1)
        plt.plot(epoch_list, train_loss_list, 'r-', label=u'Train Loss')
        # 显示图例
        plt.plot(epoch_list, val_loss_list, 'b-', label=u'Val Loss')
        plt.legend(["Train Loss", "Val Loss"], loc="upper right")
        plt.xlabel(u'epoch')
        plt.ylabel(u'loss')
        plt.title('Model Loss ')
        plt.savefig(file_dir + "/loss.png")
        plt.close(1)
        fig2 = plt.figure(2)
        plt.plot(epoch_list, train_acc_list, 'r-', label=u'Train Acc')
        plt.plot(epoch_list, val_acc_list, 'b-', label=u'Val Acc')
        plt.legend(["Train Acc", "Val Acc"], loc="lower right")
        plt.title("Model Acc")
        plt.ylabel("acc")
        plt.xlabel("epoch")
        plt.savefig(file_dir + "/acc.png")
        plt.close(2)

思路:

定义记录log的字典,声明loss和acc的list。如果是接着上次的断点继续训练则读取log文件,然后把log取出来,赋值到对应的list上。
循环调用train函数和val函数,train函数返回train_loss, train_acc,val函数返回val_list, pred_list, val_loss, val_acc。loss和acc用于绘制曲线。
记录BN的权重状态。
将log字典保存到json文件中。
将val_list, pred_list和dataset_train.class_to_idx传入模型,计算模型指标。
判断acc是否大于Best_ACC,如果大于则保存模型,这里保存的是整个模型。
接下来是保存每个epoch的模型,新建state ,字典的参数:
 - epoch:当前的epoch。
 - state_dict:权重参数。 model_ft.state_dict(),只保存模型的权重参数。
 - Best_ACC:Best_ACC的数值。
 然后,调用 torch.save保存模型。
  cosine_schedule.step(),执行学习率调整算法。
 最后使用plt.plot绘制loss和acc曲线图

然后,点击train.py运行脚本。

测试结果

VGGNet剪枝实战:使用VGGNet训练、稀疏训练、剪枝、微调等,剪枝出只有3M的模型-LMLPHP
VGGNet剪枝实战:使用VGGNet训练、稀疏训练、剪枝、微调等,剪枝出只有3M的模型-LMLPHP
最好的模型ACC是95.6%。

稀疏训练VGGNet

微调结果

Val set: Average loss: 0.2845, Accuracy: 457/482 (95%)

                           precision    recall  f1-score   support

              Black-grass       0.79      0.86      0.83        36
                 Charlock       1.00      1.00      1.00        42
                 Cleavers       1.00      0.96      0.98        50
         Common Chickweed       0.94      0.91      0.93        34
             Common wheat       0.93      1.00      0.97        42
                  Fat Hen       0.97      0.97      0.97        34
         Loose Silky-bent       0.88      0.78      0.83        46
                    Maize       0.96      1.00      0.98        45
        Scentless Mayweed       0.93      0.96      0.95        45
          Shepherds Purse       0.97      0.97      0.97        35
Small-flowered Cranesbill       1.00      0.97      0.99        36
               Sugar beet       1.00      1.00      1.00        37

                 accuracy                           0.95       482
                macro avg       0.95      0.95      0.95       482
             weighted avg       0.95      0.95      0.95       482
08-06 20:21