Computer Vision: ResNet

1 Introduction to ResNet

1.1 Overview of ResNet

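ResNet was proposed in 2015 by a team at Microsoft and took first place in the classification, object detection, and image segmentation tasks of that year's competitions. The paper's four authors, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, are now well-known names in artificial intelligence; at the time they were all at Microsoft Research Asia. Their experiments showed that residual networks are easier to optimize and that accuracy improves as the network gets deeper. On ImageNet they evaluated a residual network with 152 layers, 8 times deeper than VGG nets yet with lower complexity. An ensemble of these networks reached a 3.57% error rate and won first place in the ILSVRC 2015 classification competition.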

Paper: Deep Residual Learning for Image Recognition

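This is a classic paper in computer vision. Mu Li once remarked that if you are using a convolutional neural network, there is about a 50% chance that it is ResNet or one of its variants. The ResNet paper has been cited more than 100,000 times.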

1.2 ResNet Network Architecture

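The classic ResNet architectures are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. ResNet-18 and ResNet-34 share the same basic structure and are relatively shallow; the other three are deeper networks, of which ResNet-50 is the most widely used.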

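Residual networks were proposed to solve the degradation problem that appears when a deep neural network (DNN) has too many hidden layers. Degradation means that as more hidden layers are added, accuracy saturates and then drops rapidly, and this drop is not caused by overfitting.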

[Figure: a residual block with a direct path from its input to its output]

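Consider a network A with training error x. Build a network B by adding a few layers, call them C, on top of A. If the layers in C leave A's output unchanged, that is, if they implement an identity mapping, then B's training error is also x, so B should never train worse than A. When B's training error does come out higher than A's, it shows that learning an identity mapping (one that has no effect on its input) with the added layers C is not a trivial problem.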

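To address this, the module in the figure above adds a direct path between its input and output that carries out the identity mapping itself. C then no longer has to reproduce the input features it is given; it only has to learn what should be added to them. Because C learns only this residual, the module is called a residual block.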
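To make the identity argument concrete, here is a minimal pure-Python sketch rather than the article's PyTorch code. It assumes a toy layer that is nothing more than a matrix multiplication, and the names `linear`, `plain_block`, and `residual_block` are made up for illustration. With every weight set to zero, the plain block erases its input, while the residual block passes the input through unchanged, so "doing nothing" only requires pushing the residual toward zero.

```python
def linear(x, weights):
    """Multiply vector x by a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def plain_block(x, weights):
    """A plain layer: the output is whatever the weights produce."""
    return linear(x, weights)


def residual_block(x, weights):
    """A residual layer: the weights only produce a correction added to x."""
    fx = linear(x, weights)
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

plain_out = plain_block(x, zero_weights)        # the input signal is lost
residual_out = residual_block(x, zero_weights)  # the input passes through

print("plain:   ", plain_out)
print("residual:", residual_out)
```

In this toy setting the plain block would need a carefully tuned identity matrix to preserve its input, whereas the residual block gets that behavior from the direct path alone.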

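In addition, like GoogLeNet, which was introduced around the same period, ResNet places a global average pooling layer before the classification layer. With these changes, ResNet can train networks as deep as 152 layers. It achieves higher accuracy than VGGNet and GoogLeNet while being more computationally efficient than VGGNet. ResNet-152 reaches a top-5 accuracy of 95.51%.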

The network structures of ResNet-18 and ResNet-50 are as follows:

[Figure: network structures of ResNet-18 and ResNet-50]
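The names of the variants can be checked with a little arithmetic. The sketch below assumes the per-stage block counts from the architecture table in the ResNet paper and counts only weighted layers on the main path: the stem convolution, the convolutions inside each block, and the final fully connected layer. The 1x1 projection convolutions on the shortcuts are not counted. `VARIANTS` and `weighted_layers` are illustrative names, not part of any library.

```python
# Blocks per stage (conv2 to conv5) and convolutions per block for each variant.
# Basic blocks hold two 3x3 convolutions; bottleneck blocks hold three (1x1, 3x3, 1x1).
VARIANTS = {
    "ResNet-18": ([2, 2, 2, 2], 2),
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),
    "ResNet-101": ([3, 4, 23, 3], 3),
    "ResNet-152": ([3, 8, 36, 3], 3),
}


def weighted_layers(blocks_per_stage, convs_per_block):
    """Count the stem convolution, every block convolution, and the fc layer."""
    return 1 + sum(blocks_per_stage) * convs_per_block + 1


layer_counts = {name: weighted_layers(*cfg) for name, cfg in VARIANTS.items()}
for name, count in layer_counts.items():
    print(name, count)
```

The ResNet-50 row (3, 4, 6, and 3 bottleneck blocks) is the configuration that the implementation in section 2.2 builds.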

2 Implementing ResNet-50 on CIFAR-10 with PyTorch

2.1 The CIFAR-10 Dataset

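CIFAR-10 is a computer vision dataset for general object recognition collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains 60,000 RGB color images of size 32 × 32 in 10 classes, with 50,000 images in the training set and 10,000 in the test set.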

The 10 classes of RGB color images in CIFAR-10 are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

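CIFAR-10 is a color image dataset that is closer to everyday objects. Compared with the MNIST dataset, CIFAR-10 differs in the following ways:

CIFAR-10 images are 3-channel RGB color images, whereas MNIST images are grayscale.
CIFAR-10 images are 32 × 32, slightly larger than the 28 × 28 images in MNIST.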
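The two differences compound: more pixels and more channels mean each CIFAR-10 image carries roughly four times as many input values as an MNIST image. A small sketch of that arithmetic, using only the sizes stated above (`values_per_image` is an illustrative helper, and the per-class figure assumes the dataset is class-balanced, which it is for CIFAR-10):

```python
def values_per_image(height, width, channels):
    """Number of scalar values in one image tensor."""
    return height * width * channels


cifar10_values = values_per_image(32, 32, 3)  # RGB
mnist_values = values_per_image(28, 28, 1)    # grayscale
ratio = cifar10_values / mnist_values

# 50,000 training images plus 10,000 test images, spread evenly over 10 classes.
total_images = 50000 + 10000
images_per_class = total_images // 10

print(cifar10_values, mnist_values, round(ratio, 2))
print(total_images, images_per_class)
```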

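Unlike handwritten characters, CIFAR-10 contains real-world objects: the images are noisy, and the objects vary in scale and appearance, which makes recognition much harder. A plain linear model such as softmax regression performs poorly on CIFAR-10.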

2.2 Code Implementation

import datetime

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor


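class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce, 3x3, then 1x1 expand to 4x the channels."""

    # Tuples instead of lists avoid the shared mutable default argument pitfall.
    def __init__(self, in_channels, out_channels, stride=(1, 1, 1), padding=(0, 1, 0), first=False) -> None:
        super(Bottleneck, self).__init__()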
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride[0], padding=padding[0], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride[1], padding=padding[1], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * 4, kernel_size=1, stride=stride[2], padding=padding[2], bias=False),
            nn.BatchNorm2d(out_channels * 4)
        )

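        # The input and output shapes can differ, so the shortcut has two cases:
        # an identity mapping by default, or a projection for the first block of a stage.
        self.shortcut = nn.Sequential()
        if first:
            self.shortcut = nn.Sequential(
                # A 1x1 convolution changes the channel count to match the main path.
                # The first block of each stage expands the channels, and in conv3
                # through conv5 it also downsamples with stride 2, so the shortcut
                # reuses the stride of the 3x3 convolution.
                nn.Conv2d(in_channels, out_channels * 4, kernel_size=1, stride=stride[1], bias=False),
                nn.BatchNorm2d(out_channels * 4)
            )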

    def forward(self, x):
        out = self.bottleneck(x)
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet50(nn.Module):
    def __init__(self, Bottleneck, num_classes=10) -> None:
        super(ResNet50, self).__init__()
        self.in_channels = 64
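        # Stem: 7x7 convolution, batch norm, and 3x3 max pooling.
        # Note: the standard ResNet stem (for example in torchvision) also applies a
        # ReLU after the batch norm; this implementation leaves it out.
        self.conv1 = nn.Sequential(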
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

        # Four stages with 3, 4, 6, and 3 bottleneck blocks. In conv3 through conv5
        # the first block downsamples with stride 2 in its 3x3 convolution.
        self.conv2 = self._make_layer(Bottleneck, 64, [[1, 1, 1]] * 3, [[0, 1, 0]] * 3)
        self.conv3 = self._make_layer(Bottleneck, 128, [[1, 2, 1]] + [[1, 1, 1]] * 3, [[0, 1, 0]] * 4)
        self.conv4 = self._make_layer(Bottleneck, 256, [[1, 2, 1]] + [[1, 1, 1]] * 5, [[0, 1, 0]] * 6)
        self.conv5 = self._make_layer(Bottleneck, 512, [[1, 2, 1]] + [[1, 1, 1]] * 2, [[0, 1, 0]] * 3)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

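    def _make_layer(self, block, out_channels, strides, paddings):
        """Stack one stage of blocks; only the first block gets a projection shortcut."""
        layers = []
        for i, (stride, padding) in enumerate(zip(strides, paddings)):
            layers.append(block(self.in_channels, out_channels, stride, padding, first=(i == 0)))
            # Every block outputs 4x out_channels, which is the input of the next block.
            self.in_channels = out_channels * 4

        return nn.Sequential(*layers)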

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.conv5(out)

        out = self.avgpool(out)
        out = out.reshape(x.shape[0], -1)
        out = self.fc(out)
        return out


def get_format_time():
    return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')


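# Convert to tensors, scale each channel to [-1, 1], and upsample the 32x32
# CIFAR-10 images to 224x224.
transform = transforms.Compose([ToTensor(),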
                                transforms.Normalize(
                                    mean=[0.5, 0.5, 0.5],
                                    std=[0.5, 0.5, 0.5]
                                ),
                                transforms.Resize((224, 224))
                                ])

training_data = datasets.CIFAR10(
    root="data",
    train=True,
    download=True,
    transform=transform,
)

testing_data = datasets.CIFAR10(
    root="data",
    train=False,
    download=True,
    transform=transform,
)


if __name__ == "__main__":
    res50 = ResNet50(Bottleneck)

    batch_size = 128
    train_loader = DataLoader(dataset=training_data, batch_size=batch_size, shuffle=True, drop_last=True)
    test_loader = DataLoader(dataset=testing_data, batch_size=batch_size, shuffle=True, drop_last=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = res50.to(device)
    cost = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())

    epochs = 20
    accuracy_rate = []
    for epoch in range(epochs):
        train_loss = 0.0
        train_correct = 0.0
        model.train()

        print(f"{get_format_time()}, train epoch: {epoch}/{epochs}")
        for step, (images, labels) in enumerate(train_loader, 0):
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)  # index of the highest logit per sample
            optimizer.zero_grad()
            loss = cost(outputs, labels)

            loss.backward()
            optimizer.step()
            train_loss += loss.item()  # loss.item() is the mean loss over this batch
            train_correct += torch.sum(predicted == labels)

        # Evaluate on the test set
        model.eval()
        test_correct = 0
        test_total = 0
        test_loss = 0
        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = cost(outputs, labels)
                predicted = outputs.argmax(dim=1)
                test_total += labels.size(0)
                test_correct += torch.sum(predicted == labels)
                test_loss += loss.item()

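        # Accuracy over the samples actually evaluated (test_total excludes the last
        # partial batch because the loader uses drop_last=True).
        accuracy = 100 * test_correct / test_total
        accuracy_rate.append(accuracy)

        # Note on the summary printed below: loss.item() is already a per-batch mean, so
        # dividing the summed losses by the dataset size reports roughly the mean loss
        # divided by batch_size. The percentages divide by the full dataset size, so they
        # come out slightly lower than the values stored in accuracy_rate.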
        print("{}, Train Loss is:{:.4f}, Train Accuracy is:{:.4f}%, Test Loss is::{:.4f} Test Accuracy is:{:.4f}%".format(
            get_format_time(),
            train_loss / len(training_data),
            100 * train_correct / len(training_data),
            test_loss / len(testing_data),
            100 * test_correct / len(testing_data)
        ))

    accuracy_rate = torch.tensor(accuracy_rate).detach().cpu().numpy()
    times = np.linspace(1, epochs, epochs)
    plt.xlabel('times')
    plt.ylabel('accuracy rate')
    plt.plot(times, accuracy_rate)
    plt.show()

    print(f"{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')},accuracy_rate={accuracy_rate}")
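The script above resizes every 32 × 32 image to 224 × 224 before feeding it to the network. One way to see what that resize buys is to trace the feature-map size through the layers with the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1. The sketch below is plain Python and does not touch the model; `conv_output_size` and `trace_resnet50` are illustrative helpers that hard-code the kernel sizes, strides, and paddings used in the `ResNet50` class above.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet50(size):
    """Side length of the feature map after the stem and each of the four stages."""
    sizes = {}
    size = conv_output_size(size, kernel=7, stride=2, padding=3)  # 7x7 stem convolution
    size = conv_output_size(size, kernel=3, stride=2, padding=1)  # 3x3 max pooling
    sizes["conv1"] = size
    sizes["conv2"] = size  # every stride in conv2 is 1, so the size is unchanged
    for stage in ("conv3", "conv4", "conv5"):
        # Only the 3x3 convolution of the first block in the stage uses stride 2.
        size = conv_output_size(size, kernel=3, stride=2, padding=1)
        sizes[stage] = size
    return sizes


print("224x224 input:", trace_resnet50(224))
print("32x32 input:  ", trace_resnet50(32))
```

With a 224 × 224 input the last stage still has a 7 × 7 feature map, while a native 32 × 32 input is already down to a single spatial position by conv5. The adaptive average pooling would accept either, so the resize is a design choice that trades extra computation for more spatial detail in the deeper stages.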

2.3 Preparing the Runtime Environment

(1) To run on a CPU, set up the environment as follows:

conda create -n cv python=3.9
conda activate cv

pip install torchvision==0.9.0
pip install numpy
pip install matplotlib
pip install requests

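(2) To run on a GPU, set up the environment as follows:

Run the nvidia-smi command to check the CUDA version (the header shows the highest CUDA version the installed driver supports):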

Tue May 23 15:24:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.89       Driver Version: 528.89       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4           TCC   | 00000000:01:00.0 Off |                    0 |
| N/A   55C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Create the environment and install a GPU build of torch that is compatible with that version (the cu113 wheels are used here):

conda create -n cv python=3.9
conda activate cv

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install numpy
pip install matplotlib
pip install requests
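After installing, it can help to confirm which build Python actually picked up before starting a long training run. The following is a small sketch, assuming it is run inside the `cv` environment created above; `describe_torch_env` is an illustrative helper, and the import is guarded so the snippet also runs, and says so, when torch is missing.

```python
def describe_torch_env():
    """Return a small report on the installed PyTorch build, or note that it is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


report = describe_torch_env()
print(report)
```

If `cuda_available` is False on a machine with a GPU, the usual suspects are a CPU-only wheel or a driver that is too old for the CUDA version the wheel was built with.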

Running nvidia-smi again during training shows that the script is running on the GPU:

Tue May 23 15:25:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.89       Driver Version: 528.89       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4           TCC   | 00000000:01:00.0 Off |                    0 |
| N/A   56C    P0    28W /  70W |   1101MiB / 15360MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      6728      C   ...nda\envs\voice\python.exe     1100MiB |
+-----------------------------------------------------------------------------+

2.4 Results


2023-12-22 14:44:39, train epoch: 0/20
2023-12-22 14:46:21, Train Loss is:0.0126, Train Accuracy is:40.9520%, Test Loss is::0.0116 Test Accuracy is:46.3200%
2023-12-22 14:46:21, train epoch: 1/20
2023-12-22 14:48:01, Train Loss is:0.0087, Train Accuracy is:59.5060%, Test Loss is::0.0109 Test Accuracy is:51.6700%
2023-12-22 14:48:01, train epoch: 2/20
2023-12-22 14:49:40, Train Loss is:0.0070, Train Accuracy is:68.1060%, Test Loss is::0.0072 Test Accuracy is:67.8100%
2023-12-22 14:49:40, train epoch: 3/20
2023-12-22 14:51:20, Train Loss is:0.0057, Train Accuracy is:74.2540%, Test Loss is::0.0073 Test Accuracy is:67.7400%
2023-12-22 14:51:20, train epoch: 4/20
2023-12-22 14:53:00, Train Loss is:0.0049, Train Accuracy is:77.9280%, Test Loss is::0.0061 Test Accuracy is:73.7400%
2023-12-22 14:53:00, train epoch: 5/20
2023-12-22 14:54:41, Train Loss is:0.0042, Train Accuracy is:81.3260%, Test Loss is::0.0049 Test Accuracy is:77.9900%
2023-12-22 14:54:41, train epoch: 6/20
2023-12-22 14:56:20, Train Loss is:0.0036, Train Accuracy is:83.9240%, Test Loss is::0.0047 Test Accuracy is:79.0400%
2023-12-22 14:56:20, train epoch: 7/20
2023-12-22 14:58:00, Train Loss is:0.0031, Train Accuracy is:86.0780%, Test Loss is::0.0059 Test Accuracy is:75.6300%
2023-12-22 14:58:00, train epoch: 8/20
2023-12-22 14:59:39, Train Loss is:0.0027, Train Accuracy is:87.7120%, Test Loss is::0.0048 Test Accuracy is:79.7600%
2023-12-22 14:59:39, train epoch: 9/20
2023-12-22 15:01:19, Train Loss is:0.0023, Train Accuracy is:89.3680%, Test Loss is::0.0048 Test Accuracy is:80.5800%
2023-12-22 15:01:19, train epoch: 10/20
2023-12-22 15:02:58, Train Loss is:0.0019, Train Accuracy is:91.2760%, Test Loss is::0.0044 Test Accuracy is:82.3400%
2023-12-22 15:02:58, train epoch: 11/20
2023-12-22 15:04:38, Train Loss is:0.0016, Train Accuracy is:92.4040%, Test Loss is::0.0045 Test Accuracy is:82.6400%
2023-12-22 15:04:38, train epoch: 12/20
2023-12-22 15:06:18, Train Loss is:0.0014, Train Accuracy is:93.7200%, Test Loss is::0.0053 Test Accuracy is:81.7900%
2023-12-22 15:06:18, train epoch: 13/20
2023-12-22 15:07:57, Train Loss is:0.0011, Train Accuracy is:94.7360%, Test Loss is::0.0051 Test Accuracy is:81.7700%
2023-12-22 15:07:57, train epoch: 14/20
2023-12-22 15:09:37, Train Loss is:0.0010, Train Accuracy is:95.1120%, Test Loss is::0.0062 Test Accuracy is:80.6500%
2023-12-22 15:09:37, train epoch: 15/20
2023-12-22 15:11:15, Train Loss is:0.0008, Train Accuracy is:96.1600%, Test Loss is::0.0056 Test Accuracy is:82.0300%
2023-12-22 15:11:15, train epoch: 16/20
2023-12-22 15:12:54, Train Loss is:0.0007, Train Accuracy is:96.6140%, Test Loss is::0.0055 Test Accuracy is:82.4200%
2023-12-22 15:12:54, train epoch: 17/20
2023-12-22 15:14:34, Train Loss is:0.0007, Train Accuracy is:96.8880%, Test Loss is::0.0068 Test Accuracy is:81.1300%
2023-12-22 15:14:34, train epoch: 18/20
2023-12-22 15:16:13, Train Loss is:0.0006, Train Accuracy is:97.0620%, Test Loss is::0.0062 Test Accuracy is:82.1900%
2023-12-22 15:16:13, train epoch: 19/20
2023-12-22 15:17:52, Train Loss is:0.0006, Train Accuracy is:97.4180%, Test Loss is::0.0063 Test Accuracy is:82.7800%
2023-12-22 15:17:53,accuracy_rate=[46.39423  51.752804 67.91867  67.84856  73.85818  78.11498  79.166664
 75.751205 79.887825 80.70914  82.471954 82.77244  81.921074 81.90104
 80.77925  82.16146  82.552086 81.26002  82.32172  82.91266 ]
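Two details of the log are easy to misread: the per-epoch test accuracy differs slightly from the matching entry in accuracy_rate, and the loss values look very small for a cross-entropy loss. Both follow from the denominators in the script. The sketch below redoes the arithmetic for epoch 0 using only numbers copied from the log and the script's settings (batch size 128, drop_last=True on both loaders); the variable names are illustrative, and because the logged values are rounded, the results are approximate.

```python
BATCH_SIZE = 128
TRAIN_SIZE, TEST_SIZE = 50000, 10000

# With drop_last=True the last partial batch is skipped in every epoch.
train_batches = TRAIN_SIZE // BATCH_SIZE            # full training batches per epoch
test_seen = (TEST_SIZE // BATCH_SIZE) * BATCH_SIZE  # test samples actually evaluated

# Epoch 0 values copied from the log above.
printed_test_acc = 46.32     # percent, computed against all 10000 test images
printed_train_loss = 0.0126  # sum of per-batch mean losses divided by 50000

# Accuracy over the samples that were actually evaluated.
acc_over_seen = printed_test_acc * TEST_SIZE / test_seen
# Average of the per-batch mean losses, i.e. the usual cross-entropy scale.
mean_batch_loss = printed_train_loss * TRAIN_SIZE / train_batches

print(train_batches, test_seen)
print(round(acc_over_seen, 4))
print(round(mean_batch_loss, 3))
```

The rescaled accuracy lands on the first entry of accuracy_rate (about 46.39), and the rescaled loss is about 1.6, a plausible cross-entropy after one epoch on 10 classes. The same conversion applies to every other line of the log.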
