Jetson nano部署Yolov5目标检测 + Tensor RT加速(超级详细版)

这篇具有很好参考价值的文章主要介绍了Jetson nano部署Yolov5目标检测 + Tensor RT加速(超级详细版)。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

        一、准备工具

  二、烧录

        三、搭配环境

        四、试跑Yolov5

        五、tensorRT部署yolov5


前言:

在工作或学习中我们需要进行部署,下面这篇文章是我亲自部署jetson nano之后做出的总结,包括自己遇到一些报错和踩坑,希望对你们有所帮助 : )

一、准备工具

  1. 读卡器
  2. SD卡 
  3. 小螺丝刀
  4. 网线(更改语言需要网络)

二、烧录

烧录镜像就是要把SD卡里的东西给完全清除,好比我们电脑重装系统一样,把SD卡格式化。

 插上读卡器后会自动识别U盘,我的电脑会识别很多,弹出很多个U盘选项,这个是正常现象,只格式化一个就可以了。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

1. 在本地的电脑上下载烧录的镜像,可以去官网自行下载,下载的时候需要看好版本,不然会出问题,我下载的版本是4.4.1,下载的网址如下:

  JetPack SDK 4.4.1 archive | NVIDIA Developer

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

2. 下载好之后进行解压:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

3. 然后我们需要下载烧录SD卡的工具,网址如下:

Get Started With Jetson Nano Developer Kit | NVIDIA Developer

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

下载完之后是一个.exe可执行文件,运行安装就可以了。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

4. 开始烧录镜像

运行上面下载好的Etcher之后是这样的,选择完镜像之后会自动识别SD卡,让后点击flash

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 这个是正在烧录的界面,会自动烧录2遍(大约20分钟左右)

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

5.烧录完成,弹出读卡器,插卡开机。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

6.开机之后,简单设置一下(密码、地区、时间),然后会就是和我们本地电脑Ubuntu界面差不多。

三、搭配环境

1.  配置CUDA

首先打开终端(ctrl+alt+t),输入以下命令:

sudo gedit ~/.bashrc

输入密码后进入文档:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

鼠标滚轮滚到文档最下面,输入下面命令,然后按ctrl+s保存:

export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.2/bin:$PATH

验证CUDA是否安装配置成功 :

nvcc -V

出现这个就证明配置成功了

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 *报错*

我输出之后会出现读取异常,或者不允许保存的提示,然后运行nvcc -V的时候显示命令不可执行,这个多输入几次或者关闭终端再输入几次命令

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

2.  配置conda

jetson nanoB01的架构是aarch64,与windows和liunx不同不同,所以不能安装Anaconda,可以安装一个替代它的archiconda。

在终端输入以下下载命令:

wget https://github.com/Archiconda/build-tools/releases/download/0.2.3/Archiconda3-0.2.3-Linux-aarch64.sh

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

下载成功。 jetson 安装tensorrt,YOLO,计算机视觉,目标检测

* 报错*

遇到下载一半或者下载到99%之后报错的问题。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

这个问题是因为堆栈小的原因,改一下堆栈大小就可以了。

先查一以下自己的系统堆栈大小,用这个命令:

ulimit -a

stack size是堆栈的大小,我原来是8192,需要改成102400

 jetson 安装tensorrt,YOLO,计算机视觉,目标检测

直接用代码 ulimit -s 102400 修改,改完之后:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

下载好之后运行这个命令:

bash Archiconda3-0.2.3-Linux-aarch64.sh

接下来就是傻瓜式的安装:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

安装完之后配置环境变量: 

sudo gedit ~/.bashrc

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

在最初打开的文档里的最后一行加上这个命令: 

export PATH=~/archiconda3/bin:$PATH

查看conda的版本号:

conda -V

3. 创建你自己的虚拟环境:

conda create -n xxx(虚拟环境名) python=3.6   #创建一个python3.6的虚拟环境
conda activate xxx #进入虚拟环境
conda deactivate #(退出虚拟环境)

4. 换源

首先需要备份一下sources.list文件,执行后终端没有响应

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak 

进入sources.list内部

sudo gedit /etc/apt/sources.list

进入之后,ctrl+a删除所有内容,之后将以下内容复制进去保存退出。

deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 然后更新软件列表,保存到本地:

sudo apt-get update

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 更新软件:

sudo apt-get upgrade

升级所有的安装包,并解决依赖关系

sudo apt-get dist-upgrade

5. 安装pip

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev

更新pip到最新版本

pip3 install --upgrade pip		#如果pip已是最新,可不执行

 6. 下载torch和torchvision

nano上安装需要去英伟达官网下载所需版本,我这里是1.8.0,网址如下:

PyTorch for Jetson - version 1.10 now available - Jetson Nano - NVIDIA Developer Forums

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

下载好后,这里推荐一个工具(MobaXterm)用于连接板子进行互传,当然你也可以直接下载到U盘,用U盘实现互传。

具体操作可以看这个博主的博客:MobaXterm(终端工具)下载&安装&使用教程_蜗牛也不慢......的博客-CSDN博客

连接成功后把可以在本地下载的东西解压后,传过去,建议传到home目录下。
 

7. 安装torch

torchvision文件夹的目录下右键,进入终端,然后进入你创建的虚拟环境,输入以下命令:

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev 

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

pip install torch-1.8.0-cp36-cp36m-linux_aarch64.whl

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 如果遇到网络问题下载不了,就在第二个命令下载后面加上清华下载镜像:

-i https://pypi.tuna.tsinghua.edu.cn/simple

安装torch时,一定要安装numpy,不然的话显示你安装torch成功,但是你conda list是找不到的。

sudo apt install python3-numpy

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 测试torch是否安装成功:

import torch 
print(torch.__version__)

*报错*

在测试torch是否安装成功时报错:非法指令(核心已转储)

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

export OPENBLAS_CORETYPE=ARMV8

8. 安装torchvision,逐个执行以下命令(如果执行第三个命令报错,继续执行第四第五行命令,如果不报错就直接cd ..)

cd torchvision
 
export BUILD_VERSION=0.9.0
 
sudo python setup.py install 
 
python setup.py build
 
python setup.py install
 
cd .. #(中间有个空格)

*报错*

在测试时如果报错PIL,就安装pillow,命令如下(在第二步如果报权限错误,就在开头加sudo,或在结尾加--user)

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
 
python3 -m pip install -i https://mirrors.aliyun.com/pypi/simple pillow

jetson 安装tensorrt,YOLO,计算机视觉,目标检测  jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 到这里为止环境就全部搭建完毕了,下面开始部署Yolo5。

 四、试跑Yolov5

1. 去官网上下载需要的版本,我这里下载的是5.0版本,下载的时候要把对应的权重也要下载。网址如下:

ultralytics/yolov5 at v5.0 (github.com)

在这里选择版本:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

权重网址如下:

Releases · ultralytics/yolov5 (github.com)

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

2. 下载完或者使用工具将yolo5和权重文件拖到主目录之后,使用cd进入yolo5文件目录下,在终端下载依赖项:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

 *报错*

遇到下面的问题是因为numpy版本过高导致的,降低以下版本就可以了。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

把版本降低到1.19.4:

pip install numpy==1.19.4 -i https://pypi.tuna.tsinghua.edu.cn/simple

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 解决完报错之后,在运行上面的命令,会自动下载需要的安装包,其他的都是很快的,但是到opencv的时候需要花费很长很长很长很长的时间......,当安装opencv时会出现Building wheel for opencv-python (pyroject.toml)...  这种情况正常现象,是opencv在编译,继续等就可以了,我编译了2个多小时...

3. 经过N长时间的等待,安装好所有依赖包后,将权重文件拖到yolov5文件夹根目录下,在yolov5的根目录下打开终端,执行以下命令:

python3 detect.py --weights yolov5s.pt 

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

测试一下是没问题的,接下来我们开始部署yolov5。

五、tensorRT部署yolov5

1. tensorRT官网下载yolov5,网址如下,确定下载是v5.0版本:

mirrors / wang-xinyu / tensorrtx · GitCode

mirrors / wang-xinyu / tensorrtx · GitCodeImplementation of popular deep learning networks with TensorRT network definition API 🚀 Github 镜像仓库 🚀 源项目地址https://gitcode.net/mirrors/wang-xinyu/tensorrtx?utm_source=csdn_github_accelerator

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

2. 下载完毕之后在文件夹里找到gen_wts.py,然后复制到yolov5文件夹下,右键打开终端,执行命令后生成yolov5.wts文件:

python3 gen_wts.py --w yolov5s.pt

注:每个人的文件路径不同,这个根据自己的途权重路径而改变,一般路径都在tensorrtx-yolov5-v5.0 -> yolov5 -> gen_wts.pt。

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

 3. 找到yololayer.h文件,打开修改类别数量(根据自己的情况而定),和输入图片大小(修改是尽量是32的倍数)

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

4. 在当前目录下创建文件build,命令如下:

maker build  #创建build文件夹
 
cd build   #进入build
 
cmake ..   #构建项目
 
#将我们上面生成的.wts文件复制到build文件夹中
 
make

5. 将上面生成的yolov5.wts文件拖到tensortx/yolov5下,右键打开终端:

sudo ./yolov5 -s yolov5s.wts yolov5s.engine s

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

6. 在tensorrtx-yolov5-v5.0\yolov5下新建sample文件夹,在里面放一张需要测试的图片(尽量放人的图片)进行测试,命令如下 :

sudo ./yolov5 -d yolov5s.engine ../sample

注:运行之后 ,在build文件夹下会生成一张检测过的图片,但是效果不是很好,这个是正常现象。

7. 测试图片看不出效果,并且真正部署到生产环境,交付给用户使用.是通过调用摄像头.所以要改一下tensorrtx-yolov5-v5.0 -> yolov5.cpp,可以参考网上大神的教程:

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"
 
#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1
 
// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;
 
//修改为自己的类别
char *my_classes[]={ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
         "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
         "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
         "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard","surfboard",
         "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
         "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
         "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
         "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
         "hair drier", "toothbrush" };
 
static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}
 
static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    }
    else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}
 //#创建engine和network
ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);
 
    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);
 
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
 
    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");
 
    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");
 
    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());
 
    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");
 
    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());
 
    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);
 
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");
 
    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    //yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);
 
    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));
 
    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif
 
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;
 
    // Don't need the network any more
    network->destroy();
 
    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }
 
    return engine;
}
 
ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);
 
    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);
 
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
 
    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");
 
    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");
 
    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");
 
    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
 
    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");
 
    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");
 
    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");
 
    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);
 
    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));
 
    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif
 
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;
 
    // Don't need the network any more
    network->destroy();
 
    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }
 
    return engine;
}
 
void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();
 
    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);
 
    // Serialize the engine
    (*modelStream) = engine->serialize();
 
    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}
 
void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}
 
bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    }
    else {
        return false;
    }
    return true;
}
 
int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);
 
    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;
 
    if (!parse_args(argc, argv, engine_name)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
        return -1;
    }
 
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << " read " << engine_name << " error! " << std::endl;
        return -1;
    }
    char* trtModelStream{ nullptr };
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();
 
 
    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
 
     //#读取本地视频
    //cv::VideoCapture capture("/home/nano/Videos/video.mp4");
     //#调用本地usb摄像头,我的默认参数为1,如果1报错,可修改为0.
    cv::VideoCapture capture(1);
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }
 
    int key;
    int fcount = 0;
    while (1)
    {
        cv::Mat frame;
        capture >> frame;
        if (frame.empty())
        {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }
        fcount++;
        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }
 
        // Run inference
        auto start = std::chrono::system_clock::now();//#获取模型推理开始时间
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();//#结束时间
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                std::string jetson_fps = "FPS: " + std::to_string(fps);
                cv::putText(frame, jetson_fps, cv::Point(11, 80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
            }
            //cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
        }
        cv::imshow("yolov5", frame);
        key = cv::waitKey(1);
        if (key == 'q') {
            break;
        }
        fcount = 0;
    }
 
    capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
 
    return 0;
}
 

修改完.cpp代码之后,tensorrtx-yolov5-v5.0 -> yolov5 - > build文件夹下,运行以下命令:

make
sudo ./yolov5 -v yolov5s.engine

运行之后摄像头调用成功,效果如下:

jetson 安装tensorrt,YOLO,计算机视觉,目标检测

到此我们就大功告成了,以上就是Jeston nano完整的部署过程,希望对你们有所帮助,请大家多多支持。创作不易,希望得到你们的鼓励~文章来源地址https://www.toymoban.com/news/detail-546754.html

到了这里,关于Jetson nano部署Yolov5目标检测 + Tensor RT加速(超级详细版)的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • Jetson AGX Xavier实现TensorRT加速YOLOv5进行实时检测

    link 上一篇:Jetson AGX Xavier安装torch、torchvision且成功运行yolov5算法 下一篇:Jetson AGX Xavier测试YOLOv4         由于YOLOv5在Xavier上对实时画面的检测速度较慢,需要采用TensorRT对其进行推理加速。接下来记录一下我的实现过程。  如果还没有搭建YOLOv5的python环境,按照下文步骤

    2024年02月10日
    浏览(45)
  • 【目标检测】YOLOv5推理加速实验:TensorRT加速

    前两篇博文分别讨论了YOLOv5检测算法的两种加速思路:采用多进程或批量检测,不过效果均收效甚微。本问将讨论使用TensorRT加速以及半精度推理/模型量化等优化策略对检测加速的实际影响。 测试图片分辨率:13400x9528 GPU:RTX4090 Cuda:11.7 YOLOv5版本:最新版(v7.0+) 检测策略:将

    2024年02月05日
    浏览(57)
  • jetson nx目标检测环境配置遇到的一万个坑,安装v1.12.0版本的pytorch和v0.13.0版本的vision torchvision,以及使用TensorRT部署YOLOv5.

    本文参考了许多官网和博客,肯定是存在抄袭的,请各位大哥不要喷我啊。 自己工作找到的是医学信号方向的算法工程师,所以以后和CV可能无缘了,将自己一个多星期的心血历程发表出来,希望大家接起我的CV火炬,接着前行,各位加油!(后面也学习了yolov5-6.0 yolov7的模型

    2024年02月05日
    浏览(50)
  • 【目标检测】YOLOv5多进程/多线程推理加速实验

    最近在研究如何让YOLOv5推理得更快,总体看来,主要有以下这些思路: 使用更快的 GPU,即:P100 - V100 - A100 多卡GPU推理 减小模型尺寸,即YOLOv5x - YOLOv5l - YOLOv5m - YOLOv5s - YOLOv5n 进行半精度FP16推理与 python detect.py --half 减少–img-size,即 1280 - 640 - 320 导出成 ONNX 或 OpenVINO 格式,获

    2023年04月14日
    浏览(44)
  • Jetson Nano配置YOLOv5并实现FPS=25

    JetPack 4.6——2021.8 yolov5-v6.0版本 使用的为yolov5的yolov5n.pt,并利用tensorrtx进行加速推理,在调用摄像头实时检测可以达到FPS=25。 在打开的文档的末尾添加如下: 保持并退出,终端执行 1.打开终端输入: 2.修改nvzramconfig.sh文件 3.重启Jetson Nano 4.终端中输入: 可查看到swap已经变为

    2024年02月13日
    浏览(36)
  • yolov5目标检测多线程C++部署

    下面的代码搭建了简单的一个生产者-消费者模型,在capture()函数中进行入队操作,infer()函数中进行出队操作,为了模拟采图-推理流程,在函数中调用Sleep()函数延时。 输出结果: 现在我们把capture函数中的Sleep(1000)改成Sleep(500)来模拟生产者加速生产,再次执行程序,则输出:

    2024年02月13日
    浏览(44)
  • Jetson nano (4GB B01) 系统安装,官方Demo测试 (目标检测、手势识别)

    此文确保你可以正确搭建jetson nano环境,并跑通官方”hello AI world” demo. 核心的步骤都来自一手官方教程,如果访问不了可以使用代理或者把com域名修改为cn域名: Getting Started With Jetson Nano Developer Kit 注意:官方”hello AI world“ demo 来自jetson-inference这个API仓库,它没有deepstream复

    2024年02月04日
    浏览(56)
  • 【目标检测算法实现之yolov5】 一、YOLOv5环境配置,将yolov5部署到远程服务器上

    在官网:https://github.com/ultralytics/yolov5上下载yolov5源代码 下载成功如下: 在配置基础环境之前,提前压缩自己的代码文件,并通过winscp传输给linux端,传输之后,解压该文件。解压前,先创建一个文件夹,再解压。 winscp下载使用教程参考上一篇博客:使用WinSCP下载和文件传输

    2024年01月15日
    浏览(55)
  • 改进的yolov5目标检测-yolov5替换骨干网络-yolo剪枝(TensorRT及NCNN部署)

    2022.10.30 复现TPH-YOLOv5 2022.10.31 完成替换backbone为Ghostnet 2022.11.02 完成替换backbone为Shufflenetv2 2022.11.05 完成替换backbone为Mobilenetv3Small 2022.11.10 完成EagleEye对YOLOv5系列剪枝支持 2022.11.14 完成MQBench对YOLOv5系列量化支持 2022.11.16 完成替换backbone为EfficientNetLite-0 2022.11.26 完成替换backbone为

    2024年01月17日
    浏览(71)
  • Opencv C++实现yolov5部署onnx模型完成目标检测

    头文件 命名空间 结构体 Net_config 里面存了三个阈值和模型地址,其中 置信度 ,顾名思义,看检测出来的物体的精准度。以测量值为中心,在一定范围内,真值出现在该范围内的几率。 endsWith()函数 判断sub是不是s的子串 anchors_640图像接收数组 根据图像大小,选择相应长度的

    2024年02月13日
    浏览(41)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包