2.10 Customizing the Quantization Optimization Process


introduction

How to customize the quantization optimization process, and how to invoke an optimization pass manually.
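
The pattern the full script below follows is simple: subclass QuantizationOptimizationPass, implement its optimize interface, and invoke the pass manually on the quantized graph. Here is a minimal sketch of that skeleton (SkeletonPass is a placeholder name; the imports are the same ones the full script uses):

from typing import Callable, Iterable
from ppq import BaseGraph, QuantizationOptimizationPass, TorchExecutor

class SkeletonPass(QuantizationOptimizationPass):
    def optimize(self, graph: BaseGraph, dataloader: Iterable,
                 collate_fn: Callable, executor: TorchExecutor, **kwargs) -> None:
        pass  # inspect or rewrite graph.operations here

# after quantization has finished, run the pass by hand:
# SkeletonPass(name='Skeleton').optimize(graph=..., dataloader=..., collate_fn=..., executor=...)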

code

from typing import Callable, Iterable

import torch
import torchvision
from ppq import TargetPlatform
from ppq.api import (ENABLE_CUDA_KERNEL, QuantizationSettingFactory,
                     quantize_torch_model)
from ppq.core import QuantizationStates
from ppq.executor.torch import TorchExecutor
from ppq.IR.quantize import QuantableOperation

# ------------------------------------------------------------
# In this example, we show how to customize a quantization optimization
# pass and how to invoke it manually
# ------------------------------------------------------------

BATCHSIZE   = 32
INPUT_SHAPE = [BATCHSIZE, 3, 224, 224]
DEVICE      = 'cuda'
PLATFORM    = TargetPlatform.TRT_INT8

# ------------------------------------------------------------
# As usual, create the calibration data and load the model
# ------------------------------------------------------------
def load_calibration_dataset() -> Iterable:
    return [torch.rand(size=INPUT_SHAPE) for _ in range(32)]
CALIBRATION = load_calibration_dataset()

def collate_fn(batch: torch.Tensor) -> torch.Tensor:
    return batch.to(DEVICE)

model = torchvision.models.mobilenet.mobilenet_v2(pretrained=True)
model = model.to(DEVICE)

# ------------------------------------------------------------
# Below, we show how to customize an optimization pass without relying on QSetting
# QSetting holds the configuration of PPQ's built-in quantization pipeline;
# through it you can invoke every built-in optimization pass
# But if you design a new optimization pass, you must invoke it manually at the right moment
# ------------------------------------------------------------
QSetting = QuantizationSettingFactory.default_setting()
# Do not bake the parameters: once a parameter has been baked, any further modification of it is rejected
# You can set baking_parameter = True and run this script again; PPQ will then refuse the later request to modify the scale
QSetting.quantize_parameter_setting.baking_parameter = False

# ------------------------------------------------------------
# Define our own optimization pass: inherit from the QuantizationOptimizationPass
# base class and implement the optimize interface
# Inside optimize you may modify attributes of the graph to achieve a particular goal
# In this example we double the weight scale of every quantized convolution in the graph
# At the same time, we dequantize the last Gemm in the network
# ------------------------------------------------------------
from ppq import BaseGraph, QuantizationOptimizationPass
class MyOptim(QuantizationOptimizationPass):
    def optimize(self, graph: BaseGraph, dataloader: Iterable, 
                 collate_fn: Callable, executor: TorchExecutor, **kwargs) -> None:
        # graph.operations is a dict that holds every op in the graph
        for name, op in graph.operations.items():
            
            # Find every quantized convolution operator in the graph
            # Not every operator in your network ends up quantized; that is decided jointly by the dispatching strategy and the Quantizer strategy
            # So we use isinstance(op, QuantableOperation) to check whether an op is actually quantized
            if op.type == 'Conv' and isinstance(op, QuantableOperation):
                
                # A convolution has 2-3 inputs: the second input is the weight, the third (if present) is the bias
                # Here we modify the scale of the weight quantization config
                op.input_quant_config[1].scale *= 2                
                print(f'Input scale of Op {name} has been enlarged.')

            # Next we dequantize the Gemm; mobilenet_v2 has only one Gemm layer,
            # so we simply dequantize every Gemm we encounter
            if op.type == 'Gemm' and isinstance(op, QuantableOperation):

                # the config_with_variable property returns every quantization config of this op, covering both its inputs and outputs
                for cfg, _ in op.config_with_variable:

                    # PPQ offers several ways to switch the quantization state of an operator
                    # Setting the state directly to FP32 removes the quantization of this config
                    cfg.state = QuantizationStates.FP32

                # Alternatively, you can call the operator's dequantize method directly
                # op.dequantize()

# ------------------------------------------------------------
# If you use the ENABLE_CUDA_KERNEL context manager,
# PPQ will try to compile its custom high-performance quantization kernels; this requires a working build environment
# If compilation fails, you can remove this call to ENABLE_CUDA_KERNEL
# That noticeably slows PPQ down, but even without these kernels you can still complete quantization with pytorch's gpu operators
# ------------------------------------------------------------
with ENABLE_CUDA_KERNEL():
    quantized = quantize_torch_model(
        model=model, calib_dataloader=CALIBRATION,
        calib_steps=32, input_shape=INPUT_SHAPE,
        setting=QSetting, collate_fn=collate_fn, platform=PLATFORM,
        onnx_export_file='./model.onnx', device=DEVICE, verbose=0)

    # ------------------------------------------------------------
    # After the quantization pipeline finishes, we invoke our custom optimization pass to modify the quantization parameters
    # ------------------------------------------------------------
    optim = MyOptim(name='My Optimization Procedure')
    optim.optimize(graph=quantized, dataloader=CALIBRATION,
                   collate_fn=collate_fn, executor=TorchExecutor(quantized, device=DEVICE))
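
    # ------------------------------------------------------------
    # (Optional) A minimal sanity check, sketched with the same graph API
    # used above: verify that every quantization config of the Gemm layer
    # is now in the FP32 state, i.e. the operator runs unquantized.
    # ------------------------------------------------------------
    for name, op in quantized.operations.items():
        if op.type == 'Gemm' and isinstance(op, QuantableOperation):
            states = {cfg.state.name for cfg, _ in op.config_with_variable}
            print(f'Op {name} quantization states: {states}')  # expect {'FP32'}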

result

      ____  ____  __   ____                    __              __
     / __ \/ __ \/ /  / __ \__  ______ _____  / /_____  ____  / /
    / /_/ / /_/ / /  / / / / / / / __ `/ __ \/ __/ __ \/ __ \/ /
   / ____/ ____/ /__/ /_/ / /_/ / /_/ / / / / /_/ /_/ / /_/ / /
  /_/   /_/   /_____\___\_\__,_/\__,_/_/ /_/\__/\____/\____/_/


[Warning] Compling Kernels... Please wait (It will take a few minutes).
[07:08:25] PPQ Quantization Config Refine Pass Running ... Finished.
[07:08:25] PPQ Quantization Fusion Pass Running ...        Finished.
[07:08:25] PPQ Quantize Simplify Pass Running ...          Finished.
[07:08:25] PPQ Parameter Quantization Pass Running ...     Finished.
[07:08:25] PPQ Runtime Calibration Pass Running ...        
Calibration Progress(Phase 1): 100%|██████████| 32/32 [00:10<00:00,  3.07it/s]
Finished.
[07:08:36] PPQ Quantization Alignment Pass Running ...     Finished.
[07:08:36] PPQ Passive Parameter Quantization Running ...  Finished.
--------- Network Snapshot ---------
Num of Op:                    [100]
Num of Quantized Op:          [54]
Num of Variable:              [277]
Num of Quantized Var:         [207]
------- Quantization Snapshot ------
Num of Quant Config:          [214]
ACTIVATED:                    [108]
FP32:                         [106]
Network Quantization Finished.
Input scale of Op Conv_0 has been enlarged.
Input scale of Op Conv_4 has been enlarged.
Input scale of Op Conv_8 has been enlarged.
Input scale of Op Conv_9 has been enlarged.
Input scale of Op Conv_13 has been enlarged.
Input scale of Op Conv_17 has been enlarged.
Input scale of Op Conv_18 has been enlarged.
Input scale of Op Conv_22 has been enlarged.
Input scale of Op Conv_26 has been enlarged.
Input scale of Op Conv_28 has been enlarged.
Input scale of Op Conv_32 has been enlarged.
Input scale of Op Conv_36 has been enlarged.
Input scale of Op Conv_37 has been enlarged.
Input scale of Op Conv_41 has been enlarged.
Input scale of Op Conv_45 has been enlarged.
Input scale of Op Conv_47 has been enlarged.
Input scale of Op Conv_51 has been enlarged.
Input scale of Op Conv_55 has been enlarged.
Input scale of Op Conv_57 has been enlarged.
Input scale of Op Conv_61 has been enlarged.
Input scale of Op Conv_65 has been enlarged.
Input scale of Op Conv_66 has been enlarged.
Input scale of Op Conv_70 has been enlarged.
Input scale of Op Conv_74 has been enlarged.
Input scale of Op Conv_76 has been enlarged.
Input scale of Op Conv_80 has been enlarged.
Input scale of Op Conv_84 has been enlarged.
Input scale of Op Conv_86 has been enlarged.
Input scale of Op Conv_90 has been enlarged.
Input scale of Op Conv_94 has been enlarged.
Input scale of Op Conv_96 has been enlarged.
Input scale of Op Conv_100 has been enlarged.
Input scale of Op Conv_104 has been enlarged.
Input scale of Op Conv_105 has been enlarged.
Input scale of Op Conv_109 has been enlarged.
Input scale of Op Conv_113 has been enlarged.
Input scale of Op Conv_115 has been enlarged.
Input scale of Op Conv_119 has been enlarged.
Input scale of Op Conv_123 has been enlarged.
Input scale of Op Conv_125 has been enlarged.
Input scale of Op Conv_129 has been enlarged.
Input scale of Op Conv_133 has been enlarged.
Input scale of Op Conv_134 has been enlarged.
Input scale of Op Conv_138 has been enlarged.
Input scale of Op Conv_142 has been enlarged.
Input scale of Op Conv_144 has been enlarged.
Input scale of Op Conv_148 has been enlarged.
Input scale of Op Conv_152 has been enlarged.
Input scale of Op Conv_154 has been enlarged.
Input scale of Op Conv_158 has been enlarged.
Input scale of Op Conv_162 has been enlarged.
Input scale of Op Conv_163 has been enlarged.
