DiffKD: Knowledge Diffusion for Distillation
Abstract
To reduce the knowledge gap between teacher and student and improve performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific.
In this paper, we argue that the essence of these methods is to discard the noisy information and distill the valuable information in the feature, and we propose a novel KD method, dubbed DiffKD, that explicitly denoises and matches features using diffusion models.
Our approach is based on the observation that student features typically contain more noise than teacher features due to the smaller capacity of the student model.
To address this, we propose to denoise student features using a diffusion model trained on teacher features.
This allows us to perform better distillation between the denoised student features and the teacher features.
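
The abstract only sketches the mechanism, so below is a minimal, hypothetical PyTorch sketch of the idea as I read it: a small noise-prediction network is trained on teacher features, the (noisier) student feature is denoised with it, and the distillation loss is computed between the denoised student feature and the teacher feature. All names and hyperparameters here (SimpleDenoiser, diffkd_loss, num_steps, the single fixed noise level) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the DiffKD idea described in the abstract, not the
# official code: train a tiny denoiser on teacher features, then use it to
# clean the student features before the distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDenoiser(nn.Module):
    """Tiny noise-prediction network operating on feature maps (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Predict the noise contained in x.
        return self.net(x)


def diffusion_training_loss(denoiser: nn.Module, teacher_feat: torch.Tensor,
                            noise_std: float = 0.1) -> torch.Tensor:
    """Train the denoiser on teacher features: add Gaussian noise, predict it back."""
    noise = torch.randn_like(teacher_feat) * noise_std
    noisy = teacher_feat + noise
    pred_noise = denoiser(noisy)
    return F.mse_loss(pred_noise, noise)


def denoise_student(denoiser: nn.Module, student_feat: torch.Tensor,
                    num_steps: int = 4) -> torch.Tensor:
    """Crude stand-in for the reverse process: repeatedly subtract predicted noise."""
    x = student_feat
    for _ in range(num_steps):
        x = x - denoiser(x)
    return x


def diffkd_loss(denoiser: nn.Module, student_feat: torch.Tensor,
                teacher_feat: torch.Tensor) -> torch.Tensor:
    """Distill between the denoised student feature and the (detached) teacher feature."""
    clean_student = denoise_student(denoiser, student_feat)
    kd = F.mse_loss(clean_student, teacher_feat.detach())
    diff = diffusion_training_loss(denoiser, teacher_feat.detach())
    return kd + diff


if __name__ == "__main__":
    denoiser = SimpleDenoiser(channels=64)
    student_feat = torch.randn(2, 64, 8, 8)   # stand-in student feature map
    teacher_feat = torch.randn(2, 64, 8, 8)   # stand-in teacher feature map
    loss = diffkd_loss(denoiser, student_feat, teacher_feat)
    loss.backward()
    print("total loss:", loss.item())
```

In a real training loop the student and teacher features would come from intermediate layers of the two networks, and the KD term would backpropagate into the student; the single fixed noise level and the plain subtraction step are simplifications of a proper diffusion schedule.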