LLM Quantization, High-Fidelity Image-to-Video Generation, Multi-modal Co-Speech Motion Generation, High-Resolution Image Synthesis, Low-Light Image/Video Enhancement, Relative Camera Pose Estimation


This article was first published on the WeChat official account 机器感知 (Machine Perception).


EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs


Large language models (LLMs) have proven far superior to conventional methods across a wide range of tasks. However, their expensive computation and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using a few samples from the training data, which might affect the generalization of the quantized LLMs to unknown cases and tasks. Hence in this work we explore an important question: can we design a data-independent quantization method for LLMs that guarantees their generalization performance? We propose EasyQuant, a training-free and data-independent weight-only quantization algorithm for LLMs. Our observation indicates that two factors, outliers in the weights and the quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1%) unchanged and optimize the quantization range to reduce the reconstruction error. With these methods, we surprisingly find that EasyQuant achieves performance comparable to the original model. Since EasyQuant does not depend on any training data, the generalization performance of quantized LLMs is safely guaranteed. Moreover, EasyQuant can be implemented in parallel, so the quantized model can be obtained in a few minutes even for LLMs with over 100B parameters. To the best of our knowledge, this is the first work to achieve almost lossless quantization performance for LLMs in a data-independent setting, and our algorithm runs over 10 times faster than data-dependent methods.
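The abstract names two ingredients but gives no implementation details: keep the roughly 1% largest-magnitude weights in full precision, and tune the quantization range to minimize reconstruction error. The NumPy sketch below is only a hypothetical illustration of that recipe on a single weight channel; the function name, the simple grid search over a shrink factor, and the 4-bit setting are assumptions, not EasyQuant's actual algorithm.

```python
import numpy as np

def quantize_channel(w, n_bits=4, outlier_ratio=0.01, scale_grid=np.linspace(0.5, 1.2, 15)):
    """Illustrative weight-only quantization of one weight channel.

    Keeps the largest-magnitude weights (outliers) in full precision and
    grid-searches a shrink factor for the quantization range so that the
    reconstruction error of the remaining weights is minimized.
    """
    w = np.asarray(w, dtype=np.float32)
    k = max(1, int(len(w) * outlier_ratio))
    outlier_idx = np.argsort(np.abs(w))[-k:]      # indices of the ~1% outliers
    mask = np.ones_like(w, dtype=bool)
    mask[outlier_idx] = False
    body = w[mask]                                # weights that actually get quantized

    qmax = 2 ** (n_bits - 1) - 1
    best_err, best_recon = np.inf, None
    for s in scale_grid:                          # optimize the quantization range
        scale = s * np.max(np.abs(body)) / qmax
        q = np.clip(np.round(body / scale), -qmax - 1, qmax)
        recon = q * scale
        err = np.sum((recon - body) ** 2)         # reconstruction error
        if err < best_err:
            best_err, best_recon = err, recon

    out = w.copy()
    out[mask] = best_recon                        # quantized-then-dequantized body
    return out                                    # outliers stay untouched

w = np.random.randn(4096)
w_hat = quantize_channel(w)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Because each channel is processed independently, a loop like this parallelizes trivially across weight columns, which is consistent with the claim that quantization finishes in minutes even for very large models.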

Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation


Image-to-video (I2V) generation often struggles to maintain high fidelity in open domains. Traditional image animation techniques primarily focus on specific domains such as faces or human poses, making them difficult to generalize to open domains. Several recent I2V frameworks based on diffusion models can generate dynamic content for open-domain images but fail to maintain fidelity. We find that the two main causes of low fidelity are the loss of image details and noise prediction biases during the denoising process. To this end, we propose an effective method that can be applied to mainstream video diffusion models. It achieves high fidelity by supplying more precise image information and rectifying the noise. Specifically, given a specified image, our method first adds noise to the input image latent to keep more details, then denoises the noisy latent with proper rectification to alleviate the noise prediction biases. Our method is tuning-free and plug-and-play. Experimental results demonstrate the effectiveness of our approach in improving the fidelity of generated videos. For more image-to-video results, please refer to the project website: https://noise-rectification.github.io.
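The abstract describes two steps: injecting noise into the input image latent so details survive, and rectifying the predicted noise during denoising to counter prediction bias. The toy PyTorch sketch below only illustrates that general idea and is not the paper's implementation; the `denoiser` placeholder, the blending weight `alpha`, and the schedule values are all assumptions.

```python
import torch

def rectified_denoise_step(denoiser, image_latent, t, alphas_cumprod, alpha=0.5):
    """Toy sketch: noise the image latent at step t, then pull the model's
    noise prediction toward the known injected noise before recovering x0."""
    a_t = alphas_cumprod[t]
    eps_true = torch.randn_like(image_latent)                      # known injected noise
    x_t = a_t.sqrt() * image_latent + (1 - a_t).sqrt() * eps_true  # noised latent keeps image detail

    eps_pred = denoiser(x_t, t)                                    # model's (possibly biased) prediction
    eps_rect = alpha * eps_true + (1 - alpha) * eps_pred           # rectify toward the known noise

    x0_hat = (x_t - (1 - a_t).sqrt() * eps_rect) / a_t.sqrt()      # estimate of the clean latent
    return x0_hat

# Usage with a dummy denoiser and a toy schedule, purely for illustration.
denoiser = lambda x, t: torch.randn_like(x)
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
latent = torch.randn(1, 4, 64, 64)
print(rectified_denoise_step(denoiser, latent, t=500, alphas_cumprod=alphas_cumprod).shape)
```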

Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement


Diffusion-model-based low-light image enhancement methods rely heavily on paired training data, which limits their broad applicability. Meanwhile, existing unsupervised methods lack effective bridging capabilities for unknown degradations. To address these limitations, we propose Zero-LED, a novel zero-reference lighting estimation diffusion model for low-light image enhancement. It utilizes the stable convergence of diffusion models to bridge the gap between the low-light domain and the real normal-light domain, and alleviates the dependence on paired training data via zero-reference learning. Specifically, we first design an initial optimization network to preprocess the input image and impose bidirectional constraints between the diffusion model and the initial optimization network through multiple objective functions. Subsequently, the degradation factors of the real-world scene are optimized iteratively to achieve effective light enhancement. In addition, we explore a frequency-domain-based, semantically guided appearance reconstruction module that encourages fine-grained feature alignment of the recovered image and satisfies subjective expectations. Finally, extensive experiments demonstrate the superiority of our approach over other state-of-the-art methods and its stronger generalization capability. We will release the source code upon acceptance of the paper.

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model


The body movements accompanying speech help speakers express their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is challenging. In this paper, we propose MMoFusion, a multi-modal co-speech motion generation framework based on the diffusion model, to ensure both the authenticity and diversity of the generated motion. We propose a progressive fusion strategy to enhance inter-modal and intra-modal interaction, efficiently integrating multi-modal information. Specifically, we employ a masked style matrix based on emotion and identity information to control the generation of different motion styles. Temporal modeling of speech and motion is partitioned into style-guided specific feature encoding and shared feature encoding, aiming to learn both inter-modal and intra-modal features. In addition, we propose a geometric loss that enforces coherence of the joints' velocity and acceleration across frames. Our framework generates vivid, diverse, and style-controllable motion of arbitrary length from input speech with editable identity and emotion. Extensive experiments demonstrate that our method outperforms current co-speech motion generation methods on both upper-body and challenging full-body generation.
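The geometric loss mentioned in the abstract penalizes discrepancies in joint velocity and acceleration across frames, but the exact formulation is not given. The following PyTorch snippet is a plausible finite-difference sketch of such a loss; the L1 criterion, weighting factors, and tensor layout are assumptions rather than MMoFusion's definition.

```python
import torch
import torch.nn.functional as F

def geometric_loss(pred, target, w_vel=1.0, w_acc=1.0):
    """Plausible sketch of a velocity/acceleration coherence loss.

    pred, target: (batch, frames, joints, 3) joint positions.
    Velocities and accelerations are first-order finite differences over frames.
    """
    vel_pred, vel_tgt = pred[:, 1:] - pred[:, :-1], target[:, 1:] - target[:, :-1]
    acc_pred, acc_tgt = vel_pred[:, 1:] - vel_pred[:, :-1], vel_tgt[:, 1:] - vel_tgt[:, :-1]
    return w_vel * F.l1_loss(vel_pred, vel_tgt) + w_acc * F.l1_loss(acc_pred, acc_tgt)

pred = torch.randn(2, 60, 55, 3)    # e.g. 60 frames, 55 joints
target = torch.randn(2, 60, 55, 3)
print(geometric_loss(pred, target))
```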

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis


Diffusion models create data from noise by inverting the forward path from data towards noise and have emerged as a powerful generative modeling technique for high-dimensional perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise along a straight line. Despite its better theoretical properties and conceptual simplicity, it has not yet been decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and that lower validation loss correlates with improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models, and we will make our experimental data, code, and model weights publicly available.
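Rectified flow connects data and noise along a straight line, and the abstract's key training change is biasing the noise-level sampling toward perceptually relevant scales. The sketch below shows a minimal straight-line interpolation and velocity-matching loss; the logit-normal timestep sampling is one example of such a biased scheme, and the tiny toy model is an assumed placeholder, not the paper's architecture.

```python
import torch
import torch.nn as nn

def rectified_flow_loss(model, x0):
    """Minimal rectified-flow training step.

    x_t lies on the straight line between data x0 (t=0) and noise (t=1);
    the model is trained to predict the constant velocity (noise - x0).
    Timesteps are drawn from a logit-normal distribution, one way to bias
    sampling toward intermediate, perceptually important noise levels.
    """
    noise = torch.randn_like(x0)
    t = torch.sigmoid(torch.randn(x0.shape[0], 1))   # logit-normal samples in (0, 1)
    x_t = (1.0 - t) * x0 + t * noise                 # straight-line interpolation
    v_target = noise - x0                            # constant velocity along the line
    v_pred = model(x_t, t)
    return ((v_pred - v_target) ** 2).mean()

# Toy model on flattened data, purely for illustration.
model = nn.Sequential(nn.Linear(17, 64), nn.SiLU(), nn.Linear(64, 16))
wrapped = lambda x, t: model(torch.cat([x, t], dim=-1))
x0 = torch.randn(8, 16)
print(rectified_flow_loss(wrapped, x0))
```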

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation


Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how to combine the best of both methods; our approach yields results that are both precise and robust, while also accurately inferring translation scales. At the heart of our model lies a Transformer that (1) learns to balance between solved and learned pose estimations, and (2) provides a prior to guide a solver. A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators, showing state-of-the-art performance in 6DoF pose estimation on Matterport3D, InteriorNet, StreetLearn, and Map-free Relocalization.
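The abstract says the model learns to balance a correspondence-based solver's pose against a network-predicted pose, but does not spell out the fusion. The SciPy snippet below is only a generic illustration of blending two relative poses (spherical interpolation for rotation, linear for translation); in the paper's setting the weight would come from the learned Transformer, whereas here it is just a parameter.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fuse_relative_poses(R_solver, t_solver, R_net, t_net, w=0.5):
    """Illustrative fusion of a solver pose and a learned pose.

    w in [0, 1]: 0 trusts the correspondence-based solver, 1 trusts the network.
    Rotations are blended by spherical interpolation, translations linearly
    (the learned branch is what can supply metric translation scale).
    """
    rots = Rotation.from_matrix(np.stack([R_solver, R_net]))
    R_fused = Slerp([0.0, 1.0], rots)([w]).as_matrix()[0]
    t_fused = (1.0 - w) * np.asarray(t_solver) + w * np.asarray(t_net)
    return R_fused, t_fused

R_a = Rotation.from_euler("y", 10, degrees=True).as_matrix()
R_b = Rotation.from_euler("y", 14, degrees=True).as_matrix()
R_f, t_f = fuse_relative_poses(R_a, [1, 0, 0], R_b, [1.2, 0, 0.1], w=0.3)
print(Rotation.from_matrix(R_f).as_euler("yxz", degrees=True)[0], t_f)
```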

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement


Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. Restoring and enhancing such content has proven highly beneficial. However, only a limited number of enhancement methods are explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model that uses a Swin Transformer backbone to capture low-light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions, and is analysed comparatively against various other models on three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results.
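PSNR, one of the two metrics quoted in the abstract, is simple to compute from the mean squared error between a reference frame and a restored frame. The snippet below is a standard reference implementation for images scaled to [0, 1] and is not tied to the paper's evaluation code.

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(restored, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

clean = np.random.rand(256, 256, 3)
enhanced = np.clip(clean + 0.05 * np.random.randn(256, 256, 3), 0, 1)
print(f"PSNR: {psnr(clean, enhanced):.2f} dB")
```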

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models


While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering that speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by this, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle the speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate the attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms state-of-the-art TTS systems in quality, similarity, prosody, and intelligibility. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.
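Factorized vector quantization, as described, splits the speech representation into attribute subspaces, each quantized with its own codebook. The PyTorch snippet below is only a generic illustration of per-subspace nearest-neighbour quantization; the attribute names, codebook sizes, and dimensions are placeholders, not NaturalSpeech 3's configuration.

```python
import torch

def vector_quantize(z, codebook):
    """Nearest-neighbour vector quantization: map each latent vector to its
    closest codebook entry and return the quantized vectors plus indices."""
    # z: (batch, frames, dim); codebook: (codebook_size, dim)
    dist = torch.cdist(z, codebook.unsqueeze(0).expand(z.shape[0], -1, -1))
    idx = dist.argmin(dim=-1)
    return codebook[idx], idx

# Hypothetical "factorized" setup: one independent codebook per speech attribute.
torch.manual_seed(0)
dim, frames = 64, 100
codebooks = {name: torch.randn(1024, dim) for name in
             ["content", "prosody", "timbre", "acoustic_details"]}
latents = {name: torch.randn(2, frames, dim) for name in codebooks}
quantized = {name: vector_quantize(latents[name], cb)[0]
             for name, cb in codebooks.items()}
print({name: q.shape for name, q in quantized.items()})
```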
