CameraCtrl、EDTalk、Sketch3D、Diffusion^2、FashionEngine

This article was first published on the WeChat official account 机器感知 (Machine Perception).

NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification

In recent years, Neural Radiance Fields (NeRF) have emerged as a powerful tool for 3D reconstruction and novel view synthesis. However, the computational cost of NeRF rendering and quality degradation due to artifacts pose significant challenges for its application in real-time and robust robotic tasks, especially on embedded systems. This paper introduces a novel framework that integrates NeRF-derived localization information with Visual-Inertial Odometry (VIO) to provide a robust solution for robotic navigation in real time. By training an absolute pose regression network with augmented image data rendered from a NeRF and quantifying its uncertainty, our approach effectively counters positional drift and enhances system reliability. We also establish a mathematically sound foundation for combining visual-inertial navigation with camera localization neural networks, considering uncertainty under a Bayesian framework. Experimental validation in the photore......
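
The excerpt stops before the fusion details, but the core Bayesian idea, weighting each pose source by its uncertainty, can be illustrated with standard inverse-covariance fusion. A minimal NumPy sketch under our own assumptions (independent Gaussian position errors; all numbers invented):

```python
import numpy as np

def fuse_positions(p_vio, cov_vio, p_nerf, cov_nerf):
    """Fuse two position estimates by inverse-covariance weighting.

    Under independent Gaussian errors, the fused mean is the
    information-weighted average and the fused covariance is the
    inverse of the summed information (inverse-covariance) matrices.
    """
    info_vio = np.linalg.inv(cov_vio)
    info_nerf = np.linalg.inv(cov_nerf)
    cov_fused = np.linalg.inv(info_vio + info_nerf)
    p_fused = cov_fused @ (info_vio @ p_vio + info_nerf @ p_nerf)
    return p_fused, cov_fused

# Hypothetical example: VIO has drifted in x/y; the NeRF-trained
# regressor is confident in x/y but noisy in z.
p_vio, cov_vio = np.array([1.00, 2.00, 0.50]), np.diag([0.09, 0.09, 0.01])
p_nerf, cov_nerf = np.array([1.10, 1.95, 0.80]), np.diag([0.01, 0.01, 0.25])
p, cov = fuse_positions(p_vio, cov_vio, p_nerf, cov_nerf)
print(p.round(3), np.diag(cov).round(4))
```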

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops discovered that such loops can lead to model collapse, a phenomenon where performance progressively degrades with each model-fitting iteration until the latest model becomes useless. However, several recent papers studying model collapse assumed that new data replace old data over time, rather than that data accumulate over time. In this paper, we compare these two settings and show that accumulating data prevents model collapse. We begin by studying an analytically tractable setup in which a sequence of linear models is fit to the previous models' predictions. Previous work showed that if data are replaced, the test error increases linearly with the number of model-fitting iterations; we extend this result by proving that if data instead acc......
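
The paper's tractable setting, a chain of linear models each fit to the previous model's outputs, is easy to simulate. The toy below is our own construction (Gaussian covariates, ordinary least squares, invented sizes), not the paper's exact protocol; it contrasts replacing the data each iteration with accumulating it:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma, iters = 10, 100, 0.5, 20
w_true = rng.normal(size=d)

def fit(X, y):
    # Ordinary least squares via the pseudo-inverse.
    return np.linalg.pinv(X) @ y

def run(accumulate):
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)  # real data for the first fit
    w = fit(X, y)
    errs = []
    for _ in range(iters):
        X_new = rng.normal(size=(n, d))
        y_new = X_new @ w + sigma * rng.normal(size=n)  # labels from the previous model
        if accumulate:
            X, y = np.vstack([X, X_new]), np.concatenate([y, y_new])
            w = fit(X, y)
        else:
            w = fit(X_new, y_new)  # old data are thrown away
        errs.append(np.mean((w - w_true) ** 2))
    return errs

print("replace:   ", round(run(False)[-1], 4))  # error grows with iterations
print("accumulate:", round(run(True)[-1], 4))   # error stays bounded
```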

Octopus: On-device language model for function calling of software APIs

In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play a crucial role due to their advanced text processing and generation abilities. This study introduces a new strategy aimed at harnessing on-device LLMs for invoking software APIs. We meticulously compile a dataset derived from software API documentation and apply fine-tuning to LLMs with capacities of 2B, 3B, and 7B parameters, specifically to enhance their proficiency in software API interactions. Our approach concentrates on refining the models' grasp of API structures and syntax, significantly enhancing the accuracy of API function calls. Additionally, we propose conditional masking techniques to ensure outputs in the desired formats and reduce error rates while maintaining inference speeds. We also propose a novel benchmark designed to evaluate the effectiveness of LLMs in API interactions, establishing a foundation for subsequent research. Octopus, the fine-tuned model, is ......
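
The excerpt does not define conditional masking precisely; a common realization of the idea is to mask (set to negative infinity) the logits of any token that would break the required output format before sampling. A toy, purely hypothetical sketch with an invented vocabulary and grammar:

```python
import numpy as np

def masked_sample(logits, allowed_ids, rng):
    """Sample the next token after masking every disallowed token."""
    masked = np.where(np.isin(np.arange(len(logits)), allowed_ids),
                      logits, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Toy vocabulary; the "grammar" says a function name must come first.
vocab = ["get_weather", "(", ")", 'city="Paris"', "DROP TABLE"]
logits = np.array([1.0, 1.5, 0.1, 0.5, 3.0])  # raw model prefers junk
rng = np.random.default_rng(0)
tok = masked_sample(logits, allowed_ids=[0], rng=rng)
print(vocab[tok])  # always "get_weather": the mask enforces the format
```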

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the application and entertainment value of talking head generation. This necessitates a deep exploration of the decoupling space for facial features, ensuring that they a) operate independently without mutual interference and b) can be preserved and shared across different modal inputs, both aspects often neglected in existing methods. To address this gap, this paper proposes a novel Efficient Disentanglement framework for Talking head generation (EDTalk). Our framework enables individual manipulation of mouth shape, head pose, and emotional expression, conditioned on video or audio inputs. Specifically, we employ three lightweight modules to decompose the facial dynamics into three distinct latent spaces representing mouth, pose, and expression, respectively. Each space is characterized by a set of learnable bases whose linear combinations define specific motions.......
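
The excerpt's "learnable bases whose linear combinations define specific motions" maps almost directly to code. A minimal PyTorch sketch (the dimensions, names, and softmax weighting are our own assumptions): weights predicted from an input feature linearly combine a bank of learnable bases into a motion code, and keeping one bank per factor is what keeps mouth, pose, and expression separate:

```python
import torch
import torch.nn as nn

class MotionBank(nn.Module):
    """A latent space spanned by a set of learnable bases.

    A motion code is a linear combination of the bases, with the
    combination weights predicted from the input feature.
    """
    def __init__(self, feat_dim=256, n_bases=20, latent_dim=512):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(n_bases, latent_dim))
        self.to_weights = nn.Linear(feat_dim, n_bases)

    def forward(self, feat):
        weights = torch.softmax(self.to_weights(feat), dim=-1)
        return weights @ self.bases  # (batch, latent_dim)

# One bank per factor, so mouth / pose / expression stay disentangled.
mouth_bank = MotionBank()
code = mouth_bank(torch.randn(4, 256))
print(code.shape)  # torch.Size([4, 512])
```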

FashionEngine: Interactive Generation and Editing of 3D Clothed Humans

We present FashionEngine, an interactive 3D human generation and editing system that allows users to design 3D digital humans in a way that aligns with how humans interact with the world, such as natural language, visual perception, and hand drawing. FashionEngine automates 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks. 2) A Multimodality-UV Space encoding the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space, which faithfully aligns the user's multimodal inputs with the implicit UV latent space for controllable 3D human editing. The Multimodality-UV Space is shared across different user inputs, such as texts, images, and sketches, which enables various joint multimodal editing tasks. 3) A Multimodality-UV Aligned Sampler ......
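
The full Multimodality-UV encoding is beyond an excerpt-level sketch, but the underlying pattern, projecting text, image, and sketch inputs into one shared conditioning space so a single sampler can consume any of them, looks roughly like this (a generic shared-embedding stand-in with invented dimensions, not FashionEngine's actual design):

```python
import torch
import torch.nn as nn

class SharedConditioner(nn.Module):
    """Project heterogeneous user inputs into one conditioning space.

    Each modality gets its own encoder, but all of them target the
    same embedding dimension, so downstream sampling code can treat
    any combination of inputs interchangeably.
    """
    def __init__(self, dim=256):
        super().__init__()
        self.text_enc = nn.Linear(768, dim)     # e.g. text-encoder features
        self.sketch_enc = nn.Linear(1024, dim)  # e.g. flattened sketch features
        self.image_enc = nn.Linear(512, dim)    # e.g. image-encoder features

    def forward(self, text=None, sketch=None, image=None):
        conds = []
        if text is not None:
            conds.append(self.text_enc(text))
        if sketch is not None:
            conds.append(self.sketch_enc(sketch))
        if image is not None:
            conds.append(self.image_enc(image))
        # Joint multimodal editing: average the aligned conditions.
        return torch.stack(conds).mean(dim=0)

cond = SharedConditioner()(text=torch.randn(1, 768), sketch=torch.randn(1, 1024))
print(cond.shape)  # torch.Size([1, 256])
```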

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Generating realistic hand motion sequences in interaction with objects has gained increasing attention with the growing interest in digital humans. Prior work has illustrated the effectiveness of employing occupancy-based or distance-based virtual sensors to extract hand-object interaction features. Nonetheless, these methods show limited generalizability across object categories, shapes, and sizes. We hypothesize that this is due to two reasons: 1) the limited expressiveness of the employed virtual sensors, and 2) the scarcity of available training data. To tackle this challenge, we introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. The sensor queries for object surface points in the neighbourhood of each hand joint. As an important step towards mitigating the learning complexity, we transform the points from the global frame to the hand template frame and use a shared module to process the sensor features of each individual......
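
The joint-centered sensor can be sketched almost verbatim from the description: gather the object surface points nearest each hand joint and re-express them in a local (hand-template) frame. A NumPy illustration under our own simplifications (brute-force neighbour search, per-joint frames, identity rotations for the demo):

```python
import numpy as np

def joint_sensor(joints, joint_rots, obj_points, k=16):
    """For each hand joint, gather the k nearest object surface points
    and express their offsets in that joint's local frame."""
    feats = []
    for pos, rot in zip(joints, joint_rots):
        dists = np.linalg.norm(obj_points - pos, axis=1)
        nearest = obj_points[np.argsort(dists)[:k]]
        # Global -> local: rotate offsets into the joint frame
        # (rot columns are the joint frame axes).
        feats.append((nearest - pos) @ rot)
    return np.stack(feats)  # (n_joints, k, 3)

n_joints = 21
joints = np.random.randn(n_joints, 3)
rots = np.tile(np.eye(3), (n_joints, 1, 1))  # identity frames for the demo
obj = np.random.randn(2048, 3)               # sampled object surface
print(joint_sensor(joints, rots, obj).shape)  # (21, 16, 3)
```

A shared module processing these per-joint features, as the abstract describes, would then consume the `(n_joints, k, 3)` tensor one joint at a time with tied weights.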

Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access such enriched color inputs in practical applications, where only sketches may be available. Existing sketch-to-3D research suffers from limited applicability due to the lack of color information and multi-view content. To overcome these limitations, this paper proposes Sketch3D, a novel generation paradigm that produces realistic 3D assets whose shape is aligned with the input sketch and whose color matches the textual description. Concretely, Sketch3D first instantiates the given sketch as a reference image through a shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians, i.e., structural optimiz......
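
The three optimization strategies are cut off in the excerpt, but the general shape of the final stage, gradient-based refinement of per-Gaussian attributes against multi-view guidance images, can be caricatured in a few lines. The "renderer" below is a fixed random linear map purely so the example stays self-contained; nothing here is the paper's actual pipeline:

```python
import torch

# Toy stand-in: "rendering" is a fixed linear map from per-Gaussian
# colors to multi-view pixels, replacing a real differentiable renderer.
n_gaussians, n_views, n_pix = 500, 4, 64
render = torch.randn(n_views, n_pix, n_gaussians) * 0.05
guidance = torch.rand(n_views, n_pix)            # style-consistent targets
colors = torch.rand(n_gaussians, requires_grad=True)

opt = torch.optim.Adam([colors], lr=0.05)
for step in range(200):
    pred = render @ colors                       # (n_views, n_pix)
    loss = torch.mean((pred - guidance) ** 2)    # multi-view image loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))  # loss shrinks as the Gaussians match the guidance
```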

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned, even at arbitrary lengths. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS transformation to obtain latent motion features preserving essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech, and performs gener......
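
The excerpt's two requirements, appearance-preserving motion features and temporal alignment at arbitrary length, suggest a length-agnostic conditioning mechanism. One standard choice is cross-attention from noisy motion latents to speech features, sketched below in PyTorch (our own simplification; the paper's exact architecture is not given in the excerpt):

```python
import torch
import torch.nn as nn

class GestureDenoiser(nn.Module):
    """Toy transformer denoiser: noisy motion latents attend to speech
    features of matching length, keeping gesture and audio aligned."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                nn.Linear(dim * 4, dim))

    def forward(self, noisy_motion, speech):
        # Cross-attention: motion queries, speech keys/values.
        attended, _ = self.attn(noisy_motion, speech, speech)
        h = noisy_motion + attended
        return h + self.ff(h)  # predicted noise (or denoised latent)

T = 75  # any length works: attention is length-agnostic
model = GestureDenoiser()
out = model(torch.randn(2, T, 128), torch.randn(2, T, 128))
print(out.shape)  # torch.Size([2, 75, 128])
```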

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Controllability plays a crucial role in video generation, since it allows users to create desired content. However, existing models have largely overlooked precise control of camera pose, which serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely parameterizing the camera trajectory, a plug-and-play camera module is trained on a T2V model, leaving the other components untouched. Additionally, a comprehensive study on the effect of various datasets is conducted, suggesting that videos with diverse camera distributions and similar appearances indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. Our proje......
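
The training recipe, freeze the pretrained T2V weights and fit only the plug-and-play camera module, is worth a few lines of code. A minimal PyTorch sketch with stand-in modules (every name, shape, and the additive injection are hypothetical; the real model is far larger):

```python
import torch
import torch.nn as nn

# Stand-ins: a tiny "T2V backbone" and a camera encoder that maps a
# per-frame pose embedding into the backbone's feature space.
t2v_model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
camera_module = nn.Linear(16, 64)

for p in t2v_model.parameters():
    p.requires_grad = False  # the pretrained generator stays untouched

optimizer = torch.optim.AdamW(camera_module.parameters(), lr=1e-4)

pose = torch.randn(8, 16)   # camera trajectory embedding, one row per frame
feat = torch.randn(8, 64)   # stand-in for intermediate video features
out = t2v_model(feat + camera_module(pose))  # inject camera control
loss = out.pow(2).mean()    # dummy loss for the demo
loss.backward()
optimizer.step()            # only camera_module weights move
print(sum(p.requires_grad for p in t2v_model.parameters()))  # 0
```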

Diffusion^2: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models, which are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of producing highly consistent multi-view images. However, due to the scarcity of synchronized multi-view video data, it is impractical to adapt this paradigm to 4D generation directly. Nevertheless, the available video and 3D data are adequate for training video and multi-view diffusion models that can provide satisfactory dynamic and geometric priors, respectively. In this paper, we present Diffusion^2, a novel framework for dynamic 3D content creation that leverages the geometric consistency and temporal smoothness of these models to directly sample dense multi-view, multi-frame images, which can be employed to optimize a continuous 4D representation. Specifically, we design a simple yet effective denoising strategy via score composi......
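
Score composition has a small mathematical core: the scores (gradients of the log-density) of models treated as independent experts over the same sample simply add, as in a product of experts. A one-dimensional toy in plain Python (our own illustration, not the paper's denoising schedule):

```python
def composed_score(score_a, score_b, x, t, w=0.5):
    """Weighted sum of two models' scores on the same noisy sample:
    one expert contributes temporal smoothness, the other geometric
    consistency; their log-density gradients add."""
    return w * score_a(x, t) + (1 - w) * score_b(x, t)

# Toy 1-D "models": each score pulls x toward its own preferred mean.
score_video = lambda x, t: -(x - 1.0)      # prefers x near +1
score_multiview = lambda x, t: -(x + 1.0)  # prefers x near -1

x = 5.0
for t in range(200):  # plain gradient ascent on the composed log-density
    x += 0.05 * composed_score(score_video, score_multiview, x, t)
print(round(x, 3))  # settles near 0.0, a compromise between the experts
```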
