[Paper Reading] 3D Gaussian Splatting for Real-Time Radiance Field Rendering


1. What:

What does this paper do? (From the abstract and conclusion, summarized in one sentence.)

To satisfy the requirements of efficiency and quality simultaneously, this paper starts from sparse SfM points and represents the scene with 3D Gaussians, which preserve the desirable properties of continuous volumetric radiance fields while avoiding unnecessary computation in empty space. It then optimizes anisotropic covariances to achieve an accurate representation, and finally introduces a fast, visibility-aware rendering algorithm, thereby achieving state-of-the-art results in the field.

2. Why:

Under what conditions or needs was this research proposed? What core problems or deficiencies does it address, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This covers the background, the core question, prior work, and the innovations:

Three strands of related work frame this question.

  1. Traditional reconstruction methods such as SfM and MVS re-project and
    blend the input images into the novel-view camera, using the recovered
    geometry to guide this re-projection (from 2D to 3D).

    Drawback: they cannot fully recover from unreconstructed regions, or from "over-reconstruction", when MVS generates non-existent geometry.

  2. Neural Rendering and Radiance Fields

    Neural rendering represents a broader category of techniques that leverage deep learning for image synthesis, while a radiance field is a specific technique within neural rendering, focused on representing light and color in 3D space.

  • Earlier deep-learning approaches were mostly built on MVS-based geometry, which is also their major drawback.

  • NeRF follows the volumetric-representation line of work; it introduced positional encoding and importance sampling.

  • Faster training methods focus on spatial data structures that store (neural) features which are interpolated during volumetric ray-marching, on different encodings, and on MLP capacity.

  • Today, notable works include InstantNGP and Plenoxels, both of which rely on spherical harmonics.

    Spherical harmonics can be understood as a set of basis functions on the sphere, used to fit a directional function (here, view-dependent color) in spherical coordinates; see the sketch after this list.

    Reference: 球谐函数介绍 (Introduction to Spherical Harmonics) - 知乎 (zhihu.com)

  3. Point-Based Rendering and Radiance Fields
  • The methods used in human performance capture inspired the choice of 3D Gaussians as the scene representation.
  • Point-based rendering (splatting) with α-blending had been demonstrated before.
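
As a concrete illustration of spherical harmonics as basis functions, here is a minimal Python/NumPy sketch that evaluates view-dependent color from degree-1 SH coefficients. The paper uses up to degree 3; the coefficient values here are made up for illustration.

```python
import numpy as np

def sh_color(coeffs, d):
    """Evaluate RGB color from degree-1 SH coefficients for a unit view
    direction d. coeffs: (4, 3), one RGB coefficient per basis function."""
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([
        0.28209479177387814,      # Y_0^0 (constant term)
        -0.4886025119029199 * y,  # Y_1^{-1}
        0.4886025119029199 * z,   # Y_1^{0}
        -0.4886025119029199 * x,  # Y_1^{1}
    ])
    return basis @ coeffs         # (4,) @ (4, 3) -> RGB

coeffs = np.random.rand(4, 3)     # illustrative coefficients
print(sh_color(coeffs, np.array([0.0, 0.0, 1.0])))
```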

3. How:

[Figure: overview of the 3DGS optimization and rendering pipeline]

Following the gradient flow through the paper's pipeline, we connect Parts 4, 5, and 6 of the paper.

Firstly, start from the loss function, which combines an $\mathcal{L}_1$ loss and a D-SSIM term, as shown below:

$$\mathcal{L}=(1-\lambda)\mathcal{L}_{1}+\lambda\mathcal{L}_{\mathrm{D\text{-}SSIM}}.\tag{1}$$
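
A minimal sketch of this loss, assuming a `d_ssim(pred, gt)` helper that returns 1 − SSIM (not implemented here); λ = 0.2 follows the paper:

```python
import numpy as np

def training_loss(pred, gt, d_ssim, lam=0.2):
    """Eq. (1): (1 - lambda) * L1 + lambda * D-SSIM between the rendered and
    ground-truth images; d_ssim is an assumed external helper."""
    l1 = np.abs(pred - gt).mean()
    return (1 - lam) * l1 + lam * d_ssim(pred, gt)
```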

This relates the ground-truth image to the rendered image. So to perform the optimization, we need to dive into the rendering process. From the chapter on related work, we know that point-based $\alpha$-blending and NeRF-style volumetric rendering share essentially the same image formation model. That is

$$C=\sum_{i=1}^{N}T_{i}\,(1-\exp(-\sigma_{i}\delta_{i}))\,c_{i}\quad\mathrm{with}\quad T_{i}=\exp\Big(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\Big).\tag{2}$$

This paper actually uses a typical neural point-based approach equivalent to (2), which can be written as:

$$C=\sum_{i\in\mathcal{N}}c_{i}\alpha_{i}\prod_{j=1}^{i-1}(1-\alpha_{j})\tag{3}$$
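
To see why (2) and (3) describe the same image formation model, identify the opacity and transmittance terms; substituting them into (2) gives exactly (3):

$$\alpha_{i}=1-\exp(-\sigma_{i}\delta_{i}),\qquad T_{i}=\exp\Big(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\Big)=\prod_{j=1}^{i-1}\exp(-\sigma_{j}\delta_{j})=\prod_{j=1}^{i-1}(1-\alpha_{j}).$$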

From this formulation, we can see that the volume representation must contain color $c$ and opacity $\alpha$. These are attached to each Gaussian, with spherical harmonics used to represent color, just as in Plenoxels. The other attributes are the position and the covariance matrix. So we now have the four attributes that represent the scene: the position $p$, opacity $\alpha$, covariance $\Sigma$, and SH coefficients representing the color $c$ of each Gaussian.
Knowing the basic elements we need, let's work backward, starting with rendering, which builds on the authors' previous work.

3.1 Real-time rendering

This stage is independent of gradient propagation but is critical for real-time performance; it builds on the authors' previously published work.
[Figure: tile-based rasterization of 3D Gaussians]

Earlier games had already tried modeling the world with ellipsoids and rendering them, which resembles the rendering process of Gaussian splatting. The latter, however, applies many techniques for efficient use of threads and the GPU:

  • Firstly, split the screen into 16×16 tiles, then cull 3D Gaussians against the view frustum and each tile, keeping only Gaussians whose 99% confidence interval intersects the view frustum.
  • Then instantiate each Gaussian once per tile it overlaps, assigning each instance a key that combines view-space depth and tile ID.
  • Then sort the Gaussians by these keys with a single fast GPU radix sort.
  • Finally, launch one thread block per tile; for a given pixel, accumulate color and $\alpha$ values by traversing the per-tile list front-to-back, until the accumulated $\alpha$ saturates (approaches one); see the sketch after this list.
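
A minimal sketch of the two core ideas above, the combined tile/depth sort key and front-to-back $\alpha$-compositing per pixel. The function names and the Python/NumPy form are illustrative; the real implementation is a CUDA kernel.

```python
import numpy as np

def make_sort_key(tile_id, depth):
    """Pack tile ID (high 32 bits) and view-space depth (low 32 bits) into one
    64-bit key, so a single GPU radix sort orders instances by tile, then
    depth. For positive floats the raw bit pattern is monotonic in depth."""
    depth_bits = np.float32(depth).view(np.uint32)
    return (np.uint64(tile_id) << np.uint64(32)) | np.uint64(depth_bits)

def composite_pixel(colors, alphas, alpha_limit=0.9999):
    """Front-to-back alpha blending, Eq. (3): stop once alpha saturates."""
    C = np.zeros(3)
    T = 1.0                              # transmittance, prod_j (1 - alpha_j)
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, dtype=float)
        T *= 1.0 - a
        if 1.0 - T >= alpha_limit:       # accumulated alpha ~ 1: terminate
            break
    return C
```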

3.2 Adaptive Control of Gaussians

When fitting Gaussians to the scene, we should adapt the number and size of the Gaussians to strengthen the representation. Two operations are used, named clone and split, as shown below.

[Figure: clone (under-reconstruction) vs. split (over-reconstruction) of Gaussians]

Both cases are detected via view-space positional gradients: under-reconstruction and over-reconstruction both produce large view-space positional gradients. We clone or split a Gaussian depending on the condition: small Gaussians in under-reconstructed regions are cloned, while large Gaussians in over-reconstructed regions are split, as sketched below.
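
A minimal sketch of this rule, assuming simplified Gaussian records; the gradient threshold τ_pos = 0.0002 and the scale divisor 1.6 follow the paper, but the data layout and the sampling here are illustrative:

```python
from dataclasses import dataclass, replace
import numpy as np

@dataclass
class Gaussian:
    mean: np.ndarray    # 3D position
    scale: np.ndarray   # per-axis scales (diagonal of S)
    view_grad: float    # averaged view-space positional gradient magnitude

def densify(gaussians, tau_pos=0.0002, size_limit=0.01):
    out = []
    for g in gaussians:
        if g.view_grad <= tau_pos:
            out.append(g)                    # well reconstructed: keep as-is
        elif g.scale.max() < size_limit:     # under-reconstruction: clone
            out.append(g)
            out.append(replace(g, mean=g.mean + np.random.normal(0, g.scale)))
        else:                                # over-reconstruction: split in two
            for _ in range(2):
                out.append(replace(g,
                    mean=g.mean + np.random.normal(0, g.scale),
                    scale=g.scale / 1.6))
    return out
```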

3.3 Differentiable 3D Gaussian splatting

We have covered the rendering process and the adaptive control of Gaussians. Finally, let's discuss how the gradients propagate back to the parameters we can optimize. This is mainly about the handling of the Gaussian function.

The basic simplified form of a 3D Gaussian can be written as:

$$G(x)=e^{-\frac{1}{2}x^{T}\Sigma^{-1}x}.\tag{4}$$
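
For concreteness, a direct evaluation of (4) for a point $x$ relative to a Gaussian's mean (a sketch; $\Sigma$ is any valid 3×3 covariance, e.g. built via Eq. (6) below):

```python
import numpy as np

def gaussian_weight(x, mean, Sigma):
    """Eq. (4): unnormalized Gaussian falloff at point x."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)
```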

We use $\alpha$-blending to composite the Gaussians into the rendered image, so that the loss function can be computed and the optimization completed. Now we need to know how to compute the gradients of the Gaussian parameters.

When rasterizing, the three-dimensional scene must be projected into two-dimensional space. The authors want the 3D Gaussian to remain a Gaussian under this transformation (otherwise, if the rasterized footprint had nothing to do with a Gaussian, all the effort would be in vain). So we need a way to transfer the covariance matrix to camera coordinates while preserving the affine relation. That is

$$\Sigma'=JW\Sigma W^{T}J^{T},\tag{5}$$

where $J$ is the Jacobian of the affine approximation of the projective transformation and $W$ is the world-to-camera (viewing) transformation.
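
A hedged sketch of (5), following the EWA-splatting form of the Jacobian for a perspective camera; `fx`, `fy` are focal lengths and `t` is the Gaussian center in camera coordinates (the variable names are mine, not the paper's):

```python
import numpy as np

def project_covariance(Sigma, W, t, fx, fy):
    """Eq. (5): Sigma' = J W Sigma W^T J^T, with J the local affine
    approximation of the perspective projection at camera-space center t."""
    tx, ty, tz = t
    J = np.array([
        [fx / tz, 0.0,     -fx * tx / tz**2],
        [0.0,     fy / tz, -fy * ty / tz**2],
        [0.0,     0.0,      0.0],
    ])
    cov_cam = W @ Sigma @ W.T           # rotate covariance into camera frame
    return (J @ cov_cam @ J.T)[:2, :2]  # keep the 2x2 image-plane block
```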

Another problem is that a covariance matrix is only physically meaningful if it is positive semi-definite, which direct gradient descent cannot guarantee. So we use a scaling matrix $S$ and a rotation matrix $R$ to ensure it. That is

$$\Sigma=RSS^{T}R^{T}\tag{6}$$

We can then use a 3D vector $s$ for scaling and a quaternion $q$ to represent rotation; the gradients flow back to them. That is the whole optimization process.
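
A minimal sketch of building $\Sigma$ from a scale vector $s$ and a unit quaternion $q$ per (6); the quaternion convention (w, x, y, z) and the function names are illustrative:

```python
import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(s, q):
    """Eq. (6): Sigma = R S S^T R^T, positive semi-definite by construction."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(np.asarray(s, dtype=float))
    return R @ S @ S.T @ R.T

Sigma = covariance([0.5, 0.2, 0.1], [1.0, 0.0, 0.0, 0.0])
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)  # PSD sanity check
```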

4. Self-thoughts

  1. Summary of different representations
  • Explicit representation: Mesh, Point Cloud
  • Implicit representation
    • Volumetric representation: NeRF

      The density value returned at the sample points reflects whether there is geometric occupancy at that location.

    • Surface representation: SDF (Signed Distance Function)

      Outputs the distance from a point in space to the nearest surface, where a positive value indicates the point is outside the surface and a negative value indicates it is inside; see the sketch below.
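
A one-line example of an SDF (a sketch; the sphere is the classic textbook case):

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: positive outside, negative inside."""
    return np.linalg.norm(np.asarray(p, dtype=float) - center) - radius

print(sphere_sdf([2.0, 0.0, 0.0]))  #  1.0 -> outside
print(sphere_sdf([0.0, 0.0, 0.0]))  # -1.0 -> inside
```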


