Learn to Optimize Denoising Scores for 3D Generation - A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting

Xiaofeng Yang*¹, Yiwen Chen*¹, Cheng Chen¹, Chi Zhang¹, Yi Xu², Xulei Yang³, Fayao Liu³ and Guosheng Lin¹
¹Nanyang Technological University, ²OPPO US Research Center, ³A*STAR, Singapore

Abstract

We propose a unified framework for improving the diffusion priors used in 3D generation tasks. Despite the importance of these tasks, existing methods often struggle to produce high-quality results. We begin by examining the limitations of previous diffusion priors and identify a divergence between these priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. By leveraging different learnable parameters of the diffusion prior, our approach offers multiple configurations with different trade-offs between performance and implementation complexity. Our experimental results demonstrate that our method markedly surpasses existing techniques, establishing a new state of the art in text-to-3D generation. Our approach also performs well on both NeRF and the recently introduced 3D Gaussian Splatting backbones. Finally, our framework offers insights into recent score distillation methods, such as the VSD and DDS losses.
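The alternation described in the abstract, where the 3D model and the diffusion prior are updated in turn, can be illustrated with a toy loop. The sketch below is only a structural illustration under strong assumptions: `theta`, `phi`, `mu`, and both update rules are hypothetical stand-ins chosen for this example, there is no renderer or diffusion model involved, and none of it should be read as the losses used in the paper.

```python
import numpy as np

def alternate_updates(mu, steps=300, lr_theta=0.1, lr_phi=0.5, sigma=0.1, seed=0):
    """Toy alternating optimization of two parameter groups (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=mu.shape)  # hypothetical stand-in for the 3D model parameters
    phi = np.zeros_like(mu)            # hypothetical stand-in for the prior's learnable parameters
    for _ in range(steps):
        # Step 1: update the "3D model" while the prior parameters stay frozen.
        x = theta + sigma * rng.normal(size=mu.shape)  # noisy "render" of the model
        fixed_term = x - mu      # fixed direction pulling toward the target mu
        learned_term = x - phi   # approximates the injected noise once phi tracks theta
        theta = theta - lr_theta * (fixed_term - learned_term)

        # Step 2: update the prior parameters while the "3D model" stays frozen,
        # by regressing phi onto a fresh noisy render.
        x = theta + sigma * rng.normal(size=mu.shape)
        phi = phi - lr_phi * (phi - x)
    return theta, phi

mu = np.array([1.0, -2.0, 0.5])  # assumed toy target
theta, phi = alternate_updates(mu)
print(np.round(theta, 2), np.round(phi, 2))
```

In this toy setting the learned term cancels the sampled noise in the first step, so `theta` drifts toward `mu` while `phi` keeps tracking `theta`. The embedding and LoRA configurations named in the results table would correspond to different choices of what `phi` stands for, which this sketch does not model.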

Performance on T3Bench (with NeRF)

Dataset      | Dreamfusion | Magic3D | LatentNeRF | Fantasia3D | SJC  | ProlificDreamer | LODS Emb. | LODS LoRA
Single Obj.  | 24.4        | 37.0    | 33.1       | 26.4       | 24.7 | 49.4            | 52.3      | 51.3
Surroundings | 24.6        | 35.4    | 30.6       | 27.0       | 19.8 | 44.8            | 49.8      | 47.3
Multi. Obj.  | 16.1        | 25.7    | 20.6       | 18.5       | 11.7 | 35.8            | 39.7      | 37.5
Average      | 21.7        | 32.7    | 28.1       | 24.0       | 18.7 | 43.3            | 47.3      | 45.4

We achieve state-of-the-art performance on T3Bench.
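The Average row appears to be the unweighted mean of the three category scores, rounded to one decimal place. The page does not state this, so the short check below treats it as an assumption. It copies the numbers from the table above and recomputes the averages; the variable and function names are illustrative.

```python
# Per-method T3Bench scores copied from the table above,
# ordered as [Single Obj., Surroundings, Multi. Obj.].
scores = {
    "Dreamfusion": [24.4, 24.6, 16.1],
    "Magic3D": [37.0, 35.4, 25.7],
    "LatentNeRF": [33.1, 30.6, 20.6],
    "Fantasia3D": [26.4, 27.0, 18.5],
    "SJC": [24.7, 19.8, 11.7],
    "ProlificDreamer": [49.4, 44.8, 35.8],
    "LODS Emb.": [52.3, 49.8, 39.7],
    "LODS LoRA": [51.3, 47.3, 37.5],
}

# Average row as reported in the table.
reported_average = {
    "Dreamfusion": 21.7, "Magic3D": 32.7, "LatentNeRF": 28.1, "Fantasia3D": 24.0,
    "SJC": 18.7, "ProlificDreamer": 43.3, "LODS Emb.": 47.3, "LODS LoRA": 45.4,
}

def mean_score(values):
    """Unweighted mean rounded to one decimal place (assumed convention)."""
    return round(sum(values) / len(values), 1)

computed = {name: mean_score(values) for name, values in scores.items()}
best = max(computed, key=computed.get)  # method with the highest computed average

print(computed == reported_average, best)
```

Under this assumption the recomputed averages agree with the reported row, and the embedding variant of LODS has the highest average among the listed methods.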

Generation Results (NeRF)

A red and white lighthouse on a cliff
An intricately-carved wooden chess set
A cactus with pink flowers
A vintage porcelain doll with a frilly dress

Generation Results (3D Gaussian Splatting)

A DSLR image of a hamburger
A pair of worn-in blue jeans
A worn-out leather briefcase
An ivory candlestick holder

Citation

@article{yang2023lods,
  title={Learn to Optimize Denoising Scores for 3D Generation},
  author={Xiaofeng Yang and Yiwen Chen and Cheng Chen and Chi Zhang and Yi Xu and Xulei Yang and Fayao Liu and Guosheng Lin},
  journal={arXiv:2312.04820},
  year={2023}
}