

Poster

Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

Zhiyuan Ma · Xinyue Liang · Rongyuan Wu · Xiangyu Zhu · Zhen Lei · Lei Zhang


Abstract:

It is desirable to obtain a model that can generate high-quality 3D meshes from text prompts in just seconds. While recent attempts have adapted pre-trained text-to-image diffusion models, such as Stable Diffusion (SD), into generators of 3D representations (e.g., Triplane), they often suffer from poor quality due to the lack of sufficient high-quality 3D training data. To overcome this data shortage, we propose a novel training scheme, termed Progressive Rendering Distillation (PRD), which eliminates the need for 3D ground truths by distilling multi-view diffusion models and adapts SD into a native 3D generator. In each training iteration, PRD uses the U-Net to progressively denoise a latent from random noise over a few steps, decoding the denoised latent into a 3D output at each step. Multi-view diffusion models, including MVDream and RichDreamer, are used jointly with SD to distill text-consistent textures and geometries into the 3D outputs through score distillation. Our PRD scheme also accelerates inference by training the model to generate 3D content in just four steps. We use PRD to train a Triplane generator, named TriplaneTurbo, which adds only 2.5% trainable parameters to adapt SD for Triplane generation. TriplaneTurbo outperforms previous text-to-3D generators in both quality and efficiency. Specifically, it can produce high-quality 3D meshes in 0.6 seconds.
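To make the described training loop concrete, the sketch below illustrates one PRD iteration as we understand it from the abstract: denoise a latent from random noise for four steps with the adapted SD U-Net, decode each intermediate latent into a triplane, render views, and apply score-distillation losses from the multi-view teachers. All module names (`sd_unet`, `triplane_decoder`, `render_views`, `mvdream_sds`, `richdreamer_sds`) and signatures are our own hypothetical placeholders, not the authors' code; the scheduler update is likewise a simplified stand-in.

```python
# Minimal PyTorch-style sketch of one PRD training iteration (illustrative only).
# The callables passed in are assumed interfaces: `sd_unet` is the SD U-Net with
# the small set of trainable adapter parameters, `triplane_decoder` maps latents
# to triplanes, `render_views` rasterizes views, and `mvdream_sds` /
# `richdreamer_sds` return score-distillation losses from the teacher models.
import torch

NUM_STEPS = 4  # PRD trains the generator to produce 3D content in four steps


def prd_training_step(prompt_embeds, optimizer, sd_unet, triplane_decoder,
                      render_views, mvdream_sds, richdreamer_sds,
                      scheduler_timesteps):
    # Start from pure Gaussian noise; no 3D ground truth is involved.
    latent = torch.randn(1, 4, 64, 64, device=prompt_embeds.device)
    optimizer.zero_grad()
    total_loss = 0.0

    for t in scheduler_timesteps[:NUM_STEPS]:
        # Progressively denoise the latent with the adapted SD U-Net.
        noise_pred = sd_unet(latent, t, encoder_hidden_states=prompt_embeds)
        latent = latent - noise_pred  # placeholder for the actual scheduler update

        # Decode the current latent into a triplane and render multi-view images.
        triplane = triplane_decoder(latent)
        views = render_views(triplane, num_views=4)

        # Distill text-consistent texture and geometry from the multi-view
        # teachers via score distillation at every denoising step.
        total_loss = total_loss + mvdream_sds(views, prompt_embeds) \
                                + richdreamer_sds(views, prompt_embeds)

    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

Applying the losses at every intermediate step, rather than only on the final output, is what lets a four-step generator receive dense supervision from the teachers; this is our reading of the "progressive" aspect, under the assumptions stated above.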
