Poster
Dense-To-Sparse Video Diffusion For High-fidelity Multi-View Images Synthesis
Fan Yang · Jianfeng Zhang · Jun Hao Liew · Chaoyue Song · Zhongcong Xu · Xiu Li · Jiashi Feng · Guosheng Lin
Multi-view image synthesis models are limited by a lack of training data. Fine-tuning well-trained video generative models to produce 360-degree videos of objects offers a promising alternative, since such models inherit strong generative priors from pretraining. However, these methods face computational bottlenecks due to the large number of viewpoints, and the temporal attention mechanisms commonly used to mitigate this can introduce artifacts such as 3D inconsistency and over-smoothing. To overcome this, we propose a novel sparsification approach that reduces the dense video diffusion model to sparse-view synthesis. We first extract rich geometric priors from pretrained video diffusion models and then perform high-fidelity sparse multi-view synthesis to improve 3D consistency. Extensive experiments show that our approach achieves superior efficiency, generalization, and consistency, outperforming state-of-the-art multi-view synthesis methods.
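As a rough illustration of the dense-to-sparse setting (not the authors' pipeline, which is not detailed in this abstract), the sketch below contrasts a dense 360-degree camera orbit, of the kind a fine-tuned video model would render, with a sparse subset of evenly spaced viewpoints. The helper `camera_to_world` and all numeric settings are hypothetical choices for this example.

```python
# Generic sketch of dense vs. sparse viewpoint sampling around an object.
# Assumptions: cameras orbit the origin on a sphere and look at the origin.
import numpy as np

def camera_to_world(azimuth_deg: float, elevation_deg: float, radius: float) -> np.ndarray:
    """Build a 4x4 camera-to-world matrix for a camera orbiting the origin."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    # Camera position on a sphere of the given radius.
    pos = radius * np.array([np.cos(el) * np.cos(az),
                             np.cos(el) * np.sin(az),
                             np.sin(el)])
    forward = -pos / np.linalg.norm(pos)                 # look at the origin
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))  # world up is +z here
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    c2w = np.eye(4)
    # OpenGL-style convention: camera z axis points away from the view direction.
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2], c2w[:3, 3] = right, up, -forward, pos
    return c2w

# Dense orbit: the trajectory a 360-degree video generator would be asked to cover.
dense_azimuths = np.linspace(0.0, 360.0, num=120, endpoint=False)
# Sparse subset: the handful of viewpoints a sparsified multi-view model targets.
sparse_azimuths = dense_azimuths[:: len(dense_azimuths) // 6]  # 6 evenly spaced views

dense_poses = [camera_to_world(a, elevation_deg=20.0, radius=2.5) for a in dense_azimuths]
sparse_poses = [camera_to_world(a, elevation_deg=20.0, radius=2.5) for a in sparse_azimuths]
print(f"dense views: {len(dense_poses)}, sparse views: {len(sparse_poses)}")
```

The intuition mirrored here is only the view-count reduction: synthesizing a few well-spaced views is far cheaper than rendering a full dense orbit, provided the model retains the geometric priors learned from dense video data.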