

Poster

Dense-To-Sparse Video Diffusion For High-fidelity Multi-View Images Synthesis

Fan Yang · Jianfeng Zhang · Jun Hao Liew · Chaoyue Song · Zhongcong Xu · Xiu Li · Jiashi Feng · Guosheng Lin


Abstract:

Multi-view image synthesis models are limited by a lack of training data. Fine-tuning well-trained video generative models to generate 360-degree videos of objects offers a promising solution, as these models inherit strong generative priors from pretraining. However, such methods often face computational bottlenecks due to the large number of viewpoints, and the temporal attention mechanisms commonly used to mitigate this can introduce artifacts such as 3D inconsistency and over-smoothing. To overcome this, we propose a novel sparsification approach that reduces the dense video diffusion model to sparse-view synthesis. We first extract rich geometric priors from pretrained video diffusion models and then perform high-fidelity sparse multi-view synthesis to improve 3D consistency. Extensive experiments show that our approach achieves superior efficiency, generalization, and consistency, outperforming state-of-the-art multi-view synthesis methods.
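
The sketch below illustrates the dense-to-sparse pipeline the abstract describes, using plain PyTorch. It is not the authors' implementation: the module names, feature shapes, and viewpoint-selection rule are all illustrative assumptions. A placeholder stands in for the pretrained video diffusion backbone that supplies dense geometric priors, a sparse subset of viewpoints is then selected, and a second stage synthesizes images only at those views.

```python
# Hypothetical sketch of dense-to-sparse multi-view synthesis.
# All class and function names are placeholders, not the paper's API.
import torch
import torch.nn as nn


class DenseVideoPrior(nn.Module):
    """Stand-in for a pretrained video diffusion model that yields
    per-view geometric features along a dense 360-degree orbit."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)

    def forward(self, dense_views: torch.Tensor) -> torch.Tensor:
        # dense_views: (B, T_dense, 3, H, W) -> priors: (B, T_dense, C, H, W)
        b, t, c, h, w = dense_views.shape
        feats = self.encoder(dense_views.reshape(b * t, c, h, w))
        return feats.reshape(b, t, -1, h, w)


class SparseViewSynthesizer(nn.Module):
    """Stand-in for the sparse-view synthesis stage, conditioned on
    geometric priors gathered at a subset of viewpoints."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.decoder = nn.Conv2d(feat_dim, 3, kernel_size=3, padding=1)

    def forward(self, sparse_priors: torch.Tensor) -> torch.Tensor:
        # sparse_priors: (B, T_sparse, C, H, W) -> images: (B, T_sparse, 3, H, W)
        b, t, c, h, w = sparse_priors.shape
        imgs = self.decoder(sparse_priors.reshape(b * t, c, h, w))
        return imgs.reshape(b, t, 3, h, w)


def dense_to_sparse_synthesis(dense_views: torch.Tensor, num_sparse_views: int = 6):
    """Extract dense geometric priors, keep a sparse subset of viewpoints,
    and synthesize images only at those sparse views."""
    prior_net = DenseVideoPrior()
    synthesizer = SparseViewSynthesizer()

    priors = prior_net(dense_views)                        # dense geometric priors
    t_dense = priors.shape[1]
    idx = torch.linspace(0, t_dense - 1, num_sparse_views).long()
    sparse_priors = priors[:, idx]                         # sparsify the viewpoints
    return synthesizer(sparse_priors)


if __name__ == "__main__":
    dense = torch.randn(1, 24, 3, 64, 64)    # 24 densely sampled orbit frames
    sparse_images = dense_to_sparse_synthesis(dense)
    print(sparse_images.shape)                # torch.Size([1, 6, 3, 64, 64])
```

The point of the sketch is the data flow: the expensive dense prior extraction happens once, while generation is restricted to a handful of views, which is where the efficiency and consistency gains claimed in the abstract would come from.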
