

Poster

Zero-1-to-A: Zero-shot One image to Animatable Head Avatars using Video Diffusion

Zhenglin Zhou · Fan Ma · Hehe Fan · Tat-seng Chua


Abstract: Animatable head avatar generation typically requires extensive data for training. To reduce the data requirements, a natural solution is to leverage existing data-free static avatar generation methods, such as pre-trained diffusion models with score distillation sampling (SDS), which align avatars with pseudo-ground-truth outputs from the diffusion model. However, directly distilling 4D avatars from video diffusion often leads to over-smooth results due to spatial and temporal inconsistencies in the generated video. To address this issue, we propose Symbiotic GENeration (SymGEN), a robust method that synthesizes spatially and temporally consistent datasets for 4D avatar reconstruction using the video diffusion model. Specifically, SymGEN iteratively constructs video datasets and optimizes animatable avatars in a progressive manner, ensuring that avatar quality improves smoothly and consistently throughout the learning process. This progressive learning involves two stages: (1) Spatial Consistency Learning, which fixes expressions and learns from front-to-side views, and (2) Temporal Consistency Learning, which fixes views and learns from relaxed to exaggerated expressions, generating 4D avatars in a simple-to-complex manner. Extensive experiments demonstrate that SymGEN improves fidelity, animation quality, and rendering speed over existing diffusion-based methods, providing a practical solution for lifelike avatar creation. The code will be publicly available.
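To make the symbiotic loop concrete, below is a minimal, hypothetical Python sketch of the two-stage progressive scheme: alternate between synthesizing pseudo-ground-truth frames from a (noisy) video diffusion stand-in and refitting the avatar, first sweeping camera views with a fixed expression, then sweeping expressions with a fixed view. The toy linear avatar model, the function names (`video_diffusion`, `render`, `fit`), and all parameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SymGEN-style progressive learning (toy model,
# not the authors' code). The avatar is a small parameter vector, and the
# video diffusion model is mimicked by a noisy oracle.
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.25])  # "ideal" avatar parameters (toy)

def features(view: float, expression: float) -> np.ndarray:
    """Toy conditioning features for a (camera view, facial expression) pair."""
    return np.array([np.sin(view), np.cos(view), expression])

def video_diffusion(view: float, expression: float) -> np.ndarray:
    """Stand-in for the video diffusion model: a noisy pseudo-ground-truth
    frame of the target identity under the requested condition."""
    return features(view, expression) * theta_star + 0.05 * rng.normal(size=3)

def render(theta: np.ndarray, view: float, expression: float) -> np.ndarray:
    """Toy avatar renderer, linear in the avatar parameters theta."""
    return features(view, expression) * theta

def fit(theta: np.ndarray, dataset: list, lr: float = 0.1, steps: int = 200) -> np.ndarray:
    """Least-squares fit of the avatar to the currently synthesized dataset."""
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for (view, expr), frame in dataset:
            residual = render(theta, view, expr) - frame
            grad += 2.0 * residual * features(view, expr)
        theta = theta - lr * grad / len(dataset)
    return theta

theta = np.zeros(3)  # avatar initialized from the single input image (toy)
dataset = []

# Stage 1: Spatial Consistency Learning -- fix the expression (relaxed)
# and grow the dataset progressively from frontal (0) to side (pi/2) views.
for view in np.linspace(0.0, np.pi / 2, 5):
    dataset.append(((view, 0.0), video_diffusion(view, 0.0)))
    theta = fit(theta, dataset)  # avatar improves as the dataset grows

# Stage 2: Temporal Consistency Learning -- fix a frontal view and grow
# the dataset from relaxed (0.0) to exaggerated (1.0) expressions.
for expr in np.linspace(0.0, 1.0, 5):
    dataset.append(((0.0, expr), video_diffusion(0.0, expr)))
    theta = fit(theta, dataset)

print("recovered avatar parameters:", np.round(theta, 3))
```

In this sketch the dataset and the avatar improve together: each newly synthesized condition is added only after the avatar has been refit to all previous ones, which is one way to read the abstract's simple-to-complex progression.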
