Poster
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Yiyu Zhuang · Jiaxi Lv · Hao Wen · Qing Shuai · Ailing Zeng · Hao Zhu · Shifeng Chen · Yujiu Yang · Xun Cao · Wei Liu
Creating a high-fidelity, animatable 3D full-body avatar from a single image is a challenging task due to the diverse appearance and poses of humans and the limited availability of high-quality training data. To achieve fast and high-quality human reconstruction, this work rethinks the task from the perspectives of dataset, model, and representation. First, we introduce a large-scale HUman GEnerated training dataset, HuGe100K, consisting of 100K diverse, photorealistic human images with corresponding 24-view in a static pose or dynamic pose frames generated via a pose-controllable image-to-video model. Next, leveraging the diversity in views, poses, and appearances within HuGe100K, we develop a scalable feed-forward transformer model to predict a 3D human Gaussian representation in a uniform space of a given human image. This model is trained to disentangle human pose, shape, clothing geometry, and texture. Accordingly, the estimated Gaussians can be animated robustly without post-processing. We conduct comprehensive experiments to validate the effectiveness of the proposed dataset and method. Our model demonstrates the generalizable ability to efficiently reconstruct photorealistic humans in under 1 second using a single GPU. Additionally, it seamlessly supports various applications, including animation, shape, and texture editing tasks.
Live content is unavailable. Log in and register to view live content