

Poster

Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model

Yuxiang Mao · Zhenfeng Fan · Zhijie Zhang · Zhiheng Zhang · Shihong Xia

Poster Session: Fri 13 Jun, 2 p.m. – 4 p.m. PDT

Abstract:

Training a generic 3D face reconstruction model in a self-supervised manner on large-scale, in-the-wild 2D face image datasets enhances robustness to varying lighting conditions and occlusions, while allowing the model to capture animatable wrinkle details across diverse facial expressions. However, a generic model often fails to adequately represent the unique characteristics of specific individuals. In this paper, we propose a method that trains a generic base model and then adapts it into person-specific models by integrating lightweight adapters into the large-parameter ViT-MAE base model. These person-specific models excel at capturing individual facial shapes and detailed features while preserving the robustness and the prior knowledge of detail variations from the base model. During training, we introduce a silhouette vertex re-projection loss to address boundary "landmark marching" issues on the 3D face caused by pose variations. Additionally, we employ a teacher-student loss that leverages the inherent strength of UNet in feature boundary localization to train our detail MAE. Quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance in face alignment, detail accuracy, and richness. The code will be released to the public upon the acceptance of this paper.
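The abstract does not specify how the lightweight adapters are realized inside the ViT-MAE backbone. The snippet below is a minimal sketch, assuming a standard bottleneck adapter (down-projection, nonlinearity, up-projection with a residual connection) attached to each frozen encoder block, which is one common way to personalize a large frozen transformer; all class names, parameter names, and the bottleneck width are hypothetical and not taken from the paper.

```python
# Hedged sketch: bottleneck adapters on a frozen ViT-MAE-style encoder,
# so only the small adapter weights are trained per person.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual down-project -> GELU -> up-project adapter (assumed design)."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapted model starts out
        # identical to the generic base model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps one frozen transformer block of the base model with an adapter."""

    def __init__(self, base_block: nn.Module, dim: int):
        super().__init__()
        self.base_block = base_block
        self.adapter = BottleneckAdapter(dim)
        for p in self.base_block.parameters():  # freeze generic base weights
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.base_block(x))


def personalize(encoder_blocks: nn.ModuleList, dim: int) -> nn.ModuleList:
    """Return a person-specific encoder: frozen base blocks plus trainable adapters."""
    return nn.ModuleList(AdaptedBlock(block, dim) for block in encoder_blocks)
```

Under this reading, only the adapter parameters would be optimized on the target person's images, which is consistent with the abstract's claim that the person-specific models preserve the robustness and detail priors of the shared base model.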
