Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping
Abstract
Face swapping aims to transfer the identity of a source face onto a target face while preserving target-specific attributes such as pose, expression, lighting, skin tone, and makeup. However, since real face-swapping ground truth is unavailable, achieving both accurate identity transfer and high-quality attribute preservation remains challenging. Although recent diffusion-based approaches attempt to improve visual fidelity through conditional inpainting on masked target images, the masked condition removes crucial appearance cues, resulting in plausible yet misaligned attributes due to the lack of explicit supervision. To address these limitations, we propose APPLE (Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping), a diffusion-based teacher–student framework that enhances attribute fidelity through attribute-aware pseudo-label supervision. First, we reformulate face swapping as a conditional deblurring task to more faithfully preserve target-specific attributes such as lighting, skin tone, and makeup. In addition, we introduce an attribute-aware inversion scheme to further improve the preservation of fine-grained attributes. Through this carefully designed attribute-preserving teacher, APPLE produces high-quality pseudo triplets that provide the student with explicit, direct face-swapping supervision. Overall, APPLE achieves state-of-the-art performance in both attribute preservation and identity transfer, producing more photorealistic and target-faithful results. Code will be made publicly available.