
Poster

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang · Feng Lin · Tong Wu · Zhenguang Liu · Zhongjie Ba · Kui Ren

ExHall D Poster #319
[ Project Page ] [ Paper PDF ]
Sun 15 Jun 8:30 a.m. PDT — 10:30 a.m. PDT

Abstract: This work asks: with abundant, unlabeled real faces, how can we learn a robust and transferable facial representation that boosts the generalization of various face security tasks? We make the first attempt and propose $\textbf{FSFM}$, a self-supervised pretraining framework that learns fundamental representations of real face images by leveraging the synergy between masked image modeling (MIM) and instance discrimination (ID). We explore various facial masking strategies for MIM and present a simple yet powerful CRFR-P masking, which explicitly forces the model to capture meaningful intra-region $\textbf{C}$onsistency and challenging inter-region $\textbf{C}$oherency. Furthermore, we devise an ID network that naturally couples with MIM to establish an underlying local-to-global $\textbf{C}$orrespondence via tailored self-distillation. These three learning objectives, namely $\textbf{3C}$, empower the model to encode both local features and global semantics of real faces. After pretraining, a vanilla ViT serves as a universal vision $\textbf{F}$oundation $\textbf{M}$odel for downstream $\textbf{F}$ace $\textbf{S}$ecurity tasks: cross-dataset deepfake detection, cross-domain face anti-spoofing, and unseen diffusion facial forgery detection. Extensive experiments on 10 public datasets demonstrate that our model transfers better than supervised pretraining and prior visual and facial self-supervised learning methods, and even outperforms task-specialized state-of-the-art methods.
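To make the combined 3C training signal concrete, below is a minimal PyTorch sketch of one pretraining step that couples a MIM reconstruction loss with a self-distilled ID loss. It is not the authors' implementation: `PatchEncoder`, `Decoder`, `random_mask`, and all dimensions are illustrative stand-ins; random masking substitutes for the paper's CRFR-P strategy, and the ID branch uses a generic DINO-style EMA teacher as a stand-in for the tailored self-distillation described above.

```python
# Hedged sketch of one FSFM-style pretraining step (MIM + ID), not the paper's code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEncoder(nn.Module):
    """Toy ViT-like encoder: patch embedding + transformer + [CLS] token.
    Positional embeddings are omitted for brevity."""
    def __init__(self, patch_dim=768, dim=256, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, patches):                       # patches: (B, N, patch_dim)
        x = self.embed(patches)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        x = self.blocks(x)
        return x[:, 0], x[:, 1:]                      # global token, patch tokens

class Decoder(nn.Module):
    """MAE-style lightweight decoder: fill masked slots with a mask token,
    then project every token back to pixel-patch space."""
    def __init__(self, dim=256, patch_dim=768):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.proj = nn.Linear(dim, patch_dim)

    def forward(self, vis_tokens, mask):
        B, N = vis_tokens.size(0), mask.numel()
        full = vis_tokens.new_zeros(B, N, vis_tokens.size(-1))
        full[:, ~mask] = vis_tokens
        full[:, mask] = self.mask_token
        return self.proj(full)                        # (B, N, patch_dim)

def random_mask(n_patches, ratio, device):
    """Stand-in for CRFR-P masking: mask a random subset of patches."""
    idx = torch.randperm(n_patches, device=device)
    mask = torch.zeros(n_patches, dtype=torch.bool, device=device)
    mask[idx[: int(n_patches * ratio)]] = True
    return mask                                       # True = masked

def pretrain_step(student, teacher, decoder, head_s, head_t, patches, tau=0.1):
    """One 3C-style step: reconstruct masked patches (MIM) and match the
    masked-view student to a full-view EMA teacher (ID)."""
    B, N, _ = patches.shape
    mask = random_mask(N, ratio=0.75, device=patches.device)

    # MIM branch: encode only visible patches, reconstruct the masked ones.
    g_student, vis_tokens = student(patches[:, ~mask])
    recon = decoder(vis_tokens, mask)
    loss_mim = F.mse_loss(recon[:, mask], patches[:, mask])

    # ID branch: cross-entropy between teacher (full view) and student (masked view).
    with torch.no_grad():
        g_teacher, _ = teacher(patches)
        t_probs = F.softmax(head_t(g_teacher) / tau, dim=-1)
    s_logp = F.log_softmax(head_s(g_student) / tau, dim=-1)
    loss_id = -(t_probs * s_logp).sum(dim=-1).mean()

    return loss_mim + loss_id

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    """Teacher tracks the student as an exponential moving average."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

if __name__ == "__main__":
    student = PatchEncoder()
    teacher = copy.deepcopy(student).requires_grad_(False)
    decoder, head_s = Decoder(), nn.Linear(256, 1024)
    head_t = copy.deepcopy(head_s).requires_grad_(False)
    patches = torch.randn(2, 196, 768)                # e.g. 14x14 patches of a face crop
    loss = pretrain_step(student, teacher, decoder, head_s, head_t, patches)
    loss.backward()
    ema_update(teacher, student)
```

After pretraining under this kind of objective, only the encoder (a vanilla ViT in the paper) would be kept and fine-tuned on the downstream face security tasks.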
