Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Renshuai Liu · Bowen Ma · Wei Zhang · Zhipeng Hu · Changjie Fan · Tangjie Lv · Yu Ding · Xuan Cheng

Arch 4A-E Poster #188
Award: Highlight
Wed 19 Jun 10:30 a.m. PDT — noon PDT


In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want: images that retain an individual's identity while exhibiting diverse expressions. This paper presents our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework capable of simultaneous identity-expression control and finer-grained expression synthesis. The expression control is precise enough to be specified with a fine-grained emotional vocabulary. We devise a novel diffusion model that performs face swapping and reenactment simultaneously. Because identity and expression are entangled, controlling them separately and precisely within one framework is nontrivial and has not yet been explored. To overcome this, we propose several innovative designs for the conditional diffusion model, including an encoder that balances identity and expression, improved midpoint sampling, and explicit background conditioning. Extensive experiments demonstrate the controllability and scalability of the proposed framework in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.
