

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li · Mingdeng Cao · Xintao Wang · Zhongang Qi · Ming-Ming Cheng · Ying Shan

Arch 4A-E Poster #372
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Text-to-image generation has recently made remarkable progress in synthesizing realistic human photos conditioned on text prompts. However, existing personalized generation methods cannot simultaneously achieve high efficiency, high identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method whose key idea is to encode an arbitrary number of input ID images into a stacked ID embedding that preserves ID information. This embedding also lets the method serve many interesting applications, such as replacing the corresponding class word and combining the characteristics of different identities. In addition, to better drive the training of PhotoMaker, we propose an ID-oriented data creation pipeline to assemble the training data. Trained on the dataset constructed through this pipeline, PhotoMaker achieves performance comparable to test-time fine-tuning-based methods while offering significant speed improvements, strong generalization, and a wide range of applications.
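To make the idea of a stacked ID embedding a little more concrete, here is a minimal sketch under stated assumptions. It assumes each input ID image has already been mapped to a fixed-width vector by some image encoder (not shown). It also takes "stacking" to mean placing these vectors along a new image axis, and "combining identities" to mean concatenating two such stacks. The function names `stack_id_embeddings` and `mix_identities` are hypothetical. This is not PhotoMaker's actual implementation, which the abstract does not describe at this level of detail.

```python
import numpy as np


def stack_id_embeddings(per_image_embeddings):
    """Stack any number of per-image ID embeddings (each of shape (D,)) into an (N, D) array.

    Hypothetical helper: assumes the embeddings were produced upstream by some image encoder.
    """
    if len(per_image_embeddings) == 0:
        raise ValueError("need at least one ID image embedding")
    shapes = {e.shape for e in per_image_embeddings}
    if len(shapes) != 1:
        raise ValueError(f"all embeddings must share one shape, got {shapes}")
    # A new leading axis indexes the input images, so N can vary from call to call.
    return np.stack(per_image_embeddings, axis=0)


def mix_identities(stack_a, stack_b):
    """Combine two identities by concatenating their stacks along the image axis (an assumption)."""
    if stack_a.shape[1] != stack_b.shape[1]:
        raise ValueError("stacks must have the same embedding width")
    return np.concatenate([stack_a, stack_b], axis=0)


if __name__ == "__main__":
    # Three images of identity A and two of identity B, each with a toy 4-dim embedding.
    person_a = stack_id_embeddings([np.ones(4), np.zeros(4), np.full(4, 2.0)])
    person_b = stack_id_embeddings([np.ones(4), np.ones(4)])
    print(person_a.shape)  # (3, 4)
    print(mix_identities(person_a, person_b).shape)  # (5, 4)
```

The design point the sketch tries to capture is that the number of input images is not fixed: the stack simply grows along its first axis, and mixing two identities reduces to joining their stacks.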
