

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi · Wei Xiong · Zhe Lin · HyunJoon Jung

Arch 4A-E Poster #363
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Recent advances in personalized image generation have enabled pre-trained text-to-image models to learn new concepts from specific image sets. However, these methods often necessitate extensive test-time finetuning for each new concept, leading to inefficiencies in both time and scalability. To address this challenge, we introduce InstantBooth, an innovative approach leveraging existing text-to-image models for instantaneous text-guided image personalization, eliminating the need for test-time finetuning. This efficiency is achieved through two primary innovations. Firstly, we utilize an image encoder that transforms input images into a global embedding to grasp the general concept. Secondly, we integrate new adapter layers into the pre-trained model, enhancing its ability to capture intricate identity details while maintaining language coherence. Significantly, our model is trained exclusively on text-image pairs, without reliance on concept-specific paired images. When benchmarked against existing finetuning-based personalization techniques like DreamBooth and Textual-Inversion, InstantBooth not only shows comparable proficiency in aligning language with image, maintaining image quality, and preserving the identity but also boasts a 100-fold increase in generation speed. Project Page:
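The two components named in the abstract can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not say how the global embedding enters the prompt or how the adapter layers are built. The sketch assumes the global image embedding replaces a placeholder token in the prompt embeddings, and that each adapter is a residual attention over image patch features scaled by a learnable gate. The names `inject_concept_token` and `gated_adapter`, the gate, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def inject_concept_token(text_emb, image_emb, placeholder_idx):
    # Assumption: the global image embedding stands in for one
    # placeholder token of the prompt. Returns a modified copy.
    out = text_emb.copy()
    out[placeholder_idx] = image_emb
    return out


def gated_adapter(hidden, patch_feats, w_q, w_k, w_v, gate):
    # Assumption: the adapter attends from the frozen layer's hidden
    # states to image patch features and adds the result residually,
    # scaled by tanh(gate). With gate = 0 the pre-trained output is
    # returned unchanged.
    q = hidden @ w_q              # (n_tokens, d)
    k = patch_feats @ w_k         # (n_patches, d)
    v = patch_feats @ w_v         # (n_patches, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return hidden + np.tanh(gate) * (attn @ v)


# Toy data: random values stand in for real encoder outputs.
rng = np.random.default_rng(0)
d = 8
text_emb = rng.normal(size=(5, d))     # 5 prompt tokens
image_emb = rng.normal(size=d)         # global concept embedding
prompt_emb = inject_concept_token(text_emb, image_emb, placeholder_idx=2)

hidden = rng.normal(size=(4, d))       # hidden states of a frozen layer
patch_feats = rng.normal(size=(6, d))  # 6 image patch features
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

unchanged = gated_adapter(hidden, patch_feats, w_q, w_k, w_v, gate=0.0)
adapted = gated_adapter(hidden, patch_feats, w_q, w_k, w_v, gate=1.0)
```

In this sketch a zero gate leaves the frozen model's output untouched, which is one common way to add new layers to a pre-trained network without disturbing its behavior at the start of training. Whether InstantBooth uses this exact gating is not stated in the abstract.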
