

Poster

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Bingjie Gao · Xinyu Gao · Xiaoxue Wu · yujie zhou · Yu Qiao · Li Niu · Xinyuan Chen · Yaohui Wang


Abstract:

Text-to-video (T2V) generative models trained on large-scale datasets have made remarkable advances. Well-performing prompts are often model-specific and consistent with the distribution of the training prompts used for T2V models. Previous studies use large language models (LLMs) to directly optimize user-provided prompts toward the training-prompt distribution, but they lack refined guidance that accounts for both prompt vocabulary and specific sentence format. In this paper, we introduce a retrieval-augmented prompt optimization framework for T2V generation. User-provided prompts are augmented with relevant and diverse modifiers retrieved from a constructed relation graph, and are then refactored into the format of training prompts by a fine-tuned language model. To balance the retrieval speed and vocabulary diversity of the relation graph, we propose a two-branch optimization mechanism that determines whether the prompt optimized by our method or the prompt optimized directly by an LLM is better. Extensive experiments demonstrate that our method effectively enhances both the static and dynamic dimensions of generated videos, highlighting the significance of prompt optimization for simple user-provided prompts.
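The abstract does not give implementation details, so the following is only a toy Python sketch of the overall pipeline shape (retrieve modifiers, rewrite, pick a branch). Everything in it is a hypothetical illustration rather than the authors' method: the relation graph is assumed to be a plain dictionary from keywords to weighted modifiers, "diversity" is approximated by a per-keyword cap plus deduplication, a fixed string template stands in for the fine-tuned language model, and the branch selector relies on a caller-supplied scoring function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy relation graph: keyword -> list of (modifier, relevance score).
RelationGraph = Dict[str, List[Tuple[str, float]]]


def retrieve_modifiers(prompt: str, graph: RelationGraph, k: int = 3,
                       per_keyword: int = 2) -> List[str]:
    """Collect up to k relevant modifiers for keywords found in the prompt.

    Diversity is approximated (an assumption, not the paper's scheme) by
    capping how many modifiers a single keyword may contribute and by
    dropping duplicate modifiers.
    """
    candidates: List[Tuple[float, str, str]] = []
    for token in prompt.lower().split():
        word = token.strip(".,!?")
        # Keep only the `per_keyword` most relevant modifiers for this keyword.
        ranked = sorted(graph.get(word, []), key=lambda m: m[1], reverse=True)
        for modifier, score in ranked[:per_keyword]:
            candidates.append((score, modifier, word))

    chosen: List[str] = []
    # Greedily take the highest-scoring modifiers across all keywords.
    for _, modifier, _ in sorted(candidates, reverse=True):
        if len(chosen) >= k:
            break
        if modifier not in chosen:
            chosen.append(modifier)
    return chosen


def refactor(prompt: str, modifiers: List[str]) -> str:
    """Hypothetical stand-in for the fine-tuned rewriting model: a fixed template."""
    base = prompt.rstrip(".")
    return f"{base}, {', '.join(modifiers)}." if modifiers else f"{base}."


def two_branch_select(retrieval_prompt: str, llm_prompt: str,
                      score: Callable[[str], float]) -> str:
    """Return whichever branch the (hypothetical) scorer rates higher."""
    if score(retrieval_prompt) >= score(llm_prompt):
        return retrieval_prompt
    return llm_prompt


if __name__ == "__main__":
    graph: RelationGraph = {
        "cat": [("fluffy fur", 0.9), ("sitting on a windowsill", 0.7), ("orange tabby", 0.6)],
        "running": [("motion blur", 0.8), ("slow motion", 0.5)],
    }
    user_prompt = "A cat running."
    mods = retrieve_modifiers(user_prompt, graph)
    retrieval_branch = refactor(user_prompt, mods)
    llm_branch = "A cat running quickly."  # placeholder for an LLM-rewritten prompt
    # Toy scorer: prefer the longer, more descriptive prompt.
    print(two_branch_select(retrieval_branch, llm_branch, lambda p: len(p.split())))
```

In a real system the scorer would presumably be some learned or model-based quality estimate; word count is used here only so the example runs without external dependencies.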
