Workshop: AI for Creative Visual Content Generation, Editing and Understanding
T3V2V: Test Time Training for Domain Adaptation in Video-to-Video Editing
Zezhou Wang · Jing Tang
In the realm of generative AI, state-of-the-art Video-to-Video (V2V) editing models can perform diverse edits under different conditions and generate new videos. Despite this versatility, these models still suffer from significant frame inconsistencies, such as motion discrepancies and unnatural background changes. This paper addresses these issues by analyzing video inconsistencies through the lens of domain shift and implementing domain control based on this analysis. Furthermore, we propose a test-time compute-optimal sampling method, a high-performance test-time training (TTT) procedure that better represents different video domains. Leveraging this TTT method, we propose T3V2V (TTT-V2V editing). Our method uses frame-level information to establish an unsupervised TTT learning process, providing more precise guidance for the image-to-video (I2V) generation process and enhancing video consistency through effective self-supervised parameter optimization and domain adaptation. Extensive experiments on the DAVIS-EDIT benchmark show that T3V2V outperforms previous state-of-the-art models. The self-supervised nature of our TTT approach further enables robust generalization to diverse V2V editing tasks, establishing a new paradigm for V2V synthesis.
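To make the idea of test-time training concrete, below is a minimal sketch of a generic TTT loop for video editing: a copy of the editing model is briefly adapted to each unlabeled test video with a self-supervised objective (here, a simple temporal-consistency loss) before the edit is produced. All names (edit_model, temporal_consistency_loss, adapt_steps) are hypothetical illustrations of the general technique, not the authors' implementation or their specific sampling method.

```python
# Generic test-time training (TTT) sketch in PyTorch. Assumes edit_model maps
# a video tensor of shape (T, C, H, W) to an edited video of the same shape.
import copy
import torch


def temporal_consistency_loss(frames_pred: torch.Tensor) -> torch.Tensor:
    """Penalize large differences between adjacent predicted frames (T, C, H, W)."""
    return (frames_pred[1:] - frames_pred[:-1]).pow(2).mean()


def test_time_adapt(edit_model: torch.nn.Module,
                    source_frames: torch.Tensor,
                    adapt_steps: int = 10,
                    lr: float = 1e-4) -> torch.nn.Module:
    """Adapt a copy of the model to one test video via a self-supervised loss."""
    adapted = copy.deepcopy(edit_model)   # keep the original weights untouched
    adapted.train()
    optimizer = torch.optim.Adam(adapted.parameters(), lr=lr)

    for _ in range(adapt_steps):
        optimizer.zero_grad()
        # Self-supervised objective on the unlabeled test video: no ground-truth
        # edited frames are required.
        frames_pred = adapted(source_frames)
        loss = temporal_consistency_loss(frames_pred)
        loss.backward()
        optimizer.step()

    adapted.eval()
    return adapted
```

In this sketch each test video gets its own adapted copy of the weights, which is then discarded after editing; this per-sample adaptation is what distinguishes TTT-style domain adaptation from ordinary fine-tuning on a labeled training set.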