Poster
Goku: Generative Flow Kit for Unified Image-Video Creation
Shoufa Chen · Chongjian GE · Yuqi Zhang · Yida Zhang · Fengda Zhu · Hao Yang · Hongxiang Hao · hui wu · Zhichao Lai · Yifei Hu · Ting-Che Lin · Shilong Zhang · Fu Li · Chuan Li · Xing Wang · Yanghua Peng · Peize Sun · Ping Luo · Yi Jiang · Zehuan Yuan · BINGYUE PENG · Xiaobing Liu
This paper presents Goku, a new family of joint image-and-video generation models that use rectified flow Transformers to achieve industry-grade performance. We describe the foundational elements required for high-quality visual generation, including data curation, model design, and flow formulation. Key contributions include a meticulous data filtering pipeline that ensures high-quality, fine-grained image and video data curation, and the pioneering use of rectified flow for enhanced interaction among video and image tokens. Goku models achieve superior performance in both qualitative and quantitative assessments. Notably, Goku achieves top scores on major benchmarks: 0.76 on GenEval and 83.65 on DPG-Bench for text-to-image generation, and 82.7 on VBench for text-to-video generation. We hope this report offers valuable insights into joint image-and-video generation models for the research community.
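For readers unfamiliar with the flow formulation mentioned in the abstract, here is a minimal sketch of the standard rectified flow training target. It assumes the common convention of a straight-line path from a noise sample to a data sample. The abstract does not specify Goku's exact formulation, so the function names (`interpolate`, `velocity_target`, `flow_matching_loss`) and the toy values are illustrative assumptions, not the authors' implementation.

```python
import random

# Assumed convention (not confirmed by the abstract): x0 is noise at t=0,
# x1 is data at t=1, and the model regresses the constant velocity x1 - x0.

def interpolate(x0, x1, t):
    """Point on the straight line between noise x0 (t=0) and data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def velocity_target(x0, x1):
    """Regression target for the velocity along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target."""
    target = velocity_target(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)

rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]                     # hypothetical stand-in for data tokens
x0 = [rng.gauss(0.0, 1.0) for _ in x1]    # Gaussian noise sample
t = rng.random()                          # timestep drawn uniformly from [0, 1)
xt = interpolate(x0, x1, t)               # input a velocity model would see
loss = flow_matching_loss(velocity_target(x0, x1), x0, x1)  # perfect prediction
```

In a real system the predicted velocity would come from a Transformer conditioned on `xt`, `t`, and the text prompt. The perfect prediction used here only shows that the loss is zero when the model output matches the target.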