Humanoid Generative Pre-Training for Zero-Shot Motion Tracking
Abstract
We introduce Humanoid-GPT, the first GPT-style humanoid motion Transformer trained with causal attention on a billion-scale motion corpus for whole-body control. Prior trackers rely on shallow MLPs constrained by scarce data and a trade-off between agility and generalization; in contrast, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks arbitrary humans performing highly dynamic behaviors and generalizes zero-shot to unseen motions and control tasks. Extensive experiments and scaling analyses show that our model establishes a new performance frontier, combining robust zero-shot generalization to unseen tasks with accurate tracking of highly dynamic and complex motions.