Open the Motion Door: Atomic Motion Decomposition and Recomposition for Open-Vocabulary Motion Generation
Abstract
Text-to-motion generation is a fundamental task in computer vision, aiming to synthesize 3D human motion sequences from natural language descriptions. However, due to the limited scale and diversity of existing datasets, models trained to directly map raw text to motion often struggle to generalize to out-of-domain textual inputs. We observe that although high-level motion semantics vary widely, many motions share a common set of underlying atomic motions, that is, simple, reusable body-part movements. Building on this insight, we introduce an Atomic Motion Decomposition and Recomposition framework for open-vocabulary text-to-motion generation. Our approach consists of two key components: a Textual Decomposition module that parses out-of-domain descriptions into atomic motion units, and an Atomic Recomposition module that integrates these units to produce the final motion sequence. Our model achieves competitive performance on the in-domain HumanML3D dataset, and extensive experiments on two out-of-domain datasets (IDEA400 and Mixamo) demonstrate that our method substantially outperforms state-of-the-art approaches in open-vocabulary motion generation.