Poster
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui · Tengyu Liu · Ziyu Meng · Jiale Yu · Ran Song · Wei Zhang · Yixin Zhu · Siyuan Huang
Learning open-vocabulary physical skills for simulated agents remains challenging because of limitations in current reinforcement learning approaches: manually designed rewards lack scalability, while demonstration-based methods struggle to cover arbitrary tasks. We propose GROVE, a generalized reward framework for open-vocabulary physical skill learning that requires neither manual reward design nor task-specific demonstrations. GROVE combines Large Language Models (LLMs), which generate precise constraints, with Vision Language Models (VLMs), which provide semantic evaluation. Through an iterative reward design process, VLM-based feedback guides the refinement of LLM-generated constraints, significantly enhancing the reliability of our method. Central to our approach is Pose2CLIP, a lightweight pose-to-semantic feature mapper that improves both the quality and the efficiency of VLM evaluation. Extensive experiments demonstrate GROVE's versatility across diverse tasks and learning paradigms. Our approach achieves 22.2% higher naturalness and a 25.7% better task completion score while training 8.4 times faster than previous open-vocabulary methods, establishing a new foundation for scalable physical skill acquisition.
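As a rough illustration of the idea in the abstract, the sketch below blends a semantic score (cosine similarity between a pose feature and a text feature) with a constraint score. It is not GROVE's implementation: the abstract does not say how the two signals are combined, so the weighted sum, the `weight` parameter, and the names `combined_reward`, `pose_to_feature`, and `constraint` are all hypothetical. The `pose_to_feature` callable stands in for a Pose2CLIP-style mapper, and `constraint` stands in for an LLM-generated constraint.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_reward(
    pose: Vector,
    text_feature: Vector,
    pose_to_feature: Callable[[Vector], Vector],  # hypothetical stand-in for a pose-to-semantic mapper
    constraint: Callable[[Vector], float],        # hypothetical stand-in for an LLM-generated constraint
    weight: float = 0.5,                          # assumed blending weight, not taken from the paper
) -> float:
    """Weighted sum of a semantic score and a constraint score (an assumed combination)."""
    semantic = cosine_similarity(pose_to_feature(pose), text_feature)
    return weight * semantic + (1.0 - weight) * constraint(pose)


# Toy usage: an identity "mapper" and a constraint that checks the first coordinate.
toy_constraint = lambda p: 1.0 if p[0] > 0.5 else 0.0
reward = combined_reward([1.0, 0.0], [1.0, 0.0], lambda p: p, toy_constraint)
```

In this toy setup the pose feature matches the text feature and the constraint is satisfied, so both terms contribute their maximum value. A real system would replace the identity mapper with a learned model and the lambda with constraints produced and refined over several iterations.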