Poster
R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner
Ziyi Bai · Hanxuan Li · Bin Fu · Chuyan Xiong · Ruiping Wang · Xilin Chen
This paper explores leveraging large language models (LLMs) as low-level action planners for general embodied instruction following tasks. LLMs excel at serving as the “brain” of robots by handling high-level task planning but lack the ability to directly generate precise low-level actions to guide the “body”. This limitation arises from a disconnect between high-level conceptual understanding and low-level spatial perception. We address this challenge by bridging the gap, enabling LLMs not only to understand complex instructions but also to produce precise, actionable plans. To achieve this, we introduce Room to Chessboard (R2C), a semantic representation that maps environments onto a grid-based chessboard, enabling LLMs to generate specific low-level coordinates and effectively guide robots as if playing a game of chess. We further propose a Chain-of-Thought Decision (CoT-D) paradigm to enhance the LLMs’ decision-making ability by improving interpretability and context-awareness. By jointly training LLMs for high-level task decomposition and low-level action generation, we create a unified “brain-body” system capable of handling complex, free-form instructions while producing precise low-level actions that let robots adapt to dynamic environments in real time. We validate R2C using both fine-tuned open-source LLMs and GPT-4, demonstrating effectiveness on the challenging ALFRED benchmark. Results show that with our R2C framework, LLMs can effectively act as low-level planners, generalizing across diverse settings and open-vocabulary robotic tasks. View the demos on our project page: https://anonymous4cv.github.io/Room2Chessboard/.
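To make the room-to-chessboard idea concrete, the sketch below shows one way such a mapping could work: a continuous floor position (in meters) is discretized into a chess-style cell label that an LLM can emit as a low-level coordinate, and cell labels are decoded back to metric positions for the robot. This is a minimal illustration under assumed parameters (an 8x8 grid over a 4 m x 4 m room), not the paper's actual implementation.

```python
def room_to_cell(x, y, room_w=4.0, room_h=4.0, grid=8):
    """Map a continuous room position (meters) to a chess-like cell label.

    Grid size and room extent are illustrative assumptions.
    """
    col = min(int(x / room_w * grid), grid - 1)
    row = min(int(y / room_h * grid), grid - 1)
    return f"{chr(ord('a') + col)}{row + 1}"  # labels 'a1' .. 'h8'

def cell_to_room(cell, room_w=4.0, room_h=4.0, grid=8):
    """Decode a cell label back to the cell-center position in meters."""
    col = ord(cell[0]) - ord('a')
    row = int(cell[1:]) - 1
    return ((col + 0.5) * room_w / grid, (row + 0.5) * room_h / grid)

print(room_to_cell(0.1, 0.1))  # -> a1 (near the origin corner)
print(room_to_cell(3.9, 3.9))  # -> h8 (opposite corner)
```

With such a discretization, the planner only ever needs to produce short symbolic tokens like `h8`, which plays to an LLM's strengths while a simple decoder recovers a metric target for the controller.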