

Poster

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

Yuheng Ji · Huajie Tan · Jiayu Shi · Xiaoshuai Hao · Yuan Zhang · Hengyuan Zhang · Pengwei Wang · Mengdi Zhao · Yao Mu · Pengju An · Xinda Xue · Qinghang Su · Huaihai Lyu · Xiaolong Zheng · Jiaming Liu · Zhongyuan Wang · Shanghang Zhang


Abstract:

Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise because current MLLMs lack three essential robotic brain capabilities: Planning Capability, which involves decomposing complex manipulation instructions into manageable sub-tasks; Affordance Perception, the ability to recognize and interpret the affordances of interactive objects; and Trajectory Prediction, the foresight to anticipate the complete manipulation trajectory necessary for successful execution. To enhance the robotic brain's core capabilities from abstract to concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot has been meticulously refined by three human annotators to ensure its diversity and accuracy. Building on this dataset, we develop RoboBrain, an MLLM-based model that combines robotic and general multimodal data, employs a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across various robotic tasks, highlighting its potential to advance robotic brain capabilities.
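
The abstract names ShareRobot's three annotation dimensions (task planning, object affordance, end-effector trajectory) but does not specify a concrete schema. The sketch below is a minimal, hypothetical Python record showing how such multi-dimensional labels could be organized per sample; all class names, field names, and values are assumptions for illustration, not the dataset's actual format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AffordanceRegion:
    """Hypothetical affordance label: object name plus a 2D bounding box in image coordinates."""
    object_name: str
    bbox_xyxy: Tuple[float, float, float, float]

@dataclass
class ShareRobotSample:
    """Sketch of one annotated sample covering the three label dimensions described in the abstract."""
    instruction: str                        # high-level manipulation instruction
    sub_tasks: List[str]                    # planning: decomposed step-by-step sub-tasks
    affordances: List[AffordanceRegion]     # affordance perception: interactable regions
    trajectory: List[Tuple[float, float]]   # trajectory prediction: 2D end-effector waypoints

# Illustrative example (contents invented for demonstration only)
sample = ShareRobotSample(
    instruction="Put the red mug into the top drawer.",
    sub_tasks=[
        "Locate and grasp the red mug.",
        "Open the top drawer.",
        "Place the mug inside the drawer and release.",
    ],
    affordances=[AffordanceRegion("red mug handle", (312.0, 188.0, 356.0, 240.0))],
    trajectory=[(0.42, 0.61), (0.45, 0.52), (0.51, 0.40)],
)
print(len(sample.sub_tasks), "sub-tasks;", len(sample.trajectory), "waypoints")
```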
