
Poster

LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li · Na Zhao · Junbin Xiao · Chun Feng · Xiang Wang · Tat-seng Chua


Abstract:

Affordance detection in 3D data is key to bridging perception and action in robots. Existing efforts mostly focus on the visual aspect and overlook interaction with linguistic instructions, limiting their generalization to novel objects and their integration with large language models (LLMs), which are excellent instruction interpreters. In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO). LASO challenges a model to segment the part of a 3D object relevant to a given affordance question. To facilitate the task, we contribute a dataset comprising 19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions. As a pioneer solution, we further propose PointRefer, which features an adaptive fusion module to identify target affordance regions at different scales. To ensure text-aware segmentation, we adopt a set of affordance queries conditioned on linguistic cues to generate dynamic kernels, which are then convolved with point features to produce the segmentation mask. Comprehensive experiments and analyses validate PointRefer's effectiveness. With these efforts, we hope that LASO can steer the direction of 3D affordance research toward tighter integration with the evolving capabilities of LLMs. Code is available at https://anonymous.4open.science/r/LASO/.
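To make the dynamic-kernel mechanism in the abstract concrete, the PyTorch sketch below shows one plausible reading: learnable affordance queries are conditioned on text features via cross-attention, projected into per-query kernels, and each kernel scores every point via a dot product (a 1x1 convolution over points). The module name `DynamicKernelHead`, the layer sizes, and the max-pooling fusion across queries are illustrative assumptions, not the authors' implementation (the paper's adaptive fusion module operates at multiple scales).

```python
import torch
import torch.nn as nn

class DynamicKernelHead(nn.Module):
    """Sketch of a text-conditioned dynamic-kernel segmentation head.

    Assumed design, not the paper's code: affordance queries attend to
    language tokens, each query becomes a dynamic kernel, and kernels
    score points by dot product to form mask logits.
    """

    def __init__(self, num_queries=8, d_model=256):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.to_kernel = nn.Linear(d_model, d_model)

    def forward(self, point_feats, text_feats):
        # point_feats: (B, N, C) per-point features from a 3D backbone
        # text_feats:  (B, T, C) token features from a text encoder
        B = point_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)    # (B, Q, C)
        q, _ = self.cross_attn(q, text_feats, text_feats)  # condition queries on language
        kernels = self.to_kernel(q)                        # (B, Q, C) dynamic kernels
        # Convolving a 1x1 kernel with point features reduces to a per-point dot product.
        logits = torch.einsum('bqc,bnc->bqn', kernels, point_feats)
        # Simplified fusion: take the strongest query response per point.
        return logits.max(dim=1).values                    # (B, N) mask logits

# Toy usage with random tensors standing in for real point/text encoders.
head = DynamicKernelHead()
points = torch.randn(2, 2048, 256)
text = torch.randn(2, 12, 256)
mask_logits = head(points, text)  # (2, 2048)
```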
