

Poster

Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes

Ting Yu · Yi Lin · Jun Yu · Zhenyu Lou · Qiongjie Cui


Abstract:

Recent advances in human motion prediction (HMP) have shifted focus from isolated motion data to integrating human-scene correlations. In particular, the latest methods leverage human gaze points, using their spatial coordinates to indicate intent—where a person might move within a 3D environment. Despite promising trajectory results, these methods often produce inaccurate poses by overlooking the semantic implications of gaze, specifically the affordances of observed objects, which indicate the possible interactions. To address this, we propose GAP3DS, an affordance-aware HMP model that utilizes gaze-informed object affordances to improve HMP in complex 3D environments. GAP3DS incorporates a gaze-guided affordance learner to identify relevant objects in the scene and infer their affordances based on human gaze, thus contextualizing future human-object interactions. This affordance information, enriched with visual features and gaze data, conditions the generation of multiple human-object interaction poses, which are subsequently decoded into final motion predictions. Extensive experiments on two real-world datasets demonstrate that GAP3DS outperforms state-of-the-art methods in both trajectory and pose accuracy, producing more physically consistent and contextually grounded predictions.
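The abstract describes a two-stage pipeline: a gaze-guided affordance learner that attends to scene objects and infers an affordance code, followed by a conditional generator that produces multiple human-object interaction poses and decodes them into motion. The sketch below is purely illustrative and is not the authors' implementation; every module name, tensor shape, and design choice (gaze-distance attention, GRU decoder, latent sampling) is an assumption made for exposition.

```python
# Illustrative-only sketch of the pipeline described in the abstract.
# All names, shapes, and design details are assumptions, not the GAP3DS code.
import torch
import torch.nn as nn


class GazeGuidedAffordanceLearner(nn.Module):
    """Hypothetical module: weights scene objects by gaze proximity and
    predicts a gaze-informed affordance code for the attended object."""

    def __init__(self, obj_feat_dim=256, afford_dim=64):
        super().__init__()
        self.affordance_head = nn.Sequential(
            nn.Linear(obj_feat_dim + 3, 128), nn.ReLU(), nn.Linear(128, afford_dim)
        )

    def forward(self, obj_feats, obj_centers, gaze_point):
        # obj_feats: (N, D) per-object visual features; obj_centers: (N, 3); gaze_point: (3,)
        dist = torch.norm(obj_centers - gaze_point, dim=-1)    # gaze-to-object distance
        attn = torch.softmax(-dist, dim=0)                      # nearer objects get more weight
        fused = torch.cat([obj_feats, obj_centers - gaze_point], dim=-1)
        affordances = self.affordance_head(fused)               # (N, afford_dim)
        return (attn.unsqueeze(-1) * affordances).sum(dim=0)    # gaze-weighted affordance code


class InteractionPoseGenerator(nn.Module):
    """Hypothetical conditional generator: samples K candidate interaction motions
    conditioned on the last observed pose, gaze point, and affordance code."""

    def __init__(self, pose_dim=66, afford_dim=64, horizon=30, num_samples=5):
        super().__init__()
        self.num_samples, self.horizon = num_samples, horizon
        cond_dim = pose_dim + 3 + afford_dim
        self.decoder = nn.GRU(cond_dim + 32, 256, batch_first=True)
        self.out = nn.Linear(256, pose_dim)

    def forward(self, last_pose, gaze_point, affordance):
        cond = torch.cat([last_pose, gaze_point, affordance], dim=-1)
        preds = []
        for _ in range(self.num_samples):
            z = torch.randn(32)                                  # latent sample per candidate
            step = torch.cat([cond, z], dim=-1).repeat(self.horizon, 1).unsqueeze(0)
            h, _ = self.decoder(step)
            preds.append(self.out(h).squeeze(0))                 # (horizon, pose_dim)
        return torch.stack(preds)                                # (K, horizon, pose_dim)


if __name__ == "__main__":
    learner, generator = GazeGuidedAffordanceLearner(), InteractionPoseGenerator()
    obj_feats, obj_centers = torch.randn(8, 256), torch.randn(8, 3)  # toy scene, 8 objects
    gaze, last_pose = torch.randn(3), torch.randn(66)
    affordance = learner(obj_feats, obj_centers, gaze)
    motions = generator(last_pose, gaze, affordance)
    print(motions.shape)  # torch.Size([5, 30, 66])
```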
