

Poster

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia · Wenbo Gou · Haoye Dong


Abstract:

Although single-view 3D human pose estimation has received much attention, multi-view multi-person 3D pose estimation still faces several challenges, including occlusions and limited generalization to new camera arrangements or scenarios. Existing transformer-based approaches often struggle to accurately model joint spatial sequences, especially under occlusion. To address this, we present a novel Multi-View State Space Modeling framework, named MV-SSM, that robustly reconstructs 3D human poses by explicitly modeling the joint spatial sequence at two distinct levels: the feature level from multi-view images and the joint level of the person. Specifically, we propose a Projective State Space (PSS) block to learn joint spatial sequences via state space modeling. Furthermore, we replace Mamba's unidirectional scanning with an effective Grid Token-guided Bidirectional Scan (GTBS), which is integral to the PSS block. Experiments on multiple challenging benchmarks demonstrate that MV-SSM achieves highly accurate 3D pose estimation and generalizes across different numbers of cameras (+10.8 AP25 in the challenging three-camera setting on CMU Panoptic), varying camera arrangements (+7.0 AP25), and cross-dataset settings (+15.3 PCP on Campus A1), significantly outperforming state-of-the-art methods. The code has been submitted and will be open-sourced with model weights upon acceptance.
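The abstract's key architectural idea is to run a state space scan over the joint spatial sequence in both directions rather than Mamba's single direction. Below is a minimal PyTorch sketch of that general idea under stated assumptions: a simple diagonal linear recurrence is run forward and backward over a sequence of joint tokens and the two passes are fused. The class name `BidirectionalJointScan`, the projection and fusion layers, and all dimensions are hypothetical illustrations; the actual grid-token guidance and selective parameterization of GTBS are not described in the abstract and are not reproduced here.

```python
import torch
import torch.nn as nn


class BidirectionalJointScan(nn.Module):
    """Illustrative bidirectional scan over a sequence of joint tokens.

    A simple diagonal linear state-space recurrence is run over the token
    sequence in both directions and the two outputs are fused. This is a
    generic stand-in for the paper's Grid Token-guided Bidirectional Scan
    (GTBS), not the authors' actual module.
    """

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(dim, state_dim)
        self.out_proj = nn.Linear(state_dim, dim)
        # Per-state decay kept in (0, 1) via a sigmoid for a stable recurrence.
        self.decay = nn.Parameter(torch.zeros(state_dim))
        self.fuse = nn.Linear(2 * dim, dim)

    def _scan(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> (batch, seq_len, dim)
        u = self.in_proj(x)            # project tokens into the state space
        a = torch.sigmoid(self.decay)  # diagonal transition, shape (state_dim,)
        h = torch.zeros(x.size(0), u.size(-1), device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(u.size(1)):     # sequential recurrence h_t = a * h_{t-1} + u_t
            h = a * h + u[:, t]
            outputs.append(self.out_proj(h))
        return torch.stack(outputs, dim=1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        fwd = self._scan(tokens)                        # forward-direction scan
        bwd = self._scan(torch.flip(tokens, dims=[1]))  # backward-direction scan
        bwd = torch.flip(bwd, dims=[1])                 # re-align to original order
        return self.fuse(torch.cat([fwd, bwd], dim=-1))


if __name__ == "__main__":
    # e.g., 2 people x 15 joints flattened into a sequence of 30 joint tokens
    joint_tokens = torch.randn(4, 30, 64)
    out = BidirectionalJointScan(dim=64)(joint_tokens)
    print(out.shape)  # torch.Size([4, 30, 64])
```

Scanning the sequence in both directions lets every joint token aggregate context from joints on either side of it, which is one plausible reason a bidirectional scan helps when some joints are occluded; how the grid tokens guide this scan is specified in the full paper rather than the abstract.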
