FisherPoser: Human Motion Estimation from Sparse Observations with Hierarchical Region-Wise Fisher-Matrix Uncertainty Modeling
Abstract
Full-body motion estimation from sparse VR observations is inherently under-constrained: only three 6-DoF trackers (an HMD and two controllers) are available to infer a full skeletal pose. To address this ambiguity, we introduce a probabilistic framework that models joint orientations as distributions on SO(3) using the Matrix Fisher distribution. Instead of predicting a single deterministic pose, our network outputs a distribution for each joint, whose mode gives the rotation estimate and whose concentration quantifies prediction uncertainty directly on the rotation manifold. This enables likelihood-based training and principled uncertainty propagation. At the core of our model is a causal Transformer encoder that fuses sparse observations with motion history. We further propose region-wise tokens for the torso, arms, and legs, obtained via attention pooling over local joint features and semantic VR anchors. These tokens guide compact, per-region Fisher regression. To enforce kinematic coherence efficiently, we employ a limb refinement module in which each child joint's Fisher parameters are conditioned on its parent's distribution and the regional context, so that pose and uncertainty propagate hierarchically along the kinematic chain. Extensive experiments on standard sparse-VR benchmarks show that our approach achieves state-of-the-art performance while providing well-calibrated joint-wise uncertainty.
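
As an illustration of how a mode and a concentration can be read off a Matrix Fisher distribution, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the common parameterization in which each joint is described by an unconstrained 3×3 matrix F with density proportional to exp(tr(FᵀR)) for R in SO(3); the abstract does not confirm this choice. The function names `proper_svd` and `fisher_mode_and_concentration` are hypothetical. Under this assumption, the proper SVD F = U diag(s) Vᵀ gives the mode U Vᵀ, and the sums s_j + s_k are usually interpreted as the concentration of rotations about the i-th principal axis.

```python
import numpy as np

def proper_svd(F):
    """Proper SVD F = U diag(s) V^T with U and V in SO(3).

    Unlike the standard SVD, the last entry of s may be negative so that
    both orthogonal factors have determinant +1.
    """
    U, s, Vt = np.linalg.svd(F)
    det_u = np.sign(np.linalg.det(U))
    det_v = np.sign(np.linalg.det(Vt))
    U, s, Vt = U.copy(), s.copy(), Vt.copy()
    U[:, 2] *= det_u          # flip last column of U if det(U) = -1
    Vt[2, :] *= det_v         # flip last row of V^T if det(V) = -1
    s[2] *= det_u * det_v     # compensate so that U diag(s) V^T still equals F
    return U, s, Vt

def fisher_mode_and_concentration(F):
    """Return the mode rotation and per-axis concentrations for parameter F.

    Assumption (hypothetical helper): F is the 3x3 Matrix Fisher parameter.
    """
    U, s, Vt = proper_svd(F)
    mode = U @ Vt  # rotation in SO(3) that maximizes tr(F^T R)
    # Concentration about principal axis i is s_j + s_k; larger means more certain.
    concentration = np.array([s[1] + s[2], s[0] + s[2], s[0] + s[1]])
    return mode, concentration

if __name__ == "__main__":
    # Example: a 90-degree rotation about z, scaled by an isotropic concentration of 10.
    R0 = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
    mode, concentration = fisher_mode_and_concentration(10.0 * R0)
    print(mode)
    print(concentration)
```

In this sketch a larger scale on F yields larger concentrations and hence a more peaked distribution around the mode, while small singular values correspond to a joint whose orientation is poorly determined by the sparse inputs.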