CVPR Poster HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

Hongwei Zheng · Han Li · Wenrui Dai · Ziyang Zheng · Chenglin Li · Junni Zou · Hongkai Xiong

ExHall D Poster #94

Sat 14 Jun 3 p.m. PDT — 5 p.m. PDT

Abstract:

Existing 2D-to-3D human pose estimation (HPE) methods tackle occlusion by enriching the lifting stage with additional information, such as temporal and visual cues. In this paper, we argue that these methods overlook a limitation of the sparse 2D skeleton used as input: this representation fundamentally restricts 2D-to-3D lifting and worsens the occlusion problem. To address this, we propose a novel two-stage generative densification method, named Hierarchical Pose AutoRegressive Transformer (HiPART), which generates hierarchical dense 2D poses from the original sparse 2D pose. Specifically, we first develop a multi-scale skeleton tokenization module that quantizes the highly dense 2D pose into hierarchical tokens, and we propose a skeleton-aware alignment to strengthen the connections between tokens. We then develop a hierarchical autoregressive modeling scheme for hierarchical 2D pose generation. Using the generated hierarchical poses as inputs for 2D-to-3D lifting, the proposed method shows strong robustness in occluded scenarios and achieves state-of-the-art performance in single-frame 3D HPE. Moreover, it outperforms numerous multi-frame methods while requiring fewer parameters and less computation, and it can also complement those methods to further improve their performance and robustness.
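The abstract describes quantizing a dense 2D pose into hierarchical tokens but gives no implementation details. The sketch below illustrates one plausible reading, assuming a multi-scale residual vector quantizer. At each scale, the remaining residual is average-pooled along the keypoint index, each pooled point is replaced by the index of its nearest codeword, and the upsampled codewords are subtracted before moving on to the next, finer scale. The scales, the pooling along keypoint order, the fixed toy codebook, the toy pose, and the names `tokenize`, `decode`, and `sq_error` are all hypothetical. HiPART's tokenizer is learned and also uses a skeleton-aware alignment, which this sketch omits. It is not the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chunk_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Split indices 0..length-1 into n contiguous, non-empty chunks (needs n <= length)."""
    return [(i * length // n, (i + 1) * length // n) for i in range(n)]


def downsample(points: Sequence[Point], n: int) -> List[Point]:
    """Average-pool consecutive keypoints so the pose is summarized by n points."""
    out: List[Point] = []
    for lo, hi in chunk_bounds(len(points), n):
        k = hi - lo
        out.append((sum(p[0] for p in points[lo:hi]) / k,
                    sum(p[1] for p in points[lo:hi]) / k))
    return out


def upsample(points: Sequence[Point], length: int) -> List[Point]:
    """Repeat each coarse point over the chunk it summarizes (same layout as downsample)."""
    out: List[Point] = []
    for p, (lo, hi) in zip(points, chunk_bounds(length, len(points))):
        out.extend([p] * (hi - lo))
    return out


def nearest(codebook: Sequence[Point], p: Point) -> int:
    """Index of the codeword closest to p in squared Euclidean distance (first on ties)."""
    return min(range(len(codebook)),
               key=lambda k: (codebook[k][0] - p[0]) ** 2 + (codebook[k][1] - p[1]) ** 2)


def tokenize(pose: Sequence[Point], scales: Sequence[int],
             codebook: Sequence[Point]) -> List[List[int]]:
    """Hypothetical multi-scale tokenizer: one token map per scale, coarse to fine.
    Each scale encodes only what the coarser scales left unexplained (the residual)."""
    residual = list(pose)
    tokens: List[List[int]] = []
    for n in scales:
        ids = [nearest(codebook, p) for p in downsample(residual, n)]
        tokens.append(ids)
        quantized = upsample([codebook[k] for k in ids], len(pose))
        residual = [(r[0] - q[0], r[1] - q[1]) for r, q in zip(residual, quantized)]
    return tokens


def decode(tokens: Sequence[Sequence[int]], codebook: Sequence[Point],
           length: int) -> List[Point]:
    """Rebuild a dense pose by summing the upsampled codewords of every given scale."""
    pose: List[Point] = [(0.0, 0.0)] * length
    for ids in tokens:
        quantized = upsample([codebook[k] for k in ids], length)
        pose = [(a[0] + q[0], a[1] + q[1]) for a, q in zip(pose, quantized)]
    return pose


def sq_error(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Total squared distance between two poses with the same number of points."""
    return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(a, b))


if __name__ == "__main__":
    # Toy "dense" pose: 8 points along one limb (made-up data).
    pose = [(0.1 * i, 0.05 * i * i) for i in range(8)]
    # Toy codebook; index 0 is the zero vector. A real codebook would be learned.
    codebook = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (-0.2, 0.0),
                (0.0, -0.2), (0.4, 0.8), (0.4, 0.4)]
    scales = [1, 2, 4, 8]
    tokens = tokenize(pose, scales, codebook)
    for s in range(1, len(scales) + 1):
        err = sq_error(pose, decode(tokens[:s], codebook, len(pose)))
        print(f"scales {scales[:s]}: tokens {tokens[s - 1]}, error {err:.4f}")
```

Because the toy codebook contains the zero vector (at index 0, so it wins ties), adding a scale can only keep or reduce the reconstruction error. A coarse-to-fine autoregressive model could then predict the token map of one scale at a time, conditioned on the coarser ones; how HiPART's hierarchical autoregressive scheme actually consumes its tokens is not specified here, so this, too, is an assumption.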

Live content is unavailable. Log in and register to view live content