

DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach

Dayi Tan · Hansheng Chen · Wei Tian · Lu Xiong

Arch 4A-E Poster #199
Wed 19 Jun 10:30 a.m. PDT — noon PDT

Abstract: This paper presents DiffusionRegPose, a novel approach to multi-person pose estimation that converts a one-stage, end-to-end keypoint regression model into a diffusion-based sampling process. Existing one-stage deterministic regression methods, though efficient, are prone to missed or false detections in crowded or occluded scenes because they cannot reason about pose ambiguity. To address these challenges, we handle ambiguous poses generatively, i.e., by sampling from the image-conditioned pose distributions characterized by a diffusion probabilistic model. Specifically, initial pose tokens are extracted from the image, and noisy pose candidates are progressively refined by interacting with these tokens via attention layers. Extensive evaluations on the COCO and CrowdPose datasets show that DiffusionRegPose improves pose accuracy in crowded scenarios, as evidenced by a 3.3-point gain in $AP_H$ on the CrowdPose dataset. This demonstrates the model's potential for robust and precise human pose estimation in real-world applications.
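
The refinement loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the function names (`softmax`, `attend`, `refine`), the 2-D vectors, the step count, and the fixed update rate are all illustrative assumptions, and the hand-written update stands in for what would be a learned denoising network. It only shows the general shape of the idea, in which noisy candidates repeatedly attend to a fixed set of image-derived tokens and are nudged toward the attended context.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over the tokens.

    Tokens serve as both keys and values (an assumption made for brevity).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(len(query))]


def refine(candidates, tokens, steps=4, rate=0.5):
    """Toy stand-in for a learned denoiser (hypothetical, not from the paper).

    At each step, every candidate moves a fraction `rate` of the way
    toward the context it attends to among the tokens.
    """
    for _ in range(steps):
        candidates = [
            [c + rate * (ctx - c) for c, ctx in zip(cand, attend(cand, tokens))]
            for cand in candidates
        ]
    return candidates


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical "initial pose tokens"; in practice these would come
    # from an image encoder and be much higher-dimensional.
    tokens = [[1.0, 2.0], [-3.0, 0.5]]
    # Noisy pose candidates drawn from a standard Gaussian.
    noisy = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(3)]
    for pose in refine(noisy, tokens):
        print([round(v, 3) for v in pose])
```

In a real diffusion model the update at each step would be predicted by trained attention layers and follow a noise schedule. The fixed `rate` here is only a placeholder that keeps the loop easy to follow.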
