PoseMaster: A Unified 3D Native Framework for Stylized Pose Generation
Abstract
Pose stylization is a fundamental task across the 2D, 3D, and video domains; it aims to output a stylized image or 3D mesh in a desired pose. In the 3D domain, existing pose stylization methods typically rely on 2D foundation models to modify the pose of an image before generating the corresponding 3D asset, which limits their ability to achieve rich and precise 3D pose stylization. To address this challenge, we propose a novel paradigm for 3D pose stylization that unifies pose stylization and 3D generation within a single framework. This integration minimizes the risk of cumulative errors and improves the model's efficiency and effectiveness. Instead of the 2D skeleton used in previous works, we directly use a 3D skeleton, because it represents 3D spatial and topological relationships more accurately, which significantly enhances the model's capacity for richer and more precise pose stylization. We also establish a comprehensive data engine to create a large-scale dataset of pairs in which the image is misaligned with the body while the skeleton is aligned with it. This dataset encourages 3D generative models to learn the style of the image and the pose-related 3D structure concurrently. Building on these innovations, we present PoseMaster, a unified 3D native method for stylized pose generation. Extensive experiments demonstrate that PoseMaster significantly outperforms current state-of-the-art techniques in both qualitative and quantitative evaluations.