OrienPose: Orientation-Guided Novel View Synthesis for Single-Image Unseen Object Pose Estimation
Abstract
Estimating the 3D pose of unseen objects from a single image remains a fundamental yet challenging problem in computer vision, especially under a CAD model-free setting. Pioneering attempts address this problem by matching templates generated through Novel View Synthesis (NVS), which essentially aims to learn the geometric transformation from a reference view to a target view. While promising, these methods can only approximate this transformation under pixel-level supervision, because the starting orientation remains undefined. Without explicit geometric constraints to verify the correctness of the predicted transformation, existing methods often synthesize novel views with geometrically distorted structures or severely blurred local textures, leading to unreliable template matching and suboptimal pose estimation results. To address this, we propose OrienPose, a novel object pose estimation framework built on orientation-aware NVS from a single image. Specifically, we introduce Orientation-Aware Guidance, which explicitly injects object orientation cues into the reference latent embedding to enhance orientation awareness during viewpoint transformation. We further introduce an orientation consistency loss that supervises the viewpoint transformation at the geometric level, providing explicit, geometry-consistent guidance beyond pixel-level similarity. This loss motivates estimating the reference orientation rather than relying on its ground-truth pose, ensuring that the injected and supervised priors share the same coordinate domain. Extensive experiments demonstrate that OrienPose achieves state-of-the-art performance in single-view unseen object pose estimation and strong robustness to image degradations.
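The abstract does not specify the exact form of the orientation consistency loss. A minimal sketch, assuming it is (or resembles) the standard geodesic distance between a predicted and a supervising rotation matrix — a common choice for geometry-level orientation supervision — might look like this; the function name and this particular formulation are illustrative assumptions, not the paper's confirmed implementation:

```python
import numpy as np

def geodesic_rotation_loss(R_pred: np.ndarray, R_ref: np.ndarray) -> float:
    """Geodesic distance (in radians) between two 3x3 rotation matrices.

    Illustrative sketch of a geometry-level orientation consistency term:
    it penalizes the angular deviation of the predicted orientation from
    the supervising (reference) orientation, independent of pixel values.
    """
    # Relative rotation carrying R_ref onto R_pred.
    R_rel = R_pred @ R_ref.T
    # For a rotation by angle theta: trace(R_rel) = 1 + 2*cos(theta).
    # Clip to [-1, 1] to guard against numerical round-off.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

# Identical orientations incur zero loss.
I = np.eye(3)
print(geodesic_rotation_loss(I, I))  # 0.0

# A 90-degree rotation about the z-axis yields pi/2.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
print(geodesic_rotation_loss(Rz90, I))  # ~1.5708
```

In a training loop, such a term would be added to the pixel-level reconstruction loss, giving the NVS module an explicit geometric signal that the synthesized view corresponds to the intended viewpoint transformation.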