PoseGaussian: 6D Pose Estimation for Unseen Objects via Sparse-View Object-Level 3D Gaussian Splatting
Abstract
6D pose estimation is a key technology in computer vision and robotic manipulation. However, many methods remain heavily dependent on CAD models, which are often difficult to obtain. Object-level 3D reconstruction provides an alternative route, and 3D Gaussian Splatting (3DGS) shows strong potential owing to its training and rendering efficiency. Nevertheless, under sparse reference views, 3DGS is prone to floating artifacts and appearance overfitting, which weakens the stability of pose estimation. We present PoseGaussian, a sparse-view 6D pose estimation method for unseen objects that builds on improved 3DGS. First, we use sparse RGB-D views to inject a depth structure prior into the 3DGS initialization for stable geometry, and we adopt adaptive density control, view-warping augmentation, and joint photometric–depth supervision to reduce floaters and appearance overfitting under sparse reference views. Next, in the pose estimation stage, we apply a two-stage learning-guided ICP initializer that exploits geometric features to obtain a stable initial pose. Finally, we introduce a 3DGS-based iterative pose refiner that aligns rendered and query images in both appearance and geometry, further improving pose accuracy. Experiments on LINEMOD, GenMOP, and our real-world datasets show that PoseGaussian achieves significant improvements over baseline methods in model-free and sparse-view settings, demonstrating strong generalization to unseen objects and robustness to view sparsity.