Learning 3D Shape Fidelity Metric from Real-world Distortions
Abstract
3D generation and reconstruction have become essential in many computer vision applications, where reconstructed or generated 3D shapes need to appear realistic to human perception. However, traditional metrics for comparing two 3D shapes, such as Chamfer Distance, focus primarily on geometric matching accuracy and fail to capture perceptual shape fidelity. While frequency-based metrics attempt to analyze shape details in the spectral domain, they still do not fully encapsulate the complexity of human perception. To address this gap, we propose a human-aligned fidelity metric that leverages local shape connectivity through a local attention mechanism to capture rich, detailed shape information. We also introduce the two-branch Real Shape Fidelity (RSF) dataset, comprising a main subset and a test-only subset. The dataset contains 3D mesh distortions produced by real-world reconstruction and generation methods and is annotated by hundreds of human subjects. Our metric, named Local-Connection-based Shape Evaluation (LoCaSE), uses a PointNet-based backbone combined with Low-Rank Adaptation (LoRA)-style pretraining and finetuning to reduce model bias while maintaining translation, rotation, and scale invariance. Experiments demonstrate that our approach achieves superior alignment with human perception compared to previous metrics.
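For context, the Chamfer Distance criticized above is commonly written (in its squared, bidirectional form; exact normalization conventions vary across papers) for point sets $P$ and $Q$ as:

```latex
\mathrm{CD}(P, Q) =
\frac{1}{|P|} \sum_{p \in P} \min_{q \in Q} \lVert p - q \rVert_2^2
+ \frac{1}{|Q|} \sum_{q \in Q} \min_{p \in P} \lVert q - p \rVert_2^2
```

Because each point is matched only to its nearest neighbor in the other set, this measure rewards average geometric proximity and is largely insensitive to the local structural detail that human observers perceive, which motivates the connectivity-aware metric proposed here.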