Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models
Abstract
With the emergence of 3D foundation models such as DUSt3R, VGGT, and their variants, there is growing interest in fine-tuning them for various downstream tasks, with LoRA being the dominant fine-tuning paradigm. Since 3D datasets exhibit distinct variations in geometry, texture, camera motion, and lighting, several fundamental questions arise: 1) Are there LoRA subspaces associated with each type of variation? 2) Are these subspaces disentangled (i.e., orthogonal to each other)? 3) How do we compute them effectively? This paper answers all of these questions. We introduce a robust approach that generates synthetic datasets with controlled variations, fine-tunes a LoRA adapter on each dataset, and extracts the LoRA subspace associated with each type of variation. We show that these subspaces are approximately disentangled. Integrating them yields a reduced LoRA subspace that enables efficient LoRA fine-tuning with improved prediction accuracy on downstream tasks. In particular, we show that this reduced LoRA subspace, despite being derived entirely from synthetic data, generalizes to real datasets. An ablation study validates the effectiveness of the design choices in our approach.
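To make the pipeline described above concrete, the sketch below illustrates one plausible way to extract a per-attribute LoRA subspace (via SVD of each adapter's weight update), measure approximate disentanglement (via principal angles between subspaces), and assemble a reduced basis. This is an illustrative assumption, not the authors' released code: the function names, rank choices, and the SVD/QR procedure are hypothetical stand-ins for the method summarized in the abstract.

```python
# Hypothetical sketch (not the authors' implementation): given per-attribute LoRA
# factors (B_k, A_k) for one weight matrix, extract a low-rank "attribute subspace"
# via SVD, check pairwise disentanglement via principal angles, and stack the bases
# into a reduced subspace for subsequent fine-tuning.
import torch

def attribute_subspace(lora_B: torch.Tensor, lora_A: torch.Tensor, rank: int) -> torch.Tensor:
    """Top-`rank` left singular vectors of the LoRA update Delta = B @ A."""
    delta = lora_B @ lora_A                       # (d_out, d_in) weight update
    U, _, _ = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :rank]                            # orthonormal basis, (d_out, rank)

def subspace_overlap(U1: torch.Tensor, U2: torch.Tensor) -> float:
    """Mean squared cosine of principal angles; near 0 means near-orthogonal subspaces."""
    s = torch.linalg.svdvals(U1.T @ U2)           # cosines of principal angles
    return float((s ** 2).mean())

# Toy example: four "attributes" (geometry, texture, camera motion, lighting),
# each with its own rank-4 LoRA adapter on a 256x256 layer (random stand-ins here).
d_out, d_in, r = 256, 256, 4
adapters = {name: (torch.randn(d_out, r) * 0.02, torch.randn(r, d_in) * 0.02)
            for name in ["geometry", "texture", "camera", "lighting"]}

bases = {name: attribute_subspace(B, A, rank=r) for name, (B, A) in adapters.items()}

# Approximate disentanglement check: pairwise overlaps should be small.
names = list(bases)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], names[j], subspace_overlap(bases[names[i]], bases[names[j]]))

# Reduced LoRA subspace: concatenate the attribute bases and re-orthonormalize;
# downstream fine-tuning can then be restricted to this low-dimensional span.
stacked = torch.cat(list(bases.values()), dim=1)  # (d_out, 4*r)
reduced_basis, _ = torch.linalg.qr(stacked)       # orthonormal reduced basis
print("reduced subspace dimension:", reduced_basis.shape[1])
```

In an actual setting, the adapters would come from fine-tuning on the controlled synthetic datasets rather than random initialization, and the reduced basis would constrain the LoRA updates during downstream fine-tuning.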