Circular-DPO: Aligning Multi-Stage 3D Generative Models via Preference Feedback Loop
Abstract
Multi-stage generative models have shown great promise in 3D content creation because each stage can focus on generating either structure or texture, but their outputs often fail to align with human preferences. The key bottleneck in applying alignment methods is the presence of non-differentiable operations between generative stages. This disconnection prevents preference signals applied to the final output from being backpropagated to the crucial early stages of generation, while naive stage-wise alignment applied to each stage separately leads to texture-geometry inconsistency. To address this challenge, we introduce Circular-DPO, which builds a preference feedback loop to align multi-stage 3D generation models with human preference. Our method first applies Direct Preference Optimization (DPO) to refine the final 3D asset. We then construct new preference pairs by sampling and decoding the assets generated by the optimized model. These newly formed pairs are used to train the preceding generative stage, effectively creating a feedback loop that bridges the non-differentiable gap. Furthermore, to enhance robustness against noisy data, we introduce a quality-aware weighting mechanism that prioritizes reliable preference pairs during training. Experiments demonstrate that our approach improves the alignment of generated 3D content with human preferences by enabling holistic, multi-stage optimization.
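The per-pair objective behind the quality-aware weighting can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `weight` parameter (a quality score assumed to lie in [0, 1] that down-weights noisy pairs), and the scalar log-probability interface are all assumptions layered on top of the standard DPO loss.

```python
import math

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      weight=1.0, beta=0.1):
    """Quality-weighted DPO loss for a single preference pair (sketch).

    logp_w / logp_l     : policy log-probs of the preferred / rejected sample.
    ref_logp_w / _l     : the same log-probs under the frozen reference model.
    weight              : hypothetical quality score in [0, 1]; unreliable
                          pairs receive a smaller weight (our assumption).
    beta                : DPO temperature controlling deviation from the
                          reference model.
    """
    # Implicit reward margin between preferred and rejected samples.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Standard DPO term -log(sigmoid(margin)), scaled by the quality weight.
    return -weight * math.log(1.0 / (1.0 + math.exp(-margin)))
```

A larger margin (the policy separating preferred from rejected more strongly than the reference does) drives the loss toward zero, while a weight of zero removes a pair's influence entirely.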