Multimodal Semantic Bias Mitigation for Diverse Text-To-3D Generation
Abstract
Recent progress in text-to-3D generative models has made it possible to generate high-quality 3D content, and recent large text-to-3D models have achieved remarkable breakthroughs in multi-view consistency. However, their effectiveness is often limited by inherent biases: the models are sensitive to design choices such as prompt format and struggle to understand complex prompts. To help text-to-3D generative models handle more diverse prompts, we propose a framework that localizes and mitigates the bias in current large text-to-3D models. Specifically, we first use an existing model to generate 3D content and apply a quality evaluation model to identify cross-modality bias. We then use the predicted quality score to quantify how much the prompt text contributes to the bias. Finally, to reduce these biases, we construct diverse pairwise examples that help the text-to-3D model build unbiased visual-text associations. Experiments show that our method achieves competitive results and provides higher-quality, more diverse 3D content than existing methods.
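As a rough illustration of the bias-quantification step described above, the following minimal Python sketch attributes a predicted quality drop to individual prompt segments via leave-one-out ablation. It is not the paper's implementation: `score_3d_quality` is a hypothetical placeholder for a learned quality evaluation model, and the segment-ablation scheme is an assumed way of quantifying prompt contributions.

```python
# Minimal illustrative sketch (not the paper's implementation): attributing a
# text-to-3D quality drop to individual prompt segments by ablation.
# `score_3d_quality` stands in for a quality evaluation model and is a
# hypothetical stub here.

from typing import Dict, List


def score_3d_quality(prompt: str) -> float:
    """Placeholder for a learned quality scorer of generated 3D content.

    In practice this would generate/render a 3D asset for `prompt` and return
    a predicted quality score; here it is stubbed with a dummy heuristic.
    """
    return max(0.0, 1.0 - 0.1 * prompt.lower().count("ornate"))


def segment_bias_contributions(segments: List[str]) -> Dict[str, float]:
    """Estimate each prompt segment's contribution to the quality drop.

    A segment's contribution is the score gained when that segment is
    removed from the prompt (leave-one-out ablation).
    """
    full_prompt = " ".join(segments)
    base_score = score_3d_quality(full_prompt)
    contributions = {}
    for i, seg in enumerate(segments):
        ablated = " ".join(segments[:i] + segments[i + 1:])
        contributions[seg] = score_3d_quality(ablated) - base_score
    return contributions


if __name__ == "__main__":
    prompt_segments = ["an ornate", "golden teapot", "on a wooden table"]
    for seg, contrib in segment_bias_contributions(prompt_segments).items():
        print(f"{seg!r}: quality change if removed = {contrib:+.2f}")
```

Segments with large positive contributions would be flagged as likely sources of cross-modality bias, which could then guide the construction of the pairwise debiasing examples.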