Skip to yearly menu bar Skip to main content


TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Xuying Zhang · Bo-Wen Yin · yuming chen · Zheng Lin · Yunheng Li · Qibin Hou · Ming-Ming Cheng

Arch 4A-E Poster #1
[ ]
Fri 21 Jun 10:30 a.m. PDT — noon PDT


Recent progress in text-driven 3D object stylization has been considerably promoted by the Contrastive Language-Image Pre-training (CLIP) model. However, the stylization of multi-object 3D scenes is impeded in that the image-text pairs used for pre-training CLIP mostly consist of an object. Meanwhile, the local details of multiple objects may be susceptible to omission due to the existing supervision manner primarily relying on coarse-grained contrast of image-text pairs. To overcome these challenges, we present a novel framework, dubbed TeMo, to parse multi-object 3D scenes and edit their styles under the contrast supervision at multiple levels. We first propose a Graph-based Cross Attention (GCA) module to distinguishably reinforce the features of 3D surface points. Particularly, a cross-modal graph is constructed to align the object points accurately and noun phrases decoupled from the 3D mesh and textual description. Then, we develop a Cross-Grained Contrast (CGC) supervision system, where a fine-grained loss between the words in the textual description and the randomly rendered images are constructed to complement the coarse-grained loss. Extensive experiments show that our method can synthesize desired styles and outperform the existing methods over a wide range of multi-object 3D meshes. Our codes and results will be made publicly available.

Live content is unavailable. Log in and register to view live content