ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao · Xiaoyu Li · Chaopeng Zhang · Qi Zhang · Yan-Pei Cao · Ying Shan · Long Quan

Thu 20 Jun 10:30 a.m. PDT — noon PDT


We propose a method for free-view rendering of a 3D human from a single image. Existing approaches either use generalizable pixel-aligned implicit fields to reconstruct a textured human mesh, or employ a 2D diffusion model as guidance with Score Distillation Sampling (SDS) to lift the 2D image into 3D space. However, a generalizable implicit field often produces an over-smoothed texture field, while SDS tends to produce novel views whose texture is inconsistent with the input image. We introduce a texture-consistent back-view synthesis method that transfers the reference image content to the back view through depth-guided mutual self-attention. With this method, we achieve high-fidelity, texture-consistent human rendering from a single image. Moreover, to alleviate the color distortion that occurs in the side regions, we propose a visibility-aware patch consistency regularization combined with the synthesized back-view texture. Experiments on both real and synthetic data demonstrate the effectiveness of our method and show that it outperforms previous baseline methods.
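
To make the idea of mutual self-attention concrete, the following is a minimal NumPy sketch of the general mechanism: queries come from the view being synthesized (here, the back view), while keys and values come from the reference image, so each output feature is a weighted mix of reference features. This is an illustration under stated assumptions, not the authors' implementation: the function name, array shapes, and random features are hypothetical, and the depth guidance described in the abstract is omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_self_attention(q_target, k_ref, v_ref):
    """Attend from target-view queries to reference-view keys/values.

    q_target: (n_target, d) query features of the view being synthesized.
    k_ref, v_ref: (n_ref, d) key/value features of the reference image.
    Returns the attended features (n_target, d) and the attention
    weights (n_target, n_ref).
    """
    d = q_target.shape[-1]
    scores = q_target @ k_ref.T / np.sqrt(d)  # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v_ref, weights

# Hypothetical toy features: 6 back-view tokens, 10 reference tokens, dim 8.
rng = np.random.default_rng(0)
q_back = rng.normal(size=(6, 8))
k_front = rng.normal(size=(10, 8))
v_front = rng.normal(size=(10, 8))

out, attn = mutual_self_attention(q_back, k_front, v_front)
```

Because every output row is a convex combination of the reference value vectors, the synthesized view can only reuse appearance information present in the reference, which is the intuition behind using this kind of attention for texture consistency.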

