Representing 3D Faces with Learnable B-Spline Volumes
Prashanth Chandran ⋅ Daoye Wang ⋅ Timo Bolkart
Abstract
We present CUBE (Control-based Unified B-Spline Encoding), a new geometric representation for digital humans that combines B-Spline volumes with learned features, and demonstrate its use as a decoder for 3D scan registration and monocular 3D face reconstruction. Unlike existing B-Spline representations that use 3D control points, CUBE is parametrized by a lattice (e.g., $8 \times 8 \times 8$) of high-dimensional control features, increasing the model's expressivity. These control features define a continuous mapping from a 3D parametric domain to 3D Euclidean space through an intermediate feature space, which is evaluated in two stages. First, high-dimensional control features are locally blended using the B-Spline bases, yielding a high-dimensional feature vector, whose first three values are the 3D coordinates of a coarse base mesh. This feature vector is input to a small MLP that predicts a residual from the base shape, resulting in refined 3D point coordinates. To reconstruct 3D surfaces in dense semantic correspondence, we query CUBE at 3D coordinates sampled from a fixed template mesh. Crucially, CUBE retains the local support of traditional B-Spline representations, enabling us to locally edit the surface by updating individual control features. We demonstrate the strengths of this representation by training two transformer-based encoders to predict CUBE's control features from unstructured point clouds and monocular images, achieving state-of-the-art scan registration results compared to recent geometric and multi-view baselines.
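The two-stage evaluation described above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: the lattice size, feature dimension, and MLP shape are placeholder assumptions. It blends an $8 \times 8 \times 8$ lattice of control features with tensor-product uniform cubic B-spline bases, treats the first three feature dimensions as the coarse base point, and adds a toy MLP residual.

```python
import numpy as np

def cubic_bspline_basis(t):
    # Four uniform cubic B-spline basis values at local parameter t in [0, 1];
    # they are non-negative and sum to 1 (partition of unity).
    return np.array([
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    ])

def blend_features(lattice, uvw):
    """Stage 1: tensor-product B-spline blending of an N x N x N x F
    control-feature lattice at a parametric query point uvw in [0, 1]^3."""
    n = lattice.shape[0]
    spans = n - 3                      # knot spans per axis for a cubic B-spline
    idx, bases = [], []
    for u in uvw:
        i = min(int(u * spans), spans - 1)
        idx.append(i)
        bases.append(cubic_bspline_basis(u * spans - i))
    bu, bv, bw = bases
    i, j, k = idx
    block = lattice[i:i + 4, j:j + 4, k:k + 4]   # local 4x4x4 support region
    return np.einsum('a,b,c,abcf->f', bu, bv, bw, block)

# Stage 2: a small (randomly initialized, illustrative) MLP predicts a
# residual offset from the coarse base point encoded in the feature vector.
rng = np.random.default_rng(0)
F = 16                                           # assumed feature dimension
lattice = rng.normal(size=(8, 8, 8, F))
W1, b1 = rng.normal(size=(F, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 3)) * 0.1, np.zeros(3)

feat = blend_features(lattice, np.array([0.3, 0.5, 0.7]))
base = feat[:3]                                  # first three dims: base coords
residual = np.tanh(feat @ W1 + b1) @ W2 + b2
point = base + residual                          # refined 3D point
print(point.shape)
```

Because the cubic bases have local support (each query reads only a $4 \times 4 \times 4$ block of the lattice), changing one control feature only moves the surface region inside that feature's support, which is what enables the local editing mentioned in the abstract.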