TokenHand: Discrete Token Representation for Efficient Hand Mesh Reconstruction
Xinguo He ⋅ Yixin Shen ⋅ Rahul Chaudhari
Abstract
Hand mesh reconstruction has attracted growing attention in recent years. Despite significant progress, existing methods often struggle to balance reconstruction quality and inference efficiency. In this work, we propose TokenHand, a novel framework for single-view 3D hand mesh reconstruction that achieves both high accuracy and real-time inference. Our method represents a 3D hand model using $M$ discrete tokens, each describing a specific sub-structure of the hand. This compositional representation enables efficient modeling with minimal reconstruction error. Furthermore, we reformulate hand mesh reconstruction as a classification problem rather than a regression task. Specifically, a classifier predicts the categories of the $M$ tokens from an input image, and a pre-trained decoder network subsequently reconstructs the 3D hand mesh from the predicted tokens without any post-processing. Extensive experiments demonstrate that TokenHand achieves performance comparable or superior to existing methods across standard benchmarks, while maintaining high efficiency in practical scenarios.
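The classify-then-decode pipeline described above can be sketched as follows. This is a minimal illustration with toy linear layers and numpy, not the paper's implementation: the dimensions (M tokens, vocabulary size V, embedding dim D) and the helper names `classify_tokens` and `decode_mesh` are all assumptions; 778 is used only as a plausible vertex count (the MANO mesh size).

```python
import numpy as np

# Hypothetical dimensions, not taken from the paper:
# M tokens, vocabulary size V, token embedding dim D, N mesh vertices.
M, V, D, N = 16, 256, 32, 778

rng = np.random.default_rng(0)

def classify_tokens(image_features, W_cls):
    """Stand-in classifier: predict a category in [0, V) for each of the M tokens."""
    logits = image_features @ W_cls          # (M, V) per-token class scores
    return logits.argmax(axis=-1)            # (M,) discrete token indices

def decode_mesh(token_ids, codebook, W_dec):
    """Stand-in pre-trained decoder: look up token codes and map them to 3D vertices."""
    z = codebook[token_ids].reshape(-1)      # (M*D,) concatenated token embeddings
    return (z @ W_dec).reshape(N, 3)         # (N, 3) vertex coordinates, no post-processing

# Toy weights and a fake per-token image feature map standing in for a CNN backbone.
image_features = rng.normal(size=(M, 64))
W_cls = rng.normal(size=(64, V))
codebook = rng.normal(size=(V, D))
W_dec = rng.normal(size=(M * D, N * 3))

token_ids = classify_tokens(image_features, W_cls)
mesh = decode_mesh(token_ids, codebook, W_dec)
print(token_ids.shape, mesh.shape)  # (16,) (778, 3)
```

The key design point the sketch captures is that the image network only solves an M-way set of classification problems over a fixed token vocabulary; all mesh geometry lives in the frozen decoder, so inference is a lookup plus a single decoding pass.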