Real-Time Multimodal Fingertip Contact Detection via Depth and Motion Fusion for Vision-Based Human–Computer Interaction
Mukhiddin Toshpulatov ⋅ Wookey Lee ⋅ Suan Lee ⋅ Geehyuk Lee
Abstract
Precise fingertip contact detection is a fundamental challenge for natural and immersive virtual reality (VR) interaction. However, existing vision-based methods suffer from insufficient accuracy, with typical depth errors (12–25 mm) being too large to reliably distinguish between hovering and true contact (<3 mm). While commercial motion capture systems provide sub-millimeter accuracy, their prohibitive cost limits widespread adoption. This paper addresses this critical gap by developing a highly accurate and cost-effective system for fingertip contact detection. We introduce a novel, specialized dataset of 53,300 RGB-depth pairs capturing millimeter-scale, hand-table typing interactions. By systematically fine-tuning six state-of-the-art depth estimation architectures on this dataset, we reduce the mean absolute error (MAE) by 68%, from 12.3 mm to a state-of-the-art 3.8 mm. Our complete VR keyboard system, TapBoard-X, achieves 95.96% contact detection accuracy and enables typing speeds of 45.6 WPM with a low 3.1% character error rate, rivaling physical keyboards. This performance is achieved at over a 90% cost reduction compared to commercial systems, democratizing high-precision hand tracking for the broader research community and paving the way for the next generation of tactile VR experiences.