Real2Sim2Real: RetinalDepth-64K for Depth Estimation in Posterior Segment Ophthalmic Surgery
Abstract
Accurate depth estimation is crucial for 3D reconstruction and precise navigation in ophthalmic fundus surgery. However, acquiring annotated data remains challenging because depth sensors are impractical under surgical microscopes. To overcome this limitation, we introduce RetinalDepth-64K, a novel synthetic dataset comprising 64,000 stereo image pairs across 1,280 diverse scenes, developed through a Real2Sim2Real pipeline that transforms real-world fundus surgery videos into synthetic data and facilitates model deployment in real scenarios. We analyzed key characteristics of real-world videos, such as intricate retinal textures, to guide the Real-to-Sim phase and enable realistic data synthesis. To improve dataset fidelity for depth estimation, we created 3D eye models in Blender with ultra-wide-field retinal textures, glass-modeled aqueous humor, and dynamic instrument trajectories, enhanced by post-processing to ensure photorealism. The dataset provides RGB images, depth maps, normal maps, and instrument segmentation masks from binocular views, supporting the training of monocular, binocular, and video-based depth estimation models to enhance robustness. In the Sim-to-Real phase, quantitative and qualitative experiments show that fine-tuning foundation models on RetinalDepth-64K produces accurate depth predictions for synthetic data. A comparative analysis of zero-shot and fine-tuned models further validates robust generalization to real fundus surgery scenes, offering significant potential to enhance surgical precision and support the training of novice surgeons through reliable depth cues. As the first dataset of its kind for retinal surgery, RetinalDepth-64K offers a vital resource for advancing 3D reconstruction and surgical navigation in ophthalmology.
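
To make the quantitative Sim-to-Real comparison concrete, the sketch below computes two standard depth-estimation metrics, absolute relative error (AbsRel) and threshold accuracy (δ < 1.25), which are the usual way zero-shot and fine-tuned depth models are compared. This is a minimal illustration, not the paper's evaluation code: the abstract does not specify a release format or metric suite, so the array shapes and the synthetic stand-in data are assumptions for illustration only.

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def delta_acc(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray,
              thr: float = 1.25) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) below thr."""
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float(np.mean(ratio < thr))

if __name__ == "__main__":
    # Stand-in data: in practice `gt` would be a RetinalDepth-64K depth map
    # and `pred` the output of a fine-tuned foundation model on the paired
    # RGB frame; synthetic arrays are used here so the script is runnable
    # on its own.
    rng = np.random.default_rng(0)
    gt = rng.uniform(1.0, 5.0, size=(480, 640))        # ground-truth depth
    pred = gt * rng.normal(1.0, 0.05, size=gt.shape)   # noisy "prediction"
    valid = gt > 0                                     # mask missing depth

    print(f"AbsRel:     {abs_rel(pred, gt, valid):.4f}")
    print(f"delta<1.25: {delta_acc(pred, gt, valid):.4f}")
```

Both metrics are scale-sensitive, so when evaluating models that predict depth only up to an unknown scale, a per-image median scaling of `pred` to `gt` is commonly applied before computing them.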