Skip to yearly menu bar Skip to main content


Poster

Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues

Yuhui Liu · Liangxun Ou · Qiang Fu · Hadi Amata · Wolfgang Heidrich · YIFAN PENG


Abstract:

Extracting high-fidelity RGBD information from two-dimensional (2D) images is essential for various visual computing applications. Stereo imaging, as a reliable passive imaging technique for obtaining three-dimensional (3D) scene information, has benefited greatly from deep learning advancements. However, existing stereo depth estimation algorithms struggle to perceive high-frequency information and resolve high-resolution depth maps in realistic camera settings with large depth variations. These algorithms commonly neglect the hardware parameter configuration, limiting the potential for achieving optimal solutions solely through software-based design strategies.This work presents a hardware-software co-designed RGBD imaging framework that leverages both stereo and focus cues to reconstruct texture-rich color images along with detailed depth maps over a wide depth range. A pair of rank-2 parameterized diffractive optical elements (DOEs) is employed to encode perpendicular complementary information optically during stereo acquisitions. Additionally, we employ an IGEV-UNet-fused neural network tailored to the proposed rank-2 encoding for stereo matching and image reconstruction. Through prototyping a stereo camera with customized DOEs, our deep stereo imaging paradigm has demonstrated superior performance over existing monocular and stereo imaging systems in both image PSNR by 2.96 dB gain and depth accuracy in high-frequency details across distances from 0.67 to 8 meters.

Live content is unavailable. Log in and register to view live content