LiDeRe: A Lightweight Readout for Fast and Data-Efficient Dense Prediction
Abstract
Parameter-efficient fine-tuning (PEFT) methods have recently gained popularity for applying deep neural networks to small datasets, as they reduce overfitting, simplify deployment, and enable fast training. We demonstrate that, for dense image prediction tasks, a well-designed, lightweight dense readout on top of a large frozen backbone can surpass state-of-the-art PEFT methods in both efficiency and accuracy. Our parameter-efficient readout module combines interpolation and attention for fine-grained dense prediction. It integrates seamlessly with a wide range of pretrained vision backbones, such as DINOv3. We achieve competitive or superior performance in semantic segmentation, object detection, pose estimation, and semantic contour prediction, offering an efficient alternative to current PEFT techniques. Code: https://to.be.released
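To make the idea of a readout that combines interpolation and attention concrete, the following is a minimal NumPy sketch, not the LiDeRe architecture itself. It assumes the frozen backbone has already produced a patch-feature grid of shape (h, w, d). The function names (`bilinear_upsample`, `readout`), the parameter names (`Wq`, `Wk`, `Wv`, `Wc`), and the additive fusion of the two branches are all hypothetical choices made for illustration.

```python
import numpy as np


def bilinear_upsample(feats, out_h, out_w):
    """Bilinearly resize a (h, w, d) feature grid to (out_h, out_w, d)."""
    h, w, _ = feats.shape
    # Output sample positions in input coordinates (corners are aligned).
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = feats[y0][:, x0] * (1 - wx) + feats[y0][:, x1] * wx
    bot = feats[y1][:, x0] * (1 - wx) + feats[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def readout(feats, out_h, out_w, params):
    """Hypothetical dense readout: interpolation branch plus attention branch.

    feats  : (h, w, d) patch features from a frozen backbone (assumed given).
    params : dict with Wq (d, dk), Wk (d, dk), Wv (d, d), Wc (d, num_classes).
    Returns per-pixel logits of shape (out_h, out_w, num_classes).
    """
    h, w, d = feats.shape
    tokens = feats.reshape(h * w, d)
    # Interpolation branch: one upsampled feature vector per output pixel.
    up = bilinear_upsample(feats, out_h, out_w).reshape(out_h * out_w, d)
    # Attention branch: each output pixel queries all coarse patch tokens.
    q = up @ params["Wq"]
    k = tokens @ params["Wk"]
    v = tokens @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    fused = up + attn @ v  # additive fusion is an illustrative assumption
    return (fused @ params["Wc"]).reshape(out_h, out_w, -1)


# Usage with random stand-ins for backbone features and readout weights.
rng = np.random.default_rng(0)
d, dk, num_classes = 8, 4, 3
params = {
    "Wq": rng.normal(size=(d, dk)),
    "Wk": rng.normal(size=(d, dk)),
    "Wv": rng.normal(size=(d, d)),
    "Wc": rng.normal(size=(d, num_classes)),
}
logits = readout(rng.normal(size=(4, 4, d)), 16, 16, params)
```

In this sketch only the four small matrices in `params` would be trained while the backbone stays frozen, which is the sense in which such a readout is parameter-efficient.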