YOLO-ULM: Ultra-Lightweight Models for Real-Time Object Detection
Shasha Han ⋅ Chong Li ⋅ Xinning Wang ⋅ Xuebo Li
Abstract
The YOLO series leads real-time object detection with superior accuracy and speed. However, both convolutional and self-attention-based architectures suffer from parameter redundancy and insufficient computational efficiency. Existing lightweight methods pursue speed excessively while ignoring the loss of important information during feature extraction and spatial transformation across stages; effective lightweighting is therefore crucial for detection performance. We propose YOLO-ULM, an ultra-lightweight real-time detector that accelerates inference while preserving high accuracy. We design a variety of modules driven jointly by efficiency and accuracy, including efficient feature aggregation and multi-scale downsampling modules, as well as a more focused complete-IoU loss function. To validate our approach, we train YOLO-ULM from scratch on the COCO dataset without pretrained weights. By refining backbone parameters, we extend it to YOLO-ULM-Turbo for accelerated inference. YOLO-ULM surpasses state-of-the-art real-time detectors such as YOLOv11/YOLOv12/YOLOv13 and RT-DETR. On a T4 GPU, YOLO-ULM-N achieves 41.6\% mAP with an inference latency of 1.52 ms, outperforming YOLOv11-N (2.2\%$\uparrow$) and YOLOv12-N (1.0\%$\uparrow$). YOLO-ULM-S exceeds RT-DETR-R18 by 1.6\% mAP with 64.7\% fewer FLOPs and 63\% fewer parameters. YOLO-ULM-L/X surpass YOLOv13-L/X by 0.7\% and 0.8\% mAP, respectively. YOLO-ULM-Turbo matches YOLOv12-Turbo's performance with less computation; the Turbo-N variant achieves 0.3\% higher mAP with 16\% fewer parameters than YOLOv12-Turbo-N.