AKCMamba-YOLO: Selective State Space Models for Real-Time Object Detection
Abstract
The YOLO (You Only Look Once) series has been a cornerstone of real-time object detection, renowned for its efficient convolutional design and rapid inference. However, its reliance on convolutional operations inherently limits its ability to capture long-range dependencies and rich contextual information, leading to suboptimal performance in complex scenes. Recently, State Space Models (SSMs) have emerged as an efficient alternative to attention mechanisms, offering global representations with linear time complexity. In this paper, we propose AKCMamba-YOLO, a novel object detector that incorporates SSMs into the YOLO architecture. We introduce the 3CAKCMamba and 4CAKCMamba modules, which enable enhanced channel interaction and cross-layer semantic fusion. This design improves multi-scale feature modeling while maintaining computational efficiency. To support safety-critical applications, we also provide a railway pedestrian detection dataset of 2,975 images annotated under complex scenarios. Experiments on COCO 2017, a power-tower foreign-object detection dataset, and our custom dataset show that AKCMamba-YOLO achieves superior accuracy and speed compared with state-of-the-art baselines, making it well suited for real-time and resource-constrained environments.
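As a point of reference for the linear-time global modeling the abstract claims for SSMs, the following is a minimal PyTorch sketch of a selective state space scan applied to a flattened feature map. It is not the paper's 3CAKCMamba/4CAKCMamba implementation; the class name SelectiveSSM and all parameter names (d_model, d_state) are illustrative assumptions. The point is only that an input-dependent recurrence aggregates context over the whole sequence in a single O(L) pass, in contrast to the O(L^2) cost of full attention.

```python
# A minimal sketch (illustrative only, not the authors' code) of a
# selective state space scan: parameters B, C, and the step size dt are
# computed from the input, and a single linear-time recurrence over the
# sequence propagates global context.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Input-dependent ("selective") projections for B, C, and step size.
        self.proj_B = nn.Linear(d_model, d_state)
        self.proj_C = nn.Linear(d_model, d_state)
        self.proj_dt = nn.Linear(d_model, d_model)
        # Log-parameterized state decay A, kept negative for stability.
        self.log_A = nn.Parameter(torch.zeros(d_model, d_state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model); one O(L) pass over the sequence.
        bsz, L, D = x.shape
        A = -torch.exp(self.log_A)                         # (D, N)
        dt = F.softplus(self.proj_dt(x))                   # (B, L, D)
        Bx = self.proj_B(x)                                # (B, L, N)
        Cx = self.proj_C(x)                                # (B, L, N)
        h = x.new_zeros(bsz, D, A.shape[1])                # state (B, D, N)
        ys = []
        for t in range(L):                                 # linear-time scan
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)     # discretized decay
            dB = dt[:, t].unsqueeze(-1) * Bx[:, t].unsqueeze(1)
            h = dA * h + dB * x[:, t].unsqueeze(-1)        # state update
            ys.append((h * Cx[:, t].unsqueeze(1)).sum(-1)) # read-out C·h
        return torch.stack(ys, dim=1)                      # (B, L, D)


# Usage: flatten a detector feature map to a token sequence, scan it,
# and reshape back to (B, C, H, W).
feat = torch.randn(2, 64, 20, 20)
seq = feat.flatten(2).transpose(1, 2)                      # (B, H*W, C)
out = SelectiveSSM(d_model=64)(seq).transpose(1, 2).reshape_as(feat)
```

The per-step cost here is constant in the sequence length, which is what makes SSM-based blocks attractive as a drop-in replacement for attention inside a real-time detector.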