ProgTrack: A Multi-Object Tracking Algorithm with Progressive Matching Strategy
Abstract
Multi-object tracking (MOT) from unmanned aerial vehicles (UAVs) aims to identify and continuously track the positions of multiple ground targets during flight. Current mainstream methods associate targets across consecutive frames using appearance matching and motion matching. However, these methods often fail in four kinds of scenarios: first, scenes with multi-scale targets, where small targets have weak appearance features and small bounding boxes; second, scenes with complex backgrounds or occlusions, which interfere with the appearance features and change the bounding box sizes of targets; third, scenes in which the UAV camera shakes, rotates, or zooms, causing misalignment between consecutive frames; and fourth, scenes with high inter-target similarity, where the appearance features of different targets are difficult to distinguish, such as vehicles on a road. To address these issues, we propose ProgTrack, a multi-object tracking algorithm based on a multi-stage progressive matching mechanism. The algorithm mimics the way the human eye tracks objects, following a progressive process of "first matching easily matched large targets, then matching difficult-to-match small targets, and finally matching the remaining mixed-scale targets." Accordingly, ProgTrack applies three matching strategies to targets of different scales and appearances: a simple Local Motion Information (LMI) matching strategy for large targets, a more complex Context Enhancement Feature (CE-Feature) matching strategy for small targets, and a Global Motion Information (GMI) matching strategy for the remaining multi-scale targets. On the VisDrone2019 UAV tracking dataset, ProgTrack achieves MOTA, MOTP, and IDF1 scores of 40.2, 77.5, and 52.8, respectively, demonstrating state-of-the-art performance among the ten methods compared.
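To make the progressive "large first, small second, remainder last" ordering concrete, the following is a minimal Python sketch of a three-stage matching cascade. It is an illustration under stated assumptions, not the ProgTrack implementation: the abstract does not specify how LMI, CE-Feature, or GMI are computed, so the sketch substitutes box IoU for the large-target stage, cosine similarity of a per-box embedding for the small-target stage, and center distance after a single global frame shift for the final stage. The names `Box`, `greedy_match`, and `progressive_match`, the area cutoff `large_area`, the `shift` parameter, the greedy assignment, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from math import hypot, sqrt


@dataclass
class Box:
    x: float  # top-left corner
    y: float
    w: float
    h: float
    feat: tuple = ()  # hypothetical appearance embedding

    @property
    def area(self):
        return self.w * self.h

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def iou(a, b):
    """Intersection over union of two boxes (stand-in for local motion cues)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def cosine(a, b):
    """Cosine similarity of embeddings (stand-in for appearance matching)."""
    dot = sum(p * q for p, q in zip(a.feat, b.feat))
    na = sqrt(sum(p * p for p in a.feat))
    nb = sqrt(sum(q * q for q in b.feat))
    return dot / (na * nb) if na and nb else 0.0


def greedy_match(tracks, dets, t_ids, d_ids, score, thresh):
    """Pair candidates from the highest score down, stopping below thresh."""
    pairs = sorted(
        ((score(tracks[t], dets[d]), t, d) for t in t_ids for d in d_ids),
        reverse=True,
    )
    used_t, used_d, out = set(), set(), []
    for s, t, d in pairs:
        if s < thresh:
            break
        if t in used_t or d in used_d:
            continue
        used_t.add(t)
        used_d.add(d)
        out.append((t, d))
    return out


def progressive_match(tracks, dets, large_area=1024.0, shift=(0.0, 0.0)):
    """Return sorted (track_index, detection_index) pairs from three stages."""
    t_left, d_left = set(range(len(tracks))), set(range(len(dets)))
    matches = []

    def run(t_ids, d_ids, score, thresh):
        found = greedy_match(tracks, dets, sorted(t_ids), sorted(d_ids), score, thresh)
        for t, d in found:
            t_left.discard(t)
            d_left.discard(d)
        matches.extend(found)

    # Stage 1: large targets, matched by box overlap (assumed threshold 0.3).
    run({t for t in t_left if tracks[t].area >= large_area},
        {d for d in d_left if dets[d].area >= large_area}, iou, 0.3)

    # Stage 2: small targets, matched by appearance (assumed threshold 0.7).
    run({t for t in t_left if tracks[t].area < large_area},
        {d for d in d_left if dets[d].area < large_area}, cosine, 0.7)

    # Stage 3: everything left, matched by center distance after applying
    # one global shift that approximates camera motion between frames.
    def closeness(t, d):
        (tx, ty), (dx, dy) = t.center, d.center
        return 1.0 / (1.0 + hypot(tx + shift[0] - dx, ty + shift[1] - dy))

    run(t_left, d_left, closeness, 0.05)
    return sorted(matches)
```

Each stage removes its matched tracks and detections from the candidate pool, so later stages only see what earlier, more reliable stages could not resolve. A practical tracker would likely replace the greedy pairing with an optimal assignment solver and estimate the global shift from the image data rather than receive it as an argument.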