Dual-branch Distilled Transformer for Efficient Asymmetric UAV Tracking
Abstract
Given the real-time demands of UAV tracking, many methods simplify the backbone to reduce computation, but this often weakens feature representation and degrades performance in complex scenarios. To alleviate this issue, we propose EATrack, an efficient asymmetric UAV tracking framework centered on a teacher-guided dual-branch distillation strategy that enhances the feature expressiveness of the lightweight student model. Specifically, EATrack explores two complementary perspectives on knowledge transfer: a spatially focused feature-level distillation that compensates for weakened representations by guiding the student to learn strong target representations, and a prediction-level distillation that improves spatial localization by transferring the teacher’s ability to localize the target accurately. Furthermore, to enhance robustness against appearance variations, we introduce a fine-grained target-aware distillation strategy that selectively transfers the teacher’s target modeling capacity to the student. Although the asymmetric architecture improves efficiency, it limits temporal adaptability. To mitigate this, we incorporate a temporal adaptation module at inference to enhance robustness over time. Experiments on five UAV benchmarks demonstrate that EATrack achieves a favorable balance between accuracy and speed, with EATrack-DeiT improving the average success rate by 1.2\% over the previous state of the art while running at 241.9 FPS on GPU.
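
As a rough illustration of the two distillation branches described above, the following NumPy sketch combines a target-masked feature-matching term with a softened score-map matching term. Everything specific here is an assumption made for illustration: the function names, the binary target mask, the choice of mean squared error and KL divergence, the temperature, and the weights alpha and beta. The abstract does not specify EATrack's actual loss formulation, so this should be read as one plausible instance of dual-branch distillation rather than the method itself.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def feature_distill_loss(student_feat, teacher_feat, target_mask, eps=1e-8):
    """Spatially focused feature-level term (assumed form: masked MSE).

    student_feat, teacher_feat: arrays of shape (C, H, W).
    target_mask: array of shape (H, W) with values in [0, 1]; positions
    with weight 0 do not contribute to the loss.
    """
    sq_err = (student_feat - teacher_feat) ** 2
    weighted = sq_err * target_mask[None, :, :]  # broadcast mask over channels
    norm = target_mask.sum() * student_feat.shape[0] + eps
    return float(weighted.sum() / norm)


def prediction_distill_loss(student_logits, teacher_logits,
                            temperature=1.0, eps=1e-8):
    """Prediction-level term (assumed form: KL(teacher || student)).

    Both inputs are (H, W) localization score maps, turned into
    distributions over spatial positions by a temperature-scaled softmax.
    """
    p_t = softmax(teacher_logits.ravel() / temperature)
    p_s = softmax(student_logits.ravel() / temperature)
    return float(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps))))


def dual_branch_loss(student_feat, teacher_feat, target_mask,
                     student_logits, teacher_logits,
                     alpha=1.0, beta=1.0, temperature=1.0):
    """Weighted sum of the two branches; alpha and beta are hypothetical."""
    feat = feature_distill_loss(student_feat, teacher_feat, target_mask)
    pred = prediction_distill_loss(student_logits, teacher_logits, temperature)
    return alpha * feat + beta * pred
```

In this sketch the mask restricts the feature term to the target region, so student errors on background positions are ignored, while the prediction term penalizes any mismatch between the student's and the teacher's localization score maps.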