

Poster

Multiple Object Tracking as ID Prediction

Ruopeng Gao · Ji Qi · Limin Wang


Abstract:

Multiple Object Tracking (MOT) has been a long-standing challenge in video understanding. A natural and intuitive approach is to split this task into two parts: object detection and association. Most mainstream methods employ meticulously crafted heuristic techniques to maintain trajectory information and compute cost matrices for object matching. Although these methods can achieve notable tracking performance, they often struggle in complex scenarios and therefore require a series of elaborate handcrafted modifications. We believe that manually assumed priors limit a method's adaptability and flexibility, preventing it from learning optimal tracking capabilities directly from domain-specific data. Therefore, we propose a new perspective that treats Multiple Object Tracking as an in-context ID prediction task, transforming the aforementioned object association into an end-to-end trainable task. Based on this perspective, we propose a straightforward method termed MOTIP. Without tailored or sophisticated architectures, our method achieves state-of-the-art results across multiple benchmarks by leveraging only object-level features as tracking cues. The simplicity and strong results of MOTIP leave substantial room for future advancements, making it a promising baseline for subsequent research.
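The abstract describes the idea only at a high level. As a rough illustration of what "association as in-context ID prediction" can mean, here is a minimal sketch under our own assumptions, not MOTIP's actual architecture: each historical trajectory in the current clip carries an ID label that is meaningful only within that clip, and each new detection is classified into one of those labels or into an extra "newborn" class that starts a new trajectory. In MOTIP the scores come from a model trained end to end; here they are a hand-written cosine-similarity stand-in so the example runs. All names (`NEWBORN`, `id_probabilities`, `predict_ids`, `newborn_logit`, `temperature`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

NEWBORN = -1  # hypothetical sentinel: "this detection starts a new trajectory"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def id_probabilities(
    history: Dict[int, List[Sequence[float]]],
    detection: Sequence[float],
    newborn_logit: float = 0.5,
    temperature: float = 0.1,
) -> Dict[int, float]:
    """Softmax over the in-context ID labels plus a NEWBORN class.

    Toy stand-in for a learned decoder: an ID label's logit is the mean cosine
    similarity between the detection and that trajectory's past features.
    """
    logits = {
        label: sum(cosine(detection, f) for f in feats) / len(feats) / temperature
        for label, feats in history.items()
        if feats
    }
    logits[NEWBORN] = newborn_logit / temperature
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def predict_ids(
    history: Dict[int, List[Sequence[float]]],
    detections: List[Sequence[float]],
    **kwargs: float,
) -> List[int]:
    """Predict an ID label (or NEWBORN) per detection, using each label at most once."""
    probs = [id_probabilities(history, d, **kwargs) for d in detections]
    # Greedily accept the most confident (detection, label) pairs first.
    pairs = sorted(
        (
            (p, i, label)
            for i, dist in enumerate(probs)
            for label, p in dist.items()
            if label != NEWBORN
        ),
        reverse=True,
    )
    assigned = [NEWBORN] * len(detections)
    done, used = set(), set()
    for p, i, label in pairs:
        # Skip if the detection is matched, the label is taken,
        # or "newborn" is at least as likely as this label.
        if i in done or label in used or p <= probs[i][NEWBORN]:
            continue
        assigned[i] = label
        done.add(i)
        used.add(label)
    return assigned


history = {0: [[1.0, 0.0, 0.0]], 1: [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]}
detections = [[0.0, 1.0, 0.1], [0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
print(predict_ids(history, detections))  # [1, 0, -1]
```

The point of the framing is that the output is a classification over a small, clip-local label vocabulary rather than a cost matrix fed to a handcrafted matcher, so the scoring function can be learned from data. The greedy one-label-per-detection step is our simplification, not something the abstract specifies.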
