
DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

Fei Xie · Zhongdao Wang · Chao Ma

Arch 4A-E Poster #435
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Existing Siamese and transformer trackers commonly pose visual object tracking as a one-shot detection problem, i.e., locating the target object in a single forward evaluation per frame. Despite their demonstrated success, these trackers may easily drift towards distractors with similar appearance, because the single-forward-evaluation scheme lacks self-correction. To address this issue, we cast visual tracking as a point-set-based denoising diffusion process and propose a novel generative-learning-based tracker, dubbed DiffusionTrack. Our DiffusionTrack possesses two appealing properties: 1) It adopts a novel noise-to-target tracking paradigm that leverages multiple denoising diffusion steps to localize the target in a dynamic searching manner per frame. 2) It learns diffusion models on point sets, which better handle appearance variations for more precise localization. One side benefit is that DiffusionTrack greatly simplifies the post-processing of locating the target with bounding boxes, as the widely used window-penalty scheme is no longer needed for point sets. Without bells and whistles, our proposed DiffusionTrack achieves leading performance over state-of-the-art trackers and runs in real time.
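The noise-to-target idea described above can be sketched as a short iterative loop: sample noisy points around the previous target location, repeatedly pull them toward a learned target estimate, then read a bounding box directly off the refined point set. The code below is a minimal illustrative sketch, not the authors' implementation; `model_fn`, the step-size schedule, and all numeric parameters are hypothetical placeholders for the learned denoiser and schedule of the paper.

```python
import numpy as np

def points_to_bbox(points):
    """Derive a bounding box (x1, y1, x2, y2) directly from a point set;
    no window-penalty post-processing is involved."""
    x1, y1 = points.min(axis=0)
    x2, y2 = points.max(axis=0)
    return x1, y1, x2, y2

def track_frame(prev_center, model_fn, num_points=32, num_steps=4, seed=0):
    """Noise-to-target sketch: start from noisy points scattered around the
    previous target location, then apply several denoising steps that
    progressively move the points toward the model's target estimate.
    `model_fn(points, t)` stands in for the learned denoiser (hypothetical)."""
    rng = np.random.default_rng(seed)
    pts = prev_center + rng.normal(scale=20.0, size=(num_points, 2))
    for t in reversed(range(1, num_steps + 1)):
        target_estimate = model_fn(pts, t)  # hypothetical learned denoiser
        w = 1.0 / (t + 1)                   # toy step-size schedule, for illustration
        pts = (1.0 - w) * pts + w * target_estimate
    return pts
```

Each denoising step is a convex blend between the current points and the model's estimate, so later frames can self-correct an early bad guess, which is the property the single-forward-evaluation trackers lack.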
