Generative Point Tracking and Forecasting
Xuanchen Lu ⋅ Ang Cao ⋅ Chao Feng ⋅ Andrew Owens
Abstract
Motion forecasting predicts where points will move in the future, while motion tracking predicts where they are in the present. Despite these conceptual similarities, existing approaches to these two problems are quite different. In this paper, we propose a unified model that can address both tasks. We train a causal, video-conditioned flow matching model to predict point positions. The resulting model can easily toggle between point tracking and forecasting by changing its visual conditioning signal. Despite our model's simplicity, we find that it outperforms prior work in point forecasting and obtains performance that is competitive with the state of the art on the TAP-Vid DAVIS benchmark.
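The abstract describes toggling between tracking and forecasting by changing the visual conditioning given to a flow matching model. The sketch below illustrates that general idea, not the paper's actual method: `velocity_fn`, the feature shapes, and the zero-masking of future frames are all hypothetical choices, and the sampler is a plain Euler ODE integration from noise to point positions.

```python
import numpy as np

def sample_points(velocity_fn, video_feats, mode="track",
                  num_steps=10, num_points=4, rng=None):
    """Integrate a learned flow-matching velocity field from noise to
    point trajectories. Shapes and masking are illustrative only."""
    rng = np.random.default_rng(rng)
    T = video_feats.shape[0]
    cond = video_feats.copy()
    if mode == "forecast":
        # Forecasting: hide frames after the first, so the model must
        # extrapolate motion instead of reading it off the video.
        cond[1:] = 0.0
    # Start from Gaussian noise over (frames, points, xy-coordinates).
    x = rng.standard_normal((T, num_points, 2))
    for step in range(num_steps):
        t = step / num_steps
        # One Euler step along the conditional velocity field.
        x = x + velocity_fn(x, t, cond) / num_steps
    return x
```

With a single sampler, switching tasks is just a change of conditioning, which mirrors the unification the abstract claims.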