RAG-TP: A General Framework for Vehicle Trajectory Prediction via Retrieval-Augmented Generation
Abstract
Vehicle trajectory prediction is a critical technology for safe and efficient autonomous driving. However, its generalization and scalability have long been hindered by a heavy reliance on real-time, online priors. To break this bottleneck, we introduce RAG-TP, a general framework that reframes the problem from relying on uncertain online perception to retrieving from a large-scale, structured, offline knowledge base. At inference time, RAG-TP dynamically queries a pre-built, heterogeneous knowledge base of scene topologies and motion patterns and uses the retrieved historical experiences as priors. We further design a dynamic fusion module based on a learnable Mixture-of-Experts (MoE), which weights and integrates the multi-source retrieved knowledge via cross-attention to produce a dense context for the final multi-modal trajectory decoding. By decoupling online inference from offline knowledge, this retrieval-augmented approach grounds predictions in a large structured database, mitigating model hallucination and compensating for unreliable priors, which improves robustness and domain adaptation. Extensive experiments show that RAG-TP performs well in both map-based and map-free settings, surpassing existing map-free methods while remaining comparable to state-of-the-art (SOTA) map-based models. Its advantages are most pronounced in cross-domain and zero-shot generalization tasks. Our work provides a promising technical pathway toward more scalable and robust prediction systems for autonomous driving.
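
As a rough illustration of the retrieve-then-fuse idea described in the abstract, the following minimal sketch retrieves the nearest entries from two toy knowledge sources, summarizes each source with single-head attention, and mixes the summaries with a softmax gate. It is not the authors' implementation: the function names (`retrieve`, `attend`, `fuse`), the cosine-similarity retrieval, the linear gate vectors, and the toy three-dimensional embeddings are all assumptions made for illustration, and a real system would use learned encoders and parameters.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(query, bank, k):
    """Return the k bank entries most similar to the query (cosine similarity)."""
    return sorted(bank, key=lambda e: cosine(query, e), reverse=True)[:k]

def attend(query, entries):
    """Single-head scaled dot-product attention: query as Q, entries as K and V."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, e) / scale for e in entries])
    return [sum(w * e[i] for w, e in zip(weights, entries))
            for i in range(len(query))]

def fuse(query, banks, gate_vectors, k=2):
    """Mix per-source attended summaries with a softmax gate over sources."""
    names = list(banks)
    summaries = [attend(query, retrieve(query, banks[n], k)) for n in names]
    # Hypothetical gate: one score per source from a dot product with the query.
    gates = softmax([dot(query, gate_vectors[n]) for n in names])
    context = [sum(g * s[i] for g, s in zip(gates, summaries))
               for i in range(len(query))]
    return context, dict(zip(names, gates))

# Toy offline knowledge base with two heterogeneous sources (made-up values).
banks = {
    "topology": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],
    "motion": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]],
}
gate_vectors = {"topology": [1.0, 0.0, 0.0], "motion": [0.0, 1.0, 0.0]}
query = [1.0, 0.2, 0.0]  # stand-in for an encoded agent history

context, gates = fuse(query, banks, gate_vectors)
print(gates)
print(context)
```

In this toy setup the gate favors the source whose gate vector aligns better with the query, and the fused `context` vector is what a downstream trajectory decoder would consume in place of online priors.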