Rare-E2E: Rare Events Dataset for End-to-End Driving in Challenging Long-tail Scenarios
Abstract
Vision-based end-to-end (E2E) driving has garnered interest in the research community due to its scalability and synergy with multimodal large language models (MLLMs). However, current E2E driving benchmarks primarily feature nominal scenarios, and existing open-loop evaluation metrics fall short in capturing the multimodal nature of driving or in effectively evaluating performance in long-tail scenarios. To address these gaps, we introduce the Rare Events Dataset for End-to-End Driving (Rare-E2E). Rare-E2E contains 4,021 driving segments (approximately 12 hours), specifically curated for challenging long-tail scenarios that are rare in daily life, occurring with a frequency of less than 0.03%. Each segment in Rare-E2E includes high-level routing information, ego states, and 360-degree camera views from 8 surrounding cameras. To evaluate E2E driving performance in these long-tail situations, we propose a novel open-loop evaluation metric: Rater Feedback Score (RFS). Unlike conventional distance-based metrics, RFS measures how closely a predicted trajectory matches rater-annotated trajectory preference labels. Rare-E2E includes rater preference labels for validation, and a separate held-out test set is used for the 2025 Rare-E2E benchmark leaderboard.