SMAP: Semantic Route Planning with Map-Grounded Multimodal Alignment
Abstract
Semantic route planning involves generating itineraries that align with user intent while respecting real-world spatial constraints. However, text-only large language models (LLMs) often hallucinate geographically implausible routes because they lack spatial grounding. Inspired by how humans use maps to plan routes, we propose SMAP, the first multimodal framework that combines user queries, POI metadata, and map tiles to produce spatially coherent, preference-aware routes. To enhance spatial consistency, SMAP features a two-stage anti-hallucination mechanism: (1) a map-grounded self-editing pipeline in which one multimodal LLM (MLLM) drafts routes and a second MLLM verifies and refines them using geographic evidence; and (2) hallucination-penalized Direct Preference Optimization (HDPO), which steers the route generator toward spatially plausible routes by using verified routes as preferred responses and hallucinated drafts as rejected ones. Additionally, we introduce MM-Route, the first multimodal dataset for semantic route planning, comprising 3,000 diverse queries annotated with POI metadata and map tiles and covering a broad spectrum of geographic granularities and user intents. Experimental results demonstrate that SMAP significantly reduces geographic hallucinations and outperforms strong baselines in spatial plausibility and user alignment. The code and dataset will be made publicly available.
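To make the second stage concrete, the following is a minimal sketch of how verified routes and hallucinated drafts could be paired and scored with the standard DPO objective. It is not the HDPO loss itself: the abstract does not specify the hallucination penalty, so none is included. The names `PreferencePair`, `build_pairs`, and `dpo_loss`, the value `beta=0.1`, and the rule of skipping drafts that the verifier left unchanged are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    query: str
    chosen: list[str]    # verified route (ordered POI names)
    rejected: list[str]  # hallucinated draft route


def build_pairs(records):
    """Turn (query, draft_route, verified_route) triples into preference pairs.

    Assumption: a draft that the verifier left unchanged carries no
    preference signal, so it is skipped.
    """
    pairs = []
    for query, draft, verified in records:
        if draft != verified:
            pairs.append(PreferencePair(query, verified, draft))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy (logp_*) and a frozen reference model (ref_*)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


records = [
    ("coffee then museum", ["Cafe A", "Museum Z"], ["Cafe A", "Museum B"]),
    ("park walk", ["Park C"], ["Park C"]),  # unchanged draft, skipped
]
pairs = build_pairs(records)
```

In this sketch the loss equals log 2 when the policy and the reference model agree on both routes, and it falls as the policy assigns relatively more probability to the verified route than to the draft.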