NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Abstract
Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. Current VLAs face two challenges: (1) they require extensive datasets annotated with reasoning traces, and (2) these traces greatly increase token counts, inflating training and inference costs. We propose NoRD (No Reasoning for Driving), a data- and inference-efficient VLA that addresses both. Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data, with no reasoning annotations, resulting in 3x fewer tokens. Our approach applies Reinforcement Learning (RL) to fine-tune a supervised fine-tuning (SFT) policy trained on a small, reasoning-free dataset. However, we observe that the standard RL algorithm, Group Relative Policy Optimization (GRPO), fails to yield significant improvements over this data-efficient SFT policy. We find that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this limitation by incorporating Dr.GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM without large datasets, reasoning annotations, or additional inputs, enabling scalable, data-efficient training and fast inference.
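The difficulty bias mentioned above can be illustrated with a minimal sketch of the group-relative advantage computations. The sketch below assumes the standard GRPO formulation (group rewards normalized by their mean and standard deviation) and Dr.GRPO's modification of dropping the standard-deviation term; all function names are illustrative, not from the paper.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """GRPO: normalize each group's rewards by mean and std.
    Dividing by the std shrinks advantages for high-variance groups
    (hard, informative scenarios), which is the difficulty bias."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def dr_grpo_advantages(rewards):
    """Dr.GRPO: mean-center only, so the raw reward spread of each
    group is preserved and high-variance rollouts keep their signal."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

# Two rollout groups: one low-variance, one high-variance.
easy = [0.9, 1.0, 0.9, 1.0]   # nearly uniform rewards
hard = [0.0, 1.0, 0.0, 1.0]   # high-variance rewards

# GRPO rescales both groups to unit scale, so the hard group's
# larger reward spread is discounted relative to the easy one.
print(grpo_advantages(easy))
print(grpo_advantages(hard))

# Dr.GRPO keeps advantages proportional to the actual reward gaps.
print(dr_grpo_advantages(easy))
print(dr_grpo_advantages(hard))
```

Under GRPO both groups produce advantages of roughly the same magnitude despite very different reward spreads, whereas Dr.GRPO leaves the high-variance group's larger gaps intact.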