Reliable Policy Transfer for Safety-Aware End-to-End Driving with Deep Reinforcement Learning
Abstract
End-to-End (E2E) Reinforcement Learning (RL) for autonomous driving still struggles with safety and generalization under distribution shift: perception-heavy encoders, sparse rewards, and ad hoc uncertainty handling often yield brittle closed-loop behavior. This work introduces a unified Deep RL (DRL) framework that addresses four key gaps: causal ego-centric state design, dense differentiable rewards, joint uncertainty estimation with entropy gating, and control-level policy transfer. An ego-centric relational graph encodes agent influence via uncertainty-weighted attention over kinematics, lane geometry, and semantics, producing a compact control state. A multi-objective differentiable reward stabilizes optimization by shaping safety, progress, and comfort together with an uncertainty term. Aleatoric and epistemic uncertainty, captured through per-edge heteroscedastic variance and a critic ensemble respectively, are aggregated into a calibrated confidence signal that modulates policy entropy for risk-aware exploration. A causal-semantic transfer objective aligns actions, attention, and uncertainty statistics across domains, and is combined with a meta-learned initialization for few-shot adaptation. In closed-loop urban driving across varied towns, traffic densities, and weather conditions, the framework improves success rate, reduces infractions per kilometer, and achieves a higher time-to-conflict with lower lateral deviation and comfort cost than strong baselines.
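To make the two uncertainty mechanisms named above concrete, the following is a minimal Python sketch, not the paper's implementation: the aggregation weights (w_alea, w_epis), the exp() squashing, the variance penalty beta, and the entropy floor are illustrative assumptions chosen only to show the shape of the computation.

```python
import numpy as np

def uncertainty_weighted_attention(scores, edge_var, beta=1.0):
    """Down-weight graph edges with high heteroscedastic (aleatoric) variance
    before the softmax; beta is an illustrative penalty temperature."""
    logits = scores - beta * edge_var
    e = np.exp(logits - logits.max())      # stable softmax
    return e / e.sum()

def confidence_signal(aleatoric_var, critic_values, w_alea=0.5, w_epis=0.5):
    """Aggregate mean per-edge aleatoric variance and critic-ensemble
    disagreement (an epistemic proxy) into a confidence in (0, 1].
    Weights and the exp() squashing are assumptions, not the paper's choice."""
    alea = float(np.mean(aleatoric_var))   # mean heteroscedastic variance
    epis = float(np.var(critic_values))    # disagreement across the ensemble
    risk = w_alea * alea + w_epis * epis
    return float(np.exp(-risk))            # higher risk -> lower confidence

def gated_entropy_target(base_target, confidence, floor=0.1):
    """One plausible entropy gate: shrink the entropy target (in nats) when
    confidence is low, so the policy acts more conservatively in uncertain
    states; the floor keeps a minimum level of exploration."""
    return base_target * (floor + (1.0 - floor) * confidence)

# Toy usage on dummy values.
attn = uncertainty_weighted_attention(np.array([2.0, 1.0, 0.5]),
                                      np.array([0.1, 0.8, 0.2]))
conf = confidence_signal(np.array([0.02, 0.05]), np.array([1.1, 0.9, 1.0]))
print(attn, gated_entropy_target(1.5, conf))
```

The direction of the gate is itself a design choice: in a safety-critical setting one would typically damp exploration under low confidence, as sketched here, whereas a pure information-seeking agent might instead raise entropy where the ensemble disagrees.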