Efficient Equivariant Transformer for Self-Driving Agent Modeling
Scott Xu ⋅ Dian Chen ⋅ Kelvin Wong ⋅ Chris Zhang ⋅ Kion Fallah ⋅ Raquel Urtasun
Abstract
Accurately modeling agent behaviors is an important task in self-driving. It is also a task with many symmetries, such as equivariance to the order of agents and objects in the scene, or equivariance to arbitrary roto-translations of the entire scene as a whole; i.e., SE(2)-equivariance. The transformer architecture is a ubiquitous tool for modeling these symmetries. While standard self-attention is inherently permutation equivariant, explicit pairwise relative positional encodings have been the standard for introducing SE(2)-equivariance. However, this approach introduces an additional cost that is quadratic in the number of agents, limiting its scalability to larger scenes and batch sizes. In this work, we propose DriveGATr, a novel transformer-based architecture for agent modeling that achieves SE(2)-equivariance without the computational cost of existing methods. Inspired by recent advances in geometric deep learning, DriveGATr encodes scene elements as multivectors in the 2D projective geometric algebra $\mathbb{R}^*_{2,0,1}$ and processes them with a stack of equivariant transformer blocks. Crucially, DriveGATr models geometric relationships using standard attention between multivectors, eliminating the need for costly explicit pairwise relative positional encodings. Experiments on the Waymo Open Motion Dataset demonstrate that DriveGATr is comparable to the state-of-the-art in traffic simulation and establishes a superior Pareto front for performance vs. computational cost.
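To make the baseline cost concrete, the following toy sketch (not the paper's architecture; all names and the distance-based score function are illustrative assumptions) shows the explicit pairwise relative-positional-encoding approach the abstract contrasts against: attention scores are computed from an $N \times N$ tensor of relative geometric features, which is what makes the encoding cost quadratic in the number of agents, and it is the invariance of those relative features that makes the whole attention map SE(2)-invariant.

```python
import numpy as np

def rel_pos_attention(x, feats):
    """Toy attention with explicit pairwise relative positional encodings.

    x:     (N, 2) agent positions in the scene frame.
    feats: (N, d) per-agent value features (assumed scene-frame invariant).

    Scores come from pairwise squared distances, an SE(2)-invariant
    relative feature -- note the (N, N) tensor: O(N^2) encoding cost.
    """
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # (N, N)
    scores = -d2
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return w @ feats

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))      # 5 agents
v = rng.normal(size=(5, 3))

# Random SE(2) transform of the whole scene: rotation R, translation t.
theta = rng.uniform(0.0, 2.0 * np.pi)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = rng.normal(size=2)

out = rel_pos_attention(x, v)
out_transformed = rel_pos_attention(x @ R.T + t, v)
print(np.allclose(out, out_transformed))  # True: relative distances are SE(2)-invariant
```

DriveGATr's contribution, per the abstract, is obtaining the same invariance/equivariance property from standard attention over per-token multivector embeddings, avoiding the explicit $N \times N$ relative-feature tensor built above.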