Skip to yearly menu bar Skip to main content


Uncertainty-aware Action Decoupling Transformer for Action Anticipation

Hongji Guo · Nakul Agarwal · Shao-Yuan Lo · Kwonjoon Lee · Qiang Ji

Arch 4A-E Poster #389
award Highlight
[ ]
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Human action anticipation aims at predicting what people will do in the future based on past observations. In this paper, we introduce Uncertainty-aware Action Decoupling Transformer (UADT) for action anticipation. Unlike existing methods that directly predict action in a verb-noun pair format, we decouple the action anticipation task into verb and noun anticipations separately. The objective is to make the two decoupled tasks assist each other and eventually improve the action anticipation task. Specifically, we propose a two-stream Transformer-based architecture which is composed of a verb-to-noun model and a noun-to-verb model. The verb-to-noun model leverages the verb information to improve the noun prediction and the other way around. We extend the model in a probabilistic manner and quantify the predictive uncertainty of each decoupled task to select features. In this way, the noun prediction leverages the most informative and redundancy-free verb features and verb prediction works similarly. Finally, the two streams are combined dynamically based on their uncertainties to make the joint action anticipation. We demonstrate the efficacy of our method by achieving state-of-the-art performance on action anticipation benchmarks including EPIC-KITCHENS, EGTEA Gaze+, and 50 Salads.

Live content is unavailable. Log in and register to view live content