

Poster

PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos

Xun Jiang · Zhiyi Huang · Xing Xu · Jingkuan Song · Fumin Shen · Heng Tao Shen


Abstract:

Natural Language-based Egocentric Task Verification (NLETV) aims to equip agents with the ability to determine whether the operation flows of procedural tasks in egocentric videos align with natural language instructions. Describing rules in natural language enables generalizable applications, but it also raises the challenges of cross-modal heterogeneity and hierarchical misalignment. In this paper, we propose a novel approach termed Procedural Heterogeneous Graph Completion (PHGC), which addresses these challenges with heterogeneous graphs that represent the logic in rules and operation flows. Specifically, PHGC consists of three key components: (1) a Heterogeneous Graph Construction module that defines objective states and operation flows as vertices, with temporal and sequential relations as edges; (2) a Cross-Modal Path Finding module that aligns semantic relations between hierarchical video and text elements; and (3) a Discriminative Entity Representation module that excavates hidden entities, which integrate general logical relations and discriminative cues to reveal the final verification results. Additionally, we construct a new dataset called CSV-NL, composed of realistic videos. Extensive experiments on two benchmark datasets covering both digital and physical scenarios, i.e., EgoTV and CSV-NL, demonstrate that PHGC achieves state-of-the-art performance across different settings. Our implementation is available at https://anonymous.4open.science/r/PHGC-7A1B.
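
To make the graph structure in component (1) concrete, the following is a minimal sketch of a heterogeneous graph with typed vertices and typed edges. It is an illustration only and not the authors' implementation: the class names (`Vertex`, `HeteroGraph`), the vertex kinds `"state"` and `"operation"`, the example labels, and the reading of "sequential" as operation-to-operation order and "temporal" as an operation-to-state link are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    # kind is a hypothetical type label: "state" or "operation"
    kind: str
    label: str


@dataclass
class HeteroGraph:
    vertices: set = field(default_factory=set)
    # Each edge is a (source, relation, destination) triple.
    edges: set = field(default_factory=set)

    def add_edge(self, src: Vertex, relation: str, dst: Vertex) -> None:
        """Add a typed edge and register both endpoints as vertices."""
        self.vertices.update((src, dst))
        self.edges.add((src, relation, dst))

    def neighbors(self, src: Vertex, relation: str) -> list:
        """Return sorted labels of vertices reachable from src via relation."""
        return sorted(
            d.label for s, r, d in self.edges if s == src and r == relation
        )


def build_toy_graph():
    """Build a three-vertex toy graph (assumed example, not from the paper)."""
    heat = Vertex("operation", "heat apple")
    slice_ = Vertex("operation", "slice apple")
    hot = Vertex("state", "apple is hot")

    graph = HeteroGraph()
    graph.add_edge(heat, "sequential", slice_)  # assumed: operation order
    graph.add_edge(heat, "temporal", hot)       # assumed: operation-to-state link
    return graph, heat
```

Keeping the relation type inside each edge triple lets a single container hold several edge types, which is the property that distinguishes a heterogeneous graph from a plain one.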
