Skip to yearly menu bar Skip to main content


Poster

Boosting Point-Supervised Temporal Action Localization through Integrating Query Reformation and Optimal Transport

Mengnan Liu · Le Wang · Sanping Zhou · Kun Xia · Xiaolong Sun · Gang Hua


Abstract:

Point-supervised Temporal Action Localization poses significant challenges due to the difficulty of identifying complete actions with a single-point annotation per action. Existing methods typically employ Multiple Instance Learning, which struggles to capture global temporal context and requires heuristic post-processing. In research on fully-supervised tasks, DETR-based structures have effectively addressed these limitations. However, it is nontrivial to merely adapt DETR to this task, encountering two major bottlenecks. (1) How to integrate point label information into the model and (2) How to select optimal decoder proposals for training in the absence of complete action segment annotations. To address this issue, we introduce an end-to-end framework by integrating Query Reformation and Optimal Transport (QROT). Specifically, we encode point labels through a set of semantic consensus queries, enabling effective focus on action-relevant snippets. Furthermore, we integrate an optimal transport mechanism to generate high-quality pseudo labels. These pseudo-labels facilitate precise proposals selection based on Hungarian algorithm, significantly enhancing localization accuracy in point-supervised settings. Extensive experiments on the THUMOS14 and ActivityNet-v1.3 datasets demonstrate that our method outperforms existing MIL-based approaches, offering more stable and accurate temporal action localization in point-level supervision. The code will be publicly available.

Live content is unavailable. Log in and register to view live content