MPL: Match-guided Prototype Learning for Few-shot Action Recognition
Abstract
Current few-shot action recognition methods achieve impressive performance by learning representative prototypes and designing diverse video matching strategies. However, these approaches typically face two critical limitations: i) prototypes learned through implicit sample interactions lack clear semantic correspondence between query-support pairs, limiting their class representativeness; ii) the independent design of prototype learning and matching mechanisms creates a potential incompatibility between prototype representations and matching strategies. To address these limitations, we propose a Match-guided Prototype Learning (MPL) method comprising two key components: enhanced match (E-Match) and key-frame extraction match (K-Match). E-Match explicitly enhances prototype learning in class-specific embeddings by incorporating the matched semantics of query samples, while K-Match further refines the prototype representation through key-frame matching at the fine-grained frame level. Additionally, we propose a Cross-Shot Attention Aggregator (CSA-Aggregator) that dynamically aggregates adjacent frames across support samples, yielding prototype representations that capture intra-class shared action patterns. In this way, the proposed MPL effectively mines coarse-to-fine, match-guided semantic information from query-support pairs to generate discriminative class prototypes, improving the compatibility of prototype representations with the matching mechanism. Extensive evaluations on four public datasets confirm that MPL outperforms leading few-shot action recognition methods.