

Poster

Semantic-guided Cross-Model Prompt Learning for skeleton-based zero-shot action recognition

Anqi Zhu · Jingmin Zhu · James Bailey · Mingming Gong · Qiuhong Ke


Abstract:

Skeleton-based human action recognition has emerged as a promising approach due to its privacy preservation, robustness to visual challenges, and computational efficiency. While current methods predominantly rely on fully supervised learning, the practical necessity to recognize unseen actions has led to increased interest in zero-shot skeleton-based action recognition (ZSSAR). Existing ZSSAR approaches often rely on manually crafted action descriptions and movement assumptions, limiting their flexibility across diverse action classes. To overcome this, we introduce Semantic-guided Cross-Model Prompt Learning (SCoPLe), a novel framework that replaces manual guidance with data-driven prompt learning for skeletal and textual knowledge refinement and alignment. Specifically, we introduce a dual-stream language prompting module that selectively preserves the original semantic context, effectively enhancing the prompting features. We also introduce a semantic-guided adaptive skeleton prompting module that learns joint-level prompts for skeleton features and incorporates an adaptive visual representation sampler that leverages text semantics to strengthen the cross-modal prompting interactions during skeleton-to-text embedding projection. Experimental results on the NTU-RGB+D 60, NTU-RGB+D 120, and PKU-MMD datasets demonstrate the state-of-the-art performance of our method in both ZSSAR and Generalized ZSSAR scenarios.
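To make the cross-modal alignment idea concrete, below is a minimal, hedged sketch of zero-shot classification by projecting prompt-augmented skeleton features into a text embedding space and scoring them against class-description embeddings. This is not the SCoPLe architecture from the paper; the class name, feature dimensions, prompt count, and mean pooling are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalPromptHead(nn.Module):
    """Toy zero-shot head: learnable skeleton prompt tokens are concatenated with
    per-joint features, pooled, projected into the text embedding space, and
    matched to class-description embeddings by cosine similarity.
    (Illustrative sketch only; dimensions and pooling are assumptions.)"""

    def __init__(self, skel_dim=256, text_dim=512, n_prompts=4):
        super().__init__()
        # learnable prompt tokens in the skeleton feature space (assumption)
        self.skel_prompts = nn.Parameter(torch.randn(n_prompts, skel_dim) * 0.02)
        # skeleton-to-text embedding projection
        self.proj = nn.Linear(skel_dim, text_dim)

    def forward(self, skel_feats, text_embeds):
        # skel_feats:  (B, J, skel_dim) per-joint skeleton features
        # text_embeds: (C, text_dim) frozen embeddings of class descriptions
        B = skel_feats.size(0)
        prompts = self.skel_prompts.unsqueeze(0).expand(B, -1, -1)
        tokens = torch.cat([prompts, skel_feats], dim=1)  # prompt-augmented tokens
        pooled = tokens.mean(dim=1)                       # simple pooling stand-in
        z = F.normalize(self.proj(pooled), dim=-1)        # skeleton embedding in text space
        t = F.normalize(text_embeds, dim=-1)              # normalized class embeddings
        return z @ t.t()                                  # (B, C) similarity logits
```

In a zero-shot setting, `text_embeds` would include rows for unseen action classes, so `logits.argmax(-1)` can select classes never observed during skeleton-side training.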