Gamba: Mamba-based graph convolutional network with dynamic graph topology learning for action recognition
Abstract
The graph convolutional network has been an important tool for skeleton-based action recognition. However, existing graph models predominantly rely on self-attention to model feature correlations between the joints of each sample, which neglects dynamic relational dependencies along the temporal dimension, incurs redundant computation, and makes it difficult to establish a unified framework for joint relation representation. To address these problems, this paper develops a Mamba-based graph convolutional network (Gamba) with dynamic graph topology learning. Specifically, to capture local motion patterns through the aggregation of intra-class information, a classification-based Mamba module is developed to categorize motion joints into distinct types. To the best of our knowledge, this is the first work to assign label information to motion joints in order to facilitate correlation learning. To capture the underlying relations among joints of different categories, a state space model is introduced to process the enhanced temporal features, learning dynamic adjacency matrices that encode long-range dependencies among joints across categories. The proposed framework not only enables adaptive spatio-temporal feature modeling, but also has lower computational complexity than traditional self-attention-based approaches. Extensive experiments on the public NTU RGB+D 60/120 and NW-UCLA benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art methods in recognition accuracy. The proposed framework provides new insights into effective and efficient skeleton-based action recognition and can potentially be applied to a variety of real-world applications.
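The core idea of learning a dynamic adjacency matrix from temporal joint features via a state space recurrence can be illustrated with a minimal sketch. Note this is a toy illustration under assumed shapes and randomly initialized parameters, not the paper's actual architecture: `ssm_scan`, `dynamic_adjacency`, the state size `N`, and the softmax-normalized affinity are all hypothetical simplifications of a Mamba-style selective SSM.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state space recurrence over time (not selective/Mamba):
    h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    x: (T, D) per-frame features for one joint."""
    T, D = x.shape
    N = A.shape[0]                      # hidden state size (assumption)
    h = np.zeros(N)
    ys = np.empty((T, D))
    for t in range(T):
        h = A @ h + B @ x[t]
        ys[t] = C @ h
    return ys

def dynamic_adjacency(joint_feats):
    """joint_feats: (J, T, D) per-joint temporal features.
    Summarize each joint's dynamics with the SSM's final output,
    then form a row-normalized joint-joint affinity as the adjacency."""
    J, T, D = joint_feats.shape
    rng = np.random.default_rng(0)
    N = 8                               # hypothetical state size
    A = np.eye(N) * 0.9                 # stable toy state transition
    B = rng.standard_normal((N, D)) * 0.1
    C = rng.standard_normal((D, N)) * 0.1
    summaries = np.stack(
        [ssm_scan(joint_feats[j], A, B, C)[-1] for j in range(J)]
    )                                   # (J, D) one summary per joint
    logits = summaries @ summaries.T / np.sqrt(D)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    adj = np.exp(logits)
    adj /= adj.sum(axis=1, keepdims=True)         # rows sum to 1
    return adj

J, T, D = 25, 16, 4                     # e.g. 25 joints as in NTU RGB+D
feats = np.random.default_rng(1).standard_normal((J, T, D))
A_dyn = dynamic_adjacency(feats)
print(A_dyn.shape)                      # (25, 25)
```

Because the adjacency is recomputed from the temporal features of each sequence, the graph topology adapts per sample rather than being fixed by the skeleton's bone connectivity.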