MS^2Gait: A Multi-Scale Spatio-Temporal Fusion Network for LiDAR-based Gait Recognition
Abstract
3D LiDAR-based gait recognition has gained increasing attention due to its robustness to illumination changes, its privacy preservation, and its capability for long-range, non-contact identity verification. However, existing point cloud-based methods suffer from two critical limitations: they fail to model semantically distant correlations across spatial scales, and they employ simplistic temporal aggregation that cannot handle the inherent heterogeneity of gait. To address these limitations, we propose MS^2Gait, a multi-scale spatio-temporal framework tailored for gait recognition from raw point clouds. Our Hierarchical Spatial Feature Extraction module introduces four complementary interaction strategies to explicitly capture long-range semantic dependencies and recover structural information under occlusion. Additionally, a Similarity-based Temporal Enhancement Transformer leverages multi-scale aggregation to dynamically weight frames based on motion coherence, effectively handling temporal heterogeneity without explicit supervision. Extensive evaluations on SUSTech1K and FreeGait demonstrate that MS^2Gait achieves Rank-1 accuracies of 93.5% and 83.1%, respectively, outperforming prior state-of-the-art methods while exhibiting strong robustness against non-gait nuisance factors.