Reading Your Actions: Learning Generalizable Action Representations via Pre-training AEMG
Abstract
Electromyography (EMG) is crucial for decoding human motor intentions and enabling natural human-computer interaction, but its generalization across subjects, devices, and tasks has long been limited by data heterogeneity, scarce annotations, and the lack of a unified representation paradigm. In this work, we introduce a novel perspective on EMG signals, treating muscle contractions as words and activation sequences as sentences. Based on this perspective, we design a Neuromuscular Contraction Tokenizer (NCT) that generates semantically consistent EMG sentences from raw signals. Building on this, we propose Any Electromyography (AEMG), the first large-scale pre-training framework for EMG, which learns general EMG representations via self-supervised pre-training. Furthermore, we construct the largest cross-device EMG vocabulary to date, supporting seamless transfer across arbitrary channel topologies and sampling rates. Extensive experiments demonstrate that AEMG outperforms state-of-the-art baselines by 5.79–9.25% in zero-shot leave-one-subject-out accuracy and reaches over 90% few-shot adaptation performance with only 5% of the target user’s data. Our work frames EMG signals as a cross-device physiological language, learns their grammar from massive amounts of data, and lays the groundwork for a single-training, universally applicable EMG foundation model.