Poster

VSNet: Focusing on the Linguistic Characteristics of Sign Language

Yuhao Li · Xinyue Chen · Hongkai Li · Xiaorong Pu · Peng Jin · Yazhou Ren


Abstract:

Sign language is a visual language expressed through complex movements of the upper body. The human skeleton plays a critical role in sign language recognition because it is cleanly separated from the video background. However, mainstream skeleton-based sign language recognition models often focus too heavily on the natural connections between joints and treat sign language as ordinary human movement, which neglects its linguistic characteristics. We believe that, just as letters form words, each sign language gloss can be decomposed into smaller visual symbols. To fully harness the potential of skeleton data, this paper proposes a novel joint fusion strategy and a visual symbol attention model. Specifically, we first input the complete set of skeletal joints and, after dynamically exchanging joint information, discard the joints with the weakest connections to other joints, yielding a fused, simplified skeleton. We then group the joints most likely to express the same visual symbol and model the joint movements within each group separately. To validate our method, we conduct extensive experiments on multiple public benchmark datasets. The results show that, without complex pre-training, it achieves new state-of-the-art performance.
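
The two steps described in the abstract (pruning weakly connected joints, then grouping the remaining ones) can be illustrated with a toy sketch. This is not the VSNet implementation: the abstract does not specify how connection strength or grouping is computed, so the fixed `affinity` matrix, the function names `prune_weakest_joints` and `group_by_seed`, and the seed-based grouping rule are all assumptions made for illustration. In the paper, the joint interactions are exchanged dynamically rather than given as a static matrix.

```python
from typing import Dict, List

Matrix = List[List[float]]


def prune_weakest_joints(affinity: Matrix, keep: int) -> List[int]:
    """Keep the `keep` joints with the largest total connection strength.

    Assumption: strength of joint i is the sum of its incoming and outgoing
    affinities to all other joints. The paper's actual criterion may differ.
    """
    n = len(affinity)
    strength = [
        sum(affinity[i][j] + affinity[j][i] for j in range(n) if j != i)
        for i in range(n)
    ]
    ranked = sorted(range(n), key=lambda i: strength[i], reverse=True)
    return sorted(ranked[:keep])


def group_by_seed(affinity: Matrix, joints: List[int], seeds: List[int]) -> Dict[int, List[int]]:
    """Assign each kept joint to the seed joint it is most strongly connected to.

    Assumption: seeds are hypothetical group anchors chosen by the caller;
    the paper groups joints by learned visual-symbol attention instead.
    """
    groups: Dict[int, List[int]] = {s: [s] for s in seeds}
    for j in joints:
        if j in groups:
            continue
        best = max(seeds, key=lambda s: affinity[j][s])
        groups[best].append(j)
    return groups


# Toy symmetric affinity between 5 joints (made-up values, not real data).
affinity = [
    [0.0, 0.9, 0.1, 0.2, 0.0],
    [0.9, 0.0, 0.2, 0.1, 0.0],
    [0.1, 0.2, 0.0, 0.8, 0.1],
    [0.2, 0.1, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]

kept = prune_weakest_joints(affinity, keep=4)      # joint 4 is barely connected
groups = group_by_seed(affinity, kept, seeds=[0, 2])
print(kept)
print(groups)
```

In this toy example joint 4 is dropped because it has almost no connection to the others, and the remaining joints split into two groups around the strongly linked pairs.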
