

Poster

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Youngjoon Jang · Haran Raajesh · Liliane Momeni · Gul Varol · Andrew Zisserman


Abstract:

Our objective is to translate continuous sign language into spoken language text. Inspired by the way human interpreters rely on context for accurate translation, we incorporate additional contextual cues, together with the signing video, into a novel translation model. Specifically, besides visual sign recognition features that encode the input video, we integrate complementary textual information from (i) captions describing the background show; (ii) pseudo-glosses transcribing the signing; and (iii) translations of previous sentences. These are automatically extracted and input, along with the visual features, to a pre-trained large language model (LLM), which we fine-tune to generate spoken language translations in text form. Through extensive ablation studies, we show the positive contribution of each input cue to the translation performance. We train and evaluate our approach on BOBSL -- the largest British Sign Language dataset currently available. We show that our contextual approach significantly enhances the quality of the translations compared to previously reported results on BOBSL, and also to state-of-the-art methods that we implement as baselines. Furthermore, we demonstrate the generality of our approach by applying it to How2Sign, an American Sign Language dataset, and achieve competitive results.
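The sketch below illustrates the general idea described in the abstract: visual sign-recognition features are projected into the embedding space of a pre-trained LLM and prepended to a textual prompt carrying the contextual cues (background caption, pseudo-glosses, previous translation), after which the LLM is fine-tuned to generate the spoken-language translation. This is a minimal, hypothetical recreation; the model names, prompt layout, and projection layer are assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: visual sign features + textual context cues fed to a
# pre-trained causal LLM. Names and prompt format are assumed, not from the paper.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class ContextualSLT(nn.Module):
    def __init__(self, llm_name="gpt2", visual_dim=768):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(llm_name)
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        hidden = self.llm.config.hidden_size
        # Map visual sign-recognition features into the LLM embedding space.
        self.visual_proj = nn.Linear(visual_dim, hidden)

    def forward(self, visual_feats, caption, pseudo_glosses, prev_translation, target=None):
        # Contextual cues are concatenated into a single textual prompt.
        prompt = (f"Background: {caption}\n"
                  f"Glosses: {pseudo_glosses}\n"
                  f"Previous: {prev_translation}\n"
                  f"Translation:")
        tok = self.tokenizer(prompt, return_tensors="pt")
        text_emb = self.llm.get_input_embeddings()(tok.input_ids)
        # Prepend projected visual tokens to the textual prompt embeddings.
        vis_emb = self.visual_proj(visual_feats).unsqueeze(0)  # (1, T, hidden)
        inputs_embeds = torch.cat([vis_emb, text_emb], dim=1)

        labels = None
        if target is not None:
            tgt = self.tokenizer(target, return_tensors="pt").input_ids
            tgt_emb = self.llm.get_input_embeddings()(tgt)
            inputs_embeds = torch.cat([inputs_embeds, tgt_emb], dim=1)
            # Only the target translation tokens contribute to the LM loss.
            ignore = torch.full((1, inputs_embeds.shape[1] - tgt.shape[1]), -100)
            labels = torch.cat([ignore, tgt], dim=1)

        return self.llm(inputs_embeds=inputs_embeds, labels=labels)
```

As a usage example, one would pass a tensor of per-frame (or per-clip) visual features of shape (T, visual_dim) together with the three context strings, and fine-tune with the returned language-modelling loss; at inference, generation would be conditioned on the same prefix of visual and context embeddings.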
