Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models
Summary
A study introduces LLM-based target-side paraphrase augmentation for Sign Language Translation (SLT). GPT-4o generates three controlled paraphrase variants of each reference sentence, while the sign input stays unchanged. A Signformer-style, pose-based Transformer is then trained in two stages: pre-training on the augmented corpus, followed by fine-tuning on the original references. The approach was evaluated on PHOENIX14T (German Sign Language), GSL (Greek Sign Language), and LSA-T (Argentinian Sign Language). On PHOENIX14T, BLEU-4 improved from 9.56 to 10.33. However, the method showed limited benefit on GSL, whose baseline is already near saturation, and on the extremely sparse LSA-T dataset. A semantic evaluation with GPT-5.2 as an LLM-as-a-Judge revealed fidelity gains on PHOENIX14T (from 2.51 to 3.65, +45%) and GSL (from 7.72 to 8.77, +13.6%) that lexical-overlap metrics such as BLEU-4 understated.
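As a quick arithmetic check on the gains quoted above, the sketch below computes absolute and relative improvement from the reported before/after scores. It is a plain helper written for this summary, not code from the study; the only inputs are the numbers stated in the paragraph.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Before/after scores as reported in the summary.
reported = {
    "PHOENIX14T BLEU-4": (9.56, 10.33),
    "PHOENIX14T judge": (2.51, 3.65),
    "GSL judge": (7.72, 8.77),
}

for name, (before, after) in reported.items():
    print(f"{name}: {before} -> {after} "
          f"(+{after - before:.2f} absolute, +{relative_gain(before, after):.1f}% relative)")
```

The judge gains come out at roughly +45.4% and +13.6%, matching the summary, while the BLEU-4 gain on PHOENIX14T is about +0.77 absolute, or roughly +8.1% relative. This is the gap between lexical and semantic metrics that the study highlights.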
Key takeaway
For NLP engineers developing Sign Language Translation systems on moderately diverse corpora such as PHOENIX14T, LLM-based target-side paraphrase augmentation is worth implementing: on PHOENIX14T it raised the semantic-fidelity score by about 45% even though BLEU-4 gained less than one point. For highly repetitive or extremely sparse datasets, the approach may offer limited benefit. Complement BLEU with an LLM-as-a-Judge evaluation to capture semantic improvements that lexical overlap misses.
Key insights
LLM-generated target-side paraphrases can augment Sign Language Translation, improving semantic fidelity, but effectiveness varies by corpus characteristics.
Principles
- Augmentation benefits depend on corpus lexical diversity.
- Semantic evaluation reveals gains beyond lexical metrics.
- Two-stage training first broadens the decoder, then re-centers it on the original references.
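The first principle ties the benefit of augmentation to lexical diversity, but the summary does not say how diversity was measured. As a hypothetical proxy, the sketch below computes a type-token ratio and the share of exactly duplicated references; both metrics and the toy corpora are assumptions made for illustration, not the study's analysis.

```python
def type_token_ratio(sentences: list[str]) -> float:
    """Distinct word types / total word tokens (lowercased, whitespace-split)."""
    tokens = [tok for s in sentences for tok in s.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def duplicate_rate(sentences: list[str]) -> float:
    """Share of references that exactly repeat an earlier reference."""
    return 1.0 - len(set(sentences)) / len(sentences) if sentences else 0.0


# Hypothetical toy corpora, not drawn from PHOENIX14T, GSL, or LSA-T.
repetitive = ["i want a doctor", "i want a doctor", "i need a doctor"]
diverse = [
    "rain is expected in the north",
    "tomorrow it stays dry and sunny",
    "strong winds along the coast",
]

for name, corpus in [("repetitive", repetitive), ("diverse", diverse)]:
    print(f"{name}: TTR={type_token_ratio(corpus):.2f}, "
          f"duplicates={duplicate_rate(corpus):.2f}")
```

Type-token ratio falls as a corpus grows, so any comparison between datasets should use equally sized samples. A high duplicate rate is one way a corpus can be "highly repetitive" in the sense of the takeaway above.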
Method
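GPT-4o generates three paraphrases per video-text pair. Only paraphrases whose surface-form similarity to the original reference lies between 0.3 and 0.95 are kept, which excludes both near-copies and heavily rewritten variants. A Signformer is pre-trained on this augmented data, then fine-tuned on the original references.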
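The filtering and two-stage data split can be sketched as follows. The summary does not name the surface-form similarity metric, so `difflib.SequenceMatcher` over word sequences is an assumed stand-in. Treating the 0.3 to 0.95 bounds as inclusive and keeping the original reference in the pre-training corpus are also assumptions. `build_stage_corpora` and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

# Similarity band from the summary; inclusive bounds are an assumption.
LOW, HIGH = 0.3, 0.95


def surface_similarity(a: str, b: str) -> float:
    """Assumed metric: difflib ratio over lowercased word sequences (0 = disjoint, 1 = identical)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def keep_paraphrase(reference: str, paraphrase: str) -> bool:
    """Drop near-copies (above HIGH) and heavily rewritten variants (below LOW)."""
    return LOW <= surface_similarity(reference, paraphrase) <= HIGH


def build_stage_corpora(samples):
    """Split data into the two training stages.

    samples: iterable of (video_id, reference, paraphrases) tuples.
    Returns (pretrain_pairs, finetune_pairs). The sign input (video_id) is never
    modified; only the target text varies.
    """
    pretrain, finetune = [], []
    for video_id, reference, paraphrases in samples:
        # Stage 2 (fine-tuning) sees only the original reference.
        finetune.append((video_id, reference))
        # Stage 1 (pre-training) sees the original plus every paraphrase that passes the filter.
        pretrain.append((video_id, reference))
        pretrain.extend((video_id, p) for p in paraphrases if keep_paraphrase(reference, p))
    return pretrain, finetune


# Hypothetical example: one video with three candidate paraphrases.
samples = [(
    "vid_001",
    "tomorrow it will rain in the north",
    [
        "tomorrow it will rain in the north",  # identical, similarity 1.0, dropped
        "in the north it will rain tomorrow",  # reordered, similarity ~0.43, kept
        "expect sunshine everywhere",          # no shared words, similarity 0.0, dropped
    ],
)]
pretrain, finetune = build_stage_corpora(samples)
print(len(pretrain), len(finetune))  # 2 1
```

A purely lexical ratio penalizes valid paraphrases that swap in synonyms, so a different similarity measure would shift which variants survive the 0.3 lower bound.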
In practice
- Use GPT-4o for target-side paraphrase generation.
- Keep only paraphrases whose surface-form similarity to the reference lies between 0.3 and 0.95.
- Employ LLM-as-a-Judge for semantic evaluation.
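The last practice item, LLM-as-a-Judge scoring, reduces to prompting a judge model per (reference, candidate) pair and averaging the parsed scores. In the sketch below, the prompt wording, the `Score: <number>` reply format, and the `judge` callable are all hypothetical, because the summary gives neither the study's rubric nor its score scale. The call to the judge model (GPT-5.2 in the study) is left abstract, and an offline stub stands in for it so the sketch runs without any API.

```python
import re
from statistics import mean
from typing import Callable, Iterable

# Hypothetical rubric; the study's actual judge prompt and scale are not given in the summary.
JUDGE_TEMPLATE = (
    "Rate how faithfully the candidate translation preserves the meaning of the reference.\n"
    "Reference: {reference}\n"
    "Candidate: {candidate}\n"
    "Reply in the form 'Score: <number>'."
)

SCORE_PATTERN = re.compile(r"Score:\s*(-?\d+(?:\.\d+)?)")


def parse_score(reply: str) -> float:
    """Extract the number following 'Score:' from the judge's reply."""
    match = SCORE_PATTERN.search(reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return float(match.group(1))


def corpus_judge_score(pairs: Iterable[tuple[str, str]], judge: Callable[[str], str]) -> float:
    """Mean judge score over (reference, candidate) pairs.

    `judge` is any function that sends a prompt to the judge model and returns its text reply.
    """
    scores = [
        parse_score(judge(JUDGE_TEMPLATE.format(reference=ref, candidate=cand)))
        for ref, cand in pairs
    ]
    return mean(scores)


# Offline stand-in for the judge model, used only to make the sketch runnable.
def stub_judge(prompt: str) -> str:
    ref = prompt.split("Reference: ")[1].split("\n")[0]
    cand = prompt.split("Candidate: ")[1].split("\n")[0]
    return "Score: 8" if ref == cand else "Score: 3"


pairs = [
    ("it will rain tomorrow", "it will rain tomorrow"),
    ("it will rain tomorrow", "sunny weekend"),
]
print(corpus_judge_score(pairs, stub_judge))  # 5.5
```

To compare a baseline with an augmented model in the spirit of the study, score both systems' outputs on the same test set with the same judge and prompt. Asking for a fixed reply format keeps parsing robust when the model adds extra text.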
Topics
- Sign Language Translation
- Large Language Models
- Data Augmentation
- GPT-4o
- Signformer
- Semantic Evaluation
- PHOENIX14T
Best for: Research Scientist, AI Scientist, NLP Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.