Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, long

Summary

A study introduces LLM-based target-side paraphrase augmentation for Sign Language Translation (SLT): GPT-4o generates three controlled paraphrase variants of each reference sentence while the sign input stays unchanged. A Signformer-style pose-based Transformer is then trained in two stages, pre-training on the augmented corpus and fine-tuning on the original references. The approach was evaluated on PHOENIX14T (German Sign Language), GSL (Greek Sign Language), and LSA-T (Argentinian Sign Language). On PHOENIX14T, BLEU-4 improved from 9.56 to 10.33. The method showed limited benefit, however, on the near-saturated GSL baseline and on the extremely sparse LSA-T dataset. A semantic evaluation using GPT-5.2 as an LLM-as-a-Judge found fidelity gains on PHOENIX14T (from 2.51 to 3.65, +45%) and GSL (from 7.72 to 8.77, +13.6%) that lexical overlap metrics such as BLEU-4 understated.
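The relative gains quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the scores reported in the summary; the helper name `relative_gain` is illustrative and not taken from the paper.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the improvement from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Scores as reported in the summary: (baseline, with augmentation).
reported = {
    "PHOENIX14T BLEU-4": (9.56, 10.33),
    "PHOENIX14T LLM-judge": (2.51, 3.65),
    "GSL LLM-judge": (7.72, 8.77),
}

for name, (before, after) in reported.items():
    print(f"{name}: {before} -> {after} ({relative_gain(before, after):+.1f}%)")
```

On PHOENIX14T the BLEU-4 gain works out to roughly 8% relative, against roughly 45% on the judge score, which is the gap the summary describes as understated by lexical metrics.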

Key takeaway

NLP engineers building Sign Language Translation systems on moderately diverse corpora such as PHOENIX14T should consider LLM-based target-side paraphrase augmentation. It can raise semantic fidelity substantially (a 45% gain in judge score on PHOENIX14T) even when BLEU gains are modest. On highly repetitive or extremely sparse datasets, the approach may offer limited benefit. Evaluate with an LLM-as-a-Judge alongside BLEU to capture semantic improvements that lexical overlap misses.

Key insights

LLM-generated target-side paraphrases can augment Sign Language Translation, improving semantic fidelity, but effectiveness varies by corpus characteristics.

Principles

Augmentation benefits depend on corpus lexical diversity.
Semantic evaluation reveals gains beyond lexical metrics.
Two-stage training first broadens the decoder with paraphrases, then re-centers it on the original references.
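A minimal sketch of how the data for the two training stages might be assembled. The names `Sample` and `build_stage_corpora` are hypothetical, and including the original reference in the stage-one corpus is an assumption; the summary only says that pre-training uses the augmented corpus and fine-tuning uses the original references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    video_id: str  # identifies the unchanged sign input
    target: str    # spoken-language target sentence


def build_stage_corpora(references, paraphrases):
    """Build (stage1, stage2) sample lists.

    references:  dict mapping video_id -> original reference sentence
    paraphrases: dict mapping video_id -> list of paraphrase strings
    """
    stage1, stage2 = [], []
    for video_id, reference in references.items():
        original = Sample(video_id, reference)
        stage2.append(original)  # fine-tuning: original references only
        stage1.append(original)  # assumption: augmented corpus keeps the original
        for text in paraphrases.get(video_id, []):
            stage1.append(Sample(video_id, text))  # same sign input, new target
    return stage1, stage2


refs = {"vid_001": "morgen regnet es im norden"}
paras = {"vid_001": ["im norden regnet es morgen",
                     "morgen gibt es regen im norden",
                     "fuer morgen ist im norden regen angesagt"]}
stage1, stage2 = build_stage_corpora(refs, paras)
print(len(stage1), len(stage2))
```

Every stage-one sample reuses the same `video_id`, reflecting that only the target side is augmented.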

Method

GPT-4o generates three paraphrases for the reference sentence of each video-text pair; paraphrases are kept only if their surface-form similarity falls between 0.3 and 0.95. A Signformer-style model is pre-trained on this augmented data, then fine-tuned on the original references.
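The filtering step might look like the sketch below. The summary does not say which surface-form similarity measure the authors use; token-level `difflib.SequenceMatcher` from the Python standard library is swapped in here purely as an assumption, and the function names are illustrative. Only the 0.3 and 0.95 thresholds come from the source.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for the paper's unspecified measure."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def filter_paraphrases(reference, candidates, low=0.3, high=0.95):
    """Keep candidates whose similarity to the reference lies within [low, high]."""
    kept = []
    for candidate in candidates:
        similarity = surface_similarity(reference, candidate)
        if low <= similarity <= high:
            kept.append(candidate)
    return kept


reference = "morgen regnet es im norden"
candidates = [
    "morgen regnet es im norden",      # identical copy, above the upper bound
    "morgen gibt es regen im norden",  # shares part of the wording
    "die sonne scheint heute",         # no shared tokens, below the lower bound
]
kept = filter_paraphrases(reference, candidates)
print(kept)
```

The upper bound discards near-copies that add no diversity, while the lower bound discards outputs that have drifted too far from the reference wording.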

In practice

Use GPT-4o for target-side paraphrase generation.
Keep only paraphrases with surface-form similarity between 0.3 and 0.95.
Employ LLM-as-a-Judge for semantic evaluation.

Topics

Sign Language Translation
Large Language Models
Data Augmentation
GPT-4o
Signformer
Semantic Evaluation
PHOENIX14T

Best for: Research Scientist, AI Scientist, NLP Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.