AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

2026-06-02 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Software Development & Engineering · Depth: Expert, quick

Summary

AlignAtt4LLM is a simultaneous speech translation system built for the IWSLT 2026 task, translating English into German, Italian, and Chinese. It is a synchronous cascade: Qwen3-ASR produces incrementally updated source transcripts via forced alignment, and Gemma-4 E4B-it translates those transcript prefixes under an MT-side AlignAtt policy. This marks the first application of AlignAtt to a decoder-only LLM, which lacks the encoder-decoder cross-attention that prior AlignAtt systems relied on. The approach recovers a usable policy through four proposals: an explicit source span in the prompt, offline selection of translation-specific alignment heads, selective qk-fast replay of the draft-to-source attention block, and runtime query/key capture that preserves bit-identical model outputs. On the IWSLT 2026 development set, AlignAtt4LLM surpassed the supplied baselines for English to German and English to Italian in both the low-latency (around 2 seconds) and high-latency (below 4 seconds CU-LongYAAL) regimes; results for English to Chinese were mixed. The method generalizes: reapplying it to another decoder-only MT backbone requires only a deterministic prompt layout, calibrated attention heads, and query/key capture.

Key takeaway

For NLP engineers building simultaneous speech translation systems, AlignAtt4LLM shows a viable way to apply an attention-based alignment policy to decoder-only LLMs such as Gemma-4. Consider defining the source span explicitly in the prompt and calibrating a selected set of attention heads to compensate for the missing encoder-decoder cross-attention. On the development set the approach beat the supplied baselines for English to German and English to Italian in both latency regimes, with mixed results for English to Chinese, which makes it a reasonable starting point for real-time translation on modern LLM backbones.

Key insights

Adapting AlignAtt to decoder-only LLMs enables simultaneous speech translation by deriving the policy from the model's own attention over an explicit source span, without encoder-decoder cross-attention.

Principles

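An explicit source span in the prompt gives a decoder-only LLM a defined region to align against.
Offline selection of translation-specific alignment heads makes the attention signal usable for the policy.
Runtime query/key capture leaves model outputs bit-identical.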
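
To make the query/key idea concrete, here is a minimal sketch of how draft-to-source attention could be recomputed from captured queries and keys for a handful of selected heads. It is an illustration under stated assumptions, not the system's implementation: the function names, the textbook 1/sqrt(d) scaling, the renormalization over the source span alone, and the plain averaging across heads are all assumptions, and the captured vectors are assumed to already include any positional encoding.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(queries: Matrix, keys: Matrix, span: Tuple[int, int]) -> Matrix:
    """Attention of each draft-token query over the keys inside the source span.

    queries: one vector per draft token (captured at runtime, hypothetical layout).
    keys: one vector per prompt position.
    span: (start, end) token indices of the source text, end exclusive.
    """
    start, end = span
    scale = 1.0 / math.sqrt(len(queries[0]))  # assumed textbook scaling
    rows = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[start:end]]
        rows.append(softmax(scores))  # assumption: renormalize over the span only
    return rows


def averaged_attention(heads: Sequence[Tuple[Matrix, Matrix]], span: Tuple[int, int]) -> Matrix:
    """Average the per-head draft-to-source attention over the selected heads."""
    per_head = [head_attention(q, k, span) for q, k in heads]
    n_draft, n_src = len(per_head[0]), len(per_head[0][0])
    return [
        [sum(h[i][j] for h in per_head) / len(per_head) for j in range(n_src)]
        for i in range(n_draft)
    ]
```

Because the attention is recomputed from captured tensors on the side, the model's forward pass is left untouched, which is consistent with the bit-identical outputs described above.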

Method

The system uses Qwen3-ASR for incremental transcription and Gemma-4 E4B-it for translation, applying an AlignAtt policy via prompt-based source span, selected attention heads, qk-fast replay, and runtime query/key capture.
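
The AlignAtt decision rule itself is simple to state in code. The sketch below assumes the policy is applied over source text tokens, that the attention matrix has already been averaged over the chosen heads, and that the last `frontier` source tokens are treated as unstable; the function and parameter names are hypothetical. Draft tokens are emitted until one of them attends most strongly to a source position inside that unstable tail.

```python
from typing import Sequence


def alignatt_emit_count(attn: Sequence[Sequence[float]], frontier: int) -> int:
    """Return how many leading draft tokens can be emitted.

    attn[i][j] is the attention weight from draft token i to source token j.
    frontier is the number of trailing source tokens considered too recent
    to rely on (hypothetical parameter name).
    """
    if not attn:
        return 0
    src_len = len(attn[0])
    cutoff = src_len - frontier  # source positions >= cutoff are "too recent"
    for i, row in enumerate(attn):
        most_attended = max(range(src_len), key=row.__getitem__)
        if most_attended >= cutoff:
            return i  # stop before the first token that leans on unstable source
    return len(attn)
```

A larger `frontier` makes the policy more conservative: fewer tokens are emitted per update, so latency rises while the translation depends less on source words the ASR may still revise.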

In practice

Apply AlignAtt to Gemma-4 E4B-it for simultaneous MT.
Use prompt engineering for source span definition.
Select specific attention heads for translation tasks.
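
The points above depend on knowing exactly where the source text sits in the prompt. A minimal sketch of a deterministic layout follows, working on already tokenized lists so that no particular tokenizer is assumed; `layout_prompt` and its three-part layout (instruction, source, draft) are hypothetical and not taken from the system's actual prompt.

```python
from typing import List, Sequence, Tuple


def layout_prompt(
    instruction: Sequence[str],
    source: Sequence[str],
    draft: Sequence[str],
) -> Tuple[List[str], Tuple[int, int]]:
    """Concatenate prompt parts and return the source span as (start, end).

    The span is end-exclusive. Because the instruction comes first and has a
    fixed length, the span start stays the same as the source prefix grows.
    """
    tokens = list(instruction) + list(source) + list(draft)
    start = len(instruction)
    return tokens, (start, start + len(source))
```

With a fixed layout like this, the attention columns belonging to the source can be sliced out by index on every update instead of being searched for in the token sequence.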

Topics

Simultaneous Speech Translation
Decoder-Only LLMs
AlignAtt Policy
Gemma-4 E4B-it
Qwen3-ASR
Low-Latency MT

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.