AI-assisted cultural heritage dissemination: Comparing NMT and glossary-augmented LLM translation in rock art documents
Summary
A study compared three machine translation (MT) setups for Spanish-to-English translation of a terminology-dense rock art text, focusing on operational feasibility for cultural heritage dissemination. The setups were DeepL as a neural MT (NMT) baseline, Gemini-Simple (an LLM with a basic prompt), and Gemini-RAG (the same LLM augmented with a 200-term bilingual glossary via retrieval-augmented generation, RAG). Human evaluation using PEARMUT combined multi-way Direct Assessment (DA, 0-100) for overall quality with targeted terminology auditing under a restricted MQM taxonomy. Gemini-RAG achieved the highest exact-match terminology accuracy at 81.4%, significantly outperforming Gemini-Simple (69.1%) and DeepL (64.4%). Crucially, its overall translation quality (mean DA 85.3) was comparable to that of Gemini-Simple (85.2), and both scored higher than DeepL (80.3). This indicates that lightweight glossary augmentation substantially improves terminology control without degrading overall quality.
Key takeaway
Cultural heritage institutions and translators seeking to scale multilingual dissemination should prioritize lightweight terminology management. Even a small, ad hoc glossary, when integrated with LLM-based translation via simple RAG prompting, can substantially improve lexical control and consistency. This offers a pragmatic way to improve translation quality for specialized content, reduce post-editing burden, and build trust in the output, without extensive resources or complex model modifications.
Key insights
Glossary-augmented LLMs significantly improve terminology accuracy in specialized translation without sacrificing overall quality.
Principles
- Terminology control is crucial for specialized translation quality.
- Overall translation quality and terminology compliance are distinct metrics.
- Lightweight interventions can yield substantial gains.
Method
Compare NMT, basic LLM, and glossary-augmented LLM translation using human evaluation via Direct Assessment for overall quality and MQM-style auditing for terminology accuracy and error types.
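To make the exact-match terminology metric concrete, here is a minimal sketch of how such a score could be computed automatically. It is an assumption for illustration only: the study relied on human auditing, and the function name `terminology_accuracy`, the substring matching, and the sample data are all hypothetical.

```python
def terminology_accuracy(segments, glossary):
    """Share of glossary terms found in the source whose expected
    target rendering also appears in the translation.

    segments: list of (source, translation) pairs.
    glossary: dict mapping source term -> expected target term.
    Matching is plain case-insensitive substring search (an assumption;
    inflected forms would need lemmatization).
    """
    total = matched = 0
    for source, translation in segments:
        src, hyp = source.lower(), translation.lower()
        for term, expected in glossary.items():
            if term.lower() in src:
                total += 1
                if expected.lower() in hyp:
                    matched += 1
    return matched / total if total else 0.0


# Illustrative data, not taken from the study.
sample_glossary = {
    "arte rupestre": "rock art",
    "abrigo": "rock shelter",
    "grabado": "engraving",
}
sample_segments = [
    ("El arte rupestre del abrigo", "The rock art of the shelter"),
    ("Un grabado", "An engraving"),
]
score = terminology_accuracy(sample_segments, sample_glossary)
```

In this toy case two of the three expected terms are matched, because "shelter" alone does not count as an exact match for "rock shelter". Overall quality would still need a separate judgment, since the two metrics are distinct.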
In practice
- Create minimal glossaries for high-impact terms.
- Inject glossary entries selectively via RAG prompting.
- Focus quality evaluation on terminology-sensitive points.
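The selective glossary injection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the glossary entries, the function names `retrieve_entries` and `build_prompt`, and the prompt wording are hypothetical, and the call to the LLM itself is left out.

```python
import re

# Hypothetical Spanish -> English glossary; entries are illustrative,
# not taken from the study's 200-term glossary.
GLOSSARY = {
    "arte rupestre": "rock art",
    "abrigo": "rock shelter",
    "pintura esquemática": "schematic painting",
    "grabado": "engraving",
}


def retrieve_entries(source, glossary):
    """Return only the glossary entries whose source term occurs in the text."""
    lowered = source.lower()
    hits = {}
    for term, target in glossary.items():
        # Whole-word, case-insensitive match; inflected forms such as
        # plurals are not caught by this simple check.
        pattern = rf"(?<!\w){re.escape(term.lower())}(?!\w)"
        if re.search(pattern, lowered):
            hits[term] = target
    return hits


def build_prompt(source, glossary):
    """Build a translation prompt that injects only the relevant entries."""
    entries = retrieve_entries(source, glossary)
    lines = ["Translate the following Spanish text into English."]
    if entries:
        lines.append("Use these term translations exactly:")
        lines += [f"- {es} -> {en}" for es, en in entries.items()]
    lines.append("")
    lines.append(f"Text: {source}")
    return "\n".join(lines)


example_source = "El abrigo conserva arte rupestre del Neolítico."
example_prompt = build_prompt(example_source, GLOSSARY)
```

Injecting only the matched entries keeps the prompt short and avoids distracting the model with terms that do not occur in the segment. A production setup would likely need lemmatization or fuzzy matching to handle inflected forms.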
Topics
- LLM Translation
- Cultural Heritage Dissemination
- Retrieval-Augmented Generation
- Terminology Control
- Machine Translation Evaluation
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.