AI-assisted cultural heritage dissemination: Comparing NMT and glossary-augmented LLM translation in rock art documents

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Research Methodology & Innovation · Depth: Expert, extended

Summary

A study compared three machine translation (MT) setups for Spanish-to-English translation of a terminology-dense rock art text, focusing on operational feasibility for cultural heritage dissemination. The setups included DeepL as an NMT baseline, Gemini-Simple (an LLM with a basic prompt), and Gemini-RAG (the same LLM augmented with a 200-term bilingual glossary via retrieval-augmented generation). Human evaluation using PEARMUT involved multi-way Direct Assessment (0-100) for overall quality and targeted terminology auditing with a restricted MQM taxonomy. Gemini-RAG achieved the highest exact-match terminology accuracy at 81.4%, significantly outperforming Gemini-Simple (69.1%) and DeepL (64.4%). Crucially, Gemini-RAG maintained overall translation quality (mean DA 85.3) comparable to Gemini-Simple (85.2), both superior to DeepL (80.3). This indicates that lightweight glossary augmentation substantially improves terminology control without degrading overall quality.

Key takeaway

For cultural heritage institutions and translators seeking to scale multilingual dissemination, the priority should be lightweight terminology management. Even a small, ad-hoc glossary (200 terms in this study), integrated with LLM-based translation via simple RAG prompting, can markedly improve lexical control: exact-match terminology accuracy rose from 69.1% to 81.4%, while mean overall quality stayed essentially unchanged (85.2 vs. 85.3). This offers a pragmatic path to better translation of specialized content, and plausibly a lighter post-editing burden, without extensive resources or complex model modifications.
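To make the idea of glossary augmentation concrete, here is a minimal Python sketch of one way it could work: look up which glossary terms occur in the source segment and list only those in the prompt. It is not the study's implementation. The glossary entries, the prompt wording, and the function names `retrieve_terms` and `build_prompt` are all illustrative assumptions, and the call to the LLM itself is left out.

```python
import re

# Hypothetical mini-glossary (Spanish -> English). The study's 200-term
# glossary is not reproduced here; these entries are illustrative only.
GLOSSARY = {
    "arte rupestre": "rock art",
    "abrigo": "rock shelter",
    "pintura esquemática": "schematic painting",
}


def retrieve_terms(source: str, glossary: dict[str, str]) -> dict[str, str]:
    """Return glossary entries whose source term occurs in the text.

    Matching is case-insensitive and restricted to whole words.
    """
    hits = {}
    for es, en in glossary.items():
        if re.search(rf"\b{re.escape(es)}\b", source, flags=re.IGNORECASE):
            hits[es] = en
    return hits


def build_prompt(source: str, glossary: dict[str, str]) -> str:
    """Assemble a translation prompt that lists only the retrieved terms."""
    hits = retrieve_terms(source, glossary)
    lines = ["Translate the following Spanish text into English."]
    if hits:
        lines.append("Use these glossary translations exactly:")
        lines += [f"- {es} -> {en}" for es, en in hits.items()]
    lines.append("")
    lines.append(f"Text: {source}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("El abrigo conserva arte rupestre del Neolítico.", GLOSSARY))
```

Simple string matching like this will miss inflected forms such as plurals. A real setup would probably need lemmatization or fuzzy matching, which this sketch does not attempt.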

Key insights

Glossary-augmented LLMs significantly improve terminology accuracy in specialized translation without sacrificing overall quality.

Method

Compare NMT, basic LLM, and glossary-augmented LLM translation using human evaluation via Direct Assessment for overall quality and MQM-style auditing for terminology accuracy and error types.
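The two headline numbers in this comparison, mean Direct Assessment and exact-match terminology accuracy, are simple aggregates. The Python sketch below shows how they could be computed from annotation records. The record layout, the substring-based matching rule, and the toy data are assumptions made for illustration; in the study the terminology audit was carried out by human evaluators, and none of the values below come from it.

```python
from statistics import mean


def term_accuracy(audits: list[tuple[str, str]]) -> float:
    """Percentage of audited terms whose expected English rendering
    appears verbatim (case-insensitive) in the system output.

    Each audit item is (expected_term, system_translation).
    """
    hits = sum(1 for term, hyp in audits if term.lower() in hyp.lower())
    return 100 * hits / len(audits)


def mean_da(scores: list[float]) -> float:
    """Mean Direct Assessment score on the 0-100 scale."""
    return mean(scores)


# Toy data, invented for this example.
AUDITS = [
    ("rock shelter", "The rock shelter holds paintings."),
    ("schematic painting", "A schematic figure appears."),  # miss
    ("rock art", "Rock art sites are fragile."),
    ("pictograph", "The pictograph is red."),
]
DA_SCORES = [80, 90, 85]

if __name__ == "__main__":
    print(term_accuracy(AUDITS))  # 75.0
    print(mean_da(DA_SCORES))     # 85
```

Running the same two functions over each system's outputs would give the kind of side-by-side comparison described above, with one accuracy figure and one mean DA per setup.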

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.