Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, extended

Summary

A multi-aspect iterative refinement framework improves literary translation by generating higher-quality training data. Specialized LLM translators, each focused on a distinct quality dimension such as expression fluency or literary effect, produce refined translation references and preference pairs. Supervised fine-tuning (SFT) on the generated references improves performance by 8.65 CEA100 points over SFT on the original ground truth. For reinforcement learning, an explicit reward model combined with GRPO yields an additional 1.51-point improvement, whereas DPO-series methods degraded performance. The resulting models, LitMT-8B and LitMT-14B, score 67.25 and 69.07 CEA100 respectively on the MetaphorTrans English-to-Chinese benchmark, which is competitive with Claude Sonnet 4.5 (68.43), and they generalize well to out-of-domain literary works.

Key takeaway

If you build literary translation systems, prioritize data quality through multi-aspect refinement. Using specialized LLMs to iteratively improve fluency and literary effect produces stronger training data, and models trained on it can reach competitive performance, such as LitMT-14B's 69.07 CEA100, with far fewer parameters than frontier LLMs. Combine supervised fine-tuning with explicit reward modeling via GRPO; DPO-series methods degraded performance in the reported experiments.
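To make the GRPO recommendation concrete, here is a minimal sketch of the group-relative advantage that GRPO is generally described as using: several candidate translations are sampled for one source passage, each is scored by the reward model, and every score is normalized against its own group. The function name, the example scores, the `eps` constant, and the use of the population standard deviation are assumptions for illustration, not details confirmed by the summarized paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds reward-model scores for several candidate translations
    of the same source passage (hypothetical values in the demo below).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps guards against division by zero when every candidate scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for three sampled translations
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Candidates scoring above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. No separate value network is needed, because the group itself serves as the baseline.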

Key insights

Multi-aspect iterative refinement generates superior literary translation data for LLM training.

Method

A multi-aspect iterative refinement framework generates high-quality translation references and preference pairs using specialized LLM translators for expression fluency and literary effect. The resulting data is used for supervised fine-tuning, followed by reinforcement learning with an explicit reward model and GRPO.
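The refinement loop described above might be sketched as follows. This is a hedged illustration, not the paper's implementation: the `Refiner` signature, the stub refiners (plain string functions standing in for LLM calls), the number of rounds, and the rule that each accepted revision forms a (chosen, rejected) pair with the text it replaced are all assumptions. The summary does not say how preference pairs are actually constructed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical signature: (source_text, current_translation) -> revised translation
Refiner = Callable[[str, str], str]


@dataclass
class RefinementResult:
    reference: str                            # final refined translation
    preference_pairs: List[Tuple[str, str]]   # (chosen, rejected) pairs


def refine(source: str, draft: str, refiners: List[Refiner], rounds: int = 2) -> RefinementResult:
    """Pass a draft through each aspect-specific refiner for several rounds."""
    current = draft
    pairs: List[Tuple[str, str]] = []
    for _ in range(rounds):
        changed = False
        for refiner in refiners:
            revised = refiner(source, current)
            if revised != current:
                # Assumption: a revision is preferred over the text it replaced
                pairs.append((revised, current))
                current = revised
                changed = True
        if not changed:
            break  # converged: no refiner altered the translation this round
    return RefinementResult(reference=current, preference_pairs=pairs)


# Stub refiners standing in for the specialized LLM translators (illustrative only)
def fluency_stub(source: str, translation: str) -> str:
    return translation.replace("  ", " ")


def literary_stub(source: str, translation: str) -> str:
    return translation.replace("very big", "vast")


result = refine("(source passage)", "The sea was  very big.", [fluency_stub, literary_stub])
```

In this sketch the final `reference` would feed supervised fine-tuning, while the accumulated pairs could train a reward model or a preference-based objective. In a real pipeline, each stub would be replaced by a prompted LLM call targeting one quality dimension.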

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.