Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach
Summary
A multi-aspect iterative refinement framework targets two obstacles in literary translation: the scarcity of high-quality annotated data and the need to balance fluent expression with literary effect. The framework uses specialized LLM translators, each targeting a distinct quality dimension, to generate translation references and preference data for supervised fine-tuning (SFT) and reinforcement learning (RL). In the reported experiments, SFT on the generated references outperforms SFT on the original ground truth by 8.65 CEA100 points. For RL, DPO degraded performance, while GRPO added a further 1.51 points, a gain attributed to GRPO's stability and online exploration. The resulting LitMT-8B and LitMT-14B models reach 67.25 and 69.07 CEA100, respectively, on MetaphorTrans English-to-Chinese. These scores are competitive with Claude Sonnet 4.5 (68.43 CEA100), and the models generalize well to out-of-domain literary work.
Key takeaway
For NLP engineers building literary translation systems, this approach offers a way to overcome data scarcity and improve model performance. Consider using specialized LLM translators for iterative data refinement and GRPO for reinforcement learning. In the reported experiments, GRPO was more stable than DPO and benefited from online exploration, and the resulting models were competitive with Claude Sonnet 4.5.
Key insights
Multi-aspect data generation with iterative refinement improves LLM literary translation: in SFT, the generated references beat the original ground truth by 8.65 CEA100 points.
Principles
- Specialized LLM translators can target distinct quality dimensions.
- Generated references can surpass original ground truth for SFT.
- GRPO offers stability and online exploration for RL in this context.
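To make the GRPO principle concrete, here is a minimal sketch of the group-relative advantage computation commonly associated with GRPO: several translations are sampled for the same source, each is scored, and every reward is normalized against its own group. The function name, the example rewards, and the choice of population standard deviation are assumptions for illustration, not details taken from the article.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its sampling group.

    Assumption: population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical quality scores for three sampled translations of one source.
advantages = group_relative_advantages([60.0, 70.0, 80.0])
print(advantages)
```

Translations scoring above the group mean get a positive advantage and are reinforced; those below get a negative one. Because the baseline comes from the group itself, no separate value model is required, which is one commonly cited reason for GRPO's stability.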
Method
A multi-aspect iterative refinement framework generates high-quality translation references and preference data via specialized LLM translators, then uses this data for supervised fine-tuning and GRPO-based reinforcement learning.
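The refinement loop described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the `refine` and `preference_pair` helpers, the greedy accept-if-better rule, and the toy refiners and scorer are all hypothetical stand-ins for LLM calls and a real quality metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: in a real system these would wrap LLM calls.
Refiner = Callable[[str, str], str]   # (source, current translation) -> revision
Scorer = Callable[[str, str], float]  # (source, translation) -> quality score


def refine(source: str, draft: str, refiners: Dict[str, Refiner],
           score: Scorer, rounds: int = 2) -> Tuple[str, List[Tuple[str, float]]]:
    """Let each aspect-specific refiner revise the best translation so far."""
    best, best_score = draft, score(source, draft)
    history = [(best, best_score)]  # every candidate seen, with its score
    for _ in range(rounds):
        improved = False
        for refiner in refiners.values():
            candidate = refiner(source, best)
            cand_score = score(source, candidate)
            history.append((candidate, cand_score))
            if cand_score > best_score:  # assumption: greedy acceptance
                best, best_score, improved = candidate, cand_score, True
        if not improved:
            break  # stop early once no aspect yields a better translation
    return best, history


def preference_pair(history: List[Tuple[str, float]]) -> Dict[str, str]:
    """Pair the highest- and lowest-scored candidates as chosen/rejected."""
    ranked = sorted(history, key=lambda item: item[1])
    return {"chosen": ranked[-1][0], "rejected": ranked[0][0]}


# Toy stand-ins so the sketch runs without any model.
def fluency_refiner(source: str, text: str) -> str:
    return " ".join(text.split())  # collapse stray whitespace


def literary_refiner(source: str, text: str) -> str:
    return text.replace("very sad", "heartbroken")


def toy_score(source: str, text: str) -> float:
    return float("  " not in text) + float("heartbroken" in text)


refiners = {"fluency": fluency_refiner, "literary": literary_refiner}
reference, candidates = refine("(source text)", "She was  very sad.", refiners, toy_score)
print(reference)
print(preference_pair(candidates))
```

In this sketch the final `reference` plays the role of an SFT target, while the candidates collected along the way supply chosen/rejected pairs for preference-based training.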
In practice
- Generate superior SFT data using specialized LLM translators.
- Employ GRPO for reinforcement learning in literary translation tasks.
Topics
- Literary Translation
- Large Language Models
- Data Generation
- Reinforcement Learning
- Supervised Fine-tuning
- GRPO
- MetaphorTrans Benchmark
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.