Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach
Summary
A new multi-aspect iterative refinement framework improves LLM-based literary translation by generating higher-quality training data. It employs specialized LLM translators, each focused on a distinct quality dimension such as expression fluency or literary effect, to produce better translation references and preference pairs. Supervised fine-tuning (SFT) on the generated references scores 8.65 CEA100 points higher than SFT on the original ground truth. For reinforcement learning, an explicit reward model combined with GRPO yields a further 1.51-point gain, whereas DPO-series methods degraded performance. The resulting models, LitMT-8B and LitMT-14B, reach 67.25 and 69.07 CEA100, respectively, on the MetaphorTrans English-to-Chinese benchmark. This is competitive with Claude Sonnet 4.5 (68.43), which LitMT-14B slightly exceeds, and both models generalize well to out-of-domain literary works.
Key takeaway
For NLP engineers building literary translation systems, you should prioritize data quality through multi-aspect refinement. Using specialized LLMs to iteratively improve fluency and literary effect yields better training data, and models trained on it can be competitive with frontier LLMs while using significantly fewer parameters (LitMT-14B reaches 69.07 CEA100). Pair supervised fine-tuning with explicit reward modeling via GRPO, since DPO-series methods degraded performance in this domain.
Key insights
Multi-aspect iterative refinement generates superior literary translation data for LLM training.
Principles
- Decompose literary translation quality into distinct dimensions such as expression fluency and literary effect.
- Specialized LLM translators can each target one quality dimension.
- Explicit reward modeling with GRPO outperforms DPO-series methods for literary translation.
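The GRPO principle rests on one mechanical difference from DPO: instead of optimizing on fixed (chosen, rejected) pairs, GRPO samples several translations of the same source, scores each with an explicit reward, and normalizes every reward against its own group. Below is a minimal sketch of that group-relative advantage step in the standard GRPO formulation, not the paper's training code. The function name, the use of the population standard deviation, and the `eps` value are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sampled translation's reward against its own group.

    `rewards` holds explicit reward scores for G translations sampled from
    the policy for the same source passage (hypothetical inputs).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled translations of one source passage, scored by a reward model.
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
print([round(a, 3) for a in advantages])
```

In this toy group the best and worst samples receive advantages of roughly +1.41 and -1.41 and the two tied samples receive 0, so the policy update is driven by how each translation compares with its siblings rather than by an absolute reward scale.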
Method
A multi-aspect iterative refinement framework generates high-quality translation references and preference pairs using specialized LLM translators for expression fluency and literary effect, followed by supervised fine-tuning and explicit reward modeling with GRPO.
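The refinement loop can be sketched as follows, under loudly stated assumptions: each aspect-specific refiner and the scorer are hypothetical callables standing in for LLM calls, aspect scores are summed without weights, a revision is accepted only if it raises the total score, and every accepted revision is logged as a (chosen, rejected) preference pair. The paper's actual prompts, acceptance rule, and number of rounds are not described here, so this is an illustration of the idea rather than the authors' method. The toy refiners and scorer at the end exist only so the sketch runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces: in a real pipeline each would wrap an LLM call,
# e.g. an aspect-specific prompt to a backbone such as Qwen3-235B-A22B-Instruct.
Refiner = Callable[[str, str], str]        # (source, draft) -> revised draft
Scorer = Callable[[str, str, str], float]  # (source, translation, aspect) -> score

ASPECTS = ("fluency", "literary_effect")


@dataclass
class RefinementResult:
    reference: str
    # (chosen, rejected) pairs logged whenever a revision is accepted.
    preference_pairs: list[tuple[str, str]] = field(default_factory=list)


def total_score(source: str, text: str, scorer: Scorer) -> float:
    # Unweighted sum of per-aspect scores (an assumption; weights are unspecified).
    return sum(scorer(source, text, aspect) for aspect in ASPECTS)


def refine(source: str, draft: str, refiners: dict[str, Refiner],
           scorer: Scorer, max_rounds: int = 3) -> RefinementResult:
    result = RefinementResult(reference=draft)
    best = total_score(source, draft, scorer)
    for _ in range(max_rounds):
        improved = False
        for aspect in ASPECTS:
            # Each specialized refiner revises the current best translation.
            candidate = refiners[aspect](source, result.reference)
            score = total_score(source, candidate, scorer)
            if score > best:
                result.preference_pairs.append((candidate, result.reference))
                result.reference, best = candidate, score
                improved = True
        if not improved:
            break  # no refiner improved the translation this round
    return result


# Toy stand-ins so the sketch runs without any model.
def toy_fluency(source: str, draft: str) -> str:
    return draft if draft.endswith("。") else draft + "。"


def toy_literary(source: str, draft: str) -> str:
    return draft.replace("鬼船", "幽灵般的大帆船")


def toy_scorer(source: str, text: str, aspect: str) -> float:
    if aspect == "fluency":
        return float(text.endswith("。"))
    return float("幽灵般" in text)


result = refine("The moon was a ghostly galleon", "月亮是一艘鬼船",
                {"fluency": toy_fluency, "literary_effect": toy_literary},
                toy_scorer)
print(result.reference)
print(len(result.preference_pairs))
```

With these toys the loop converges in the second round: it prints 月亮是一艘幽灵般的大帆船。 as the refined reference and logs two preference pairs, one per accepted aspect-specific revision. Reusing accepted/rejected revisions as preference pairs is one plausible way a single loop yields both SFT references and preference data.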
In practice
- Use Qwen3-235B-A22B-Instruct as a data generation backbone.
- Combine the learned reward with BLEU and format constraints to form the GRPO reward.
- Train LitMT-8B/14B from Qwen3-Base models to keep inference costs low.
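A composite GRPO reward of this kind could look like the sketch below. Several pieces are assumptions, not details from the source: the learned reward model is a hypothetical callable, BLEU is swapped for a simple self-contained character-level sentence BLEU with add-one smoothing on every n-gram order (a library such as sacreBLEU would be the usual choice), the format constraint is a hypothetical check for empty output or leftover `<think>` tags, and the weights and penalty are illustrative. The resulting scalar would then be group-normalized as in standard GRPO.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical: a learned reward model scoring (source, hypothesis) pairs.
RewardModel = Callable[[str, str], float]


def _chars(text: str) -> list[str]:
    # Character-level tokens (whitespace dropped), a common choice for Chinese.
    return [c for c in text if not c.isspace()]


def _ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU on characters, in [0, 1]."""
    h, r = _chars(hyp), _chars(ref)
    if not h:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h_counts, r_counts = _ngrams(h, n), _ngrams(r, n)
        overlap = sum((h_counts & r_counts).values())  # clipped n-gram matches
        total = sum(h_counts.values())
        log_prec += math.log((overlap + 1) / (total + 1))  # add-one smoothing
    brevity = 1.0 if len(h) > len(r) else math.exp(1 - len(r) / len(h))
    return brevity * math.exp(log_prec / max_n)


def format_ok(hyp: str) -> bool:
    # Hypothetical format constraint: non-empty output with no leftover <think> tags.
    text = hyp.strip()
    return bool(text) and "<think>" not in text


def composite_reward(source: str, hyp: str, ref: str, reward_model: RewardModel,
                     w_rm: float = 1.0, w_bleu: float = 0.5,
                     format_penalty: float = -1.0) -> float:
    # Format violations short-circuit to a fixed penalty; weights are illustrative.
    if not format_ok(hyp):
        return format_penalty
    return w_rm * reward_model(source, hyp) + w_bleu * char_bleu(hyp, ref)
```

Gating on format before adding the other terms means a malformed output can never be rescued by a high learned reward, while the BLEU term anchors the policy to the reference and limits reward hacking against the learned model alone.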
Topics
- Literary Translation
- LLM Fine-tuning
- Data Generation
- Reinforcement Learning
- Reward Modeling
- MetaphorTrans Benchmark
- Qwen3 Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.