Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

A new multi-aspect iterative refinement framework improves literary translation by generating higher-quality training data. It employs specialized LLM translators, each focused on a distinct quality dimension such as expression fluency or literary effect, to produce better translation references and preference pairs. Supervised fine-tuning (SFT) on the generated references outperforms SFT on the original ground truth by 8.65 CEA100 points. For reinforcement learning, the framework pairs an explicit reward model with GRPO, which adds a further 1.51 points, whereas DPO-series methods degraded performance. The resulting models, LitMT-8B and LitMT-14B, reach 67.25 and 69.07 CEA100 respectively on the MetaphorTrans English-to-Chinese benchmark. That places LitMT-14B slightly above Claude Sonnet 4.5 (68.43) and LitMT-8B slightly below it, and both models generalize well to out-of-domain literary works.

Key takeaway

If you are an NLP engineer building literary translation systems, prioritize data quality through multi-aspect refinement. Using specialized LLMs to iteratively improve fluency and literary effect yields better training data, and models trained on it can be competitive with far fewer parameters than frontier LLMs: LitMT-14B reaches 69.07 CEA100. Combine supervised fine-tuning with explicit reward modeling via GRPO; DPO-series methods degraded performance in the reported experiments.

Key insights

Multi-aspect iterative refinement generates superior literary translation data for LLM training.

Principles

Decompose literary translation quality into distinct dimensions such as expression fluency and literary effect.
Specialized LLMs can optimize distinct quality dimensions.
Explicit reward modeling with GRPO outperforms DPO for literary translation.
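
To make the GRPO principle concrete: GRPO samples a group of candidate outputs for the same input, scores each with a reward, and normalizes the rewards within the group to obtain per-candidate advantages, whereas DPO trains directly on chosen/rejected pairs without an explicit reward model. The following is a minimal sketch of that normalization step only. The function name, the example reward values, and the use of the population standard deviation are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group (the core idea behind GRPO).

    Each candidate's advantage is its reward minus the group mean, divided by
    the group standard deviation. `eps` avoids division by zero when every
    candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four candidate translations of one source passage.
    rewards = [0.62, 0.71, 0.55, 0.80]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Candidates scoring above the group mean get positive advantages and are reinforced; those below the mean are discouraged. Because the baseline comes from the group itself, no separate value network is needed.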

Method

A multi-aspect iterative refinement framework generates high-quality translation references and preference pairs using specialized LLM translators for expression fluency and literary effect, followed by supervised fine-tuning and explicit reward modeling with GRPO.
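
The refinement loop described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `refine`, `preference_pair`, the toy refiners, and the toy scorer are all hypothetical names, and in a real pipeline each refiner would be a call to a specialized LLM translator. The acceptance rule (keep a revision only if the score improves) and the pairing rule (best-scoring versus worst-scoring candidate) are also assumptions, since the summary does not specify them.

```python
from typing import Callable, List, Tuple

Refiner = Callable[[str, str], str]   # (source, current translation) -> revised translation
Scorer = Callable[[str, str], float]  # (source, translation) -> quality score
History = List[Tuple[str, float]]     # every candidate seen, with its score


def refine(source: str, draft: str, refiners: List[Refiner],
           score: Scorer, max_rounds: int = 3) -> Tuple[str, History]:
    """Apply each aspect-specific refiner in turn, keeping only improvements."""
    best, best_score = draft, score(source, draft)
    history: History = [(draft, best_score)]
    for _ in range(max_rounds):
        improved = False
        for refiner in refiners:
            candidate = refiner(source, best)
            cand_score = score(source, candidate)
            history.append((candidate, cand_score))
            if cand_score > best_score:  # assumed acceptance rule
                best, best_score = candidate, cand_score
                improved = True
        if not improved:  # stop once a full round changes nothing
            break
    return best, history


def preference_pair(history: History) -> Tuple[str, str]:
    """Return (chosen, rejected) as the best- and worst-scoring candidates."""
    chosen = max(history, key=lambda item: item[1])[0]
    rejected = min(history, key=lambda item: item[1])[0]
    return chosen, rejected


# Toy stand-ins so the sketch runs without any model; purely hypothetical.
def toy_fluency(source: str, text: str) -> str:
    return text if "[fluent]" in text else text + " [fluent]"


def toy_literary(source: str, text: str) -> str:
    return text if "[literary]" in text else text + " [literary]"


def toy_score(source: str, text: str) -> float:
    return float(("[fluent]" in text) + ("[literary]" in text))


if __name__ == "__main__":
    reference, seen = refine("source passage", "draft",
                             [toy_fluency, toy_literary], toy_score)
    print("reference:", reference)
    print("pair:", preference_pair(seen))
```

The refined output serves as an SFT reference, while the candidates collected along the way provide material for preference pairs or reward-model training.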

In practice

Use Qwen3-235B-A22B-Instruct as a data generation backbone.
Combine learned reward with BLEU and format constraints for GRPO.
Train LitMT-8B/14B from Qwen3-Base models for efficiency.
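
One way to combine a learned reward with BLEU and a format check for GRPO is sketched below. Every specific here is an assumption made for illustration: the weights, the penalty value, the format rule, and the function names are hypothetical, and `simple_bleu` is a simplified character-level stand-in with add-one smoothing, not a standard BLEU implementation. The learned reward is passed in as a number assumed to lie in [0, 1].

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams (characters suit unsegmented Chinese output)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU-like score: smoothed clipped n-gram precision
    (geometric mean over n = 1..max_n) times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = char_ngrams(candidate, n)
        ref = char_ngrams(reference, n)
        overlap = sum((cand & ref).values())  # counts clipped by the reference
        total = sum(cand.values())
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


def format_ok(text: str) -> bool:
    """Hypothetical format rule: non-empty, no leftover meta markers."""
    stripped = text.strip()
    return bool(stripped) and "<think>" not in stripped and "Translation:" not in stripped


def composite_reward(candidate: str, reference: str, rm_score: float,
                     w_rm: float = 0.8, w_bleu: float = 0.2,
                     format_penalty: float = -1.0) -> float:
    """Weighted sum of learned reward and BLEU, gated by the format check."""
    if not format_ok(candidate):
        return format_penalty
    return w_rm * rm_score + w_bleu * simple_bleu(candidate, reference)


if __name__ == "__main__":
    ref = "月光如水，洒满庭院。"
    print(composite_reward("月光似水，铺满了庭院。", ref, rm_score=0.7))
    print(composite_reward("Translation: 月光如水", ref, rm_score=0.9))
```

The BLEU term anchors outputs to the reference, and the format gate prevents malformed outputs from earning any reward, which together make it harder for the policy to exploit weaknesses in the learned reward alone.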

Topics

Literary Translation
LLM Fine-tuning
Data Generation
Reinforcement Learning
Reward Modeling
MetaphorTrans Benchmark
Qwen3 Models

Code references

tatsu-lab/stanford_alpaca

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.