Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization
Summary
CTO improves code translation by combining syntax-guided and semantic-aware preference optimization. Large Language Models (LLMs) often struggle with syntactic correctness and semantic consistency when translating code, and existing preference-based learning methods are held back by unreliable semantic rewards. CTO addresses this by training a cross-lingual semantic model via contrastive learning, so that functional equivalence is assessed directly between the source code and its translation. This semantic signal is then unified with compiler-based syntactic feedback within a direct preference optimization (DPO) framework, which treats code translation as a multi-objective optimization problem. In experiments on translation among C++, Java, and Python, CTO significantly outperforms existing baselines and alternative preference optimization strategies, with accuracy gains of up to 3.66% on TransCoder-Test and 4.27% on HumanEval-X with CodeT5, and even higher gains with CodeLlama-7B and Qwen2.5-Coder-7B.
Key takeaway
For AI engineers and research scientists working on cross-lingual code migration, CTO offers a way to improve translation quality by combining direct semantic-equivalence assessment with compiler-based syntactic feedback. Consider its multi-objective preference optimization where traditional supervised finetuning falls short, especially in critical legacy modernization projects where the reliability of translated code matters.
Key insights
CTO unifies syntax-guided and semantic-aware preference optimization for robust, accurate code translation.
Principles
- Semantic rewards must derive directly from source code.
- Compiler feedback provides a reliable signal of syntactic correctness.
- Multi-objective optimization improves code translation accuracy.
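To make the first principle concrete, here is a minimal sketch of a contrastive (InfoNCE-style) objective that scores a translation directly against its source. It is an illustration under assumptions, not the paper's implementation: the summary does not say which contrastive loss or encoder CTO uses, and the two-dimensional embeddings, the `temperature` value, and the function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Hypothetical embeddings: a source snippet, an equivalent translation,
# and a functionally different translation.
src = [1.0, 0.0]
equivalent = [0.9, 0.1]
different = [0.0, 1.0]

loss_aligned = info_nce(src, equivalent, [different])
loss_misaligned = info_nce(src, different, [equivalent])
```

The loss is small when the equivalent translation sits closer to the source than the negatives do, and large otherwise. After training, the same similarity score could serve as a semantic reward computed from the source code itself.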
Method
CTO trains a cross-lingual semantic model via contrastive learning, then unifies its semantic reward with compiler-based syntactic feedback within a DPO framework, formulating code translation as a multi-objective optimization problem.
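For reference, the standard DPO loss on a single preference pair can be sketched as follows. This shows only the generic DPO objective; how CTO weights or combines the syntactic and semantic signals is not described in this summary, so nothing here should be read as CTO's exact formulation. The argument names and `beta=0.1` are assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of translations.

    Each argument is the summed log-probability of a full translation under
    the trainable policy or the frozen reference model.
    """
    # How much more the policy favors the chosen output than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical log-probabilities for one pair.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)    # policy identical to reference
improved = dpo_loss(-3.0, -8.0, -5.0, -5.0)   # policy prefers the chosen output
regressed = dpo_loss(-8.0, -3.0, -5.0, -5.0)  # policy prefers the rejected output
```

When the policy and reference agree, the loss equals log 2. It falls as the policy shifts probability toward the chosen translation and rises when it shifts toward the rejected one.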
In practice
- Use Qwen3-8B for negative sample generation.
- Employ LoRA for finetuning larger models like CodeLlama-7B.
- Prioritize syntactic correctness in preference dataset construction.
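One way to read the last point is a lexicographic ranking: candidates that compile always outrank those that do not, and the semantic score only breaks ties within each group. The sketch below is a hypothetical construction rule, not the paper's procedure. It uses Python's built-in `compile()` as a stand-in syntax check for Python targets (C++ or Java targets would need their own compilers), and the semantic scores are made-up values standing in for the output of a semantic model.

```python
def compiles(python_source):
    """Stand-in syntax check: True if the text parses as Python."""
    try:
        compile(python_source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def build_pair(candidates):
    """Pick (chosen, rejected) from a list of (code, semantic_score) tuples.

    Ranking is syntax first, semantic score second. Returns None when the
    best and worst candidates are indistinguishable under that ranking.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: (compiles(c[0]), c[1]), reverse=True)
    best, worst = ranked[0], ranked[-1]
    if (compiles(best[0]), best[1]) == (compiles(worst[0]), worst[1]):
        return None
    return best[0], worst[0]

# Hypothetical candidate translations with made-up semantic scores.
candidates = [
    ("def add(a, b):\n    return a + b\n", 0.92),
    ("def add(a, b)\n    return a + b\n", 0.95),   # missing colon: does not parse
    ("def add(a, b):\n    return a - b\n", 0.40),  # parses, wrong behavior
]
pair = build_pair(candidates)
```

Here the candidate with the missing colon becomes the rejected sample despite having the highest semantic score, because syntax takes precedence in the ranking.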
Topics
- Code Translation
- Large Language Models
- Direct Preference Optimization
- Cross-lingual Semantic Model
- Syntactic Correctness
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.