TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law
Summary
TransLaw is a multi-agent framework and large-scale dataset for professional translation of Hong Kong case law, a task complicated by intricate legal terminology and cultural nuance. It uses three specialized LLM-powered agents (Translator, Annotator, and Proofreader) that collaborate to produce high-accuracy translations. In an evaluation spanning 13 open-source and commercial LLMs, TransLaw outperformed GPT-4o in legal semantic accuracy, structural coherence, and stylistic fidelity, though it still trails human experts in contextualization and naturalness. TransLaw also cuts translation costs substantially: it is reported to be 3,972 times cheaper than professional human services and 10.26% cheaper than direct GPT-4o usage for the FACC 1/2021 case. For benchmarking, the authors built a Bilingual Judgment Corpus (BJC) of 344 Hong Kong Court of Final Appeal decisions (1997–2022).
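To see how the two cost figures relate, the short calculation below combines them. It is only a sketch: it assumes both figures describe the same case (FACC 1/2021) and that "10.26% cheaper" means TransLaw costs 89.74% of the direct GPT-4o cost. The variable names are illustrative, and no absolute prices are implied.

```python
# Reported figures from the summary above.
human_vs_translaw = 3972      # human cost / TransLaw cost
saving_vs_gpt4o = 0.1026      # TransLaw is 10.26% cheaper than direct GPT-4o

# Assumption: "x% cheaper" means TransLaw cost = (1 - x) * GPT-4o cost.
translaw_vs_gpt4o = 1 - saving_vs_gpt4o          # TransLaw cost / GPT-4o cost
gpt4o_vs_translaw = 1 / translaw_vs_gpt4o        # GPT-4o cost / TransLaw cost

# Implied ratio of human cost to direct GPT-4o cost, under the same assumptions.
human_vs_gpt4o = human_vs_translaw * translaw_vs_gpt4o

print(f"GPT-4o costs about {gpt4o_vs_translaw:.3f}x TransLaw")
print(f"Human translation costs about {human_vs_gpt4o:.0f}x direct GPT-4o")
```

Under these assumptions, most of the saving relative to human translation comes from using an LLM at all; the multi-agent design adds a further, smaller saving on top of direct GPT-4o usage.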
Key takeaway
For legal professionals and AI scientists building specialized translation solutions, TransLaw shows that a multi-agent LLM framework can improve accuracy and reduce costs in complex domains such as Hong Kong case law. Consider adopting a collaborative agent architecture with distinct roles (Translator, Annotator, Proofreader) and integrating domain-specific feedback mechanisms to surpass general-purpose LLMs like GPT-4o used directly. Human review remains crucial for contextualization and stylistic naturalness.
Key insights
Multi-agent LLM systems can achieve high-quality, cost-effective legal translation by simulating human workflows.
Principles
- Decompose complex tasks into specialized agent roles.
- Incorporate multi-level error annotation for iterative refinement.
- Utilize memory modules for continuous agent learning.
Method
TransLaw's workflow chains three agents: a Translator agent produces a translation, an Annotator agent marks its errors using a taxonomy of 30 subcategories, and a Proofreader agent iteratively refines the translation using that feedback together with stored error triplets.
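The workflow above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the agents are stub functions rather than LLM calls, the names `Annotation`, `Memory`, and `run_pipeline` are hypothetical, the triplet layout (source span, faulty rendering, correction) is an assumption about what an "error triplet" holds, and the Chinese-to-English direction is chosen only for the toy example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed layout of an error triplet: (source span, faulty rendering, correction).
ErrorTriplet = Tuple[str, str, str]

@dataclass
class Annotation:
    category: str      # hypothetical label; the paper uses 30 subcategories
    source_span: str
    faulty: str
    suggestion: str

@dataclass
class Memory:
    """Stores error triplets so later rounds can reuse earlier corrections."""
    triplets: List[ErrorTriplet] = field(default_factory=list)

    def remember(self, ann: Annotation) -> None:
        triplet = (ann.source_span, ann.faulty, ann.suggestion)
        if triplet not in self.triplets:
            self.triplets.append(triplet)

def run_pipeline(
    source: str,
    translate: Callable[[str], str],
    annotate: Callable[[str, str], List[Annotation]],
    proofread: Callable[[str, str, List[Annotation], List[ErrorTriplet]], str],
    memory: Memory,
    max_rounds: int = 3,
) -> str:
    draft = translate(source)
    for _ in range(max_rounds):
        annotations = annotate(source, draft)
        if not annotations:          # no errors flagged: stop refining
            break
        for ann in annotations:
            memory.remember(ann)
        draft = proofread(source, draft, annotations, memory.triplets)
    return draft

# Stub agents standing in for LLM calls.
def stub_translate(source: str) -> str:
    return "The appealer lodged an appeal."

def stub_annotate(source: str, draft: str) -> List[Annotation]:
    if "appealer" in draft:
        return [Annotation("terminology", "上訴人", "appealer", "appellant")]
    return []

def stub_proofread(source, draft, annotations, triplets) -> str:
    for ann in annotations:
        draft = draft.replace(ann.faulty, ann.suggestion)
    return draft

memory = Memory()
result = run_pipeline("上訴人提出上訴。", stub_translate, stub_annotate,
                      stub_proofread, memory)
print(result)
print(memory.triplets)
```

Passing the agents in as callables keeps the loop independent of any particular model or API, so each role can be swapped or tested in isolation.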
In practice
- Configure LLM agents (e.g., GPT-4o, GPT-3.5 Turbo) for specific translation roles.
- Integrate terminology databases like DoJ Glossaries for domain specificity.
- Use human annotators in hybrid setups to enhance coherence and style.
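The first two practices can be combined by giving each role its own configuration and injecting matching glossary terms into the prompt. The sketch below is an assumption-laden illustration: `AgentConfig`, `AGENTS`, and `build_prompt` are hypothetical names, the role-to-model assignment and instruction texts are invented for the example, and the three glossary entries are illustrative term pairs rather than an excerpt from the DoJ glossaries. No LLM is called; the code only builds the prompt text.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class AgentConfig:
    role: str
    model: str          # model identifier; the assignment per role is an assumption
    instructions: str   # illustrative role instructions, not from the paper

AGENTS: Dict[str, AgentConfig] = {
    "translator": AgentConfig(
        "translator", "gpt-4o",
        "Translate the passage, keeping a formal judicial register."),
    "annotator": AgentConfig(
        "annotator", "gpt-4o",
        "List translation errors with a category and a suggested fix."),
    "proofreader": AgentConfig(
        "proofreader", "gpt-3.5-turbo",
        "Revise the translation so every listed error is resolved."),
}

# Illustrative entries only; a real setup would load a full terminology database.
GLOSSARY: Dict[str, str] = {
    "上訴人": "appellant",
    "答辯人": "respondent",
    "終審法院": "Court of Final Appeal",
}

def matching_terms(source: str, glossary: Dict[str, str]) -> Dict[str, str]:
    """Return only the glossary entries whose source term occurs in the text."""
    return {term: rendering for term, rendering in glossary.items()
            if term in source}

def build_prompt(config: AgentConfig, source: str,
                 glossary: Dict[str, str]) -> str:
    """Assemble role instructions, relevant terminology, and the source text."""
    lines = [config.instructions]
    terms = matching_terms(source, glossary)
    if terms:
        lines.append("Use these term renderings:")
        lines.extend(f"- {term} => {rendering}"
                     for term, rendering in terms.items())
    lines.append("Source:")
    lines.append(source)
    return "\n".join(lines)

prompt = build_prompt(AGENTS["translator"], "上訴人向終審法院提出上訴。", GLOSSARY)
print(prompt)
```

Filtering the glossary to terms that actually occur in the passage keeps prompts short, which matters when a terminology database holds many entries.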
Topics
- Multi-agent Systems
- Legal Machine Translation
- Large Language Models
- Hong Kong Case Law
- Translation Quality Evaluation
- Cost Reduction
Best for: Research Scientist, AI Scientist, NLP Engineer, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.