TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

2025-06-12 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Data Science & Analytics · Depth: Expert, extended

Summary

TransLaw is a multi-agent framework and large-scale dataset for professional translation of Hong Kong case law, a task made difficult by intricate legal terminology and cultural nuance. Three specialized LLM-powered agents (Translator, Annotator, and Proofreader) collaborate to produce the translation. In an evaluation covering 13 open-source and commercial LLMs, TransLaw outperformed GPT-4o in legal semantic accuracy, structural coherence, and stylistic fidelity, though it still trails human experts in contextualization and naturalness. It also cuts translation costs: for the FACC 1/2021 case, it was 3,972 times cheaper than professional human services and 10.26% cheaper than using GPT-4o directly. For benchmarking, the authors built a Bilingual Judgment Corpus (BJC) of 344 Hong Kong Court of Final Appeal decisions (1997–2022).
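The two cost figures use different yardsticks, and a small sketch may help keep them apart. It assumes "N times cheaper" means the ratio of human cost to TransLaw cost, and "P% cheaper" means the saving relative to the GPT-4o cost. The function name and the sample amounts are hypothetical and are not figures from the paper.

```python
def cost_comparison(human_cost: float, gpt4o_cost: float, translaw_cost: float):
    """Return (fold reduction vs. human, percent saving vs. GPT-4o).

    Assumed reading of the summary's figures:
      fold    = human_cost / translaw_cost
      percent = (gpt4o_cost - translaw_cost) / gpt4o_cost * 100
    """
    if min(human_cost, gpt4o_cost, translaw_cost) <= 0:
        raise ValueError("all costs must be positive")
    fold = human_cost / translaw_cost
    percent = (gpt4o_cost - translaw_cost) / gpt4o_cost * 100
    return fold, percent


# Hypothetical amounts chosen only to show the arithmetic.
fold, percent = cost_comparison(human_cost=1000.0, gpt4o_cost=2.0, translaw_cost=1.0)
print(f"{fold:.0f}x cheaper than human, {percent:.1f}% cheaper than GPT-4o")
```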

Key takeaway

For legal professionals and AI scientists building specialized translation systems, TransLaw shows that a multi-agent LLM framework can raise accuracy and lower costs in a complex domain like Hong Kong case law. Consider a collaborative agent architecture with distinct roles (Translator, Annotator, Proofreader) and domain-specific feedback mechanisms to surpass general-purpose LLMs like GPT-4o. Human review remains important for contextualization and stylistic naturalness.

Key insights

Multi-agent LLM systems can achieve high-quality, cost-effective legal translation by simulating human workflows.

Principles

Decompose complex tasks into specialized agent roles.
Incorporate multi-level error annotation for iterative refinement.
Utilize memory modules for continuous agent learning.

Method

TransLaw's workflow has three stages: a Translator agent produces a draft, an Annotator agent marks errors using a taxonomy of 30 subcategories, and a Proofreader agent iteratively refines the translation based on that feedback and on stored error triplets.
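The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three agents are passed in as plain callables rather than real LLM calls, the triplet layout (source span, faulty rendering, correction) is an assumed shape for the "error triplets", and the category label and placeholder strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed shape of a stored error triplet: (source span, faulty rendering, correction).
ErrorTriplet = Tuple[str, str, str]


@dataclass
class Annotation:
    category: str     # hypothetical label standing in for one of the 30 subcategories
    source_span: str
    faulty: str
    suggestion: str


@dataclass
class Memory:
    triplets: List[ErrorTriplet] = field(default_factory=list)

    def remember(self, annotations: List[Annotation]) -> None:
        # Store each new error once so later rounds can reuse it.
        for a in annotations:
            triplet = (a.source_span, a.faulty, a.suggestion)
            if triplet not in self.triplets:
                self.triplets.append(triplet)


def run_pipeline(
    source: str,
    translate: Callable[[str], str],
    annotate: Callable[[str, str], List[Annotation]],
    proofread: Callable[[str, str, List[Annotation], List[ErrorTriplet]], str],
    memory: Memory,
    max_rounds: int = 3,
) -> str:
    draft = translate(source)
    for _ in range(max_rounds):
        annotations = annotate(source, draft)
        if not annotations:          # nothing left to fix: stop early
            break
        memory.remember(annotations)
        draft = proofread(source, draft, annotations, memory.triplets)
    return draft


# Stub agents with placeholder strings (not real translations) to exercise the loop.
def stub_translate(source: str) -> str:
    return "DRAFT with term_A"


def stub_annotate(source: str, draft: str) -> List[Annotation]:
    if "term_A" in draft:
        return [Annotation("terminology", "source term", "term_A", "term_B")]
    return []


def stub_proofread(source, draft, annotations, triplets) -> str:
    for a in annotations:
        draft = draft.replace(a.faulty, a.suggestion)
    return draft


memory = Memory()
final = run_pipeline("SOURCE TEXT", stub_translate, stub_annotate, stub_proofread, memory)
print(final)
print(memory.triplets)
```

In a real setup each callable would wrap a prompt to an LLM. Keeping the agents as injected functions makes the control flow testable without any model access.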

In practice

Configure LLM agents (e.g., GPT-4o, GPT-3.5 Turbo) for specific translation roles.
Integrate terminology databases such as the Hong Kong Department of Justice (DoJ) glossaries for domain specificity.
Use human annotators in hybrid setups to enhance coherence and style.
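As one possible way to wire a terminology database into a Translator prompt, the sketch below looks up glossary terms in the source passage and lists the matching renderings as constraints. The function names, the prompt wording, and the dictionary format are assumptions; the sample entries are illustrative and are not copied from the DoJ glossaries.

```python
import re
from typing import Dict, List, Tuple


def find_glossary_terms(source: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (term, rendering) pairs for glossary terms found in the source.

    Longer terms are checked first so multi-word entries are listed ahead of
    the shorter terms they may contain.
    """
    hits = []
    for term in sorted(glossary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, source, flags=re.IGNORECASE):
            hits.append((term, glossary[term]))
    return hits


def build_translator_prompt(source: str, glossary: Dict[str, str]) -> str:
    """Assemble a prompt that pins matched terms to their glossary renderings."""
    lines = ["Translate the following judgment passage."]
    hits = find_glossary_terms(source, glossary)
    if hits:
        lines.append("Use these required term renderings:")
        lines.extend(f"- {term} => {rendering}" for term, rendering in hits)
    lines.append("Passage:")
    lines.append(source)
    return "\n".join(lines)


# Illustrative entries only; a real deployment would load the full glossary.
sample_glossary = {
    "bail": "保釋",
    "appellant": "上訴人",
    "leave to appeal": "上訴許可",
}

print(build_translator_prompt("The appellant applied for bail.", sample_glossary))
```

Injecting only the terms that actually occur in the passage keeps the prompt short, which matters when the glossary is large.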

Topics

Multi-agent Systems
Legal Machine Translation
Large Language Models
Hong Kong Case Law
Translation Quality Evaluation
Cost Reduction

Best for: Research Scientist, AI Scientist, NLP Engineer, Domain Expert


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.