TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing, Data Science & Analytics) · Depth: Expert, extended

Summary

TransLaw is a multi-agent framework and large-scale dataset for professional translation of Hong Kong case law, targeting challenges such as intricate legal terminology and culturally embedded nuances. It employs three specialized LLM-powered agents (Translator, Annotator, and Proofreader) that collaborate to produce high-accuracy translations. The system was evaluated with 13 open-source and commercial LLMs and outperformed GPT-4o in legal semantic accuracy, structural coherence, and stylistic fidelity, although it still trails human experts in contextualization and naturalness. TransLaw also cuts translation costs substantially: for the FACC 1/2021 case, it costs about 1/3,972 as much as professional human translation and 10.26% less than using GPT-4o directly. For benchmarking, the authors built a Bilingual Judgment Corpus (BJC) of 344 Hong Kong Court of Final Appeal decisions (1997–2022).
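
The two cost figures can be related with simple arithmetic. The sketch below is a minimal illustration, assuming the human-to-TransLaw cost ratio is 3,972 and that TransLaw's cost equals the direct GPT-4o cost multiplied by (1 − 0.1026). The function name `implied_costs` and the normalized base cost of 1 unit are hypothetical; the summary does not report absolute prices.

```python
# Hypothetical worked example: costs implied by the two reported ratios.
# Only the ratios come from the summary; the base cost is a normalized placeholder.

HUMAN_TO_TRANSLAW = 3972   # assumed meaning: human cost / TransLaw cost
SAVING_VS_GPT4O = 0.1026   # assumed meaning: TransLaw = GPT-4o * (1 - 0.1026)


def implied_costs(translaw_cost: float) -> dict[str, float]:
    """Return the human and direct-GPT-4o costs implied by a TransLaw cost."""
    human_cost = translaw_cost * HUMAN_TO_TRANSLAW
    # Rearranging TransLaw = GPT-4o * (1 - saving) gives GPT-4o = TransLaw / (1 - saving).
    gpt4o_cost = translaw_cost / (1 - SAVING_VS_GPT4O)
    return {"translaw": translaw_cost, "human": human_cost, "gpt4o_direct": gpt4o_cost}


if __name__ == "__main__":
    for name, value in implied_costs(1.0).items():  # TransLaw cost = 1 unit
        print(f"{name:>12}: {value:,.4f}")
```

With a normalized TransLaw cost of 1 unit, direct GPT-4o comes out at roughly 1.114 units and human translation at 3,972 units.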

Key takeaway

For legal professionals and AI scientists building specialized translation systems, TransLaw shows that a multi-agent LLM framework can improve accuracy and reduce costs in complex domains such as Hong Kong case law. Consider a collaborative agent architecture with distinct roles (Translator, Annotator, Proofreader) and domain-specific feedback mechanisms to outperform general-purpose LLMs such as GPT-4o, while keeping human review in the loop for contextual and stylistic naturalness.
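
One way to picture a domain-specific feedback mechanism is a glossary check that flags legal terms whose expected rendering is missing from a draft. This is a minimal sketch, not TransLaw's implementation: it assumes English-to-Chinese translation, uses naive case-insensitive substring matching, and the glossary entries and the names `GLOSSARY`, `TermFeedback`, and `glossary_feedback` are illustrative.

```python
# Hypothetical domain-specific feedback check: flag glossary terms that occur in
# the source but whose expected rendering is absent from the draft translation.
# Glossary entries are illustrative assumptions, not TransLaw's glossary.
from dataclasses import dataclass

# Illustrative English -> Traditional Chinese legal glossary (assumption).
GLOSSARY: dict[str, str] = {
    "Court of Final Appeal": "終審法院",
    "appellant": "上訴人",
    "respondent": "答辯人",
}


@dataclass(frozen=True)
class TermFeedback:
    source_term: str
    expected: str
    message: str


def glossary_feedback(source: str, draft: str,
                      glossary: dict[str, str]) -> list[TermFeedback]:
    """Return one feedback item per glossary term found in the source
    (naive case-insensitive substring match) whose expected rendering
    does not appear anywhere in the draft."""
    lowered = source.lower()
    feedback = []
    for term, expected in glossary.items():
        if term.lower() in lowered and expected not in draft:
            feedback.append(TermFeedback(
                term, expected, f"'{term}' should be rendered as '{expected}'"))
    return feedback


if __name__ == "__main__":
    source = "The appellant appealed to the Court of Final Appeal."
    draft = "上訴人向高等法院提出上訴。"  # wrong court name
    for item in glossary_feedback(source, draft, GLOSSARY):
        print(item.message)  # prints: 'Court of Final Appeal' should be rendered as '終審法院'
```

Structured items like these could be handed to a reviewing agent as hints; a production system would need tokenization-aware matching and a vetted glossary.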

Key insights

Multi-agent LLM systems can achieve high-quality, cost-effective legal translation by simulating human workflows.

Method

TransLaw's workflow chains three agents: a Translator agent produces the translation, an Annotator agent marks errors using a taxonomy of 30 subcategories, and a Proofreader agent iteratively refines the translation based on this feedback and on stored error triplets.
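
The workflow can be sketched as a loop in which the Annotator's findings are stored and handed back to the Proofreader. This is a hypothetical sketch: the agents are plain Python callables standing in for LLM calls, and the triplet layout (error subcategory, erroneous span, correction), the subcategory label "terminology", the stopping rule, and every name in the code are assumptions rather than TransLaw's actual schema or API.

```python
# Hypothetical sketch of a Translator -> Annotator -> Proofreader loop.
# Agents are plain callables standing in for LLM calls; the triplet layout
# (subcategory, erroneous span, correction) is an assumption, not TransLaw's schema.
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ErrorTriplet:
    subcategory: str  # label from the Annotator's error taxonomy (hypothetical values)
    span: str         # erroneous text found in the draft
    correction: str   # suggested replacement text


@dataclass
class ErrorMemory:
    """Stores error triplets across review rounds, skipping exact duplicates."""
    triplets: list[ErrorTriplet] = field(default_factory=list)

    def add(self, new: list[ErrorTriplet]) -> None:
        for t in new:
            if t not in self.triplets:
                self.triplets.append(t)


Translator = Callable[[str], str]
Annotator = Callable[[str, str], list[ErrorTriplet]]
Proofreader = Callable[[str, list[ErrorTriplet], list[ErrorTriplet]], str]


def translate_with_review(source: str, translator: Translator, annotator: Annotator,
                          proofreader: Proofreader, memory: ErrorMemory,
                          max_rounds: int = 3) -> str:
    """Draft once, then alternate annotation and proofreading until the
    Annotator reports no errors or max_rounds is reached."""
    draft = translator(source)
    for _ in range(max_rounds):
        errors = annotator(source, draft)
        if not errors:
            break  # nothing left to fix
        memory.add(errors)
        draft = proofreader(draft, errors, memory.triplets)
    return draft


# --- Toy agents for illustration only (no LLM involved) ---
def toy_translator(source: str) -> str:
    return "上訴人向高等法院提出上訴。"  # deliberately wrong court name


def toy_annotator(source: str, draft: str) -> list[ErrorTriplet]:
    if "Court of Final Appeal" in source and "高等法院" in draft:
        return [ErrorTriplet("terminology", "高等法院", "終審法院")]
    return []


def toy_proofreader(draft: str, errors: list[ErrorTriplet],
                    history: list[ErrorTriplet]) -> str:
    # Apply current feedback first, then any stored corrections from earlier rounds.
    for t in errors + history:
        draft = draft.replace(t.span, t.correction)
    return draft


if __name__ == "__main__":
    memory = ErrorMemory()
    result = translate_with_review(
        "The appellant appealed to the Court of Final Appeal.",
        toy_translator, toy_annotator, toy_proofreader, memory)
    print(result)
    print(memory.triplets)
```

Keeping the triplets in a memory object that persists across rounds (and across documents, if reused) is what lets the Proofreader act on earlier mistakes rather than only on the current round's feedback.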

Best for: Research Scientist, AI Scientist, NLP Engineer, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.