HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule
Summary
The Hong Kong Judgment Discourse Dataset (HKJudge) is introduced as the first sentence-level, expert-annotated legal discourse corpus for Hong Kong criminal judgments. It comprises approximately 290,000 sentences and 6.5 million tokens drawn from all five levels of Hong Kong's court hierarchy (1968-2024). HKJudge uses a two-tier annotation schema: each sentence is assigned one of 26 rhetorical roles, and sentences are further annotated with three span-level sentencing elements (charge, imprisonment term, and fine). The annotation was carried out by ten legal linguistics experts, who reached a high inter-annotator agreement (κ = 0.8). The dataset supports two tasks, rhetorical role classification and legal element extraction. The authors benchmark both tasks with BERT-based models, open-source LLMs, and commercial LLMs such as GPT-4, Claude, and Gemini. Fine-tuned LLMs outperformed the BERT models but still fell short of human expert performance, indicating that legal reasoning remains a challenge for LLMs. The HKJudge dataset and code are publicly available.
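The κ figure is a chance-corrected agreement score. Purely as an illustration, the sketch below computes Cohen's kappa for two annotators who each labelled the same sentences. With ten annotators the paper may report a multi-rater variant such as Fleiss' kappa, and this summary does not say which one was used. The role labels below are hypothetical placeholders, not HKJudge's actual 26-role inventory.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (illustrative only)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical role labels for five sentences.
annotator_1 = ["FACTS", "FACTS", "RULING", "REASONING", "FACTS"]
annotator_2 = ["FACTS", "REASONING", "RULING", "REASONING", "FACTS"]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # prints 0.6875
```

Here the annotators agree on 4 of 5 sentences (p_o = 0.8), but their label distributions alone would produce 0.36 agreement by chance, so kappa drops to about 0.69. This is why κ = 0.8 on a 26-label task is a stronger result than raw agreement would suggest.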
Key takeaway
If you are developing legal AI systems for Hong Kong or a similar common-law jurisdiction, prioritize expert-annotated datasets such as HKJudge. Transferring models directly from other jurisdictions is unreliable because Hong Kong judgments have their own discourse structure. Use HKJudge's two-tier schema and benchmarks to train and evaluate LLMs for tasks such as legal judgment prediction, and focus on fine-tuning to narrow the gap with human expert performance. This should improve the accuracy and reliability of your legal reasoning models.
Key insights
HKJudge provides a unique, expert-annotated corpus and schema for Hong Kong legal discourse, benchmarking LLMs against human performance.
Principles
- Sentence-level discourse annotation models judgment structure.
- Direct transfer of legal NLP models across jurisdictions is unreliable.
- Expert annotation is crucial for high-quality legal NLP datasets.
Method
A two-tier schema assigns each sentence one of 26 rhetorical roles and marks three span-level sentencing elements (charge, imprisonment term, fine) in criminal judgments. Legal experts annotate both tiers, and the resulting corpus is used to benchmark BERT-based models and both open-source and commercial LLMs.
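A minimal sketch of how one record in the two-tier schema might be represented. It assumes spans are stored as character offsets into the sentence. The class names, field names, and role labels (`FACTS`, `SENTENCING`) are hypothetical, and HKJudge's released format may differ. Only the three element types come from the summary.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Tier 2: the three span-level sentencing elements named in the summary.
ELEMENT_TYPES = {"charge", "imprisonment_term", "fine"}


@dataclass
class ElementSpan:
    element: str  # one of ELEMENT_TYPES
    start: int    # character offset into the sentence, inclusive (offset encoding is an assumption)
    end: int      # character offset, exclusive


@dataclass
class AnnotatedSentence:
    text: str
    role: str  # Tier 1: exactly one rhetorical role per sentence (label names are hypothetical)
    spans: List[ElementSpan] = field(default_factory=list)

    def validate(self, role_inventory: Set[str]) -> None:
        """Raise ValueError if the role or any span violates the schema."""
        if self.role not in role_inventory:
            raise ValueError(f"unknown rhetorical role: {self.role!r}")
        for s in self.spans:
            if s.element not in ELEMENT_TYPES:
                raise ValueError(f"unknown sentencing element: {s.element!r}")
            if not 0 <= s.start < s.end <= len(self.text):
                raise ValueError(f"span {s.start}-{s.end} is outside the sentence")

    def extract(self) -> List[Tuple[str, str]]:
        """Return (element, surface text) pairs for the Tier 2 spans."""
        return [(s.element, self.text[s.start:s.end]) for s in self.spans]


# Usage with a made-up sentence and a hypothetical two-label role inventory.
ROLES = {"FACTS", "SENTENCING"}
text = "The defendant is sentenced to 18 months' imprisonment for theft."
term = "18 months' imprisonment"
t0 = text.index(term)
c0 = text.index("theft")
example = AnnotatedSentence(
    text=text,
    role="SENTENCING",
    spans=[
        ElementSpan("imprisonment_term", t0, t0 + len(term)),
        ElementSpan("charge", c0, c0 + len("theft")),
    ],
)
example.validate(ROLES)
print(example.extract())
```

Keeping the sentence-level role and the span-level elements on the same record mirrors the two tiers. A sentence can carry a role with no spans, but a span never exists without the sentence that contains it.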
In practice
- Use HKJudge for legal judgment prediction (LJP) systems.
- Apply the two-tier schema for legal search and case analysis.
- Benchmark LLMs for legal reasoning tasks using HKJudge.
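Benchmarking the two tasks calls for different scores. This summary does not state which metrics the paper reports, so the sketch below assumes two common choices. Rhetorical role classification is scored with macro-F1, which makes rare roles count as much as frequent ones. Sentencing elements are scored with exact-match micro F1 over spans. All labels and span tuples are hypothetical, and free-text LLM outputs would first need to be parsed into this form.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned sentence by sentence")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    labels = set(gold) | set(pred)
    f1s = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        f1s.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0


def span_exact_f1(gold_spans, pred_spans):
    """Micro F1: a predicted (sentence_id, element, start, end) counts only on an exact match."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical role predictions for four sentences.
gold_roles = ["FACTS", "FACTS", "REASONING", "SENTENCING"]
pred_roles = ["FACTS", "REASONING", "REASONING", "SENTENCING"]
print(round(macro_f1(gold_roles, pred_roles), 4))  # prints 0.7778

# Hypothetical spans: one exact hit, one miss, one spurious prediction.
gold_spans = {(0, "charge", 10, 15), (0, "imprisonment_term", 30, 53)}
pred_spans = {(0, "charge", 10, 15), (0, "fine", 60, 70)}
print(span_exact_f1(gold_spans, pred_spans))  # prints 0.5
```

Exact matching is strict: a prediction of "18 months" against a gold span of "18 months' imprisonment" scores zero. A partial-overlap variant would be more forgiving, but it would also blur errors in the imprisonment term, which is the element that matters for sentencing.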
Topics
- Legal NLP
- Discourse Annotation
- Hong Kong Law
- Court Judgments
- Large Language Models
- Legal Judgment Prediction
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.