HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule
Summary
The Hong Kong Judgment Discourse Dataset (HKJudge) is presented as the first sentence-level, expert-annotated legal discourse corpus of Hong Kong judgments. It covers criminal judgments from all five levels of the court hierarchy and contains approximately 290,000 sentences and 6.5 million tokens. A two-tier discourse schema assigns each sentence one of 26 rhetorical roles and annotates three sentencing elements (charge, imprisonment term, fine) at the span level. Ten annotators with legal linguistics expertise achieved an inter-annotator agreement of κ = 0.8. The work defines two tasks, rhetorical role classification and legal element extraction, and benchmarks four BERT-based models, two open-source LLMs, and four commercial LLMs on them.
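The κ = 0.8 figure is a chance-corrected agreement score. The summary does not say which kappa variant was used; with ten annotators it could be Fleiss' kappa or an average of pairwise Cohen's kappa. As a minimal sketch, assuming just two annotators labelling the same sentences, Cohen's kappa compares observed agreement with the agreement expected from each annotator's own label frequencies. The role labels used here are hypothetical placeholders, not HKJudge's actual 26 roles.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical role labels for six sentences, one list per annotator.
annotator_1 = ["FACT", "FACT", "REASONING", "RULING", "FACT", "REASONING"]
annotator_2 = ["FACT", "REASONING", "REASONING", "RULING", "FACT", "REASONING"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here the annotators agree on five of six sentences (p_o = 5/6), but because some of that agreement is expected by chance (p_e = 13/36), kappa comes out lower, at 17/23 ≈ 0.74.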
Key takeaway
For NLP engineers and AI scientists building legal AI systems, this work shows the value of expert-annotated discourse corpora. Consider adapting HKJudge's two-tier annotation schema (sentence-level rhetorical roles plus span-level sentencing elements) for your own legal text analysis projects. Structuring judgments this way may help models that predict judgment outcomes or extract specific legal elements from long, complex court documents.
Key insights
Expert-annotated legal discourse corpora let AI systems be trained and evaluated on how court judgments are structured: what the court finds, how it reasons, and what it rules.
Principles
- Sentence-level discourse annotation models judgment structure.
- A two-tier schema captures facts, reasoning, and rulings.
Method
A two-tier discourse schema assigns 26 rhetorical roles at the sentence level and annotates three sentencing elements (charge, imprisonment term, fine) at the span level, applied by legal linguistics experts.
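One way to hold both tiers in code is a sentence record that carries a single rhetorical role plus a list of character-offset spans for the sentencing elements. This is a hypothetical representation, not HKJudge's released file format: the class names, the offset convention, and the role label "SENTENCE" are assumptions. Only the three element types (charge, imprisonment term, fine) come from the schema described above.

```python
from dataclasses import dataclass, field

# Tier 2: the three span-level sentencing elements named in the schema.
ELEMENT_TYPES = {"charge", "imprisonment_term", "fine"}


@dataclass
class ElementSpan:
    element_type: str  # one of ELEMENT_TYPES
    start: int         # character offset into the sentence text (inclusive)
    end: int           # character offset into the sentence text (exclusive)


@dataclass
class AnnotatedSentence:
    text: str
    rhetorical_role: str  # tier 1: one of the 26 roles (names here are placeholders)
    elements: list[ElementSpan] = field(default_factory=list)  # tier 2 spans

    def validate(self) -> None:
        """Reject unknown element types and spans that fall outside the text."""
        for span in self.elements:
            if span.element_type not in ELEMENT_TYPES:
                raise ValueError(f"unknown element type: {span.element_type}")
            if not 0 <= span.start < span.end <= len(self.text):
                raise ValueError(f"span {span.start}-{span.end} out of bounds")

    def element_text(self) -> dict[str, list[str]]:
        """Group the surface text of each span by element type."""
        out: dict[str, list[str]] = {}
        for span in self.elements:
            out.setdefault(span.element_type, []).append(self.text[span.start:span.end])
        return out


# Hypothetical example sentence with two annotated elements.
text = "The defendant is sentenced to 18 months' imprisonment for theft."
term = "18 months' imprisonment"
term_start = text.index(term)
charge_start = text.index("theft")
sentence = AnnotatedSentence(
    text=text,
    rhetorical_role="SENTENCE",  # hypothetical tier-1 label
    elements=[
        ElementSpan("imprisonment_term", term_start, term_start + len(term)),
        ElementSpan("charge", charge_start, charge_start + len("theft")),
    ],
)
sentence.validate()
print(sentence.element_text())
```

Keeping spans as character offsets ties each element back to the exact sentence text, so extracted values can be checked against the source and a sentence's role and its elements travel together.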
In practice
- Utilize a two-tier annotation schema for legal text.
- Benchmark LLMs for legal rhetorical role classification.
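When benchmarking BERT-based models or LLMs on rhetorical role classification, predicted labels (for LLMs, after parsing the generated text into one of the role names) can be scored against the gold annotations. The summary does not name the evaluation metric; this sketch assumes macro-averaged F1, a common choice for many-class tasks because it weights every role equally rather than every sentence, so rare roles are not drowned out. The labels are hypothetical placeholders.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label that appears in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted role p where the gold role was different
            fn[g] += 1  # gold role g was missed
    scores = []
    for label in set(gold) | set(pred):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Unweighted mean: each role counts equally regardless of frequency.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold and predicted roles for four sentences.
gold_roles = ["FACT", "FACT", "REASONING", "RULING"]
pred_roles = ["FACT", "REASONING", "REASONING", "RULING"]
print(round(macro_f1(gold_roles, pred_roles), 3))
```

In this toy case, FACT and REASONING each score 2/3 and RULING scores 1, giving a macro-F1 of 7/9 ≈ 0.78.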
Topics
- Legal NLP
- Corpus Annotation
- Discourse Analysis
- Legal Judgment Prediction
- Large Language Models
- Hong Kong Law
Best for: Research Scientist, NLP Engineer, AI Scientist, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.