HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, long

Summary

The Hong Kong Judgment Discourse Dataset (HKJudge) is presented as the first sentence-level, expert-annotated legal discourse corpus for Hong Kong criminal judgments. It comprises approximately 290,000 sentences and 6.5 million tokens drawn from all five levels of Hong Kong's court hierarchy (1968-2024) and uses a two-tier annotation schema: each sentence is assigned one of 26 rhetorical roles, and sentences are further annotated with three span-level sentencing elements (charge, imprisonment term, and fine). Ten legal linguistics experts produced the annotations, with a reported inter-annotator agreement of κ=0.8. The dataset supports two tasks, rhetorical role classification and legal element extraction, which are benchmarked with BERT-based models, open-source LLMs, and commercial LLMs such as GPT-4, Claude, and Gemini. Fine-tuned LLMs outperformed the BERT-based models but did not match human expert performance, which points to open challenges in LLM legal reasoning. The HKJudge dataset and code are publicly available.
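To make the reported agreement figure concrete, here is a minimal sketch of Cohen's kappa for two annotators labelling the same sentences. The summary does not say which kappa variant the authors used for their ten annotators (a multi-rater statistic such as Fleiss' kappa is equally plausible), so the two-rater form is an assumption chosen for illustration. The role names `FACT`, `REASONING`, and `RULING` are hypothetical placeholders, not labels from the HKJudge schema.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical role labels for six sentences (NOT the real HKJudge label set).
annotator_a = ["FACT", "FACT", "REASONING", "RULING", "FACT", "REASONING"]
annotator_b = ["FACT", "FACT", "REASONING", "RULING", "REASONING", "REASONING"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on five of six sentences, but kappa is lower than the raw agreement rate because some of that agreement would be expected by chance.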

Key takeaway

NLP engineers building legal AI systems for Hong Kong or similar common-law jurisdictions should prioritize expert-annotated datasets such as HKJudge. Direct model transfer from other jurisdictions is unreliable because discourse structures differ between jurisdictions. HKJudge's two-tier schema and benchmarks can be used to train and evaluate LLMs for rhetorical role classification and legal element extraction, and for downstream tasks such as legal judgment prediction, with fine-tuning as the most promising route to narrowing the remaining gap with human expert performance.

Key insights

HKJudge provides an expert-annotated corpus and annotation schema for Hong Kong legal discourse, along with benchmarks that compare LLMs against human expert performance.

Method

A two-tier schema assigns one of 26 rhetorical roles to each sentence of a criminal judgment and marks three span-level sentencing elements (charge, imprisonment term, fine) within sentences. Legal linguistics experts annotate the corpus, and BERT-based models, open-source LLMs, and commercial LLMs are then benchmarked on it.
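The two-tier schema described above can be pictured as one record per sentence: a single rhetorical role plus zero or more typed character spans. The following is a minimal sketch under stated assumptions, not the dataset's actual file format. The class names, field names, offset convention, and the role label `SENTENCE_IMPOSED` are all hypothetical; only the three element types (charge, imprisonment term, fine) come from the summary.

```python
from dataclasses import dataclass, field
from typing import List

# The three span-level sentencing elements named in the summary.
ELEMENT_TYPES = {"charge", "imprisonment_term", "fine"}


@dataclass(frozen=True)
class ElementSpan:
    element_type: str  # one of ELEMENT_TYPES
    start: int         # character offset, inclusive (assumed convention)
    end: int           # character offset, exclusive (assumed convention)


@dataclass
class AnnotatedSentence:
    text: str
    rhetorical_role: str  # tier 1: exactly one role per sentence
    elements: List[ElementSpan] = field(default_factory=list)  # tier 2: spans

    def add_element(self, element_type: str, start: int, end: int) -> None:
        """Attach a span after validating its type and offsets."""
        if element_type not in ELEMENT_TYPES:
            raise ValueError(f"unknown element type: {element_type!r}")
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span offsets fall outside the sentence")
        self.elements.append(ElementSpan(element_type, start, end))

    def element_texts(self, element_type: str) -> List[str]:
        """Return the surface strings of all spans of the given type."""
        return [
            self.text[span.start:span.end]
            for span in self.elements
            if span.element_type == element_type
        ]


# Invented example sentence; the role label is a placeholder, not an HKJudge role.
text = "The defendant is sentenced to 3 years' imprisonment for theft."
sentence = AnnotatedSentence(text=text, rhetorical_role="SENTENCE_IMPOSED")

for element_type, surface in [
    ("imprisonment_term", "3 years' imprisonment"),
    ("charge", "theft"),
]:
    start = text.index(surface)
    sentence.add_element(element_type, start, start + len(surface))

print(sentence.element_texts("imprisonment_term"))
print(sentence.element_texts("charge"))
```

Keeping the role at sentence level and the elements as offsets into the sentence text mirrors the split between the two benchmark tasks: role classification reads only `rhetorical_role`, while element extraction is scored against `elements`.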

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.