LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study investigating small, on-premises Large Language Models (LLMs) found that these models exhibit significant "overrefusal" when prompted within a criminal legal context. Researchers assessed several modern small LLMs, commonly considered for on-device assistance, to understand how refusal rates are impacted by legal prompts. Surprisingly, the inclusion of authority-style prefixes, such as "you are acting as an assistant of the national supreme court" or "[...] defense lawyer," systematically increased refusal rates by 2 to 20 times compared to a no-prefix baseline. Conversely, a known role-play jailbreak prefix showed inconsistent effects, sharply increasing refusals in some models while barely affecting others. This finding suggests that small, deployable LLMs are unstable when framed with contextual information that a real institutional user would naturally introduce, highlighting a need for further investigation to mitigate potential biases.

Key takeaway

For legal professionals experimenting with on-premises LLMs for tasks like translation or reformulation, you should be aware that contextual framing significantly impacts model behavior. Your use of authority-style prefixes, such as "you are acting as an assistant of the national supreme court," can dramatically increase refusal rates, potentially introducing biases and slowing case processing. Therefore, carefully evaluate how you prompt these models and investigate their refusal patterns to ensure consistent and unbiased assistance.

Key insights

Small LLMs show significant overrefusal in legal contexts, especially with authority-style prefixes, indicating instability under contextual framing.

Principles

Method

The study investigated modern small LLMs for on-device assistance, assessing overrefusal impact on legal prompts by comparing refusal rates with authority-style and role-play jailbreak prefixes against a no-prefix baseline.

In practice

Topics

Best for: Research Scientist, CTO, VP of Engineering/Data, Legal Professional, AI Ethicist, AI Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.