LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context
Summary
A study investigating small, on-premises Large Language Models (LLMs) found that these models exhibit significant "overrefusal" when prompted within a criminal legal context. Researchers assessed several modern small LLMs, commonly considered for on-device assistance, to understand how refusal rates are impacted by legal prompts. Surprisingly, the inclusion of authority-style prefixes, such as "you are acting as an assistant of the national supreme court" or "[...] defense lawyer," systematically increased refusal rates by 2 to 20 times compared to a no-prefix baseline. Conversely, a known role-play jailbreak prefix showed inconsistent effects, sharply increasing refusals in some models while barely affecting others. This finding suggests that small, deployable LLMs are unstable when framed with contextual information that a real institutional user would naturally introduce, highlighting a need for further investigation to mitigate potential biases.
Key takeaway
For legal professionals experimenting with on-premises LLMs for tasks like translation or reformulation, you should be aware that contextual framing significantly impacts model behavior. Your use of authority-style prefixes, such as "you are acting as an assistant of the national supreme court," can dramatically increase refusal rates, potentially introducing biases and slowing case processing. Therefore, carefully evaluate how you prompt these models and investigate their refusal patterns to ensure consistent and unbiased assistance.
Key insights
Small LLMs show significant overrefusal in legal contexts, especially with authority-style prefixes, indicating instability under contextual framing.
Principles
- Authority-style prefixes increase LLM refusal rates significantly.
- Small on-prem LLMs are unstable with contextual legal framing.
- Overrefusal can introduce biases in legal case processing.
Method
The study investigated modern small LLMs for on-device assistance, assessing overrefusal impact on legal prompts by comparing refusal rates with authority-style and role-play jailbreak prefixes against a no-prefix baseline.
In practice
- Test LLM refusal rates with specific legal role prompts.
- Monitor small LLMs for bias from selective refusal.
- Avoid authority-style prefixes in legal LLM interactions.
Topics
- Large Language Models
- On-premises LLMs
- Legal AI
- Prompt Engineering
- AI Bias
- Refusal Rates
Best for: Research Scientist, CTO, VP of Engineering/Data, Legal Professional, AI Ethicist, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.