Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

2026-05-04 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

DriftBench, a new benchmark, evaluates how well large language models (LLMs) stay faithful to their original objectives and constraints during multi-turn scientific ideation. The study covers 2,146 benchmark runs across seven models from five providers and 38 research briefs, and finds that iterative pressure consistently increases structural complexity and often reduces adherence to the initial constraints. A key finding is the "knows-but-violates" (KBV) phenomenon: models accurately restate constraints (97.3% recall across models) while violating those same constraints in their proposals. KBV rates range from 8% (GPT-5.4) to 99% (Sonnet 4.6), with five of seven models exceeding 50%. Structured checkpointing partially reduces KBV rates but does not eliminate the dissociation, and complexity inflation persists. Human validation confirms that the LLM judge under-detects violations, so the reported violation rates are, if anything, underestimates. The benchmark data, including briefs, prompts, rubrics, transcripts, and scores, are openly released.

Key takeaway

AI Architects and NLP Engineers designing multi-turn LLM applications should recognize that models can "know" constraints yet still violate them. Validate the content of each proposal explicitly rather than relying on recall checks alone, and account for model-specific drift, since KBV rates vary widely (8% to 99%). Structured checkpoints and automated monitoring mitigate, but do not eliminate, constraint drift and complexity inflation in iterative ideation workflows.

Key insights

LLMs often violate constraints in multi-turn ideation despite near-perfect declarative recall (97.3%), a "knows-but-violates" dissociation.

Principles

Iterative pressure increases LLM output complexity.
High declarative recall of constraints does not guarantee adherence to them.
Drift patterns vary significantly across models.

Method

DriftBench evaluates constraint adherence in multi-turn LLM ideation using structured research briefs with hard constraints, restatement probes, and multi-faceted scoring, including human validation.
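
As an illustration of how a knows-but-violates rate could be derived from per-run results, here is a minimal sketch. It assumes a hypothetical record with two boolean fields (whether the restatement probe was answered correctly, and whether the proposal broke at least one hard constraint) and assumes KBV is measured over runs where recall succeeded. The actual DriftBench data schema and scoring definition may differ.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-run result; not the DriftBench schema."""
    recalled: bool  # model restated the constraints correctly when probed
    violated: bool  # proposal broke at least one hard constraint


def recall_rate(runs: list[RunRecord]) -> float:
    """Fraction of runs in which the constraints were restated correctly."""
    if not runs:
        return 0.0
    return sum(r.recalled for r in runs) / len(runs)


def kbv_rate(runs: list[RunRecord]) -> float:
    """Assumed definition: among runs with correct recall,
    the fraction whose proposal still violates a constraint."""
    known = [r for r in runs if r.recalled]
    if not known:
        return 0.0
    return sum(r.violated for r in known) / len(known)


# Toy data for illustration only, not benchmark results.
runs = [
    RunRecord(recalled=True, violated=True),
    RunRecord(recalled=True, violated=False),
    RunRecord(recalled=False, violated=True),
    RunRecord(recalled=True, violated=True),
]
print(f"recall={recall_rate(runs):.2f} kbv={kbv_rate(runs):.2f}")
```

Reporting the two rates side by side makes the dissociation visible: recall can be high while the KBV rate is also high.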

In practice

Pair restatement checks with proposal content validation.
Implement periodic checkpoints in LLM workflows.
Automated constraint monitoring can slightly improve adherence.
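
The practices above can be sketched as a single checkpoint function that runs a recall check and a content check together. Everything here is a hypothetical illustration, not the benchmark's implementation: the `Constraint` type, the keyword-based recall test, and the structured `proposal` dictionary are assumptions. In a real workflow the proposal fields would need to be extracted from model output, and the recall check would likely be more robust than keyword matching.

```python
from typing import Callable, NamedTuple


class Constraint(NamedTuple):
    """Hypothetical hard constraint for a research brief."""
    name: str
    keyword: str                    # phrase expected in a correct restatement
    check: Callable[[dict], bool]   # returns True if the proposal satisfies it


def checkpoint(constraints: list[Constraint],
               restatement: str,
               proposal: dict) -> dict:
    """Pair a restatement (recall) check with proposal content validation."""
    text = restatement.lower()
    recalled = [c.name for c in constraints if c.keyword.lower() in text]
    violated = [c.name for c in constraints if not c.check(proposal)]
    # Constraints the model restated correctly but broke anyway.
    kbv = sorted(set(recalled) & set(violated))
    return {"recalled": recalled, "violated": violated,
            "knows_but_violates": kbv}


# Illustrative constraints and values only.
constraints = [
    Constraint("budget", "budget",
               lambda p: p.get("budget_usd", 0) <= 50_000),
    Constraint("no_human_subjects", "human subjects",
               lambda p: not p.get("human_subjects", False)),
]

report = checkpoint(
    constraints,
    restatement="The budget must stay under $50,000 and no human subjects are involved.",
    proposal={"budget_usd": 80_000, "human_subjects": False},
)
print(report)
```

Calling such a function every few turns gives a periodic checkpoint. A non-empty `knows_but_violates` list is the signal that a passing recall check alone would have missed.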

Topics

DriftBench
Constraint Adherence
Multi-Turn LLM Interaction
Knows-But-Violates Rate
Complexity Inflation

Code references

kruthof/driftbench

Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.