Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
Summary
DriftBench, a new benchmark, evaluates how large language models (LLMs) maintain fidelity to original objectives and constraints during multi-turn scientific ideation. The study, involving 2,146 benchmark runs across seven models from five providers and 38 research briefs, reveals that iterative pressure consistently increases structural complexity and often reduces adherence to initial constraints. A key finding is the "knows-but-violates" (KBV) phenomenon, where models accurately restate constraints (97.3% recall across models) but simultaneously violate them in their proposals. KBV rates range from 8% (GPT-5.4) to 99% (Sonnet 4.6), with five of seven models exceeding 50%. Structured checkpointing partially reduces KBV rates but does not eliminate the dissociation, and complexity inflation persists. Human validation confirms that the LLM judge under-detects violations, suggesting reported adherence scores are conservative. The benchmark data, including briefs, prompts, rubrics, transcripts, and scores, is openly released.
Key takeaway
AI architects and NLP engineers designing multi-turn LLM applications should recognize that models can "know" constraints yet still violate them. Validate the content of each proposal explicitly rather than relying on recall checks alone, and account for model-specific drift, since KBV rates vary widely (from 8% to 99%). Structured checkpoints and automated monitoring mitigate, but do not eliminate, constraint drift and complexity inflation in iterative ideation workflows.
Key insights
LLMs often violate constraints in multi-turn ideation despite near-perfect declarative recall (97.3% across models), a "knows-but-violates" dissociation.
Principles
- Iterative pressure increases LLM output complexity.
- Accurate declarative recall of a constraint does not guarantee adherence to it.
- Drift patterns vary significantly across LLM models.
Method
DriftBench evaluates constraint adherence in multi-turn LLM ideation using structured research briefs with hard constraints, restatement probes, and multi-faceted scoring, including human validation.
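One way to read the KBV rate is as a conditional share: among runs where the model restated the constraints correctly, the fraction whose proposal still violates a hard constraint. The Python sketch below computes that figure. The `RunRecord` fields and the choice of denominator are assumptions made for illustration; this summary does not give DriftBench's exact definition or data schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """One benchmark run. Field names are hypothetical, not DriftBench's schema."""
    model: str
    recalled_constraints: bool  # restatement probe judged correct
    violated_constraint: bool   # proposal breaks at least one hard constraint


def kbv_rate(runs: list[RunRecord]) -> Optional[float]:
    """Share of correctly recalling runs whose proposal still violates a constraint.

    Assumed definition: the denominator is the set of runs with correct recall.
    Returns None when no run recalled the constraints correctly.
    """
    recalled = [r for r in runs if r.recalled_constraints]
    if not recalled:
        return None
    return sum(r.violated_constraint for r in recalled) / len(recalled)


example_runs = [
    RunRecord("model-a", recalled_constraints=True, violated_constraint=True),
    RunRecord("model-a", recalled_constraints=True, violated_constraint=False),
    RunRecord("model-a", recalled_constraints=False, violated_constraint=True),
]
example_rate = kbv_rate(example_runs)
```

Under this reading, a run that fails the restatement probe is excluded from the rate altogether, which keeps plain forgetting separate from the knows-but-violates case.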
In practice
- Pair restatement checks with proposal content validation.
- Implement periodic checkpoints in LLM workflows.
- Add automated constraint monitoring, which can slightly improve adherence.
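The recommendations above can be combined into a single checkpoint step that checks recall and proposal content separately, so a correct restatement never counts as evidence of compliance. The following is a minimal Python sketch under stated assumptions: `Constraint`, `checkpoint`, and the status labels are hypothetical names, and the substring and keyword checks stand in for whatever validator (rule-based, human, or LLM judge) a real workflow would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Constraint:
    """A hard constraint from the brief (hypothetical structure)."""
    name: str
    text: str                            # wording expected in a restatement
    is_satisfied: Callable[[str], bool]  # content check applied to the proposal


def checkpoint(constraints: list[Constraint], restatement: str, proposal: str) -> dict[str, str]:
    """Classify each constraint by recall AND by proposal content.

    Recall is approximated with a case-insensitive substring match, which is
    an assumption for illustration only.
    """
    report = {}
    for c in constraints:
        recalled = c.text.lower() in restatement.lower()
        satisfied = c.is_satisfied(proposal)
        if satisfied:
            report[c.name] = "ok" if recalled else "complies_without_recall"
        else:
            report[c.name] = "knows_but_violates" if recalled else "violates"
    return report


# Toy example: the keyword check is a stand-in for real content validation.
budget = Constraint(
    name="budget",
    text="budget under $10k",
    is_satisfied=lambda p: "gpu cluster" not in p.lower(),
)
report = checkpoint(
    [budget],
    restatement="The budget under $10k must hold.",
    proposal="We rent a GPU cluster for three months.",
)
```

Running such a step every few turns gives a per-constraint status that can be logged or fed back to the model. Since the summary reports that checkpointing reduces but does not eliminate violations, the report is better treated as a monitoring signal than as a guarantee.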
Topics
- DriftBench
- Constraint Adherence
- Multi-Turn LLM Interaction
- Knows-But-Violates Rate
- Complexity Inflation
Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.