Dense Contexts Are Hard Contexts: Lexical Density Limits Effective Context in LLMs

2026-05-04 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A study by Dettori et al. finds that lexical density, the rate at which a context introduces distinct information, substantially limits the effective context window of Large Language Models. The authors evaluated eight open-weight LLMs (9B–685B parameters) on three "find-the-needle" benchmarks of increasing density: MK-NIAH (MATTR=0.58), Scene-Rules (MATTR=0.75), and WordChecker (MATTR=1.0). With context length held at ~12k tokens and needle position controlled, performance collapsed sharply as density rose: models that achieved near-perfect retrieval in sparse contexts dropped below 60% accuracy on denser ones. Further experiments confirmed that reducing lexical density consistently restored performance. Effective context capacity is therefore a function of lexical density, a previously overlooked factor that interacts with positional effects and causes degradation earlier than length alone would predict.

Key takeaway

Machine Learning Engineers evaluating LLMs for applications with dense, information-rich inputs should treat lexical density as a critical performance factor. In such scenarios, effective context capacity degrades significantly earlier than token length or needle position alone would predict. Avoid over-compressing prompts, since compression can inadvertently increase density and reduce reliability. Prioritize testing with lexically diverse contexts to assess real-world performance accurately.

Key insights

Lexical density, the rate at which a context introduces distinct information, is a critical and overlooked factor limiting LLM long-context performance.

Principles

Effective context capacity is density-dependent.
High density amplifies positional decay.
Redundant text allows skimming; dense text requires full processing.

Method

The study used three "find-the-needle" benchmarks (~12k tokens each, controlled needle position) that differ in Moving-Average Type-Token Ratio (MATTR) to quantify the impact of density on eight LLMs (9B–685B parameters).
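
For readers unfamiliar with the metric, MATTR averages the type-token ratio (distinct tokens divided by total tokens) over every fixed-size sliding window of a text. The sketch below is one minimal way to compute it. The summary does not state the window size or tokenizer used in the study, so the 50-token window, the regex word tokenizer, and the function names `tokenize` and `mattr` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, apostrophes.
    # The study's actual tokenization is not given in this summary.
    return re.findall(r"[a-z0-9']+", text.lower())


def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio over sliding windows.

    window=50 is an illustrative default, not the study's setting.
    Texts no longer than the window fall back to plain TTR.
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        return len(set(tokens)) / n

    # Count tokens in the first window, then slide one token at a time.
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]  # keep len(counts) equal to distinct types
        counts[tokens[i]] += 1
        total += len(counts) / window
    return total / (n - window + 1)


if __name__ == "__main__":
    sparse = "the cat sat on the mat " * 200
    dense = " ".join(f"term{i}" for i in range(1200))
    print("sparse:", round(mattr(tokenize(sparse)), 3))
    print("dense: ", round(mattr(tokenize(dense)), 3))
```

A value near 1.0 means almost every token in each window is new, which corresponds to the densest benchmark condition reported above; heavily repetitive text scores much lower.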

In practice

Test LLMs with information-rich, dense inputs.
Weigh prompt compression against the density it adds.
Design agentic configurations with density in mind.
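
One way to act on the first point is to build test contexts that hold length and needle position fixed while varying how repetitive the surrounding text is. The sketch below is a toy harness under stated assumptions, not the construction used for MK-NIAH, Scene-Rules, or WordChecker: the filler is made of synthetic placeholder words, vocabulary size stands in for density, and `make_filler`, `place_needle`, and `build_grid` are hypothetical helper names.

```python
def make_filler(length, vocab_size):
    # Hypothetical filler: cycles through vocab_size placeholder words.
    # A smaller vocabulary repeats more often, giving lower lexical density.
    return [f"w{i % vocab_size}" for i in range(length)]


def place_needle(filler, needle, depth):
    # depth is a fraction of the filler: 0.0 = start, 1.0 = end.
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    cut = int(depth * len(filler))
    return filler[:cut] + needle + filler[cut:]


def build_grid(length, vocab_sizes, depths, needle):
    """Return {(vocab_size, depth): tokens}, every context `length` tokens long."""
    grid = {}
    for vocab_size in vocab_sizes:
        body = make_filler(length - len(needle), vocab_size)
        for depth in depths:
            grid[(vocab_size, depth)] = place_needle(body, needle, depth)
    return grid


if __name__ == "__main__":
    needle = ["the", "access", "code", "is", "7421"]
    grid = build_grid(
        length=12000,                  # matches the ~12k tokens cited above
        vocab_sizes=[50, 2000, 11995], # illustrative density levels
        depths=[0.0, 0.25, 0.5, 0.75, 1.0],
        needle=needle,
    )
    for (vocab_size, depth), tokens in sorted(grid.items()):
        print(vocab_size, depth, len(tokens), len(set(tokens)))
```

Each cell of the grid can then be joined into a prompt and scored for retrieval accuracy, so that any drop across vocabulary sizes at a fixed depth is attributable to density rather than to length or position.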

Topics

Lexical Density
LLM Context Window
Long-Context Performance
Needle-in-a-Haystack
Prompt Engineering
Model Evaluation

Code references

gkamradt/LLMTest_NeedleInAHaystack

Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.