Structured Context Engineering for File-Native Agentic Systems

2026-02-09 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, quick

Summary

A new paper by Damon McMillan, "Structured Context Engineering for File-Native Agentic Systems," systematically studies context engineering for structured data, using SQL generation as a proxy for programmatic agent operations. The research covered 9,649 experiments across 11 models, 4 data formats (YAML, Markdown, JSON, and TOON), and SQL schemas ranging from 10 to 10,000 tables. Frontier models (Opus 4.5, GPT-5.2, and Gemini 2.5 Pro) significantly outperformed open-source models (DeepSeek V3.2, Kimi K2, and Llama 4). The frontier models benefited from filesystem-based context retrieval, while the open-source models showed less convincing results with it. The study also identified a "grep tax": models were unfamiliar with the TOON format and consumed 138% to 740% more tokens with it than with YAML on schemas of 500 to 10,000 tables, even though TOON files are smaller.
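To put the grep tax figures in concrete terms, "N% more tokens" can be converted into a multiplier on the baseline token count. The sketch below does only that arithmetic; the function name and the 10,000-token YAML baseline are assumptions made for illustration and do not come from the paper.

```python
def overhead_to_multiplier(percent_more: float) -> float:
    """Convert 'N% more tokens' into a multiplier on the baseline count."""
    return 1.0 + percent_more / 100.0


# Overhead figures quoted in the summary (TOON relative to YAML).
for percent_more in (138, 740):
    multiplier = overhead_to_multiplier(percent_more)
    print(f"{percent_more}% more tokens = {multiplier:.2f}x the YAML token count")

# Hypothetical baseline, chosen only to make the arithmetic tangible.
yaml_tokens = 10_000
toon_tokens_low = round(yaml_tokens * overhead_to_multiplier(138))
toon_tokens_high = round(yaml_tokens * overhead_to_multiplier(740))
print(toon_tokens_low, toon_tokens_high)
```

Under that assumed baseline, the quoted range corresponds to roughly 23,800 to 84,000 tokens for the same task, which is why a smaller file does not guarantee a cheaper run.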

Key takeaway

For AI architects designing agentic systems that work with large structured datasets: prefer frontier models (e.g., Opus 4.5, GPT-5.2, Gemini 2.5 Pro), which performed best in the study, particularly when paired with filesystem-based context retrieval. Be wary of less common data formats such as TOON. Model unfamiliarity can incur a "grep tax" in token consumption that outweighs the smaller file size and raises operating costs. Stick to widely recognized formats like YAML or JSON unless you can fine-tune your models.
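As a rough illustration of what filesystem-based context retrieval can look like, the sketch below stores one small YAML-style file per table and searches those files for a keyword, so that only the matching schema fragments would be passed to a model. This is not the paper's harness: the file layout, the function names (write_schema_files, grep_schema, demo), and the three sample tables are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_schema_files(root: Path, tables: dict[str, list[str]]) -> None:
    """Write one small YAML-style file per table (hypothetical layout)."""
    for table, columns in tables.items():
        lines = [f"table: {table}", "columns:"]
        lines += [f"  - {column}" for column in columns]
        (root / f"{table}.yaml").write_text("\n".join(lines) + "\n", encoding="utf-8")


def grep_schema(root: Path, keyword: str) -> dict[str, str]:
    """Return only the schema files whose text mentions the keyword."""
    matches = {}
    for path in sorted(root.glob("*.yaml")):
        text = path.read_text(encoding="utf-8")
        if keyword.lower() in text.lower():
            matches[path.name] = text
    return matches


def demo() -> list[str]:
    # Sample tables invented for this example.
    tables = {
        "customers": ["id", "name", "email"],
        "orders": ["id", "customer_id", "total"],
        "warehouses": ["id", "city"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_schema_files(root, tables)
        relevant = grep_schema(root, "customer")
        return list(relevant)


print(demo())  # ['customers.yaml', 'orders.yaml']
```

The point of the design is that context size tracks the number of matching files rather than the size of the whole schema, which matters most at the 10,000-table end of the range. It also hints at where a grep tax can arise: a search-based approach works well only when the model knows what to search for in the format at hand.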

Key insights

Frontier LLMs outperform open-source models on structured context tasks, and a model's familiarity with the data format affects how many tokens it consumes.

Principles

Model capability dictates structured context performance.
Familiarity with data format reduces token consumption.

Method

The study used SQL generation as a proxy for programmatic agent operations, testing 11 models across 4 formats and schemas up to 10,000 tables.

In practice

Prioritize frontier models for complex agentic tasks.
Use common data formats like YAML or JSON.
Avoid novel formats without model fine-tuning.

Topics

Structured Context Engineering
LLM Agentic Systems
Large SQL Schemas
Data Format Efficiency
Frontier LLMs

Code references

toon-format/toon

Best for: AI Scientist, Research Scientist, AI Architect, AI Researcher, AI Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.