Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
Summary
A benchmark study titled "Notation Matters" evaluates token-optimized data formats, TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation), against standard JSON in agentic AI systems. These systems use structured data for tool schemas, execution results, and tool invocations, where JSON's structural elements introduce token overhead. The research assessed TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight large language models. Decoupling input and output compression, the study found TRON reduces tokens by up to 27% while maintaining accuracy within 14 percentage points of the JSON baseline. TOON achieved up to an 18% token reduction at a similar 9 percentage point accuracy cost, but exhibited cascading failures on multi-turn parsing and collapsed parallel tool-call output for most models.
Key takeaway
Machine learning engineers optimizing agentic AI systems should evaluate token-optimized formats such as TRON. In the study, TRON cut token usage by up to 27% while staying within 14 percentage points (pp) of JSON's accuracy, a trade-off that can lower operating costs and latency. Be cautious with TOON: its reduction of up to 18% came with cascading failures on multi-turn parsing and collapsed parallel tool-call output for most models, which could compromise system reliability.
Key insights
Token-optimized formats can cut LLM token consumption in agentic systems at a measurable accuracy cost: TRON saved up to 27% of tokens while staying within 14 percentage points of JSON's accuracy.
Principles
- JSON's structure incurs substantial token overhead.
- Token efficiency can be decoupled from accuracy.
- Format choice impacts agentic system reliability.
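To make the first principle concrete, the sketch below serializes a small list of records as JSON and as a simple header-plus-rows layout, then compares character counts. The compact layout is a hypothetical stand-in and is not the TOON or TRON syntax. Character count is only a rough proxy for tokens, since real counts depend on the model's tokenizer, and the sample records are made up for illustration.

```python
import json

# Hypothetical tool result: a uniform list of records (illustrative data only).
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4},
    {"id": 2, "city": "Lima", "temp_c": 19},
    {"id": 3, "city": "Pune", "temp_c": 31},
]


def to_json(rows):
    """Standard JSON with no extra whitespace."""
    return json.dumps(rows, separators=(",", ":"))


def to_compact_table(rows):
    """Hypothetical compact layout: field names once, then one row per record.

    This is NOT TOON or TRON; it only illustrates removing repeated keys,
    quotes, and braces. It assumes every record has the same keys and that
    values contain no commas or newlines.
    """
    keys = list(rows[0].keys())
    header = ",".join(keys)
    body = "\n".join(",".join(str(row[k]) for k in keys) for row in rows)
    return header + "\n" + body


json_text = to_json(records)
compact_text = to_compact_table(records)

# Character savings as a rough proxy for token savings.
saving_pct = 100 * (len(json_text) - len(compact_text)) / len(json_text)
print(f"JSON: {len(json_text)} chars, compact: {len(compact_text)} chars, "
      f"saving: {saving_pct:.1f}%")
```

Most of the saving here comes from writing each key once instead of once per record, which is why uniform, repetitive payloads such as tool results tend to benefit most.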
Method
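The study evaluated the formats on four agentic benchmarks and five open-weight LLMs. It measured input compression and output compression independently, so that a model's comprehension of a compact format could be assessed separately from its ability to generate that format.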
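One way to picture the decoupled design is as two sets of conditions: one varies only the format the model reads, the other only the format the model writes, with JSON held fixed on the other side. The sketch below enumerates such conditions and defines the two metrics the summary reports, token reduction in percent and accuracy difference in percentage points. The function names, the condition layout, and the example numbers are illustrative assumptions, not the study's code or data.

```python
from dataclasses import dataclass

FORMATS = ["json", "toon", "tron"]


@dataclass(frozen=True)
class Condition:
    input_format: str   # format of tool schemas and results shown to the model
    output_format: str  # format the model must use for tool invocations


def decoupled_conditions(formats=FORMATS, baseline="json"):
    """Vary one side at a time, holding the other at the JSON baseline."""
    conditions = [Condition(baseline, baseline)]
    for fmt in formats:
        if fmt == baseline:
            continue
        conditions.append(Condition(fmt, baseline))  # input compression only
        conditions.append(Condition(baseline, fmt))  # output compression only
    return conditions


def token_reduction_pct(baseline_tokens, format_tokens):
    """Percent of tokens saved relative to the JSON baseline."""
    return 100.0 * (baseline_tokens - format_tokens) / baseline_tokens


def accuracy_delta_pp(baseline_accuracy, format_accuracy):
    """Accuracy change in percentage points (inputs are fractions in [0, 1])."""
    return 100.0 * (format_accuracy - baseline_accuracy)


for cond in decoupled_conditions():
    print(cond)

# Illustrative numbers only, not results from the study.
print(token_reduction_pct(1000, 800))  # 20.0
print(accuracy_delta_pp(0.80, 0.75))   # negative: accuracy dropped
```

Keeping the two metrics in different units matters when reading the results: a 27% token reduction is relative to the JSON token count, while a 14-point accuracy gap is an absolute difference between two accuracy percentages.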
In practice
- Consider TRON for token-constrained agentic systems.
- Evaluate TOON carefully for multi-turn tasks.
- Benchmark format impact on specific LLM agents.
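The last recommendation can be approached with a small harness that runs the same test cases through each candidate serializer and records accuracy alongside prompt size. The sketch below is a minimal version under stated assumptions: `evaluate_format`, `stub_agent`, and `whitespace_tokens` are hypothetical names, the agent is a stand-in that parses JSON directly, and whitespace splitting is a crude substitute for a real tokenizer.

```python
import json


def evaluate_format(cases, encode, run_agent, count_tokens):
    """Return (accuracy, mean prompt tokens) for one format on one agent.

    cases: list of (payload, expected_answer) pairs
    encode: payload -> str, the serializer under test
    run_agent: str -> answer, a wrapper around your own agent (hypothetical)
    count_tokens: str -> int, ideally your model's real tokenizer
    """
    correct = 0
    total_tokens = 0
    for payload, expected in cases:
        prompt = encode(payload)
        total_tokens += count_tokens(prompt)
        if run_agent(prompt) == expected:
            correct += 1
    return correct / len(cases), total_tokens / len(cases)


# Stand-ins so the sketch runs; replace with a real agent and tokenizer.
cases = [({"a": 1, "b": 2}, 3), ({"a": 5, "b": 7}, 12)]


def stub_agent(prompt):
    # Pretend agent: parses the JSON prompt and adds the two fields.
    data = json.loads(prompt)
    return data["a"] + data["b"]


def whitespace_tokens(text):
    # Crude proxy; a real tokenizer will give different counts.
    return len(text.split())


accuracy, avg_tokens = evaluate_format(
    cases, json.dumps, stub_agent, whitespace_tokens
)
print(f"accuracy={accuracy:.2f}, mean prompt tokens={avg_tokens:.1f}")
```

Running the same `cases` through each serializer, with the real agent and tokenizer plugged in, gives a per-agent view of the token and accuracy trade-off. Given the multi-turn and parallel tool-call failures reported for TOON, such cases are worth including in the test set.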
Topics
- Agentic AI Systems
- Large Language Models
- Token Optimization
- Data Formats
- Benchmarking
- TRON
Best for: AI Engineer, AI Architect, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.