Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
Summary
A benchmark study evaluated token-optimized data formats, TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation), as alternatives to JSON in agentic AI systems. The research, conducted across four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupled input and output compression to assess token reduction and accuracy independently. Results show TRON reduced tokens by up to 27% while maintaining accuracy within 14 percentage points (pp) of the JSON baseline. TOON achieved up to 18% token reduction at a 9 pp accuracy cost, but exhibited cascading failures in multi-turn scenarios and struggled with parallel tool-call output. Token savings were most significant with complex schemas, and explicit reasoning steps in models like Qwen3-32B improved robustness against format-driven structural surprises.
Key takeaway
Machine learning engineers optimizing LLM inference costs in agentic systems should consider adopting TRON for structured data exchange. TRON can reduce token consumption by up to 27% with minimal accuracy impact, especially for workloads with many structurally similar tools. Avoid TOON in multi-turn agentic systems, however: its token savings are often offset by cascading parsing failures and accuracy losses on parallel tool calls. Always benchmark formats on your specific workload.
Key insights
Token-optimized formats like TRON can significantly reduce LLM token consumption in agentic systems with minimal accuracy loss.
Principles
- Token efficiency in agentic AI is directly tied to data serialization format.
- LLM accuracy and parsing success are sensitive to structured data format.
- Explicit reasoning steps can mitigate format-driven structural surprises.
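To make the first principle concrete, here is a minimal sketch that compares standard JSON with a keys-once tabular encoding. The encoding is purely illustrative and is not the TOON or TRON specification. The names `encode_keys_once` and `rough_token_count` are hypothetical, and the regex counter is an assumed stand-in for a real model tokenizer.

```python
import json
import re


def rough_token_count(text: str) -> int:
    """Crude tokenizer proxy (assumption): word runs plus single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def encode_keys_once(records: list[dict]) -> str:
    """Hypothetical compact encoding: field names appear once, in a header row.

    Assumes every record has the same keys and that values are simple
    scalars containing no commas or newlines.
    """
    if not records:
        return ""
    keys = list(records[0])
    header = ",".join(keys)
    rows = [",".join(str(record[key]) for key in keys) for record in records]
    return "\n".join([header, *rows])


# Toy tool descriptions (made up for illustration, not from the study).
tools = [
    {"name": "get_weather", "arg": "city", "type": "string"},
    {"name": "get_time", "arg": "timezone", "type": "string"},
    {"name": "get_news", "arg": "topic", "type": "string"},
]

json_text = json.dumps(tools)
compact_text = encode_keys_once(tools)

json_tokens = rough_token_count(json_text)
compact_tokens = rough_token_count(compact_text)
print(f"JSON: {json_tokens} rough tokens, keys-once: {compact_tokens} rough tokens")
```

On this toy input the keys-once form needs fewer rough tokens because field names and quoting are not repeated for every record. Actual savings depend on the model's tokenizer and on the real format's syntax.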
Method
The study benchmarked TOON and TRON against JSON across four agentic benchmarks and five LLMs. It decoupled input and output compression, measuring token reduction and accuracy independently at three points: tool schemas, tool calls, and tool results.
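The two measurements described above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the `StageResult` fields and both helper functions are hypothetical names, and the numbers are placeholders rather than results from the paper. Tool schemas and tool results enter the model's context (input side), while tool calls are generated by the model (output side), which is one way to read the input/output decoupling.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """Measurements at one insertion point (hypothetical structure)."""
    stage: str              # "tool_schemas", "tool_calls", or "tool_results"
    json_tokens: int        # tokens consumed with the JSON baseline
    format_tokens: int      # tokens consumed with the alternative format
    json_accuracy: float    # task accuracy with JSON, as a fraction in [0, 1]
    format_accuracy: float  # task accuracy with the alternative format


def token_reduction_pct(result: StageResult) -> float:
    """Relative token savings versus JSON, in percent."""
    return 100.0 * (result.json_tokens - result.format_tokens) / result.json_tokens


def accuracy_delta_pp(result: StageResult) -> float:
    """Absolute accuracy change versus JSON, in percentage points."""
    return 100.0 * (result.format_accuracy - result.json_accuracy)


# Placeholder numbers for illustration only; they are NOT taken from the study.
results = [
    StageResult("tool_schemas", 1000, 800, 0.80, 0.78),
    StageResult("tool_calls", 200, 170, 0.80, 0.75),
    StageResult("tool_results", 500, 450, 0.80, 0.80),
]

for r in results:
    print(f"{r.stage}: {token_reduction_pct(r):.1f}% fewer tokens, "
          f"{accuracy_delta_pp(r):+.1f} pp accuracy")
```

Keeping the two metrics separate per stage makes it visible when a format saves tokens at one point but costs accuracy at another. Note that token reduction is a relative percentage while the accuracy change is an absolute difference in percentage points.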
In practice
- TRON offers token reduction for structurally similar tool workloads.
- TOON's multi-turn parsing failures limit its general use.
- Measure format performance on specific agentic system workloads.
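A workload-specific comparison could start from a small harness like the one below. It is a sketch under stated assumptions: `compare_formats` is a hypothetical helper, the only formats registered are two standard-library JSON variants (indented and compact) standing in for whichever formats you want to evaluate, and the characters-divided-by-four counter is a rough heuristic that should be replaced with your model's tokenizer. The round-trip check only verifies that a serializer and parser agree; measuring model accuracy requires running the model on the task.

```python
import json
from typing import Callable

Serializer = Callable[[object], str]
Parser = Callable[[str], object]


def compare_formats(
    payloads: list,
    formats: dict[str, tuple[Serializer, Parser]],
    count_tokens: Callable[[str], int],
    baseline: str = "json",
) -> dict[str, tuple[float, float]]:
    """Return {format name: (token reduction % vs. baseline, round-trip success rate)}."""
    if not payloads:
        raise ValueError("payloads must not be empty")
    totals = {name: 0 for name in formats}
    round_trips = {name: 0 for name in formats}
    for payload in payloads:
        for name, (dump, load) in formats.items():
            text = dump(payload)
            totals[name] += count_tokens(text)
            try:
                if load(text) == payload:
                    round_trips[name] += 1
            except ValueError:
                pass  # a parse failure counts as an unsuccessful round trip
    base = totals[baseline]
    return {
        name: (100.0 * (base - totals[name]) / base, round_trips[name] / len(payloads))
        for name in formats
    }


# Stand-in formats: both are plain JSON from the standard library.
formats = {
    "json": (lambda obj: json.dumps(obj, indent=2), json.loads),
    "json_compact": (lambda obj: json.dumps(obj, separators=(",", ":")), json.loads),
}

# Replace with payloads sampled from your own tool schemas, calls, and results.
payloads = [
    {"tool": "search", "args": {"query": "weather in Paris", "limit": 3}},
    {"tool": "calendar", "args": {"date": "2025-01-01", "attendees": ["a", "b"]}},
]


def approx_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


report = compare_formats(payloads, formats, approx_tokens)
for name, (reduction, success) in report.items():
    print(f"{name}: {reduction:.1f}% token reduction, {success:.0%} round-trip success")
```

To evaluate another format, register its serializer and parser in `formats` and rerun the comparison on the same payloads.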
Topics
- Token Optimization
- Agentic AI
- Data Serialization
- TRON Format
- TOON Format
- LLM Benchmarking
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.