Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

2026-05-28 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A benchmark study titled "Notation Matters" evaluates token-optimized data formats, TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation), against standard JSON in agentic AI systems. These systems use structured data for tool schemas, execution results, and tool invocations, where JSON's structural elements introduce token overhead. The research assessed TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight large language models. Decoupling input and output compression, the study found TRON reduces tokens by up to 27% while maintaining accuracy within 14 percentage points of the JSON baseline. TOON achieved up to an 18% token reduction at a similar 9 percentage point accuracy cost, but exhibited cascading failures on multi-turn parsing and collapsed parallel tool-call output for most models.

Key takeaway

Machine learning engineers optimizing agentic AI systems should evaluate token-optimized formats such as TRON to reduce token consumption. TRON offers up to a 27% token reduction while staying within 14 percentage points of JSON accuracy, which bears directly on operational cost and inference speed. Be cautious with TOON: its reduction of up to 18% comes with cascading multi-turn parsing failures and collapsed parallel tool-call output for most models, either of which can compromise system reliability.

Key insights

Token-optimized formats like TRON reduce LLM token consumption in agentic systems by up to 27%, with accuracy staying within 14 percentage points of the JSON baseline.

Principles

JSON's structure incurs substantial token overhead.
Input and output compression can be measured separately, so token savings and accuracy costs can be attributed to comprehension or to generation.
Format choice impacts agentic system reliability.
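
The first principle can be illustrated with a minimal sketch. The header-plus-rows encoding below is a hypothetical stand-in written for this example; it is not the TOON or TRON specification, and the sample records are invented. Character count is used only as a rough proxy for token count, since real counts depend on the model's tokenizer.

```python
import json

# Hypothetical tool result: a uniform list of records (illustrative data only).
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4},
    {"id": 2, "city": "Lima", "temp_c": 19},
    {"id": 3, "city": "Cairo", "temp_c": 27},
]


def to_compact(rows):
    """Encode uniform records as one header line plus one line per row.

    Illustrative only: not the TOON or TRON syntax. Values containing
    commas or newlines would need escaping, which is omitted here.
    """
    keys = list(rows[0].keys())
    header = ",".join(keys)
    lines = [",".join(str(row[k]) for k in keys) for row in rows]
    return "\n".join([header] + lines)


def reduction_pct(baseline, candidate):
    """Percentage reduction of candidate size relative to baseline size."""
    return 100.0 * (baseline - candidate) / baseline


json_text = json.dumps(records)
compact_text = to_compact(records)

# JSON repeats every key, quote, and brace per record; the compact form
# states the keys once. Characters are a proxy, not a token count.
saving = reduction_pct(len(json_text), len(compact_text))
print(f"JSON: {len(json_text)} chars, compact: {len(compact_text)} chars")
print(f"Reduction: {saving:.1f}%")
```

The saving in this toy case comes from repeated keys and punctuation, which is the kind of structural overhead the study attributes to JSON. The figures it prints say nothing about the reductions reported for TOON or TRON.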

Method

The study evaluated formats on four agentic benchmarks and five LLMs, independently measuring input and output compression to assess comprehension and generation.
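
One way to picture the decoupling is as a grid of conditions in which only one side of the exchange changes format at a time. The sketch below assumes that the untouched side stays in JSON; the summary does not state the exact protocol, and the `Condition` class, function names, and sample accuracies are hypothetical.

```python
from dataclasses import dataclass

FORMATS = ["json", "toon", "tron"]


@dataclass(frozen=True)
class Condition:
    """One evaluation setting: the format the model reads and the format it writes."""
    input_format: str
    output_format: str


def decoupled_conditions(formats=FORMATS, baseline="json"):
    """Build conditions that vary input or output format, never both.

    Assumption: the side that is not under test stays in the baseline format.
    """
    conditions = [Condition(baseline, baseline)]  # JSON in, JSON out
    for fmt in formats:
        if fmt == baseline:
            continue
        conditions.append(Condition(fmt, baseline))  # input compression only
        conditions.append(Condition(baseline, fmt))  # output compression only
    return conditions


def accuracy_delta_pp(baseline_acc_pct, candidate_acc_pct):
    """Accuracy change in percentage points; both arguments are percentages."""
    return candidate_acc_pct - baseline_acc_pct


for cond in decoupled_conditions():
    print(f"input={cond.input_format:<5} output={cond.output_format}")

# Made-up accuracies, used only to show the arithmetic.
print(accuracy_delta_pp(80.0, 71.0))
```

Keeping the two sides separate lets a drop in accuracy be traced either to the model misreading a compressed input or to the model failing to produce well-formed compressed output.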

In practice

Consider TRON for token-constrained agentic systems.
Test TOON on multi-turn and parallel tool-call tasks before adopting it.
Benchmark format impact on specific LLM agents.

Topics

Agentic AI Systems
Large Language Models
Token Optimization
Data Formats
Benchmarking
TRON

Best for: AI Engineer, AI Architect, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.