Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

2026-05-06 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

A benchmark study evaluated token-optimized data formats, TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation), as alternatives to JSON in agentic AI systems. The research, conducted across four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupled input and output compression to assess token reduction and accuracy independently. Results show TRON reduced tokens by up to 27% while maintaining accuracy within 14 percentage points (pp) of the JSON baseline. TOON achieved up to 18% token reduction at a 9 pp accuracy cost, but exhibited cascading failures in multi-turn scenarios and struggled with parallel tool-call output. Token savings were most significant with complex schemas, and explicit reasoning steps in models like Qwen3-32B improved robustness against format-driven structural surprises.

Key takeaway

Machine learning engineers optimizing LLM inference costs in agentic systems should consider adopting TRON for structured data exchange. TRON can reduce token consumption by up to 27% with minimal accuracy impact, especially for workloads with many structurally similar tools. Avoid TOON in multi-turn agentic systems, however: its token savings are often offset by cascading parsing failures and accuracy losses on parallel tool calls. Always benchmark formats on your specific workload.

Key insights

Token-optimized formats like TRON can significantly reduce LLM token consumption in agentic systems with minimal accuracy loss.

Principles

Token efficiency in agentic AI is directly tied to data serialization format.
LLM accuracy and parsing success are sensitive to structured data format.
Explicit reasoning steps can mitigate format-driven structural surprises.

Method

The study benchmarked TOON and TRON against JSON across four agentic benchmarks and five LLMs. It decoupled input and output compression, measuring token reduction and accuracy independently at three points: tool schemas, tool calls, and tool results.
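
The two quantities the study reports, token reduction and accuracy change, can be computed separately from per-run aggregates. The sketch below is a minimal illustration under assumptions: the `RunStats` record, the function names, and the numbers are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results for one format at one measured point (hypothetical record)."""
    tokens: int   # total tokens consumed at the measured point
    correct: int  # tasks solved
    total: int    # tasks attempted


def token_reduction_pct(baseline: RunStats, candidate: RunStats) -> float:
    """Percent fewer tokens than the baseline (positive means savings)."""
    return 100.0 * (baseline.tokens - candidate.tokens) / baseline.tokens


def accuracy_delta_pp(baseline: RunStats, candidate: RunStats) -> float:
    """Accuracy difference in percentage points (negative means a loss)."""
    base_acc = 100.0 * baseline.correct / baseline.total
    cand_acc = 100.0 * candidate.correct / candidate.total
    return cand_acc - base_acc


# Made-up numbers for illustration only; not results from the study.
json_run = RunStats(tokens=10_000, correct=80, total=100)
compact_run = RunStats(tokens=8_000, correct=78, total=100)

print(token_reduction_pct(json_run, compact_run))  # 20.0
print(accuracy_delta_pp(json_run, compact_run))    # -2.0
```

Keeping the two functions separate mirrors the decoupled design: one run can compress only the inputs (tool schemas, tool results) and another only the outputs (tool calls), with each compared against the same JSON baseline.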

In practice

TRON offers token reduction for structurally similar tool workloads.
TOON's multi-turn parsing failures limit its general use.
Measure format performance on specific agentic system workloads.
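
To see why structurally similar records are where savings concentrate, one rough check is to serialize the same workload two ways and compare sizes. The sketch below makes several assumptions: `encode_compact` is a hypothetical header-factored encoding written for illustration (it is not the TOON or TRON specification), `rough_token_count` is a crude regex proxy rather than a real model tokenizer, and the sample records are invented.

```python
import json
import re


def rough_token_count(text: str) -> int:
    """Crude proxy for tokens: count word runs and individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def encode_compact(rows: list[dict]) -> str:
    """Hypothetical header-factored encoding: keys once, then one line per row.

    Assumes every row has the same keys in the same order and holds flat
    scalar values that contain no commas or newlines.
    """
    keys = list(rows[0])
    if any(list(r) != keys for r in rows):
        raise ValueError("rows are not structurally uniform")
    lines = [",".join(keys)]
    lines += [",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join(lines)


# Invented workload: 20 structurally identical tool results.
tool_results = [{"id": i, "name": f"tool_{i}", "status": "ok"} for i in range(20)]

as_json = json.dumps(tool_results)
as_compact = encode_compact(tool_results)

json_tokens = rough_token_count(as_json)
compact_tokens = rough_token_count(as_compact)
savings = 100.0 * (json_tokens - compact_tokens) / json_tokens
print(f"JSON: {json_tokens}, compact: {compact_tokens}, savings: {savings:.1f}%")
```

For a real evaluation, swap the proxy for the tokenizer of the model being served and pair the size comparison with task accuracy on the actual workload, since a smaller payload says nothing about whether the model still parses and emits the format correctly.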

Topics

Token Optimization
Agentic AI
Data Serialization
TRON Format
TOON Format
LLM Benchmarking

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.