How to save up to 70% of tokens with LLMs?

· Source: LLM on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

An analysis of LLM token usage in agentic AI workflows finds that optimizing how a database schema is represented in the prompt can yield substantial savings. The author reports an 84% token reduction in one specific project. For a test "Dummy project" with 20 interconnected database tables, converting the schema from JSON (48,144 characters, 9,106 tokens with the cl100k_base tokenizer) to TypeScript (10,412 characters, 2,758 tokens) cut the token count by 69.71%, saving 6,348 tokens per request. According to the author, the change not only lowers context cost and processing latency but also improves LLM accuracy. The cause is the "syntax tax" of JSON Schema: verbose `required` arrays, nested enums, explicit type annotations, and token fragmentation make it less efficient than TypeScript's concise syntax.
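The reported savings can be re-derived from the character and token counts quoted above. The snippet below is only an arithmetic check; the variable names are illustrative and the numbers are the ones given in this summary, not new measurements.

```typescript
// Figures quoted in the summary for the 20-table "Dummy project"
// (token counts as reported for the cl100k_base tokenizer).
const jsonVersion = { chars: 48144, tokens: 9106 };
const tsVersion = { chars: 10412, tokens: 2758 };

// Absolute and relative token savings per request.
const tokensSaved = jsonVersion.tokens - tsVersion.tokens;
const tokenReductionPct = (tokensSaved / jsonVersion.tokens) * 100;

// The same comparison at the character level.
const charReductionPct =
  ((jsonVersion.chars - tsVersion.chars) / jsonVersion.chars) * 100;

console.log(tokensSaved);                  // 6348
console.log(tokenReductionPct.toFixed(2)); // "69.71"
console.log(charReductionPct.toFixed(2));  // "78.37"
```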

Key takeaway

For AI engineers optimizing agentic LLM workflows: consider replacing verbose JSON Schema representations of database structures with TypeScript interfaces in system prompts. This approach reduced token consumption by 69.71% in the author's test and reportedly improved model accuracy, lowering operational cost and latency. Use TypeScript for general text formatting and extraction tasks, and reserve JSON Schema for scenarios that demand strict API grammar validation.
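To make the difference in verbosity concrete, here is a small side-by-side sketch. The `users` table, its fields, and both representations are hypothetical and do not come from the original article. The comparison counts characters only, since token counts depend on the tokenizer.

```typescript
// Hypothetical "users" table described as JSON Schema (illustrative only).
const userJsonSchema = {
  type: "object",
  properties: {
    id: { type: "integer" },
    email: { type: "string" },
    role: { type: "string", enum: ["admin", "member"] },
    nickname: { type: "string" },
  },
  required: ["id", "email", "role"],
};

// The same table as a TypeScript interface, kept as a string because
// that is the form in which it would be pasted into a prompt.
const userTsInterface = `interface User {
  id: number;
  email: string;
  role: "admin" | "member";
  nickname?: string;
}`;

// Character counts of what would actually be sent to the model.
const jsonChars = JSON.stringify(userJsonSchema, null, 2).length;
const tsChars = userTsInterface.length;

console.log({ jsonChars, tsChars });
```

In the interface, the `required` array collapses into the presence or absence of `?`, and the nested `enum` becomes a union of string literals, which are two of the sources of overhead the summary lists.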

Key insights

Representing database schemas as TypeScript interfaces in LLM prompts reduces token usage (69.71% in the author's 20-table test) and, according to the author, improves accuracy.

Method

Convert Zod schemas to TypeScript interfaces using `zod-to-ts`, inject the resulting string into the system prompt, and validate LLM output with Zod.
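The shape of this pipeline can be sketched without any dependencies. In the sketch below, the `Order` type, the prompt wording, and the function names are all hypothetical. The interface string is written by hand where the described method would generate it with `zod-to-ts`, and a hand-written check stands in for Zod's validation, so this illustrates the flow rather than the author's implementation.

```typescript
// Stand-in for the string that zod-to-ts would generate from a Zod schema.
// It is hand-written here so the sketch has no dependencies.
const orderInterface = `interface Order {
  id: number;
  status: "pending" | "shipped";
  note?: string;
}`;

type Order = { id: number; status: "pending" | "shipped"; note?: string };

// Inject the interface string into the system prompt.
function buildSystemPrompt(schemaTs: string): string {
  return [
    "You extract orders from user messages.",
    "Reply with a single JSON object matching this TypeScript type:",
    schemaTs,
  ].join("\n\n");
}

// Validate the model's reply. This manual check is a placeholder for
// validating with the original Zod schema.
function parseOrder(raw: string): Order | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // reply was not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;

  const { id, status, note } = value as {
    id?: unknown;
    status?: unknown;
    note?: unknown;
  };
  if (typeof id !== "number") return null;
  const okStatus =
    status === "pending" ? "pending" : status === "shipped" ? "shipped" : null;
  if (okStatus === null) return null;
  if (note === undefined) return { id, status: okStatus };
  if (typeof note !== "string") return null;
  return { id, status: okStatus, note };
}

// Usage with two made-up model replies.
const systemPrompt = buildSystemPrompt(orderInterface);
const accepted = parseOrder('{"id": 7, "status": "shipped"}');
const rejected = parseOrder('{"id": "7", "status": "lost"}');
console.log(systemPrompt.length, accepted, rejected);
```

Keeping validation on the application side means the prompt can carry the compact TypeScript form while malformed replies are still caught before they reach downstream code.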

Best for: Prompt Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.