I Tested the Viral “Caveman” AI Trick. Here’s What It Actually Saves (And What It Doesn’t)

2026-06-19 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

The "Caveman" AI tool, a free, open-source skill for Claude Code and other models released in April 2026 by Julius Brussee, claims to reduce token usage by up to 75%. Independent testing found a 61-68% reduction on conversational output, but that output makes up only about 25% of a typical coding session's total tokens, so overall session savings come to a modest 4-10%. Caveman is most effective for chat-heavy workflows such as commit messages or code review comments, and has minimal impact on tasks dominated by reasoning or code generation. Larger cost reductions come from prompt caching, which cuts repeated input costs by up to 90%, and from intelligent model routing, which sends simpler tasks to cheaper, lightweight models and takes 50-70% off total spend.
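The gap between the per-output figure and the session-level figure is simple arithmetic. The sketch below assumes savings scale linearly with the share of session tokens that Caveman can actually compress. The function names are illustrative and do not come from the article or the Caveman repository.

```python
def overall_savings(compressible_share: float, reduction: float) -> float:
    """Fraction of total session tokens saved.

    compressible_share: share of session tokens that Caveman can shorten (0..1).
    reduction: fraction by which that share shrinks (0..1).
    """
    for value in (compressible_share, reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    return compressible_share * reduction


def implied_share(overall: float, reduction: float) -> float:
    """Work backward: what compressible share would produce `overall` savings?"""
    if reduction <= 0.0:
        raise ValueError("reduction must be positive")
    return overall / reduction


if __name__ == "__main__":
    # Upper bound if the full 25% share were compressible at the low-end reduction.
    print(f"{overall_savings(0.25, 0.61):.1%}")
    # Shares implied by the reported 4-10% overall range.
    print(f"{implied_share(0.04, 0.61):.1%} to {implied_share(0.10, 0.68):.1%}")
```

Under this linear assumption, the reported 4-10% overall savings imply that roughly 6-15% of session tokens were compressible prose in the tested sessions. That is an inference from the summary's numbers, not a figure from the article, so it is worth measuring the share in your own sessions before estimating savings.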

Key takeaway

AI engineers and MLOps teams optimizing AI API costs should prioritize prompt caching and model routing over the "Caveman" trick. Caveman offers a free but modest 4-10% overall token reduction for chat-heavy workflows. The most significant savings, potentially 70-85%, come from enabling prompt caching for repeated inputs and implementing intelligent model routing to direct simpler tasks to cheaper models. Add Caveman as a bonus, but focus initial effort on the higher-impact strategies.

Key insights

Caveman provides real but modest AI token savings, primarily on discursive text, with caching and routing offering greater impact.

Principles

AI token savings are task-dependent.
Input token costs are often overlooked.
Model routing optimizes cost-performance.

Method

Caveman is installed via `npx` or `git clone`, activated with `/caveman`, and offers Lite, Full, or Ultra compression. Prompt caching activates automatically. Model routing uses tiered models based on task complexity.
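Tiered routing can be sketched as a small lookup that picks a model tier from the task type and prompt size. This is a minimal illustration, not the article's implementation: the tier names, task categories, token threshold, and prices are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real provider rate


# Hypothetical tiers; substitute your provider's actual models and prices.
TIERS = {
    "light": Tier("light", 1.0),
    "standard": Tier("standard", 5.0),
    "heavy": Tier("heavy", 25.0),
}

# Hypothetical task categories used only for this sketch.
SIMPLE_TASKS = {"commit_message", "rename", "format"}
HARD_TASKS = {"architecture", "debugging"}
LARGE_PROMPT_TOKENS = 50_000  # assumed cutoff for "too big for a small model"


def route(task_type: str, prompt_tokens: int) -> Tier:
    """Pick the cheapest tier that is assumed adequate for the task."""
    if task_type in HARD_TASKS or prompt_tokens > LARGE_PROMPT_TOKENS:
        return TIERS["heavy"]
    if task_type in SIMPLE_TASKS:
        return TIERS["light"]
    return TIERS["standard"]  # default when the task is not clearly simple or hard


if __name__ == "__main__":
    for task, size in [("commit_message", 300), ("refactor", 2_000), ("debugging", 8_000)]:
        print(task, "->", route(task, size).name)
```

Production routers often classify the request with a small model or a learned scorer rather than a fixed table; the table keeps the cost-versus-complexity trade-off visible.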

In practice

Use Caveman for chat-heavy AI interactions.
Implement prompt caching for repeated inputs.
Route simple tasks to cheaper models.
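To see why caching repeated inputs matters, a rough input-cost estimate helps. The sketch assumes cached tokens are billed at a 90% discount, matching the "up to 90%" figure in the summary, and ignores any cache-write surcharge a provider may apply. The function name, parameters, and example price are illustrative.

```python
def input_cost(
    input_tokens: int,
    repeated_share: float,
    usd_per_million: float,
    cache_discount: float = 0.90,
) -> float:
    """Estimated input cost in USD when the repeated share is served from cache.

    repeated_share: fraction of input tokens that repeat across requests (0..1).
    cache_discount: fraction knocked off the price of cached tokens (assumed 0.90).
    """
    if not 0.0 <= repeated_share <= 1.0 or not 0.0 <= cache_discount <= 1.0:
        raise ValueError("shares and discounts must be between 0 and 1")
    repeated = input_tokens * repeated_share
    fresh = input_tokens - repeated
    billable = fresh + repeated * (1.0 - cache_discount)
    return billable * usd_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 1M input tokens, 80% repeated, $3 per million tokens.
    uncached = input_cost(1_000_000, 0.8, 3.0, cache_discount=0.0)
    cached = input_cost(1_000_000, 0.8, 3.0)
    print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

The saving depends almost entirely on `repeated_share`, so sessions that resend the same system prompt or file context on every turn benefit most.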

Topics

AI Cost Optimization
Token Reduction
Prompt Caching
Model Routing
Claude Code
LLM Efficiency

Code references

JuliusBrussee/caveman

Best for: MLOps Engineer, AI Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.