How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS

2026-05-30 · Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Advanced, long

Summary

Nick Nisi of WorkOS describes building and deploying AI agent systems, both internally with "Case" and externally with the WorkOS CLI. Internally, he used agents to scale work across 20+ repositories and developed "Case," a harness built on Pi and a TypeScript state machine that coordinates five agents (implementer, verifier, reviewer, closer, retro agent). The system requires cryptographic proof of task completion, which prevents agents from "lying" about their work. Externally, agents in the WorkOS CLI automate AuthKit installation in under 5 minutes, including account provisioning. Critically, Nisi found that cutting agent "skills" by 95% (from 10,000 lines of skills generated from documentation to 553 lines of hand-written "gotchas") significantly improved results. Evals, which previously took 68 minutes and now take 6 minutes, showed that adding skills could decrease correctness from 97% to 77%.

Key takeaway

If you are an AI engineer building agentic systems, prioritize robust enforcement mechanisms over extensive instruction. Implement state machines and cryptographic proofs so that agents complete tasks reliably, rather than trusting prompts. Limit your agent's "skills" to specific "gotchas" identified through continuous evaluation, because over-instruction can degrade performance. Measure agent outcomes rigorously so that each change is shown to help rather than add noise, and so that your system genuinely improves efficiency.

Key insights

Agent performance improves when you enforce actions with code, guide the model with specific "gotchas," and rigorously measure outcomes, not when you add comprehensive instruction.

Principles

Enforce agent actions with code, not just prompts.
Guide models with specific "gotchas," not comprehensive docs.
Measure agent performance; do not assume effectiveness.
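
The third principle can be illustrated with a small comparison of two skill configurations over the same eval tasks. This is a minimal sketch, not the eval suite described in the talk: the `EvalResult` shape, the function names, and the idea of listing per-task regressions are all assumptions made for illustration.

```typescript
// Hypothetical result record for one eval task; not taken from the talk.
interface EvalResult {
  task: string;
  passed: boolean;
}

// Fraction of tasks that passed, in the range 0..1.
function correctness(results: EvalResult[]): number {
  if (results.length === 0) {
    throw new Error("no eval results to score");
  }
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// Tasks that passed under the baseline configuration but fail under the candidate.
function regressions(baseline: EvalResult[], candidate: EvalResult[]): string[] {
  const passedBefore = new Set(
    baseline.filter((r) => r.passed).map((r) => r.task),
  );
  return candidate
    .filter((r) => !r.passed && passedBefore.has(r.task))
    .map((r) => r.task);
}
```

Running both configurations against the same task list and checking `regressions` before accepting a new skill is one way to catch the case where added instructions lower correctness.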

Method

Implement a TypeScript state machine to orchestrate agents (implementer, verifier, reviewer, closer, retro agent) with enforced gates. Cryptographically prove task completion. Refine agent "skills" by focusing on common "gotchas" identified through rigorous evals.
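
The idea of a state machine with enforced gates can be sketched as follows. Only the five agent names come from the summary above. The stage order, the `TaskState` shape, the `advance` function, and the rule that a stage must hand over non-empty evidence are assumptions for illustration, not the actual design of Case.

```typescript
// Stage names follow the five agents listed above; "done" is an added terminal state.
type Stage = "implementer" | "verifier" | "reviewer" | "closer" | "retro" | "done";

// Hypothetical task record: current stage plus the evidence each finished stage left behind.
interface TaskState {
  stage: Stage;
  evidence: Partial<Record<Stage, string>>;
}

// Fixed transition table (assumed order): the agent cannot pick its own next step.
const NEXT: Record<Stage, Stage> = {
  implementer: "verifier",
  verifier: "reviewer",
  reviewer: "closer",
  closer: "retro",
  retro: "done",
  done: "done",
};

// Gate: the task advances only if the current stage supplies non-empty evidence.
function advance(state: TaskState, evidence: string): TaskState {
  if (state.stage === "done") {
    throw new Error("task is already finished");
  }
  if (evidence.trim() === "") {
    throw new Error(`gate failed: ${state.stage} produced no evidence`);
  }
  return {
    stage: NEXT[state.stage],
    evidence: { ...state.evidence, [state.stage]: evidence },
  };
}
```

Because the transition lives in code, a prompt that claims "tests pass" has no effect unless the harness can record something concrete for that stage.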

In practice

Verify test execution with SHA-256 hashes of output.
Record Playwright videos to prove UI bug fixes.
Automate product setup, like AuthKit installation.
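
The first item above, hashing test output, could look roughly like this. It is a sketch under stated assumptions: the `TestProof` shape and the function names are hypothetical, and it presumes the harness (not the agent) runs the tests and computes the digest. `createHash` is the standard Node.js `node:crypto` API.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the raw test output.
function hashOutput(output: string): string {
  return createHash("sha256").update(output, "utf8").digest("hex");
}

// Hypothetical proof record stored by the harness after it runs the tests itself.
interface TestProof {
  output: string;
  sha256: string;
}

// Build a proof from captured test output.
function makeProof(output: string): TestProof {
  return { output, sha256: hashOutput(output) };
}

// A later stage recomputes the digest; any edit to the output breaks the match.
function verifyProof(proof: TestProof): boolean {
  return hashOutput(proof.output) === proof.sha256;
}
```

A bare hash only shows that the recorded output has not been altered since it was captured, so the value of the check depends on the harness capturing the output directly from the test run.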

Topics

AI Agents
Agentic Systems
Harness Engineering
LLM Evaluation
Developer Experience
WorkOS CLI

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.