How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS

Source: AI Engineer · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Advanced, long

Summary

Nick Nisi of WorkOS describes building and deploying AI agent systems, both internally with "Case" and externally with the WorkOS CLI. Internally, he used agents to scale work across 20+ repositories and built "Case," a harness that combines Pi with a TypeScript state machine coordinating five agents (implementer, verifier, reviewer, closer, retro agent). The harness requires cryptographic proof that a task was completed, so an agent cannot "lie" about finishing work. In the WorkOS CLI, agents automate AuthKit installation in under 5 minutes and can even provision accounts. Critically, Nisi found that cutting agent "skills" by 95% (from 10,000 lines of skills generated from documentation to 553 lines of hand-written "gotchas") significantly improved results. Evals, which once took 68 minutes and now take 6, revealed that adding skills could lower correctness from 97% to 77%.

Key takeaway

If you are an AI engineer building agentic systems, prioritize enforcement in code over extensive instruction. Use state machines and cryptographic proofs to make sure agents actually complete tasks, rather than trusting prompts to do so. Limit your agent's "skills" to the specific "gotchas" that continuous evaluation surfaces, because over-instruction can degrade performance. Measure agent outcomes rigorously so that each change demonstrably improves the system instead of adding noise.
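As a rough illustration of measuring a skill change before keeping it, the sketch below compares correctness across two eval runs. The `EvalResult` shape, the function names, and the `tolerance` parameter are assumptions made for this example, not details of Nisi's eval setup; the 97% and 77% figures are the ones quoted in the summary, reproduced here with synthetic data.

```typescript
// Hypothetical record for a single eval case; the real eval format is not described in the source.
interface EvalResult {
  caseId: string;
  passed: boolean;
}

// Fraction of eval cases that passed, in the range [0, 1].
function correctness(results: EvalResult[]): number {
  if (results.length === 0) throw new Error("no eval results");
  return results.filter((r) => r.passed).length / results.length;
}

interface SkillDecision {
  baseline: number;
  candidate: number;
  keep: boolean;
}

// Keep a skill change only if correctness does not fall by more than `tolerance`.
function judgeSkillChange(
  baseline: EvalResult[],
  candidate: EvalResult[],
  tolerance = 0,
): SkillDecision {
  const b = correctness(baseline);
  const c = correctness(candidate);
  return { baseline: b, candidate: c, keep: c >= b - tolerance };
}

// Synthetic data for the usage example: `passed` of `total` cases succeed.
function makeResults(total: number, passed: number): EvalResult[] {
  return Array.from({ length: total }, (_, i) => ({
    caseId: `case-${i}`,
    passed: i < passed,
  }));
}

const withoutExtraSkills = makeResults(100, 97);
const withExtraSkills = makeResults(100, 77);
const decision = judgeSkillChange(withoutExtraSkills, withExtraSkills);
console.log(decision.keep ? "keep the new skills" : "reject the new skills");
```

With a check like this in the loop, a skill is added because the numbers improved, not because it seemed helpful when it was written.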

Key insights

Agent performance improves by enforcing actions with code, guiding with specific "gotchas," and rigorously measuring outcomes, not by comprehensive instruction.

Method

Implement a TypeScript state machine that orchestrates the agents (implementer, verifier, reviewer, closer, retro agent) and enforces a gate between stages. Require cryptographic proof of task completion rather than accepting an agent's claim. Refine agent "skills" by keeping only the common "gotchas" that rigorous evals identify.
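A minimal sketch of a gated state machine of this kind, under stated assumptions: the source does not describe Case's internals, so the `Pipeline` class, the `Proof` shape, and the use of an HMAC as the "cryptographic proof" are illustrative choices rather than Nisi's implementation. The idea shown is that only the harness holds the signing key, so an agent cannot advance a stage by claiming it is done.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Stage names follow the five agents listed in the summary.
type Stage = "implementer" | "verifier" | "reviewer" | "closer" | "retro";
const ORDER: Stage[] = ["implementer", "verifier", "reviewer", "closer", "retro"];

// Hypothetical proof record: what was checked, plus a MAC over it.
interface Proof {
  stage: Stage;
  evidence: string;
  mac: string;
}

// Assumption: the harness holds this key and never exposes it to an agent.
const HARNESS_KEY = "demo-key-not-for-production";

function sign(stage: Stage, evidence: string): string {
  return createHmac("sha256", HARNESS_KEY).update(`${stage}\n${evidence}`).digest("hex");
}

// Issued by the harness after it has run the check itself (for example, a test run).
function issueProof(stage: Stage, evidence: string): Proof {
  return { stage, evidence, mac: sign(stage, evidence) };
}

function isValid(proof: Proof): boolean {
  const expected = Buffer.from(sign(proof.stage, proof.evidence), "hex");
  const actual = Buffer.from(proof.mac, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

class Pipeline {
  private index = 0;

  currentStage(): Stage | "done" {
    return this.index < ORDER.length ? ORDER[this.index] : "done";
  }

  // The gate: the machine moves forward only with a valid proof for the current stage.
  advance(proof: Proof): void {
    const stage = this.currentStage();
    if (stage === "done") throw new Error("pipeline already finished");
    if (proof.stage !== stage) throw new Error(`expected proof for ${stage}, got ${proof.stage}`);
    if (!isValid(proof)) throw new Error(`invalid proof for ${stage}`);
    this.index += 1;
  }
}

const pipeline = new Pipeline();
pipeline.advance(issueProof("implementer", "unit tests: 12 passed, 0 failed"));
console.log(pipeline.currentStage());
```

An agent that submits `{ stage: "verifier", evidence: "trust me", mac: "00" }` is rejected and the pipeline stays at the verifier stage, which is the enforcement-over-instruction point the talk makes.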

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.