How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS
Summary
Nick Nisi of WorkOS describes building and deploying AI agent systems: "Case" for internal development and the WorkOS CLI for customers. Internally, he scaled work across 20+ repositories with Case, a harness built on Pi and a TypeScript state machine that coordinates five agents (implementer, verifier, reviewer, closer, and retro agent). The system requires cryptographic proof that a task was completed, so agents cannot "lie" about having done the work. In the WorkOS CLI, agents install AuthKit in under 5 minutes and can even provision accounts. Nisi's central finding is that cutting agent "skills" by about 95%, from 10,000 lines of skills generated from documentation to 553 lines of hand-written "gotchas," significantly improved results. His evals, which now run in 6 minutes instead of 68, showed that adding skills could lower correctness from 97% to 77%.
Key takeaway
If you are an AI engineer building agentic systems, prioritize robust enforcement mechanisms over extensive instruction. Use state machines and cryptographic proofs to make sure agents actually complete their tasks, rather than trusting prompts. Limit your agent's "skills" to the specific "gotchas" that continuous evaluation surfaces, because over-instruction can degrade performance. Measure agent outcomes rigorously so that each change demonstrably improves results instead of adding noise.
Key insights
Agent performance improves by enforcing actions with code, guiding with specific "gotchas," and rigorously measuring outcomes, not by comprehensive instruction.
Principles
- Enforce agent actions with code, not just prompts.
- Guide models with specific "gotchas," not comprehensive docs.
- Measure agent performance; do not assume effectiveness.
Method
Implement a TypeScript state machine to orchestrate agents (implementer, verifier, reviewer, closer, retro agent) with enforced gates. Cryptographically prove task completion. Refine agent "skills" by focusing on common "gotchas" identified through rigorous evals.
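A gated state machine of this kind might look like the sketch below. It is a minimal illustration, not Case's actual code: the stage names mirror the five agents named above, while the `Evidence` shape, the `Pipeline` class, and the rule that a failed gate returns work to the implementer are all assumptions made for the example.

```typescript
// Hypothetical sketch of a gated agent pipeline. Stage names follow the
// five agents in the talk; everything else is assumed for illustration.
type Stage = "implement" | "verify" | "review" | "close" | "retro" | "done";

interface Evidence {
  stage: Stage;     // which stage produced this evidence
  passed: boolean;  // did the harness accept it?
  detail: string;   // free-form note, e.g. a hash or a video path
}

// Fixed transition table: code, not the agent, decides what comes next.
const NEXT: Record<Stage, Stage> = {
  implement: "verify",
  verify: "review",
  review: "close",
  close: "retro",
  retro: "done",
  done: "done",
};

class Pipeline {
  private stage: Stage = "implement";
  private readonly log: Evidence[] = [];

  get current(): Stage {
    return this.stage;
  }

  get history(): readonly Evidence[] {
    return this.log;
  }

  // Advance only when evidence for the *current* stage passes its gate.
  submit(evidence: Evidence): Stage {
    if (this.stage === "done") {
      throw new Error("pipeline already finished");
    }
    if (evidence.stage !== this.stage) {
      throw new Error(
        `expected evidence for "${this.stage}", got "${evidence.stage}"`
      );
    }
    this.log.push(evidence);
    // Assumption: a failed gate sends the task back to the implementer.
    this.stage = evidence.passed ? NEXT[this.stage] : "implement";
    return this.stage;
  }
}
```

The point of the design is that an agent cannot skip a stage by claiming it is finished: `submit` rejects evidence for any stage other than the current one, and only the transition table moves the task forward.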
In practice
- Verify test execution with SHA-256 hashes of output.
- Record Playwright videos to prove UI bug fixes.
- Automate product setup, like AuthKit installation.
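The first item above, hashing test output, can be sketched as follows. This is an assumption-laden illustration rather than the talk's implementation: `TestReceipt`, `makeReceipt`, and `verifyReceipt` are hypothetical names, and the sketch assumes the harness itself runs the tests and captures their output, so the hash an agent reports can be compared with what the harness observed. Only `createHash` from Node's built-in `node:crypto` module is a real API here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt the harness stores after running a test command.
interface TestReceipt {
  command: string;
  exitCode: number;
  outputSha256: string; // hex digest of the captured test output
}

// SHA-256 of a UTF-8 string, returned as lowercase hex.
function sha256Hex(data: string): string {
  return createHash("sha256").update(data, "utf8").digest("hex");
}

// Called by the harness (not the agent) right after the tests run.
function makeReceipt(
  command: string,
  exitCode: number,
  output: string
): TestReceipt {
  return { command, exitCode, outputSha256: sha256Hex(output) };
}

// A claim is accepted only if the command matches, the tests exited
// cleanly, and the output hashes to the recorded digest.
function verifyReceipt(
  receipt: TestReceipt,
  command: string,
  output: string
): boolean {
  return (
    receipt.command === command &&
    receipt.exitCode === 0 &&
    receipt.outputSha256 === sha256Hex(output)
  );
}
```

A hash on its own does not prove the tests ran; it only shows that the output an agent presents is byte-for-byte what the harness captured, which is why the receipt is created on the harness side in this sketch.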
Topics
- AI Agents
- Agentic Systems
- Harness Engineering
- LLM Evaluation
- Developer Experience
- WorkOS CLI
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.