Building self-improving tax agents with Codex
Summary
Thrive Holdings and OpenAI co-developed Tax AI, a self-improving agent system that automates complex tax return preparation for Crete's network of more than 30 accounting firms. Launched in May 2026, Tax AI processed 7,000 tax returns during a pilot season and reduced the time practitioners spend on Form 1040 and Form 1041 returns. The system drafts returns with up to 97% accuracy and raises throughput by approximately 50%. Tax AI also showed measurable self-improvement. The share of returns with at least 75% of fields completed correctly rose from 25% at launch to 86% within six weeks, and the shares meeting the stricter 90% and 100% thresholds grew even faster. This improvement comes from a three-part loop: expert practitioner feedback, structured production traces, and a Codex-driven iteration loop with tailored evaluations. Together these let the system autonomously identify and fix its own errors and extend its coverage from simpler W-2s to complex K-1s and schedules.
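The self-improvement figures above are threshold metrics: a return counts toward the 75% figure only if at least 75% of its fields are correct, and the same rule applies at 90% and 100%. The sketch below shows how such a metric could be computed. It assumes that a field is "correct" when a practitioner accepts it unchanged. The `ReturnResult` type, its fields, and the sample data are hypothetical and do not describe how Tax AI actually measures accuracy.

```python
# Hypothetical sketch of a threshold-based field-accuracy metric.
# Assumption: a field is "correct" if the practitioner accepted it unchanged.
# Names and sample data are illustrative, not taken from Tax AI.
from dataclasses import dataclass


@dataclass
class ReturnResult:
    return_id: str
    fields_total: int    # fields the agent was expected to fill
    fields_correct: int  # fields accepted by the practitioner without edits

    @property
    def field_accuracy(self) -> float:
        # Guard against returns with no fields to avoid division by zero.
        return self.fields_correct / self.fields_total if self.fields_total else 0.0


def share_at_threshold(results: list[ReturnResult], threshold: float) -> float:
    """Fraction of returns whose field accuracy is at least `threshold`."""
    if not results:
        return 0.0
    passing = sum(1 for r in results if r.field_accuracy >= threshold)
    return passing / len(results)


if __name__ == "__main__":
    week = [
        ReturnResult("r1", 40, 40),  # 100%
        ReturnResult("r2", 40, 37),  # 92.5%
        ReturnResult("r3", 40, 31),  # 77.5%
        ReturnResult("r4", 40, 20),  # 50%
    ]
    for t in (0.75, 0.90, 1.00):
        print(f"{t:.0%}: {share_at_threshold(week, t):.0%}")
```

With this sample data the script prints `75%: 75%`, `90%: 50%`, and `100%: 25%`. The stricter the threshold, the smaller the share of returns that qualify, so gains at 90% and 100% are harder to achieve.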
Key takeaway
AI engineers and MLOps teams building agents in domains with expert users should make practitioner feedback and complete production traces part of the development cycle. Tax AI's results (up to 97% drafting accuracy and roughly 50% higher throughput) show what this kind of loop can deliver. Design the system to capture expert corrections as structured data and to turn them into actionable evaluation targets for an AI-driven iteration loop. Agents built this way can keep improving with less manual engineering effort and can take on new capabilities over time.
Key insights
Self-improving agents can be built by fusing practitioner expertise with AI-driven feedback loops and structured production data.
Principles
- Practitioners must steer product learning.
- Product design should capture full production traces.
- Use AI for iterative improvement loops.
Method
Design a three-part loop: capture practitioner corrections as structured data, group failures into actionable eval targets, and use Codex to investigate, implement fixes, and validate against evals.
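The first step of the loop is capturing a practitioner correction as structured data rather than as a free-form note. The sketch below shows one way to do that, under stated assumptions. The `Correction` record, its field names (including the `trace_id` link back to the production trace), and the `to_eval_case` conversion are all hypothetical. The idea they illustrate is that each correction can become a regression case whose expected output is the expert's value.

```python
# Hypothetical sketch: one practitioner correction captured as structured data.
# Field names (form, field_path, trace_id, ...) are illustrative assumptions.
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    return_id: str
    form: str          # e.g. "1040", "1041", "K-1"
    field_path: str    # hypothetical path to the field the expert changed
    agent_value: str   # what the agent drafted
    expert_value: str  # what the practitioner entered instead
    reason: str        # free-text explanation from the practitioner
    trace_id: str      # link back to the full production trace

    def to_eval_case(self) -> dict:
        """Turn the correction into a regression case: expected output = expert value."""
        return {
            "case_id": f"{self.return_id}:{self.form}:{self.field_path}",
            "trace_id": self.trace_id,
            "form": self.form,
            "field_path": self.field_path,
            "expected": self.expert_value,
            "previously_predicted": self.agent_value,
        }


if __name__ == "__main__":
    c = Correction(
        return_id="R-001",
        form="1040",
        field_path="schedule_b.line_1",
        agent_value="1200",
        expert_value="1250",
        reason="Missed a second 1099-INT",
        trace_id="trace-abc",
    )
    print(json.dumps(c.to_eval_case(), indent=2))
```

Keeping `trace_id` on every record is the design choice that matters most here. It lets an investigator, human or Codex, replay the exact inputs and intermediate steps that produced the wrong value.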
In practice
- Implement structured data capture for expert actions.
- Group recurring errors to create clear eval targets.
- Use Codex to automate code fixes and validation.
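The grouping and validation steps above can be sketched as follows. Corrections are bucketed by form and field, and only recurring buckets are kept as eval targets, most frequent first. A candidate fix is then scored by how many cases in a target it now gets right. This is a minimal illustration, not Tax AI's pipeline. The names are hypothetical, grouping by `(form, field_path)` is an assumed heuristic, and the `predict` callable is a stand-in for the agent after a Codex-proposed change. Codex itself is not modeled.

```python
# Hypothetical sketch: group recurring corrections into eval targets,
# then score a candidate fix against them. All names are illustrative.
from collections import defaultdict
from typing import Callable, NamedTuple


class Correction(NamedTuple):
    trace_id: str
    form: str
    field_path: str
    expert_value: str


def group_eval_targets(
    corrections: list[Correction], min_count: int = 2
) -> dict[tuple[str, str], list[Correction]]:
    """Bucket corrections by (form, field); keep buckets with at least min_count cases."""
    groups: dict[tuple[str, str], list[Correction]] = defaultdict(list)
    for c in corrections:
        groups[(c.form, c.field_path)].append(c)
    recurring = {k: v for k, v in groups.items() if len(v) >= min_count}
    # Most frequent failure patterns first, so they are addressed first.
    return dict(sorted(recurring.items(), key=lambda kv: len(kv[1]), reverse=True))


def pass_rate(cases: list[Correction], predict: Callable[[Correction], str]) -> float:
    """Share of eval cases where the candidate agent now produces the expert's value."""
    if not cases:
        return 0.0
    return sum(predict(c) == c.expert_value for c in cases) / len(cases)


if __name__ == "__main__":
    log = [
        Correction("t1", "1040", "schedule_b.line_1", "1250"),
        Correction("t2", "1040", "schedule_b.line_1", "980"),
        Correction("t3", "K-1", "box_1", "500"),
        Correction("t4", "1040", "schedule_b.line_1", "40"),
    ]
    # Stand-in for a patched agent: a lookup that fixes two of the three cases.
    fixed = {"t1": "1250", "t2": "980"}
    for (form, field), cases in group_eval_targets(log).items():
        rate = pass_rate(cases, lambda c: fixed.get(c.trace_id, ""))
        print(form, field, len(cases), f"{rate:.0%}")
```

In a real loop, a fix would typically be accepted only if it raises the pass rate on its target without lowering it on previously passing targets.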
Topics
- Self-improving Agents
- Codex
- Tax AI
- AI Feedback Loops
- MLOps
- Production Traces
- Accounting Automation
Best for: AI Engineer, MLOps Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by OpenAI News.