Build agents that own workflows — or workflows that own LLM calls?
Every team is rebuilding this decision. The 'autonomous agent' framing burned a lot of teams in 2025; 'workflow with LLM steps' burns them differently. The two architectures aren't interchangeable.
The question
Our 11 production agents started as 'autonomous agents that decide their own tool calls' (LangGraph, CrewAI patterns). We're now considering moving 3-4 of them to a 'workflow with LLM calls as steps' architecture (Temporal-style, deterministic state machines). Which architecture do we standardize on for new work, and do we migrate any of the existing 11?
The premise
- Team: ~50 engineers, ~10 actively building AI features, single MLOps engineer. AI work pulls from feature-shipping capacity, so any new commitment has to trade against the roadmap. AI infra: 1 senior engineer with both LangGraph and Temporal-style experience; 2 mid-level on LangGraph only; agents owned by feature pods (no central platform team).
- Compliance: SOC2 Type II in scope. EU customer data subjects us to GDPR plus the EU AI Act's August 2026 GPAI-deployer obligations. Deterministic workflows are easier to audit and produce traceability artifacts the EU AI Act will ask for.
- Stack: 11 production agents, of which 7 are 'autonomous' (the agent decides tool calls dynamically) and 4 are 'orchestrated' (a workflow defines steps, LLM is one of them). The autonomous agents have caused 80% of our production incidents, including one that briefly leaked PII via a tool call. The orchestrated agents have caused 20% but feel 'lower-leverage' to the pods who built them.
- Budget: Monthly AI spend ~$30K with quarterly board visibility. Approvals required for sustained jumps >20%. Cost-per-outcome metrics in place; finance asks for unit economics by use case. Migrating 3-4 agents to a workflow architecture: ~6 engineer-weeks. Net cost is engineering time, not external spend.
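The budget rule above can be written as a small check. This is a minimal sketch: the $30K baseline and 20% threshold come from the brief, while treating "sustained" as two consecutive months and the function names are assumptions made for illustration.

```python
from typing import List

MONTHLY_BASELINE_USD = 30_000  # "~$30K" monthly AI spend from the brief
APPROVAL_THRESHOLD = 0.20      # sustained jumps above 20% need approval


def needs_approval(monthly_spend: List[float],
                   baseline: float = MONTHLY_BASELINE_USD,
                   threshold: float = APPROVAL_THRESHOLD,
                   sustained_months: int = 2) -> bool:
    """True if each of the last `sustained_months` months exceeds the limit.

    Assumption: "sustained" means consecutive months; the brief does not
    define it.
    """
    limit = baseline * (1 + threshold)
    recent = monthly_spend[-sustained_months:]
    return len(recent) == sustained_months and all(s > limit for s in recent)


def cost_per_outcome(spend_usd: float, outcomes: int) -> float:
    """Unit economics for one use case: spend divided by completed outcomes."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return spend_usd / outcomes
```

With these assumptions, one month above the limit does not trigger approval, but two in a row do.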
Are the autonomous-agent incidents a feature problem or a framework problem?
Mostly neither: the root cause is the autonomous decision point itself. Agents that 'choose' which tool to call, with no constraints, surface bad behaviors (wrong tool, infinite loops, prompt-injection escalation). Adding more guardrails to autonomous agents helps but doesn't fix the root cause. A workflow architecture moves the decision point out of the LLM at runtime and into deterministic code.
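A minimal sketch of what "the decision point lives in deterministic code" can look like for a ticket-routing step. The label set, handler names, and the `classify` callable (a stand-in for the LLM call) are hypothetical; the point is only that the model returns a label and the code, not the model, picks the tool.

```python
from typing import Callable, Dict

# Hypothetical label set and handlers, for illustration only.
ALLOWED_LABELS = {"billing", "bug_report", "other"}


def send_to_billing(ticket: str) -> str:
    return "sent_to_billing_queue"


def open_bug(ticket: str) -> str:
    return "bug_ticket_opened"


def escalate(ticket: str) -> str:
    return "escalated_to_human"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": send_to_billing,
    "bug_report": open_bug,
    "other": escalate,
}


def route_ticket(ticket: str, classify: Callable[[str], str]) -> str:
    """The LLM supplies a label; the mapping from label to tool is fixed."""
    label = classify(ticket).strip().lower()
    if label not in ALLOWED_LABELS:
        # Anything unexpected (including injected instructions) falls back
        # to a human instead of reaching an arbitrary tool.
        label = "other"
    return HANDLERS[label](ticket)
```

Under this shape, a prompt-injected output such as `"delete_all_users"` cannot select a tool that is not in the fixed mapping; it is treated as an unknown label and escalated.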
What do we lose by moving to workflows?
Flexibility — the agent can no longer adapt to unforeseen inputs the way an autonomous one does. For ~80% of our production cases (well-bounded workflows: 'process this support ticket', 'enrich this lead', 'classify this contract'), this isn't a loss — we never wanted runtime flexibility there. For 2-3 cases (research agents, exploratory tools), autonomous is the feature and we keep it.
Which existing agents migrate, and which stay?
Migrate: customer-support deflection, sales-enrichment, internal ops automation (the 4 highest-volume + most reliability-sensitive). Stay autonomous: research-assistant and the 2 internal exploratory tools. Net: 4 of 11 migrate; new work defaults to workflow + LLM-steps unless there's an explicit reason for autonomous.
Counsel's position
Standardize on deterministic workflow architectures as the default for new AI development to support EU AI Act traceability, and commit your ~6 engineer-weeks to migrating the most error-prone autonomous agents now, to reduce the risk of another PII leak and cut into the 80% share of production incidents those agents cause.
Verdict
The verdict: Adopt centralized governance and checkpointing for multi-day workflows.
Given your upcoming EU AI Act obligations, externalizing agent policies and maintaining state are critical for compliance and reliability.
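Checkpointing for a long-running workflow can be sketched as follows. This is an assumption-laden stand-in, not Temporal's API: state is written to a local JSON file after each step, and a rerun skips steps already recorded as done. The function name, file layout, and step signature are hypothetical.

```python
import json
import os
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def run_with_checkpoints(steps: List[Step], state: dict, path: str) -> dict:
    """Run steps in order, persisting state after each completed step."""
    done: List[str] = []
    if os.path.exists(path):
        # Resume: load the state and the names of steps already completed.
        with open(path) as f:
            saved = json.load(f)
        state, done = saved["state"], saved["done"]
    for name, fn in steps:
        if name in done:
            continue  # completed in an earlier run
        state = fn(state)
        done.append(name)
        # Write to a temp file, then rename, so a crash mid-write does not
        # corrupt the previous checkpoint.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"state": state, "done": done}, f)
        os.replace(tmp, path)
    return state
```

If a step raises, the checkpoint still reflects the last completed step, so the next invocation picks up from there rather than repeating earlier work.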
Migrate from single autonomous agents to specialized multi-agent workflows
Given that autonomous agents cause 80% of your production incidents, shifting to specialized agents connected by standardized protocols should improve reliability.
Transition to custom orchestration layers for strict production SLAs
Given your limited MLOps capacity and need for SOC2 compliance, owning your orchestration layer provides the exact execution clarity required for incident response.
Standardize on orchestrated sub-agents and strict structured outputs
Given your $30K monthly AI spend and board visibility, implementing native orchestration with circuit breakers will prevent runaway token burn during agent failures.
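One way to picture a circuit breaker against runaway token burn is the sketch below. The class name, limits, and the convention that the LLM callable returns `(text, tokens_used)` are all assumptions for illustration; real SDKs report usage differently.

```python
from typing import Callable, Tuple


class TokenBudgetExceeded(RuntimeError):
    """Raised when the breaker is open and further LLM calls are refused."""


class TokenCircuitBreaker:
    def __init__(self, max_tokens: int, max_calls: int) -> None:
        self.max_tokens = max_tokens
        self.max_calls = max_calls
        self.tokens_used = 0
        self.calls = 0
        self.open = False

    def call(self, llm_call: Callable[[str], Tuple[str, int]], prompt: str) -> str:
        """Run one LLM call unless a per-run limit has already been hit."""
        if self.open or self.calls >= self.max_calls:
            self.open = True
            raise TokenBudgetExceeded(
                f"stopped after {self.calls} calls and {self.tokens_used} tokens"
            )
        self.calls += 1
        text, tokens = llm_call(prompt)  # hypothetical (text, tokens) return
        self.tokens_used += tokens
        if self.tokens_used >= self.max_tokens:
            # Trip the breaker; the next call is refused.
            self.open = True
        return text
```

A failing agent that loops on the same step then stops at a known ceiling per run instead of spending until someone notices.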
Build deterministic workflow agents for strict execution control
Given your need for traceability under the EU AI Act, hardcoding sequential steps rather than relying on LLM decision-making ensures predictable, auditable behavior.
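As an illustration of why hardcoded sequential steps are easier to audit, the sketch below records one trail entry per step. The record fields and function names are hypothetical, and nothing here is claimed to satisfy any specific EU AI Act requirement; it only shows that a fixed step order yields a fixed-shape trace.

```python
import hashlib
import json
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def _digest(obj: dict) -> str:
    """Short, stable fingerprint of a JSON-serializable payload."""
    raw = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(raw).hexdigest()[:12]


def run_audited(steps: List[Step], payload: dict, run_id: str):
    """Run steps in their declared order and return (result, audit trail)."""
    trail = []
    for index, (name, fn) in enumerate(steps):
        result = fn(payload)
        trail.append({
            "run_id": run_id,
            "index": index,
            "step": name,
            "input_sha": _digest(payload),
            "output_sha": _digest(result),
            "ts": time.time(),
        })
        payload = result
    return payload, trail
```

Because each entry's `input_sha` matches the previous entry's `output_sha`, a reviewer can check that no undeclared step altered the payload between recorded steps.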