Build agents that own workflows — or workflows that own LLM calls?

Every team is revisiting this decision. The 'autonomous agent' framing burned a lot of teams in 2025; 'workflow with LLM steps' burns them differently. The two architectures aren't interchangeable.

2026-05-27 · Counsel verdict · AIssential

The question

Our 11 production agents started as 'autonomous agents that decide their own tool calls' (LangGraph, CrewAI patterns). We're now considering moving 3-4 of them to a 'workflow with LLM calls as steps' architecture (Temporal-style, deterministic state machines). Which architecture do we standardize on for new work, and do we migrate any of the existing 11?

The premise

Team: ~50 engineers, ~10 actively building AI features, single MLOps engineer. AI work pulls from feature-shipping capacity, so any new commitment has to trade against the roadmap.
AI infra: 1 senior engineer with both LangGraph and Temporal-style experience; 2 mid-level engineers with LangGraph experience only; agents owned by feature pods (no central platform team).
Compliance: SOC2 Type II in scope. EU customer data subjects us to GDPR plus the EU AI Act's deployer obligations, which apply from August 2026. Deterministic workflows are easier to audit and produce the traceability artifacts the EU AI Act will ask for.
Stack: 11 production agents: 7 'autonomous' (the agent decides tool calls dynamically), 4 'orchestrated' (a workflow defines steps, LLM is one of them). The autonomous agents have caused 80% of our production incidents — including one that briefly leaked PII via a tool call. The orchestrated agents have caused 20% but feel 'lower-leverage' to the pods who built them.
Budget: Monthly AI spend ~$30K with quarterly board visibility. Approvals required for sustained jumps >20%. Cost-per-outcome metrics in place; finance asks for unit economics by use case. Migrating 3-4 agents to a workflow architecture: ~6 engineer-weeks. Net cost is engineering time, not external spend.
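The Stack figures above compare groups of different sizes (7 autonomous agents vs. 4 orchestrated). A quick per-agent normalization, using only the shares stated in the premise, puts the gap at roughly 2.3x rather than the 4x the raw 80/20 split suggests. It ignores traffic volume, which the premise doesn't give, so treat it as a rough check rather than a finding.

```python
from fractions import Fraction


def incident_share_per_agent(share: Fraction, agent_count: int) -> Fraction:
    """Average share of all incidents attributable to one agent in a group."""
    return share / agent_count


# Shares and agent counts taken from the Stack line of the premise.
autonomous = incident_share_per_agent(Fraction(80, 100), 7)    # 4/35, ~11.4% per agent
orchestrated = incident_share_per_agent(Fraction(20, 100), 4)  # 1/20, 5% per agent
ratio = autonomous / orchestrated                              # 16/7, ~2.3x

print(f"autonomous:   {float(autonomous):.1%} of incidents per agent")
print(f"orchestrated: {float(orchestrated):.1%} of incidents per agent")
print(f"per-agent ratio: {float(ratio):.2f}x")
```

Even normalized, autonomous agents still produce more than twice the incidents per agent, which supports the migration case without relying on the unadjusted 80/20 figure.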

Are the autonomous-agent incidents a feature problem or a framework problem?

Mostly the autonomous decision point itself. Agents that 'choose' which tool to call, with no constraints, are prone to wrong tool selection, infinite loops, and prompt-injection escalation. Adding more guardrails to autonomous agents helps but doesn't fix the root cause. A workflow architecture moves the decision point out of the LLM at runtime and into deterministic code.
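Moving the decision point can be sketched in a few lines of Python. This is a minimal illustration, not LangGraph or Temporal code. `call_llm` is a hypothetical stub standing in for a real model client, and the ticket labels and tool functions are invented for the example. The model's only job is to return one label from a fixed set, and plain code maps that label to a tool. As a result, a malformed or injected reply can only reach the safe default.

```python
from enum import Enum
from typing import Callable


class TicketLabel(Enum):
    BILLING = "billing"
    BUG = "bug"
    OTHER = "other"


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model client; always answers 'billing'."""
    return "billing"


# Hypothetical tools. In an autonomous agent the model could call any of these.
def route_to_billing(ticket: str) -> str:
    return "routed to billing queue"


def file_bug(ticket: str) -> str:
    return "bug filed"


def escalate_to_human(ticket: str) -> str:
    return "escalated to a human"


# Deterministic routing table: plain code, reviewable and testable, and the
# only place a tool gets selected.
ROUTES: dict[TicketLabel, Callable[[str], str]] = {
    TicketLabel.BILLING: route_to_billing,
    TicketLabel.BUG: file_bug,
    TicketLabel.OTHER: escalate_to_human,
}


def classify(ticket: str, llm: Callable[[str], str] = call_llm) -> TicketLabel:
    """LLM step: its output is constrained to the TicketLabel values."""
    raw = llm(f"Classify this ticket as billing, bug, or other:\n{ticket}")
    try:
        return TicketLabel(raw.strip().lower())
    except ValueError:
        # Anything outside the allowed labels, including injected
        # instructions, falls through to the safe default.
        return TicketLabel.OTHER


def handle_ticket(ticket: str, llm: Callable[[str], str] = call_llm) -> str:
    label = classify(ticket, llm)  # the LLM is one step...
    return ROUTES[label](ticket)   # ...and deterministic code picks the tool


print(handle_ticket("I was charged twice this month"))
```

An autonomous agent would instead hand the model the full tool list and execute whatever it names. Here the `ROUTES` table is the single place a tool can be chosen, which is also what makes the behavior reviewable.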

What do we lose by moving to workflows?

Flexibility: the agent can no longer adapt to unforeseen inputs the way an autonomous one can. For ~80% of our production cases (well-bounded workflows such as 'process this support ticket', 'enrich this lead', 'classify this contract'), this isn't a loss, because we never wanted runtime flexibility there. For 2-3 cases (research agents, exploratory tools), autonomy is the feature, and we keep it.
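For the agents that stay autonomous, guardrails still reduce the failure modes named in the previous answer, even though they don't remove the root cause. Below is a minimal sketch. It assumes a hypothetical `decide` callback that returns the model's chosen `(tool, argument)` pair, or `None` when finished. The loop refuses tools outside an allowlist and turns an endless loop into a bounded, visible failure. The tool implementations and the step limit are placeholders.

```python
from typing import Callable, Optional


class AgentStopped(Exception):
    """Raised when the agent hits a guardrail."""


# Hypothetical tools; only these names can ever execute.
ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "summarize": lambda text: text[:50],
}

MAX_STEPS = 8  # placeholder: tune per agent from observed task lengths

# The model's decision: (tool_name, argument), or None when it is done.
Decision = Optional[tuple[str, str]]


def run_bounded_agent(
    task: str,
    decide: Callable[[str, list[str]], Decision],
    max_steps: int = MAX_STEPS,
) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        decision = decide(task, history)
        if decision is None:
            return history
        tool_name, arg = decision
        tool = ALLOWED_TOOLS.get(tool_name)
        if tool is None:
            # Wrong or injected tool name: stop instead of improvising.
            raise AgentStopped(f"tool {tool_name!r} is not on the allowlist")
        history.append(tool(arg))
    # A loop that never finishes becomes a bounded, visible failure.
    raise AgentStopped(f"step budget of {max_steps} exhausted")
```

The model still chooses which allowed tool to use and when to stop, so the agent keeps its runtime flexibility. Only the blast radius is capped.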

Which existing agents migrate, and which stay?

Migrate: customer-support deflection, sales-enrichment, and internal ops automation (the 4 highest-volume, most reliability-sensitive agents). Stay autonomous: research-assistant and the 2 internal exploratory tools. Net: 4 of the 11 migrate, leaving 8 on workflows and 3 autonomous; new work defaults to workflow + LLM steps unless there's an explicit reason for autonomy.
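With no central platform team, one way to make the workflow default hold across feature pods is a CI check over a small agent registry. This is a hypothetical sketch: the registry format, `AgentSpec`, and `policy_violations` are invented for illustration, the agent IDs are adapted from the names above, and the justification text is a placeholder. Any agent declared autonomous must carry a written reason. Every other agent defaults to workflow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Architecture(Enum):
    WORKFLOW = "workflow"      # deterministic steps, LLM as one step
    AUTONOMOUS = "autonomous"  # LLM selects tools at runtime


@dataclass(frozen=True)
class AgentSpec:
    name: str
    architecture: Architecture = Architecture.WORKFLOW  # default for new work
    autonomy_justification: Optional[str] = None


def policy_violations(registry: list[AgentSpec]) -> list[str]:
    """Names of autonomous agents that lack an explicit justification."""
    return [
        spec.name
        for spec in registry
        if spec.architecture is Architecture.AUTONOMOUS
        and not (spec.autonomy_justification or "").strip()
    ]


# Hypothetical registry entries.
REGISTRY = [
    AgentSpec("customer-support-deflection"),
    AgentSpec("sales-enrichment"),
    AgentSpec(
        "research-assistant",
        Architecture.AUTONOMOUS,
        "open-ended research; runtime flexibility is the feature",
    ),
]

print(policy_violations(REGISTRY))
```

Failing the build on a non-empty result puts the "explicit reason for autonomy" in front of a reviewer in the pull request. Without the check, it is only a convention.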

Counsel's position

Standardize on deterministic workflow architectures for all new AI development to support EU AI Act traceability. Commit the estimated ~6 engineer-weeks to migrating the most error-prone autonomous agents immediately. This closes off the dynamic tool-call path behind the PII leak and cuts into the 80% of incidents that autonomous agents cause.

Verdict

The verdict: Adopt centralized governance and checkpointing for multi-day workflows.

Adopt centralized governance and checkpointing for multi-day workflows

Given your upcoming EU AI Act obligations, externalizing agent policies and persisting workflow state through checkpoints are critical for compliance and reliability.

The Production Gap: 5 Patterns for Building Long-Running AI Agents
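The checkpointing recommendation above can be illustrated with a small file-based sketch. It is not Temporal's or LangGraph's API. The JSON checkpoint file, the step functions, and `run_with_checkpoints` are hypothetical. Each completed step's output is persisted before the next step starts. A run restarted after a crash or deploy therefore skips finished steps instead of repeating their side effects.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

# A step is (unique name, function of prior outputs -> JSON-serializable dict).
Step = tuple[str, Callable[[dict], dict]]


def _save(path: Path, state: dict) -> None:
    # Write to a temp file in the same directory, then swap it in with
    # os.replace, so a crash mid-write never leaves a truncated checkpoint.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_with_checkpoints(path: Path, steps: list[Step]) -> dict:
    """Run steps in order, resuming from the checkpoint at `path` if present."""
    state = json.loads(path.read_text()) if path.exists() else {"done": {}}
    for name, fn in steps:
        if name in state["done"]:
            continue  # finished in an earlier run; don't repeat side effects
        state["done"][name] = fn(state["done"])
        _save(path, state)  # persist before moving to the next step
    return state["done"]
```

A step that crashes after its side effect but before the save will run again. Steps should therefore still be idempotent, which durable-execution engines also generally expect of activities.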

Migrate from single autonomous agents to specialized multi-agent workflows

Given that autonomous agents cause 80% of your incidents, shifting to specialized agents connected by standardized protocols should substantially improve reliability.

Three tiers of Agentic AI - and when to use none of them

Transition to custom orchestration layers for strict production SLAs

Given your limited MLOps capacity and SOC2 requirements, owning your orchestration layer gives you the step-level execution visibility that incident response depends on.

Why AI Engineers Are Moving Beyond LangChain to Native Agent Architectures

Standardize on orchestrated sub-agents and strict structured outputs

Given your $30K monthly AI spend and board visibility, native orchestration with circuit breakers can stop runaway token spend when an agent fails.

Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith
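The circuit-breaker recommendation above can be small in practice. The sketch below is one possible shape, not a specific library. It assumes each LLM call is wrapped to report its cost in dollars. The breaker trips once a per-run spend cap is reached or after a set number of consecutive failures, then refuses further calls until reset. The thresholds are hypothetical and would come from your cost-per-outcome data.

```python
from typing import Any, Callable


class BreakerOpen(Exception):
    """Raised when the breaker refuses a call."""


class SpendCircuitBreaker:
    """Trips on a per-run spend cap or on repeated consecutive failures."""

    def __init__(self, max_spend_usd: float, max_consecutive_failures: int = 3):
        self.max_spend_usd = max_spend_usd
        self.max_consecutive_failures = max_consecutive_failures
        self.reset()

    def reset(self) -> None:
        self.spent_usd = 0.0
        self.consecutive_failures = 0
        self.open = False

    def call(self, fn: Callable[..., tuple[Any, float]], *args: Any, **kwargs: Any) -> Any:
        """Run fn, which returns (result, cost_usd), unless the breaker is open."""
        if self.open:
            raise BreakerOpen("circuit open: refusing further LLM calls")
        try:
            result, cost_usd = fn(*args, **kwargs)
        except Exception:
            # Failed calls are not charged here; a real version might record
            # their token cost too.
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_consecutive_failures:
                self.open = True
            raise
        self.consecutive_failures = 0
        self.spent_usd += cost_usd
        if self.spent_usd >= self.max_spend_usd:
            self.open = True  # let this result through, block the next call
        return result
```

Tripping on spend as well as on errors matters for agent loops. A looping agent often succeeds call after call, so an error-only breaker would never open.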

Build deterministic workflow agents for strict execution control

Given your need for traceability under the EU AI Act, hardcoding the step sequence rather than leaving it to LLM decision-making makes behavior predictable and auditable.

MCP vs ADK: How Modern AI Agents Connect and Work Together
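Auditable behavior in practice means each hardcoded step leaves a record. The sketch below assumes a hypothetical `AuditLog` that appends one record per step and chains each record to the previous one by SHA-256 hash. Verification then reveals any edit or deletion. The field names are illustrative; neither SOC2 nor the EU AI Act prescribes this format.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS = "0" * 64  # prev_hash for the first record


def _digest(obj: dict) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only, hash-chained log of workflow steps."""
    records: list = field(default_factory=list)

    def append(self, workflow_id: str, step: str, inputs: dict, output: dict) -> dict:
        record = {
            "workflow_id": workflow_id,
            "step": step,
            # Digest of the inputs, so raw inputs (which may contain PII)
            # are not copied into the trail.
            "inputs_sha256": _digest(inputs),
            "output": output,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.records[-1]["hash"] if self.records else GENESIS,
        }
        record["hash"] = _digest(record)  # hash covers every field above
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; False if any record was altered or removed."""
        prev_hash = GENESIS
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True


log = AuditLog()
log.append("ticket-123", "classify", {"text": "charged twice"}, {"label": "billing"})
log.append("ticket-123", "route", {"label": "billing"}, {"queue": "billing"})
print(log.verify())
```

Because each step in a deterministic workflow is known in advance, the log also shows which expected steps never ran. A free-form agent trace cannot offer that.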
