LLM-as-Code Agentic Programming for Agent Harness

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Agentic Programming and LLM-as-Code are proposed as a remedy for reliability problems inherent in current LLM agent frameworks, which typically hand deterministic control flow to a probabilistic LLM. This architectural flaw leads to token explosion, control-flow hallucination, and unreliable task completion. The new paradigm moves control flow into the program and treats the LLM as an "LLM-as-Code" component, invoked only for specific reasoning or generation tasks. Its features are a code-driven workflow; a Directed Acyclic Graph (DAG)-structured context whose length is bounded by call depth; support for multi-agent collaboration; and self-programmed evolution, in which improvements are committed as durable code. Empirical evidence from a GUI automation agent on the OSWorld benchmark demonstrates its practicality: it achieves an 86.8% success rate in 15 steps, surpassing the strongest prior system's 80.4% in 100 steps.
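To make the token-explosion argument concrete, here is a back-of-the-envelope sketch. It assumes, hypothetically, that every step adds a fixed 500 tokens, that an orchestrating LLM re-reads one shared transcript on every call, and that a program-driven call reads only the messages on its own path in the call DAG. The numbers are illustrative, not measurements from the paper, and the function names are invented for this sketch.

```python
# Hypothetical cost model: every step adds the same number of tokens to the
# context it lands in. Real token counts vary; this only shows growth shape.
TOKENS_PER_STEP = 500


def flat_transcript_cost(num_steps: int, tokens_per_step: int = TOKENS_PER_STEP) -> tuple[int, int]:
    """LLM-as-orchestrator: each call re-reads the whole shared transcript.

    Returns (context size at the final call, total tokens read over all calls).
    """
    final_context = num_steps * tokens_per_step
    # Call k reads k steps of history, so the total is T * n(n+1)/2 (quadratic).
    total_read = tokens_per_step * num_steps * (num_steps + 1) // 2
    return final_context, total_read


def depth_bounded_cost(num_steps: int, call_depth: int,
                       tokens_per_step: int = TOKENS_PER_STEP) -> tuple[int, int]:
    """Program-driven calls: each call reads only its ancestors in the call DAG.

    Assumes each nesting level contributes tokens_per_step and no call is
    nested deeper than call_depth, so total_read is an upper bound.
    """
    final_context = call_depth * tokens_per_step
    total_read = num_steps * final_context  # linear in the number of steps
    return final_context, total_read
```

Under these assumptions, `flat_transcript_cost(100)` gives `(50000, 2525000)` while `depth_bounded_cost(100, 3)` gives `(1500, 150000)`: the shared transcript's cumulative reading grows quadratically with step count, whereas the per-call context stays fixed by nesting depth.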

Key takeaway

For Machine Learning Engineers building LLM agents for structured, long-horizon tasks: reconsider the common LLM-as-orchestrator paradigm, which inherently leads to unreliability and context overflow. Instead, adopt Agentic Programming, where your program manages deterministic control flow and invokes LLMs as adaptive components for reasoning. This keeps the agent on its intended workflow, bounds context length, and improves overall stability, as demonstrated by an 86.8% success rate on OSWorld.
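A minimal sketch of that shape, assuming a hypothetical `LLM` type that is simply a function from prompt text to reply text (any real client could be wrapped to fit it). The names `propose_action` and `run_task`, the DONE convention, and the 15-step default budget are illustrative assumptions, not the paper's API. What matters is where decisions live: the loop, the step budget, and the exit test are ordinary code, and the model is consulted once per step through a narrow contract.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: prompt text in, reply text out.
LLM = Callable[[str], str]


@dataclass
class StepResult:
    action: str
    done: bool


def propose_action(llm: LLM, goal: str, observation: str) -> StepResult:
    """LLM-as-Code component: one bounded reasoning call with a fixed contract."""
    reply = llm(f"Goal: {goal}\nScreen: {observation}\nNext action, or DONE:").strip()
    return StepResult(action=reply, done=reply.upper() == "DONE")


def run_task(llm: LLM, goal: str, observe: Callable[[], str],
             act: Callable[[str], None], max_steps: int = 15) -> bool:
    """Program-owned control flow: loop, budget, and exit test live in code."""
    for _ in range(max_steps):  # illustrative step budget
        step = propose_action(llm, goal, observe())
        if step.done:
            return True
        act(step.action)
    return False  # budget exhausted: the program, not the model, stops the run
```

Because the model is injected as a plain function, a scripted stub such as `lambda prompt: next(replies)` makes the whole workflow testable without calling any model.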

Key insights

Assigning deterministic control flow to probabilistic LLMs causes inherent agent unreliability; programs should manage control, invoking LLMs only for reasoning.
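When the program does need the model's judgment at a branch point, one way to keep control is to let the model classify while the code owns the branches. The sketch below uses hypothetical helpers `choose` and `handle_dialog`: it accepts only answers from a fixed option list, re-prompts on anything else, and then falls back to a deterministic default, so a reply such as "jump to step 42" cannot create a branch the program does not contain. The retry count and fallback policy are assumptions for illustration.

```python
from typing import Callable, Sequence

LLM = Callable[[str], str]  # hypothetical prompt -> text interface


def choose(llm: LLM, question: str, options: Sequence[str],
           default: str, retries: int = 2) -> str:
    """Ask the model to pick one option; the program enforces the contract."""
    if default not in options:
        raise ValueError("default must be one of the options")
    prompt = f"{question}\nAnswer with exactly one of: {', '.join(options)}"
    for _ in range(retries + 1):
        answer = llm(prompt).strip()
        if answer in options:
            return answer
        prompt += f"\n'{answer}' is not a valid option. Try again."
    return default  # deterministic fallback instead of an invented branch


def handle_dialog(llm: LLM, dialog_text: str) -> str:
    # The branches are fixed in code; the model only labels the dialog.
    kind = choose(llm, f"Classify this dialog: {dialog_text}",
                  ["confirm", "error", "other"], default="other")
    if kind == "confirm":
        return "press OK"
    if kind == "error":
        return "abort and report"
    return "escalate"
```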

Principles

Method

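Implement agent workflows in ordinary code that owns the control flow, and invoke LLMs as "LLM-as-Code" components for reasoning or generation inside specific function calls. Because each call's context follows the call structure, the contexts form a DAG, and their length is bounded by call depth rather than by the total number of steps taken.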

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.