Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, short

Summary

Evoflux is an inference-time evolutionary search method for making compact language model (LM) tool agents more reliable. Compact agents are cheap and low-latency, but they often fail to produce executable workflow graphs: tool resolution, parameter validation, or dependency tracking breaks, particularly when the tool catalog changes. Small-corpus distillation proves insufficient for teaching the recovery behaviors needed. Evoflux instead treats compact tool use as the repair of executable tool workflows, evolving typed graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks involving live MCP servers and 250 tools, Evoflux raised execution feasibility from approximately 3% to 17-24% across several small planners. By comparison, SFT and SFT+DPO underperformed or collapsed, and ReAct showed higher variance and token cost, which points to Evoflux as the more reliable option when teacher-trace budgets are scarce.
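The three failure classes named above (tool resolution, parameter validation, dependency tracking) can be made concrete with a small feasibility check. This is only an illustrative sketch, not Evoflux's checker: the `Step` type, the `check_workflow` function, the error labels, and the catalog format (tool name mapped to its required parameter names) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical workflow node: one tool call plus the steps it depends on."""
    step_id: str
    tool: str
    params: dict
    depends_on: list = field(default_factory=list)


def check_workflow(steps, catalog):
    """Return (step_id, error_kind) pairs; an empty list means the plan passes.

    `catalog` maps a tool name to the set of parameter names it requires
    (an assumed format). Steps are assumed to be listed in execution order.
    """
    errors = []
    seen = set()
    for step in steps:
        if step.tool not in catalog:
            # Tool resolution: the planner named a tool the catalog lacks.
            errors.append((step.step_id, "tool_resolution"))
        elif catalog[step.tool] - set(step.params):
            # Parameter validation: a required argument is missing.
            errors.append((step.step_id, "parameter_validation"))
        if any(dep not in seen for dep in step.depends_on):
            # Dependency tracking: the step needs output from a step
            # that has not run yet or does not exist.
            errors.append((step.step_id, "dependency_tracking"))
        seen.add(step.step_id)
    return errors


catalog = {"search": {"query"}, "fetch": {"url"}}
steps = [
    Step("a", "search", {"query": "x"}),
    Step("b", "fetch_page", {"url": "u"}, ["a"]),  # tool not in catalog
    Step("c", "fetch", {}, ["z"]),                 # missing param, dangling dep
]
errors = check_workflow(steps, catalog)
# errors == [("b", "tool_resolution"),
#            ("c", "parameter_validation"),
#            ("c", "dependency_tracking")]
```

A catalog change (for example, renaming `fetch` to `fetch_page` on the server side) would flip which steps fail without the planner's output changing, which is the situation the summary describes.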

Key takeaway

If you build compact LM agents and are struggling with tool execution failures or changing tool catalogs, fine-tuning methods such as SFT or DPO are likely insufficient. Consider adding inference-time evolutionary search, as demonstrated by Evoflux, to repair executable tool workflows. In the reported benchmark this raised execution feasibility from approximately 3% to 17-24%, a more reliable path for agents operating with scarce teacher-trace budgets and complex tool environments.

Key insights

Evoflux repairs failed tool workflows for compact LMs via inference-time evolutionary search, raising execution feasibility from approximately 3% to 17-24% on held-out MCP-Bench tasks.

Method

Evoflux evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning to repair failed plans at inference time.
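As a rough illustration of how such a repair loop might be organized, the sketch below evolves a toy workflow against a simulated executor. It is not the paper's implementation: `CATALOG`, `execute`, `apply_edit`, `mutate`, `signature`, `evolve`, and all default parameters are hypothetical, workflows are plain dicts rather than typed graphs, and meta-guided redesign is left out. It only shows structured edits driven by execution feedback, an edit count that scales with the number of errors (a stand-in for adaptive intensity), and de-duplication by structural signature (a stand-in for diversity pruning).

```python
import copy
import random

# Hypothetical catalog: tool name -> required parameter names.
CATALOG = {"search": {"query"}, "fetch": {"url"}, "summarize": {"text"}}


def execute(workflow):
    """Simulated executor: return (step_index, error_kind) feedback."""
    errors = []
    for i, step in enumerate(workflow):
        if step["tool"] not in CATALOG:
            errors.append((i, "tool"))
        elif CATALOG[step["tool"]] - set(step["params"]):
            errors.append((i, "param"))
        if any(d >= i for d in step["deps"]):  # deps must point to earlier steps
            errors.append((i, "dep"))
    return errors


def apply_edit(workflow, error, rng):
    """Apply one structured edit aimed at one reported error (in place)."""
    i, kind = error
    step = workflow[i]
    if kind == "tool":
        step["tool"] = rng.choice(sorted(CATALOG))  # toy choice, not semantic
    elif kind == "param":
        for name in CATALOG[step["tool"]]:
            step["params"].setdefault(name, "<placeholder>")
    elif kind == "dep":
        step["deps"] = [d for d in step["deps"] if d < i]


def mutate(workflow, errors, rng):
    """Copy the parent and edit it; more errors means more edits per child."""
    child = copy.deepcopy(workflow)
    n_edits = min(len(errors), 1 + len(errors) // 2)
    for error in rng.sample(errors, n_edits):
        apply_edit(child, error, rng)
    return child


def signature(workflow):
    """Structural fingerprint used to drop duplicate candidates."""
    return tuple(
        (s["tool"], tuple(sorted(s["params"])), tuple(s["deps"])) for s in workflow
    )


def evolve(seed_workflow, generations=10, population_size=6,
           children_per_parent=3, seed=0):
    """Return a workflow with no executor errors, or None if none is found."""
    rng = random.Random(seed)
    population = [seed_workflow]
    for _ in range(generations):
        scored = [(execute(w), w) for w in population]
        for errors, w in scored:
            if not errors:
                return w
        candidates = list(population)
        for errors, w in scored:
            for _ in range(children_per_parent):
                candidates.append(mutate(w, errors, rng))
        unique = {}
        for w in candidates:  # keep one candidate per signature
            unique.setdefault(signature(w), w)
        ranked = sorted(unique.values(), key=lambda w: len(execute(w)))
        population = ranked[:population_size]
    best = min(population, key=lambda w: len(execute(w)))
    return best if not execute(best) else None


broken = [
    {"tool": "search", "params": {}, "deps": []},                    # missing param
    {"tool": "fetch_page", "params": {"url": "u"}, "deps": [0]},     # unknown tool
    {"tool": "summarize", "params": {"text": "t"}, "deps": [1, 5]},  # dangling dep
]
repaired = evolve(broken)
```

In this toy setting "feasible" only means the simulated executor reports no errors; a repaired plan may still pick the wrong tool for the task. A real system would score candidates against live tool execution, which is where the execution feedback described above would come from.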

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.