Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Summary
Evoflux is an inference-time evolutionary search method for making compact language model (LM) tool agents more reliable. Compact agents are cheap and low-latency, but they often produce workflow graphs that fail to execute because of errors in tool resolution, parameter validation, or dependency tracking, particularly when the tool catalog changes. Distillation from a small corpus does not teach the recovery behaviors needed to fix these failures. Evoflux instead treats compact tool use as the repair of executable tool workflows: it evolves typed graphs using structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning.
On held-out MCP-Bench tasks with live MCP servers and 250 tools, Evoflux raised execution feasibility from approximately 3% to 17-24% across several small planners. By comparison, SFT and SFT+DPO underperformed or collapsed, and ReAct showed higher variance and higher token cost, which suggests that Evoflux is the more reliable option when teacher-trace budgets are scarce.
Key takeaway
If you are an AI engineer building compact LM agents and you keep hitting tool execution failures or changing tool catalogs, fine-tuning with SFT or DPO alone is likely insufficient. Consider adding inference-time evolutionary search, as demonstrated by Evoflux, to repair executable tool workflows. In the reported results this approach raised execution feasibility well above the fine-tuned baselines, making it a more reliable option for agents that operate with scarce teacher-trace budgets in complex tool environments.
Key insights
Evoflux repairs failed tool workflows for compact LMs via inference-time evolutionary search, raising execution feasibility from approximately 3% to 17-24% on held-out MCP-Bench tasks.
Principles
- Small LMs struggle with complex tool use without robust recovery.
- Execution-grounded search improves reliability with limited training data.
- Distillation alone is insufficient for dynamic tool catalog changes.
Method
Evoflux evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning to repair failed plans at inference time.
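The repair loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `Step`, `CATALOG`, `diagnose`, `apply_edit`, and `evolve` are hypothetical, the catalog is invented, and `diagnose` is a static check standing in for real execution feedback from live MCP servers. Edits here are deterministic rather than proposed by an LM, and meta-guided redesign is left out entirely.

```python
from dataclasses import dataclass, replace
from difflib import get_close_matches

@dataclass(frozen=True)
class Step:
    tool: str
    args: tuple = ()   # sorted (name, value) pairs, kept hashable
    deps: tuple = ()   # indices of earlier steps this step consumes

# Hypothetical catalog (assumption): tool name -> required parameter names.
CATALOG = {"search_web": {"query"}, "fetch_page": {"url"}, "summarize": {"text"}}

def diagnose(workflow):
    """Stand-in for execution feedback: return a list of (step_index, kind)."""
    errors = []
    for i, step in enumerate(workflow):
        if step.tool not in CATALOG:
            errors.append((i, "unknown_tool"))
            continue
        if CATALOG[step.tool] - {name for name, _ in step.args}:
            errors.append((i, "missing_param"))
        if any(d < 0 or d >= i for d in step.deps):
            errors.append((i, "bad_dependency"))
    return errors

def apply_edit(workflow, error):
    """Apply one structured edit that targets a single diagnosed error."""
    i, kind = error
    step = workflow[i]
    if kind == "unknown_tool":
        # Re-resolve the name against the current catalog.
        match = get_close_matches(step.tool, list(CATALOG), n=1, cutoff=0.0)
        step = replace(step, tool=match[0])
    elif kind == "missing_param":
        have = dict(step.args)
        for name in sorted(CATALOG[step.tool]):
            have.setdefault(name, "<fill>")  # placeholder value
        step = replace(step, args=tuple(sorted(have.items())))
    elif kind == "bad_dependency":
        step = replace(step, deps=tuple(d for d in step.deps if 0 <= d < i))
    return workflow[:i] + (step,) + workflow[i + 1:]

def evolve(seed, generations=10, population_size=4):
    """Evolve a workflow (tuple of Step) until diagnose reports no errors."""
    population = [seed]
    for _ in range(generations):
        population.sort(key=lambda w: len(diagnose(w)))
        if not diagnose(population[0]):
            return population[0]
        children = []
        for parent in population:
            errors = diagnose(parent)
            # Adaptive intensity: more errors means more edits per child.
            intensity = max(1, len(errors) // 2)
            for start in range(len(errors)):
                child = parent
                for error in errors[start:start + intensity]:
                    child = apply_edit(child, error)
                children.append(child)
        # Diversity pruning: drop exact duplicates, then keep the fittest.
        unique = list(dict.fromkeys(population + children))
        unique.sort(key=lambda w: len(diagnose(w)))
        population = unique[:population_size]
    return min(population, key=lambda w: len(diagnose(w)))

broken = (
    Step("serch_web", (("query", "evoflux"),)),  # misspelled tool name
    Step("fetch_page", (), (0,)),                # missing required "url"
    Step("summarize", (("text", "x"),), (5,)),   # depends on a nonexistent step
)
repaired = evolve(broken)
print(diagnose(broken), diagnose(repaired))
```

In this sketch fitness is simply the number of diagnosed errors. A real system would presumably score candidates by actually executing them, which is where the execution feedback named in the method comes in.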
In practice
- Implement evolutionary search for compact agent tool use.
- Prioritize execution feedback in agent design.
- Use Evoflux for dynamic tool catalog environments.
Topics
- Compact Language Models
- Tool Agents
- Evolutionary Search
- Workflow Repair
- Inference-Time Optimization
- MCP-Bench
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.