Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Summary
Evoflux is an inference-time evolutionary search method that improves tool use in compact language models (LMs) by repairing executable tool workflows. Small LMs often fail at complex MCP-style tool use, particularly tool resolution, parameter validation, and dependency tracking, and distillation from a small corpus does not fix these failures. Evoflux instead evolves typed workflow graphs through structured edits, execution feedback, and diversity pruning. On held-out MCP-Bench tasks with live MCP servers and 250 tools, it raised execution feasibility for small planners from approximately 3% to 17-24%. SFT and SFT+DPO performed worse, and ReAct showed higher variance and token cost, indicating that Evoflux is the more reliable option under limited teacher-trace budgets.
Key takeaway
Machine learning engineers deploying compact language models for complex tool orchestration should consider inference-time evolutionary search methods such as Evoflux. On challenging tasks, this approach raised execution feasibility from approximately 3% to 17-24% and outperformed fine-tuning (SFT, SFT+DPO) when teacher-trace data was limited. Execution-grounded search is worth exploring for building more reliable tool agents, especially where plan repair and adaptation to changing tool catalogs are critical.
Key insights
Evoflux uses inference-time evolutionary search to repair executable tool workflows, significantly improving compact LM tool use.
Principles
- Small LMs struggle with complex tool orchestration.
- Small-corpus distillation fails to teach recovery behavior.
- Execution-grounded search enhances reliability.
Method
Evoflux evolves typed workflow graphs via structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning to repair failed plans.
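To make the repair loop concrete, here is a minimal Python sketch of execution-guided evolutionary repair. It is an illustration under stated assumptions, not the Evoflux implementation: the tool catalog, the `Step` type, and the `check`, `repair`, and `evolve` functions are all hypothetical names, static validation stands in for live execution feedback, and de-duplication stands in for diversity pruning. Adaptive intensity and meta-guided redesign are not modeled.

```python
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical tool catalog: tool name -> required parameter names.
CATALOG = {
    "search_flights": {"origin", "destination"},
    "book_flight": {"flight_id"},
}

@dataclass(frozen=True)
class Step:
    tool: str
    params: frozenset                     # names of the parameters supplied
    depends_on: frozenset = frozenset()   # indices of steps this one needs

def check(workflow):
    """Stand-in for execution feedback: list (step_index, kind, detail) errors."""
    errors = []
    for i, step in enumerate(workflow):
        if step.tool not in CATALOG:
            errors.append((i, "unknown_tool", step.tool))
        else:
            for name in sorted(CATALOG[step.tool] - step.params):
                errors.append((i, "missing_param", name))
        for dep in sorted(step.depends_on):
            if dep >= i:  # a dependency must point at an earlier step
                errors.append((i, "bad_dependency", dep))
    return errors

def repair(workflow, error):
    """Apply one structured edit that targets a single reported error."""
    i, kind, detail = error
    step = workflow[i]
    if kind == "unknown_tool":
        match = get_close_matches(detail, list(CATALOG), n=1)
        if not match:
            return None  # no plausible tool to resolve to
        fixed = Step(match[0], step.params, step.depends_on)
    elif kind == "missing_param":
        fixed = Step(step.tool, step.params | {detail}, step.depends_on)
    else:  # bad_dependency
        fixed = Step(step.tool, step.params, step.depends_on - {detail})
    return workflow[:i] + (fixed,) + workflow[i + 1:]

def evolve(seed, generations=5, keep=4):
    """Evolve candidate workflows, keeping the few with the fewest errors."""
    population = [seed]
    for _ in range(generations):
        # A dict keyed by workflow drops duplicates and keeps insertion order.
        children = dict.fromkeys(population)
        for workflow in population:
            for error in check(workflow):
                child = repair(workflow, error)
                if child is not None:
                    children.setdefault(child)
        population = sorted(children, key=lambda w: len(check(w)))[:keep]
        if not check(population[0]):
            break  # the best candidate passes every check
    return population[0]

# A broken plan: misspelled tool, missing parameters, dangling dependency.
seed = (
    Step("serch_flights", frozenset({"origin"})),
    Step("book_flight", frozenset(), frozenset({0, 3})),
)
best = evolve(seed)
print(len(check(seed)), "errors before;", len(check(best)), "after")
```

In this sketch each error maps to exactly one edit, so the search is deterministic. A real system would draw edits from the planner model and score candidates by actually running them against the tool servers.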
In practice
- Improve compact LM tool use reliability.
- Enhance workflow feasibility from ~3% to 17-24%.
- Outperform SFT/DPO in scarce teacher-trace settings.
Topics
- Evoflux
- Compact Language Models
- Tool Use Agents
- Evolutionary Search
- Workflow Orchestration
- Inference Optimization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.