Training-Free Agentic AI: Probabilistic Control and Coordination in Multi-Agent LLM Systems

2026-03-17 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Multi-Agent Systems · Depth: Expert, extended

Summary

REDEREF is a training-free controller designed to improve the efficiency and robustness of multi-agent Large Language Model (LLM) systems. It targets inefficient routing, noisy feedback, and high interaction costs in complex, long-horizon tasks. The framework combines four components: belief-guided delegation, which uses Thompson sampling to prioritize agents with positive historical contributions; reflection-driven re-routing via a calibrated LLM or programmatic judge; evidence-based selection in place of output averaging; and memory-aware priors to mitigate cold-start inefficiency. In experiments on split-knowledge tasks, REDEREF reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% relative to random recursive delegation, while maintaining comparable task success rates. It also adapts gracefully when agents or the judge degrade, which suggests that simple probabilistic control can meaningfully improve multi-agent LLM system performance without training or fine-tuning.

Key takeaway

AI engineers building multi-agent LLM systems should consider training-free probabilistic control mechanisms like REDEREF. The reported results point to substantial efficiency gains (lower token usage, fewer agent calls, and shorter time-to-success) without the overhead of fine-tuning or complex reinforcement learning. Focus on robust judge calibration and memory-aware prior initialization to keep multi-agent coordination reliable, adaptive, and auditable.

Key insights

Probabilistic control with belief-guided delegation improves the efficiency and robustness of multi-agent LLM systems without any training.

Principles

Prioritize agents by their positive marginal contribution.
Binary feedback with Beta posteriors is robust to judge noise.
Selection with evidence outperforms output averaging.
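
To make the Beta-posterior principle concrete, here is a minimal Python sketch of a per-agent belief updated from binary judge verdicts. The names `BetaBelief` and `observed_rate` are hypothetical, and the symmetric "flip" model of judge noise is an assumption made for illustration; the summary does not say how the paper models noise.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over an agent's chance of a useful contribution."""
    alpha: float = 1.0  # 1 + number of positive verdicts (uniform prior assumed)
    beta: float = 1.0   # 1 + number of negative verdicts

    def update(self, verdict: bool) -> None:
        # Conjugate update: one binary verdict adds one pseudo-count.
        if verdict:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


def observed_rate(true_rate: float, flip_prob: float) -> float:
    """Expected share of positive verdicts if the judge flips each verdict
    independently with probability flip_prob (assumed noise model)."""
    return true_rate * (1.0 - flip_prob) + (1.0 - true_rate) * flip_prob
```

Under this assumed noise model the observed rate is an increasing function of the true rate whenever the flip probability is below 0.5, so a noisy judge compresses the gap between agents but does not reorder them. That is one plausible reading of why binary feedback tolerates judge noise.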

Method

REDEREF uses Thompson sampling for belief-guided delegation, a calibrated judge for reflection and re-routing, evidence-based selection for aggregation, and memory-aware priors for cold-start mitigation within a recursive loop.
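
The loop described above might look roughly like the following Python sketch. It is a simplification under stated assumptions, not the paper's implementation: delegation is a flat retry loop rather than true recursion, agents and the judge are plain callables, beliefs are `(alpha, beta)` tuples, and the names `thompson_pick` and `delegate` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]        # task -> candidate answer
Judge = Callable[[str, str], bool]  # (task, answer) -> accept?
Beliefs = Dict[str, Tuple[float, float]]  # agent name -> (alpha, beta)


def thompson_pick(beliefs: Beliefs, candidates: List[str],
                  rng: random.Random) -> str:
    """Draw one sample per candidate from its Beta belief; pick the largest."""
    draws = {name: rng.betavariate(*beliefs[name]) for name in candidates}
    return max(draws, key=draws.get)


def delegate(task: str, agents: Dict[str, Agent], judge: Judge,
             beliefs: Beliefs, rng: random.Random,
             max_calls: int = 5) -> Tuple[Optional[str], List[str]]:
    """Route the task, let the judge reflect, and re-route on rejection."""
    remaining = list(agents)
    trace: List[str] = []
    while remaining and len(trace) < max_calls:
        name = thompson_pick(beliefs, remaining, rng)
        answer = agents[name](task)
        accepted = judge(task, answer)
        alpha, beta = beliefs[name]
        # Binary verdict updates the chosen agent's Beta belief.
        beliefs[name] = (alpha + 1, beta) if accepted else (alpha, beta + 1)
        trace.append(name)
        if accepted:
            # Selection: return the accepted answer, no averaging of outputs.
            return answer, trace
        remaining.remove(name)  # re-route to a different agent
    return None, trace
```

Passing a seeded `random.Random` keeps runs reproducible, and the returned `trace` records which agents were called in what order, which is the kind of record that supports auditability.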

In practice

Implement Thompson sampling for dynamic agent routing.
Calibrate judges to quantify and manage error rates.
Use memory-aware priors to reduce cold-start inefficiency.
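
The calibration and prior-initialization steps can be sketched in Python as follows. This is an illustrative assumption about how one might do it, not the procedure from the paper: `judge_error_rates` estimates error rates from a hypothetical labeled set of (judge verdict, true label) pairs, and `memory_prior` turns stored success and failure counts into Beta pseudo-counts with an assumed discount `weight`.

```python
from typing import Iterable, Tuple


def judge_error_rates(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Estimate (false_positive_rate, false_negative_rate) for a judge.

    Each pair is (judge_verdict, true_label) from a labeled calibration set.
    """
    false_pos = false_neg = positives = negatives = 0
    for verdict, truth in pairs:
        if truth:
            positives += 1
            false_neg += int(not verdict)  # judge rejected a correct answer
        else:
            negatives += 1
            false_pos += int(verdict)      # judge accepted a wrong answer
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def memory_prior(successes: int, failures: int,
                 weight: float = 0.5) -> Tuple[float, float]:
    """Build a Beta(alpha, beta) prior from remembered outcomes.

    Starts from a uniform Beta(1, 1) and adds history discounted by `weight`,
    so past evidence informs routing without fully locking it in.
    """
    return 1.0 + weight * successes, 1.0 + weight * failures
```

For example, an agent remembered with 8 successes and 2 failures starts at Beta(5, 2) under the default weight instead of the uninformed Beta(1, 1), so early Thompson draws already favor it while leaving room for new evidence to change the ranking.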

Topics

Multi-Agent LLM Systems
Probabilistic Control
Thompson Sampling
Recursive Delegation
Agent Coordination

Best for: AI Scientist, Research Scientist, AI Engineer, AI Researcher, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.