ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces

Source: cs.LG updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering) · Depth: Advanced, long

Summary

ACAR (Adaptive Complexity & Attribution Routing) is a measurement framework for multi-model orchestration that uses self-consistency variance ($\sigma$) to route tasks across single-model, two-model, and three-model execution modes. Evaluated across 1,510 tasks from four benchmarks (MathArena, Reasoning Gym, LiveCodeBench, SuperGPQA) using Claude Sonnet 4, GPT-4o, and Gemini 2.0 Flash, ACAR-U (without retrieval) achieved 55.6% accuracy, surpassing the two-model baseline (54.4%) while avoiding full ensembling on 54.2% of tasks. The mechanism is model-agnostic and requires no learned components. Notably, retrieval augmentation *decreased* accuracy by 3.4 percentage points due to low semantic alignment (median similarity 0.167), and "agreement-but-wrong" scenarios (where models agree on incorrect answers) bounded achievable accuracy 8 percentage points below full ensembling. The framework also found that attribution estimates based on proxy signals showed weak correlation with ground-truth values.

Key takeaway

For AI Architects designing multi-model LLM systems, ACAR's findings suggest prioritizing heuristic-based routing over learned classifiers for auditability and stability. You should implement self-consistency variance for adaptive compute allocation, as it improves accuracy over fixed two-model ensembles while reducing full ensemble usage. Critically, avoid naive retrieval augmentation without high semantic similarity thresholds, and recognize that "agreement-but-wrong" scenarios will inherently limit maximum achievable accuracy, necessitating strategies beyond simple ensembling for those cases.
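The advice on retrieval can be illustrated with a similarity gate that drops retrieved passages whose embedding is poorly aligned with the query. This is a minimal sketch, not ACAR's implementation: the function names and the `MIN_SIMILARITY` cut-off are hypothetical, the summary does not say which similarity measure the paper uses (cosine similarity is assumed here), and only the 0.167 median comes from the summary.

```python
import math
from typing import List, Sequence

# Hypothetical cut-off, chosen only for illustration. The summary reports a
# median similarity of 0.167 for the retrieval that hurt accuracy, so a
# useful gate would presumably sit well above that value.
MIN_SIMILARITY = 0.5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_retrieved(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    min_similarity: float = MIN_SIMILARITY,
) -> List[int]:
    """Return indices of retrieved passages that clear the similarity gate."""
    return [
        i
        for i, doc_vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, doc_vec) >= min_similarity
    ]
```

If no passage clears the gate, the caller would fall back to the retrieval-free path, which is the configuration the summary reports as more accurate.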

Key insights

Self-consistency variance can effectively route multi-model ensembles, but retrieval and proxy attribution often fail.

Principles

Method

ACAR routes tasks based on self-consistency variance ($\sigma$) derived from N=3 probe samples, mapping variance to single-agent, arena-lite, or full-arena execution modes. This heuristic avoids learned components for auditability.
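The routing rule described above can be sketched as a small threshold function. This is an illustration under stated assumptions rather than the paper's code: the summary gives neither the exact definition of $\sigma$ nor the thresholds, so `disagreement_sigma` (a normalized count of distinct probe answers) and the `LOW_SIGMA` and `HIGH_SIGMA` cut-offs are hypothetical stand-ins. Only the three mode names and N=3 come from the summary.

```python
from typing import Dict, Sequence

# Hypothetical thresholds; the summary does not state ACAR's actual cut-offs.
LOW_SIGMA = 0.25
HIGH_SIGMA = 0.75


def disagreement_sigma(samples: Sequence[str]) -> float:
    """Stand-in for self-consistency variance.

    Returns 0.0 when all probe answers match and 1.0 when all differ.
    This normalization is an assumption, not the paper's definition.
    """
    if len(samples) < 2:
        raise ValueError("need at least two probe samples")
    distinct = len({s.strip().lower() for s in samples})
    return (distinct - 1) / (len(samples) - 1)


def route(samples: Sequence[str]) -> Dict[str, object]:
    """Map probe disagreement to an execution mode and return an audit record."""
    sigma = disagreement_sigma(samples)
    if sigma <= LOW_SIGMA:
        mode = "single-agent"
    elif sigma <= HIGH_SIGMA:
        mode = "arena-lite"
    else:
        mode = "full-arena"
    # Keeping the inputs next to the decision makes each routing step
    # reproducible from the trace alone, with no learned state involved.
    return {"sigma": sigma, "mode": mode, "samples": list(samples)}
```

With N=3 probes this stand-in takes only three values (0.0, 0.5, 1.0), so each possible agreement pattern maps to exactly one mode, and the returned record can be logged as the decision trace.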

Best for: AI Scientist, Research Scientist, AI Architect, AI Researcher, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.