Brick: Spatial Capability Routing for the Mixture-of-Models (MoM) Paradigm

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Brick is a multimodal router for the Mixture-of-Models (MoM) paradigm that bridges heterogeneous LLM pools at inference time. Released in May 2026, it addresses the problem of dispatching each query to the cheapest model that will answer it correctly, going beyond superficial routing methods. Brick scores each model on six capability dimensions, combines those scores with a per-query difficulty estimate, and dispatches via a cost-penalized geometric rule. On Dataset A (5,504 queries), Brick at its max-quality setting reached 76.98% accuracy, ahead of kimi2.6 (75.02%) and of external routers such as RouteLLM and FrugalGPT, while being 28% cheaper. A continuous preference knob $r$ lets operators move between the max-quality and max-saving profiles; at the min-cost setting, costs fall by up to 22.15x. Median end-to-end latency also dropped from 51.2 s to 22.8 s.
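To make the dispatch idea concrete, here is a minimal Python sketch of one way a cost-penalized geometric rule with a preference knob could look. The summary does not give Brick's actual formula, so everything here is an assumption for illustration: the Euclidean shortfall distance, the linear blend controlled by `r` (with `r = 0` leaning toward quality and `r = 1` toward saving), the model names, the capability vectors, and the cost units.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: tuple  # six scores in [0, 1], one per capability dimension (made-up values)
    cost: float        # relative cost per query (hypothetical units)


def score(model, need, difficulty, r):
    """Lower is better. Assumed convention: r=0 favours quality, r=1 favours saving."""
    # Shortfall: distance by which the model falls below the
    # difficulty-scaled requirement, counted only where it falls short.
    shortfall = math.sqrt(sum(
        max(0.0, difficulty * n - c) ** 2
        for n, c in zip(need, model.capability)
    ))
    # Assumed blend: trade capability shortfall against cost via r.
    return (1.0 - r) * shortfall + r * model.cost


def dispatch(models, need, difficulty, r):
    """Pick the model with the lowest score (argmin)."""
    return min(models, key=lambda m: score(m, need, difficulty, r))


POOL = [
    Model("small", (0.4, 0.5, 0.3, 0.4, 0.5, 0.4), cost=0.05),
    Model("large", (0.9, 0.9, 0.8, 0.9, 0.9, 0.8), cost=1.00),
]
# How strongly the query leans on each of the six dimensions (hypothetical).
NEED = (1.0, 0.2, 0.0, 0.0, 0.5, 0.0)

easy = dispatch(POOL, NEED, difficulty=0.3, r=0.5)
hard_quality = dispatch(POOL, NEED, difficulty=0.9, r=0.1)
hard_saving = dispatch(POOL, NEED, difficulty=0.9, r=0.9)
print(easy.name, hard_quality.name, hard_saving.name)
```

In this toy pool, the easy query goes to the small model regardless of the knob, while the hard query flips from the large model to the small one as `r` moves toward saving. That is the kind of quality-versus-spend trade-off the knob is described as exposing.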

Key takeaway

For AI architects designing cost-effective LLM inference systems, Brick's Mixture-of-Models approach offers a compelling option. Consider capability-aware routing that dispatches each query to the cheapest suitable model, which can significantly reduce cloud bills and latency. Use the continuous preference knob to tune the quality-vs-spend balance, especially for agentic workloads, where single-step routing avoids compounding costs and delays.

Key insights

Spatial capability routing for Mixture-of-Models (MoM) optimizes LLM inference cost and quality by matching query needs to model skills.

Method

Brick uses a six-step pipeline: query truncation, keyword matching, ModernBERT capability classification, complexity estimation, per-model scoring via a cost-penalized geometric rule, and argmin selection.
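The six steps can be sketched end to end as below. This is an illustrative skeleton under stated assumptions, not Brick's implementation: the dimension names, keyword table, truncation budget, and length-based complexity proxy are all hypothetical, the ModernBERT classifier is replaced by a trivial stub, and the scoring function is one assumed form of a cost-penalized geometric rule.

```python
import math

# Placeholder names; the summary does not name the six dimensions.
DIMS = ("code", "math", "reasoning", "knowledge", "vision", "language")
KEYWORDS = {"python": "code", "traceback": "code",
            "integral": "math", "prove": "math", "image": "vision"}
MAX_CHARS = 2000  # hypothetical truncation budget


def truncate(query):
    """Step 1: cap the query length before any analysis."""
    return query[:MAX_CHARS]


def keyword_hints(text):
    """Step 2: cheap keyword matching to flag likely capability dimensions."""
    tokens = (w.strip(".,?!:;") for w in text.lower().split())
    return {KEYWORDS[t] for t in tokens if t in KEYWORDS}


def classify(text, hints):
    """Step 3: stub standing in for the ModernBERT capability classifier."""
    return tuple(1.0 if d in hints else 0.1 for d in DIMS)


def estimate_complexity(text):
    """Step 4: toy difficulty proxy based on word count, clipped to [0.1, 1]."""
    return min(1.0, max(0.1, len(text.split()) / 50))


def score(capability, cost, need, complexity, r):
    """Step 5: assumed rule, capability shortfall blended with cost via r."""
    gap = math.sqrt(sum(max(0.0, complexity * n - c) ** 2
                        for n, c in zip(need, capability)))
    return (1.0 - r) * gap + r * cost


def route(query, pool, r):
    text = truncate(query)
    need = classify(text, keyword_hints(text))
    complexity = estimate_complexity(text)
    scores = {name: score(cap, cost, need, complexity, r)
              for name, (cap, cost) in pool.items()}
    return min(scores, key=scores.get)  # Step 6: argmin selection


# Hypothetical pool: name -> (six capability scores, relative cost).
POOL = {"small": ((0.4,) * 6, 0.05), "large": ((0.9,) * 6, 1.0)}

short_pick = route("Fix this python traceback", POOL, r=0.5)
long_query = "prove " + "that the following integral converges for every admissible parameter " * 6
long_pick = route(long_query, POOL, r=0.1)
print(short_pick, long_pick)
```

Keeping each stage a separate function mirrors the pipeline's ordering and makes it straightforward to swap the stub in step 3 for a real classifier without touching the scoring or selection steps.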

Best for: CTO, VP of Engineering/Data, Director of AI/ML, MLOps Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.