Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

2026-05-18 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Residual Coupling (RC) is an architecture that scales language models horizontally by connecting frozen base models in parallel through small, learned linear bridge projections. Each bridge reads hidden states from one model and injects an additive update into another model's residual stream at an intermediate layer. In bilateral setups the bridges form feedback loops that stabilize the streams, and the base weights are never altered. This sets up a two-step paradigm in which base models act as memorizers and the lightweight linear bridges handle cross-domain generalization; because the bridges map only geometric relationships that already exist, they prevent overfitting. On medical tasks RC cuts perplexity by 80.7% (11.02 vs. 57.08 for the baseline) and improves TruthfulQA Health accuracy by 9.1 percentage points. In a coding test with mismatched tokenizers it reaches a perplexity of 5.91, outperforming MoE and frozen baselines.
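
As a quick consistency check on the medical perplexity figures quoted above, the relative reduction follows directly from the two reported values (nothing beyond those two numbers is assumed):

```latex
\[
\frac{57.08 - 11.02}{57.08} \;=\; \frac{46.06}{57.08} \;\approx\; 0.807 \;=\; 80.7\%
\]
```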

Key takeaway

For AI Engineers building multi-model systems, Residual Coupling offers a compelling alternative to vertical scaling. You can integrate specialist models or add/remove components without retraining the entire system, preserving base model integrity and preventing catastrophic forgetting. Consider RC for scenarios requiring dynamic capability fusion or when leveraging diverse, pre-trained models for improved generalization and reduced hallucinations.

Key insights

Residual Coupling enables horizontal scaling of frozen LLMs via linear bridges, enhancing performance without weight modification.

Principles

Frozen base weights prevent catastrophic forgetting.
Linear bridges map existing geometric relationships.
Uncorrelated hallucinations allow error suppression.
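
The third principle can be read through a simple statistical intuition, sketched below with NumPy. This is an illustration only, not the RC mechanism: RC couples hidden states through learned linear bridges, whereas this toy averages two noisy outputs. The names `pred_a`, `pred_b`, and `mse`, and the Gaussian error model, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical "ground truth" signal and two models whose errors are independent.
truth = rng.normal(size=n)
err_a = rng.normal(scale=1.0, size=n)
err_b = rng.normal(scale=1.0, size=n)
pred_a = truth + err_a
pred_b = truth + err_b

def mse(pred):
    """Mean squared error against the shared ground truth."""
    return float(np.mean((pred - truth) ** 2))

# Independent errors partially cancel when the two predictions are combined.
fused_uncorrelated = 0.5 * (pred_a + pred_b)

# Contrast: a second model that makes exactly the same errors as the first.
pred_c = truth + err_a
fused_correlated = 0.5 * (pred_a + pred_c)

print("model A alone:      ", mse(pred_a))
print("fused, uncorrelated:", mse(fused_uncorrelated))
print("fused, correlated:  ", mse(fused_correlated))
```

With independent errors of equal variance, averaging roughly halves the mean squared error; with identical errors it gives no improvement, which is why the lack of correlation matters.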

Method

Residual Coupling connects frozen LLMs in parallel using linear bridge projections that read hidden states and inject additive updates into other models' residual streams, forming stabilizing feedback loops.
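
The coupling described above can be sketched in a few lines of NumPy. This is a minimal toy under stated assumptions, not the implementation in pfekin/residual-coupling: the "models" are stacks of fixed random residual blocks, both streams are assumed to share the same sequence positions, coupling happens at a single layer, and all names (`block`, `coupled_forward`, `bridge_a_to_b`, `couple_at`) are hypothetical. Bridge training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D_A, D_B = 8, 6  # hidden widths of two hypothetical frozen models

def make_frozen_layers(dim, n_layers):
    """Stand-in for a frozen model: fixed weights, never updated."""
    return [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_layers)]

def block(h, W):
    """Toy residual block h + tanh(h W); real transformer blocks are far richer."""
    return h + np.tanh(h @ W)

layers_a = make_frozen_layers(D_A, 4)
layers_b = make_frozen_layers(D_B, 4)

# Linear bridges between the two hidden spaces: the only trainable parameters
# in this sketch (initialised randomly here; training is not shown).
bridge_a_to_b = rng.normal(scale=0.05, size=(D_A, D_B))
bridge_b_to_a = rng.normal(scale=0.05, size=(D_B, D_A))

def coupled_forward(x_a, x_b, couple_at=2, W_ab=bridge_a_to_b, W_ba=bridge_b_to_a):
    """Run both frozen stacks in parallel with a bilateral additive coupling."""
    h_a, h_b = x_a, x_b
    for i, (W_a, W_b) in enumerate(zip(layers_a, layers_b)):
        h_a, h_b = block(h_a, W_a), block(h_b, W_b)
        if i == couple_at:
            # Each bridge reads the other model's hidden state and adds a
            # projected update to this model's residual stream. The tuple
            # assignment makes both updates use the pre-update states.
            h_a, h_b = h_a + h_b @ W_ba, h_b + h_a @ W_ab
    return h_a, h_b

x_a = rng.normal(size=(3, D_A))  # 3 positions, model A embedding width
x_b = rng.normal(size=(3, D_B))  # 3 positions, model B embedding width
out_a, out_b = coupled_forward(x_a, x_b)
print(out_a.shape, out_b.shape)
```

Because the update is purely additive, setting a bridge to zero recovers the corresponding base model's original output in this sketch, which is one way to picture adding or removing components without retraining the base models.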

In practice

Integrate specialist models without retraining.
Replace multi-turn prompting with a single parallel pass.
Deploy models/bridges on separate nodes or edge devices.

Topics

Residual Coupling
Horizontal LLM Scaling
Hidden-State Coupling
Frozen Language Models
Cross-Domain Generalization

Code references

pfekin/residual-coupling

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.