Scaling LLMs horizontally: hidden-state coupling without weight modification [R]
Summary
Residual Coupling (RC) is an architecture that scales language models horizontally: frozen base models run in parallel and are connected by small, learned linear bridge projections. Each bridge reads hidden states from one model and injects an additive update into another model's residual stream at an intermediate layer. In bilateral setups, bridges run in both directions and form feedback loops that stabilize the streams, all without altering base weights. The approach establishes a two-step paradigm in which base models act as memorizers and the lightweight linear bridges handle cross-domain generalization; because the bridges map only geometric relationships that already exist in the hidden states, they resist overfitting. On medical tasks, RC cuts perplexity by 80.7% (11.02 vs. 57.08 for the baseline) and improves TruthfulQA Health accuracy by 9.1 percentage points. In a coding test with mismatched tokenizers it reaches a perplexity of 5.91, outperforming MoE and frozen baselines.
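As a quick check on the medical-task figures, the relative reduction follows directly from the two perplexities quoted above (the symbols are introduced here for illustration only):

```latex
\text{reduction}
  = \frac{\mathrm{PPL}_{\text{baseline}} - \mathrm{PPL}_{\text{RC}}}{\mathrm{PPL}_{\text{baseline}}}
  = \frac{57.08 - 11.02}{57.08}
  = \frac{46.06}{57.08}
  \approx 0.807 = 80.7\%
```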
Key takeaway
For AI engineers building multi-model systems, Residual Coupling offers an alternative to vertical scaling. You can integrate specialist models, or add and remove components, without retraining the entire system; because base weights stay frozen, the base models are preserved intact and catastrophic forgetting is avoided. Consider RC for scenarios that require dynamic capability fusion, or when combining diverse pre-trained models to improve generalization and reduce hallucinations.
Key insights
Residual Coupling enables horizontal scaling of frozen LLMs via linear bridges, enhancing performance without weight modification.
Principles
- Frozen base weights prevent catastrophic forgetting.
- Linear bridges map existing geometric relationships.
- Uncorrelated hallucinations allow error suppression.
Method
Residual Coupling connects frozen LLMs in parallel using linear bridge projections that read hidden states and inject additive updates into other models' residual streams, forming stabilizing feedback loops.
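The coupling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original implementation: the names `LinearBridge` and `couple` are hypothetical, hidden states are plain Python lists standing in for one token's residual vector, and the zero initialization (so that coupling starts as a no-op) is an assumption rather than something the source specifies.

```python
def matvec(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class LinearBridge:
    """Hypothetical bridge: a single linear map from a source model's hidden
    size (d_src) to a target model's hidden size (d_tgt)."""

    def __init__(self, d_src, d_tgt):
        # Assumption: zero init, so an untrained bridge leaves the target
        # stream untouched. Only these weights would be trained.
        self.W = [[0.0] * d_src for _ in range(d_tgt)]

    def __call__(self, h_src):
        return matvec(self.W, h_src)


def couple(h_a, h_b, bridge_ab, bridge_ba):
    """One bilateral coupling step: each residual stream receives an additive
    update computed from the other model's hidden state. Both updates are read
    from the pre-update states; the base models themselves are not modified."""
    delta_b = bridge_ab(h_a)  # read from A, written into B
    delta_a = bridge_ba(h_b)  # read from B, written into A
    new_a = [x + d for x, d in zip(h_a, delta_a)]
    new_b = [x + d for x, d in zip(h_b, delta_b)]
    return new_a, new_b


# Two models with different hidden sizes (4 and 3) can still be coupled,
# because each bridge projects into the target's dimensionality.
h_a = [1.0, 2.0, 3.0, 4.0]
h_b = [0.5, -0.5, 1.0]
a_to_b = LinearBridge(d_src=4, d_tgt=3)
b_to_a = LinearBridge(d_src=3, d_tgt=4)
new_a, new_b = couple(h_a, h_b, a_to_b, b_to_a)
```

In a real system the updated states would be handed back to each frozen model to continue its forward pass from the coupled layer; that plumbing is omitted here.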
In practice
- Integrate specialist models without retraining.
- Replace multi-turn prompting with a single parallel pass.
- Deploy models/bridges on separate nodes or edge devices.
Topics
- Residual Coupling
- Horizontal LLM Scaling
- Hidden-State Coupling
- Frozen Language Models
- Cross-Domain Generalization
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.