Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
Summary
A novel feedforward graph architecture integrates heterogeneous frozen large language models (LLMs) as computational nodes, communicating through a shared continuous latent space via learned linear projections. This architecture builds on prior work showing geometric compatibility between independently trained LLM latent spaces, extending it to end-to-end trainable multi-node graphs. The system uses three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) to encode input into a shared latent space, which then feeds into two larger frozen models (Phi-3-mini, Mistral-7B). A lightweight cross-attention output node processes their representations. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, surpassing the best single constituent model by 11.4, 6.2, and 1.2 percentage points, respectively.
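The figures above imply two quantities that the summary does not state directly: the share of parameters that are trained, and the accuracy of the best single constituent on each benchmark. The short calculation below derives both from the quoted numbers only. The function names are illustrative, and the frozen count is taken as exactly 12B even though the summary gives it as approximate, so the share is a rough figure.

```python
# Figures quoted in the summary; the frozen count is approximate ("approximately 12B").
TRAINABLE_PARAMS = 17.6e6
FROZEN_PARAMS = 12e9

# benchmark -> (graph accuracy in %, gain over best single constituent in points)
REPORTED = {
    "ARC-Challenge": (87.3, 11.4),
    "OpenBookQA": (82.8, 6.2),
    "MMLU": (67.2, 1.2),
}


def trainable_share_percent(trainable, frozen):
    """Trainable parameters as a percentage of all parameters in the graph."""
    return 100.0 * trainable / (trainable + frozen)


def implied_baselines(reported):
    """Best-single-model accuracy implied by each reported gain."""
    return {name: round(acc - gain, 1) for name, (acc, gain) in reported.items()}


if __name__ == "__main__":
    share = trainable_share_percent(TRAINABLE_PARAMS, FROZEN_PARAMS)
    print(f"trainable share of all parameters: {share:.2f}%")
    for name, baseline in implied_baselines(REPORTED).items():
        print(f"{name}: implied best single model = {baseline}%")
```

The trained components come to well under 1% of the total parameter count, and the implied single-model baselines are 75.9%, 76.6%, and 66.0%.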
Key takeaway
For AI engineers who want better model performance without extensive retraining, this architecture is worth a look. In the reported results, a feedforward graph of frozen LLMs with only 17.6M trainable parameters beat its best single constituent model by 1.2 to 11.4 percentage points, depending on the benchmark. Consider experimenting with different combinations of frozen models and training the projection matrices for your own task.
Key insights
Frozen LLMs can form a trainable feedforward graph via linear projections in a shared latent space.
Principles
- LLM latent spaces exhibit geometric compatibility.
- Gradient flow is tractable across frozen model boundaries.
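The second principle can be illustrated with a toy example: a frozen node still passes gradients back to whatever feeds it, so an upstream projection can be trained while the node's own weights stay fixed. The sketch below is an assumption-heavy stand-in, not the paper's implementation. The "frozen model" is a fixed 2x2 matrix followed by tanh, and `W_FROZEN`, `frozen_node`, `grad_wrt_projection`, and `train_projection` are hypothetical names introduced here.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Toy stand-in for a frozen LLM: these weights are read but never updated.
W_FROZEN = [[0.5, -0.3], [0.2, 0.8]]


def frozen_node(h):
    return [math.tanh(z) for z in matvec(W_FROZEN, h)]


def loss_fn(P, x, target):
    """Squared error after projecting x with P and passing it through the frozen node."""
    y = frozen_node(matvec(P, x))
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def grad_wrt_projection(P, x, target):
    """Backpropagate through the frozen node to the trainable projection P."""
    h = matvec(P, x)
    y = frozen_node(h)
    # Through tanh: dL/dz = (y - t) * (1 - y^2)
    dL_dz = [(yi - ti) * (1.0 - yi * yi) for yi, ti in zip(y, target)]
    # Through the frozen weights: dL/dh = W^T dL/dz (W is used, not changed).
    dL_dh = [sum(W_FROZEN[k][i] * dL_dz[k] for k in range(len(dL_dz)))
             for i in range(len(h))]
    # dL/dP[i][j] = dL/dh[i] * x[j]
    return [[dh * xj for xj in x] for dh in dL_dh]


def train_projection(P, x, target, lr=0.5, steps=200):
    """Plain gradient descent on a copy of P; the frozen weights are untouched."""
    P = [row[:] for row in P]
    for _ in range(steps):
        g = grad_wrt_projection(P, x, target)
        P = [[p - lr * gp for p, gp in zip(prow, grow)]
             for prow, grow in zip(P, g)]
    return P


if __name__ == "__main__":
    x, target = [1.0, -0.5], [0.3, -0.2]
    P0 = [[0.1, 0.0], [0.0, 0.1]]
    P1 = train_projection(P0, x, target)
    print("loss before:", loss_fn(P0, x, target))
    print("loss after: ", loss_fn(P1, x, target))
```

In a real system an autodiff framework would compute these gradients. The manual chain rule is written out here only to make visible that the frozen weights take part in the backward pass without being modified.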
Method
Small frozen LLMs encode the input into a shared latent space. That latent is injected into larger frozen LLMs, and a cross-attention output node processes their representations. The linear projections are optimized by backpropagation, with gradients flowing through the frozen models.
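The data flow described above can be sketched with toy components. Everything in this block is an assumption made for illustration: the frozen LLMs are replaced by fixed random maps, the input is a plain vector rather than text, the layer widths are arbitrary, the encoder outputs are averaged in the shared space, and the output node is reduced to single-query attention over the two larger nodes' representations. The names `FrozenNode` and `ToyGraph` are hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrozenNode:
    """Toy stand-in for a frozen LLM: a fixed random linear map plus tanh."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = rand_matrix(out_dim, in_dim, rng)  # never updated

    def __call__(self, v):
        return [math.tanh(z) for z in matvec(self.weights, v)]


class ToyGraph:
    def __init__(self, input_dim=6, shared_dim=4, seed=0):
        rng = random.Random(seed)
        enc_dims = [3, 5, 7]  # hypothetical widths of the three small nodes
        big_dims = [8, 10]    # hypothetical widths of the two larger nodes
        self.encoders = [FrozenNode(input_dim, d, rng) for d in enc_dims]
        self.big_nodes = [FrozenNode(d, d, rng) for d in big_dims]
        # The parts that would be trained: linear projections and a query vector.
        self.to_shared = [rand_matrix(shared_dim, d, rng) for d in enc_dims]
        self.inject = [rand_matrix(d, shared_dim, rng) for d in big_dims]
        self.read_out = [rand_matrix(shared_dim, d, rng) for d in big_dims]
        self.query = [rng.uniform(-0.5, 0.5) for _ in range(shared_dim)]

    def forward(self, x):
        # Layer 1: each small node encodes x; a projection maps it to the shared space.
        projected = [matvec(p, enc(x))
                     for enc, p in zip(self.encoders, self.to_shared)]
        # Assumption: encoder outputs are combined by averaging.
        shared = [sum(vals) / len(vals) for vals in zip(*projected)]
        # Layer 2: inject the shared latent into each larger node, then project back.
        reps = [matvec(out, node(matvec(inp, shared)))
                for node, inp, out in zip(self.big_nodes, self.inject, self.read_out)]
        # Output node: scaled dot-product attention of one query over the two nodes.
        scale = math.sqrt(len(self.query))
        weights = softmax([sum(q * r for q, r in zip(self.query, rep)) / scale
                           for rep in reps])
        output = [sum(w * rep[j] for w, rep in zip(weights, reps))
                  for j in range(len(self.query))]
        return output, weights


if __name__ == "__main__":
    out, attn = ToyGraph().forward([0.1, -0.2, 0.3, 0.0, 0.5, -0.4])
    print("output vector:", out)
    print("attention over larger nodes:", attn)
```

The point of the sketch is the wiring: nodes of different widths never talk to each other directly, only through projections into and out of one shared space.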
In practice
- Combine heterogeneous frozen LLMs in one graph instead of relying on a single model.
- Keep the LLM weights frozen and train only the small connecting components (17.6M parameters in the reported setup).
Topics
- Feedforward Graph Architecture
- Frozen Language Models
- Latent Space Projections
- Multi-Node LLM Graphs
- Performance Benchmarking
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.