Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models

2026-04-09 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A novel feedforward graph architecture integrates heterogeneous frozen large language models (LLMs) as computational nodes, communicating through a shared continuous latent space via learned linear projections. This architecture builds on prior work showing geometric compatibility between independently trained LLM latent spaces, extending it to end-to-end trainable multi-node graphs. The system uses three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) to encode input into a shared latent space, which then feeds into two larger frozen models (Phi-3-mini, Mistral-7B). A lightweight cross-attention output node processes their representations. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, surpassing the best single constituent model by 11.4, 6.2, and 1.2 percentage points, respectively.
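The figures in the summary imply two quantities that are easy to check by hand: the share of parameters that is actually trained, and the accuracy of the best single constituent model on each benchmark. The sketch below derives both from the numbers quoted above. The function and constant names are illustrative, the frozen count is the approximate 12B stated in the summary, and the baselines are derived here rather than reported by the source.

```python
# Figures quoted in the summary above.
TRAINABLE_PARAMS = 17.6e6  # trainable parameters
FROZEN_PARAMS = 12e9       # approximate frozen parameters

# Benchmark -> (graph accuracy in %, margin over best single constituent in points).
RESULTS = {
    "ARC-Challenge": (87.3, 11.4),
    "OpenBookQA": (82.8, 6.2),
    "MMLU": (67.2, 1.2),
}


def trainable_fraction(trainable, frozen):
    """Share of all parameters that receives gradient updates."""
    return trainable / (trainable + frozen)


def implied_baselines(results):
    """Best single-model accuracy implied by each reported margin (derived, not reported)."""
    return {name: round(acc - margin, 1) for name, (acc, margin) in results.items()}


fraction = trainable_fraction(TRAINABLE_PARAMS, FROZEN_PARAMS)
baselines = implied_baselines(RESULTS)
print(f"trainable share: {fraction:.2%}")
for name, acc in baselines.items():
    print(f"{name}: implied best single model = {acc}%")
```

Under these figures, well under 1% of the parameters are trained. The best single constituent may be a different model on each benchmark, so the implied baselines should not be read as one model's scores.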

Key takeaway

For AI engineers who want better accuracy without retraining large models, this architecture is worth testing. Connecting several frozen LLMs in a feedforward graph raised accuracy by 1.2 to 11.4 percentage points over the best single constituent model while training only 17.6M parameters. Consider experimenting with different combinations of frozen models and tuning the learned projections to improve task-specific accuracy.

Key insights

Frozen LLMs can form a trainable feedforward graph via linear projections in a shared latent space.

Principles

LLM latent spaces exhibit geometric compatibility.
Gradient flow is tractable across frozen model boundaries.
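The second principle can be illustrated with a minimal sketch, assuming a toy stand-in for a frozen model: a fixed tanh layer (W_FROZEN) placed after a small trainable projection P. All names, sizes, and values here are hypothetical and are not taken from the original article. The point is only that the chain rule passes through weights that are never updated, so the projection in front of them can still be trained.

```python
import math

# Hypothetical stand-in for a frozen model: fixed weights that are never updated.
W_FROZEN = [[0.5, -0.3], [0.8, 0.2]]


def forward(P, x):
    """Trainable projection z = P x, frozen node h = tanh(W z), readout y = sum(h)."""
    z = [sum(p * xk for p, xk in zip(row, x)) for row in P]
    h = [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W_FROZEN]
    return h, sum(h)


def loss(P, x, target):
    """Squared error between the readout and the target."""
    _, y = forward(P, x)
    return (y - target) ** 2


def grad_P(P, x, target):
    """Backpropagate through the frozen node to obtain dLoss/dP."""
    h, y = forward(P, x)
    dy = 2.0 * (y - target)                    # dLoss/dy
    da = [dy * (1.0 - hi * hi) for hi in h]    # through tanh
    # Through the frozen weights: dLoss/dz = W^T dLoss/da (W itself gets no update).
    dz = [sum(da[i] * W_FROZEN[i][j] for i in range(len(da))) for j in range(len(P))]
    return [[dzj * xk for xk in x] for dzj in dz]  # dLoss/dP[j][k] = dz[j] * x[k]


def train(steps=200, lr=0.05):
    """Gradient descent on P only; returns the final P and the loss history."""
    P = [[0.1, 0.0], [0.0, 0.1]]
    x, target = [1.0, 2.0], 0.5
    losses = []
    for _ in range(steps):
        losses.append(loss(P, x, target))
        g = grad_P(P, x, target)
        P = [[p - lr * gp for p, gp in zip(prow, grow)] for prow, grow in zip(P, g)]
    return P, losses
```

Calling train() returns a loss history that falls as P is updated, while W_FROZEN stays at its initial values. In a real system an autodiff framework would compute these gradients, and the frozen node would be a full LLM rather than a single layer.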

Method

Small frozen LLMs encode the input into a shared latent space. That representation is injected into larger frozen LLMs, and a cross-attention output node processes their outputs. The linear projections between nodes are optimized by backpropagation through the frozen models.
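The data flow described above can be sketched with toy components. This is a shape-level illustration under stated assumptions, not the authors' implementation. Each frozen LLM is replaced by a fixed tanh layer, the input is a plain vector rather than tokens, the three encoder latents are merged by averaging, and the output node is reduced to a single learned query attending over the two large-node states. Every name and dimension is hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def frozen_model(W):
    """Toy frozen node: a fixed tanh layer standing in for an LLM."""
    return lambda v: [math.tanh(a) for a in matvec(W, v)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_graph(seed=0, d_in=4, d_latent=3, enc_dims=(5, 6, 7), big_dims=(8, 9)):
    """Three small frozen encoders, two larger frozen nodes, trainable linear maps."""
    rng = random.Random(seed)
    return {
        "encoders": [frozen_model(make_matrix(d, d_in, rng)) for d in enc_dims],  # frozen
        "proj_in": [make_matrix(d_latent, d, rng) for d in enc_dims],    # trainable
        "proj_mid": [make_matrix(d, d_latent, rng) for d in big_dims],   # trainable
        "bigs": [frozen_model(make_matrix(d, d, rng)) for d in big_dims],  # frozen
        "proj_out": [make_matrix(d_latent, d, rng) for d in big_dims],   # trainable
        "query": [rng.uniform(-0.5, 0.5) for _ in range(d_latent)],      # trainable
    }


def count_trainable(graph):
    """Count only the projection and query entries; frozen nodes are excluded."""
    mats = graph["proj_in"] + graph["proj_mid"] + graph["proj_out"]
    return sum(len(M) * len(M[0]) for M in mats) + len(graph["query"])


def forward(graph, x):
    # Layer 1: each small encoder reads the input; a linear map sends it to the shared space.
    latents = [matvec(P, enc(x)) for enc, P in zip(graph["encoders"], graph["proj_in"])]
    shared = [sum(col) / len(latents) for col in zip(*latents)]  # assumption: average
    # Layer 2: project into each larger node's width, run it, project back out.
    states = [
        matvec(p_out, big(matvec(p_mid, shared)))
        for p_mid, big, p_out in zip(graph["proj_mid"], graph["bigs"], graph["proj_out"])
    ]
    # Output node (simplified): one learned query attends over the two node states.
    scale = math.sqrt(len(graph["query"]))
    weights = softmax([sum(q * s for q, s in zip(graph["query"], st)) / scale for st in states])
    out = [sum(w * st[i] for w, st in zip(weights, states)) for i in range(len(states[0]))]
    return out, weights
```

Calling forward(build_graph(), [1.0, 0.0, -1.0, 0.5]) returns a latent-sized output vector and the two attention weights. The merge rule, the injection point inside each larger model, and the form of the output node are the parts most likely to differ from the original design.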

In practice

Combine diverse frozen LLMs for enhanced performance.
Train only a small set of parameters (17.6M against roughly 12B frozen) instead of fine-tuning the models themselves.

Topics

Feedforward Graph Architecture
Frozen Language Models
Latent Space Projections
Multi-Node LLM Graphs
Performance Benchmarking

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.