Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A novel feedforward graph architecture integrates heterogeneous frozen large language models (LLMs) as computational nodes, communicating through a shared continuous latent space via learned linear projections. This architecture builds on prior work showing geometric compatibility between independently trained LLM latent spaces, extending it to end-to-end trainable multi-node graphs. The system uses three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) to encode input into a shared latent space, which then feeds into two larger frozen models (Phi-3-mini, Mistral-7B). A lightweight cross-attention output node processes their representations. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, surpassing the best single constituent model by 11.4, 6.2, and 1.2 percentage points, respectively.
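The figures quoted above can be turned into a quick sanity check. The short sketch below computes the trainable share of the parameter budget and the best single-model scores implied by the reported gains. It uses only the numbers from the summary, treats "approximately 12B" as exactly 12e9, and assumes the gains are absolute percentage points on each benchmark; the variable names are illustrative.

```python
# Figures quoted in the summary; "approximately 12B" is taken as exactly 12e9 here.
trainable_params = 17.6e6
frozen_params = 12e9

# Share of all parameters that receive gradient updates.
trainable_share = trainable_params / (trainable_params + frozen_params)
print(f"trainable share of parameters: {trainable_share:.3%}")

# Reported graph accuracy and reported gain over the best single constituent model.
graph_scores = {"ARC-Challenge": 87.3, "OpenBookQA": 82.8, "MMLU": 67.2}
gains = {"ARC-Challenge": 11.4, "OpenBookQA": 6.2, "MMLU": 1.2}

# Implied best single-model score per benchmark (graph score minus gain).
best_single = {
    name: round(graph_scores[name] - gains[name], 1) for name in graph_scores
}
for name, score in best_single.items():
    print(f"{name}: implied best single model = {score}")
```

Under these assumptions the trainable share comes out at roughly 0.15% of the total, and the implied single-model baselines are 75.9, 76.6, and 66.0 respectively. The summary does not say which constituent model is the best on each benchmark, so the sketch leaves that open.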

Key takeaway

For AI engineers who want better benchmark accuracy without retraining or fine-tuning large models, this architecture is a practical option. Wiring several frozen LLMs into a feedforward graph yielded gains of 1.2 to 11.4 percentage points over the best single constituent model while training only 17.6M parameters. Consider experimenting with different combinations of frozen models and tuning the projection matrices for task-specific accuracy.

Key insights

Frozen LLMs can form a trainable feedforward graph via linear projections in a shared latent space.

Principles

Method

Small frozen LLMs encode the input into a shared latent space. That representation is injected into larger frozen LLMs, whose outputs are then processed by a cross-attention output node. The linear projections are optimized via backpropagation while the LLM weights stay frozen.
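The data flow described here can be sketched as a toy forward pass in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: each frozen LLM is replaced by a fixed random `tanh` layer, the hidden sizes are made up, the encoder outputs are merged by simple averaging, and the output node is reduced to single-query attention. `FrozenNode`, `Projection`, and `forward` are hypothetical names, and no training step is shown.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class FrozenNode:
    """Toy stand-in for a frozen LLM: a fixed nonlinear map, never updated."""
    def __init__(self, dim):
        self.weights = rand_matrix(dim, dim)
    def __call__(self, hidden):
        return [math.tanh(x) for x in matvec(self.weights, hidden)]

class Projection:
    """Linear map between a node's hidden space and the shared latent space.
    In the described architecture these are the parameters that get trained."""
    def __init__(self, in_dim, out_dim):
        self.weights = rand_matrix(out_dim, in_dim)
    def __call__(self, vector):
        return matvec(self.weights, vector)

SHARED_DIM = 4               # hypothetical shared latent width
encoder_dims = [3, 5, 6]     # hypothetical sizes for the three small models
processor_dims = [7, 8]      # hypothetical sizes for the two larger models

encoders = [FrozenNode(d) for d in encoder_dims]
to_shared = [Projection(d, SHARED_DIM) for d in encoder_dims]
from_shared = [Projection(SHARED_DIM, d) for d in processor_dims]
processors = [FrozenNode(d) for d in processor_dims]
to_output = [Projection(d, SHARED_DIM) for d in processor_dims]
query = [random.gauss(0.0, 0.1) for _ in range(SHARED_DIM)]  # output-node query

def forward(inputs):
    # Layer 1: each small frozen node encodes its input, then projects to shared space.
    latents = [proj(enc(x)) for enc, proj, x in zip(encoders, to_shared, inputs)]
    # Merge the encoder latents (mean pooling is an assumption of this sketch).
    shared = [sum(column) / len(latents) for column in zip(*latents)]
    # Layer 2: inject the shared latent into each larger frozen node and project back.
    states = [out(proc(inj(shared)))
              for inj, proc, out in zip(from_shared, processors, to_output)]
    # Output node: single-query scaled dot-product attention over the two states.
    scores = [sum(q * k for q, k in zip(query, s)) / math.sqrt(SHARED_DIM)
              for s in states]
    attention = softmax(scores)
    output = [sum(a * s[i] for a, s in zip(attention, states))
              for i in range(SHARED_DIM)]
    return output, attention

# One toy input vector per encoder, standing in for that model's view of the prompt.
inputs = [[1.0] * d for d in encoder_dims]
output, attention = forward(inputs)
```

In a real setup the `FrozenNode` weights would be excluded from the optimizer and gradients would flow through them only to reach the projection matrices, which is what keeps the trainable parameter count small.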

Topics

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.