Beyond KV-Cache: Test-Time Training & Invariant Latent Topologies for ICL

· Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Two new methodologies, both published on April 7, 2026, aim to improve how AI systems learn and generalize. The first, "In-Place Test Time Training" (TTT) by ByteDance Seed and Peking University, repurposes the Multi-Layer Perceptron (MLP) block of a transformer. It treats the W_down projection matrix as a "fast weight" that is updated continuously during inference, so an LLM can adapt to new context in real time without offline retraining. The second paper, "Domain Invariant Neuron-Based Retrieval" by Harbin Institute of Technology and Peng Cheng Laboratory, targets cross-domain knowledge transfer. It identifies "Domain Invariant Neurons" (DEANs), whose activation polarities remain consistent across domains, and uses them to extract gauge-invariant logical topologies that improve generalization to unseen domains. Both approaches move the focus away from the attention mechanism and toward the MLP and the hidden-state space as the core sites of computation and memory.
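The fast-weight idea can be illustrated with a toy NumPy sketch. It is not the paper's implementation. Several details are assumptions made only for illustration: the two-layer ReLU MLP, the squared-error objective, the `target` array that stands in for whatever self-supervised target the method actually uses, and the names `mlp_forward` and `ttt_step`. The sketch does show the core mechanic described above. W_up stays frozen, and W_down is modified in place by a gradient step on each incoming chunk.

```python
import numpy as np

def mlp_forward(x, w_up, w_down):
    """Toy MLP block: ReLU(x @ W_up) @ W_down; returns hidden and output."""
    h = np.maximum(x @ w_up, 0.0)
    return h, h @ w_down

def ttt_step(x_chunk, target_chunk, w_up, w_down, lr=0.05):
    """One in-place fast-weight update of W_down; returns the pre-update loss.

    Assumed objective: 0.5 * ||h @ W_down - target||^2 averaged over the chunk.
    W_up is read but never modified.
    """
    n = len(x_chunk)
    h, y = mlp_forward(x_chunk, w_up, w_down)
    err = y - target_chunk
    loss = 0.5 * np.sum(err ** 2) / n
    grad = h.T @ err / n          # dL/dW_down, shape [hidden, d_model]
    w_down -= lr * grad           # mutates the caller's array in place
    return loss

rng = np.random.default_rng(0)
d_model, hidden = 8, 16
w_up = rng.normal(size=(d_model, hidden)) / np.sqrt(d_model)
w_down = rng.normal(size=(hidden, d_model)) / np.sqrt(hidden)
w_up_before, w_down_before = w_up.copy(), w_down.copy()

x = rng.normal(size=(32, d_model))
target = rng.normal(size=(32, d_model))  # hypothetical self-supervised target
losses = [ttt_step(x, target, w_up, w_down) for _ in range(5)]
```

The same chunk is reused here so the loss trend is easy to check. In a streaming setting, each call would receive the next chunk of context, and the updated W_down would carry what was learned forward.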

Key takeaway

For AI Engineers and Research Scientists building adaptive LLMs, these methodologies point to two new capabilities. The first is continual learning during inference, achieved by dynamically updating a specific weight matrix. The second is more robust cross-domain generalization, achieved by identifying and leveraging domain-invariant neurons. The two could also be combined into more efficient, structurally aware, adaptive AI agents. One possibility is to apply DEAN analysis to identify a logical subspace within W_down and restrict TTT updates to that subspace.
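A hypothetical sketch of that combination follows. It assumes the row-vector convention `h @ W_down`, in which each row of W_down belongs to one intermediate MLP neuron. It also assumes that a DEAN-style analysis yields a boolean mask over those neurons. The function name `masked_fast_weight_update`, the squared-error objective, and the masking scheme are illustrative assumptions. Neither paper specifies them.

```python
import numpy as np

def masked_fast_weight_update(h, err, w_down, neuron_mask, lr=0.05):
    """Gradient step on W_down restricted to rows flagged in neuron_mask.

    h:           [n, hidden] MLP intermediate activations for the chunk
    err:         [n, d_model] residual (h @ w_down - target) under the assumed
                 objective 0.5 * ||err||^2 averaged over the chunk
    neuron_mask: [hidden] bool, hypothetically produced by a DEAN-style analysis
    """
    grad = h.T @ err / len(h)        # full dL/dW_down
    grad[~neuron_mask, :] = 0.0      # freeze rows of non-selected neurons
    w_down -= lr * grad              # in-place update of selected rows only

rng = np.random.default_rng(1)
n, hidden, d_model = 10, 6, 4
h = np.maximum(rng.normal(size=(n, hidden)), 0.0)
w_down = rng.normal(size=(hidden, d_model))
err = h @ w_down - rng.normal(size=(n, d_model))  # hypothetical target
mask = np.array([True, False, True, False, False, True])
w_before = w_down.copy()
masked_fast_weight_update(h, err, w_down, mask)
```

Zeroing the gradient rows confines each update to the selected neurons. This is one simple way to "target" a subspace. Projecting the gradient onto a basis derived from the analysis would be another.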

Key insights

New methods let LLMs learn during inference (by updating internal weights in place) and generalize across diverse domains (by identifying invariant neural subspaces).

Principles

Method

In-Place TTT updates the W_down projection matrix of the MLP block with gradient-descent steps during inference. DEANs are neurons whose activation polarity stays consistent across source and target domains. Once identified, they are used for retrieval.
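The DEAN selection and retrieval steps can be sketched in NumPy under a few assumptions. Polarity is taken to be the sign of a neuron's activation. A neuron counts as invariant when its dominant sign is strong (shared by at least `min_agreement` of samples) and identical in every domain. Retrieval is plain cosine similarity over the selected neurons. The function names, the threshold, and the synthetic data are hypothetical, and the paper's actual selection criterion and retrieval scoring may differ.

```python
import numpy as np

def find_invariant_neurons(domain_acts, min_agreement=0.8):
    """Return indices of neurons with a consistent activation polarity.

    domain_acts: list of arrays, one per domain, each [n_samples, n_neurons].
    Assumed criterion: in every domain, at least `min_agreement` of samples
    share one sign, and that majority sign is the same in all domains.
    """
    polarities = []
    for acts in domain_acts:
        pos_frac = (acts > 0).mean(axis=0)            # fraction of positive activations
        majority = np.where(pos_frac >= 0.5, 1, -1)   # dominant sign per neuron
        strength = np.maximum(pos_frac, 1.0 - pos_frac)
        # 0 marks neurons with no clear polarity in this domain
        polarities.append(np.where(strength >= min_agreement, majority, 0))
    polarities = np.stack(polarities)                 # [n_domains, n_neurons]
    consistent = np.all(polarities == polarities[0], axis=0) & (polarities[0] != 0)
    return np.flatnonzero(consistent)

def retrieve(query, memory, neuron_idx, k=1):
    """Top-k cosine-similarity retrieval using only the selected neurons."""
    q = query[neuron_idx]
    m = memory[:, neuron_idx]
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k]

# Synthetic activations for two domains (hypothetical data).
rng = np.random.default_rng(0)
offsets_a = np.array([3.0, 3.0, -3.0, 3.0, 0.0, 0.0])
offsets_b = np.array([3.0, 3.0, -3.0, -3.0, 0.0, 0.0])  # neuron 3 flips sign
acts_a = rng.normal(size=(200, 6)) + offsets_a
acts_b = rng.normal(size=(200, 6)) + offsets_b
dean_idx = find_invariant_neurons([acts_a, acts_b])

# Query equals memory row 3 except on the non-invariant neurons 3 to 5.
memory = rng.normal(size=(5, 6))
query = memory[3].copy()
query[3:] += 10.0
top = retrieve(query, memory, dean_idx)
```

In the synthetic data, neurons 0 to 2 keep the same polarity in both domains, neuron 3 flips sign, and neurons 4 and 5 are noise, so only the first three survive selection. The query differs from memory row 3 only on non-invariant neurons, so retrieval in the invariant subspace still returns row 3.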

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.