Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

2026-05-19 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Expert, extended

Summary

Federated Nested Learning (FedNL) is a framework that recasts Federated Learning (FL) as a three-level nested optimization problem. Instead of aggregating static model weights, clients and server collaboratively learn the optimization rules that update memory. The approach targets two persistent problems in federated LLMs: Non-IID client data and long-tail distributions. FedNL builds on a Titans-based linear attention mechanism, which lets each client perform lightweight, zero-shot test-time adaptation by treating a delta-rule memory update as an online gradient step. In experiments on Non-IID MMLU and long-context benchmarks, FedNL is competitive on short-context reasoning, improves long-context retrieval and streaming cross-entropy, and keeps inference memory constant. Because only the memory-update meta-rules are aggregated, communication overhead drops by roughly 350× compared with FedAvg.
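The claim that a delta-rule update is an online gradient step can be checked in a few lines. This is a minimal sketch that assumes a linear matrix memory M, a key k, a value v and a scalar step size beta (all illustrative names, not taken from the paper). The Titans-based memory in FedNL may add gating, momentum or a deeper memory network, so this shows the principle rather than the paper's implementation.

```python
"""Delta-rule memory update viewed as one online gradient step (toy sketch).

Assumption: a linear matrix memory M; not the paper's exact Titans memory.
"""


def matvec(M, x):
    """Return M @ x for a list-of-rows matrix M and a vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def delta_rule_step(M, k, v, beta):
    """One gradient step on L(M) = 0.5 * ||M k - v||^2.

    grad_M L = (M k - v) k^T, so the step is M <- M - beta * (M k - v) k^T,
    which is the delta rule. M is updated in place and keeps its shape no
    matter how many tokens are written.
    """
    err = [p - t for p, t in zip(matvec(M, k), v)]  # prediction error M k - v
    for i, e_i in enumerate(err):
        for j, k_j in enumerate(k):
            M[i][j] -= beta * e_i * k_j
    return M


def retrieval_loss(M, k, v):
    """0.5 * squared error of reading value v back from key k."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(matvec(M, k), v))


if __name__ == "__main__":
    d = 3
    M = [[0.0] * d for _ in range(d)]
    k, v = [1.0, 0.0, 0.0], [0.5, -1.0, 2.0]
    before = retrieval_loss(M, k, v)
    delta_rule_step(M, k, v, beta=0.5)
    after = retrieval_loss(M, k, v)
    print(before, after)  # prints: 2.625 0.65625
```

With beta = 1 and a unit-norm key, a single step stores v exactly; a smaller beta only partially overwrites the old association. The memory keeps its fixed d × d shape however many tokens are written, which is the sense in which inference memory stays constant.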

Key takeaway

For research scientists building federated learning systems for LLMs, FedNL offers a different way to handle Non-IID data and long-context performance: federate the rules that update memory rather than the weights themselves. Consider its three-level nested optimization if you need zero-shot test-time adaptation or lower communication costs, especially for resource-constrained edge deployments. Because adaptation happens in each client's local memory, models can adjust to heterogeneous local data without altering the global weights, which the authors present as a gain in both robustness and efficiency.

Key insights

FedNL enables federated models to learn adaptive memory update rules, not just static weights, for robust test-time adaptation.

Principles

Decouple global rules from local memory content.
Treat inference as an inner-loop optimization process.
Aggregate learning capabilities, not static knowledge.

Method

FedNL reformulates FL into three nested optimization levels: (1) client-side test-time adaptation, where the delta rule updates memory online; (2) client-side rule learning, where meta-gradients update the rule parameters; and (3) server-side collaborative generalization, where the server aggregates the learned rules across clients.
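The three levels can be illustrated with a deliberately tiny loop. This is a sketch under loudly stated assumptions, not the paper's algorithm: the learned rule is reduced to one scalar step size beta (the summary mentions LoRA adapters for richer rule parameters), the memory is a linear matrix, the meta-objective is average read-back error rather than a language-modeling loss, and a finite-difference estimate replaces the autodiff meta-gradient. All function names are hypothetical.

```python
"""Toy three-level nested loop in the spirit of FedNL (illustrative sketch only).

Assumptions (not from the paper): the learned "rule" is one scalar step size
beta, memory is a linear matrix, the meta-objective is average read-back error,
and the meta-gradient is a finite-difference estimate.
"""
import random


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def write_sequence(pairs, beta, d):
    """Level 1 (test-time adaptation): stream (k, v) pairs into a fresh memory
    with the delta rule M <- M - beta * (M k - v) k^T and return the memory."""
    M = [[0.0] * d for _ in range(d)]
    for k, v in pairs:
        err = [p - t for p, t in zip(matvec(M, k), v)]
        for i, e_i in enumerate(err):
            for j, k_j in enumerate(k):
                M[i][j] -= beta * e_i * k_j
    return M


def meta_loss(beta, pairs, d):
    """Mean 0.5 * ||M k - v||^2 over the stream, read back after writing it all."""
    M = write_sequence(pairs, beta, d)
    total = sum(0.5 * sum((p - t) ** 2 for p, t in zip(matvec(M, k), v)) for k, v in pairs)
    return total / len(pairs)


def client_meta_update(beta, pairs, d, lr=0.05, eps=1e-4):
    """Level 2 (rule learning): one meta-gradient step on beta. A central finite
    difference stands in for backpropagating through level 1."""
    grad = (meta_loss(beta + eps, pairs, d) - meta_loss(beta - eps, pairs, d)) / (2 * eps)
    return beta - lr * grad


def server_aggregate(betas, sizes):
    """Level 3 (collaborative generalization): FedAvg-style weighted mean of the
    rule parameters. Client memories are never sent to the server."""
    return sum(b * n for b, n in zip(betas, sizes)) / sum(sizes)


def unit(x):
    norm = sum(a * a for a in x) ** 0.5
    return [a / norm for a in x]


def make_client(rng, n, d, scale):
    """Hypothetical Non-IID client: unit-norm keys, values drawn at its own scale."""
    return [(unit([rng.gauss(0, 1) for _ in range(d)]),
             [rng.gauss(0, scale) for _ in range(d)]) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    clients = [make_client(rng, n, d, s) for n, s in [(8, 0.5), (16, 1.0), (4, 1.5)]]
    global_beta = 0.1
    for round_idx in range(5):
        local = [client_meta_update(global_beta, data, d) for data in clients]
        global_beta = server_aggregate(local, [len(c) for c in clients])
        print(f"round {round_idx}: global beta = {global_beta:.4f}")
```

Only the scalar returned by client_meta_update crosses the network; each client's memory matrix is rebuilt from its own stream at test time. That asymmetry is the "aggregate rules, not memory content" principle in miniature. The actual payload reduction reported for FedNL depends on its real rule parameterization, not on this toy.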

In practice

Use Titans-based linear attention for dynamic memory.
Employ LoRA adapters for parameter-efficient rule learning.
Parallelize the delta-rule computation for long sequences.
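The summary says the delta-rule computation is parallelized for long sequences but does not say how. One standard technique, used to parallelize DeltaNet-style linear attention, is a chunkwise WY-style reformulation. The sketch below assumes that technique (FedNL's actual kernel may differ), a linear matrix memory and L2-normalized keys, and checks the chunked result against the token-by-token recurrence. Function names are illustrative.

```python
"""Chunkwise form of the delta rule, checked against the sequential recurrence.

Convention: M_t = M_{t-1} - b_t (M_{t-1} k_t - v_t) k_t^T
            = M_{t-1} (I - b_t k_t k_t^T) + b_t v_t k_t^T.
For a chunk starting from state S:  M_end = S + sum_i (u_i - S w_i) k_i^T,
where w_i, u_i depend only on the chunk's own keys, values and step sizes.
"""
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, x):
    return [dot(row, x) for row in M]


def sequential(M, ks, vs, betas):
    """Reference: one delta-rule update per token (T sequential steps)."""
    M = [row[:] for row in M]
    for k, v, b in zip(ks, vs, betas):
        err = [p - t for p, t in zip(matvec(M, k), v)]
        for i, e in enumerate(err):
            for j, kj in enumerate(k):
                M[i][j] -= b * e * kj
    return M


def chunk_wu(ks, vs, betas):
    """Forward substitution for one chunk's WY-style vectors:
    w_t = b_t (k_t - sum_{i<t} (k_i . k_t) w_i)
    u_t = b_t (v_t - sum_{i<t} (k_i . k_t) u_i)
    This is a unit-lower-triangular solve, which accelerators batch as matmuls."""
    ws, us = [], []
    for t, (k, v, b) in enumerate(zip(ks, vs, betas)):
        g = [dot(ks[i], k) for i in range(t)]  # row t of the chunk's key Gram matrix
        w = [b * (k[j] - sum(g[i] * ws[i][j] for i in range(t))) for j in range(len(k))]
        u = [b * (v[j] - sum(g[i] * us[i][j] for i in range(t))) for j in range(len(v))]
        ws.append(w)
        us.append(u)
    return ws, us


def chunkwise(M, ks, vs, betas, chunk=4):
    """Delta rule chunk by chunk. chunk_wu needs no memory state, so all chunks'
    (w, u) could be computed in parallel; only the state hand-off
    M <- M + sum_i (u_i - M w_i) k_i^T runs sequentially across chunks."""
    M = [row[:] for row in M]
    for s in range(0, len(ks), chunk):
        kc, vc, bc = ks[s:s + chunk], vs[s:s + chunk], betas[s:s + chunk]
        ws, us = chunk_wu(kc, vc, bc)
        # All corrections use the chunk-start state S before any of them is applied.
        corr = [[u[i] - mw_i for i, mw_i in enumerate(matvec(M, w))] for w, u in zip(ws, us)]
        for c, k in zip(corr, kc):
            for i in range(len(M)):
                for j in range(len(k)):
                    M[i][j] += c[i] * k[j]
    return M


if __name__ == "__main__":
    rng = random.Random(0)
    d, T = 3, 10
    ks = []
    for _ in range(T):
        k = [rng.gauss(0, 1) for _ in range(d)]
        n = dot(k, k) ** 0.5
        ks.append([x / n for x in k])  # L2-normalized keys keep each step a contraction
    vs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    betas = [rng.uniform(0.1, 0.9) for _ in range(T)]
    M0 = [[0.0] * d for _ in range(d)]
    a, b = sequential(M0, ks, vs, betas), chunkwise(M0, ks, vs, betas, chunk=4)
    # Expected to be at floating-point round-off level.
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

The design choice is that the state-independent work (the per-chunk triangular solve) grows with chunk size and parallelizes well, while the sequential part shrinks from one step per token to one low-rank update per chunk.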

Topics

Federated Nested Learning
Test-Time Adaptation
Non-IID Data
Titans Architecture
Linear Attention

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.