Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Expert, extended

Summary

Federated Nested Learning (FedNL) reframes Federated Learning (FL) as a three-level nested optimization system: instead of aggregating static model weights, clients collaboratively learn optimization rules. The approach targets two persistent problems in federated LLMs, Non-IID client data and long-tail distributions. FedNL integrates a Titans-based linear attention mechanism and treats the delta rule as an online gradient step, which lets clients perform lightweight, zero-shot test-time adaptation. In experiments on Non-IID MMLU and long-context benchmarks, FedNL is competitive on short-context reasoning, improves long-context retrieval and streaming cross-entropy, and keeps inference memory constant. Because only the memory-update meta-rules are aggregated, communication overhead drops by roughly $350\times$ relative to FedAvg.
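To make "treating a delta rule as an online gradient step" concrete, here is the textbook form of that equivalence. The symbols are assumptions for illustration and are not taken from the paper: $M_t$ is an associative memory matrix, $(k_t, v_t)$ is the key/value pair at step $t$, and $\eta$ is a step size. FedNL's actual update may include further terms, such as the momentum or forgetting used in Titans-style memories.

$$
\ell_t(M) = \tfrac{1}{2}\,\lVert M k_t - v_t \rVert^2,
\qquad
\nabla_M \ell_t(M) = (M k_t - v_t)\,k_t^\top
$$

$$
M_t = M_{t-1} - \eta\,\nabla_M \ell_t(M_{t-1})
    = M_{t-1}\,(I - \eta\,k_t k_t^\top) + \eta\,v_t k_t^\top
$$

Read this way, each token seen at test time triggers one gradient step on a local reconstruction loss, so the memory adapts while the global weights stay fixed.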

Key takeaway

For research scientists building federated learning systems for LLMs, FedNL offers a way to handle Non-IID data and improve long-context performance. Consider its three-level nested optimization when you need zero-shot test-time adaptation and lower communication costs, especially on resource-constrained edge deployments. Models adapt to heterogeneous local data at test time without altering global weights, which improves robustness and efficiency.

Key insights

FedNL enables federated models to learn adaptive memory update rules, not just static weights, for robust test-time adaptation.

Principles

Method

FedNL reformulates FL as three nested optimization levels: client-side test-time adaptation via the delta rule, client-side rule learning via meta-gradients, and server-side collaborative generalization via aggregation of the learned rules.
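The three levels can be sketched as a toy NumPy loop. This is a minimal illustration under strong assumptions, not the paper's implementation: the "rule" is reduced to a single scalar step size `eta`, the meta-gradient is approximated by finite differences rather than backpropagation, the synthetic clients are hypothetical linear key-to-value maps, and all function names are invented for this example.

```python
import numpy as np

def delta_rule_step(M, k, v, eta):
    """Level 1: one online gradient step on 0.5 * ||M k - v||^2."""
    err = M @ k - v
    return M - eta * np.outer(err, k)

def stream_loss(eta, keys, values, dim):
    """Mean predict-then-adapt loss of a fresh memory along one stream."""
    M = np.zeros((dim, dim))
    total = 0.0
    for k, v in zip(keys, values):
        total += 0.5 * np.sum((M @ k - v) ** 2)  # predict before writing
        M = delta_rule_step(M, k, v, eta)        # then adapt the memory
    return total / len(keys)

def client_meta_update(eta, keys, values, dim, meta_lr=0.02, eps=1e-4):
    """Level 2: improve the rule (here only eta) with a finite-difference
    meta-gradient of the streaming loss."""
    hi = stream_loss(eta + eps, keys, values, dim)
    lo = stream_loss(eta - eps, keys, values, dim)
    grad = (hi - lo) / (2 * eps)
    # Assumption: keys are unit-norm, so eta = 1 is an exact overwrite.
    return float(np.clip(eta - meta_lr * grad, 0.0, 1.0))

def server_aggregate(client_etas):
    """Level 3: average rule parameters; memories and data stay local."""
    return float(np.mean(client_etas))

def make_client(rng, dim, n):
    """Hypothetical non-IID client with its own linear key -> value map."""
    W = rng.normal(size=(dim, dim))
    keys = rng.normal(size=(n, dim))
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    return keys, keys @ W.T

def mean_loss(eta, clients, dim):
    return float(np.mean([stream_loss(eta, k, v, dim) for k, v in clients]))

def run(rounds=20, num_clients=4, dim=8, n=64, seed=0):
    rng = np.random.default_rng(seed)
    clients = [make_client(rng, dim, n) for _ in range(num_clients)]
    eta, history = 0.1, []
    for _ in range(rounds):
        history.append(mean_loss(eta, clients, dim))
        local = [client_meta_update(eta, k, v, dim) for k, v in clients]
        eta = server_aggregate(local)
    history.append(mean_loss(eta, clients, dim))
    return eta, history

if __name__ == "__main__":
    eta, history = run()
    print(f"learned eta={eta:.3f}, loss {history[0]:.3f} -> {history[-1]:.3f}")
```

In this toy, each client uploads only its rule parameter per round, never a memory matrix or model weights. That shows in miniature why aggregating update rules can be far cheaper than averaging full models.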

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.