Slow for AI Weights, Fast for AI Harness (FST)
Summary
Researchers at UC Berkeley, in collaboration with Mila and UT Austin, introduced Fast Slow Training (FST), a training methodology for Large Language Models (LLMs), published on May 12, 2026. FST targets two problems in continual LLM adaptation, catastrophic forgetting and loss of reasoning plasticity, by separating core, abstract knowledge from dynamic, short-lived data. It jointly optimizes "slow weights" (the LLM's parameters), updated through reinforcement learning (RL), and "fast weights" (optimized context, such as prompt populations), updated through GePA (Generative Prompt Evolution for Agents) reflective optimization. According to the reported results, FST is up to three times more sample-efficient than RL alone, reaches higher performance asymptotes, exhibits less parametric drift, and preserves reasoning plasticity, which leaves the model more adaptable to new tasks and continual learning scenarios. The method was tested on an 8-billion-parameter Qwen-3 model on an 8x H100 GPU cluster, with training taking 25 to 40 GPU hours per task.
Key takeaway
For AI engineers and research scientists building continually adapting LLMs, FST is presented as a more sample-efficient alternative to RL-only training. Consider adopting its coupled framework, which pairs slow weight updates via RL with fast context optimization via GePA. The reported benefits are better data efficiency, less parametric drift, and preserved reasoning plasticity, which together support more robust performance across evolving tasks and domains. Study how the two updates are coupled, since the gains come from optimizing them jointly rather than in sequence.
Key insights
FST jointly optimizes LLM parameters and dynamic context for more efficient and adaptable continual learning.
Principles
- Not all adaptation should be permanently written into LLM weights.
- Context and weights must be trained together, not sequentially.
- Coupled systems learn faster and retain plasticity.
Method
FST runs as a closed loop. First, GePA updates the prompt population based on the current policy and look-ahead data. Then RL performs a slow weight update while holding the new prompt population fixed. Repeating these two steps lets the weights and the context co-adapt.
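The alternating loop described above can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`fst_loop`, `toy_reward`, `toy_mutate`) is hypothetical. It rests on loud simplifications: the "slow weights" are a single number `theta`, each "prompt" is a number rather than text, a mutate-and-select step stands in for GePA's reflective optimization, a finite-difference ascent step stands in for RL, and look-ahead data is not modeled at all.

```python
import random


def mean_reward(reward, theta, prompts):
    """Average reward of one policy parameter over a prompt population."""
    return sum(reward(theta, p) for p in prompts) / len(prompts)


def fst_loop(reward, mutate, theta, prompts, rounds=100, lr=0.1, eps=1e-3, seed=0):
    """Alternate a fast prompt-population step with a slow weight step."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        # Fast step (stand-in for GePA): mutate each prompt, then keep the
        # top-scoring candidates under the current, frozen theta.
        candidates = prompts + [mutate(p, rng) for p in prompts]
        candidates.sort(key=lambda p: reward(theta, p), reverse=True)
        prompts = candidates[: len(prompts)]

        # Slow step (stand-in for RL): one small finite-difference ascent
        # step on theta while the new prompt population is held fixed.
        grad = (mean_reward(reward, theta + eps, prompts)
                - mean_reward(reward, theta - eps, prompts)) / (2 * eps)
        theta += lr * grad
        history.append(mean_reward(reward, theta, prompts))
    return theta, prompts, history


def toy_reward(theta, prompt):
    # Hypothetical reward: high when the prompt suits the task (target 3.0)
    # and when the policy parameter matches the prompt.
    return -(theta - prompt) ** 2 - (prompt - 3.0) ** 2


def toy_mutate(prompt, rng):
    # Hypothetical mutation: a small random perturbation of the "prompt".
    return prompt + rng.gauss(0.0, 0.5)


theta, prompts, history = fst_loop(toy_reward, toy_mutate, 0.0, [0.0, 1.0, 2.0])
print(f"first-round reward {history[0]:.3f}, last-round reward {history[-1]:.3f}")
```

Because the fast step keeps the best of the old and mutated prompts, and the slow step only nudges `theta` toward what the current population rewards, neither step undoes the other. That is the co-adaptation idea in miniature; a real system would replace both stand-ins with the actual optimizers.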
In practice
- Use FST for LLMs requiring frequent adaptation to new data.
- Implement GePA for fast context optimization.
- Combine RL with prompt evolution for robust continual learning.
Topics
- Fast Slow Training
- LLM Adaptation
- Reinforcement Learning
- Generative Prompt Evolution
- Catastrophic Forgetting
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.