Slow for AI Weights, Fast for AI Harness (FST)

Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

UC Berkeley, in collaboration with Mila and UT Austin, has introduced Fast Slow Training (FST), a training methodology for Large Language Models (LLMs) published on May 12, 2026. FST targets two problems in continual LLM adaptation, catastrophic forgetting and loss of reasoning plasticity, by separating core, abstract knowledge from dynamic, short-lived data. It jointly optimizes "slow weights" (the LLM's tensor parameters) through reinforcement learning (RL) and "fast weights" (optimized context, such as prompt populations) through GePA (Genetic-Pareto) reflective prompt optimization. Reported results show FST to be up to three times more sample-efficient than RL alone, with higher performance asymptotes, less parametric drift, and preserved reasoning plasticity, which leaves the model more adaptable to new tasks and continual learning scenarios. The methodology was tested on an 8-billion-parameter Qwen3 model on an 8x H100 GPU cluster, with training times of 25 to 40 GPU hours per task.

Key takeaway

For AI engineers and research scientists building continually adapting LLMs, FST is presented as a stronger alternative to RL-only training. Teams should consider implementing the coupled framework, which pairs slow weight updates via RL with fast context optimization via GePA. The reported benefits are better data efficiency, less parametric drift, and preserved reasoning plasticity, which together support more robust performance across evolving tasks and domains. The mathematical coupling between the two update loops is worth studying closely, since the method depends on the prompts and the weights co-adapting.

Key insights

FST jointly optimizes LLM parameters and dynamic context for more efficient and adaptable continual learning.

Principles

Method

FST runs a closed loop with two alternating steps. First, GePA updates the prompt population against the current policy and look-ahead data. Then RL applies a slow weight update while holding the new prompt population fixed. Repeating the two steps makes the prompts and the weights co-adapt.
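
The alternation described above can be illustrated with a toy loop. This is only a structural sketch under heavy assumptions, not the authors' implementation: the "policy" is a single scalar theta, each "prompt" is a scalar offset, the reward function is invented for the example, random mutation with elitist selection stands in for GePA's reflective optimization, and a plain gradient step stands in for the RL update. All names (reward, fast_step, slow_step, fst_loop, TARGET) are hypothetical.

```python
import random

TARGET = 3.0  # hypothetical task optimum, invented for this toy example


def reward(theta, prompt):
    """Toy reward: highest (zero) when slow weight plus prompt offset hits TARGET."""
    return -(theta + prompt - TARGET) ** 2


def fast_step(theta, population, rng, n_children=4, scale=0.5):
    """Stand-in for the fast (prompt) update.

    Mutates each prompt, scores all candidates under the current policy
    (theta), and keeps the best len(population) of them. Parents stay in
    the candidate pool, so the best score under theta never gets worse.
    """
    children = [p + rng.gauss(0.0, scale) for p in population for _ in range(n_children)]
    candidates = population + children
    candidates.sort(key=lambda p: reward(theta, p), reverse=True)
    return candidates[:len(population)]


def slow_step(theta, population, lr=0.05):
    """Stand-in for the slow (weight) update.

    Takes one small gradient-ascent step on the mean reward over the prompt
    population, which is held fixed during this step.
    """
    grad = sum(-2.0 * (theta + p - TARGET) for p in population) / len(population)
    return theta + lr * grad


def fst_loop(rounds=50, pop_size=4, seed=0):
    """Alternate fast and slow steps, recording the best reward per round."""
    rng = random.Random(seed)
    theta = 0.0
    population = [0.0] * pop_size
    history = []
    for _ in range(rounds):
        population = fast_step(theta, population, rng)  # prompts adapt to the policy
        theta = slow_step(theta, population)            # policy adapts to fixed prompts
        history.append(max(reward(theta, p) for p in population))
    return theta, population, history


if __name__ == "__main__":
    theta, population, history = fst_loop()
    print("final theta:", theta)
    print("best reward in first and last round:", history[0], history[-1])
```

The small learning rate in slow_step is a deliberate choice in this sketch: most of the adaptation is absorbed by the prompt population, so theta moves only a little each round, loosely mirroring the idea of fast context changes paired with slow weight changes.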

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.