Can In-Context Learning Support Intrinsic Curiosity?
Summary
The authors investigate whether in-context learning (ICL) in sequence models can remove the computational bottleneck of "intrinsic curiosity" in automated data selection. Traditional intrinsic-curiosity methods reward an agent for "learning progress" (how much new data improves a world model's predictive ability), which is expensive to compute because it requires gradient descent updates. This work asks whether ICL can instead serve as an immediate, update-free world model. The authors prove that ICL-derived rewards are generally biased in Markov decision processes, but are unbiased and converge asymptotically to true learning progress in non-temporal settings such as active learning and Bayesian Experimental Design. Controlled experiments in continuous and symbolic environments corroborate the theory: the ICL-driven framework is reported to train curious data-collection policies that explore optimally.
Key takeaway
Machine learning engineers who design data collection strategies in non-temporal domains, such as active learning or experimental design, should consider in-context learning (ICL) as the basis for intrinsic curiosity. It offers a computationally efficient way to derive learning progress rewards without the overhead of gradient descent updates, and policies can be trained on rewards built from ICL prediction errors to improve data selection. In temporal settings modeled as Markov decision processes, the same rewards are generally biased, so the guarantee does not carry over.
Key insights
In-context learning can enable computationally efficient intrinsic curiosity for data selection in non-temporal settings.
Principles
- ICL-derived rewards are biased in general Markov Decision Processes.
- ICL rewards converge to true learning progress in non-temporal settings.
- Learning progress effectively drives optimal exploration policies.
Method
Train an exploration policy to maximize learning progress using an in-context learner's prediction errors and counterfactual context manipulations, bypassing expensive gradient descent updates.
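The reward described above can be illustrated with a minimal sketch. It is not the authors' implementation: the ridge-regression `icl_predict` is a hypothetical stand-in for a sequence model that predicts from its context alone, and the "counterfactual context manipulation" is assumed here to mean comparing prediction error with and without the new observation in the context. All function and parameter names are illustrative.

```python
import numpy as np

def icl_predict(context_x, context_y, query_x, ridge=1e-3):
    """Hypothetical stand-in for an in-context learner.

    Predictions depend only on the (x, y) pairs in the context; nothing is
    stored between calls, so there is no gradient descent update.
    """
    context_x = np.asarray(context_x, dtype=float)
    context_y = np.asarray(context_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    if len(context_x) == 0:
        return np.zeros_like(query_x)
    # Ridge-regularised line fit computed from the context alone.
    X = np.column_stack([context_x, np.ones(len(context_x))])
    w = np.linalg.solve(X.T @ X + ridge * np.eye(2), X.T @ context_y)
    return w[0] * query_x + w[1]

def learning_progress_reward(context_x, context_y, new_x, new_y, eval_x, eval_y):
    """Error on evaluation points without the new observation in the
    context, minus the error with it (assumed form of the reward)."""
    err_without = np.mean((icl_predict(context_x, context_y, eval_x) - eval_y) ** 2)
    # Counterfactual context: the same context plus the candidate observation.
    ctx_x = np.append(context_x, new_x)
    ctx_y = np.append(context_y, new_y)
    err_with = np.mean((icl_predict(ctx_x, ctx_y, eval_x) - eval_y) ** 2)
    return float(err_without - err_with)
```

Under these assumptions, an observation that reveals something new about the target function earns a large reward, while a duplicate of a point already in the context earns a reward close to zero. Each reward costs two forward passes of the learner rather than a training run.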
In practice
- Apply ICL for active learning reward functions.
- Utilize ICL in Bayesian Experimental Design.
- Develop ICL-driven data collection policies.
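As a rough illustration of the active learning case, the sketch below runs a pool-based collection loop in a simulated environment. It rests on several assumptions: the paper trains a policy on the ICL reward, whereas this sketch substitutes a simple greedy search that scores every candidate with a simulated `oracle`, which is only possible when labels are free to query. The ridge-regression learner again stands in for a real in-context model, and all names are hypothetical.

```python
import numpy as np

def fit_predict(ctx_x, ctx_y, query_x, ridge=1e-3):
    # Stand-in for an in-context learner (assumption): a ridge-regularised
    # line fit computed from the context alone, with no stored parameters.
    query_x = np.asarray(query_x, dtype=float)
    if len(ctx_x) == 0:
        return np.zeros_like(query_x)
    X = np.column_stack([ctx_x, np.ones(len(ctx_x))])
    w = np.linalg.solve(X.T @ X + ridge * np.eye(2), X.T @ ctx_y)
    return w[0] * query_x + w[1]

def eval_error(ctx_x, ctx_y, eval_x, eval_y):
    # Mean squared prediction error on a fixed evaluation set.
    return float(np.mean((fit_predict(ctx_x, ctx_y, eval_x) - eval_y) ** 2))

def collect_greedily(oracle, pool_x, eval_x, n_queries):
    """Greedy stand-in for a curious policy: at each step, query the pool
    point whose addition to the context reduces evaluation error the most."""
    ctx_x, ctx_y = np.array([]), np.array([])
    eval_y = oracle(eval_x)
    pool = [float(x) for x in pool_x]
    errors = [eval_error(ctx_x, ctx_y, eval_x, eval_y)]
    for _ in range(n_queries):
        base = errors[-1]
        # Reward for each candidate: error before minus error after adding it.
        rewards = [
            base - eval_error(np.append(ctx_x, x), np.append(ctx_y, oracle(x)),
                              eval_x, eval_y)
            for x in pool
        ]
        x = pool.pop(int(np.argmax(rewards)))
        ctx_x, ctx_y = np.append(ctx_x, x), np.append(ctx_y, oracle(x))
        errors.append(eval_error(ctx_x, ctx_y, eval_x, eval_y))
    return ctx_x, errors
```

Calling `collect_greedily(lambda x: 2.0 * x + 1.0, np.linspace(-1, 1, 11), np.linspace(-1, 1, 21), 3)` returns the queried points and the evaluation error after each query. A learned policy would replace the exhaustive scoring step, since real labels are not available before a point is queried.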
Topics
- In-Context Learning
- Intrinsic Curiosity
- Data Selection
- Active Learning
- Bayesian Experimental Design
- Sequence Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.