Can In-Context Learning Support Intrinsic Curiosity?
Summary
A new study investigates whether the in-context learning (ICL) capabilities of large sequence models can remove a computational bottleneck in "intrinsic curiosity" for automated data selection. Traditional curiosity methods reward agents for learning progress, which requires expensive gradient descent updates within each trajectory. This research instead uses an in-context learner as an immediate, update-free world model: its prediction errors and counterfactual manipulations of its context provide the learning-progress signal that an exploration policy maximizes. The authors find that for general Markov Decision Processes (MDPs), ICL cannot give an unbiased estimate of true learning progress, because of nuisance terms or implementation constraints. For non-temporal settings such as active learning and Bayesian Experimental Design, however, they obtain a positive result: ICL-derived rewards bound true learning progress and converge to it asymptotically. Controlled experiments in continuous and symbolic environments confirm that this ICL-driven framework effectively trains curious data-collection policies for optimal exploration.
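The contrast in the summary can be written out as two reward definitions. This is one common formalization, not necessarily the paper's own: the parameters theta, step size eta, per-example loss ell, held-out set D_eval, context C_t, and frozen predictor f(. | C) are assumed notation for illustration. The first reward needs a gradient step for every new observation; the second needs only forward passes under two counterfactual contexts.

```latex
% Gradient-based learning progress for a new observation (x_t, y_t):
\mathrm{LP}_t = \mathcal{L}(\theta_t; D_{\mathrm{eval}}) - \mathcal{L}(\theta_{t+1}; D_{\mathrm{eval}}),
\qquad \theta_{t+1} = \theta_t - \eta \, \nabla_\theta \, \ell(\theta_t; x_t, y_t)

% Update-free ICL surrogate: compare contexts with and without (x_t, y_t):
\widehat{\mathrm{LP}}_t = \mathcal{L}(C_t; D_{\mathrm{eval}}) - \mathcal{L}\big(C_t \cup \{(x_t, y_t)\}; D_{\mathrm{eval}}\big),
\qquad \mathcal{L}(C; D) = \frac{1}{|D|} \sum_{(x, y) \in D} \ell\big(f(x \mid C), y\big)
```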
Key takeaway
If you are a machine learning engineer designing data collection strategies, consider ICL-driven intrinsic curiosity for non-temporal tasks such as active learning or Bayesian Experimental Design. ICL cannot provide unbiased rewards for general Markov Decision Processes, but its proven asymptotic convergence in these non-temporal settings makes it a computationally efficient alternative to gradient-descent-based methods, because rewards come from an update-free in-context learner rather than per-trajectory parameter updates. This approach lets you train curious data-collection policies that explore optimally, potentially streamlining your data acquisition processes.
Key insights
ICL can support intrinsic curiosity for data selection in non-temporal settings, but it cannot provide unbiased learning-progress rewards in general MDPs.
Principles
- Intrinsic curiosity rewards learning progress.
- ICL can serve as an update-free world model.
- ICL-derived rewards bound, and asymptotically converge to, true learning progress in non-temporal settings.
Method
An exploration policy is trained to maximize learning progress using an in-context learner's prediction errors and counterfactual context manipulations.
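A minimal Python sketch of one possible counterfactual-context reward follows, assuming a non-temporal setting. It replaces the large sequence model with a hypothetical stand-in: a closed-form Bayesian linear regression whose predictions depend only on the (x, y) pairs in its context, so nothing is trained or updated. The reward is the drop in held-out loss when a candidate point is added to the context. BayesLinearICL, icl_learning_progress, and the hyperparameters alpha and sigma2 are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # an observed (x, y) pair


@dataclass
class BayesLinearICL:
    """Hypothetical stand-in for a frozen in-context learner.

    Predictions come purely from conditioning on a context of (x, y) pairs;
    no parameters are trained or updated. The assumed model is
    y = w * x + noise, with prior w ~ N(0, 1/alpha) and noise variance sigma2.
    """

    alpha: float = 1.0
    sigma2: float = 0.25

    def predict(self, context: List[Point], x: float) -> float:
        # Closed-form posterior mean of w given the context, times the query x.
        precision = self.alpha + sum(cx * cx for cx, _ in context) / self.sigma2
        w_mean = (sum(cx * cy for cx, cy in context) / self.sigma2) / precision
        return w_mean * x

    def loss(self, context: List[Point], eval_set: List[Point]) -> float:
        # Mean squared prediction error on held-out query points.
        errors = [(self.predict(context, x) - y) ** 2 for x, y in eval_set]
        return sum(errors) / len(errors)


def icl_learning_progress(
    learner: BayesLinearICL,
    context: List[Point],
    new_point: Point,
    eval_set: List[Point],
) -> float:
    """Counterfactual learning progress: held-out loss without minus with new_point."""
    before = learner.loss(context, eval_set)
    after = learner.loss(context + [new_point], eval_set)  # counterfactual context
    return before - after


learner = BayesLinearICL()
eval_set = [(1.0, 2.0), (2.0, 4.0)]  # noiseless samples of y = 2x
lp_informative = icl_learning_progress(learner, [], (1.0, 2.0), eval_set)
lp_weak = icl_learning_progress(learner, [], (0.1, 0.2), eval_set)
print(lp_informative, lp_weak)
```

With an empty context, the point at x = 1.0 yields a learning progress of 9.6 (the held-out MSE falls from 10.0 to 0.4), while the near-zero point at x = 0.1 yields much less, so a policy maximizing this reward prefers informative queries.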
In practice
- Apply ICL for active learning data selection.
- Use ICL in Bayesian Experimental Design.
- Train curious policies in non-temporal environments.
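The practices above share one loop: a policy chooses where to query, observes a label, and is rewarded by the in-context learner's learning progress. The Python sketch below illustrates that loop under loud assumptions: the environment is a non-temporal bandit over two hypothetical query regions, the "in-context learner" is a closed-form Bayesian linear regression rather than a large sequence model, and the policy is plain epsilon-greedy rather than the trained exploration policy the paper describes. All names (curious_bandit, noisy_line, regions) and constants are illustrative.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # an observed (x, y) pair
ALPHA, SIGMA2 = 1.0, 0.25  # hypothetical prior precision and noise variance


def predict(context: List[Point], x: float) -> float:
    """In-context prediction: Bayesian linear regression conditioned on context, no updates."""
    precision = ALPHA + sum(cx * cx for cx, _ in context) / SIGMA2
    return (sum(cx * cy for cx, cy in context) / SIGMA2) / precision * x


def mse(context: List[Point], eval_set: List[Point]) -> float:
    return sum((predict(context, x) - y) ** 2 for x, y in eval_set) / len(eval_set)


def curious_bandit(
    regions: List[Tuple[float, float]],
    oracle: Callable[[float, random.Random], float],
    eval_set: List[Point],
    budget: int,
    epsilon: float = 0.1,
    step: float = 0.3,
    seed: int = 0,
):
    """Epsilon-greedy policy over query regions, rewarded by ICL learning progress."""
    rng = random.Random(seed)
    context: List[Point] = []
    value = [0.0] * len(regions)  # recency-weighted reward estimate per region
    counts = [0] * len(regions)
    for _ in range(budget):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            arm = untried[0]  # try every region once
        elif rng.random() < epsilon:
            arm = rng.randrange(len(regions))  # explore
        else:
            arm = max(range(len(regions)), key=lambda i: value[i])  # exploit
        lo, hi = regions[arm]
        x = rng.uniform(lo, hi)
        y = oracle(x, rng)  # query the simulated environment for a label
        # Curiosity reward: counterfactual drop in held-out loss from adding (x, y).
        reward = mse(context, eval_set) - mse(context + [(x, y)], eval_set)
        context.append((x, y))
        counts[arm] += 1
        if counts[arm] == 1:
            value[arm] = reward
        else:
            # Constant step size tracks the non-stationary (decaying) progress signal.
            value[arm] += step * (reward - value[arm])
    return counts, context


def noisy_line(x: float, rng: random.Random) -> float:
    return 2.0 * x + rng.gauss(0.0, 0.5)  # hypothetical ground truth y = 2x + noise


eval_set = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
regions = [(-0.1, 0.1), (1.0, 3.0)]  # a low-signal region and an informative one
counts, context = curious_bandit(regions, noisy_line, eval_set, budget=30)
print(counts)
```

The constant step size is a deliberate choice: learning progress shrinks as the context grows, so a plain running average would keep overweighting the large rewards collected early on.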
Topics
- In-Context Learning
- Intrinsic Curiosity
- Automated Data Selection
- Markov Decision Processes
- Active Learning
- Bayesian Experimental Design
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published in Artificial Intelligence.