Can In-Context Learning Support Intrinsic Curiosity?

2026-06-17 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

The authors investigate whether in-context learning (ICL) in sequence models can remove the computational bottleneck of "intrinsic curiosity" in automated data selection. Traditional approaches to intrinsic curiosity reward agents for "learning progress" (how much new data improves a world model's predictive ability), and they are computationally expensive because measuring that improvement requires gradient descent updates. The paper asks whether ICL can instead serve as an immediate, update-free world model. The authors prove that ICL-derived rewards are generally biased in Markov decision processes, but are unbiased and asymptotically converge to true learning progress in non-temporal settings such as active learning and Bayesian Experimental Design. Controlled experiments across continuous and symbolic environments corroborate the theory, demonstrating that the ICL-driven framework successfully trains curious data-collection policies that explore optimally.

Key takeaway

For Machine Learning Engineers designing data collection strategies: in non-temporal domains such as active learning or experimental design, consider using in-context learning (ICL) as the source of intrinsic curiosity rewards. ICL yields learning progress rewards without the overhead of gradient descent updates, and policies trained on its prediction errors can learn to select data that explores optimally. In Markov decision processes the ICL-derived reward is generally biased, so the guarantee does not carry over to sequential settings.

Key insights

In-context learning can enable computationally efficient intrinsic curiosity for data selection in non-temporal settings.

Principles

ICL-derived rewards are generally biased in Markov decision processes.
ICL-derived rewards are unbiased and asymptotically converge to true learning progress in non-temporal settings.
Policies trained to maximize learning progress explore optimally in the paper's controlled experiments.
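
The summary does not state the paper's formal definitions, so the following is only one plausible way to write down the quantity these principles refer to. The notation is assumed for illustration: C is the current context, f_C the in-context learner's prediction given C, ℓ a prediction loss, and p the distribution over evaluation queries.

```latex
% Assumed notation, not taken from the paper:
% learning progress from adding the pair (x, y) to the context C
\mathrm{LP}(x, y \mid C)
  = \mathbb{E}_{(x', y') \sim p}
    \Big[ \ell\big(f_{C}(x'),\, y'\big)
        - \ell\big(f_{C \cup \{(x, y)\}}(x'),\, y'\big) \Big]
```

Read this way, the reward is positive when the added pair lowers the expected prediction loss, and computing it needs only two forward passes with different contexts rather than a parameter update.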

Method

Train an exploration policy to maximize learning progress using an in-context learner's prediction errors and counterfactual context manipulations, bypassing expensive gradient descent updates.
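
A minimal sketch of the reward computation described above, under loud assumptions: the in-context learner is replaced by a kernel regressor that conditions on the context without any parameter updates (a stand-in, not the sequence model studied in the paper), the setting is non-temporal 1-D regression, and a labelled evaluation set is available. The names `icl_predict` and `learning_progress_reward`, and all parameters, are hypothetical. Policy training itself is not shown; the usage lines only compare the reward for two candidate queries.

```python
import numpy as np


def icl_predict(ctx_x, ctx_y, query_x, bandwidth=0.5,
                prior_mean=0.0, prior_weight=1e-3):
    """Stand-in in-context learner (assumption): kernel regression over the
    context. Predictions depend only on the context, never on weight updates."""
    ctx_x = np.asarray(ctx_x, dtype=float).reshape(-1)
    ctx_y = np.asarray(ctx_y, dtype=float).reshape(-1)
    query_x = np.asarray(query_x, dtype=float).reshape(-1)
    if ctx_x.size == 0:
        # With an empty context, fall back to the prior mean.
        return np.full(query_x.shape, prior_mean)
    sq_dist = (query_x[:, None] - ctx_x[None, :]) ** 2
    weights = np.exp(-0.5 * sq_dist / bandwidth**2)
    # A small prior weight keeps the estimate defined far from all context points.
    numer = weights @ ctx_y + prior_weight * prior_mean
    denom = weights.sum(axis=1) + prior_weight
    return numer / denom


def learning_progress_reward(ctx_x, ctx_y, new_x, new_y, eval_x, eval_y):
    """Counterfactual context manipulation: evaluation error without the
    candidate pair minus evaluation error with it appended to the context."""
    eval_y = np.asarray(eval_y, dtype=float).reshape(-1)
    err_without = np.mean((icl_predict(ctx_x, ctx_y, eval_x) - eval_y) ** 2)
    aug_x = np.append(ctx_x, new_x)
    aug_y = np.append(ctx_y, new_y)
    err_with = np.mean((icl_predict(aug_x, aug_y, eval_x) - eval_y) ** 2)
    return err_without - err_with


# Illustrative usage: context clustered near 0, evaluation spread over [-3, 3].
target = np.sin
ctx_x = np.array([-0.2, 0.0, 0.2])
ctx_y = target(ctx_x)
eval_x = np.linspace(-3.0, 3.0, 61)
eval_y = target(eval_x)

r_near = learning_progress_reward(ctx_x, ctx_y, 0.05, target(0.05), eval_x, eval_y)
r_far = learning_progress_reward(ctx_x, ctx_y, 2.5, target(2.5), eval_x, eval_y)
print(f"reward near existing data: {r_near:.4f}")
print(f"reward in unexplored region: {r_far:.4f}")
```

In this toy setup a query in the unexplored region earns a larger reward than one next to existing context points, which is the signal an exploration policy would be trained to maximize. Both error terms come from forward evaluations of the same learner with different contexts, so no gradient step is involved.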

In practice

Use ICL prediction errors as the reward function in active learning.
Use ICL as the update-free model in Bayesian Experimental Design.
Develop ICL-driven data collection policies.

Topics

In-Context Learning
Intrinsic Curiosity
Data Selection
Active Learning
Bayesian Experimental Design
Sequence Models

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.