Easier to Judge than to Find: Predicting In-Context Learning Success for Demonstration Selection

2026-05-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

DiSP is a new framework for selecting in-context learning (ICL) demonstrations. ICL performance is highly sensitive to which demonstrations appear in the prompt, and searching the vast space of possible contexts is computationally expensive. DiSP rests on the principle that it is "easier to judge than to find": predicting whether a given query-context pair will succeed is cheaper than searching exhaustively for an optimal context. The framework stratifies queries by difficulty, estimates success rates via random trials, trains a lightweight router to predict query difficulty, and builds level-specific judges for sampled demonstrations. At inference, DiSP applies stop-on-acceptance judging within a defined budget and returns diagnostic risk tags if no suitable context is found. Evaluated on five classification datasets with Llama 3-8B and Qwen 2.5-7B, DiSP achieved higher average accuracy than strong learned selection baselines, by up to 3.4%, and demonstrated up to a 23x wall-clock speedup.

Key takeaway

For AI engineers tuning in-context learning performance, DiSP replaces exhaustive demonstration search with predictive judging. Consider adopting its sample-and-judge framework to improve demonstration selection accuracy and cut inference time, which may reduce compute cost and latency in your LLM applications. It can also streamline prompt engineering workflows.

Key insights

Predicting ICL success for a given query-context pair is more efficient than searching for optimal demonstrations.

Principles

Stratify queries by difficulty.
Estimate success rates via random trials.
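
The two principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `is_correct` is a hypothetical callback that runs the LLM on a query with a given context and checks the answer, and the context size `k`, the number of trials, the threshold values, and the choice to derive difficulty levels from the estimated success rate are all assumptions, since the summary does not specify them.

```python
import random
from typing import Callable, List, Sequence


def estimate_success_rate(
    query: str,
    pool: Sequence[str],
    is_correct: Callable[[str, List[str]], bool],  # hypothetical LLM-and-check callback
    k: int = 4,        # assumed number of demonstrations per context
    trials: int = 20,  # assumed number of random trials
    seed: int = 0,
) -> float:
    """Fraction of random k-demonstration contexts that yield a correct answer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        context = rng.sample(list(pool), k)  # one random context per trial
        if is_correct(query, context):
            hits += 1
    return hits / trials


def difficulty_level(rate: float, thresholds: Sequence[float] = (0.33, 0.66)) -> int:
    """Bucket a success rate: 0 = hardest, higher = easier (cutoffs are hypothetical)."""
    return sum(rate >= t for t in thresholds)
```

Under these assumptions, a query that succeeds with almost any random context lands in the easiest level, while a query that rarely succeeds lands in level 0.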

Method

DiSP uses random trials to estimate success rates, trains a router to predict query difficulty, and employs level-specific judges for sampled demonstrations. At inference, it performs stop-on-acceptance judging.
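
The inference step described above might look roughly like the following sketch. It is not taken from the paper: `route` and `judges` are hypothetical stand-ins for the trained router and the level-specific judges, the context size `k` and `budget` are assumed values, and the risk tag format is invented for illustration. What to do when the budget runs out, beyond returning a tag, is left to the caller because the summary does not say.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Judge = Callable[[str, List[str]], bool]


def select_context(
    query: str,
    pool: Sequence[str],
    route: Callable[[str], int],  # hypothetical router: query -> difficulty level
    judges: Dict[int, Judge],     # hypothetical judges, one per difficulty level
    k: int = 4,                   # assumed number of demonstrations per context
    budget: int = 8,              # assumed maximum number of judged samples
    seed: int = 0,
) -> dict:
    """Sample contexts and return the first one accepted by the level's judge."""
    rng = random.Random(seed)
    level = route(query)
    judge = judges[level]
    for attempt in range(1, budget + 1):
        context: Optional[List[str]] = rng.sample(list(pool), k)
        if judge(query, context):
            # Stop on acceptance: no further candidates are sampled or judged.
            return {"context": context, "level": level,
                    "attempts": attempt, "risk_tag": None}
    # Budget exhausted: report a diagnostic tag instead of a context.
    return {"context": None, "level": level, "attempts": budget,
            "risk_tag": f"no_accepted_context_level_{level}"}
```

Because the loop exits at the first accepted context, the number of judge calls is at most `budget` per query.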

In practice

Use DiSP for ICL demonstration selection.
Implement lightweight routers for query difficulty.

Topics

In-Context Learning
Demonstration Selection
DiSP Framework
Query Difficulty Prediction
Llama 3-8B

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.