In-Context Learning Demystified: The Kernel Machine Perspective

2026-01-11 · Source: Agus’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

In-context learning (ICL), often perceived as a mysterious emergent property of large language models, is fundamentally a process of "weighted voting based on similarity," akin to kernel machines. Prediction of this kind requires no gradient updates or parameter tuning at inference, which directly parallels how Transformers operate. The core mechanism computes similarity scores between a query point and the context examples using a kernel function, normalizes them into weights that sum to one (the role softmax plays in attention), and aggregates the context labels by weighted voting. The article demonstrates the method with a simple Radial Basis Function (RBF) kernel in Python, showing how predictions adapt to the context examples without retraining. This perspective demystifies attention and foundation models, presenting them as sophisticated kernel machines with learned similarity functions.

Key takeaway

AI engineers and machine learning engineers who want to understand and implement in-context learning should recognize that it is a form of kernel smoothing. Focus on designing effective similarity functions, whether explicit kernels or learned embeddings, when building ICL systems. This understanding lets you create adaptable models without extensive retraining and makes complex "emergent" behaviors interpretable and reproducible.

Key insights

In-context learning is fundamentally kernel smoothing, with a similarity function that may be fixed or learned.

Principles

Attention IS kernel smoothing.
ICL requires no gradient updates or parameter tuning at inference.
Similarity-based voting drives ICL.
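
The first principle can be checked numerically. The sketch below is an illustration, not code from the original article: it assumes a single query, a single attention head, no learned projection matrices, and an exponential dot-product kernel k(q, kᵢ) = exp(q·kᵢ/√d). Under those assumptions, scaled dot-product attention and a normalized kernel smoother return the same output. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attention_output(q, K, V):
    # Single-query scaled dot-product attention (no learned projections).
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)
    return softmax(scores) @ V

def kernel_smoother(q, K, V):
    # Same computation written as kernel smoothing:
    # kernel values, normalized to weights, then a weighted vote over V.
    d = q.shape[0]
    k = np.exp(K @ q / np.sqrt(d))   # assumed exponential dot-product kernel
    w = k / k.sum()
    return w @ V

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # query
K = rng.normal(size=(5, 4))   # keys: the context inputs
V = rng.normal(size=(5, 2))   # values: the context labels
attn_out = attention_output(q, K, V)
ks_out = kernel_smoother(q, K, V)
print(np.allclose(attn_out, ks_out))
```

In a real Transformer the queries, keys, and values come from learned projections, which is where the "learned similarity function" enters; the sketch leaves that part out.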

Method

Predict by computing the similarity k(x, xᵢ) between the query x and each context example xᵢ, normalizing to weights wᵢ(x) = k(x, xᵢ) / ∑ⱼ k(x, xⱼ), and aggregating the labels: ŷ(x) = ∑ᵢ wᵢ(x)·yᵢ.
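
The three steps of the method can be sketched in a few lines of NumPy. This is a minimal illustration rather than the article's own code: the function names, the toy data, and the default bandwidth are assumptions, and the RBF kernel is taken in its common form exp(-‖x - xᵢ‖² / (2h²)).

```python
import numpy as np

def rbf_kernel(x, context_x, bandwidth=1.0):
    # k(x, x_i) = exp(-||x - x_i||^2 / (2 * bandwidth^2)) for every context point.
    sq_dist = np.sum((context_x - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def icl_predict(x, context_x, context_y, bandwidth=1.0):
    # Step 1: similarity between the query and each context example.
    k = rbf_kernel(x, context_x, bandwidth)
    # Step 2: normalize to weights that sum to one.
    # (If the query is far from every context point, k can underflow to zero;
    # this sketch does not guard against that case.)
    w = k / np.sum(k)
    # Step 3: weighted vote over the context labels.
    return float(np.dot(w, context_y))

# Hypothetical toy context: three (x, y) examples.
context_x = np.array([[0.0], [1.0], [2.0]])
context_y = np.array([0.0, 1.0, 4.0])
query = np.array([1.0])

pred = icl_predict(query, context_x, context_y, bandwidth=0.5)
print(pred)
```

Swapping in different context examples changes the prediction immediately, with no retraining step: the "model" is just the kernel plus whatever examples are currently in context.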

In practice

Use RBF, cosine, or polynomial kernels.
Implement learned embeddings for kernels.
Apply Scott's rule for bandwidth selection.
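
The practical options above might look like the following in NumPy. This is a hedged sketch: the function names and parameters are hypothetical, and Scott's rule is written in its common per-dimension form h = σ · n^(-1/(d+4)), which comes from density estimation and is only a starting heuristic when used for kernel smoothing.

```python
import numpy as np

def cosine_kernel(x, context_x, eps=1e-12):
    # Cosine similarity between the query and each context point.
    # Values lie in [-1, 1], so they may need shifting or exponentiating
    # before being normalized into voting weights.
    num = context_x @ x
    denom = np.linalg.norm(context_x, axis=1) * np.linalg.norm(x) + eps
    return num / denom

def polynomial_kernel(x, context_x, degree=2, coef0=1.0):
    # (x . x_i + coef0) ** degree; can be negative for odd degrees.
    return (context_x @ x + coef0) ** degree

def scott_bandwidth(context_x):
    # Scott's rule of thumb, one bandwidth per input dimension.
    n, d = context_x.shape
    sigma = context_x.std(axis=0, ddof=1)
    return sigma * n ** (-1.0 / (d + 4))

def rbf_kernel_per_dim(x, context_x, bandwidth):
    # RBF kernel with a separate bandwidth for each dimension.
    z = (context_x - x) / bandwidth
    return np.exp(-0.5 * np.sum(z ** 2, axis=1))

rng = np.random.default_rng(0)
context_x = rng.normal(size=(50, 2))   # hypothetical context inputs
h = scott_bandwidth(context_x)
query = np.zeros(2)
k = rbf_kernel_per_dim(query, context_x, h)
weights = k / k.sum()
print(h, weights.sum())
```

A learned-embedding kernel follows the same pattern: map the query and context points through an embedding function first, then apply any of these kernels in the embedded space.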

Topics

In-Context Learning
Kernel Machines
Attention Mechanisms
Transformer Models
Similarity Functions

Code references

asudjianto-xml/substack

Best for: AI Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.