Learning From Examples Is Geometry Discovery Too

2026-01-11 · Source: Agus’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, long

Summary

This article explains the geometric underpinnings of tabular in-context learning (ICL) models such as TabPFN and TabICL, which perform competitively on small- to medium-sized tabular classification tasks without per-dataset fitting. It describes these models as an upgraded k-nearest-neighbors (k-NN) classifier: instead of relying on a hand-picked metric, they learn a similarity function from thousands of synthetic datasets during pretraining. The core mechanism is attention as a learned kernel, in which the scoring matrix M = W_Qᵀ W_K defines similarity and lets the model rescale features, decorrelate them, and suppress noise dimensions. Pretraining pushes the model toward a Bayesian posterior predictive, so its probability estimates are meaningful when deployment tasks align with the synthetic prior. The article also introduces geometric diagnostics, such as decomposing M into its symmetric part S = (M + Mᵀ)/2 and antisymmetric part A = (M − Mᵀ)/2, to understand the learned geometry and assess prior-deployment alignment.
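A minimal sketch of the attention-as-kernel view, assuming a single attention head whose score for a query x and a context row xᵢ is xᵀ M xᵢ with M = W_Qᵀ W_K, followed by softmax weights and a weighted vote over one-hot labels. The function name `attention_predict` and the toy data are hypothetical illustrations, not TabPFN or TabICL code. Column 0 carries the signal (the label is 1 when it is positive) and column 1 is noise: identity projections get fooled by the large noise value, while projections that zero out the noise coordinate recover the correct class.

```python
import numpy as np

def attention_predict(x_query, X_ctx, y_ctx, W_Q, W_K, n_classes):
    """Attention as a kernel smoother: softmax(x^T M x_i)-weighted vote over context labels."""
    M = W_Q.T @ W_K                      # bilinear scoring matrix
    scores = x_query @ M @ X_ctx.T       # score_i = x_query^T M x_i
    scores = scores - scores.max()       # stabilize the softmax
    e = np.exp(scores)
    weights = e / e.sum()
    one_hot = np.eye(n_classes)[y_ctx]   # (n_ctx, n_classes)
    return weights @ one_hot             # class probabilities

# Hypothetical toy context: column 0 = signal, column 1 = noise.
X_ctx = np.array([[-0.5, 6.0],    # class 0, large noise value
                  [ 0.5, 2.0]])   # class 1
y_ctx = np.array([0, 1])
x_query = np.array([1.0, 4.0])    # true class is 1

# Hand-picked metric: plain dot-product similarity (M = I).
identity = np.eye(2)
probs_raw = attention_predict(x_query, X_ctx, y_ctx, identity, identity, 2)

# Projections that keep the signal (scaled by 2) and drop the noise, so M = diag(4, 0).
W = np.diag([2.0, 0.0])
probs_learned = attention_predict(x_query, X_ctx, y_ctx, W, W, 2)
print(probs_raw.argmax(), probs_learned.argmax())
```

With M = I the noise coordinate dominates the score and the prediction is class 0; with M = diag(4, 0) the noise is ignored and class 1 gets probability of roughly 0.98. In a trained model W_Q and W_K would be learned during pretraining rather than set by hand.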

Key takeaway

For AI scientists and machine learning engineers deploying tabular ICL models, understanding the learned geometry is crucial for reliable performance. Extract and analyze the scoring matrix M, particularly the eigenstructure of its symmetric component S, to verify that the model's pretraining prior aligns with your deployment data. Misalignment suggests the model is extrapolating, so its probability estimates may be unreliable even when its predictions are accurate. Always report both expected calibration error (ECE) and the Brier score to assess calibration under prior shift.
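Both metrics take only a few lines. This sketch uses common textbook definitions, not necessarily the article's exact code: ECE bins the top-class confidence into equal-width bins and averages |accuracy − confidence| weighted by bin size, and the Brier score is the multiclass version (squared distance between the probability vector and the one-hot label, averaged over samples). The function names and toy predictions are hypothetical. ECE looks only at the top-class confidence, while the Brier score is a proper scoring rule over the full probability vector, which is one reason to report both.

```python
import numpy as np

def brier_score(probs, labels):
    """Multiclass Brier score: mean squared distance between probabilities and one-hot labels."""
    one_hot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width bins of top-class confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return float(ece)

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([0, 0, 1, 1])
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Here each sample lands alone in its own bin, so ECE is the mean of |correct − confidence| = (0.1 + 0.2 + 0.3 + 0.6)/4 = 0.3, and the Brier score is (0.02 + 0.08 + 0.18 + 0.72)/4 = 0.25 (up to floating-point rounding).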

Key insights

Tabular ICL models learn a Bayesian-optimal similarity function from synthetic data, acting as an advanced k-NN.
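One way to see the "advanced k-NN" framing: a softmax kernel over negative squared distances, weightᵢ ∝ exp(−β‖x − xᵢ‖²), interpolates between a uniform vote over the whole context (β = 0) and hard 1-NN (β → ∞). The sketch assumes a Euclidean distance and a hand-set sharpness β, both hypothetical; in a tabular ICL model the geometry and sharpness come from learned projections instead. For a symmetric M, −(x − y)ᵀM(x − y) = 2xᵀMy − xᵀMx − yᵀMy, so a bilinear attention score differs from this distance kernel only by the key-norm term yᵀMy (the query-norm term cancels in the softmax).

```python
import numpy as np

def kernel_weights(x_query, X_ctx, beta):
    """Softmax weights over negative squared Euclidean distances."""
    sq_dist = ((X_ctx - x_query) ** 2).sum(axis=1)
    logits = -beta * sq_dist
    logits = logits - logits.max()   # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum()

X_ctx = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
x_query = np.array([0.9, 0.0])

uniform = kernel_weights(x_query, X_ctx, beta=0.0)    # every context point counts equally
sharp = kernel_weights(x_query, X_ctx, beta=100.0)    # nearly all weight on the nearest point
nearest = int(np.argmin(((X_ctx - x_query) ** 2).sum(axis=1)))
print(uniform, sharp, nearest)
```

Intermediate β values give a soft, distance-weighted neighborhood, which is the regime a learned kernel can tune per dimension.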

Principles

Attention acts as a Nadaraya-Watson kernel smoother with a learned kernel.
Pretraining aligns the model with a Bayesian posterior predictive.
Geometric diagnostics reveal the model's learned data structure.
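The posterior-predictive principle can be checked in a setting small enough to solve by hand. Assume, hypothetically (this is a coin-flip prior, not the tabular prior these models use), that each synthetic task draws a hidden rate θ from a uniform prior and then generates n context flips plus one query flip. Because log loss is a proper scoring rule, the best predictor of the query given k heads in the context is the conditional frequency P(y = 1 | k), which under this prior equals the Bayesian posterior predictive (k + 1)/(n + 2), Laplace's rule of succession. The sketch estimates that frequency by simulating many tasks, a lookup-table stand-in for what pretraining approximates with a network.

```python
import numpy as np

def simulate_pretraining_target(n_context=4, n_tasks=200_000, seed=0):
    """Empirical P(query = 1 | k heads in context), pooled over tasks drawn from the prior."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=n_tasks)                   # hidden rule per task
    flips = rng.random((n_tasks, n_context + 1)) < theta[:, None]
    k = flips[:, :n_context].sum(axis=1)                          # heads seen in the context
    query = flips[:, n_context]
    return np.array([query[k == j].mean() for j in range(n_context + 1)])

n = 4
empirical = simulate_pretraining_target(n_context=n)
posterior_predictive = (np.arange(n + 1) + 1) / (n + 2)           # Laplace's rule of succession
print(np.round(empirical, 3), np.round(posterior_predictive, 3))
```

The two arrays should agree to about two decimal places. The predictor never outputs 0 or 1, even after 0 or 4 heads out of 4: that shrinkage toward the prior is what makes the probabilities trustworthy when deployment resembles the prior, and misleading when it does not.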

Method

Tabular ICL pretraining involves minimizing prediction error across thousands of synthetic tasks, each with a hidden labeling rule, to learn a generalized notion of tabular structure and a robust similarity function.
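A toy version of this objective, under heavy simplifying assumptions: each synthetic task has two signal features and six pure-noise features, a hidden linear labeling rule on the signal features, and a bilinear attention predictor whose diagonal scoring matrix puts weight 1 on signal features and a noise weight λ on noise features. Instead of training W_Q and W_K by gradient descent, as a real model would, the sketch scores three candidate values of λ by their average log loss on held-out query points across 400 tasks. All names, sizes, and the grid search itself are hypothetical stand-ins for real pretraining.

```python
import numpy as np

def nw_log_loss(X_ctx, y_ctx, X_q, y_q, M, eps=1e-6):
    """Mean log loss of a bilinear-attention (Nadaraya-Watson) predictor on query points."""
    scores = X_q @ M @ X_ctx.T                   # (n_q, n_ctx): x_q^T M x_i
    scores -= scores.max(axis=1, keepdims=True)  # stabilize each row's softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    p1 = w @ y_ctx                               # attention-weighted vote for class 1
    p_true = np.where(y_q == 1, p1, 1.0 - p1)
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())


def average_task_loss(noise_weight, n_tasks=400, d_signal=2, d_noise=6,
                      n_ctx=32, n_q=16, scale=2.0, seed=0):
    """Average query log loss over synthetic tasks for a diagonal scoring matrix."""
    rng = np.random.default_rng(seed)            # same seed: every candidate sees the same tasks
    d = d_signal + d_noise
    M = scale * np.diag([1.0] * d_signal + [noise_weight] * d_noise)
    losses = []
    for _ in range(n_tasks):
        w_rule = rng.standard_normal(d_signal)   # hidden labeling rule for this task
        X = rng.standard_normal((n_ctx + n_q, d))
        y = (X[:, :d_signal] @ w_rule > 0).astype(float)
        losses.append(nw_log_loss(X[:n_ctx], y[:n_ctx], X[n_ctx:], y[n_ctx:], M))
    return float(np.mean(losses))


losses = {lam: average_task_loss(lam) for lam in [0.0, 0.5, 1.0]}
for lam, loss in losses.items():
    print(f"noise weight {lam:.1f}: mean log loss {loss:.3f}")
```

Suppressing the noise coordinates (λ = 0) should score better than weighting them fully (λ = 1), because those features carry no information about any task's label. This is a crude picture of the noise suppression the summary attributes to the learned M.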

In practice

Use TabICL for small-to-medium tabular classification.
Inspect the eigenstructure of S, the symmetric part of M, to check prior-deployment alignment.
Report both ECE and Brier score for calibration.
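A sketch of the S/A decomposition, assuming a head's W_Q and W_K are already available as NumPy arrays (extracting them from a specific TabICL or TabPFN checkpoint is not covered here, and the random matrices below are placeholders). S = (M + Mᵀ)/2 carries the distance-like part of the geometry: its eigenvectors are the directions the head compares along, and its eigenvalues say how strongly. A = (M − Mᵀ)/2 adds nothing to a point's self-score (xᵀAx = 0) and only makes similarity directional, since score(x, y) − score(y, x) = 2xᵀAy. The `alignment` helper, the share of deployment variance inside the top-k eigendirections of S, is a hypothetical diagnostic added for illustration, not a metric from the article; its inputs must live in the space the head attends over, which is usually the model's internal embeddings rather than raw columns.

```python
import numpy as np

def decompose(W_Q, W_K):
    """Split the scoring matrix M = W_Q^T W_K into symmetric and antisymmetric parts."""
    M = W_Q.T @ W_K
    S = 0.5 * (M + M.T)
    A = 0.5 * (M - M.T)
    return M, S, A

def spectrum(S):
    """Eigenvalues/eigenvectors of S, sorted by decreasing |eigenvalue|."""
    vals, vecs = np.linalg.eigh(S)               # S is symmetric, so eigh applies
    order = np.argsort(-np.abs(vals))
    return vals[order], vecs[:, order]

def alignment(X_deploy, S, k):
    """Hypothetical diagnostic: share of deployment variance inside S's top-k eigenspace."""
    _, vecs = spectrum(S)
    Xc = X_deploy - X_deploy.mean(axis=0)
    top = Xc @ vecs[:, :k]                       # coordinates along the top-k eigendirections
    return float((top ** 2).sum() / (Xc ** 2).sum())

rng = np.random.default_rng(0)
d, d_head = 6, 4
W_Q = rng.standard_normal((d_head, d))           # placeholder weights, not a real checkpoint
W_K = rng.standard_normal((d_head, d))
M, S, A = decompose(W_Q, W_K)
vals, vecs = spectrum(S)
asym_ratio = np.linalg.norm(A) / np.linalg.norm(M)   # Frobenius norms
X_deploy = rng.standard_normal((500, d))             # placeholder deployment features
share_top3 = alignment(X_deploy, S, k=3)
print(np.round(vals, 3), round(float(asym_ratio), 3), round(share_top3, 3))
```

In a real check, a large `asym_ratio` would flag a strongly directional similarity, and a low `share_top3` on deployment data (compared with the same number computed on samples from the synthetic prior) would hint that the data lies in directions the head barely weighs. Both readings are heuristics layered on the article's S/A decomposition, not thresholds it prescribes.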

Topics

Tabular In-Context Learning
Geometric Machine Learning
Learned Similarity Functions
Bayesian Posterior Predictive
Attention Mechanisms

Code references

asudjianto-xml/substack

Best for: Machine Learning Engineer, AI Scientist, Research Scientist, Data Scientist, MLOps Engineer, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.