Attention Is Not What You Think!
Summary
The article demystifies in-context learning for tabular data, arguing that it is fundamentally "row attention" and that Random Forests have offered it for two decades. Using the California Housing dataset, the author shows how a Random Forest Regressor configured with `n_estimators=200`, `max_depth=10`, and `min_samples_leaf=10` implicitly learns row similarity. This "Random Forest proximity" is the fraction of trees in which two rows fall into the same leaf, which gives a similarity score between 0 and 1. The proximities are then normalized into "attention weights", and a target value such as a house price is predicted by aggregating the outcomes of similar rows with those weights. Because this similarity is nonlinear, conditional, and feature-selective, the article presents it as a more sophisticated form of attention than kNN's fixed distance metric, one that achieves results similar to Transformer-based in-context learning without a complex architecture.
Key takeaway
If you are a Machine Learning Engineer evaluating advanced models for tabular data, note that "in-context learning" is not exclusive to Transformers. Your existing Random Forest models already perform a sophisticated form of row attention, which makes them a robust, interpretable alternative for tasks that need dynamic sample weighting. Consider exploring GBDT proximity, or combining row and column attention, to enhance your current tabular ML approaches.
Key insights
In-context learning for tabular data is row attention, a mechanism Random Forests have employed for 20 years.
Principles
- Row attention identifies relevant rows and their influence.
- Random Forest proximity measures row similarity via shared leaves.
- Learned similarity often surpasses fixed distance metrics.
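To make "similarity via shared leaves" concrete, here is a tiny hand-made illustration. The `leaves` array is invented for this example (it is not taken from the article): each row is a data point, each column is a tree, and each entry is the leaf that point lands in.

```python
import numpy as np

# Hypothetical leaf assignments: 4 rows (data points) x 3 trees.
leaves = np.array([
    [1, 4, 2],
    [1, 4, 3],
    [2, 4, 3],
    [5, 6, 7],
])

# Proximity = fraction of trees in which two rows land in the same leaf.
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Rows 0 and 1 share a leaf in 2 of 3 trees, rows 0 and 2 in 1 of 3,
# and row 3 never shares a leaf with any other row.
print(prox.round(2))
```

Every row has proximity 1 with itself, and the matrix is symmetric, so it behaves like a similarity score bounded between 0 and 1.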
Method
Train a Random Forest, compute row proximity by counting shared leaves across trees, normalize these counts into attention weights, and use these weights to aggregate target values for prediction.
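One way to write the method down compactly is shown below. The notation is an assumption made for this summary, not the article's own: $T$ is the number of trees, $\ell_t(x)$ is the leaf that row $x$ reaches in tree $t$, and $j$, $k$ range over the rows used as context.

```latex
\[
\operatorname{prox}(i,j) = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\!\left[\ell_t(x_i) = \ell_t(x_j)\right],
\qquad
w_{ij} = \frac{\operatorname{prox}(i,j)}{\sum_{k}\operatorname{prox}(i,k)},
\qquad
\hat{y}_i = \sum_{j} w_{ij}\, y_j .
\]
```

Since the weights $w_{ij}$ are non-negative and sum to 1 over $j$, the prediction $\hat{y}_i$ is a weighted average of the context targets.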
In practice
- Use `rf.apply(X)` to get leaf indices.
- Calculate proximity as the fraction of trees in which two rows share a leaf.
- Normalize each row's proximities to sum to 1 to obtain attention weights.
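The three steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the author's code: it uses synthetic data from `make_regression` as a stand-in for California Housing, the helper name `proximity_attention` is hypothetical, and only the hyperparameters come from the summary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data; the article uses California Housing.
X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
X_train, X_query, y_train, y_query = train_test_split(
    X, y, test_size=50, random_state=0
)

# Hyperparameters as quoted in the summary.
rf = RandomForestRegressor(
    n_estimators=200, max_depth=10, min_samples_leaf=10, random_state=0
)
rf.fit(X_train, y_train)


def proximity_attention(rf, X_context, X_query):
    """Return (proximity, attention weights) of query rows over context rows."""
    context_leaves = rf.apply(X_context)  # shape (n_context, n_trees)
    query_leaves = rf.apply(X_query)      # shape (n_query, n_trees)
    # Fraction of trees in which a query row and a context row share a leaf.
    prox = (query_leaves[:, None, :] == context_leaves[None, :, :]).mean(axis=2)
    # Row-normalize so each query row's weights sum to 1.
    weights = prox / prox.sum(axis=1, keepdims=True)
    return prox, weights


prox, weights = proximity_attention(rf, X_train, X_query)

# Attention-style prediction: weighted average of the context targets.
y_hat = weights @ y_train
```

The broadcasted comparison builds an `(n_query, n_context, n_trees)` boolean array, so for large datasets it is worth processing the query rows in batches. Note also that `y_hat` is not, in general, identical to `rf.predict(X_query)`, because the forest averages per-tree leaf means rather than applying a single set of normalized proximities.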
Topics
- In-Context Learning
- Random Forests
- Attention Mechanisms
- Tabular Data
- Random Forest Proximity
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.