Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap
Summary
Tabular Foundation Models (TFMs) are increasingly competitive with gradient-boosted trees on tabular tasks, yet no single TFM consistently outperforms the others. Ensembling them is the obvious response, but it delivers little. A study benchmarking six modern TFMs and six ensemble strategies across 153 OpenML classification tasks found that the TFMs form a near-redundant pool, with a mean pairwise Q-statistic of $0.961$ (values near $1$ mean two models tend to be right, and wrong, on the same instances). The most effective ensemble, two-level cascade stacking, improved accuracy by only $+0.18\%$ over the best single TFM, at $253\times$ the computational cost. Notably, stacking with a logistic-regression meta-learner improved accuracy by sharpening class boundaries, but the same sharpening degraded calibration, so its log-loss was poor even though its accuracy and ROC-AUC were competitive.
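For readers unfamiliar with the diversity measure, the pairwise Q-statistic is usually defined (Yule's Q, as used in the ensemble-diversity literature) from a 2x2 table counting how often two classifiers are jointly correct, jointly wrong, or disagree. The sketch below assumes per-instance correctness vectors are available for each model; the function names and toy vectors are hypothetical, and the study's exact aggregation across tasks may differ.

```python
from itertools import combinations
import math


def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers from per-instance correctness flags (truthy = correct)."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only the first model is correct
        else:
            n01 += 1  # only the second model is correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return math.nan  # undefined, e.g. when neither model ever errs
    return (n11 * n00 - n01 * n10) / denom


def mean_pairwise_q(correctness):
    """Average Q over all unordered model pairs; correctness[m] is model m's flag vector."""
    pairs = list(combinations(correctness, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)


# Hypothetical correctness vectors for three models on ten validation instances.
m1 = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
m2 = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
m3 = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
print(q_statistic(m1, m2), mean_pairwise_q([m1, m2, m3]))
```

Q ranges from $-1$ to $1$; a pool averaging close to $1$, as reported for the six TFMs, leaves an ensemble very few instances where one member can correct another.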
Key takeaway
AI Engineers evaluating ensemble strategies for Tabular Foundation Models should prioritize greedy selection over complex stacking methods. Two-level cascade stacking offers a marginal accuracy boost, but its $253\times$ compute cost is generally prohibitive. Stacking with a logistic-regression meta-learner can also improve accuracy while severely compromising calibration, which makes it unsuitable for applications that need well-calibrated probabilities.
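The summary names greedy selection without specifying the variant. A minimal sketch, assuming Caruana-style forward ensemble selection with replacement: starting from an empty ensemble, repeatedly add whichever model most lowers validation log-loss of the averaged probabilities, and stop when no addition helps. The function names, the log-loss objective, and the toy data are illustrative assumptions, not the study's configuration.

```python
import math


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def average(members, model_probs):
    """Uniform average of the members' class probabilities; a repeated member counts twice."""
    n, k = len(model_probs[0]), len(model_probs[0][0])
    return [[sum(model_probs[m][i][c] for m in members) / len(members) for c in range(k)]
            for i in range(n)]


def greedy_selection(model_probs, labels, rounds=10):
    """Forward selection with replacement, scored on a held-out validation set."""
    members, best_score = [], math.inf
    for _ in range(rounds):
        # Score every candidate addition, including models already selected.
        score, m = min((log_loss(average(members + [m], model_probs), labels), m)
                       for m in range(len(model_probs)))
        if score >= best_score:
            break  # no addition improves validation log-loss
        members.append(m)
        best_score = score
    return members, best_score


# Hypothetical validation predictions from three models on four binary instances.
labels = [0, 1, 1, 0]
model_probs = [
    [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [0.4, 0.6]],  # strong on the first two instances
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],  # uninformative
    [[0.4, 0.6], [0.6, 0.4], [0.1, 0.9], [0.9, 0.1]],  # strong on the last two instances
]
print(greedy_selection(model_probs, labels))
```

Selecting with replacement lets the loop add weight to a strong model instead of being forced to include a weaker one, and each round costs only a cheap re-averaging of cached predictions rather than training a meta-learner.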
Key insights
Ensembling modern Tabular Foundation Models yields minimal gains because the models are highly redundant, and stacking can additionally trade calibration for accuracy.
Principles
- High TFM redundancy limits ensemble gains.
- Sharpening class boundaries harms calibration.
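The second principle can be illustrated with a toy example that is not taken from the study. Here sharpening is simulated with a hypothetical temperature $T < 1$ (each probability raised to $1/T$ and renormalized, equivalent to dividing log-probabilities by $T$). Sharpening never changes the argmax, so accuracy is untouched, but on an already calibrated predictor it makes the wrong answers confidently wrong, which log-loss penalizes heavily.

```python
import math


def sharpen(p, temperature):
    """Rescale a probability vector as p_c ** (1/T), renormalized; T < 1 sharpens."""
    powered = [pc ** (1.0 / temperature) for pc in p]
    total = sum(powered)
    return [v / total for v in powered]


def accuracy(probs, labels):
    """Fraction of instances whose highest-probability class matches the label."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels))
    return hits / len(labels)


def log_loss(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


# A calibrated toy predictor: it says 60% class 0, and class 0 occurs 3 times out of 5.
probs = [[0.6, 0.4] for _ in range(5)]
labels = [0, 0, 0, 1, 1]
sharp = [sharpen(p, 0.25) for p in probs]

print(accuracy(probs, labels), accuracy(sharp, labels))
print(log_loss(probs, labels), log_loss(sharp, labels))
```

With these numbers accuracy stays at 0.6 while log-loss rises from roughly 0.67 to roughly 0.83. A logistic-regression meta-learner that sharpens boundaries can show the same pattern at scale, with the added twist that in the study the sharpening also bought some accuracy.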
Method
Benchmarked six TFMs and six ensemble strategies on 153 OpenML classification tasks, using a Friedman test with Nemenyi post-hoc comparisons to rank the methods and identify groups that are statistically indistinguishable.
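The Friedman/Nemenyi procedure works on per-dataset ranks rather than raw scores. A minimal sketch, following the formulas popularized by Demšar (2006): average each method's rank across datasets, compute the Friedman $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$, and compare rank gaps against the Nemenyi critical difference $CD = q_\alpha\sqrt{k(k+1)/(6N)}$. The function names and score table are hypothetical; the study's metric, significance level, and tie handling are not stated in this summary.

```python
import math


def average_ranks(scores):
    """scores[d][m] is method m's score on dataset d (higher is better); ties share ranks."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda m: -row[m])  # best method first
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1  # extend the block of tied scores
            for t in range(i, j + 1):
                ranks[order[t]] = (i + j) / 2 + 1  # 1-based average rank of the tied block
            i = j + 1
        for m in range(k):
            totals[m] += ranks[m]
    return [t / len(scores) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def nemenyi_cd(q_alpha, k, n_datasets):
    """Critical difference: average-rank gaps smaller than this are not significant."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


scores = [  # hypothetical accuracies: rows are datasets, columns are methods
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.84],
    [0.91, 0.90, 0.70],
    [0.80, 0.82, 0.75],
]
ranks = average_ranks(scores)
# 2.343 is the tabulated Nemenyi q for k = 3 methods at alpha = 0.05 (Demšar, 2006).
print(ranks, friedman_statistic(ranks, len(scores)), nemenyi_cd(2.343, 3, len(scores)))
```

Methods whose average ranks lie within one CD of each other form an equivalence group. For a lower-is-better metric such as log-loss, negate the scores before ranking, and for a 12-method comparison look up the $q_\alpha$ for $k = 12$ instead of the value used here.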
In practice
- Greedy selection is a practical default.
- Avoid stacking if calibration is critical.
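Whether calibration is critical for a given application can be checked directly before choosing an ensemble. One common diagnostic, not necessarily the one used in the study, is top-label expected calibration error (ECE): bin predictions by their top-class confidence and take the size-weighted average gap between mean confidence and accuracy in each bin. This sketch assumes equal-width bins; the function name and bin count are illustrative choices.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins over [0, 1]."""
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # per bin: [count, sum of confidences, hits]
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        conf = p[pred]
        b = min(int(conf * n_bins), n_bins - 1)  # a confidence of exactly 1.0 goes in the last bin
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(pred == y)
    n = len(labels)
    # Weight each bin's |mean confidence - accuracy| gap by its share of instances.
    return sum((c / n) * abs(s / c - h / c) for c, s, h in bins if c > 0)


labels = [0, 0, 0, 1, 1]
calibrated = [[0.6, 0.4] for _ in range(5)]         # 60% confident, 60% accurate
overconfident = [[0.835, 0.165] for _ in range(5)]  # same argmax, sharper probabilities
print(expected_calibration_error(calibrated, labels),
      expected_calibration_error(overconfident, labels))
```

Both toy predictors have identical accuracy, yet only the sharper one shows a large calibration gap, which is the kind of difference that accuracy and ROC-AUC alone will not reveal.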
Topics
- Tabular Foundation Models
- Ensemble Learning
- Model Diversity
- Model Calibration
- Stacking Ensembles
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.