Why CatBoost is the hidden gem of tabular AI (and what the benchmarks actually say)

· Source: Valeriy’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

CatBoost is a gradient boosting library with distinct advantages over XGBoost and LightGBM, particularly in handling categorical features and mitigating prediction shift. Although it shares the gradient boosting name and loss function with its counterparts, CatBoost makes its own engineering decisions: Ordered Boosting, Ordered Target Statistics, and Symmetric (Oblivious) Trees. Independent benchmarks, including Shmuel et al. (111 datasets), McElfresh et al. (176 datasets), and TabArena (2025), consistently show CatBoost performing strongly among individual gradient-boosted decision tree (GBDT) libraries, often ahead of XGBoost and LightGBM, especially on data with high-cardinality categoricals and mixed-type features. It is not always the overall winner, however: ensembles such as AutoGluon and tabular foundation models can sometimes surpass any single GBDT.

Key takeaway

AI engineers and research scientists building tabular models should consider CatBoost as a default baseline, especially for datasets rich in categorical or mixed-type features. Its native handling of these features often yields better accuracy than XGBoost or LightGBM, as multiple independent benchmarks show. However, if your workflow is deeply integrated with XGBoost tooling, or if an ensemble such as AutoGluon is an option, weigh the expected gains against the switching costs.

Key insights

CatBoost excels in tabular AI by natively addressing categorical features and prediction shift with unique algorithmic innovations.

Method

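CatBoost combines three techniques. Ordered Boosting estimates each example's gradient with a model fitted only on the examples that precede it in a random permutation, which counters prediction shift. Ordered Target Statistics encode each categorical value from the targets of the rows in that example's permutation prefix, so a row's own label never enters its encoding. Symmetric (Oblivious) Trees apply the same split condition across an entire tree level, giving balanced, regularized tree structures.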
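To make two of these ideas concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not CatBoost's implementation: it uses a single permutation and a simple smoothed mean, and the function names and the `prior`, `prior_weight`, and `seed` parameters are hypothetical choices made for this example.

```python
import random


def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode one categorical column from permutation prefixes.

    Each row is encoded using only the targets of rows that come earlier
    in a random permutation, so its own label never leaks into its feature.
    (Illustrative only: names and the smoothing formula are assumptions.)
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums = {}    # running target sum per category over the prefix seen so far
    counts = {}  # running row count per category over the prefix seen so far
    encoded = [0.0] * len(categories)

    for i in order:
        c = categories[i]
        s = sums.get(c, 0.0)
        n = counts.get(c, 0)
        # Smoothed mean of earlier targets; falls back to the prior when n == 0.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        # Only now add this row's target, making it visible to later rows.
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded


def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate a symmetric tree with one (feature, threshold) test per level.

    Because every node on a level shares the same test, the leaf index is
    just the sequence of test outcomes read as a binary number, with the
    first level as the most significant bit.
    """
    index = 0
    for feature, threshold in splits:
        index = (index << 1) | int(x[feature] > threshold)
    return leaf_values[index]
```

The first row of each category in the permutation has no history and receives the prior, which is the price paid for avoiding target leakage. A depth-d symmetric tree needs only d comparisons and a table of 2^d leaf values, which is part of why this structure acts as a regularizer and evaluates quickly.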

Best for: AI Engineer, Research Scientist, Machine Learning Engineer, Data Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Valeriy’s Substack.