Introducing TabFM: A zero-shot foundation model for tabular data

2026-06-30 · Source: The latest research from Google · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

On June 30, 2026, Google Research introduced TabFM, a zero-shot foundation model for tabular classification and regression. It is available on Hugging Face and GitHub and is integrated into BigQuery ML. TabFM targets two traditional bottlenecks of tabular machine learning, manual feature engineering and hyperparameter optimization, by reframing prediction as an in-context learning (ICL) problem. Its hybrid architecture combines elements of TabPFN and TabICL: alternating row and column attention, row compression, and a Transformer that performs ICL efficiently on the compressed rows. Because diverse real-world tabular data is scarce, the model was pre-trained on hundreds of millions of synthetic datasets generated on the fly from structural causal models. On TabArena, across 38 classification and 13 regression datasets ranging from 700 to 150,000 samples, TabFM and its ensemble variant consistently achieved higher Elo scores than heavily tuned, industry-standard supervised algorithms.
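
The summary says the pre-training data came from structural causal models (SCMs) but does not describe the generator. As a rough illustration of the general idea only, the Python sketch below samples a random acyclic graph, evaluates a noisy nonlinear structural equation at each node, exposes some nodes as features, and thresholds another node into a binary label. The function name, graph density, nonlinearities, noise scale, and median threshold are all hypothetical choices, not TabFM's actual prior.

```python
import math
import random


def sample_scm_dataset(n_rows, n_nodes=6, n_features=4, seed=0):
    """Draw one synthetic classification table from a random SCM (illustrative only).

    Nodes are evaluated in index order and may only depend on earlier nodes,
    so the sampled graph is acyclic by construction.
    """
    if n_features >= n_nodes:
        raise ValueError("need at least one node left over for the target")
    rng = random.Random(seed)
    # Random DAG: node j depends on each earlier node i < j with probability 0.5.
    parents = [[i for i in range(j) if rng.random() < 0.5] for j in range(n_nodes)]
    # One Gaussian weight per edge and a randomly chosen nonlinearity per node.
    weights = [[rng.gauss(0.0, 1.0) for _ in p] for p in parents]
    activations = [rng.choice([math.tanh, math.sin, lambda v: max(v, 0.0)])
                   for _ in range(n_nodes)]

    # Randomly decide which nodes are observed features and which is the target.
    order = list(range(n_nodes))
    rng.shuffle(order)
    feature_nodes = order[:n_features]
    target_node = order[n_features]

    X, raw_targets = [], []
    for _ in range(n_rows):
        values = [0.0] * n_nodes
        for j in range(n_nodes):
            # Structural equation: x_j = f_j(weighted sum of parents) + noise.
            total = sum(w * values[i] for w, i in zip(weights[j], parents[j]))
            values[j] = activations[j](total) + rng.gauss(0.0, 0.1)
        X.append([values[j] for j in feature_nodes])
        raw_targets.append(values[target_node])

    # Convert the continuous target into a binary label at its median.
    threshold = sorted(raw_targets)[len(raw_targets) // 2]
    y = [int(t > threshold) for t in raw_targets]
    return X, y
```

Each new seed yields a table with a different causal structure, which is the property that would let a pre-training loop draw a fresh dataset at every step instead of cycling through a fixed corpus.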

Key takeaway

For data scientists and ML engineers building tabular classification or regression models, TabFM removes much of the usual model-preparation work: you can get high-quality predictions on a new dataset without extensive hyperparameter tuning or manual feature engineering. That shifts effort from model preparation to analysis and shortens the path to a deployed model. Consider trying TabFM through BigQuery ML or its open-source repositories to speed up predictive analytics projects.

Key insights

TabFM applies zero-shot in-context learning to tabular data, eliminating manual tuning and feature engineering for classification and regression.

Principles

Tabular prediction can be reframed as an ICL problem.
Synthetic data enables large-scale foundation model pre-training.
Hybrid attention architectures capture complex feature interactions.
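
Reframing tabular prediction as ICL means the labelled training rows become the context, the rows to predict become the queries, and each prediction is computed from both in one pass with no weight updates. The Python sketch below illustrates only that framing: it uses a fixed, parameter-free attention (each query attends to context rows by negative squared distance, and the values are the context labels) as a stand-in for TabFM's learned Transformer. The function `icl_predict` and its `temperature` parameter are hypothetical.

```python
import math


def icl_predict(X_context, y_context, X_query, temperature=1.0):
    """Predict query targets from labelled context rows in one pass, with no fitting step.

    Each query attends over all context rows: scores are negative squared
    distances, a softmax turns them into weights, and the prediction is the
    weighted average of the context labels.
    """
    predictions = []
    for q in X_query:
        scores = [-sum((a - b) ** 2 for a, b in zip(q, x)) / temperature
                  for x in X_context]
        top = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        predictions.append(sum(w * y for w, y in zip(weights, y_context)) / total)
    return predictions
```

Switching to a different dataset only changes `X_context` and `y_context`; nothing is refit, which is the sense in which this style of model is zero-shot. A learned model replaces the fixed distance score with attention patterns acquired during pre-training.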

Method

TabFM processes an entire dataset as a single prompt. It applies alternating row and column attention, compresses each row into an embedding, and then runs a Transformer that performs in-context learning on those compressed embeddings.
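
The Method line names the stages but not their internals. The Python sketch below wires up the first two under heavy simplifications that are assumptions, not TabFM's design: attention has no learned query, key, or value projections, there is a single head, there are no normalization or feed-forward layers, column attention runs before row attention, and row compression is plain mean pooling. The names `attend` and `encode_table` are hypothetical. Its output, one vector per row, stands in for what the downstream ICL Transformer would consume.

```python
import math


def attend(vectors):
    """Parameter-free self-attention: each vector becomes a softmax-weighted mix of all vectors."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, vectors)) / total
                    for c in range(d)])
    return out


def encode_table(cells, n_layers=2):
    """cells[i][j] is a d-dim embedding of row i, column j. Returns one d-dim vector per row."""
    n_rows, n_cols = len(cells), len(cells[0])
    h = [[list(v) for v in row] for row in cells]  # copy so the input is not mutated
    for _ in range(n_layers):
        # Column attention: within each column, cells attend across rows.
        for j in range(n_cols):
            mixed = attend([h[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                h[i][j] = [a + b for a, b in zip(h[i][j], mixed[i])]  # residual
        # Row attention: within each row, cells attend across columns.
        for i in range(n_rows):
            mixed = attend(h[i])
            h[i] = [[a + b for a, b in zip(x, m)] for x, m in zip(h[i], mixed)]
    # Row compression: mean-pool each row's cells into a single vector.
    d = len(h[0][0])
    return [[sum(cell[c] for cell in row) / n_cols for c in range(d)] for row in h]
```

Column attention lets each cell see how its feature is distributed across rows, and row attention lets it see the other values in its own row; alternating them mixes both kinds of context. Compressing afterwards means the ICL stage attends over n row vectors rather than n × m cells, which is one plausible reading of why the summary calls the design efficient.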

In practice

Generate high-quality predictions in a single forward pass.
Access TabFM via Hugging Face, GitHub, or BigQuery ML.
Use TabFM-Ensemble for additional performance gains.
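
The summary does not say how TabFM-Ensemble combines its members. One approach used by earlier tabular in-context models is to run the same model on several presentations of the table, such as different feature orders, and average the outputs; the Python sketch below assumes that purely for illustration. `predict` is a hypothetical callable standing in for any single-pass predictor, and `ensemble_predict` is likewise a hypothetical name, not part of the TabFM API.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: (context rows, context labels, query rows) -> one prediction per query.
Predictor = Callable[[List[List[float]], Sequence[float], List[List[float]]], List[float]]


def ensemble_predict(predict: Predictor, X_context, y_context, X_query,
                     n_members=4, seed=0):
    """Average predictions over members that each see the columns in a different order.

    Within a member, the same column permutation is applied to context and
    query rows, so each member sees a consistent table.
    """
    rng = random.Random(seed)
    n_cols = len(X_context[0])
    totals = [0.0] * len(X_query)
    for _ in range(n_members):
        perm = list(range(n_cols))
        rng.shuffle(perm)
        ctx = [[row[j] for j in perm] for row in X_context]
        qry = [[row[j] for j in perm] for row in X_query]
        for k, p in enumerate(predict(ctx, y_context, qry)):
            totals[k] += p
    return [t / n_members for t in totals]
```

If the underlying model were perfectly insensitive to column order, every member would agree and averaging would change nothing; the ensemble helps only to the extent that individual forward passes vary with how the table is presented.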

Topics

Tabular Data
Foundation Models
Zero-shot Learning
In-context Learning
BigQuery ML
Synthetic Data Training
TabArena Benchmark

Code references

google-research/tabfm

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The latest research from Google.