GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data
Summary
GOTabPFN is a framework that makes small tabular foundation models effective for High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. It introduces Graph-guided Ordering with Local Refinement (GO-LR), which frames feature reordering as a Minimum Linear Arrangement (MinLA) problem, shown to be NP-hard, and solves it heuristically. This ordering is combined with Neuro-Inspired Subunit Compression (NSC), a mechanism that pools locally adjacent ordered features into compact meta-features. Together, they enable TabPFN-style prediction in HDLSS regimes such as gene expression data with m ≫ 2,000 features, with improved stability and accuracy. Across 8 HDLSS and 8 cross-domain benchmarks, GOTabPFN consistently outperforms 55 baselines, achieving an average rank of 1.00.
Key takeaway
For AI Scientists and Machine Learning Engineers working with High-Dimensional, Low-Sample Size tabular data, GOTabPFN offers a robust solution to extend TabPFN-style models. You should consider integrating its GO-LR ordering and NSC compression front-end to manage high feature counts, improve predictive accuracy, and enhance model stability. This approach allows you to leverage powerful foundation models without retraining large backbones, making them practical for extreme HDLSS regimes.
Key insights
GOTabPFN enables TabPFN-style prediction on high-dimensional tabular data by ordering features and compressing local neighborhoods into meta-features.
Principles
- Feature ordering for tabular data can be framed as an NP-hard Minimum Linear Arrangement problem.
- Locality-aware feature ordering constructs coherent neighborhoods essential for structured compression.
- The Intrinsic Dimensionality Factor (IDF) indicates when feature ordering is likely to yield gains.
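To make the MinLA framing concrete: given a feature graph with non-negative edge weights w_ij and an ordering π that assigns each feature a position, the arrangement cost is the sum over edges of w_ij · |π(i) − π(j)|, and MinLA asks for the ordering that minimizes it. The sketch below assumes a dense symmetric weight matrix (the source does not say how edges are weighted) and computes this cost. The exhaustive search is included only to show why exact solutions are impractical: it enumerates all m! orderings. Both function names are hypothetical.

```python
import itertools

import numpy as np


def minla_cost(weights: np.ndarray, order: list[int]) -> float:
    """Weighted linear-arrangement cost of placing features in `order`.

    weights[i, j] is a symmetric, non-negative edge weight between features
    i and j (0 means no edge); order[k] is the feature placed at position k.
    """
    n = len(order)
    pos = np.empty(n, dtype=int)
    pos[np.asarray(order)] = np.arange(n)  # pos[i] = position of feature i
    dist = np.abs(pos[:, None] - pos[None, :])
    # Each undirected edge appears twice in the symmetric matrix, so halve.
    return float((weights * dist).sum() / 2)


def brute_force_minla(weights: np.ndarray) -> tuple[list[int], float]:
    """Exact MinLA by enumerating all n! orderings (only feasible for tiny n)."""
    n = weights.shape[0]
    best = min(itertools.permutations(range(n)),
               key=lambda p: minla_cost(weights, list(p)))
    return list(best), minla_cost(weights, list(best))


# Toy feature graph: a chain 0 - 2 - 1 - 3 with unit weights.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 1), (1, 3)]:
    W[i, j] = W[j, i] = 1.0

print(minla_cost(W, [0, 1, 2, 3]))  # 5.0: the original column order
print(brute_force_minla(W))         # ([0, 2, 1, 3], 3.0)
```

Reordering the columns so that strongly connected features sit next to each other cuts the cost from 5 to 3. This is the kind of locality that later compression steps rely on.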
Method
GO-LR linearizes the feature graph using a nearest-neighbor TSP-path heuristic followed by local refinement. NSC then segments the ordered feature axis into contiguous subunits and compresses each into a scalar meta-feature via PCA; the resulting meta-features are passed to a frozen TabPFN-2.5 head.
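The GO-LR step can be sketched as follows, under several assumptions the source does not pin down. The feature graph is built from absolute Pearson correlations. The nearest-neighbor path starts at feature 0 and always steps to the most similar unvisited feature. "Local refinement" is implemented as adjacent-swap hill climbing on the weighted MinLA cost. All function names are hypothetical; this illustrates the technique and is not the authors' implementation.

```python
import numpy as np


def feature_graph(X: np.ndarray) -> np.ndarray:
    """Assumed feature graph: |Pearson correlation| between columns, zero diagonal."""
    W = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))  # constant columns give NaN
    np.fill_diagonal(W, 0.0)
    return W


def arrangement_cost(W: np.ndarray, order: list[int]) -> float:
    """Weighted MinLA cost: sum over edges of w_ij * |pos(i) - pos(j)|."""
    pos = np.empty(len(order), dtype=int)
    pos[np.asarray(order)] = np.arange(len(order))
    return float((W * np.abs(pos[:, None] - pos[None, :])).sum() / 2)


def nn_tsp_path(W: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor path: always step to the most similar unvisited feature."""
    n = W.shape[0]
    order = [start]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    for _ in range(n - 1):
        sims = np.where(visited, -np.inf, W[order[-1]])  # mask visited features
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited[nxt] = True
    return order


def local_refine(W: np.ndarray, order: list[int], max_passes: int = 10):
    """Adjacent-swap hill climbing: keep a swap only if it lowers the MinLA cost."""
    order = list(order)
    best = arrangement_cost(W, order)
    for _ in range(max_passes):
        improved = False
        for k in range(len(order) - 1):
            order[k], order[k + 1] = order[k + 1], order[k]
            cost = arrangement_cost(W, order)
            if cost < best:
                best, improved = cost, True
            else:
                order[k], order[k + 1] = order[k + 1], order[k]  # undo the swap
        if not improved:
            break
    return order, best


def go_lr_order(X: np.ndarray, start: int = 0):
    W = feature_graph(X)
    return local_refine(W, nn_tsp_path(W, start))


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))   # toy data: 40 samples, 12 features
order, cost = go_lr_order(X)
X_ordered = X[:, order]         # columns rearranged along the learned axis
```

Recomputing the full cost after every swap takes O(m²) per move. That is fine for a demo but too slow for thousands of features. A practical version would update the cost incrementally and sparsify the graph, for example by keeping only each feature's strongest neighbors.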
In practice
- Apply GO-LR to reorder high-dimensional tabular features to expose underlying structure.
- Utilize NSC to compress ordered feature segments into compact meta-features for TabPFN-style models.
- Evaluate the Intrinsic Dimensionality Factor (IDF) to predict when feature ordering will yield significant gains.
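The NSC step in the list above can be sketched as follows. The sketch assumes the columns are already in GO-LR order and that subunits are fixed-width contiguous blocks; the source does not state how segment boundaries are chosen. Each block is centred and projected onto its first principal component, computed here with NumPy's SVD. Means and directions are learned on training rows so that test rows are transformed consistently. `SubunitCompressor` and `subunit_size` are hypothetical names.

```python
import numpy as np


class SubunitCompressor:
    """Pool contiguous blocks of ordered columns into one PCA meta-feature each."""

    def __init__(self, subunit_size: int):
        if subunit_size < 1:
            raise ValueError("subunit_size must be >= 1")
        self.subunit_size = subunit_size

    def fit(self, X: np.ndarray) -> "SubunitCompressor":
        m = X.shape[1]
        # Fixed-width segments along the ordered feature axis; the last may be shorter.
        self.bounds_ = [(a, min(a + self.subunit_size, m))
                        for a in range(0, m, self.subunit_size)]
        self.means_, self.dirs_ = [], []
        for a, b in self.bounds_:
            block = X[:, a:b]
            mu = block.mean(axis=0)
            # First right-singular vector of the centred block = first principal axis.
            _, _, vt = np.linalg.svd(block - mu, full_matrices=False)
            self.means_.append(mu)
            self.dirs_.append(vt[0])
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # One scalar per subunit: projection of the centred block onto its principal axis.
        return np.column_stack([
            (X[:, a:b] - mu) @ d
            for (a, b), mu, d in zip(self.bounds_, self.means_, self.dirs_)
        ])


rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 10))
X_test = rng.normal(size=(5, 10))
nsc = SubunitCompressor(subunit_size=4).fit(X_train)
Z_train = nsc.transform(X_train)  # shape (30, 3): blocks [0:4], [4:8], [8:10]
Z_test = nsc.transform(X_test)    # shape (5, 3)
```

The sign of each principal axis is arbitrary. This does not matter to a downstream model, but it means meta-features are not directly comparable across refits. In the pipeline described above, the compressed matrices would then be passed to the frozen TabPFN head, which is not reproduced here.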
Topics
- Tabular Foundation Models
- HDLSS Data
- Feature Ordering
- Dimensionality Reduction
- Neuro-Inspired Compression
- TabPFN
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.