GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

Source: cs.AI updates on arXiv.org · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

GOTabPFN is a framework for making small tabular foundation models effective on High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. It introduces Graph-guided Ordering with Local Refinement (GO-LR), which reorders features by approximately solving a Minimum Linear Arrangement (MinLA) problem on a feature graph; because this problem is proven to be NP-hard, GO-LR relies on a heuristic rather than an exact solver. The ordering is combined with Neuro-Inspired Subunit Compression (NSC), which pools runs of adjacent ordered features into compact meta-features. Together, these components enable TabPFN-style prediction in HDLSS regimes such as gene expression data with m ≫ 2,000 features, with improved stability and accuracy. Across 8 HDLSS and 8 cross-domain benchmarks, GOTabPFN consistently outperforms 55 baselines, achieving an average rank of 1.00.
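For reference, the weighted MinLA objective scores an ordering of the m features as the sum over feature pairs of w_ij * |pos(i) - pos(j)|, so strongly linked features are pushed close together on the ordered axis. A minimal sketch of that cost follows. The summary does not say how GOTabPFN builds its feature graph, so the `weights` matrix is an assumed input and `minla_cost` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def minla_cost(weights: np.ndarray, order: list[int]) -> float:
    """Weighted linear-arrangement cost of placing features in `order`.

    weights[i, j] >= 0 is the symmetric edge weight between features i and j
    (an assumed input); order[p] is the feature placed at position p.
    The cost sums w_ij * |pos(i) - pos(j)| over every unordered pair {i, j}.
    """
    m = weights.shape[0]
    pos = np.empty(m, dtype=int)
    pos[np.asarray(order)] = np.arange(m)  # invert the permutation: feature -> position
    dist = np.abs(pos[:, None] - pos[None, :])  # |pos(i) - pos(j)| for all pairs
    # Each unordered pair appears twice in the full matrix, hence the factor 0.5.
    return 0.5 * float(np.sum(weights * dist))
```

On a three-feature chain 0-1-2 with unit weights, the natural order [0, 1, 2] costs 2, while [1, 0, 2] costs 3 because the 1-2 edge is stretched to length 2.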

Key takeaway

For AI Scientists and Machine Learning Engineers working with High-Dimensional, Low-Sample Size tabular data, GOTabPFN offers a way to extend TabPFN-style models to this regime. Consider placing its GO-LR ordering and NSC compression in front of the model to reduce high feature counts, improve predictive accuracy, and stabilize predictions. Because the backbone stays frozen, this lets you use a pretrained foundation model in extreme HDLSS settings without retraining it.

Key insights

GOTabPFN enables TabPFN-style prediction on high-dimensional tabular data by ordering features and compressing local neighborhoods into meta-features.
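The compression half of that idea can be sketched as follows, assuming fixed-size, non-overlapping subunits along the ordered axis and using each subunit's leading principal-component score as its scalar meta-feature. `SubunitCompressor` and `subunit_size` are hypothetical names, and GOTabPFN's actual segmentation rule may differ.

```python
import numpy as np


class SubunitCompressor:
    """Hypothetical NSC-style front-end: pool each contiguous run of ordered
    features into one scalar meta-feature (its leading PCA score)."""

    def __init__(self, order, subunit_size):
        self.order = np.asarray(order)  # feature index at each ordered position
        self.subunit_size = subunit_size

    def fit(self, X):
        Xo = np.asarray(X, dtype=float)[:, self.order]  # reorder columns
        m = Xo.shape[1]
        # Contiguous, non-overlapping blocks on the ordered axis;
        # the last block may be shorter than subunit_size.
        self.blocks_ = [np.arange(s, min(s + self.subunit_size, m))
                        for s in range(0, m, self.subunit_size)]
        self.means_, self.dirs_ = [], []
        for b in self.blocks_:
            sub = Xo[:, b]
            mu = sub.mean(axis=0)
            # First right-singular vector of the centered block = leading PC.
            _, _, vt = np.linalg.svd(sub - mu, full_matrices=False)
            self.means_.append(mu)
            self.dirs_.append(vt[0])
        return self

    def transform(self, X):
        Xo = np.asarray(X, dtype=float)[:, self.order]
        # Project each centered block onto its stored direction (one scalar per row).
        cols = [(Xo[:, b] - mu) @ d
                for b, mu, d in zip(self.blocks_, self.means_, self.dirs_)]
        return np.column_stack(cols)  # shape: (n_samples, n_subunits)
```

Fitting on training rows and reusing the stored means and directions on test rows keeps the projection free of test-set leakage; the returned matrix, with one column per subunit, is what would be handed to the downstream model in place of the raw features. The sign of each principal component is arbitrary.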

Principles

Method

GO-LR linearizes the feature graph using a nearest-neighbor TSP-path heuristic followed by local refinement. NSC then segments the ordered feature axis into contiguous subunits and compresses each into a scalar meta-feature via PCA; the resulting meta-features are passed to a frozen TabPFN-2.5 head.
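The ordering step can be sketched under explicit assumptions: edge weights are absolute Pearson correlations, the greedy path starts from a fixed feature and always moves to the most strongly connected unvisited feature, and local refinement is a pass of adjacent swaps accepted only when they lower the weighted MinLA cost. All function names are hypothetical; the paper's graph construction and refinement rule are not described in this summary.

```python
import numpy as np


def feature_graph(X):
    """Assumed feature graph: edge weight = |Pearson correlation|."""
    C = np.abs(np.corrcoef(X, rowvar=False))
    C = np.nan_to_num(C)  # constant columns yield NaN correlations
    np.fill_diagonal(C, 0.0)  # no self-loops
    return C


def nearest_neighbor_path(W, start=0):
    """Greedy TSP-path: step to the most strongly connected unvisited
    feature, i.e. the one with the smallest distance 1 - w."""
    m = W.shape[0]
    visited = np.zeros(m, dtype=bool)
    path = [start]
    visited[start] = True
    for _ in range(m - 1):
        scores = np.where(visited, -np.inf, W[path[-1]])
        nxt = int(np.argmax(scores))
        path.append(nxt)
        visited[nxt] = True
    return path


def adjacent_swap_refine(W, path, max_passes=10):
    """Assumed local refinement: swap neighbouring features whenever that
    lowers the weighted MinLA cost sum_{i<j} W[i, j] * |pos(i) - pos(j)|."""
    order = list(path)
    m = len(order)
    for _ in range(max_passes):
        improved = False
        for p in range(m - 1):
            a, b = order[p], order[p + 1]
            before = np.array(order[:p], dtype=int)
            after = np.array(order[p + 2:], dtype=int)
            # Swapping moves a one step right and b one step left:
            # features before them get farther from a and closer to b,
            # features after them get closer to a and farther from b.
            delta = (W[a, before].sum() - W[b, before].sum()
                     + W[b, after].sum() - W[a, after].sum())
            if delta < -1e-12:
                order[p], order[p + 1] = b, a
                improved = True
        if not improved:
            break
    return order
```

Each adjacent swap is evaluated in O(m) time, since only the distances between the two swapped features and every other feature change, each by exactly one position. The dense m × m correlation matrix is the main cost at gene-expression scale; a sparse nearest-neighbor graph would be a natural substitute, though whether GOTabPFN uses one is not stated here.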

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.