GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

GOTabPFN is a framework for making small tabular foundation models effective on High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. It introduces Graph-guided Ordering with Local Refinement (GO-LR), which casts feature reordering as a Minimum Linear Arrangement (MinLA) problem, proven to be NP-hard, and approximates it heuristically. The ordering is combined with Neuro-Inspired Subunit Compression (NSC), a mechanism that pools locally adjacent ordered features into compact meta-features. Together they enable TabPFN-style prediction in HDLSS regimes such as gene expression data, where the number of features m far exceeds 2,000, with improved stability and accuracy. Across 8 HDLSS and 8 cross-domain benchmarks, GOTabPFN consistently outperforms 55 baselines, achieving an average rank of 1.00.

Key takeaway

For AI scientists and machine learning engineers working with High-Dimensional, Low-Sample Size tabular data, GOTabPFN offers a practical way to extend TabPFN-style models. Consider adding its GO-LR ordering and NSC compression as a front-end to handle high feature counts, improve predictive accuracy, and stabilize predictions. Because the backbone stays frozen, you can use strong foundation models in extreme HDLSS regimes without retraining them.

Key insights

GOTabPFN enables TabPFN-style prediction on high-dimensional tabular data by ordering features and compressing local neighborhoods into meta-features.

Principles

Feature ordering for tabular data can be framed as an NP-hard Minimum Linear Arrangement problem.
Locality-aware feature ordering constructs coherent neighborhoods essential for structured compression.
The Intrinsic Dimensionality Factor (IDF) indicates how much feature ordering is likely to help.
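For reference, the standard weighted MinLA objective is written below. The feature graph G = (V, E) over the m features and the edge weights w_ij (for example, absolute correlation between features i and j) are illustrative assumptions; this summary does not state how GOTabPFN builds or weights its graph. The permutation π assigns each feature a distinct position in 1, ..., m.

```latex
\pi^\star \;=\; \arg\min_{\pi \in S_m} \sum_{(i,j) \in E} w_{ij}\,\lvert \pi(i) - \pi(j) \rvert
```

Under this objective, strongly linked features pay the most for being far apart, so a low-cost arrangement places them in nearby positions. That is the locality a contiguous compression step can exploit.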

Method

GO-LR linearizes the feature graph using a nearest-neighbor TSP-path heuristic followed by local refinement. NSC then segments the ordered feature axis into contiguous subunits and compresses each into a scalar meta-feature via PCA; the resulting meta-features are passed to a frozen TabPFN-2.5 head.
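A minimal, dependency-free sketch of the GO-LR idea follows. Several details are assumptions, not taken from the paper: edge weights are absolute Pearson correlations, the greedy path starts at feature 0, and "local refinement" is modeled as adjacent-swap hill climbing on the weighted linear-arrangement cost. All function names are hypothetical.

```python
import math


def abs_corr_matrix(X):
    """Absolute Pearson correlation between the columns of X (a list of rows)."""
    n, m = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    means = [sum(c) / n for c in cols]
    centered = [[v - mu for v in c] for c, mu in zip(cols, means)]
    # Guard constant columns (zero norm) so they get zero similarity.
    norms = [math.sqrt(sum(v * v for v in c)) or 1.0 for c in centered]
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            W[i][j] = W[j][i] = abs(dot) / (norms[i] * norms[j])
    return W


def la_cost(order, W):
    """Weighted linear-arrangement cost: sum over pairs of W[i][j] * |pos(i) - pos(j)|."""
    pos = {f: p for p, f in enumerate(order)}
    m = len(order)
    return sum(W[i][j] * abs(pos[i] - pos[j])
               for i in range(m) for j in range(i + 1, m))


def nearest_neighbor_path(W, start=0):
    """Greedy TSP-style path: always step to the most similar unvisited feature."""
    m = len(W)
    order, visited = [start], {start}
    while len(order) < m:
        last = order[-1]
        nxt = max((j for j in range(m) if j not in visited),
                  key=lambda j: W[last][j])
        order.append(nxt)
        visited.add(nxt)
    return order


def refine_adjacent_swaps(order, W, max_passes=10):
    """Local refinement (assumed form): keep an adjacent swap only if it lowers la_cost."""
    order = list(order)
    best = la_cost(order, W)
    for _ in range(max_passes):
        improved = False
        for p in range(len(order) - 1):
            order[p], order[p + 1] = order[p + 1], order[p]
            cost = la_cost(order, W)
            if cost < best - 1e-12:
                best, improved = cost, True
            else:
                order[p], order[p + 1] = order[p + 1], order[p]  # undo the swap
        if not improved:
            break
    return order, best


def go_lr_order(X):
    """Hypothetical GO-LR sketch: similarity graph -> greedy path -> refinement."""
    W = abs_corr_matrix(X)
    return refine_adjacent_swaps(nearest_neighbor_path(W), W)
```

The greedy path is cheap once similarities are computed and already tends to place strongly related features side by side. The swap pass accepts only strictly improving moves, so the refined order is never worse than the greedy path under this objective. Recomputing the full cost for every swap is kept for clarity; a practical version would update only the terms that touch the two swapped features.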

In practice

Apply GO-LR to reorder high-dimensional tabular features so that related features sit next to each other, exposing underlying structure.
Use NSC to compress ordered feature segments into compact meta-features for TabPFN-style models.
Evaluate the Intrinsic Dimensionality Factor (IDF) to predict when feature ordering will yield significant gains.
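The NSC step in the second item can be sketched in the same style. Assumptions: subunits have a fixed width `subunit_size` (this summary does not say how subunit boundaries are chosen), each subunit is reduced to its first principal-component score, and PCA is computed by power iteration so the sketch needs only the standard library. Function names are hypothetical.

```python
import math
import random


def first_pc_scores(block, iters=200, seed=0):
    """Scores of each row of `block` on the block's leading principal component."""
    n, d = len(block), len(block[0])
    means = [sum(r[j] for r in block) / n for j in range(d)]
    C = [[v - mu for v, mu in zip(r, means)] for r in block]
    # Scatter matrix C^T C; the 1/(n-1) covariance factor does not change the direction.
    S = [[sum(r[a] * r[b] for r in C) for b in range(d)] for a in range(d)]
    if sum(S[j][j] for j in range(d)) == 0.0:  # every feature in the block is constant
        return [0.0] * n
    # Power iteration from a seeded random start, which makes a start
    # orthogonal to the top component very unlikely.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Fix the sign so the largest-magnitude loading is positive.
    k = max(range(d), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(d)) for r in C]


def nsc_compress(X, order, subunit_size):
    """Hypothetical NSC-style step: cut the ordered feature axis into contiguous
    subunits of `subunit_size` features and keep one PCA score per subunit."""
    meta_cols = []
    for start in range(0, len(order), subunit_size):
        idx = order[start:start + subunit_size]
        block = [[row[j] for j in idx] for row in X]
        meta_cols.append(first_pc_scores(block))
    # Transpose: n samples x ceil(m / subunit_size) meta-features.
    return [list(r) for r in zip(*meta_cols)]
```

Each subunit contributes one column, so m ordered features become ceil(m / subunit_size) meta-features before reaching the frozen head. In a real pipeline the PCA directions would be fit on training rows and reused on test rows; this sketch fits and transforms in a single pass.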

Topics

Tabular Foundation Models
HDLSS Data
Feature Ordering
Dimensionality Reduction
Neuro-Inspired Compression
TabPFN

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.