Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
Summary
A study on Automated Feature Preprocessing (Auto-FP) for tabular data investigates how to automate the crucial step of transforming features for classical machine learning models. The research models Auto-FP as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem, enabling the extension of various HPO and NAS algorithms. A comprehensive evaluation of 15 algorithms was conducted across 45 public ML datasets. The findings indicate that evolution-based algorithms generally achieve the best average ranking. Surprisingly, random search emerged as a strong baseline, outperforming many surrogate-model-based and bandit-based search algorithms that typically perform well in HPO and NAS contexts. The study also analyzes the reasons for these observations, identifies bottlenecks, and explores extending Auto-FP to support parameter search, concluding with an evaluation within an AutoML context and a discussion of current AutoML tool limitations.
Key takeaway
For AI Engineers and Research Scientists developing machine learning pipelines, this study shows that feature preprocessing can be automated by adapting existing HPO or NAS techniques. Evaluate evolution-based algorithms first, since they achieved the best average ranking across the study's datasets. Do not underestimate random search either: it outperformed many surrogate-model-based and bandit-based methods in this setting, which makes it a cheap and credible starting point for initial experimentation.
Key insights
Automating feature preprocessing for tabular data can be framed as an HPO or NAS problem; among the 15 algorithms compared, evolution-based ones generally achieved the best average ranking.
Principles
- Feature preprocessing is critical for classical ML models.
- Random search is a strong baseline for Auto-FP.
- Evolution-based algorithms lead in Auto-FP performance.
Method
Auto-FP is modeled as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem, allowing existing algorithms from these fields to be adapted and evaluated.
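To make the search framing concrete, here is a minimal sketch of random search over preprocessing pipelines, written with the Python standard library only. Everything in it is an assumption for illustration and is not taken from the paper: the three preprocessors, the 1-nearest-neighbor downstream model, the toy dataset, and all function names. Each candidate is an ordered sequence of preprocessors, and its score is the validation accuracy of the downstream model trained on the transformed data.

```python
import random
import statistics

# Each hypothetical preprocessor is "fit" on training rows and returns a
# function that transforms a single row.
def fit_min_max(rows):
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return lambda r: [(v - l) / s if s > 0 else 0.0 for v, l, s in zip(r, lo, span)]

def fit_standard(rows):
    cols = list(zip(*rows))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return lambda r: [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, mu, sd)]

def fit_l2_norm(rows):
    # Stateless: rescales each row to unit Euclidean length.
    def transform(r):
        norm = sum(v * v for v in r) ** 0.5
        return [v / norm if norm > 0 else 0.0 for v in r]
    return transform

PREPROCESSORS = {"minmax": fit_min_max, "standard": fit_standard, "l2norm": fit_l2_norm}

def apply_pipeline(pipeline, X_train, X_val):
    """Apply the steps in order, fitting each one on the training rows only."""
    for name in pipeline:
        transform = PREPROCESSORS[name](X_train)
        X_train = [transform(r) for r in X_train]
        X_val = [transform(r) for r in X_val]
    return X_train, X_val

def one_nn_accuracy(X_train, y_train, X_val, y_val):
    """Validation accuracy of a 1-nearest-neighbor classifier (squared Euclidean distance)."""
    correct = 0
    for row, label in zip(X_val, y_val):
        nearest = min(range(len(X_train)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(row, X_train[i])))
        correct += y_train[nearest] == label
    return correct / len(y_val)

def evaluate(pipeline, data):
    X_train, y_train, X_val, y_val = data
    X_tr, X_va = apply_pipeline(pipeline, X_train, X_val)
    return one_nn_accuracy(X_tr, y_train, X_va, y_val)

def random_search(data, n_trials=20, max_len=3, seed=0):
    """Sample pipelines uniformly at random and keep the best-scoring one."""
    rng = random.Random(seed)
    names = sorted(PREPROCESSORS)
    # Start from the empty pipeline, i.e. no preprocessing at all.
    best_pipeline, best_score = (), evaluate((), data)
    for _ in range(n_trials):
        length = rng.randint(1, max_len)
        candidate = tuple(rng.choice(names) for _ in range(length))
        score = evaluate(candidate, data)
        if score > best_score:
            best_pipeline, best_score = candidate, score
    return best_pipeline, best_score

def make_toy_data(n=120, seed=1):
    """Two features: a small-scale informative one and a large-scale noise one."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label + rng.gauss(0, 0.3), rng.gauss(0, 1000)])
        y.append(label)
    cut = n * 2 // 3
    return X[:cut], y[:cut], X[cut:], y[cut:]

if __name__ == "__main__":
    print(random_search(make_toy_data()))
```

Calling `random_search(make_toy_data())` returns the best pipeline found (a tuple of step names) together with its validation accuracy. Because the search starts from the empty pipeline, the returned score is never lower than the no-preprocessing baseline on the same validation split.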
In practice
- Consider evolution-based algorithms for Auto-FP.
- Use random search as a robust baseline.
- Evaluate Auto-FP within an AutoML framework.
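As a rough illustration of the first recommendation, the sketch below shows one simple evolution-based loop: tournament selection, a single mutation per step, and removal of the oldest population member. It is not the specific algorithm evaluated in the study. The operator names, the mutation moves, and `toy_score` are hypothetical; in a real setup the scoring function would fit the candidate pipeline, train the downstream model, and return its validation accuracy.

```python
import random

OPERATORS = ["binarize", "minmax", "standard", "l2norm"]  # hypothetical step names

def mutate(pipeline, rng, max_len):
    """Return a copy of `pipeline` with one step added, removed, or replaced."""
    steps = list(pipeline)
    moves = ["replace"]
    if len(steps) < max_len:
        moves.append("add")
    if len(steps) > 1:
        moves.append("remove")
    move = rng.choice(moves)
    if move == "add":
        steps.insert(rng.randint(0, len(steps)), rng.choice(OPERATORS))
    elif move == "remove":
        steps.pop(rng.randrange(len(steps)))
    else:
        steps[rng.randrange(len(steps))] = rng.choice(OPERATORS)
    return tuple(steps)

def evolve(score_fn, population_size=8, tournament_size=3,
           generations=30, max_len=4, seed=0):
    """Mutation-only evolutionary search over preprocessing pipelines."""
    rng = random.Random(seed)
    # Initial population: random pipelines of length 1..max_len, stored as (score, pipeline).
    population = []
    for _ in range(population_size):
        pipeline = tuple(rng.choice(OPERATORS) for _ in range(rng.randint(1, max_len)))
        population.append((score_fn(pipeline), pipeline))
    best = max(population)
    for _ in range(generations):
        # Tournament selection: the fittest of a random subset becomes the parent.
        parent = max(rng.sample(population, tournament_size))
        child = mutate(parent[1], rng, max_len)
        entry = (score_fn(child), child)
        population.append(entry)
        population.pop(0)  # discard the oldest member to keep the size fixed
        best = max(best, entry)
    return best[1], best[0]

def toy_score(pipeline):
    # Placeholder objective (an assumption): rewards similarity to an arbitrary
    # target pipeline. Replace with real validation accuracy in practice.
    target = ("standard", "minmax")
    matches = sum(a == b for a, b in zip(pipeline, target))
    return matches - 0.1 * abs(len(pipeline) - len(target))

if __name__ == "__main__":
    print(evolve(toy_score))
```

Running the same evaluation budget through a plain random search gives the baseline that the second recommendation calls for; comparing the two on equal budgets is the fair test.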
Topics
- Automated Feature Preprocessing
- Tabular Data
- Hyperparameter Optimization
- Neural Architecture Search
- Evolution-based Algorithms
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.