Algorithm Selection with Zero Domain Knowledge via Text Embeddings
Summary
ZeroFolio, a new feature-free algorithm selection approach, replaces traditional hand-crafted instance features with pretrained text embeddings. The method operates in three steps: reading the raw instance file as plain text, embedding it using a pretrained model, and selecting an algorithm via weighted k-nearest neighbors. This pipeline, which requires zero domain knowledge, was evaluated across 11 ASlib scenarios spanning 7 domains, including SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems. ZeroFolio consistently outperformed a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting, often by a substantial margin. Key design choices identified through ablation include inverse-distance weighting, line shuffling, and Manhattan distance. Combining ZeroFolio with hand-crafted features via soft voting further improved performance when both selectors were competitive.
Key takeaway
For AI Engineers and NLP Engineers developing algorithm selection systems, ZeroFolio offers a compelling feature-free alternative to traditional methods. By leveraging pretrained text embeddings, you can bypass costly and domain-specific feature engineering, significantly reducing development time and improving generalizability across diverse problem domains. Consider integrating ZeroFolio's serialization, embedding, and k-NN selection pipeline, especially for text-based instance formats, to achieve competitive performance without extensive domain expertise.
Key insights
Pretrained text embeddings can effectively replace hand-crafted features for domain-agnostic algorithm selection.
Principles
- Pretrained embeddings distinguish problem instances without domain knowledge.
- Line shuffling improves robustness by exposing varied instance parts.
- Hybrid selection benefits from complementary predictive signals.
Method
ZeroFolio serializes raw instance files as text, embeds them with a pretrained model, and selects algorithms using weighted k-nearest neighbors, optionally with multi-seed voting for variance reduction.
In practice
- Use inverse-distance weighting for k-NN algorithm selection.
- Implement line shuffling for text serialization of instance files.
- Consider Manhattan distance for embedding similarity calculations.
Topics
- Algorithm Selection
- Text Embeddings
- Zero Domain Knowledge
- k-Nearest Neighbors
- ASlib Benchmarks
Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.