Algorithm Selection with Zero Domain Knowledge via Text Embeddings

2026-04-24 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, long

Summary

ZeroFolio, a new feature-free algorithm selection approach, replaces traditional hand-crafted instance features with pretrained text embeddings. The method operates in three steps: reading the raw instance file as plain text, embedding it using a pretrained model, and selecting an algorithm via weighted k-nearest neighbors. This pipeline, which requires zero domain knowledge, was evaluated across 11 ASlib scenarios spanning 7 domains, including SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems. ZeroFolio consistently outperformed a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting, often by a substantial margin. Key design choices identified through ablation include inverse-distance weighting, line shuffling, and Manhattan distance. Combining ZeroFolio with hand-crafted features via soft voting further improved performance when both selectors were competitive.

Key takeaway

For AI Engineers and NLP Engineers developing algorithm selection systems, ZeroFolio offers a compelling feature-free alternative to traditional methods. By leveraging pretrained text embeddings, you can bypass costly and domain-specific feature engineering, significantly reducing development time and improving generalizability across diverse problem domains. Consider integrating ZeroFolio's serialization, embedding, and k-NN selection pipeline, especially for text-based instance formats, to achieve competitive performance without extensive domain expertise.

Key insights

Pretrained text embeddings can effectively replace hand-crafted features for domain-agnostic algorithm selection.

Principles

Pretrained embeddings distinguish problem instances without domain knowledge.
Line shuffling improves robustness by exposing varied instance parts.
Hybrid selection benefits from complementary predictive signals.

Method

ZeroFolio serializes raw instance files as text, embeds them with a pretrained model, and selects algorithms using weighted k-nearest neighbors, optionally with multi-seed voting for variance reduction.

In practice

Use inverse-distance weighting for k-NN algorithm selection.
Implement line shuffling for text serialization of instance files.
Consider Manhattan distance for embedding similarity calculations.

Topics

Algorithm Selection
Text Embeddings
Zero Domain Knowledge
k-Nearest Neighbors
ASlib Benchmarks

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.