Algorithm Selection with Zero Domain Knowledge via Text Embeddings

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, long

Summary

ZeroFolio, a new feature-free algorithm selection approach, replaces traditional hand-crafted instance features with pretrained text embeddings. The method operates in three steps: reading the raw instance file as plain text, embedding it using a pretrained model, and selecting an algorithm via weighted k-nearest neighbors. This pipeline, which requires zero domain knowledge, was evaluated across 11 ASlib scenarios spanning 7 domains, including SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems. ZeroFolio consistently outperformed a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting, often by a substantial margin. Key design choices identified through ablation include inverse-distance weighting, line shuffling, and Manhattan distance. Combining ZeroFolio with hand-crafted features via soft voting further improved performance when both selectors were competitive.

Key takeaway

For AI Engineers and NLP Engineers developing algorithm selection systems, ZeroFolio offers a compelling feature-free alternative to traditional methods. By leveraging pretrained text embeddings, you can bypass costly and domain-specific feature engineering, significantly reducing development time and improving generalizability across diverse problem domains. Consider integrating ZeroFolio's serialization, embedding, and k-NN selection pipeline, especially for text-based instance formats, to achieve competitive performance without extensive domain expertise.

Key insights

Pretrained text embeddings can effectively replace hand-crafted features for domain-agnostic algorithm selection.

Principles

Method

ZeroFolio serializes raw instance files as text, embeds them with a pretrained model, and selects algorithms using weighted k-nearest neighbors, optionally with multi-seed voting for variance reduction.

In practice

Topics

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.