Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

ReElicit is a Bayesian optimization framework for tuning the system prompts of modern AI systems when feedback is limited to aggregate metrics rather than per-example labels. It targets sample-constrained black-box optimization over discrete, variable-length text. Its central idea, "embedding by elicitation," has an LLM (Llama 3.3 70B Instruct) dynamically elicit a compact, interpretable feature space from the task description and the history of evaluated prompts and their scores. A Gaussian process surrogate over that feature space selects target feature vectors, which the LLM then realizes and refines into deployable system prompts. Across ten system prompt optimization tasks with a budget of 30 evaluations, ReElicit achieved the strongest aggregate performance profile among representative aggregate-only baselines, suggesting that LLMs can serve as adaptive semantic representation builders when optimizing natural-language artifacts.
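
The surrogate step can be illustrated with a minimal, dependency-free sketch. It assumes each evaluated prompt has already been mapped to a feature vector scaled to [0, 1]^d, uses an RBF kernel with fixed hyperparameters, and picks the next target by upper confidence bound (UCB) over random candidates. The kernel, the acquisition rule, and every name below (`GPSurrogate`, `select_target`, and the helpers) are illustrative assumptions; the summary does not say which kernel or acquisition function ReElicit uses.

```python
import math
import random


def rbf(a, b, lengthscale=0.3):
    # Squared-exponential kernel between two feature vectors (assumed kernel).
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / lengthscale ** 2)


def cholesky(m):
    # Lower-triangular L with L @ L^T == m (m symmetric positive definite).
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPSurrogate:
    """Exact GP regression on (feature vector, aggregate score) pairs."""

    def __init__(self, X, y, noise=1e-3):
        self.X = X
        self.mean_y = sum(y) / len(y)  # constant prior mean = average score
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        self.L = cholesky(K)
        centered = [v - self.mean_y for v in y]
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, centered))  # K^-1 (y - mean)

    def predict(self, x):
        # Posterior mean and standard deviation at feature vector x.
        k = [rbf(x, xi) for xi in self.X]
        mu = self.mean_y + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
        return mu, math.sqrt(var)


def select_target(gp, dim, n_candidates=500, beta=2.0, seed=0):
    # UCB over random candidates in [0, 1]^dim; higher scores are assumed better.
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        cand = [rng.random() for _ in range(dim)]
        mu, sd = gp.predict(cand)
        if mu + beta * sd > best_score:
            best, best_score = cand, mu + beta * sd
    return best


# Hypothetical history: three evaluated prompts described by two elicited features.
X = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
scores = [0.55, 0.70, 0.62]
gp = GPSurrogate(X, scores)
target = select_target(gp, dim=2)
```

The returned `target` is a point in feature space, not a prompt; turning it into text is the LLM's job. Random-candidate search keeps the sketch short, and a gradient-based acquisition optimizer would be the usual choice at higher dimension.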

Key takeaway

For machine learning engineers optimizing system prompts against costly, aggregate-only feedback, ReElicit offers a sample-efficient approach. Consider using an LLM to elicit task-specific features dynamically, so that the search space adapts as evidence accumulates, and letting Bayesian optimization decide which region of that space to evaluate next. Because the LLM translates each semantic target into a deployable prompt, the search runs over interpretable features rather than raw text; the paper reports that this improves performance consistency across tasks, which matters most when each prompt evaluation is expensive.
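
Translating a semantic target into a prompt can be framed as a small feedback loop: draft a prompt, measure which features it actually expresses, and feed the remaining gap back into the next draft. The sketch below is one hedged reading of that idea. `refine_to_target`, the signed-gap definition, the stopping tolerance, and the toy "strictness" feature are all hypothetical, and the two stubs stand in for LLM calls with simple string rules.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def feature_gap(target: Vector, achieved: Vector) -> Vector:
    # Signed per-feature gap: positive means the draft falls short of the target.
    return [t - a for t, a in zip(target, achieved)]


def refine_to_target(
    target: Vector,
    realize: Callable[[Vector, Optional[Vector]], str],
    extract: Callable[[str], Vector],
    max_rounds: int = 3,
    tol: float = 0.1,
) -> Tuple[str, Vector]:
    feedback: Optional[Vector] = None
    best_prompt, best_gap = "", None
    for _ in range(max_rounds):
        prompt = realize(target, feedback)            # draft aimed at the target
        gap = feature_gap(target, extract(prompt))    # re-read its actual features
        if best_gap is None or max(map(abs, gap)) < max(map(abs, best_gap)):
            best_prompt, best_gap = prompt, gap       # keep the closest draft so far
        if max(map(abs, gap)) <= tol:
            break
        feedback = gap                                # residual gap guides the next draft
    return best_prompt, best_gap


def toy_extract(prompt: str) -> Vector:
    # Hypothetical feature: fraction of sentences that contain "must".
    sents = [s for s in prompt.split(".") if s.strip()]
    return [sum("must" in s for s in sents) / len(sents)]


def make_toy_realizer() -> Callable[[Vector, Optional[Vector]], str]:
    # Hypothetical LLM stand-in: writes 10 sentences, `m` of them strict.
    state = {"m": 2}  # deliberately poor first draft

    def realize(target: Vector, feedback: Optional[Vector]) -> str:
        if feedback is not None:
            state["m"] += round(feedback[0] * 10)  # nudge toward the target
        return " ".join("You must cite sources." if i < state["m"] else "Be helpful."
                        for i in range(10))

    return realize


prompt, gap = refine_to_target([0.6], make_toy_realizer(), toy_extract)
```

Here the first draft reaches 0.2 strictness, the gap of about 0.4 is fed back, and the second draft hits the 0.6 target. Keeping the closest draft, rather than the last one, guards against a refinement round that overshoots.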

Key insights

ReElicit uses an LLM to dynamically create a semantic feature space for Bayesian optimization of system prompts with aggregate feedback.

Method

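ReElicit runs as a loop. The LLM defines features from the task description and the prompt-score history and extracts each evaluated prompt's coordinates in that feature space; a Gaussian process surrogate is fitted to those coordinates and their scores. The surrogate then selects target feature vectors, which the LLM realizes as deployable prompts and refines using feature-gap feedback, that is, feedback on the difference between the target features and those the drafted prompt actually exhibits.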
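
The overall control flow can be sketched with every model call passed in as a function. This is a sketch under stated assumptions, not the authors' code: `reelicit_style_loop`, `Trial`, and the callable signatures are hypothetical, the budget is assumed to count every evaluation including the seed prompt, and because the feature space may be redefined as history grows, the sketch re-extracts coordinates for every past prompt each round. In the toy run, `propose` is a greedy stand-in, not a Gaussian process, and the metric is a made-up function of prompt length.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Trial:
    prompt: str
    score: float  # one aggregate metric per prompt; no per-example labels


def reelicit_style_loop(
    task: str,
    seed_prompt: str,
    evaluate: Callable[[str], float],
    elicit: Callable[[str, List[Trial]], List[str]],
    extract: Callable[[str, List[str]], Vector],
    propose: Callable[[List[Vector], List[float]], Vector],
    realize: Callable[[Vector, List[str]], str],
    budget: int = 30,
) -> Trial:
    """Spend exactly `budget` evaluations (seed included); return the best trial."""
    history = [Trial(seed_prompt, evaluate(seed_prompt))]
    while len(history) < budget:
        features = elicit(task, history)                    # (re)define the feature space
        X = [extract(t.prompt, features) for t in history]  # re-embed every past prompt
        y = [t.score for t in history]
        target = propose(X, y)                              # surrogate picks a target vector
        prompt = realize(target, features)                  # LLM turns the target into text
        history.append(Trial(prompt, evaluate(prompt)))     # one costly evaluation
    return max(history, key=lambda t: t.score)


# Toy run with hypothetical stand-ins: one feature, prompt length in words.
best = reelicit_style_loop(
    task="Answer support tickets politely.",
    seed_prompt="Be polite.",
    evaluate=lambda p: -abs(len(p.split()) - 12),  # pretend metric peaks at 12 words
    elicit=lambda task, hist: ["length"],
    extract=lambda p, feats: [len(p.split()) / 20],
    propose=lambda X, y: [min(1.0, X[y.index(max(y))][0] + 0.1)],  # greedy, not a GP
    realize=lambda target, feats: " ".join(["please"] * max(1, round(target[0] * 20))),
    budget=30,
)
```

Passing the steps in as functions keeps the budget accounting in one place: a real `propose` would fit a surrogate to `X` and `y`, and a real `realize` could run its own feature-gap refinement before returning a prompt.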

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.