Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts
Summary
ReElicit is a Bayesian optimization framework designed for tuning system prompts in modern AI systems when feedback is limited to aggregate metrics, not per-example labels. It addresses the challenge of sample-constrained black-box optimization over discrete, variable-length text. The framework introduces "embedding by elicitation," where a Llama 3.3 70B Instruct LLM dynamically elicits a compact, interpretable feature space from task descriptions and prompt-score history. A Gaussian process surrogate then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Evaluated across ten system prompt optimization tasks with a 30-evaluation budget, ReElicit achieved the strongest aggregate performance profile among representative aggregate-only baselines, demonstrating LLMs' utility as adaptive semantic representation builders for natural-language artifact optimization.
Key takeaway
For machine learning engineers optimizing system prompts against costly, aggregate-only feedback, ReElicit offers a robust approach. Consider using an LLM to elicit features dynamically, which yields an adaptive search space, and pair it with Bayesian optimization for sample-efficient exploration. By translating semantic targets into deployable prompts, the method improves performance consistency across tasks, especially when each prompt evaluation is expensive.
Key insights
ReElicit uses an LLM to dynamically create a semantic feature space for Bayesian optimization of system prompts with aggregate feedback.
Principles
- LLMs can build adaptive semantic representations.
- Dynamic feature elicitation reduces representation error.
- Aggregate metrics benefit from uncertainty-aware search.
Method
ReElicit defines features from prompt-score history, extracts prompt coordinates, fits a Gaussian process surrogate, selects target feature vectors, and refines them into deployable prompts using feature-gap feedback.
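The surrogate-and-selection part of this loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the RBF kernel, the upper-confidence-bound (UCB) selection rule, the candidate grid, the noise level, and every function name are hypothetical, since the summary does not specify them. The LLM steps (feature elicitation, coordinate extraction, prompt realization) are left out, and the feature coordinates and scores are made-up placeholders.

```python
import math

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_T(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-4):
    """Posterior mean and std at x_new for a GP whose constant mean is the average score."""
    n = len(X)
    mean_y = sum(y) / n
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_T(L, solve_lower(L, [v - mean_y for v in y]))
    k_star = [rbf(x, x_new) for x in X]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mu, math.sqrt(var)

def select_target(X, y, candidates, beta=2.0):
    """Pick the candidate maximizing UCB = mean + beta * std (assumed acquisition rule)."""
    def ucb(c):
        mu, sd = gp_posterior(X, y, c)
        return mu + beta * sd
    return max(candidates, key=ucb)

# Placeholder history: each row holds one prompt's coordinates on two
# hypothetical elicited features, paired with its aggregate score.
X = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
y = [0.55, 0.70, 0.40]

# Assumption: candidate targets come from a coarse grid over [0, 1]^2.
steps = [0.0, 0.25, 0.5, 0.75, 1.0]
candidates = [[a, b] for a in steps for b in steps]

target = select_target(X, y, candidates)
print("next target feature vector:", target)
```

In the framework as summarized, the selected vector would then be handed back to the LLM to be realized as a system prompt. With a 30-evaluation budget the history stays small, so even this unoptimized pure-Python solve is cheap next to a single prompt evaluation.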
In practice
- Apply LLMs for semantic space construction in BO.
- Use feature-gap refinement for prompt generation.
- Prioritize target-evaluation efficiency for costly metrics.
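One way to picture feature-gap refinement is as a comparison between the target feature vector and the coordinates extracted from a drafted prompt, with the differences turned into rewrite instructions. The sketch below is hypothetical throughout: the feature names, the 0-to-1 scale, the tolerance, and the feedback wording are assumptions for illustration, and in practice an LLM would both score the draft and apply the feedback.

```python
def feature_gap(target, realized, tolerance=0.1):
    """Return per-feature gaps (target minus realized) that exceed the tolerance."""
    return {
        name: round(target[name] - realized.get(name, 0.0), 3)
        for name in target
        if abs(target[name] - realized.get(name, 0.0)) > tolerance
    }

def gap_feedback(gaps):
    """Turn numeric gaps into plain-language instructions, largest gap first."""
    lines = []
    for name, gap in sorted(gaps.items(), key=lambda kv: -abs(kv[1])):
        direction = "increase" if gap > 0 else "decrease"
        lines.append(f"{direction} '{name}' by about {abs(gap):.2f}")
    return "; ".join(lines) if lines else "prompt matches the target features"

# Hypothetical feature names and values; a real run would use the elicited features.
target = {"step_by_step_reasoning": 0.9, "output_format_strictness": 0.7, "verbosity": 0.3}
realized = {"step_by_step_reasoning": 0.5, "output_format_strictness": 0.65, "verbosity": 0.6}

gaps = feature_gap(target, realized)
feedback = gap_feedback(gaps)
print(feedback)
```

Features already within tolerance are dropped from the feedback, so each refinement round concentrates on the largest mismatches and spends no target evaluations along the way.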
Topics
- Bayesian Optimization
- System Prompts
- Large Language Models
- Prompt Optimization
- Semantic Embeddings
- Aggregate Feedback
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.