Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

2026-05-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

ReElicit is a Bayesian optimization framework designed for tuning system prompts in modern AI systems when feedback is limited to aggregate metrics, not per-example labels. It addresses the challenge of sample-constrained black-box optimization over discrete, variable-length text. The framework introduces "embedding by elicitation," where a Llama 3.3 70B Instruct LLM dynamically elicits a compact, interpretable feature space from task descriptions and prompt-score history. A Gaussian process surrogate then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Evaluated across ten system prompt optimization tasks with a 30-evaluation budget, ReElicit achieved the strongest aggregate performance profile among representative aggregate-only baselines, demonstrating LLMs' utility as adaptive semantic representation builders for natural-language artifact optimization.

Key takeaway

For machine learning engineers tuning system prompts when each evaluation is costly and returns only an aggregate score, ReElicit is worth considering. It uses an LLM to elicit an adaptive feature space, applies Bayesian optimization to explore that space sample-efficiently, and translates the selected semantic targets into deployable prompts. The reported benefit is more consistent performance across tasks under a tight evaluation budget.

Key insights

ReElicit uses an LLM to dynamically create a semantic feature space for Bayesian optimization of system prompts with aggregate feedback.

Principles

LLMs can build adaptive semantic representations.
Dynamic feature elicitation reduces representation error.
Aggregate metrics benefit from uncertainty-aware search.

Method

ReElicit defines features from prompt-score history, extracts prompt coordinates, fits a Gaussian process surrogate, selects target feature vectors, and refines them into deployable prompts using feature-gap feedback.
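The surrogate and target-selection steps can be illustrated with a small, dependency-free sketch. It is not ReElicit's implementation: the RBF kernel, its length scale, the noise level, the upper-confidence-bound acquisition rule, and the fixed candidate list are all assumptions chosen for brevity. The source states only that a Gaussian process surrogate selects target feature vectors. Each row of `X` stands for the LLM-extracted coordinates of a previously evaluated prompt, and `y` holds the aggregate scores.

```python
import math

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)  # refactored on every call; acceptable for a small history
    alpha = solve_upper_t(L, solve_lower(L, y))
    k_star = [rbf(xi, x_new) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def select_target(X, y, candidates, beta=2.0):
    """Pick the candidate feature vector with the highest upper confidence bound."""
    y_mean = sum(y) / len(y)
    centered = [v - y_mean for v in y]  # center scores so the zero-mean prior fits

    def ucb(c):
        mean, std = gp_posterior(X, centered, c)
        return mean + y_mean + beta * std

    return max(candidates, key=ucb)
```

With only a handful of observations, the uncertainty term dominates, so the rule favors feature vectors far from anything already evaluated. That exploratory behavior is what makes an uncertainty-aware surrogate attractive when the budget is a few dozen evaluations.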

In practice

Apply LLMs for semantic space construction in BO.
Use feature-gap refinement for prompt generation.
Prioritize target-evaluation efficiency for costly metrics.
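One way to picture feature-gap refinement is as a comparison between the target feature vector and the coordinates extracted from a draft prompt, with the mismatches fed back as revision hints. The sketch below is hypothetical: the function name, the feature names, the 0-to-1 scoring scale, and the tolerance are illustrative assumptions, and the LLM calls that would extract the draft's coordinates and rewrite the prompt are left out.

```python
def feature_gap_feedback(target, realized, tolerance=0.1):
    """Return revision hints for features where a draft prompt misses its target.

    target, realized: dicts mapping a feature name to a score in [0, 1]
    (an assumed scale; the source does not specify one).
    """
    hints = []
    for name, want in target.items():
        got = realized.get(name, 0.0)  # a feature absent from the draft counts as 0
        gap = want - got
        if abs(gap) > tolerance:
            direction = "increase" if gap > 0 else "decrease"
            hints.append(f"{direction} '{name}' (target {want:.2f}, draft {got:.2f})")
    return hints

# Hypothetical feature names and scores, for illustration only.
target = {"verbosity": 0.2, "step_by_step": 0.9}
draft = {"verbosity": 0.6, "step_by_step": 0.85}
for hint in feature_gap_feedback(target, draft):
    print(hint)
```

In a full loop, the hints would be passed back to the LLM along with the draft, and refinement would stop once no hints remain or a revision limit is reached, so that only the final prompt consumes one of the costly evaluations.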

Topics

Bayesian Optimization
System Prompts
Large Language Models
Prompt Optimization
Semantic Embeddings
Aggregate Feedback

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.