LLM-Guided Evolution for Medical Decision Pipelines
Summary
LLM-guided MAP-Elites evolution is an inference-time method for optimizing medical decision pipelines that avoids costly fine-tuning and manual prompt engineering. The researchers applied it to three clinical tasks: urgency triage, interactive consultation, and medical image classification. In triage, evolved programs raised Semigran accuracy from 77.3% to 87.1% and emergency recall from 0.60 to 0.97; on MIMIC-ESI, exact accuracy rose from 56.7% to 62.0% and severe undertriage fell from 3.6% to 1.2%. In interactive consultation, evolved policies improved the accuracy–cost frontier across Llama-3, Qwen-3.5, and Gemma-4 models, raising accuracy (e.g., by 3.1 percentage points for Llama-3-8B) while cutting token usage substantially (e.g., by 89.6% for Llama-3-8B). Prompt-only evolution also improved frozen MedGemma VLMs on PneumoniaMNIST classification, particularly at lower image resolutions. Qualitative analysis attributes these gains to interpretable program-level mechanisms, such as calibrated triage boundaries and targeted evidence acquisition, rather than to prompt rewording alone.
Key takeaway
Machine learning engineers adapting LLMs for clinical applications should consider LLM-guided MAP-Elites evolution as an inference-time optimization method. It discovers and refines decision strategies, improving accuracy and safety-relevant behavior without fine-tuning the underlying model. Pair it with safety-weighted objectives and structured evaluation so that gains in areas like triage or interactive consultation remain robust and interpretable.
Key insights
LLM-guided evolution optimizes medical decision pipelines at inference time, outperforming manual baselines through interpretable program-level changes.
Principles
- Evolutionary search can improve clinically relevant operating points, not just aggregate accuracy.
- Gains arise from interpretable program-level mechanisms, not superficial prompt edits.
- Optimizing on small vignette sets can lead to overfitting decision boundaries.
Method
LLM-guided MAP-Elites optimization uses a frozen LLM (gpt-oss-120b) to mutate executable artifacts (programs, policies, prompts). Task-specific evaluators score candidates, updating an archive of high-performing, diverse solutions.
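The loop described above can be sketched in a few lines. This is a minimal illustration of the general MAP-Elites pattern, not the authors' implementation: the names `map_elites`, `toy_mutate`, `toy_evaluate`, and `toy_describe` are hypothetical, the mutation step is a random string edit standing in for the frozen-LLM call, and the evaluator and behavior descriptor are toy functions rather than clinical scorers.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Cell = Tuple[int, ...]


@dataclass
class Elite:
    candidate: str   # the artifact being evolved (program, policy, or prompt text)
    fitness: float   # score assigned by the task-specific evaluator


def map_elites(
    seed: str,
    mutate: Callable[[str, random.Random], str],
    evaluate: Callable[[str], float],
    describe: Callable[[str], Cell],
    iterations: int = 200,
    rng: Optional[random.Random] = None,
) -> Dict[Cell, Elite]:
    """Keep the best candidate found so far in each behavior cell."""
    rng = rng or random.Random(0)
    archive: Dict[Cell, Elite] = {}

    def consider(candidate: str) -> None:
        fitness = evaluate(candidate)
        cell = describe(candidate)
        incumbent = archive.get(cell)
        # A candidate enters the archive if its cell is empty or it beats the incumbent.
        if incumbent is None or fitness > incumbent.fitness:
            archive[cell] = Elite(candidate, fitness)

    consider(seed)
    for _ in range(iterations):
        parent = rng.choice(list(archive.values())).candidate
        consider(mutate(parent, rng))
    return archive


def toy_mutate(parent: str, rng: random.Random) -> str:
    # ASSUMPTION: stand-in for asking a frozen LLM to rewrite the artifact.
    chars = list(parent)
    action = rng.choice(["flip", "grow", "shrink"])
    if action == "grow" or not chars:
        chars.append(rng.choice("ab"))
    elif action == "shrink" and len(chars) > 1:
        chars.pop()
    else:
        chars[rng.randrange(len(chars))] = rng.choice("ab")
    return "".join(chars)


def toy_evaluate(candidate: str) -> float:
    # Toy fitness: fraction of 'a' characters (a real evaluator would score task cases).
    return candidate.count("a") / max(len(candidate), 1)


def toy_describe(candidate: str) -> Cell:
    # Toy behavior descriptor: candidate length, capped so the grid stays small.
    return (min(len(candidate), 9),)


archive = map_elites("b", toy_mutate, toy_evaluate, toy_describe, iterations=300)
for cell, elite in sorted(archive.items()):
    print(cell, elite.candidate, round(elite.fitness, 2))
```

The archive keeps one elite per cell, which is what preserves diversity: a short, cheap candidate is never displaced by a longer one with higher fitness, because they occupy different cells.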
In practice
- Use MAP-Elites to evolve decision logic, not just prompts, for LLM-based systems.
- Implement safety-weighted fitness functions to prioritize critical clinical outcomes.
- Incorporate structured batch composition and lineage blending for stable evaluation.
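One way to read "safety-weighted fitness" for triage is a score that penalizes undertriage more heavily than overtriage. The sketch below is an assumption for illustration only: the function name `safety_weighted_fitness`, the ordinal urgency scale (0 = lowest, 3 = emergency), and the penalty weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def safety_weighted_fitness(
    predicted: Sequence[int],
    actual: Sequence[int],
    under_penalty: float = 2.0,  # ASSUMPTION: cost per level of undertriage
    over_penalty: float = 0.5,   # ASSUMPTION: cost per level of overtriage
) -> float:
    """Average per-case score on an ordinal urgency scale (higher = more urgent)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    total = 0.0
    for p, a in zip(predicted, actual):
        gap = p - a
        if gap == 0:
            total += 1.0                      # exact match earns full credit
        elif gap < 0:
            total -= under_penalty * (-gap)   # predicted less urgent than truth
        else:
            total -= over_penalty * gap       # predicted more urgent than truth
    return total / len(actual)


actual = [3, 1, 2, 0]
cautious = [3, 2, 2, 1]  # two cases overtriaged by one level
risky = [1, 1, 2, 0]     # one emergency undertriaged by two levels

print(safety_weighted_fitness(cautious, actual))  # 0.25
print(safety_weighted_fitness(risky, actual))     # -0.25
```

Here the risky policy has higher exact accuracy (3 of 4 correct versus 2 of 4), yet it scores lower because its single error is a two-level undertriage of an emergency. An evolutionary search driven by such an objective is steered away from candidates that trade safety for aggregate accuracy.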
Topics
- LLM-guided Evolution
- Medical Decision Support
- MAP-Elites
- Clinical Triage
- Interactive Consultation
- Vision-Language Models
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.