LLM-Guided Evolution for Medical Decision Pipelines

2026-04-26 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI in Clinical Decision Support · Depth: Expert, extended

Summary

LLM-guided MAP-Elites evolution is an inference-time method for optimizing medical decision pipelines without costly fine-tuning or manual prompt engineering. The researchers applied it to three clinical tasks: urgency triage, interactive consultation, and medical image classification. In triage, evolved programs raised Semigran accuracy from 77.3% to 87.1% and emergency recall from 0.60 to 0.97; on MIMIC-ESI they raised exact accuracy from 56.7% to 62.0% and cut severe undertriage from 3.6% to 1.2%. In interactive consultation, evolved policies improved the accuracy–cost frontier across Llama-3, Qwen-3.5, and Gemma-4 models, with accuracy gains (e.g., 3.1 percentage points for Llama-3-8B) and large reductions in token usage (e.g., 89.6% for Llama-3-8B). Prompt-only evolution also improved frozen MedGemma VLMs on PneumoniaMNIST classification, particularly at lower resolutions. Qualitative analysis attributed the gains to interpretable program-level mechanisms, such as calibrated triage boundaries and targeted evidence acquisition, rather than to prompt rewording alone.

Key takeaway

Machine learning engineers adapting LLMs for clinical applications should consider LLM-guided MAP-Elites evolution as an inference-time optimization method. It can discover and refine decision strategies, improving accuracy and safety-relevant behavior without costly model fine-tuning. Pair it with safety-weighted objectives and structured evaluation so that gains in areas such as triage or interactive consultation remain robust and interpretable.

Key insights

LLM-guided evolution optimizes medical decision pipelines at inference time, outperforming manual baselines through interpretable program-level changes.

Principles

Evolutionary search can improve clinically relevant operating points, not just aggregate accuracy.
Gains arise from interpretable program-level mechanisms, not superficial prompt edits.
Optimizing on small vignette sets can lead to overfitting decision boundaries.

Method

LLM-guided MAP-Elites optimization uses a frozen LLM (gpt-oss-120b) to mutate executable artifacts (programs, policies, prompts). Task-specific evaluators score candidates, updating an archive of high-performing, diverse solutions.
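
The loop described above can be sketched in a few lines of Python. This is a minimal illustration of generic MAP-Elites, not the paper's implementation: the names `Elite`, `map_elites`, and `demo` are hypothetical, the `mutate` callable stands in for the frozen LLM, and the toy evaluator and its five-cell behavior grid are assumptions chosen only to make the example runnable.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Cell = Tuple[int, ...]  # discretized behavior descriptor (assumed form)


@dataclass
class Elite:
    artifact: str   # program, policy, or prompt text
    fitness: float  # score from the task-specific evaluator


def map_elites(
    seed: str,
    mutate: Callable[[str], str],
    evaluate: Callable[[str], Tuple[float, Cell]],
    iterations: int,
    rng: random.Random,
) -> Dict[Cell, Elite]:
    """Keep the best artifact found so far in each behavior cell."""
    archive: Dict[Cell, Elite] = {}
    fitness, cell = evaluate(seed)
    archive[cell] = Elite(seed, fitness)
    for _ in range(iterations):
        # Pick any current elite as the parent, then mutate it.
        parent = rng.choice(list(archive.values()))
        child = mutate(parent.artifact)
        fitness, cell = evaluate(child)
        incumbent = archive.get(cell)
        # Insert if the cell is empty or the child beats the incumbent.
        if incumbent is None or fitness > incumbent.fitness:
            archive[cell] = Elite(child, fitness)
    return archive


def demo() -> Dict[Cell, Elite]:
    """Toy run: the 'artifact' is a decision threshold stored as text."""
    rng = random.Random(0)

    def toy_mutate(artifact: str) -> str:
        # Stand-in for an LLM call that proposes an edited artifact.
        t = float(artifact) + rng.uniform(-0.1, 0.1)
        return f"{min(max(t, 0.0), 1.0):.3f}"

    def toy_evaluate(artifact: str) -> Tuple[float, Cell]:
        t = float(artifact)
        fitness = 1.0 - abs(t - 0.7)   # toy objective peaking at 0.7
        cell = (min(int(t * 5), 4),)   # five behavior bins over [0, 1]
        return fitness, cell

    return map_elites("0.200", toy_mutate, toy_evaluate, 200, rng)


if __name__ == "__main__":
    for cell, elite in sorted(demo().items()):
        print(cell, elite.artifact, round(elite.fitness, 3))
```

In a real pipeline, `mutate` would prompt the LLM with the parent artifact and evaluator feedback, and `evaluate` would run the candidate on a batch of clinical cases. The archive keeps diverse solutions rather than a single best one, which is what distinguishes MAP-Elites from plain hill climbing.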

In practice

Use MAP-Elites to evolve decision logic, not just prompts, for LLM-based systems.
Implement safety-weighted fitness functions to prioritize critical clinical outcomes.
Incorporate structured batch composition and lineage blending for stable evaluation.
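
A safety-weighted fitness function of the kind recommended above might look like the following sketch. The function name, the penalty weights, and the integer urgency scale (larger means more urgent) are all illustrative assumptions, not values or conventions taken from the paper; the point is only that undertriage costs more than overtriage of the same size.

```python
from typing import Sequence


def safety_weighted_fitness(
    predicted: Sequence[int],
    actual: Sequence[int],
    undertriage_penalty: float = 3.0,  # assumed weight, not from the paper
    overtriage_penalty: float = 1.0,   # assumed weight, not from the paper
) -> float:
    """Mean per-case score; urgency levels are ints, larger = more urgent.

    Exact matches earn +1. Errors are penalized in proportion to how many
    levels off they are, with undertriage weighted more heavily.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        if p == a:
            total += 1.0
        elif p < a:
            # Predicted less urgent than the reference label: undertriage.
            total -= undertriage_penalty * (a - p)
        else:
            # Predicted more urgent than the reference label: overtriage.
            total -= overtriage_penalty * (p - a)
    return total / len(actual)


if __name__ == "__main__":
    reference = [2, 3, 1, 3]
    cautious = [2, 3, 2, 3]   # one overtriage by one level
    risky = [2, 2, 1, 3]      # one undertriage by one level
    print(safety_weighted_fitness(cautious, reference))
    print(safety_weighted_fitness(risky, reference))
```

With an objective shaped like this, the search prefers a candidate that occasionally overtriages to one that misses an emergency, even when the two have identical exact accuracy.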

Topics

LLM-guided Evolution
Medical Decision Support
MAP-Elites
Clinical Triage
Interactive Consultation
Vision-Language Models

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.