Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis
Summary
ClaMPAPP (Clinical Language-assisted Machine-learning Pipeline for Appendicitis) is a hybrid system for diagnosing pediatric appendicitis. It addresses the limitations of using LLMs directly for clinical decision support. A large language model (LLM) serves as the interface: it extracts schema-constrained clinical features from free-text narratives, and those features then undergo deterministic plausibility checks. The validated features are passed to an XGBoost classifier, trained on clinical, laboratory, and ultrasound variables, which produces a stable risk prediction. On two independent pediatric appendicitis cohorts from German hospitals, ClaMPAPP achieved better overall diagnostic performance than end-to-end LLM baselines, minimized missed appendicitis cases, and remained robust when the narrative was reordered. These results support an LLM-as-interface, ML-as-predictor design that keeps the path from free text to risk estimate auditable for clinical decision support.
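The summary reports that ClaMPAPP stayed robust when narratives were reordered. One way to probe that property is to permute a narrative's sentences and check that the extracted features do not change. The sketch below is hypothetical: `toy_extract` is a regex stand-in for the LLM extraction step, `first_number_extract` is a deliberately fragile contrast case, and the sentence splitter is intentionally naive.

```python
import itertools
import re
from typing import Callable


def split_sentences(narrative: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace; "38.4" stays intact.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", narrative) if s.strip()]


def is_order_invariant(narrative: str, extract: Callable[[str], dict],
                       max_perms: int = 24) -> bool:
    """True if `extract` returns the same features for every sentence order
    tried (at most `max_perms` permutations, to bound the cost)."""
    reference = extract(narrative)
    sentences = split_sentences(narrative)
    for order in itertools.islice(itertools.permutations(sentences), max_perms):
        if extract(" ".join(order)) != reference:
            return False
    return True


def toy_extract(text: str) -> dict:
    """Hypothetical stand-in for the LLM: pulls labelled values by name."""
    patterns = {
        "temperature_c": r"temperature(?: of)? (\d+(?:\.\d+)?)",
        "wbc_10e9_per_l": r"WBC(?: of)? (\d+(?:\.\d+)?)",
    }
    features = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            features[name] = float(match.group(1))
    return features


def first_number_extract(text: str) -> dict:
    """Deliberately order-sensitive: treats the first number as temperature."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return {"temperature_c": float(match.group())} if match else {}


note = "Nine-year-old with right lower quadrant pain. Temperature of 38.4 C. WBC of 15.2."
print(is_order_invariant(note, toy_extract))           # True
print(is_order_invariant(note, first_number_extract))  # False
```

A real LLM can vary between calls, so a practical harness would likely fix decoding settings or compare across repeated runs. Because the plausibility checks and classifier downstream are deterministic, identical extracted features imply identical predictions, which is why the extraction step is the part worth testing this way.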
Key takeaway
If you are designing clinical decision support systems, prioritize hybrid LLM-ML architectures like ClaMPAPP. This approach uses LLMs for accessible free-text interpretation while relying on structured machine learning for stable, auditable diagnostic predictions. Separating natural-language usability from the core predictive inference reduces critical safety risks, such as missed appendicitis cases.
Key insights
Hybrid LLM-ML systems improve clinical decision support by using LLMs for interface tasks and ML for stable prediction.
Principles
- LLMs used directly as diagnostic engines are unstable.
- Hybrid LLM-ML separates usability from inference.
- Structured ML offers stable risk prediction.
Method
ClaMPAPP uses an LLM to extract schema-constrained features from free-text narratives, applies deterministic plausibility checks to them, and then feeds the validated features to an XGBoost classifier.
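A minimal sketch of the plausibility-check step, assuming the LLM has been prompted to return a flat JSON object of numeric features. The feature names, units, and ranges in `SCHEMA`, and the names `validate_extraction` and `ValidationResult`, are hypothetical illustrations, not ClaMPAPP's actual schema or code. Any key outside the schema, including a diagnosis the LLM volunteers, is rejected, so the LLM can supply inputs but never the answer.

```python
import json
import math
from dataclasses import dataclass, field

# Hypothetical schema: feature name -> (min, max) plausible value.
# Names, units, and ranges are illustrative assumptions, not ClaMPAPP's.
SCHEMA = {
    "age_years": (0.0, 18.0),
    "temperature_c": (34.0, 43.0),
    "wbc_10e9_per_l": (0.5, 100.0),       # white blood cell count
    "crp_mg_per_l": (0.0, 500.0),         # C-reactive protein
    "appendix_diameter_mm": (1.0, 30.0),  # ultrasound measurement
}


@dataclass
class ValidationResult:
    accepted: dict = field(default_factory=dict)  # feature -> value that passed
    rejected: dict = field(default_factory=dict)  # key -> reason for rejection

    def to_vector(self) -> list:
        """Fixed-order feature vector; missing or rejected values become NaN."""
        return [self.accepted.get(name, math.nan) for name in SCHEMA]


def validate_extraction(raw_json: str) -> ValidationResult:
    """Deterministically check LLM output against the schema (no LLM calls here)."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of features")
    result = ValidationResult()
    for name, (lo, hi) in SCHEMA.items():
        value = data.get(name)
        if value is None:
            result.rejected[name] = "missing"
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            result.rejected[name] = f"non-numeric value {value!r}"
        elif not lo <= value <= hi:  # also rejects NaN, which fails every comparison
            result.rejected[name] = f"out of range [{lo}, {hi}]"
        else:
            result.accepted[name] = float(value)
    # Keys the schema does not define are dropped, including any
    # diagnosis or risk estimate the LLM volunteers.
    for key in sorted(data.keys() - SCHEMA.keys()):
        result.rejected[key] = "not in schema"
    return result
```

For example, an extraction that reports an 80 mm appendix and adds `"diagnosis": "appendicitis"` keeps age, temperature, and WBC, while the diameter, the missing CRP, and the diagnosis land in `rejected`. Encoding missing or rejected values as NaN suits XGBoost, which treats NaN as missing by default and learns a default split direction for it; the summary does not say whether ClaMPAPP handles rejected fields this way or routes them to a clinician for review.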
In practice
- Integrate LLMs for free-text interpretation.
- Use XGBoost for stable risk prediction.
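The summary credits ClaMPAPP with minimizing missed appendicitis cases but does not say how the XGBoost output becomes a decision. One common, generic approach, shown here as an assumption rather than the authors' method, is to pick the decision threshold on validation data so that sensitivity meets a target. The sketch assumes predicted probabilities are already available (for example from `XGBClassifier.predict_proba(X)[:, 1]` in the `xgboost` package); the function names and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.95) -> float:
    """Highest threshold t such that flagging score >= t catches at least
    `target` of the positive (appendicitis) cases in the validation data."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("validation data contains no positive cases")
    k = math.ceil(target * len(positives))  # positives that must be caught
    return positives[k - 1]                 # k-th highest positive score


def sensitivity_specificity(scores, labels, t):
    """Sensitivity and specificity when score >= t is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy validation scores (made up for illustration, not study data).
scores = [0.9, 0.8, 0.35, 0.2, 0.6, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
t = threshold_for_sensitivity(scores, labels, target=1.0)
print(t, sensitivity_specificity(scores, labels, t))  # 0.2 (1.0, 0.5)
```

In the toy data, a 0.75 sensitivity target gives a threshold of 0.35 with specificity 0.75, while demanding no missed cases drops the threshold to 0.2 and specificity to 0.5. That trade-off is the cost of prioritizing sensitivity, and a threshold fitted this way should be checked on held-out data.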
Topics
- Language Models
- Clinical Decision Support
- Pediatric Appendicitis
- XGBoost Classifier
- Hybrid AI Systems
- Feature Extraction
Best for: NLP Engineer, AI Scientist, Research Scientist, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.