Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

2026-06-17 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI in Healthcare · Depth: Advanced, quick

Summary

ClaMPAPP (Clinical Language-assisted Machine-learning Pipeline for Appendicitis) is a novel hybrid system for pediatric appendicitis diagnosis that addresses the limitations of using LLMs directly for clinical decision support. It uses a large language model (LLM) as an interface that extracts schema-constrained clinical features from free-text narratives, and these features then undergo deterministic plausibility checks. The validated features are passed to an XGBoost classifier, trained on clinical, laboratory, and ultrasound variables, which produces a stable risk prediction. On two independent pediatric appendicitis cohorts from German hospitals, ClaMPAPP achieved better overall diagnostic performance than end-to-end LLM baselines, minimized missed appendicitis cases, and was robust to reordering of the narrative. These results support an LLM-as-interface, ML-as-predictor design that provides an auditable pathway for clinical decision support.
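The summary reports robustness to narrative reordering. One way to probe that property is sketched below, assuming a simple sentence-shuffling harness; it is not necessarily how the study measured it. The function toy_extract is a hypothetical, regex-based stand-in for the LLM extraction call, and the feature names wbc and crp are illustrative.

```python
import random
import re
from typing import Callable, Dict, Optional

Features = Dict[str, Optional[float]]


def toy_extract(narrative: str) -> Features:
    """Hypothetical stand-in for the LLM call: pulls two lab values with a regex.

    A real extractor would prompt an LLM for schema-constrained JSON; a
    deterministic regex just makes the harness runnable.
    """
    features: Features = {"wbc": None, "crp": None}
    for name in features:
        match = re.search(rf"\b{name}\s*(\d+(?:\.\d+)?)", narrative, re.IGNORECASE)
        if match:
            features[name] = float(match.group(1))
    return features


def is_order_invariant(
    extract: Callable[[str], Features], narrative: str, trials: int = 20, seed: int = 0
) -> bool:
    """Shuffle sentence order and check that the extracted features never change."""
    rng = random.Random(seed)
    # Split only at periods followed by whitespace, so decimals like "14.2" stay intact
    sentences = re.split(r"(?<=\.)\s+", narrative.strip())
    reference = extract(narrative)
    for _ in range(trials):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        if extract(" ".join(shuffled)) != reference:
            return False
    return True


note = "Pain started near the navel. WBC 14.2 on admission. CRP 38 reported later."
print(toy_extract(note))                      # {'wbc': 14.2, 'crp': 38.0}
print(is_order_invariant(toy_extract, note))  # True
```

With a real LLM behind extract, the harness would query the model once per permutation, and any disagreement with the reference extraction would indicate order sensitivity.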

Key takeaway

AI architects designing clinical decision support systems should prioritize hybrid LLM-ML architectures like ClaMPAPP. This approach uses LLMs to make free-text interpretation accessible while relying on structured machine learning for stable, auditable diagnostic predictions. Separating natural-language usability from the core predictive inference in this way minimizes critical safety risks such as missed appendicitis cases.
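Missed appendicitis cases are false negatives: patients who have appendicitis but whom the system labels negative. A minimal sketch of counting them, together with sensitivity, on toy labels (1 = appendicitis, 0 = no appendicitis); the function name and data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def missed_cases_and_sensitivity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, float]:
    """Return (false negatives, sensitivity) for binary labels where 1 = appendicitis."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = true_pos + false_neg
    # Sensitivity is undefined when there are no true appendicitis cases
    sensitivity = true_pos / positives if positives else float("nan")
    return false_neg, sensitivity


labels = [1, 1, 1, 1, 0, 0]  # toy ground truth
preds = [1, 1, 0, 1, 0, 1]   # toy system output
print(missed_cases_and_sensitivity(labels, preds))  # (1, 0.75)
```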

Key insights

Hybrid LLM-ML systems improve clinical decision support by using LLMs for interface tasks and ML for stable prediction.

Principles

LLMs used directly as diagnostic engines are unstable.
A hybrid LLM-ML design separates usability from inference.
Structured ML offers stable risk prediction.

Method

ClaMPAPP uses an LLM to extract schema-constrained features from free-text clinical narratives, applies deterministic plausibility checks, and then feeds the validated features to an XGBoost classifier.
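The schema constraint and the deterministic plausibility checks can be sketched as follows, assuming the LLM is asked to reply with a JSON object. The feature names, types, and bounds in SCHEMA are hypothetical and illustrative, not clinical reference ranges or the study's actual schema. The design choice shown is that an implausible or missing value becomes None and is reported, rather than being corrected or guessed.

```python
import json
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FeatureSpec:
    kind: type                     # expected type after JSON parsing: float or bool
    low: Optional[float] = None    # inclusive plausible lower bound (numeric only)
    high: Optional[float] = None   # inclusive plausible upper bound (numeric only)


# Hypothetical schema: names and bounds are illustrative, not clinical reference ranges.
SCHEMA: Dict[str, FeatureSpec] = {
    "age_years": FeatureSpec(float, 0, 18),
    "wbc_10e9_per_l": FeatureSpec(float, 0, 100),
    "crp_mg_per_l": FeatureSpec(float, 0, 500),
    "appendix_diameter_mm": FeatureSpec(float, 1, 30),
    "rebound_tenderness": FeatureSpec(bool),
}


def validate(raw_json: str) -> Tuple[Dict[str, object], List[str]]:
    """Check an LLM's JSON reply against SCHEMA; implausible values become None."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    issues = [f"unexpected key: {key}" for key in data if key not in SCHEMA]
    clean: Dict[str, object] = {}
    for name, spec in SCHEMA.items():
        value = data.get(name)
        if value is None:
            clean[name] = None  # missing stays missing; it is never guessed
            continue
        if spec.kind is bool:
            ok = isinstance(value, bool)
        else:
            # Reject strings and booleans (bool is a subclass of int in Python)
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
            if ok:
                value = float(value)
                ok = (
                    math.isfinite(value)
                    and (spec.low is None or value >= spec.low)
                    and (spec.high is None or value <= spec.high)
                )
        if ok:
            clean[name] = value
        else:
            issues.append(f"implausible {name}: {data[name]!r}")
            clean[name] = None
    return clean, issues


reply = '{"age_years": 9, "wbc_10e9_per_l": 15.1, "crp_mg_per_l": -4, "rebound_tenderness": true, "note": "x"}'
features, problems = validate(reply)
print(features)
print(problems)  # ['unexpected key: note', 'implausible crp_mg_per_l: -4']
```

Because the checks are plain code, the same LLM reply always yields the same validated features and the same list of issues, which is what makes this step auditable.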

In practice

Integrate LLMs for free-text interpretation.
Use XGBoost for stable risk prediction.
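On the prediction side, validated features have to reach the classifier as a fixed-order numeric row. A minimal sketch, assuming the hypothetical feature names below and an extractor that already returns validated values. Missing values are encoded as NaN because XGBoost treats NaN as missing by default. predict_risk is a placeholder for a trained model; with the xgboost package it could wrap XGBClassifier.predict_proba and take the positive-class column.

```python
import math
from typing import Callable, Dict, List, Mapping, Optional, Sequence

# Hypothetical column order; it must match the order used when the model was trained.
FEATURE_ORDER: Sequence[str] = (
    "age_years",
    "wbc_10e9_per_l",
    "crp_mg_per_l",
    "appendix_diameter_mm",
    "rebound_tenderness",
)


def to_row(features: Mapping[str, Optional[object]]) -> List[float]:
    """Map validated features to a fixed-order numeric row; missing values become NaN."""
    row: List[float] = []
    for name in FEATURE_ORDER:
        value = features.get(name)
        # bool is a subclass of int, so True/False become 1.0/0.0 here
        row.append(math.nan if value is None else float(value))
    return row


def assess(
    narrative: str,
    extract: Callable[[str], Mapping[str, Optional[object]]],
    predict_risk: Callable[[List[List[float]]], List[float]],
) -> float:
    """LLM-as-interface (extract) feeding ML-as-predictor (predict_risk)."""
    row = to_row(extract(narrative))
    return predict_risk([row])[0]


# Hypothetical stubs so the sketch runs without an LLM or a trained model.
def fake_extract(text: str) -> Dict[str, Optional[object]]:
    return {"age_years": 9, "wbc_10e9_per_l": 15.1, "rebound_tenderness": True}


def fake_model(rows: List[List[float]]) -> List[float]:
    return [0.5 for _ in rows]  # constant risk; a real model would score each row


note = "9-year-old with right lower quadrant pain."
print(to_row(fake_extract(note)))              # [9.0, 15.1, nan, nan, 1.0]
print(assess(note, fake_extract, fake_model))  # 0.5
```

Keeping the extractor and the predictor as separate callables mirrors the LLM-as-interface, ML-as-predictor split: either side can be swapped or audited without touching the other.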

Topics

Language Models
Clinical Decision Support
Pediatric Appendicitis
XGBoost Classifier
Hybrid AI Systems
Feature Extraction

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.