L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification

2026-04-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, quick

Summary

L2D-Clinical is a framework for clinical text classification that adaptively selects between a specialized fine-tuned BERT model and a general-purpose large language model (LLM). It learns when the BERT classifier should defer to the LLM, based on uncertainty signals and text characteristics, aiming to improve accuracy by exploiting the complementary strengths of the two model types. Unlike previous Learning to Defer (L2D) methods, which defer to human experts, L2D-Clinical defers to an LLM. It was evaluated on two English clinical tasks: adverse drug event (ADE) detection on ADE Corpus V2 and treatment outcome classification on MIMIC-IV. On ADE, it achieved an F1-score of 0.928, 1.7 points above BioBERT's 0.911, while deferring 7% of instances. On MIMIC-IV, it reached an F1-score of 0.980, 9.3 points above ClinicalBERT's 0.887, while deferring 16.8% of cases to GPT-5-nano.

Key takeaway

For AI architects and machine learning engineers building clinical text classification systems, L2D-Clinical offers a way to raise accuracy by combining BERT-based classifiers with LLMs. In the reported experiments it outperformed the BERT-only baselines by up to 9.3 F1 points while sending only a minority of instances (7% to 16.8%) to the LLM, which limits LLM API costs. Consider evaluating this adaptive deferral strategy when both accuracy and operating cost matter in your deployments.

Key insights

Adaptive model selection in clinical text classification improves accuracy by deferring to LLMs based on uncertainty.

Principles

Combine specialized and general-purpose models.
Base deferral decisions on uncertainty signals and text characteristics.

Method

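L2D-Clinical learns a deferral policy: using uncertainty signals and text characteristics, it decides for each instance whether to keep the BERT classifier's prediction or defer to an LLM, so that the LLM is used only where its strengths are likely to help.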
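
To make the deferral idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a small logistic-regression "deferral head" trained on three hypothetical features (predictive entropy, one minus the top class probability, and a scaled log token count), with labels marking cases where deferring would have helped. The names features, fit_deferral, and classify, the feature set, the toy training rows, and the 0.5 threshold are all illustrative assumptions; bert_predict and llm_predict are stand-in callables for a fine-tuned BERT classifier and an LLM call.

```python
import math
from typing import Callable, List, Sequence, Tuple

def entropy(probs: Sequence[float]) -> float:
    """Predictive entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def features(probs: Sequence[float], text: str) -> List[float]:
    """Hypothetical features: bias, entropy, 1 - max prob, scaled log length."""
    n_tokens = len(text.split())
    return [1.0, entropy(probs), 1.0 - max(probs), math.log1p(n_tokens) / 10.0]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_deferral(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def defer_probability(w: Sequence[float], probs: Sequence[float], text: str) -> float:
    """Estimated probability that deferring this instance is worthwhile."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(probs, text))))

def classify(text: str,
             bert_predict: Callable[[str], Sequence[float]],
             llm_predict: Callable[[str], int],
             w: Sequence[float],
             threshold: float = 0.5) -> Tuple[int, str]:
    """Return (label, source): the LLM is called only for deferred instances."""
    probs = bert_predict(text)
    if defer_probability(w, probs, text) >= threshold:
        return llm_predict(text), "llm"
    return max(range(len(probs)), key=lambda i: probs[i]), "bert"

# Made-up training rows: (BERT probabilities, text, 1 if deferring would have helped).
TOY = [
    ([0.97, 0.03], "no adverse event reported", 0),
    ([0.95, 0.05], "patient tolerated the drug well", 0),
    ([0.02, 0.98], "rash developed after starting amoxicillin", 0),
    ([0.90, 0.10], "headache resolved", 0),
    ([0.55, 0.45], "symptoms noted some days after the dose was changed cause unclear", 1),
    ([0.48, 0.52], "nausea reported but patient also had a viral illness that week", 1),
    ([0.60, 0.40], "possible reaction though timing relative to the drug is uncertain", 1),
    ([0.50, 0.50], "dizziness could relate to either the new drug or dehydration", 1),
]
weights = fit_deferral([features(p, t) for p, t, _ in TOY], [d for _, _, d in TOY])
```

In this sketch a confident BERT prediction is returned directly, while an uncertain one is routed to llm_predict. Raising the threshold would defer fewer instances and so reduce LLM calls, at the possible cost of accuracy.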

In practice

Improve F1-score on ADE detection by 1.7 points (0.911 to 0.928).
Improve F1-score on MIMIC-IV treatment outcome classification by 9.3 points (0.887 to 0.980).
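
As a quick arithmetic check, the short Python sketch below recomputes the F1 gains from the reported scores and converts the reported deferral rates (7% on ADE, 16.8% on MIMIC-IV) into an approximate number of LLM calls. The batch size of 10,000 documents is a hypothetical figure chosen only for illustration, and the helper names are not from the original article.

```python
def f1_gain_points(baseline_f1: float, system_f1: float) -> float:
    """F1 improvement expressed in points (hundredths of F1)."""
    return round((system_f1 - baseline_f1) * 100, 1)

def expected_llm_calls(n_docs: int, deferral_rate: float) -> int:
    """Approximate number of documents sent to the LLM at a given deferral rate."""
    return round(n_docs * deferral_rate)

# Scores and deferral rates as reported in the summary above.
REPORTED = {
    "ADE Corpus V2": {"baseline_f1": 0.911, "system_f1": 0.928, "deferral_rate": 0.07},
    "MIMIC-IV": {"baseline_f1": 0.887, "system_f1": 0.980, "deferral_rate": 0.168},
}

N_DOCS = 10_000  # hypothetical batch size, not a figure from the article

for task, r in REPORTED.items():
    gain = f1_gain_points(r["baseline_f1"], r["system_f1"])
    calls = expected_llm_calls(N_DOCS, r["deferral_rate"])
    print(f"{task}: +{gain} F1 points, about {calls} of {N_DOCS} documents sent to the LLM")
```

Under these assumptions, most documents never reach the LLM, which is the source of the cost saving relative to sending every document to it.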

Topics

Clinical Text Classification
Learning to Defer
Adaptive Model Selection
BERT Models
Large Language Models

Best for: AI Architect, AI Engineer, Machine Learning Engineer, AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.