MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Medical Devices & Health Technology · Depth: Expert, quick

Summary

MADE is a new, continuously updated multi-label text classification (MLTC) benchmark built from medical device adverse event reports. It targets challenges that matter in high-stakes healthcare AI: label imbalance, label dependencies, and combinatorial complexity. The benchmark features a long-tailed distribution of hierarchical labels and uses strict temporal splits to prevent training data contamination and ensure reproducible evaluation. It establishes baselines across more than 20 encoder- and decoder-only models, including fine-tuned and few-shot instruction-tuned/reasoning variants, and systematically assesses entropy-based, consistency-based, and self-verbalized uncertainty quantification (UQ) methods. Key findings indicate that smaller discriminatively fine-tuned decoders offer strong accuracy and competitive UQ, while generative fine-tuning provides the most reliable UQ.
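As a rough illustration of the temporal-split idea described above, the sketch below partitions reports by a cutoff date so that everything in the test set was received after everything in the training set. The field names (`received`, `labels`), the label strings, and the cutoff date are hypothetical and are not taken from the benchmark itself.

```python
from datetime import date

def temporal_split(reports, cutoff):
    """Split reports into train (received before cutoff) and test (on or after)."""
    train = [r for r in reports if r["received"] < cutoff]
    test = [r for r in reports if r["received"] >= cutoff]
    return train, test

# Hypothetical records; real adverse event reports carry far more fields.
reports = [
    {"id": 1, "received": date(2022, 5, 1), "labels": ["device_break"]},
    {"id": 2, "received": date(2023, 11, 30), "labels": ["battery_issue"]},
    {"id": 3, "received": date(2024, 2, 14), "labels": ["device_break", "leak"]},
]

# Assumed cutoff, chosen only for this example.
train, test = temporal_split(reports, date(2024, 1, 1))
train_ids = [r["id"] for r in train]
test_ids = [r["id"] for r in test]
```

Because a model trained before the cutoff cannot have seen later reports, a split of this kind is one simple way to keep evaluation data out of the training set.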

Key takeaway

NLP engineers building MLTC systems in high-stakes domains such as healthcare should consider MADE for benchmarking. Its continuous updates and strict temporal splits help distinguish genuine model capability from memorization of training data. Prefer smaller discriminatively fine-tuned decoders for strong accuracy with competitive UQ, or generative fine-tuning if reliable UQ is the primary concern, especially for rare labels.

Key insights

MADE is a living MLTC benchmark for medical device adverse events, emphasizing reliable uncertainty quantification.

Method

MADE establishes baselines for over 20 models (encoder/decoder, fine-tuned/few-shot) and systematically assesses entropy-, consistency-, and self-verbalized UQ methods.
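To make two of the UQ families named above more concrete, here is a minimal sketch of one possible entropy-based score and one possible consistency-based score for multi-label predictions. These are generic formulations chosen for illustration, not the paper's definitions: the mean per-label binary entropy and the Jaccard-based disagreement between sampled label sets are assumptions, as are all function names.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) label decision; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mean_label_entropy(probs):
    """Entropy-based score: average binary entropy over per-label probabilities."""
    return sum(binary_entropy(p) for p in probs) / len(probs)

def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def consistency_uncertainty(samples):
    """Consistency-based score: 1 minus mean pairwise Jaccard agreement
    across label sets obtained from repeated sampling of the same input."""
    pairs = [(i, j) for i in range(len(samples)) for j in range(i + 1, len(samples))]
    if not pairs:
        return 0.0
    agreement = sum(jaccard(samples[i], samples[j]) for i, j in pairs) / len(pairs)
    return 1.0 - agreement

# Hypothetical per-label probabilities and sampled label sets for one report.
entropy_score = mean_label_entropy([0.5, 1.0])
consistency_score = consistency_uncertainty([["leak"], ["leak"], ["leak"]])
```

A higher value from either function signals lower confidence. Self-verbalized UQ, in which the model states its own confidence in text, is not covered by this sketch.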

Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.