Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring
Summary
Clinical value set authoring, which involves identifying all standardized vocabulary codes for a clinical concept, is a significant bottleneck in clinical quality measurement and phenotyping. A new approach, Retrieval-Augmented Set Completion (RASC), addresses this by retrieving the K most similar existing value sets from a curated corpus to create a candidate pool, then applying a classifier to each candidate code. This method reduces the effective output space from the full vocabulary to a smaller retrieved pool. RASC was evaluated on 11,803 publicly available VSAC value sets, establishing the first large-scale benchmark for this task. A cross-encoder fine-tuned on SAPBert achieved an AUROC of 0.852 and a value-set-level F1 of 0.298, outperforming a simpler three-layer Multilayer Perceptron (AUROC 0.799, F1 0.250). Both classifiers reduced irrelevant candidates per true positive from 12.3 (retrieval-only) to approximately 3.2 and 4.4, respectively. Zero-shot GPT-4o achieved an F1 of 0.105, with 48.6% of its codes absent from VSAC.
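The two set-level figures quoted above can be made concrete with a small sketch. The summary does not spell out the exact metric definitions, so the following is one plausible reading and an assumption: F1 is computed per value set over predicted versus gold codes, and "irrelevant candidates per true positive" is taken as false positives divided by true positives. The function names and codes are hypothetical.

```python
from typing import Set


def set_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between a predicted code set and the gold value set (assumed definition)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def irrelevant_per_true_positive(predicted: Set[str], gold: Set[str]) -> float:
    """False positives per true positive (assumed reading of the noise metric)."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    return fp / tp if tp else float("inf")


# Hypothetical codes, for illustration only.
predicted = {"A", "B", "C", "D"}
gold = {"A", "B", "E"}
f1 = set_f1(predicted, gold)
noise = irrelevant_per_true_positive(predicted, gold)
```

Under this reading, a value-set-level score would be the mean of `set_f1` over all evaluated value sets.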
Key takeaway
For clinical informaticists and AI scientists building automated phenotyping or quality measurement systems, RASC is a corpus-grounded alternative to direct LLM prompting for value set authoring. Consider its retrieve-and-select paradigm when working with large, version-controlled clinical vocabularies such as those used in VSAC value sets. In the reported evaluation, the fine-tuned cross-encoder reached a value-set-level F1 of 0.298, against 0.250 for the MLP and 0.105 for zero-shot GPT-4o, and it cut irrelevant candidates per true positive from 12.3 (retrieval-only) to approximately 3.2.
Key insights
Retrieval-Augmented Set Completion (RASC) combines retrieval with per-code classification, cutting irrelevant candidates per true positive from 12.3 (retrieval-only) to approximately 3.2 with the cross-encoder.
Principles
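- Shrink the output space from the full vocabulary to a retrieved candidate pool before classifying.
- Zero-shot LLMs struggle with large, version-controlled clinical vocabularies: 48.6% of the codes GPT-4o produced were absent from VSAC.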
Method
RASC retrieves K similar value sets, forms a candidate pool, then classifies each candidate code to complete the set.
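The retrieve-then-classify flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: retrieval uses a toy bag-of-words cosine similarity over value set titles in place of whatever retriever the paper uses, the classifier is passed in as a generic scoring function, and the corpus, codes, names, and threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical corpus shape: value set title -> set of member codes.
ValueSetCorpus = Dict[str, Set[str]]


def _bow(text: str) -> Counter:
    """Toy bag-of-words vector (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: ValueSetCorpus, k: int) -> List[str]:
    """Return the titles of the k value sets most similar to the query."""
    q = _bow(query)
    ranked = sorted(corpus, key=lambda title: _cosine(q, _bow(title)), reverse=True)
    return ranked[:k]


def candidate_pool(titles: List[str], corpus: ValueSetCorpus) -> Set[str]:
    """Union of all codes in the retrieved value sets."""
    pool: Set[str] = set()
    for title in titles:
        pool |= corpus[title]
    return pool


def rasc_complete(
    query: str,
    corpus: ValueSetCorpus,
    k: int,
    score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Set[str]:
    """Keep only the pooled candidates that the classifier scores above threshold."""
    pool = candidate_pool(retrieve(query, corpus, k), corpus)
    return {code for code in pool if score(query, code) >= threshold}


# Made-up corpus and a stub classifier, for illustration only.
corpus = {
    "type 2 diabetes mellitus": {"D1", "D2"},
    "diabetes mellitus type 1": {"D2", "D3"},
    "essential hypertension": {"H1"},
}
stub_score = lambda query, code: 1.0 if code in {"D1", "D2"} else 0.0
top_titles = retrieve("type 2 diabetes", corpus, k=2)
completed = rasc_complete("type 2 diabetes", corpus, k=2, score=stub_score)
```

The key design point is that the classifier only ever sees codes in the retrieved pool, so it cannot emit a code that does not exist in the corpus; in a real system `score` would be the fine-tuned cross-encoder or the MLP.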
In practice
- Use RASC for clinical value set generation.
- Fine-tune a cross-encoder initialized from SAPBert to classify candidate codes.
Topics
- Clinical Value Set Authoring
- Retrieval-Augmented Set Completion
- SAPBert
- VSAC Value Sets
- Large Language Models
Best for: NLP Engineer, Machine Learning Engineer, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.