RASC+: Retrieval-Constrained LLM Adjudication for Clinical Value Set Authoring
Summary
RASC+ addresses the challenge of authoring clinical value sets, which define the standardized terminology codes used for quality measurement and clinical decision support. Zero-shot large language model (LLM) generation performs poorly on this task because clinical code systems are vast and version-controlled. RASC+ takes a stage-wise approach instead: it optimizes candidate-pool construction for recall, then uses a constrained LLM to adjudicate which candidates belong in the value set. On the full 3,744-value-set RASC test split, Qwen3-based retrieval, enhanced with vocabulary-aware expansion and code-display rescue, increased candidate-pool recall from 0.553 to 0.730. Replacing the original SAPBert cross-encoder with blinded GPT-5 adjudication over this expanded pool raised full-test macro F1 from 0.287 to 0.549, while the safety constraint that every selected code comes from the candidate pool stayed in place.
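To make the two reported metrics concrete, here is a minimal Python sketch of how candidate-pool recall and macro F1 could be computed over a set of value sets. The function names, the toy codes, and the choice to average recall per value set are assumptions for illustration; the summary does not specify the paper's exact evaluation code.

```python
from typing import Iterable, Set


def pool_recall(gold: Set[str], pool: Set[str]) -> float:
    """Fraction of gold codes that appear in the candidate pool."""
    if not gold:
        return 1.0  # assumption: an empty gold set counts as fully covered
    return len(gold & pool) / len(gold)


def set_f1(gold: Set[str], predicted: Set[str]) -> float:
    """F1 between the gold code set and the adjudicated code set."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_mean(scores: Iterable[float]) -> float:
    """Unweighted mean, so every value set counts equally."""
    scores = list(scores)
    return sum(scores) / len(scores) if scores else 0.0


# Toy value sets with made-up codes (not real terminology codes).
examples = [
    {"gold": {"A", "B", "C", "D"}, "pool": {"A", "B", "C", "X"}, "pred": {"A", "B", "X"}},
    {"gold": {"E", "F"}, "pool": {"E", "F", "Y"}, "pred": {"E", "F"}},
]

mean_pool_recall = macro_mean(pool_recall(e["gold"], e["pool"]) for e in examples)
macro_f1 = macro_mean(set_f1(e["gold"], e["pred"]) for e in examples)

print(round(mean_pool_recall, 3))  # 0.875
print(round(macro_f1, 3))          # 0.786
```

Because the adjudicator can only select codes that are in the pool, pool recall is an upper bound on the recall of the final value set, which is why the retrieval stage is tuned for recall first.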
Key takeaway
For AI scientists building clinical value sets or doing similar knowledge-base construction, a retrieval-constrained LLM adjudication framework is worth considering. In RASC+, this approach nearly doubled full-test macro F1 (0.287 to 0.549) and ensures that every generated code originates from an auditable candidate pool, which matters for safety and compliance. Prioritize high-recall retrieval and pair it with a strong, constrained LLM adjudicator such as GPT-5.
Key insights
Retrieval-constrained LLM adjudication improves clinical value set authoring by pairing high-recall candidate generation with precise selection from that candidate pool.
Principles
- Direct LLM memorization is insufficient for large, version-controlled clinical code systems.
- Stage-wise approaches can enhance LLM performance on specialized, constrained tasks.
- Safety constraints necessitate auditable candidate pools for clinical code generation.
Method
RASC+ employs Qwen3-based retrieval with vocabulary-aware expansion and code-display rescue to build a high-recall candidate pool, followed by blinded GPT-5 adjudication for constrained candidate selection.
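The two-stage shape of this method can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the retrievers and the judge are hypothetical stub callables standing in for Qwen3-based retrieval and GPT-5 adjudication, the codes and display names are toy data, and the sketch does not model blinding.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical interfaces: a retriever maps a value set description to codes,
# a judge maps (description, {code: display}) to the codes it wants to keep.
Retriever = Callable[[str], Iterable[str]]
Judge = Callable[[str, Dict[str, str]], Iterable[str]]


def build_candidate_pool(query: str, retrievers: List[Retriever]) -> Set[str]:
    """Stage 1: union the outputs of all retrieval stages to maximize recall."""
    pool: Set[str] = set()
    for retrieve in retrievers:
        pool.update(retrieve(query))
    return pool


def adjudicate(query: str, pool: Set[str], displays: Dict[str, str], judge: Judge) -> Set[str]:
    """Stage 2: let the judge select codes, then enforce the pool constraint."""
    shown = {code: displays.get(code, "") for code in sorted(pool)}
    proposed = set(judge(query, shown))
    # Anything the judge returns that is not in the pool is discarded.
    return proposed & pool


# Toy stand-ins (assumptions for illustration only).
displays = {
    "E11": "Type 2 diabetes mellitus",
    "E10": "Type 1 diabetes mellitus",
    "I10": "Essential hypertension",
}


def dense_stub(query: str) -> List[str]:
    return ["E11", "I10"]


def expansion_stub(query: str) -> List[str]:
    return ["E10"]


def judge_stub(query: str, shown: Dict[str, str]) -> List[str]:
    keep = [code for code, display in shown.items() if "diabetes" in display.lower()]
    return keep + ["Z99"]  # simulate a code invented outside the pool


pool = build_candidate_pool("diabetes mellitus", [dense_stub, expansion_stub])
selected = adjudicate("diabetes mellitus", pool, displays, judge_stub)
print(sorted(selected))  # ['E10', 'E11']
```

The final intersection with the pool is what makes the adjudication constrained in this sketch: the judge can narrow the pool, but it cannot add to it.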
In practice
- Use Qwen3-based retrieval to build high-recall candidate pools for clinical code tasks.
- Use GPT-5 as a constrained adjudicator that selects codes only from the candidate pool.
- Ensure all generated codes are traceable to an auditable source pool.
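One way to make the traceability requirement checkable is to record which retrieval stage contributed each candidate and to audit the final selection against that record. The Python sketch below assumes hypothetical stage names and toy codes; it illustrates the idea only and is not part of the RASC+ system as described.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def pool_with_provenance(stage_outputs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each candidate code to the retrieval stages that produced it."""
    provenance: Dict[str, List[str]] = defaultdict(list)
    for stage, codes in stage_outputs.items():
        for code in codes:
            if stage not in provenance[code]:
                provenance[code].append(stage)
    return dict(provenance)


def audit(selected: Set[str], provenance: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Split a selection into traced codes (with sources) and untraceable ones."""
    traced = {code: provenance[code] for code in sorted(selected) if code in provenance}
    untraceable = {code for code in selected if code not in provenance}
    return traced, untraceable


# Hypothetical stage names and toy codes.
stage_outputs = {
    "dense_retrieval": ["E11", "I10"],
    "vocabulary_expansion": ["E10", "E11"],
}
provenance = pool_with_provenance(stage_outputs)

traced, untraceable = audit({"E11", "Z99"}, provenance)
print(traced)       # {'E11': ['dense_retrieval', 'vocabulary_expansion']}
print(untraceable)  # {'Z99'}
```

A non-empty `untraceable` set signals that a code reached the output without passing through the pool, which is the condition a reviewer would want to block or flag.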
Topics
- Clinical Value Sets
- Large Language Models
- Retrieval-Augmented Generation
- Clinical NLP
- GPT-5
- Qwen3
Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.