Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives
Summary
A recent study developed and evaluated a zero-shot, agentic workflow that uses five open-source generative Large Language Models (LLMs) to extract lung pathology information from clinical narratives. The task was to populate 13 College of American Pathologists (CAP) synoptic fields from lung resection pathology reports, work that traditionally requires labor-intensive manual extraction or expensive supervised NLP pipelines. The supervised GatorTron NER-RE baseline achieved a Micro-F1 of 0.960; the best zero-shot model, GPT-OSS-20B, reached a Micro-F1 of 0.893 with a recall of 0.949. That model accurately extracted complex relations, such as Pathologic Stage, without task-specific training. The findings suggest that open-source, zero-shot agentic LLMs offer a low-cost option for this extraction task, at some cost in accuracy relative to the supervised baseline. Results were assessed with a novel, registry-aligned evaluation framework.
Key takeaway
NLP engineers and AI scientists working on clinical data extraction should consider zero-shot agentic LLMs for their workflows. Models such as GPT-OSS-20B offer a low-cost alternative to supervised methods for tasks like populating pathology report fields: the best model reached a Micro-F1 of 0.893 without task-specific annotation, against 0.960 for the supervised GatorTron NER-RE baseline. Skipping annotation and training can shorten development time and reduce cost, provided the accuracy gap is acceptable for the intended use.
Key insights
Zero-shot agentic LLMs can accurately extract complex lung pathology data from clinical narratives without task-specific training.
Principles
- Zero-shot LLMs reduce annotation costs.
- Agentic workflows improve extraction accuracy.
- Open-source LLMs offer viable alternatives.
Method
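The workflow combines prompting, planning, and extraction steps, run by agentic LLMs in a zero-shot setting, to populate 13 College of American Pathologists synoptic fields from lung resection pathology reports. Results are compared against a supervised GatorTron NER-RE baseline.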
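The prompt, plan, extract sequence can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: the prompt wording, the helper names (`plan_prompt`, `extract_prompt`, `parse_json_object`, `run_workflow`), and the `llm` callable are all hypothetical, and of the field names only "Pathologic Stage" appears in the summary. The sketch assumes the model is asked to answer in JSON.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical field subset: only "Pathologic Stage" is named in the summary.
FIELDS: List[str] = ["Pathologic Stage", "Histologic Type", "Tumor Size"]


def plan_prompt(report: str, fields: List[str]) -> str:
    """Build a prompt asking which fields the report appears to document."""
    return (
        "PLAN\n"
        "Return a JSON object mapping each field name to true or false, "
        "depending on whether the report documents it.\n"
        f"Fields: {json.dumps(fields)}\n"
        f"Report:\n{report}"
    )


def extract_prompt(report: str, field: str) -> str:
    """Build a prompt asking for the value of a single field."""
    return (
        "EXTRACT\n"
        f"Field: {field}\n"
        'Return a JSON object of the form {"value": <string or null>}.\n'
        f"Report:\n{report}"
    )


def parse_json_object(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' as a JSON object.

    Returns {} when no such span exists or it is not a valid JSON object,
    so a chatty or malformed model reply degrades to "no answer".
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        obj = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return {}
    return obj if isinstance(obj, dict) else {}


def run_workflow(
    report: str,
    llm: Callable[[str], str],  # assumption: any function mapping prompt -> reply text
    fields: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    fields = FIELDS if fields is None else fields
    # Plan: decide which fields to attempt; unknown fields default to "attempt".
    plan = parse_json_object(llm(plan_prompt(report, fields)))
    planned = [f for f in fields if plan.get(f, True)]
    # Extract: one zero-shot call per planned field; unplanned fields stay None.
    result: Dict[str, Optional[str]] = {f: None for f in fields}
    for field in planned:
        value = parse_json_object(llm(extract_prompt(report, field))).get("value")
        result[field] = str(value) if value not in (None, "") else None
    return result
```

Passing the model in as a plain callable keeps the sketch independent of any particular serving stack, so the same code can wrap whichever local or hosted open-source model is available.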
In practice
- Implement zero-shot LLMs for clinical data extraction.
- Consider GPT-OSS-20B, the best-performing zero-shot model in the study, for lung pathology reports.
- Develop registry-aligned evaluation frameworks.
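For readers building their own evaluation, the Micro-F1 figures quoted in the summary pool true positives, false positives, and false negatives across all fields before computing precision and recall. The sketch below shows one simple way to do that. The matching rule (case- and whitespace-insensitive exact match) and all function names are assumptions for illustration; the summary does not describe how the study's registry-aligned framework matches values.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]  # field name -> extracted value (None if absent)


def normalize(value: Optional[str]) -> Optional[str]:
    """Lowercase and collapse whitespace; treat empty strings as missing."""
    if value is None:
        return None
    cleaned = " ".join(value.lower().split())
    return cleaned or None


def micro_counts(gold: List[Record], pred: List[Record]) -> Tuple[int, int, int]:
    """Pool TP/FP/FN over every field of every report (assumed exact match)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for field in set(g) | set(p):
            gv, pv = normalize(g.get(field)), normalize(p.get(field))
            if pv is not None and pv == gv:
                tp += 1
            else:
                if pv is not None:
                    fp += 1  # spurious or wrong prediction
                if gv is not None:
                    fn += 1  # gold value missed or predicted wrongly
    return tp, fp, fn


def micro_prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)
```

Applying `implied_precision` to the reported figures for GPT-OSS-20B (Micro-F1 0.893, recall 0.949) gives a precision of roughly 0.84. This is derived from rounded values, so treat it as approximate; it suggests that the gap to the supervised baseline comes more from spurious or incorrect values than from missed ones.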
Topics
- Zero-Shot LLMs
- Clinical NLP
- Information Extraction
- Lung Pathology
- Agentic Workflows
- Generative LLMs
Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.