LLM-Based Data Generation and Clinical Skills Evaluation for Low-Resource French OSCEs
Summary
The paper presents a pipeline that uses Large Language Models (LLMs) to generate and evaluate French Objective Structured Clinical Examinations (OSCEs), addressing two obstacles in medical training: the scarcity of annotated transcripts and the logistical cost of running exams. The controlled pipeline produces synthetic doctor-patient interview transcripts guided by scenario-specific evaluation criteria, and simulates varied student skill levels through ideal and perturbed performances. The generated dialogues are then automatically silver-labeled by an LLM-assisted framework whose evaluation strictness can be adjusted. In benchmarks on this synthetic data, mid-size open-source models ($\le$32B parameters) reach approximately 90% accuracy, comparable to GPT-4o. This suggests that locally deployable, privacy-preserving evaluation systems for medical education are feasible in low-resource contexts.
Key takeaway
For medical educators and NLP engineers developing clinical training tools, this research indicates that LLMs can simulate and evaluate OSCEs even in low-resource settings. Consider using mid-size open-source LLMs to generate synthetic training data and automate skill assessment: on the paper's synthetic benchmark they reach accuracy comparable to GPT-4o, and they can be deployed locally, which keeps the data private. This approach could expand access to practice and feedback for medical students.
Key insights
LLMs can generate and evaluate synthetic medical interview data, enabling scalable clinical skills assessment.
Principles
- Synthetic data can overcome resource scarcity.
- LLMs can silver-label generated dialogues.
- Mid-size open-source LLMs ($\le$32B parameters) can match GPT-4o on this evaluation task.
Method
A controlled pipeline generates synthetic doctor-patient dialogues with varied skill levels, then an LLM-assisted framework silver-labels them based on scenario-specific criteria and adjustable strictness.
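The method above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Criterion` class, the function names, the `skill` and `strictness` parameters, and the English sample lines are all hypothetical, and both the generator and the judge are simple stand-ins where a real system would prompt an LLM (and produce French dialogue).

```python
import random
from dataclasses import dataclass


@dataclass
class Criterion:
    """One scenario-specific evaluation item (hypothetical structure)."""
    name: str
    doctor_line: str  # what a student who meets the criterion would say


def generate_transcript(criteria, skill, rng):
    """Stand-in for the LLM generator: keep each criterion with probability `skill`.

    skill=1.0 yields the ideal performance; lower values yield perturbed ones.
    """
    covered = [c for c in criteria if rng.random() < skill]
    transcript = "\n".join(f"Doctor: {c.doctor_line}" for c in covered)
    return transcript, {c.name for c in covered}


def judge_confidence(transcript, criterion):
    """Stand-in for an LLM judge: returns a confidence in [0, 1].

    A real system would prompt a model here; this stub uses string matching.
    """
    return 1.0 if criterion.doctor_line in transcript else 0.0


def silver_label(transcript, criteria, strictness=0.5):
    """Mark a criterion as met when the judge confidence reaches `strictness`."""
    return {c.name: judge_confidence(transcript, c) >= strictness for c in criteria}


criteria = [
    Criterion("greets_patient", "Hello, I am the medical student on duty."),
    Criterion("asks_allergies", "Do you have any known allergies?"),
    Criterion("asks_pain_onset", "When did the pain start?"),
]
rng = random.Random(0)

# Ideal performance: every criterion is covered, so every silver label is True.
ideal, _ = generate_transcript(criteria, skill=1.0, rng=rng)
ideal_labels = silver_label(ideal, criteria)

# Fully perturbed performance: nothing is covered, so every label is False.
empty, _ = generate_transcript(criteria, skill=0.0, rng=rng)
empty_labels = silver_label(empty, criteria)
```

Intermediate `skill` values drop a random subset of criteria, which is one simple way to imitate students of varying ability. Raising `strictness` makes the labeler demand more confidence from the judge before a criterion counts as met.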
In practice
- Generate synthetic data for low-resource scenarios.
- Use LLMs for automated skill evaluation.
- Deploy mid-size LLMs for privacy-preserving systems.
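Before choosing a model for automated evaluation, its per-criterion decisions can be compared against the silver labels. The sketch below shows one way to compute that accuracy; the `accuracy` function, the key layout, and the toy labels are illustrative assumptions and do not reproduce any figures from the paper.

```python
def accuracy(predictions, silver):
    """Fraction of (transcript, criterion) decisions that match the silver labels."""
    if not silver:
        raise ValueError("no silver labels to compare against")
    correct = sum(predictions.get(key) == label for key, label in silver.items())
    return correct / len(silver)


# Toy silver labels keyed by (transcript id, criterion name); values are invented.
silver = {
    ("t1", "greets_patient"): True,
    ("t1", "asks_allergies"): False,
    ("t2", "greets_patient"): True,
    ("t2", "asks_allergies"): True,
}

# Hypothetical model output that disagrees with the silver label on one decision.
model_predictions = dict(silver)
model_predictions[("t2", "asks_allergies")] = False

score = accuracy(model_predictions, silver)  # 3 of 4 decisions match: 0.75
```

Running the same comparison for each candidate model on identical transcripts gives a like-for-like basis for deciding whether a locally hosted model is accurate enough.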
Topics
- French OSCEs
- Large Language Models
- Clinical Skills Evaluation
- Synthetic Data Generation
- Medical Education
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.