LLM-Based Data Generation and Clinical Skills Evaluation for Low-Resource French OSCEs

2026-04-09 · Source: Computation and Language · Field: Health & Wellbeing — Medical Devices & Health Technology, Medical Education · Depth: Advanced, quick

Summary

A new pipeline uses Large Language Models (LLMs) to generate and evaluate French Objective Structured Clinical Examinations (OSCEs), addressing the scarcity of annotated transcripts and the logistical constraints of medical training. The controlled pipeline produces synthetic doctor-patient interview transcripts guided by scenario-specific evaluation criteria, and it simulates varied student skill levels by generating both ideal and perturbed performances. The generated dialogues are then automatically silver-labeled by an LLM-assisted framework whose evaluation strictness can be adjusted. In benchmarks on this synthetic data, mid-size open-source models (up to 32B parameters) reach approximately 90% accuracy, comparable to GPT-4o. This result suggests that locally deployable, privacy-preserving evaluation systems for medical education are feasible in low-resource contexts.

Key takeaway

For medical educators and NLP engineers building clinical training tools, this research indicates that LLMs can simulate and evaluate OSCEs even in low-resource settings. Consider mid-size open-source LLMs for generating synthetic training data and automating skill assessment: on the synthetic benchmark they reach accuracy comparable to GPT-4o, and they can be deployed locally, which keeps student and patient data private. This approach could expand medical students' access to practice and feedback.

Key insights

LLMs can generate and evaluate synthetic medical interview data, enabling scalable clinical skills assessment.

Principles

Synthetic data can overcome resource scarcity.
LLMs can silver-label generated dialogues.
Mid-size LLMs rival larger models for specific tasks.

Method

A controlled pipeline generates synthetic doctor-patient dialogues with varied skill levels, then an LLM-assisted framework silver-labels them based on scenario-specific criteria and adjustable strictness.
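
The method above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Scenario, plan_performance, build_generation_prompt, and silver_label are hypothetical, the perturbation scheme (randomly dropping criteria according to a skill value) is assumed, and strictness is modeled as a simple threshold on per-criterion judge scores. No LLM is called; the judge scores are hand-written placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    criteria: list[str]  # scenario-specific evaluation criteria


def plan_performance(scenario: Scenario, skill: float, rng: random.Random) -> dict[str, bool]:
    """Decide which criteria a simulated student will satisfy.

    skill=1.0 yields the ideal performance; lower values drop criteria
    at random (an assumed perturbation scheme, not taken from the paper).
    """
    n_keep = round(skill * len(scenario.criteria))
    kept = set(rng.sample(scenario.criteria, n_keep))
    return {c: c in kept for c in scenario.criteria}


def build_generation_prompt(scenario: Scenario, plan: dict[str, bool]) -> str:
    """Assemble a prompt asking an LLM for a French transcript that follows the plan."""
    lines = [
        f"Scenario: {scenario.name}",
        "Write a French doctor-patient interview transcript.",
        "The student must:",
    ]
    for criterion, satisfied in plan.items():
        verb = "DO" if satisfied else "OMIT"
        lines.append(f"- {verb}: {criterion}")
    return "\n".join(lines)


def silver_label(judge_scores: dict[str, float], strictness: float) -> dict[str, bool]:
    """Turn per-criterion judge scores in [0, 1] into binary silver labels.

    A higher strictness demands stronger evidence before a criterion counts as met.
    """
    return {c: score >= strictness for c, score in judge_scores.items()}


scenario = Scenario(
    name="chest pain consultation",
    criteria=[
        "introduces self",
        "asks about pain onset",
        "asks about allergies",
        "summarizes findings",
    ],
)
rng = random.Random(0)
ideal_plan = plan_performance(scenario, skill=1.0, rng=rng)
perturbed_plan = plan_performance(scenario, skill=0.5, rng=rng)
prompt = build_generation_prompt(scenario, perturbed_plan)

# Placeholder judge output for one transcript; a real system would get this from an LLM.
judge_scores = {
    "introduces self": 0.9,
    "asks about pain onset": 0.6,
    "asks about allergies": 0.2,
    "summarizes findings": 0.75,
}
lenient = silver_label(judge_scores, strictness=0.5)
strict = silver_label(judge_scores, strictness=0.8)
```

Keeping the performance plan separate from the prompt means the plan can double as a record of what the transcript was intended to contain, which is useful when checking silver labels. Raising strictness from 0.5 to 0.8 in this toy case reduces the number of criteria counted as met, which is the kind of adjustment the summary describes.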

In practice

Generate synthetic data for low-resource scenarios.
Use LLMs for automated skill evaluation.
Deploy mid-size LLMs for privacy-preserving systems.
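
To compare a candidate evaluator model against silver labels, one option is to measure agreement over every (transcript, criterion) decision. The sketch below assumes that definition of accuracy, which the summary does not specify; criterion_accuracy and the toy labels are hypothetical and do not reproduce the paper's data or results.

```python
def criterion_accuracy(
    silver: dict[str, dict[str, bool]],
    predicted: dict[str, dict[str, bool]],
) -> float:
    """Fraction of (transcript, criterion) decisions where the model matches the silver label."""
    total = 0
    correct = 0
    for transcript_id, labels in silver.items():
        for criterion, gold in labels.items():
            total += 1
            if predicted[transcript_id][criterion] == gold:
                correct += 1
    return correct / total if total else 0.0


# Toy silver labels and model predictions for two transcripts (illustrative only).
silver = {
    "t1": {"introduces self": True, "asks about allergies": False},
    "t2": {"introduces self": True, "asks about allergies": True},
}
predicted = {
    "t1": {"introduces self": True, "asks about allergies": False},
    "t2": {"introduces self": False, "asks about allergies": True},
}
accuracy = criterion_accuracy(silver, predicted)  # 3 of 4 decisions agree
```

Running the same function over the outputs of several models, including a locally hosted one, gives a like-for-like comparison on the same silver-labeled set.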

Topics

French OSCEs
Large Language Models
Clinical Skills Evaluation
Synthetic Data Generation
Medical Education

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.