A Dataset of Brazilian Portuguese Clinical Notes for Anaphylaxis Detection
Summary
A new dataset of Brazilian Portuguese clinical notes has been developed to support automatic detection of anaphylaxis with Natural Language Processing (NLP). The corpus comprises 969 clinical narratives, each annotated by allergists for the presence or absence of anaphylaxis according to established clinical diagnostic criteria. The notes come from three sources: clinician-authored synthetic scenarios, medical literature case reports rewritten by specialists, and de-identified notes from the SemClinBr corpus. Approximately 5% of the notes are positive cases, which reflects a realistic prevalence, and the dataset is intended to support large-scale analysis of health records and retrospective clinical research. It is designed as a reusable resource for Portuguese clinical NLP, enabling future work in document classification, information extraction, and language modeling within the medical domain.
Key takeaway
For NLP engineers building clinical applications in Portuguese, this corpus is an allergist-annotated resource for anaphylaxis detection. Use it to train and evaluate models for document classification and information extraction, so that your systems are tested against realistic clinical prevalence and established diagnostic criteria. Evaluating under these conditions may improve the accuracy and usefulness of your medical NLP solutions.
Key insights
A new Brazilian Portuguese clinical note dataset enables NLP-driven anaphylaxis detection for research and health record analysis.
Principles
- High-quality labeled corpora are essential for clinical NLP.
- Realistic prevalence (approximately 5% positive cases here) makes a dataset more representative of real health records.
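To illustrate why prevalence matters for evaluation, here is a minimal sketch, assuming that roughly 5% of 969 notes works out to about 48 positive cases (the exact count is not stated in this summary). It shows that a classifier which never flags anaphylaxis still reaches about 95% accuracy, so recall, precision, and F1 on the positive class are more informative. The helper `binary_metrics` and the class `BinaryMetrics` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Assumption: ~5% of 969 notes, rounded; the real positive count may differ.
N_NOTES = 969
N_POSITIVE = round(N_NOTES * 0.05)

y_true = [1] * N_POSITIVE + [0] * (N_NOTES - N_POSITIVE)
always_negative = [0] * N_NOTES  # trivial baseline that never flags anaphylaxis

baseline = binary_metrics(y_true, always_negative)
print(f"accuracy={baseline.accuracy:.3f} recall={baseline.recall:.3f} f1={baseline.f1:.3f}")
# prints: accuracy=0.950 recall=0.000 f1=0.000
```

Under these assumptions, any reported accuracy should be compared against the 0.950 baseline rather than against 0.5.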
Method
Clinical notes from clinician-authored synthetic scenarios, case reports rewritten by specialists, and de-identified SemClinBr data were annotated by allergists for the presence or absence of anaphylaxis using established diagnostic criteria.
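One way to picture the resulting records is sketched below. This is an illustrative layout only: the summary does not describe the dataset's actual file format, so the field names (`note_id`, `text`, `source`, `anaphylaxis`), the `Source` values, and the `summarize` helper are all assumptions, and the note texts are placeholders rather than real data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # The three origins named in the summary; the string values are hypothetical.
    SYNTHETIC = "synthetic_scenario"
    CASE_REPORT = "rewritten_case_report"
    SEMCLINBR = "semclinbr_deidentified"


@dataclass(frozen=True)
class ClinicalNote:
    note_id: str
    text: str
    source: Source
    anaphylaxis: bool  # allergist label: presence (True) or absence (False)


def summarize(notes):
    """Return {source: (total notes, positive notes)} for the sources present."""
    totals = Counter(n.source for n in notes)
    positives = Counter(n.source for n in notes if n.anaphylaxis)
    return {src: (totals[src], positives[src]) for src in totals}


# Placeholder records, invented for illustration; not taken from the corpus.
notes = [
    ClinicalNote("n1", "<texto da nota>", Source.SYNTHETIC, True),
    ClinicalNote("n2", "<texto da nota>", Source.SEMCLINBR, False),
    ClinicalNote("n3", "<texto da nota>", Source.SEMCLINBR, False),
]
summary = summarize(notes)
for src, (total, positive) in summary.items():
    print(src.value, total, positive)
```

Keeping the source alongside the label makes it possible to check whether a model behaves differently on synthetic, rewritten, and de-identified notes.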
In practice
- Use for document classification of clinical notes.
- Apply to information extraction tasks in medical text.
- Support language modeling in the clinical domain.
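For the document classification use case, a held-out test set should keep roughly the same share of positive notes as the full corpus. The following is a minimal sketch of a stratified split in plain Python, assuming 48 positive and 921 negative labels (an approximation of 5% of 969, not a figure from the source). The function `stratified_split` and its parameters are hypothetical names for this example.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, sampling each class separately
    so both parts keep a similar positive rate."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Put at least one example of each class in the test set.
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Assumed label distribution: 48 positives, 921 negatives (about 5% of 969).
labels = [1] * 48 + [0] * 921
train_idx, test_idx = stratified_split(labels)

test_positives = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_positives)
# prints: 775 194 10
```

With scikit-learn available, `train_test_split(..., stratify=labels)` serves the same purpose; the version above only avoids the dependency.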
Topics
- Anaphylaxis Detection
- Clinical NLP
- Brazilian Portuguese
- Medical Text Annotation
- Clinical Notes Dataset
Best for: NLP Engineer, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.