EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning
Summary
A study investigates how to improve automated detection of mentions of the EQ-5D (EuroQol 5-Dimensions) health-related quality-of-life instrument in scientific abstracts, a task that matters for systematic literature reviews. The researchers fine-tuned a general-purpose pre-trained language model (BERT) and two domain-specific ones (SciBERT, BioBERT), enriching the input with biomedical entity information extracted by scispaCy models. They ran nine experimental setups, pairing each of three scispaCy models with each of three PLMs, and evaluated performance at both the sentence level and the study level. They also explored a Multiple Instance Learning (MIL) approach with attention pooling to aggregate sentence-level information into study-level predictions. The findings show consistent F1-score improvements, reaching 0.82, and nearly perfect recall at the study level, significantly outperforming classical bag-of-words baselines and recently reported PLM baselines. The authors take this as evidence that entity enrichment substantially improves domain adaptation and model generalization for automated screening.
Key takeaway
For AI scientists building automated screening tools for systematic literature reviews, the study suggests combining biomedical entity enrichment with domain-specific PLMs such as BioBERT. In the reported experiments, entity-enriched models reached F1-scores of up to 0.82 and near-perfect recall at the study level, which can reduce manual screening effort and improve consistency. Consider a Multiple Instance Learning approach to aggregate sentence-level predictions into study-level predictions when high sensitivity in identifying relevant studies is the priority.
Key insights
Entity enrichment significantly enhances PLM performance for domain-specific text classification in biomedical literature.
Principles
- Domain-specific PLMs outperform general-purpose models in specialized tasks.
- Entity enrichment improves model generalization and domain adaptation.
Method
Fine-tuning PLMs with biomedical entity-enriched sentences, aggregated via Multiple Instance Learning with attention pooling, for study-level classification.
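The summary does not say how entity information is injected into the sentences, so the following is only a minimal sketch of one plausible scheme: wrapping each detected entity in marker tokens before the sentence is passed to the PLM tokenizer. The marker format `[ENT:label] ... [/ENT]` and the function name `enrich_sentence` are hypothetical. Entity spans are passed in as plain character offsets; in practice they could come from a scispaCy pipeline via `doc.ents` (`ent.start_char`, `ent.end_char`, `ent.label_`).

```python
from typing import List, Tuple

# (start_char, end_char, label), e.g. taken from spaCy's ent.start_char,
# ent.end_char and ent.label_ for each entity in doc.ents.
Span = Tuple[int, int, str]


def enrich_sentence(text: str, spans: List[Span]) -> str:
    """Wrap each entity span in hypothetical [ENT:label] ... [/ENT] markers.

    The marker format is an assumption for illustration, not the paper's
    documented scheme. Overlapping spans are skipped after the first one.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already emitted
        pieces.append(text[cursor:start])
        pieces.append(f"[ENT:{label}] {text[start:end]} [/ENT]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sentence = "EQ-5D scores improved after chemotherapy."
    start = sentence.index("chemotherapy")
    spans = [(0, 5, "ENTITY"), (start, start + len("chemotherapy"), "ENTITY")]
    print(enrich_sentence(sentence, spans))
```

If markers like these are used, they would typically be registered with the tokenizer as additional special tokens so that they are not split into word pieces; whether the original work did this is not stated in the summary.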
In practice
- Use scispaCy for biomedical entity extraction.
- Combine domain-specific PLMs like BioBERT with entity enrichment.
- Employ MIL with attention pooling for robust study-level predictions.
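To make the MIL step concrete, here is a minimal sketch of attention pooling over one abstract (the "bag") in pure Python. It assumes a simplified attention form in which each sentence embedding h_i is scored as w · tanh(h_i), and a plain logistic classifier on the pooled vector; the summary does not specify the exact formulation, and the names `attention_pool`, `study_probability`, `w`, `clf_w`, and `clf_b` are illustrative. In a real model the sentence vectors would come from the fine-tuned PLM and all parameters would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    sentence_vecs: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool sentence embeddings of one abstract into a single study vector.

    Assumed scoring: score_i = w . tanh(h_i); attention weights are the
    softmax of the scores; the pooled vector is the weighted sum of h_i.
    """
    scores = [
        sum(wj * math.tanh(hj) for wj, hj in zip(w, h)) for h in sentence_vecs
    ]
    alphas = softmax(scores)
    dim = len(sentence_vecs[0])
    pooled = [
        sum(a * h[j] for a, h in zip(alphas, sentence_vecs)) for j in range(dim)
    ]
    return pooled, alphas


def study_probability(
    pooled: Sequence[float], clf_w: Sequence[float], clf_b: float
) -> float:
    """Logistic classifier on the pooled vector (an illustrative choice)."""
    logit = sum(c * p for c, p in zip(clf_w, pooled)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Two toy 2-dimensional "sentence embeddings" for one abstract.
    bag = [[0.0, 0.0], [2.0, 2.0]]
    pooled, alphas = attention_pool(bag, w=[1.0, 1.0])
    print("attention weights:", alphas)
    print("study probability:", study_probability(pooled, [1.0, 1.0], 0.0))
```

The attention weights also indicate which sentences drove the study-level decision, which can help a reviewer check why an abstract was flagged.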
Topics
- Pre-trained Language Models
- Biomedical Entity Enrichment
- Multiple Instance Learning
- Systematic Literature Reviews
- EQ-5D Classification
Best for: AI Scientist, AI Researcher, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.