EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Biomedical Natural Language Processing · Depth: Advanced, extended

Summary

This study investigates how to improve automated detection of mentions of the EQ-5D (EuroQol 5-Dimensions) health-related quality-of-life instrument in scientific abstracts, a task that matters for systematic literature reviews. The researchers fine-tuned one general-purpose pre-trained language model (PLM), BERT, and two domain-specific PLMs, SciBERT and BioBERT, enriching the input with biomedical entity information extracted by scispaCy models. Combining three scispaCy models with three PLMs gave nine experimental setups, each evaluated at both the sentence level and the study level. They also explored a Multiple Instance Learning (MIL) approach with attention pooling to aggregate sentence-level information into study-level predictions. The reported results show consistent F1-score improvements, reaching 0.82, and near-perfect recall at the study level, significantly outperforming both classical bag-of-words baselines and recently reported PLM baselines. The authors take this as evidence that entity enrichment substantially improves domain adaptation and model generalization for automated screening.

Key takeaway

For AI scientists developing automated screening tools for systematic literature reviews, the study suggests that combining biomedical entity enrichment with domain-specific PLMs such as BioBERT is worth adopting. In the reported experiments this combination gave higher F1-scores and near-perfect recall at the study level, which could reduce manual screening effort and improve consistency. Consider a Multiple Instance Learning approach to aggregate sentence-level signals into study-level predictions when high sensitivity in identifying relevant studies matters.

Key insights

Entity enrichment significantly enhances PLM performance for domain-specific text classification in biomedical literature.
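
The summary does not say how entity information is injected into the PLM input, so the following is only a minimal sketch of one common option: wrapping each detected entity span in marker tokens before tokenization. The function name enrich_sentence, the [ENT:...] marker format, and the ENTITY label are all hypothetical; in practice the character spans might come from a scispaCy pipeline's doc.ents (start_char, end_char, label_), but here they are supplied by hand so the example runs without any model download.

```python
from typing import List, Tuple

# (start_char, end_char, label) for one detected entity; offsets index into the sentence.
EntitySpan = Tuple[int, int, str]


def enrich_sentence(sentence: str, entities: List[EntitySpan]) -> str:
    """Wrap each entity span in hypothetical [ENT:label] ... [/ENT] markers."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            # Skip spans that overlap an earlier one so markers never nest.
            continue
        pieces.append(sentence[cursor:start])
        pieces.append(f"[ENT:{label}] {sentence[start:end]} [/ENT]")
        cursor = end
    pieces.append(sentence[cursor:])
    return "".join(pieces)


# Hand-written span standing in for an entity recognizer's output (assumption).
sentence = "Utilities were measured with the EQ-5D questionnaire."
mention = "EQ-5D questionnaire"
start = sentence.index(mention)
spans = [(start, start + len(mention), "ENTITY")]

enriched = enrich_sentence(sentence, spans)
print(enriched)
```

The enriched string would then be tokenized and passed to the PLM like any other sentence. Whether the markers are added as special tokens or left as plain text is a design choice the summary does not settle.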

Method

Fine-tuning PLMs with biomedical entity-enriched sentences, aggregated via Multiple Instance Learning with attention pooling, for study-level classification.
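
To make the aggregation step concrete, here is a minimal pure-Python sketch of attention pooling over a bag of sentence embeddings, following a widely used formulation of attention-based MIL (attention weight a_k = softmax_k(w · tanh(V h_k)), study embedding z = Σ a_k h_k). It is an illustration under assumptions, not the authors' architecture: the dimensions, the random parameters V, w, and clf_w, and the function names are hypothetical, and in a real system the sentence embeddings would come from the fine-tuned PLM and the parameters would be learned.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(bag: List[Vector], V: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Return (study_embedding, attention_weights) for one bag of sentence embeddings."""
    # One scalar score per sentence: w . tanh(V h_k)
    scores = [
        sum(wi * math.tanh(h) for wi, h in zip(w, matvec(V, sent)))
        for sent in bag
    ]
    weights = softmax(scores)
    dim = len(bag[0])
    # Weighted sum of sentence embeddings gives the study-level embedding.
    pooled = [sum(a * sent[d] for a, sent in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


def study_probability(pooled: Vector, clf_w: Vector, clf_b: float) -> float:
    """Logistic classifier on the pooled study embedding."""
    logit = sum(c * p for c, p in zip(clf_w, pooled)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit))


# Toy data: 5 sentences with 4-dimensional embeddings (stand-ins for PLM outputs).
random.seed(0)
dim, hidden = 4, 3
bag = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
V = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
w = [random.gauss(0, 0.5) for _ in range(hidden)]
clf_w = [random.gauss(0, 0.5) for _ in range(dim)]

pooled, weights = attention_pool(bag, V, w)
prob = study_probability(pooled, clf_w, 0.0)
print("attention weights:", [round(a, 3) for a in weights])
print("study-level probability:", round(prob, 3))
```

A side benefit of this kind of pooling is that the attention weights indicate which sentences contributed most to the study-level decision, which can help a reviewer locate the likely EQ-5D mention.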

Best for: AI Scientist, AI Researcher, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.