RecallRisk-BERT: A Multi-Task Framework for Post-Report Medical Device Recall Triage

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Medical Devices & Health Technology · Depth: Advanced, quick

Summary

RecallRisk-BERT is a multi-task framework for automating post-report medical device recall triage, severity assessment, and root-cause interpretation. Motivated by the growing volume of FDA recall records, it uses 54,165 FDA medical device recall records spanning 2002 to October 2025. The system combines PubMedBERT-based text representations of recall narratives with embedding-based structured features such as product code and medical specialty. RecallRisk-BERT jointly predicts recall severity (Class I/II/III) and one of nine consolidated root-cause categories. In single-task severity prediction, a LightGBM-based text-tabular configuration achieved an accuracy of 0.963, a macro-F1 of 0.856, and a ROC-AUC of 0.974. The multi-task RecallRisk-BERT significantly outperformed the single-task PubMedBERT baseline, and its model-derived risk rankings closely matched observed root-cause severity patterns (rho = 0.983, p = 1.936e-6).
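The gap between the reported accuracy (0.963) and macro-F1 (0.856) is easier to read with the metric definitions in hand. The sketch below computes both on a small, made-up set of severity labels; the label counts are purely illustrative and are not taken from the recall dataset. Macro-F1 averages per-class F1 with equal weight, so errors on rare classes pull it down more than they pull down accuracy. Class imbalance is one common cause of such a gap, although this summary does not state the class distribution.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Illustrative labels only: a dominant Class II and two rare classes.
y_true = ["II"] * 90 + ["I"] * 6 + ["III"] * 4
y_pred = ["II"] * 90 + ["I"] * 4 + ["II"] * 2 + ["III"] * 2 + ["II"] * 2

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

On this toy data only four of 100 predictions are wrong, yet all four fall on the two rare classes, so macro-F1 lands well below accuracy.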

Key takeaway

Machine learning engineers building regulatory compliance systems should consider multi-task text-tabular models for recall triage. As RecallRisk-BERT illustrates, combining recall narratives with structured fields lets a single model predict recall severity and root-cause category together, and the reported severity results show high accuracy and macro-F1. Similar BERT-based architectures are worth considering wherever joint classification of related attributes feeds decision support.

Key insights

Multi-task text-tabular models effectively triage medical device recalls by jointly predicting severity and root-cause.

Method

RecallRisk-BERT combines PubMedBERT for recall narratives with embeddings for structured features to simultaneously predict recall severity (3 classes) and root-cause (9 classes).
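The shape of this architecture can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a random vector stands in for the PubMedBERT narrative embedding, the embedding sizes, vocabularies (`CODE_A`, `cardiovascular`, and so on), and the `cause_weight` parameter are hypothetical, and the two task losses are combined as a simple weighted sum because the summary does not say how the losses are balanced.

```python
import math
import random

random.seed(0)

TEXT_DIM = 8       # stand-in for the text encoder output size (assumption)
EMB_DIM = 4        # hypothetical size of each structured-feature embedding
N_SEVERITY = 3     # Class I / II / III
N_ROOT_CAUSE = 9   # nine consolidated root-cause categories


def rand_vector(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


def linear(weights, x):
    """Matrix-vector product: one logit per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embedding tables for the structured fields.
product_code_table = {"CODE_A": rand_vector(EMB_DIM), "CODE_B": rand_vector(EMB_DIM)}
specialty_table = {"cardiovascular": rand_vector(EMB_DIM), "orthopedic": rand_vector(EMB_DIM)}

# Two task heads share one fused representation.
FUSED_DIM = TEXT_DIM + 2 * EMB_DIM
severity_head = rand_matrix(N_SEVERITY, FUSED_DIM)
root_cause_head = rand_matrix(N_ROOT_CAUSE, FUSED_DIM)


def forward(text_vec, product_code, specialty):
    """Concatenate text and structured embeddings, then apply both heads."""
    fused = text_vec + product_code_table[product_code] + specialty_table[specialty]
    return softmax(linear(severity_head, fused)), softmax(linear(root_cause_head, fused))


def joint_loss(sev_probs, cause_probs, sev_label, cause_label, cause_weight=1.0):
    """Weighted sum of the two cross-entropy terms (weighting is an assumption)."""
    return -math.log(sev_probs[sev_label]) - cause_weight * math.log(cause_probs[cause_label])


text_vec = [random.uniform(-1, 1) for _ in range(TEXT_DIM)]  # placeholder narrative embedding
sev_probs, cause_probs = forward(text_vec, "CODE_A", "cardiovascular")
loss = joint_loss(sev_probs, cause_probs, sev_label=0, cause_label=4)
print(len(sev_probs), len(cause_probs), round(loss, 3))
```

The point of the shared fused vector is that gradients from both tasks would update the same representation during training, which is the usual motivation for a multi-task setup over two separate single-task models.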

Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.