Noise reduction in BERT NER models for clinical entity extraction

2026-03-03 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

ConcertAI LLC researchers developed a Noise Removal (NR) model to improve the precision of BERT-based Named Entity Recognition (NER) models that extract clinical entities from unstructured clinical notes. Fine-tuned BERT models achieve high recall but weaker precision, because the SoftMax function tends to assign high confidence scores even to uncertain predictions, which makes simple confidence thresholding ineffective. The NR model is a supervised Decision Tree classifier that analyzes the token-level entity tags and probability sequences produced by the NER model. Its features include a Probability Density Map (PDM), which captures the "Semantic-Pull" effect in Transformer embeddings, and statistical features such as inter-class probability differentials and sequence entropy. Applied as a post-processing step, it reduced False Positives (FPs) by 50% to 90% across several clinical NER models on both EMR (electronic medical record) and MIMIC-III datasets, while True Positives (TPs) dropped by less than 6%.
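The statistical features named above can be illustrated with a minimal sketch. It assumes the NER model exposes one SoftMax probability vector per token of a predicted entity span. The helper names (`token_entropy`, `top2_margin`, `entity_features`) are hypothetical; the inter-class probability differential is approximated here as the gap between the top two class probabilities, and sequence entropy as the mean per-token Shannon entropy. The paper's exact definitions may differ.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one token's SoftMax distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top2_margin(probs: Sequence[float]) -> float:
    """Assumed inter-class differential: top-1 minus top-2 class probability."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entity_features(span_probs: List[Sequence[float]]) -> dict:
    """Summarize the per-token probability vectors of one predicted entity span.

    Illustrative feature set only (not the paper's exact definitions):
    mean max probability, mean top-1/top-2 margin, and mean token entropy
    (used here as a stand-in for "sequence entropy").
    """
    n = len(span_probs)
    return {
        "mean_max_prob": sum(max(p) for p in span_probs) / n,
        "mean_margin": sum(top2_margin(p) for p in span_probs) / n,
        "mean_entropy": sum(token_entropy(p) for p in span_probs) / n,
    }


# Made-up two-token span over three tags: one confident token, one doubtful token.
span = [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]]
print(entity_features(span))
```

The doubtful second token lowers the mean margin and raises the mean entropy even though its argmax tag is unchanged, which is the kind of signal a raw max-probability threshold ignores.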

Key takeaway

For research scientists developing clinical NLP pipelines, relying solely on BERT's SoftMax probabilities to filter noisy NER predictions is insufficient. You should integrate a lightweight, explainable post-processing Noise Removal model, such as the Decision Tree-based approach described here, to improve precision. In the reported experiments this reduced false positives by up to 90% with minimal impact on recall, which is crucial for high-fidelity clinical data extraction, and it did not require retraining the NER model.

Key insights

A post-processing Noise Removal model significantly boosts clinical NER precision by analyzing probability distributions and contextual features.

Principles

SoftMax confidence alone is unreliable for uncertainty estimation.
Contextual probability distributions reveal semantic coherence.
Decision Trees offer explainable noise classification.
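The first principle can be illustrated with toy numbers. The sketch below uses made-up logits to show that the SoftMax maximum depends on the overall scale of the logits, not only on which class wins, so a fixed cut-off such as 0.9 says little about whether a prediction is correct. The `softmax` helper is written out only to keep the example self-contained.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable SoftMax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one token over three tags; only the overall scale changes,
# so the winning tag (index 0) is the same in every case.
base_logits = [4.0, 1.0, 0.5]
for scale in (0.5, 1.0, 2.0):
    probs = softmax([scale * z for z in base_logits])
    print(f"scale={scale}: argmax={probs.index(max(probs))}, max prob={max(probs):.3f}")
# scale=0.5: argmax=0, max prob=0.716
# scale=1.0: argmax=0, max prob=0.926
# scale=2.0: argmax=0, max prob=0.997
```

The ranking of tags never changes, yet the top probability moves from about 0.72 to nearly 1.0. An overconfident model can therefore push doubtful predictions above any fixed threshold.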

Method

A supervised Decision Tree classifier uses Probability Density Maps (PDM) and statistical features derived from NER model outputs to classify predictions as "strong" or "weak," filtering out noisy (weak) entities without modifying the core NER architecture.
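A minimal sketch of the strong/weak classification step follows. It is a small CART-style tree with Gini impurity written from scratch so it runs without dependencies; in practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would be the usual choice. Each row is a hypothetical per-entity feature vector `[mean max probability, mean top-1/top-2 margin, mean token entropy]`, and the training rows, labels, and `FEATURES` names are made up for illustration; they are not data or features from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FEATURES = ["mean_max_prob", "mean_margin", "mean_entropy"]  # hypothetical names


@dataclass
class Node:
    label: int                       # majority class here: 1 = strong, 0 = weak
    feature: Optional[int] = None    # split feature index; None marks a leaf
    threshold: float = 0.0           # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: List[int]) -> float:
    """Gini impurity of a set of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X: List[List[float]], y: List[int]) -> Optional[Tuple[int, float]]:
    """Exhaustively pick the (feature, threshold) with the lowest weighted Gini."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def fit(X: List[List[float]], y: List[int], depth: int = 0, max_depth: int = 3) -> Node:
    """Grow the tree recursively until a node is pure, too deep, or cannot split."""
    majority = int(2 * sum(y) >= len(y))
    split = None if depth >= max_depth or len(set(y)) == 1 else best_split(X, y)
    if split is None:
        return Node(label=majority)
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        label=majority, feature=f, threshold=t,
        left=fit([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
        right=fit([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth),
    )


def predict(node: Node, x: List[float]) -> int:
    """Walk from the root to a leaf and return its label."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Made-up training rows: max probability overlaps between classes, margin does not.
X_train = [
    [0.97, 0.94, 0.12], [0.91, 0.85, 0.30], [0.95, 0.90, 0.20],  # strong
    [0.96, 0.45, 0.85], [0.92, 0.38, 0.95], [0.94, 0.42, 0.90],  # weak
]
y_train = [1, 1, 1, 0, 0, 0]
tree = fit(X_train, y_train)
print(f"root split: {FEATURES[tree.feature]} <= {tree.threshold}")

# Post-processing filter: keep only predicted entities classified as strong.
candidates = [[0.96, 0.92, 0.15], [0.95, 0.40, 0.88]]
kept = [c for c in candidates if predict(tree, c) == 1]
print(kept)
```

On this toy data the tree splits on the margin feature (`mean_margin <= 0.45`) rather than on the max probability, and only the first candidate is kept, even though both candidates have a max probability above 0.9 and would pass a plain 0.9 threshold. The learned rule can be read directly off the tree, which is the explainability argument for Decision Trees.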

In practice

Implement a post-processing NR model to improve clinical NER precision.
Use Probability Density Maps to capture the contextual Semantic-Pull effect.
Employ Decision Trees for explainable noise detection in NLP pipelines.
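This summary does not give the paper's exact construction of the Probability Density Map. As one plausible, hypothetical reading, the sketch below bins the NER model's entity-class probability for the tokens just before a predicted span, inside it, and just after it into a small grid of (region, probability bin) counts, which could be flattened into features for the Decision Tree. The function name, the three-region layout, and the `window`, `bins`, and `normalize` parameters are all assumptions made for illustration.

```python
from typing import List


def probability_density_map(
    entity_probs: List[float],
    start: int,
    end: int,
    window: int = 2,
    bins: int = 4,
    normalize: bool = False,
) -> List[List[float]]:
    """Hypothetical PDM for one predicted entity covering tokens [start, end).

    entity_probs[i] is the assumed probability that token i carries the
    predicted entity class. Rows: left context, span, right context.
    Columns: equal-width probability bins over [0, 1].
    """
    regions = (
        range(max(0, start - window), start),                 # left context
        range(start, end),                                    # the entity span
        range(end, min(len(entity_probs), end + window)),     # right context
    )
    grid: List[List[float]] = [[0] * bins for _ in regions]
    for row, indices in enumerate(regions):
        for i in indices:
            b = min(int(entity_probs[i] * bins), bins - 1)    # p = 1.0 goes to the top bin
            grid[row][b] += 1
    if normalize:
        # Turn each region's counts into a density; empty regions stay all zero.
        grid = [[c / sum(r) if sum(r) else 0.0 for c in r] for r in grid]
    return grid


# Made-up probabilities: a two-token entity at positions 2-3 with weak context.
probs = [0.05, 0.10, 0.92, 0.88, 0.30, 0.02]
pdm = probability_density_map(probs, start=2, end=4)
print(pdm)
features = [c for row in pdm for c in row]  # flattened vector for a classifier
```

Flattening the grid gives a fixed-length vector regardless of span length, which is convenient for a tree-based classifier; whether the paper encodes context this way is not stated in the summary.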

Topics

Clinical NER
Noise Reduction
Transformer Models
Uncertainty Estimation
Probability Density Map

Best for: Research Scientist, AI Researcher, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.