ModernBERT is more efficient than conventional BERT for chest CT findings classification in Japanese radiology reports

Source: Machine learning : nature.com subject feeds · Field: Science & Research (Health & Medical Research, Mathematics & Computational Sciences) · Depth: Expert, medium

Summary

A study published in Scientific Reports on April 3, 2026, compared three Japanese language models (BERT Base, JMedRoBERTa, and ModernBERT) for multi-label classification of 18 chest CT findings from radiology reports. The researchers fine-tuned all three models under identical conditions on the CT-RATE-JPN dataset. ModernBERT was the most efficient: it represented reports with significantly fewer tokens and trained and ran inference faster, while its in-domain performance stayed comparable (74.7% exact match accuracy vs. 72.7% for BERT Base). On RR-Findings, an external, domain-shifted dataset, however, ModernBERT showed the largest decline in exact match accuracy, and BERT Base outperformed both JMedRoBERTa and ModernBERT. ModernBERT nonetheless retained reasonable ranking ability, as indicated by smaller differences in average precision. The study highlights ModernBERT's computational advantages for in-domain tasks but underscores its sensitivity to linguistic variability in real-world clinical data.
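The two metrics in the summary measure different things, which helps explain how exact match accuracy can fall while ranking ability holds up. The sketch below is a minimal illustration, not the study's evaluation code: the function names, the toy scores, and the 0.5 decision threshold are all assumptions made for the example. Exact match counts a report as correct only if every finding is predicted correctly after thresholding, whereas average precision depends only on how the scores rank positive reports above negative ones.

```python
from typing import Sequence


def exact_match_accuracy(y_true: Sequence[Sequence[int]],
                         y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of reports whose full label vector is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one finding: the mean of precision values taken
    at each positive report, with reports ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (hypothetical): 4 reports, 1 finding. The scores rank both
# positives first, but every score sits below the assumed 0.5 threshold.
labels = [1, 1, 0, 0]
scores = [0.45, 0.40, 0.20, 0.10]

truth = [[label] for label in labels]
preds = [[int(s >= 0.5)] for s in scores]

em = exact_match_accuracy(truth, preds)   # hurt by the miscalibrated threshold
ap = average_precision(labels, scores)    # unaffected: the ranking is perfect
print(f"exact match = {em:.2f}, average precision = {ap:.2f}")
```

In this toy case the ranking is perfect but half the reports are misclassified, so a drop in exact match under domain shift does not by itself mean the model has stopped separating positive from negative reports.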

Key takeaway

For NLP engineers building systems for Japanese radiology reports, ModernBERT is worth considering for its shorter token sequences and faster training and inference. If the application involves diverse or domain-shifted real-world clinical data, however, prioritize extensive and varied training data or apply domain-specific calibration strategies to limit performance degradation and support robust deployment in heterogeneous clinical environments.
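One common form of domain-specific calibration is to re-tune the decision threshold for each finding on a small labeled sample from the target site. The sketch below assumes that approach; it is not a method reported by the study, and `tune_thresholds`, the threshold grid, and the toy scores are hypothetical.

```python
from typing import List, Sequence


def f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 score for one finding; returns 0.0 when there is nothing to score."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(y_true: Sequence[Sequence[int]],
                    y_score: Sequence[Sequence[float]],
                    grid: Sequence[float]) -> List[float]:
    """For each finding, pick the grid threshold that maximizes F1 on a
    target-domain validation set (ties go to the lowest threshold)."""
    n_labels = len(y_true[0])
    best = []
    for j in range(n_labels):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        best.append(max(
            grid,
            key=lambda t: f1(labels, [int(s >= t) for s in scores]),
        ))
    return best


# Hypothetical target-domain validation sample: 4 reports, 2 findings.
y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_score = [[0.45, 0.10], [0.35, 0.20], [0.15, 0.80], [0.05, 0.60]]
grid = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

thresholds = tune_thresholds(y_true, y_score, grid)
print(thresholds)
```

Threshold tuning only helps when the model still ranks reports sensibly on the new data; it cannot recover findings the model no longer separates at all, which is where more varied training data becomes necessary.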

Key insights

ModernBERT offers computational efficiency for Japanese medical text classification but requires diverse data for robustness.

Principles

Method

Three Japanese language models (BERT Base, JMedRoBERTa, ModernBERT) were fine-tuned on the CT-RATE-JPN dataset for multi-label classification of 18 chest CT findings, then evaluated on an internal test set and an external, domain-shifted RR-Findings dataset.
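Multi-label classification of this kind is typically set up with one independent sigmoid output per finding, trained with binary cross-entropy. The sketch below shows that output layer in isolation, using plain Python and made-up logits. It is an assumption about the general setup, not the study's training code, and the 0.5 threshold is illustrative. In Hugging Face Transformers, the same loss is selected by setting `problem_type="multi_label_classification"` on a sequence classification model.

```python
import math
from typing import List, Sequence

NUM_FINDINGS = 18  # number of chest CT findings in the task


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over findings, computed from raw logits
    using the stable form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Each finding is decided independently of the others."""
    return [int(sigmoid(z) >= threshold) for z in logits]


# Hypothetical report in which findings 0 and 5 are present.
targets = [1 if i in (0, 5) else 0 for i in range(NUM_FINDINGS)]
logits = [2.0 if t == 1 else -2.0 for t in targets]  # made-up model outputs

loss = bce_with_logits(logits, targets)
preds = predict(logits)
print(f"loss = {loss:.4f}, exact match = {preds == targets}")
```

Because each finding has its own output, a report can carry any combination of the 18 labels, which is why exact match is a strict metric: one wrong finding makes the whole report count as incorrect.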

Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.