When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT

2026-06-05 · Source: Computer Vision and Pattern Recognition · Field: Science & Research — Health & Medical Research, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study investigated the practical value of three-dimensional (3D) models for volumetric medical imaging, specifically lung CT, by comparing input dimensionality (2D, 2.5D, 3D) across convolutional neural networks (CNNs) and Vision Transformers (ViTs). Using a leakage-free NLST cohort (n = 1,977) and LIDC-IDRI data, the research found that a 2.5D CNN offered the most favorable discrimination-stability trade-off, achieving an ROC-AUC of 0.682 (95% CI [0.546, 0.799]) with a stable operating point. In contrast, 3D CNNs exhibited threshold instability, and ViTs produced degenerate predictions, such as all-positive outputs. The authors present these findings as a controlled resource-performance frontier and a failure-mode taxonomy rather than as definitive superiority claims, noting that the confidence intervals are wide and overlapping. For class-imbalanced lung cancer screening, 2D and 2.5D inputs provided a more reliable balance of performance, stability, and computational efficiency than full 3D representations in this study.
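The summary reports a 95% CI for ROC-AUC but does not say how it was computed. A percentile bootstrap over cases is one common way to obtain such an interval; the sketch below assumes that method, and the function names `roc_auc` and `bootstrap_auc_ci` are hypothetical, not taken from the paper.

```python
import random

def roc_auc(labels, scores):
    """ROC-AUC as the Mann-Whitney probability that a random positive
    outscores a random negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval.

    Assumption: cases are resampled with replacement, and resamples that
    contain only one class are redrawn, since AUC is undefined for them.
    """
    point = roc_auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if 0 < sum(boot_labels) < n:  # keep only resamples with both classes
            aucs.append(roc_auc(boot_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lower, upper)
```

In general, the fewer positive cases a test set contains, the more the resampled AUCs spread, which is consistent with class-imbalanced screening data yielding wide intervals.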

Key takeaway

For AI and research scientists developing models for class-imbalanced lung cancer screening from volumetric CT, 2.5D CNN architectures are the configuration to prioritize. In this study they offered a more reliable balance of performance, stability, and computational efficiency than full 3D CNNs, which showed threshold instability, or Vision Transformers, which produced degenerate predictions. Focus on optimizing 2.5D methods, while keeping in mind that the wide, overlapping confidence intervals mean this ranking is not definitive.

Key insights

Input dimensionality affected both discrimination and operating-point stability in volumetric lung CT, although the wide, overlapping confidence intervals mean the differences are not conclusive.

Principles

Practical value of 3D models depends on performance gains versus added computational cost.
Input dimensionality affects model behavior under fixed training protocols.
Wide, overlapping confidence intervals mean the results describe a resource-performance frontier, not definitive superiority.
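A resource-performance frontier can be read as the set of configurations that no other configuration beats on both cost and discrimination. The sketch below assumes a single scalar cost (for example, GPU-hours or memory) and uses ROC-AUC as the performance axis; the `Config` type and `pareto_frontier` function are hypothetical illustrations, not the authors' code.

```python
from typing import List, NamedTuple

class Config(NamedTuple):
    name: str
    cost: float  # hypothetical resource measure; lower is better
    auc: float   # discrimination (ROC-AUC); higher is better

def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return configs not dominated by any other one, sorted by cost.

    A config is dominated if another is at least as cheap and at least as
    accurate, and strictly better on at least one of the two axes.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o.cost <= c.cost and o.auc >= c.auc and (o.cost < c.cost or o.auc > c.auc)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.cost)
```

Because the study's confidence intervals overlap, a configuration's position on such a frontier is itself uncertain; plotting interval bounds alongside point estimates is one way to keep that visible.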

Method

The study compared 2D, 2.5D, and 3D inputs for CNNs and ViTs under a fixed training protocol, using a leakage-free NLST cohort (n = 1,977) and LIDC-IDRI data, to assess the trade-off between discrimination (ROC-AUC) and operating-point stability.
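The three input types differ in how neighbouring CT slices reach the network. A common convention, assumed here because the summary does not give the paper's slice count, crop size, or padding rule, is: 2D uses one axial slice, 2.5D stacks a few adjacent slices as channels for a 2D network, and 3D keeps depth as a spatial axis for 3D convolutions. `make_input` and its `context` parameter are hypothetical.

```python
import numpy as np

def make_input(volume, center, mode, context=1):
    """Build a model input around slice `center` of a (D, H, W) CT volume.

    context: neighbouring slices on each side (hypothetical default).
    Out-of-range slice indices are clamped to the volume edge (an assumption).
    """
    if mode == "2d":
        return volume[center][np.newaxis]      # (1, H, W): one slice as one channel
    idx = np.clip(np.arange(center - context, center + context + 1), 0, volume.shape[0] - 1)
    slab = volume[idx]                         # (2*context + 1, H, W)
    if mode == "2.5d":
        return slab                            # depth folded into channels for a 2D network
    if mode == "3d":
        return slab[np.newaxis]                # (1, depth, H, W): depth kept as a spatial axis
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch the 2.5D and 3D inputs contain the same voxels; the difference is whether depth is treated as channels or as a spatial dimension, and the latter is what makes 3D kernels slide along depth and cost more compute. A full 3D model would usually see a deeper subvolume, which here corresponds to a larger `context`.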

In practice

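Consider 2.5D CNNs for lung CT classification; they gave the most favorable discrimination-stability trade-off in this study.
Expect possible threshold instability from 3D CNNs in this application.
Treat Vision Transformers with caution for lung cancer screening: under the study's fixed training protocol they produced degenerate predictions, such as all-positive outputs.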
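The summary does not define how threshold instability or degenerate predictions were measured. The sketch below assumes two simple proxies: the share of cases predicted positive at the chosen threshold (0 or 1 flags an all-negative or all-positive model) and the largest change in sensitivity or specificity when the threshold moves by a small `delta`. The function names and default values are hypothetical.

```python
def confusion_rates(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

def operating_point_report(labels, scores, threshold=0.5, delta=0.05):
    """Hypothetical stability check for one operating point."""
    pos_rate = sum(s >= threshold for s in scores) / len(scores)
    sens, spec = confusion_rates(labels, scores, threshold)
    # Largest shift in sensitivity or specificity for threshold +/- delta.
    max_shift = 0.0
    for t in (threshold - delta, threshold + delta):
        s2, p2 = confusion_rates(labels, scores, t)
        max_shift = max(max_shift, abs(s2 - sens), abs(p2 - spec))
    return {
        "positive_rate": pos_rate,
        "degenerate": pos_rate in (0.0, 1.0),  # all-negative or all-positive
        "sensitivity": sens,
        "specificity": spec,
        "max_shift": max_shift,
    }
```

In class-imbalanced screening, an all-positive model reaches perfect sensitivity with zero specificity, so checking the positive rate alongside ROC-AUC helps catch this failure mode.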

Topics

Lung CT
Medical Imaging
Convolutional Neural Networks
Vision Transformers
Input Dimensionality
Resource-Performance Frontier
Lung Cancer Screening

Best for: Computer Vision Engineer, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.