When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT
Summary
A study investigated the practical value of three-dimensional (3D) models for volumetric medical imaging, specifically lung CT, by comparing input dimensionality (2D, 2.5D, 3D) across convolutional neural networks (CNNs) and Vision Transformers (ViTs). Using a leakage-free NLST cohort (n = 1,977) and LIDC-IDRI data, the research found that a 2.5D CNN offered the most favorable discrimination-stability trade-off, achieving an ROC-AUC of 0.682 (95% CI [0.546, 0.799]) with a stable operating point. In contrast, 3D CNNs exhibited threshold instability, and Transformers produced degenerate predictions, such as all-positive outputs. The authors present these findings as a controlled resource-performance frontier and a failure-mode taxonomy rather than as definitive superiority claims, noting that the confidence intervals are wide and overlapping. For class-imbalanced lung cancer screening, 2D and 2.5D inputs provided a more reliable balance of performance, stability, and computational efficiency than full 3D representations.
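The summary reports an ROC-AUC with a 95% confidence interval but does not say how the interval was computed. As a rough illustration of why such intervals can be wide on small or imbalanced test sets, here is a minimal sketch of a percentile bootstrap, which is one common choice and not necessarily the authors' method. The function names `roc_auc` and `bootstrap_auc_ci` and the parameters `n_boot`, `alpha`, and `seed` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap.

    Assumption: cases are resampled with replacement; resamples that contain
    only one class are skipped because AUC is undefined for them.
    """
    point = roc_auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper
```

With few positive cases, each resample changes the set of positive-negative pairs substantially, which is one reason intervals like the one quoted above can overlap across models.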
Key takeaway
If you are developing models for class-imbalanced lung cancer screening on volumetric CT, consider 2.5D CNN architectures first. In this study they offered a more reliable balance of performance, stability, and computational efficiency than full 3D CNNs, which showed threshold instability, or Vision Transformers, which produced degenerate predictions. Because the reported confidence intervals are wide and overlapping, treat this as a well-motivated starting point rather than a settled ranking.
Key insights
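Input dimensionality affected both discrimination and operating-point stability in this study of volumetric medical imaging, although the wide, overlapping confidence intervals do not support claims of statistical superiority.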
Principles
- Practical value of 3D models depends on performance gains versus added computational cost.
- Input dimensionality affects model behavior under fixed training protocols.
- Wide, overlapping confidence intervals mean the results describe a resource-performance frontier, not a definitive ranking of models.
Method
The study compared 2D, 2.5D, and 3D input dimensionality for CNNs and ViTs under a fixed training protocol, using NLST (n = 1,977) and LIDC-IDRI data, to assess the trade-off between discrimination and operating-point stability.
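To make the three input regimes concrete, here is a minimal sketch of how 2D, 2.5D, and 3D inputs might be built from one CT volume. This summary does not give the study's exact definitions, so the sketch assumes a common convention: 2D is a single axial slice, 2.5D stacks a slice with its neighbours as channels, and 3D is the full volume. The function `make_input` and its `context` parameter are hypothetical.

```python
def make_input(volume, center, mode, context=1):
    """Build a model input from a CT volume given as a list of 2D slices.

    Assumed conventions (not taken from the study):
      "2d"   -> one slice at index `center`
      "2.5d" -> `2 * context + 1` neighbouring slices stacked as channels,
                with indices clamped at the volume edges
      "3d"   -> every slice in the volume
    """
    last = len(volume) - 1
    if not 0 <= center <= last:
        raise IndexError("center slice is outside the volume")
    if mode == "2d":
        return [volume[center]]
    if mode == "2.5d":
        return [
            volume[min(max(center + offset, 0), last)]
            for offset in range(-context, context + 1)
        ]
    if mode == "3d":
        return list(volume)
    raise ValueError(f"unknown mode: {mode!r}")


# Toy volume: 5 slices of 1x1 pixels, each holding its own slice index.
toy_volume = [[[i]] for i in range(5)]
two_d = make_input(toy_volume, center=2, mode="2d")
two_half_d = make_input(toy_volume, center=2, mode="2.5d")
three_d = make_input(toy_volume, center=2, mode="3d")
```

Under these assumptions the 2.5D input is three times the size of the 2D input regardless of volume depth, while the 3D input grows with the number of slices, which is where the added computational cost comes from.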
In practice
- Consider 2.5D CNNs for lung CT classification due to favorable trade-offs.
- Be aware of threshold instability in 3D CNNs for this application.
- Treat Vision Transformers with caution for lung cancer screening: under this study's fixed training protocol they produced degenerate predictions, such as all-positive outputs.
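The failure modes in the list above can be screened for before trusting a headline metric. The following is a minimal sketch, assuming binary labels and probability-like scores, that reports sensitivity and specificity at a threshold and flags degenerate output where every case receives the same predicted class. The function `operating_point` and the default threshold of 0.5 are hypothetical and not taken from the study.

```python
def operating_point(labels, scores, threshold=0.5):
    """Summarise a classifier at one threshold and flag degenerate predictions."""
    if not scores or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and the same length")
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    positive_rate = sum(preds) / len(preds)
    return {
        # None when the metric is undefined (no cases of that class).
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "positive_rate": positive_rate,
        # All-positive or all-negative output carries no screening information.
        "degenerate": positive_rate in (0.0, 1.0),
    }
```

Running the same check across a small range of thresholds is one simple way to see whether an operating point is stable: if sensitivity and specificity swing sharply between nearby thresholds, the chosen cut-off is unlikely to transfer well to new data.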
Topics
- Lung CT
- Medical Imaging
- Convolutional Neural Networks
- Vision Transformers
- Input Dimensionality
- Resource-Performance Frontier
- Lung Cancer Screening
Best for: Computer Vision Engineer, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.