When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT

Source: Computer Vision and Pattern Recognition · Field: Science & Research, Health & Medical Research, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study examined the practical value of three-dimensional (3D) models for volumetric medical imaging, specifically lung CT, by comparing input dimensionality (2D, 2.5D, 3D) across convolutional neural networks (CNNs) and Vision Transformers (ViTs). Using a leakage-free NLST cohort (n = 1,977) and LIDC-IDRI data, the authors found that a 2.5D CNN offered the most favorable discrimination-stability trade-off, reaching an ROC-AUC of 0.682 (95% CI [0.546, 0.799]) with a stable operating point. In contrast, 3D CNNs exhibited threshold instability, and Transformers produced degenerate predictions, such as all-positive outputs. Because the confidence intervals are wide and overlapping, the authors present these findings as a controlled resource-performance frontier and a failure-mode taxonomy rather than as definitive superiority claims. For class-imbalanced lung cancer screening, 2D and 2.5D inputs provided a more reliable balance of performance, stability, and computational efficiency than full 3D representations.

Key takeaway

AI scientists and research scientists building models for class-imbalanced lung cancer screening on volumetric CT should consider 2D and 2.5D CNN inputs, and 2.5D CNNs in particular, before committing to full 3D models or Vision Transformers. In this study the 2.5D CNN gave a more reliable balance of performance, stability, and computational efficiency, while 3D CNNs showed threshold instability and Transformers produced degenerate predictions. The confidence intervals are wide and overlapping, so treat this as a practical starting point rather than proof that 2.5D is superior.

Key insights

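Input dimensionality affects both discrimination and operating-point stability in lung CT classification: in this study the 2.5D CNN kept a stable operating point, 3D CNNs showed threshold instability, and Transformers collapsed to degenerate predictions such as all-positive outputs.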
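
The summary reports ROC-AUC with a 95% confidence interval and describes degenerate, all-positive predictions. The source does not say how the interval was computed, so the following is only a minimal sketch under assumptions: it uses a percentile bootstrap over cases, and the helper names (roc_auc, bootstrap_auc_ci, is_degenerate), the 0.5 threshold, and the toy data are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """ROC-AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for ROC-AUC (assumed method), resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        auc = roc_auc([y_true[i] for i in idx], [scores[i] for i in idx])
        if auc is not None:  # skip resamples that contain only one class
            aucs.append(auc)
    aucs.sort()
    lo = aucs[int((alpha / 2) * (len(aucs) - 1))]
    hi = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return lo, hi


def is_degenerate(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """True when thresholding yields a single predicted class, e.g. all-positive outputs."""
    return len({int(s >= threshold) for s in scores}) == 1


# Toy, class-imbalanced example (made-up numbers, not study data).
y = [0, 0, 0, 0, 0, 0, 1, 1]
s = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.55, 0.9]
auc = roc_auc(y, s)
ci_low, ci_high = bootstrap_auc_ci(y, s)
```

With few positive cases the bootstrap interval tends to be wide, which is consistent with the caution the authors attach to their comparisons. A degenerate model can still have a non-trivial ROC-AUC, so the two checks answer different questions.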

Method

The study compared 2D, 2.5D, and 3D input dimensionality for CNNs and ViTs under a fixed training protocol on NLST (n = 1,977) and LIDC-IDRI data, assessing the trade-off between discrimination and operating-point stability.
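
To make the three input dimensionalities concrete, here is a minimal sketch of how each might be cut from a CT volume. The source does not specify the construction, so this assumes a common convention: 2D is a single axial slice, 2.5D stacks a slice with its neighbours as channels, and 3D keeps a contiguous block of slices. The function names, the context and depth parameters, and the edge handling are hypothetical; a real pipeline would typically use NumPy arrays or tensors rather than nested lists.

```python
from typing import List

Slice = List[List[float]]  # one axial slice: rows x cols
Volume = List[Slice]       # CT volume: depth x rows x cols


def make_2d(volume: Volume, k: int) -> Slice:
    """2D input: the single axial slice at index k."""
    return volume[k]


def make_2_5d(volume: Volume, k: int, context: int = 1) -> List[Slice]:
    """2.5D input: slice k plus `context` neighbours on each side, stacked as channels.

    Indices are clamped at the volume edges, so border slices are repeated.
    """
    last = len(volume) - 1
    return [volume[min(max(k + off, 0), last)] for off in range(-context, context + 1)]


def make_3d(volume: Volume, k: int, depth: int) -> Volume:
    """3D input: a contiguous block of `depth` slices centred near k, kept as a volume."""
    if depth > len(volume):
        raise ValueError("depth exceeds volume size")
    # Shift the window inward when it would run past either end of the volume.
    start = min(max(k - depth // 2, 0), len(volume) - depth)
    return volume[start:start + depth]


# Toy volume: 8 slices of 2x2 pixels, each filled with its slice index.
toy = [[[float(z)] * 2 for _ in range(2)] for z in range(8)]
x2d = make_2d(toy, 0)
x25d = make_2_5d(toy, 0)
x3d = make_3d(toy, 7, depth=4)
```

The 2.5D input costs about as much as a multi-channel 2D image while still exposing some through-plane context, which is one plausible reason it sits between 2D and 3D on a resource-performance frontier.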

Best for: Computer Vision Engineer, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.