Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction
Summary
PathoSpatial is an interpretable, end-to-end framework for prognostic modeling that integrates co-registered Whole Slide Images (WSIs) and Spatial Transcriptomics (ST) data. It uses a multi-level expert architecture with task-guided prototype learning, adaptively combining unsupervised within-modality discovery with supervised cross-modal aggregation. Evaluated on a triple-negative breast cancer cohort with paired ST and WSIs, PathoSpatial performed consistently well across five survival endpoints: Distant Relapse-Free Survival (DRFS), Relapse-Free Survival (RFS), Invasive Breast Cancer-Free Survival (IBCFS), Invasive Disease-Free Survival (IDFS), and Overall Survival (OS). The framework matched or outperformed existing unimodal and multimodal methods. Its learned prototypes also support post-hoc interpretation and molecular risk decomposition, which give quantitative, biologically grounded explanations and highlight candidate prognostic factors.
Key takeaway
For research scientists developing prognostic models in computational pathology, PathoSpatial offers a robust framework for integrating WSI and ST data. You should consider its adaptive, prototype-driven fusion strategy to improve both predictive accuracy and interpretability, especially for complex survival endpoints. The framework's ability to decompose risk into specific morphological and molecular patterns can guide the identification of novel prognostic factors and refine personalized treatment strategies.
Key insights
PathoSpatial fuses WSI and ST data via prototype learning for interpretable, spatially-informed survival prediction.
Principles
- Cross-modal fusion enhances prognostic accuracy.
- Prototype learning reduces noise and complexity.
- Adaptive fusion outperforms static alignment.
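The contrast between adaptive and static fusion can be illustrated with a small gating sketch. This is not the paper's implementation: it assumes three expert outputs of equal dimension and a single linear gate, and the names `adaptive_fusion` and `w_gate` are hypothetical. A static scheme would use fixed weights for every sample, whereas here the weights depend on the inputs.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(h_wsi, h_st, h_cross, w_gate):
    """Fuse three expert outputs with input-dependent gate weights.

    h_wsi, h_st, h_cross: (d,) outputs of the histology, ST, and
    cross-modal experts (assumed shapes, for illustration only).
    w_gate: (3, 3 * d) hypothetical linear gate parameters.
    """
    experts = np.stack([h_wsi, h_st, h_cross])   # (3, d)
    gate = softmax(w_gate @ experts.reshape(-1)) # (3,), sums to 1
    fused = gate @ experts                       # (d,) weighted combination
    return fused, gate

rng = np.random.default_rng(0)
d = 8
h_wsi, h_st, h_cross = rng.normal(size=(3, d))
w_gate = rng.normal(size=(3, 3 * d)) * 0.1
fused, gate = adaptive_fusion(h_wsi, h_st, h_cross, w_gate)
```

Because the gate is recomputed for each sample, a slide with uninformative ST signal can lean on the histology expert, and vice versa.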
Method
PathoSpatial uses modality-specific prototype experts and a cross-modal fusion expert within a hierarchical Mixture-of-Experts architecture, optimized with a composite loss function including a diversity penalty.
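A minimal NumPy sketch of two ingredients named in this description, prototype assignment and a diversity penalty, under stated assumptions. The summary does not give the exact assignment rule or penalty form. Cosine-similarity soft assignment and a squared off-diagonal similarity penalty are common choices swapped in here for illustration, and all names (`assign_to_prototypes`, `diversity_penalty`, `composite_loss`, `temperature`, `lambda_div`) are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def assign_to_prototypes(instances, prototypes, temperature=0.1):
    """Soft-assign instances (n, d) to prototypes (k, d) by cosine similarity.

    Returns assignment weights (n, k) and prototype-level summaries (k, d),
    where each summary is the assignment-weighted mean of the instances.
    """
    sim = l2_normalize(instances) @ l2_normalize(prototypes).T  # (n, k)
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)                 # stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)               # rows sum to 1
    mass = weights.sum(axis=0)[:, None] + 1e-8                  # (k, 1)
    summaries = (weights.T @ instances) / mass                  # (k, d)
    return weights, summaries

def diversity_penalty(prototypes):
    """Mean squared off-diagonal cosine similarity between prototypes.

    Near 0 when prototypes are mutually orthogonal, near 1 when they collapse.
    """
    p = l2_normalize(prototypes)
    gram = p @ p.T
    k = gram.shape[0]
    off_diag = gram[~np.eye(k, dtype=bool)]
    return float(np.mean(off_diag ** 2))

def composite_loss(survival_loss, prototypes, lambda_div=0.1):
    """Assumed form: task loss plus a weighted diversity term."""
    return survival_loss + lambda_div * diversity_penalty(prototypes)

rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 16))     # placeholder instance embeddings
prototypes = rng.normal(size=(4, 16))   # placeholder learnable prototypes
weights, summaries = assign_to_prototypes(patches, prototypes)
```

Summarizing thousands of patches or spots into a handful of prototype vectors is what makes the downstream cross-modal step tractable, and the penalty discourages several prototypes from encoding the same pattern.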
In practice
- Use scGPT for ST spot embeddings.
- Use UNI2 for histology patch embeddings.
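Once both sets of embeddings exist, co-registered spots and patches still need to be paired. The sketch below assumes spot and patch centers are already expressed in the same pixel coordinate frame and uses placeholder arrays in place of real scGPT and UNI2 outputs; it does not call either library. The function name `pair_spots_with_patches`, the `max_dist` threshold, and the embedding sizes are hypothetical.

```python
import numpy as np

def pair_spots_with_patches(spot_xy, patch_xy, max_dist):
    """For each ST spot, return the index of the nearest patch center,
    or -1 if no patch lies within max_dist (same units as the coordinates)."""
    diff = spot_xy[:, None, :] - patch_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # (n_spots, n_patches)
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(spot_xy)), nearest]
    return np.where(nearest_dist <= max_dist, nearest, -1)

# Placeholder coordinates (pixels) and embeddings; real values would come
# from the ST platform metadata and the two foundation models.
spot_xy = np.array([[100.0, 100.0], [300.0, 120.0], [900.0, 900.0]])
patch_xy = np.array([[112.0, 96.0], [288.0, 128.0], [500.0, 500.0]])
spot_emb = np.zeros((3, 4))                 # stand-in for ST spot embeddings
patch_emb = np.arange(18.0).reshape(3, 6)   # stand-in for patch embeddings

idx = pair_spots_with_patches(spot_xy, patch_xy, max_dist=50.0)
keep = idx >= 0                             # drop spots with no nearby patch
paired = np.concatenate([spot_emb[keep], patch_emb[idx[keep]]], axis=1)
```

The brute-force distance matrix is fine for a sketch; for slides with many thousands of patches, a spatial index such as a k-d tree would be the usual replacement.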
Topics
- Spatial Transcriptomics
- Whole Slide Images
- Multiple Instance Learning
- Survival Prediction
- Multimodal Fusion
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.