Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction

2026-02-16 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Computational Pathology · Depth: Expert, extended

Summary

PathoSpatial is an interpretable, end-to-end framework for prognostic modeling that integrates co-registered Whole Slide Images (WSIs) and Spatial Transcriptomics (ST) data. It uses a multi-level experts architecture with task-guided prototype learning, adaptively combining unsupervised within-modality discovery with supervised cross-modal aggregation. On a triple-negative breast cancer cohort with paired ST and WSIs, PathoSpatial performed strongly and consistently across five survival endpoints: Distant Relapse-Free Survival (DRFS), Relapse-Free Survival (RFS), Invasive Breast Cancer-Free Survival (IBCFS), Invasive Disease-Free Survival (IDFS), and Overall Survival (OS). It matched or outperformed existing unimodal and multimodal methods. Its prototype-based design also supports post-hoc prototype interpretation and molecular risk decomposition, which yield quantitative, biologically grounded explanations and highlight candidate prognostic factors.

Key takeaway

For research scientists developing prognostic models in computational pathology, PathoSpatial offers a framework for integrating WSI and ST data. Consider its adaptive, prototype-driven fusion strategy when you need both predictive accuracy and interpretability, especially for complex survival endpoints. Because the framework decomposes risk into specific morphological and molecular patterns, it may help identify novel prognostic factors and inform personalized treatment strategies.

Key insights

PathoSpatial fuses WSI and ST data via prototype learning for interpretable, spatially-informed survival prediction.

Principles

Cross-modal fusion enhances prognostic accuracy.
Prototype learning reduces noise and complexity.
Adaptive fusion outperforms static alignment.
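
The second and third principles can be illustrated with a minimal NumPy sketch. It is not the PathoSpatial implementation: the function names `prototype_pool` and `adaptive_fuse`, the distance-based soft assignment, the shared embedding size, and the random stand-in embeddings are all assumptions made for illustration. The sketch compresses many instance embeddings into a few prototype-level summaries, then mixes the two modalities with weights computed from the inputs rather than fixed in advance.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_pool(instances, prototypes, temperature=1.0):
    """Summarize N instance embeddings (N, D) as K prototype-level vectors (K, D).

    Hypothetical helper: soft-assigns each instance to prototypes by
    squared Euclidean distance, then averages instances per prototype.
    """
    d2 = ((instances[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (N, K)
    assign = softmax(-d2 / temperature, axis=1)   # each row sums to 1
    mass = np.maximum(assign.sum(axis=0), 1e-8)   # total weight per prototype
    summaries = (assign.T @ instances) / mass[:, None]
    return summaries, assign

def adaptive_fuse(h_wsi, h_st, gate_weights):
    """Mix two modality vectors with input-dependent weights.

    gate_weights is a hypothetical learned (2, 2*D) matrix; a static
    scheme would instead use fixed weights such as (0.5, 0.5).
    """
    logits = gate_weights @ np.concatenate([h_wsi, h_st])
    w = softmax(logits)
    return w[0] * h_wsi + w[1] * h_st, w

rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(500, 16))  # stand-in for histology patch embeddings
spot_emb = rng.normal(size=(200, 16))   # stand-in for ST spot embeddings
wsi_protos = rng.normal(size=(8, 16))   # untrained prototypes, for shape only
st_protos = rng.normal(size=(8, 16))

wsi_summary, wsi_assign = prototype_pool(patch_emb, wsi_protos)
st_summary, st_assign = prototype_pool(spot_emb, st_protos)

gate = rng.normal(size=(2, 32)) * 0.1   # untrained gate, for shape only
fused, weights = adaptive_fuse(wsi_summary.mean(axis=0), st_summary.mean(axis=0), gate)
```

Pooling reduces 500 patches and 200 spots to 8 summary vectors per modality, which is the sense in which prototypes cut complexity. In a trained model the prototypes and gate would be learned, and the two modalities would likely need projection to a common dimension first.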

Method

PathoSpatial uses modality-specific prototype experts and a cross-modal fusion expert within a hierarchical Mixture-of-Experts architecture, optimized with a composite loss function including a diversity penalty.
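
A composite objective of this kind might look like the sketch below. The summary does not state which survival loss or which form of diversity penalty PathoSpatial uses, so both choices here are assumptions: a Cox negative partial log-likelihood for the survival term, and the mean squared off-diagonal cosine similarity between prototypes for the penalty. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk: (N,) predicted log-risk; time: (N,) follow-up time;
    event: (N,) 1.0 if the event was observed, 0.0 if censored.
    Tied event times get no special handling in this sketch.
    """
    order = np.argsort(-time)                      # longest follow-up first
    risk, event = risk[order], event[order]
    # log-sum-exp of risk over each patient's risk set (time >= theirs)
    log_risk_set = np.logaddexp.accumulate(risk)
    n_events = max(float(event.sum()), 1.0)
    return float(-np.sum((risk - log_risk_set) * event) / n_events)

def diversity_penalty(prototypes):
    """Assumed penalty: mean squared cosine similarity between distinct prototypes."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = normed @ normed.T
    off_diag = sim[~np.eye(len(sim), dtype=bool)]  # drop self-similarities
    return float(np.mean(off_diag ** 2))

def composite_loss(risk, time, event, prototypes, lam=0.1):
    # lam is a hypothetical trade-off weight, not a value from the paper
    survival = cox_neg_partial_log_likelihood(risk, time, event)
    return survival + lam * diversity_penalty(prototypes)

rng = np.random.default_rng(0)
risk = rng.normal(size=32)                         # stand-in model outputs
time = rng.uniform(1.0, 60.0, size=32)             # stand-in follow-up times
event = (rng.uniform(size=32) < 0.5).astype(float)
prototypes = rng.normal(size=(8, 16))
loss = composite_loss(risk, time, event, prototypes)
```

The penalty is zero when prototypes are mutually orthogonal and approaches one as they collapse onto the same direction, so minimizing it pushes the experts toward distinct patterns.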

In practice

Use scGPT for ST spot embeddings.
Use UNI2 for histology patch embeddings.

Topics

Spatial Transcriptomics
Whole Slide Images
Multiple Instance Learning
Survival Prediction
Multimodal Fusion

Code references

liulihe954/PathoSpatial

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.