Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data

2026-06-24 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

A systematic evaluation of federated survival analysis was conducted on the Fed-TCGA-BRCA dataset, a cross-institutional breast cancer cohort comprising 1,088 patients across six naturally heterogeneous centers. The study compared three survival models, Cox Proportional Hazards (CoxPH), DeepSurv, and Random Survival Forest (RSF), across centralized, local, and federated training paradigms. For gradient-based models, the FedAvg, FedProx, and FedAdam optimization strategies were assessed. Results indicate that federated learning consistently outperforms local training and often approaches centralized performance, even exceeding it for DeepSurv. RSF demonstrated the best overall balance of discrimination, calibration, and robustness across diverse clients. The study also found that performance depends on the diversity of client distributions, not merely on the number of clients, and that FedAvg and FedProx are more stable than FedAdam. These findings inform practical guidelines for selecting a model and training paradigm in privacy-constrained healthcare.

Key takeaway

For AI Scientists and Machine Learning Engineers developing survival models in privacy-sensitive healthcare, federated learning offers a robust alternative to centralized data aggregation. Prioritize Random Survival Forest (RSF) for its superior balance of discrimination, calibration, and robustness across heterogeneous client data. When using gradient-based models like DeepSurv, choose FedAvg or FedProx over FedAdam, which the study found less stable. Use the study's practical guidelines to match the model and training paradigm to your data characteristics, privacy needs, and computational resources.

Key insights

Federated learning enables robust survival modeling on private, heterogeneous healthcare data, with Random Survival Forest (RSF) offering the best balance of discrimination, calibration, and robustness.

Principles

Federated learning consistently improves model generalization over local training.
Model performance in FL is driven by client data diversity, not just client count.
Ensemble methods like RSF provide superior robustness and calibration under data heterogeneity.

Method

A multi-model evaluation compared CoxPH, DeepSurv, and RSF across centralized, local, and federated training on the Fed-TCGA-BRCA dataset, assessing FedAvg, FedProx, and FedAdam.
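
To make the federated strategies named above more concrete, here is a minimal sketch of the two simplest ones: the FedAvg server step (a sample-size-weighted average of client parameters) and a FedProx local step (a gradient step with a proximal pull toward the global model). The function names, the flat-list parameter representation, and the `lr` and `mu` values are illustrative assumptions; this is not the study's implementation, which may differ in every detail.

```python
def fedavg_aggregate(client_params, client_sizes):
    """FedAvg server step: average client parameter vectors,
    weighting each client by its number of training samples.

    client_params: list of equal-length lists of floats (hypothetical layout).
    client_sizes:  list of per-client sample counts.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for k in range(dim):
            aggregated[k] += weight * params[k]
    return aggregated


def fedprox_local_step(params, grads, global_params, lr, mu):
    """One FedProx local update: the usual gradient step plus the gradient
    of the proximal term (mu / 2) * ||w - w_global||^2, i.e. mu * (w - w_global).
    With mu = 0 this reduces to a plain gradient step, as used by FedAvg clients.
    """
    return [
        w - lr * (g + mu * (w - wg))
        for w, g, wg in zip(params, grads, global_params)
    ]


# Two hypothetical clients holding 1 and 3 samples respectively.
global_model = fedavg_aggregate([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```

The proximal term limits how far a client can drift from the global model during local training, which is one common explanation for why FedProx tends to behave well when client distributions differ.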

In practice

Deploy Random Survival Forest (RSF) for federated survival analysis with heterogeneous client data.
Utilize FedAvg or FedProx for federated optimization of gradient-based survival models.
Select CoxPH for interpretability in settings with homogeneous data distributions.
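
When comparing candidate models such as those listed above, discrimination is commonly measured with Harrell's concordance index (C-index). The summary does not state which metric the study used, so the sketch below is an assumption for illustration only. The function name and argument layout are hypothetical, and tied event times are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:       observed follow-up times.
    events:      1 if the event was observed, 0 if censored.
    risk_scores: model output where a higher score means higher risk.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. The pair is concordant
    when the model gives i the higher risk; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In a federated setting the metric would typically be computed per client as well as pooled, since a single pooled number can hide poor performance on an atypical center.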

Topics

Federated Learning
Survival Analysis
Breast Cancer
Random Survival Forest
DeepSurv
Data Heterogeneity
ML Privacy

Code references

nataliamorenob/Survival-Models-in-Federated-Healthcare-Settings

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.