Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data
Summary
A systematic evaluation of federated survival analysis was conducted on the Fed-TCGA-BRCA dataset, a cross-institutional breast cancer cohort comprising 1,088 patients across six naturally heterogeneous centers. The study compared three survival models, Cox Proportional Hazards (CoxPH), DeepSurv, and Random Survival Forest (RSF), across centralized, local, and federated training paradigms. For the gradient-based models, the FedAvg, FedProx, and FedAdam optimization strategies were assessed. Results indicate that federated learning consistently outperforms local training and often approaches centralized performance; for DeepSurv, it even exceeds it. RSF demonstrated the best overall balance of discrimination, calibration, and robustness across diverse clients. The study also found that performance hinges on the diversity of client distributions, not merely on the number of clients, and that FedAvg and FedProx are more stable than FedAdam. These findings inform practical guidelines for selecting a model and training paradigm in privacy-constrained healthcare.
Key takeaway
For AI Scientists and Machine Learning Engineers developing survival models in privacy-sensitive healthcare, federated learning offers a robust alternative to centralized data aggregation. You should prioritize Random Survival Forest (RSF) for its superior balance of discrimination, calibration, and robustness across heterogeneous client data. When using gradient-based models like DeepSurv, opt for FedAvg or FedProx over FedAdam. Leverage the study's practical guidelines to align model and training paradigm choices with specific data characteristics, privacy needs, and computational resources.
Key insights
Federated learning enables robust survival modeling on private, heterogeneous healthcare data, with Random Survival Forest (RSF) offering the best balance of discrimination, calibration, and robustness.
Principles
- Federated learning consistently improves model generalization over local training.
- Model performance in FL is driven by client data diversity, not just client count.
- Ensemble methods like RSF provide superior robustness and calibration under data heterogeneity.
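Claims about discrimination, like those above, are usually backed by a ranking metric. The summary does not say which metric the study used; the sketch below assumes Harrell's concordance index, a common choice for survival models. The function name and arguments are illustrative, and ties in event time are treated as non-comparable for simplicity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    did so strictly before patient j's event or censoring time. The pair is
    concordant when the model gives i the higher risk score; tied scores
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Calibration has to be assessed with a separate metric.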
Method
A multi-model evaluation compared CoxPH, DeepSurv, and RSF across centralized, local, and federated training on the Fed-TCGA-BRCA dataset, assessing FedAvg, FedProx, and FedAdam.
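As a reminder of what the FedAvg baseline does on the server, here is a minimal sketch of its aggregation step: client parameters are averaged with weights proportional to each client's sample count. The parameter layout (a dict of flat float lists) and the function name are assumptions for illustration, not the study's implementation.

```python
from typing import Dict, List


def fedavg_aggregate(
    client_params: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Sample-size-weighted average of client parameters (FedAvg server step)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Dict[str, List[float]] = {}
    for name in client_params[0]:
        dim = len(client_params[0][name])
        aggregated[name] = [
            # weight each client's value by its share of the total samples
            sum(n * params[name][i] for params, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return aggregated
```

With six centers of unequal size, the larger centers dominate this average. That is one reason the diversity of client distributions, not just the client count, affects the resulting global model.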
In practice
- Deploy Random Survival Forest (RSF) for federated survival analysis with heterogeneous client data.
- Utilize FedAvg or FedProx for federated optimization of gradient-based survival models.
- Select CoxPH for interpretability in settings with homogeneous data distributions.
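For the FedAvg versus FedProx recommendation above, the client-side difference is small. FedProx adds a proximal penalty, (mu / 2) * ||w - w_global||^2, to the local loss, which pulls each client's update back toward the current global model. A minimal sketch of one local gradient step follows; the function name, the flat-list weight layout, and the values of `lr` and `mu` are illustrative assumptions, not settings from the study.

```python
from typing import List


def fedprox_local_step(
    weights: List[float],
    loss_grads: List[float],
    global_weights: List[float],
    lr: float,
    mu: float,
) -> List[float]:
    """One local SGD step on loss(w) + (mu / 2) * ||w - w_global||^2.

    The gradient of the proximal penalty is mu * (w - w_global), so mu = 0
    gives the plain local SGD step used by FedAvg clients.
    """
    return [
        w - lr * (g + mu * (w - wg))
        for w, g, wg in zip(weights, loss_grads, global_weights)
    ]


# Usage: with a zero loss gradient, the step only shrinks the gap to the global model.
updated = fedprox_local_step([1.0], [0.0], [0.0], lr=0.1, mu=1.0)
```

Larger `mu` values limit client drift on heterogeneous data but slow local adaptation, so `mu` is typically tuned per task.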
Topics
- Federated Learning
- Survival Analysis
- Breast Cancer
- Random Survival Forest
- DeepSurv
- Data Heterogeneity
- ML Privacy
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.