Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy) · Depth: Expert, extended

Summary

A systematic evaluation of federated survival analysis was conducted on the Fed-TCGA-BRCA dataset, a cross-institutional breast cancer cohort of 1,088 patients across six naturally heterogeneous centers. The study compared three survival models: Cox Proportional Hazards (CoxPH), DeepSurv, and Random Survival Forest (RSF). Each was evaluated under centralized, local, and federated training paradigms, and for the gradient-based models three federated optimization strategies were assessed: FedAvg, FedProx, and FedAdam. Results indicate that federated learning consistently outperforms local training and often approaches or, for DeepSurv, even exceeds centralized performance. RSF showed the best overall balance of discrimination, calibration, and robustness across diverse clients. The study also found that performance depends on the diversity of client distributions, not merely on the number of clients, and that FedAvg and FedProx are more stable than FedAdam. These findings inform practical guidelines for choosing models and training paradigms in privacy-constrained healthcare.
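
The FedAvg aggregation step mentioned above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, assuming each client (for Fed-TCGA-BRCA, one of the six centers) sends its model parameters flattened into a list of floats, and that the server weights each client by its number of training samples, as in the original FedAvg formulation. The function name `fedavg` and the example client sizes are illustrative and are not taken from the study.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of flattened client parameter vectors.

    Hypothetical helper: client_weights[k] holds client k's parameters after
    local training, client_sizes[k] its number of training samples.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = n / total  # this client's weight in the global average
        for j in range(dim):
            global_w[j] += share * w[j]
    return global_w
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])` gives the larger (hypothetical) client three times the influence and returns `[2.5, 3.5]`. In a full training loop the server would broadcast this vector back to the clients for the next round of local training.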

Key takeaway

For AI scientists and machine learning engineers developing survival models for privacy-sensitive healthcare, federated learning offers a robust alternative to centralized data aggregation. Prioritize Random Survival Forest (RSF) for its superior balance of discrimination, calibration, and robustness across heterogeneous client data. When using gradient-based models such as DeepSurv, prefer FedAvg or FedProx, which proved more stable than FedAdam in this study. Use the study's practical guidelines to match model and training-paradigm choices to your data characteristics, privacy requirements, and computational resources.
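
FedProx and FedAdam each modify FedAvg at a different point. FedProx adds a proximal term (mu/2)·||w − w_global||² to every client's local objective, pulling local updates toward the current global model. FedAdam (Reddi et al., "Adaptive Federated Optimization") keeps FedAvg-style client training but replaces plain server averaging with an Adam-style update that treats the averaged client change as a pseudo-gradient. The sketch below is hypothetical: the function names, the default hyperparameters (`mu`, `eta`, `beta1`, `beta2`, `tau`), and the list-of-floats parameter representation are assumptions, not the study's configuration.

```python
import math
from typing import List, Sequence, Tuple


def fedprox_grad(local_grad: Sequence[float], w: Sequence[float],
                 w_global: Sequence[float], mu: float) -> List[float]:
    """Client-side gradient of local_loss(w) + (mu / 2) * ||w - w_global||^2.

    `local_grad` is the gradient of the client's own loss at `w`; the
    proximal term adds mu * (w - w_global) to it.
    """
    return [g + mu * (wi - wg) for g, wi, wg in zip(local_grad, w, w_global)]


def fedadam_step(w_global: Sequence[float], avg_client_w: Sequence[float],
                 m: Sequence[float], v: Sequence[float],
                 eta: float = 0.01, beta1: float = 0.9, beta2: float = 0.99,
                 tau: float = 1e-3) -> Tuple[List[float], List[float], List[float]]:
    """One FedAdam server update; returns (new global weights, new m, new v)."""
    new_w, new_m, new_v = [], [], []
    for wg, wc, mi, vi in zip(w_global, avg_client_w, m, v):
        delta = wc - wg  # pseudo-gradient: averaged client change this round
        mi = beta1 * mi + (1 - beta1) * delta          # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * delta * delta  # second-moment estimate
        new_m.append(mi)
        new_v.append(vi)
        # tau bounds the step when v is tiny; no bias correction, as in Reddi et al.
        new_w.append(wg + eta * mi / (math.sqrt(vi) + tau))
    return new_w, new_m, new_v
```

Setting `mu = 0` removes the proximal term, so FedProx reduces to FedAvg. FedAdam, by contrast, carries extra server state (`m`, `v`) across rounds and introduces its own learning rate and moment hyperparameters, which FedAvg and FedProx do not have.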

Key insights

Federated learning enables robust survival modeling on private, heterogeneous healthcare data, with Random Survival Forest (RSF) offering the best balance of discrimination, calibration, and robustness.
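
Discrimination, one of the criteria on which the study ranks the models, is commonly measured in survival analysis with Harrell's concordance index (C-index). The source does not name its exact metrics, so treat this as an assumed, representative choice. The minimal sketch below counts, over all comparable pairs of patients, how often the patient who had the event earlier was assigned the higher predicted risk. The function name is hypothetical, and pairs with tied observed times are simply skipped, a simplification that library implementations may handle differently.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored data (hypothetical helper).

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an
    observed event (events[i] == 1). It is concordant when subject i also has
    the higher predicted risk; equal risk scores count as half-concordant.
    Pairs with equal observed times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 1.0 means every comparable pair is ordered correctly and 0.5 corresponds to random ordering. Calibration, the second criterion, needs a separate measure, such as a censoring-adjusted Brier score.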

Method

A multi-model evaluation compared CoxPH, DeepSurv, and RSF under centralized, local, and federated training on the Fed-TCGA-BRCA dataset, and assessed the FedAvg, FedProx, and FedAdam optimization strategies for the gradient-based models.
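
The gradient-based models in this comparison (CoxPH and DeepSurv; RSF is tree-based) can be federated with FedAvg, FedProx, or FedAdam because each center can compute gradients of a shared loss on its own patients. In their standard formulations, both models minimize the negative log partial likelihood of the Cox model, and DeepSurv replaces CoxPH's linear risk score with a neural network output. The source does not state the exact loss or tie handling, so the sketch below assumes the Breslow approximation for tied event times. The function name is hypothetical.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(times: Sequence[float], events: Sequence[int],
                                   risk_scores: Sequence[float]) -> float:
    """Average negative log partial likelihood (Breslow ties; hypothetical helper).

    risk_scores[i] is the model's log-risk for subject i: a linear predictor
    for CoxPH, or a network output for DeepSurv.
    """
    n = len(times)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i (ties included).
        risk_set = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        shift = max(risk_set)  # log-sum-exp shift for numerical stability
        log_denom = shift + math.log(sum(math.exp(r - shift) for r in risk_set))
        total += risk_scores[i] - log_denom
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events
```

Under FedAvg-style training, each center evaluates this loss on its own patients only, so the risk sets are formed locally within that center.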

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.