Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases

2026-04-29 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Federated learning (FL) often assumes that contributing clients represent the target population, but this assumption can fail because of two distinct selection biases. Enrollment bias arises from eligibility rules such as device constraints or user consent, which determine which clients are reachable for training at all. Participation bias, the more commonly studied of the two, stems from factors such as battery state or network status, which influence which enrolled clients take part in each communication round. This research formalizes FL under a two-stage selection model and introduces FedIPW, an inverse-probability-weighted aggregation scheme designed to recover the target-population mean update. It also proposes a limited-information aggregate-calibration extension that uses known target-population summaries to reweight the enrolled sample, partially correcting enrollment bias when client-level covariates for non-enrolled clients are unavailable. An algorithm-agnostic optimization analysis shows that incomplete selection correction can leave a non-vanishing bias floor, and experiments on synthetic federated logistic regression support this result.

Key takeaway

For research scientists developing federated learning systems, understanding and mitigating both enrollment and participation biases is crucial. Correcting participation bias alone does not remove enrollment bias, so a model can still perform noticeably worse on the true target population. Consider integrating FedIPW or its aggregate-calibration extension so that the shared model reflects the target population, especially when client enrollment is non-random.

Key insights

Selection biases in federated learning, particularly enrollment bias, can cause persistent mismatches between training and target-population objectives.

Principles

Client representativeness is critical in FL.
Incomplete bias correction leaves a bias floor.
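
A toy calculation can make the bias floor concrete. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: clients fall into two groups with fixed scalar mean updates, participation bias is already fully corrected, and enrollment bias is left uncorrected. The function name and all numbers are hypothetical.

```python
import numpy as np

def limiting_means(shares, enroll_prob, group_means):
    """Return (target mean, enrolled-population mean) for a grouped population.

    shares:      target-population share of each group (sums to 1)
    enroll_prob: probability that a client in each group is enrolled
    group_means: mean update of each group (scalars, for illustration)
    """
    shares = np.asarray(shares, dtype=float)
    enroll_prob = np.asarray(enroll_prob, dtype=float)
    group_means = np.asarray(group_means, dtype=float)

    # Mean update over the full target population.
    target = float(shares @ group_means)

    # With participation fully corrected but enrollment ignored, the
    # aggregate can at best recover the mean over enrolled clients.
    enrolled_mass = shares * enroll_prob
    enrolled = float(enrolled_mass @ group_means / enrolled_mass.sum())
    return target, enrolled

# Hypothetical numbers: two equally sized groups, unequal enrollment rates.
target, enrolled = limiting_means([0.5, 0.5], [0.9, 0.3], [1.0, -1.0])
bias_floor = abs(enrolled - target)
print(target, enrolled, bias_floor)
```

In this toy setting the gap between the two means does not depend on the number of rounds or participants, which is the sense in which more training cannot remove it.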

Method

FedIPW uses inverse-probability weighting to correct for two-stage selection bias in federated learning, recovering the target-population mean update. An aggregate-calibration extension uses population summaries for partial correction.
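
The weighting idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each participating client comes with a known (or separately estimated) enrollment probability and a per-round participation probability given enrollment, and it uses a self-normalized weighted mean. The names `ipw_aggregate`, `p_enroll`, and `p_participate` are hypothetical.

```python
import numpy as np

def ipw_aggregate(updates, p_enroll, p_participate):
    """Self-normalized inverse-probability-weighted mean of client updates.

    updates:       array of shape (k, d), one update per participating client
    p_enroll:      length-k enrollment probabilities (assumed known/estimated)
    p_participate: length-k participation probabilities given enrollment
    """
    updates = np.asarray(updates, dtype=float)
    # Two-stage selection: a client contributes only if it is enrolled
    # AND participates, so the inclusion probability is the product.
    p = np.asarray(p_enroll, dtype=float) * np.asarray(p_participate, dtype=float)
    if p.shape[0] != updates.shape[0]:
        raise ValueError("need one probability pair per update")
    if np.any((p <= 0) | (p > 1)):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    weights = 1.0 / p
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ updates           # shape (d,)

# Hypothetical round with two participants and 1-dimensional updates.
agg = ipw_aggregate([[1.0], [3.0]], p_enroll=[0.5, 1.0], p_participate=[0.5, 0.5])
print(agg)
```

Clients that were less likely to be selected receive larger weights, so under-represented parts of the population count for more in the aggregate. Very small probabilities produce very large weights, which is a common reason to clip or stabilize weights in practice.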

In practice

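Apply FedIPW when clients are selected non-randomly at enrollment, at participation, or both.
Use aggregate calibration when client-level covariates for non-enrolled clients are unavailable but target-population summaries are known.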
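
One simple form of calibration to known population summaries is post-stratification, sketched below. It is offered as an assumption about what such a reweighting could look like, not as the paper's exact procedure: it presumes each enrolled client has a single categorical attribute (for example a device tier) whose target-population shares are known. The function name and the example categories are hypothetical.

```python
import numpy as np

def poststratify_weights(groups, target_shares):
    """Weight enrolled clients so group shares match known target shares.

    groups:        length-n sequence with each enrolled client's group label
    target_shares: dict mapping group label -> share in the target population
    """
    groups = np.asarray(groups)
    unknown = set(groups.tolist()) - set(target_shares)
    if unknown:
        raise ValueError(f"no target share for groups: {sorted(unknown)}")

    weights = np.empty(len(groups), dtype=float)
    for label, share in target_shares.items():
        mask = groups == label
        if not mask.any():
            # A group that never enrolls cannot be recovered by reweighting.
            raise ValueError(f"group {label!r} has no enrolled clients")
        # Ratio of target share to enrolled share for this group.
        weights[mask] = share / mask.mean()
    return weights

# Hypothetical enrolled sample: high-tier devices are over-represented.
w = poststratify_weights(["low", "high", "high", "high"], {"low": 0.5, "high": 0.5})
print(w)
```

The error raised for a group with no enrolled clients reflects why the correction is only partial: reweighting can rebalance groups that are present but cannot represent groups that are entirely missing from the enrolled sample.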

Topics

Federated Learning
Selection Bias
Enrollment Bias
Participation Bias
FedIPW

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.