Data Bias Mitigation under Coverage Constraints & The Price of Fairness
Summary
A new bias mitigation framework targets discriminatory outcomes and degraded performance in machine learning models, particularly for individuals at the intersection of multiple sensitive attributes such as race and gender. It extends existing methods with coverage constraints that enforce sufficient representation of every group, including intersectional subgroups, in the training data. Because driving bias to exactly zero can be data-intensive, the framework accepts a small approximation error in bias in exchange for better data efficiency. It formulates bias mitigation as an integer linear program that optimizes the mitigation strategy and quantifies the "price of fairness": the minimum data modification cost relative to a chosen fairness tolerance. This matters for legal compliance, where specific fairness thresholds are mandated, and for data governance, where teams must weigh bias reduction against data modification costs. Evaluations on public datasets indicate that the framework preserves predictive accuracy across various classifiers and that coverage constraints matter for downstream ML performance.
Key takeaway
If you are a machine learning engineer designing fair models, integrate coverage constraints into your data preparation workflows so that intersectional subgroups are adequately represented. Doing so lets you make informed trade-offs between meeting a specific fairness threshold and the cost of modifying or purchasing data, which matters for both legal compliance and resource allocation. Quantifying the "price of fairness" also helps you justify data investments for bias reduction.
Key insights
The framework balances bias reduction with data efficiency using coverage constraints and quantifies fairness costs.
Principles
- Intersectional bias requires specific measures.
- Data representation impacts ML fairness.
- Fairness has a quantifiable data cost.
Method
The method extends an existing bias mitigation framework with coverage constraints, then formulates bias mitigation as an integer linear program to optimize the mitigation strategy and characterize the "price of fairness."
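To make the idea concrete, here is a minimal sketch of a toy integer program in this spirit. It is not the paper's formulation, which this summary does not give. The assumptions are ours: each group is described by a size and a count of positive labels, the only allowed modification is adding tuples, bias is measured as the gap between a group's positive rate and a fixed target rate, and the tiny program is solved by exhaustive enumeration instead of a real ILP solver. All names (`min_additions`, `price_of_fairness`, `GROUPS`) are hypothetical.

```python
from itertools import product

def min_additions(n, p, target, eps, k, max_add=50):
    """Cheapest (cost, a, b) for one group, or None if infeasible.

    n, p   : current group size and number of positive-label tuples
    a, b   : integer decision variables (positives / negatives to add)
    Coverage constraint : n + a + b >= k
    Bias constraint     : |(p + a) - target * (n + a + b)| <= eps * (n + a + b)
                          (linear in a and b, so this is an integer linear program)
    Objective           : minimize a + b
    """
    best = None
    for a, b in product(range(max_add + 1), repeat=2):
        size = n + a + b
        if size == 0 or size < k:
            continue  # violates the coverage constraint
        # Small slack guards against floating-point rounding at the boundary.
        if abs((p + a) - target * size) <= eps * size + 1e-9:
            if best is None or a + b < best[0]:
                best = (a + b, a, b)
    return best

def price_of_fairness(groups, target, eps, k):
    """Total number of added tuples needed at bias tolerance eps, or None."""
    total = 0
    for n, p in groups.values():
        result = min_additions(n, p, target, eps, k)
        if result is None:
            return None
        total += result[0]
    return total

# Hypothetical data: group name -> (size, positives).
GROUPS = {"majority": (100, 50), "minority": (10, 2)}

if __name__ == "__main__":
    for k in (0, 15):
        for eps in (0.2, 0.1, 0.0):
            cost = price_of_fairness(GROUPS, target=0.5, eps=eps, k=k)
            print(f"coverage k={k:2d}  tolerance eps={eps:.1f}  cost={cost}")
```

On this toy data and without a coverage requirement, the cost grows from 2 to 6 added tuples as the tolerance tightens from 0.2 to 0.0, which is the kind of cost-versus-tolerance curve the "price of fairness" describes. Raising the coverage threshold to 15 lifts the cost at tolerance 0.2 to 5, since the small group must grow regardless of its bias.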
In practice
- Use coverage constraints for subgroup representation.
- Quantify fairness costs for data purchasing.
- Balance bias reduction with data efficiency.
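As a rough illustration of the first point, the sketch below counts every combination of sensitive attribute values and reports which intersectional subgroups fall short of a coverage threshold. It assumes records are plain dictionaries and that the attribute domains can be read off the data; the function name `coverage_shortfalls`, the attribute names, and the threshold `k` are hypothetical and not taken from the original article.

```python
from collections import Counter
from itertools import product

def coverage_shortfalls(rows, attrs, k):
    """Map each under-covered subgroup to the number of tuples it is missing.

    rows  : list of dicts, one per record
    attrs : sensitive attributes whose intersections should be covered
    k     : minimum number of records required per subgroup
    """
    # Observed domain of each attribute (assumption: the data shows all values).
    domains = [sorted({row[a] for row in rows}) for a in attrs]
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    # Enumerate all combinations, including those that never occur in the data.
    return {
        combo: k - counts[combo]
        for combo in product(*domains)
        if counts[combo] < k
    }

# Hypothetical records with two sensitive attributes.
rows = (
    [{"race": "A", "gender": "F"}] * 3
    + [{"race": "A", "gender": "M"}] * 2
    + [{"race": "B", "gender": "M"}] * 1
)

if __name__ == "__main__":
    print(coverage_shortfalls(rows, ["race", "gender"], k=2))
```

Note that the subgroup with race "B" and gender "F" never appears in the data, yet it is still reported, because the check enumerates combinations of attribute values instead of only the groups that happen to be present.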
Topics
- Data Bias Mitigation
- Intersectional Fairness
- Coverage Constraints
- Integer Linear Programming
- Data Governance
- Fairness Trade-offs
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.