Bias and Fairness Testing
Algorithmic bias occurs when an ML model produces systematically different outcomes for different groups, particularly along protected characteristics such as race, gender, age, or disability status. Fairness testing measures these disparities and determines whether they exceed acceptable thresholds.
No single definition of fairness satisfies all contexts. Practitioners must choose fairness criteria that align with the deployment context, legal requirements, and stakeholder values. Many fairness metrics are mathematically incompatible: for example, when base rates differ between groups, a classifier cannot simultaneously satisfy calibration and equalized odds except in degenerate cases (the impossibility theorem), so tradeoffs are inevitable.
Fairness Metrics
| Metric | Definition | Formula | When to Use |
|---|---|---|---|
| Demographic Parity | The probability of a positive prediction is equal across groups. Also called statistical parity or independence. | P(Y_hat=1|A=a) = P(Y_hat=1|A=b) | When equal selection rates are legally or ethically required regardless of base rates. |
| Equalized Odds | True positive rate and false positive rate are equal across groups. Also called separation. | P(Y_hat=1|Y=y,A=a) = P(Y_hat=1|Y=y,A=b) for y in {0,1} | When errors should be distributed equally across groups. Common in criminal justice, hiring. |
| Equal Opportunity | True positive rate is equal across groups. A relaxation of equalized odds focusing only on positive outcomes. | P(Y_hat=1|Y=1,A=a) = P(Y_hat=1|Y=1,A=b) | When it is most important that qualified individuals are treated equally. |
| Predictive Parity | Positive predictive value (precision) is equal across groups. Also called sufficiency. | P(Y=1|Y_hat=1,A=a) = P(Y=1|Y_hat=1,A=b) | When a positive prediction should mean the same thing regardless of group. |
| Calibration | Predicted probabilities correspond to actual outcome rates within each group. A model predicting 70% should be correct 70% of the time for all groups. | P(Y=1|S=s,A=a) = P(Y=1|S=s,A=b) for all scores s | When risk scores are used for decision-making (e.g., lending, recidivism). |
| Counterfactual Fairness | The prediction would remain the same if the individual had belonged to a different group, all else being equal. Requires a causal model. | Y_hat_{A<-a}(U) = Y_hat_{A<-b}(U) | When individual-level fairness is required and a causal model is available. |
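The group-level metrics above reduce to simple conditional rates. A minimal sketch (pure Python, with illustrative toy data) of the demographic parity and equal opportunity gaps as defined in the table:

```python
# Demographic parity gap: |P(Y_hat=1|A=a) - P(Y_hat=1|A=b)|
# Equal opportunity gap:  |P(Y_hat=1|Y=1,A=a) - P(Y_hat=1|Y=1,A=b)|
# Helper names and data are illustrative, not from any specific library.

def selection_rate(y_pred, group, a):
    """P(Y_hat = 1 | A = a): share of positive predictions in group a."""
    preds = [p for p, g in zip(y_pred, group) if g == a]
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred, group, a):
    """P(Y_hat = 1 | Y = 1, A = a): recall within group a."""
    hits = [p for t, p, g in zip(y_true, y_pred, group) if g == a and t == 1]
    return sum(hits) / len(hits)

# Toy data: binary labels, predictions, and a binary protected attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = abs(selection_rate(y_pred, group, "a")
             - selection_rate(y_pred, group, "b"))      # 0.50 vs 0.75 -> 0.25
eo_gap = abs(true_positive_rate(y_true, y_pred, group, "a")
             - true_positive_rate(y_true, y_pred, group, "b"))
```

A gap of zero means the metric is satisfied exactly; in practice a tolerance is chosen (see the disparity thresholds in the methodology below).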
Intersectional Analysis
Intersectional analysis evaluates model performance across combinations of protected attributes (e.g., Black women, elderly Asian men) rather than each attribute in isolation. This is critical because:
- A model may appear fair on gender and race separately but show significant bias at the intersection (e.g., fair for men overall, fair for Black people overall, but unfair for Black women specifically).
- Intersectional subgroups are often smaller, leading to higher variance in performance estimates. Report confidence intervals (e.g., via bootstrapping) so that noise is not mistaken for bias.
- The number of intersectional groups grows combinatorially. Prioritize groups most likely to be harmed based on domain knowledge.
- Report sample sizes for each subgroup so readers can assess statistical reliability.
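The steps above can be sketched as a disaggregation pass that keys metrics on tuples of protected attributes and reports sample sizes alongside each estimate (data and attribute values here are hypothetical):

```python
# Intersectional disaggregation sketch: compute per-subgroup accuracy and
# sample size, keyed on combinations of protected attributes.
from collections import defaultdict

def disaggregate(y_true, y_pred, attrs):
    """Return {attr_combo: (accuracy, n)} for each intersectional subgroup."""
    buckets = defaultdict(list)
    for t, p, key in zip(y_true, y_pred, attrs):
        buckets[key].append(t == p)
    return {key: (sum(hits) / len(hits), len(hits))
            for key, hits in buckets.items()}

# Toy data: each row carries a (gender, race) tuple.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0]
attrs  = [("F", "Black"), ("F", "Black"), ("F", "White"),
          ("M", "Black"), ("M", "White"), ("F", "Black")]

report = disaggregate(y_true, y_pred, attrs)
# report[("F", "Black")] pairs subgroup accuracy with its sample size,
# so a reader can judge whether an apparent disparity is statistically reliable.
```

Subgroups with very small `n` should be flagged rather than over-interpreted, per the variance caveat above.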
Fairness Testing Tools
| Tool | Developer | Language | Key Features | License |
|---|---|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Python | 70+ fairness metrics, 10+ bias mitigation algorithms (pre-processing, in-processing, post-processing), interactive web demo. | Apache 2.0 |
| Fairlearn | Microsoft | Python | Metrics dashboard, constraint-based mitigation (ExponentiatedGradient, ThresholdOptimizer), scikit-learn compatible API. | MIT |
| Aequitas | University of Chicago DSAPP | Python | Audit tool focused on policy context. Generates bias audit reports with group-level metrics. Web-based interface. | MIT |
| What-If Tool | Google PAIR | Python / JS | Interactive visual exploration of ML model behavior. Fairness analysis, counterfactual exploration, partial dependence plots. Integrates with TensorBoard. | Apache 2.0 |
| Responsible AI Toolbox | Microsoft | Python | Unified dashboard combining error analysis, fairness assessment, model interpretability, and counterfactual analysis. | MIT |
Testing Methodology
- Define protected attributes relevant to the deployment context (e.g., gender, race, age, disability). Consider legal requirements in the jurisdiction.
- Select fairness metrics aligned with the harm model. Allocation harms (who gets what) favor demographic parity. Quality-of-service harms (who gets accurate results) favor equalized odds.
- Compute disaggregated metrics for each group and intersectional subgroup. Report sample sizes alongside metrics.
- Set disparity thresholds for acceptable differences (e.g., the four-fifths (80%) rule from the US EEOC's Uniform Guidelines, or a maximum 5% difference in TPR between groups).
- Apply mitigation if thresholds are exceeded: pre-processing (resampling, reweighting), in-processing (adversarial debiasing, constrained optimization), or post-processing (threshold adjustment, reject-option classification).
- Document results in the model card, including metrics before and after mitigation, chosen thresholds, and tradeoffs made.
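The disparity-threshold step can be sketched as a simple check. Here is a hedged illustration of the four-fifths (80%) rule, which compares each group's selection rate against the most-favored group's rate; group names and rates are illustrative:

```python
# Four-fifths rule sketch: a group "passes" if its selection rate is at least
# 80% of the highest group's selection rate. Thresholds and data are examples.

def four_fifths_check(selection_rates, threshold=0.8):
    """Return {group: passes} comparing each rate to the max group rate."""
    reference = max(selection_rates.values())
    return {group: rate / reference >= threshold
            for group, rate in selection_rates.items()}

rates = {"group_a": 0.60, "group_b": 0.45, "group_c": 0.30}
result = four_fifths_check(rates)
# group_b: 0.45 / 0.60 = 0.75 < 0.8 -> fails
# group_c: 0.30 / 0.60 = 0.50 < 0.8 -> fails
```

Failures at this step would trigger the mitigation options listed above (pre-, in-, or post-processing), with before/after metrics recorded in the model card.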
Next: AI red teaming covers adversarial testing techniques for language models and generative AI.