3.3.2. Model Bias Detection with SageMaker Clarify
💡 First Principle: A model can amplify biases present in training data, producing predictions that systematically disadvantage certain groups. SageMaker Clarify detects these biases both before training (data-level) and after training (prediction-level), providing specific metrics and explanations. The exam tests both when to use Clarify and how to interpret its outputs.
Post-Training Bias Metrics (different from the pre-training metrics in 2.3.1):
| Metric | What It Measures | Concern If |
|---|---|---|
| Disparate Impact (DI) | Ratio of positive-prediction rates between groups (disadvantaged ÷ advantaged) | Far from 1.0; a common rule of thumb flags values below 0.8 |
| Conditional Demographic Disparity (CDD) | Disparity conditioned on other attributes | Significant after controlling for legitimate factors |
| Counterfactual Fliptest | Whether changing a protected attribute changes the prediction | High flip rate |
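To make the table concrete, here is a minimal sketch of two of these metrics on toy data. The predictions, the `toy_model` function, and the group labels are invented for illustration. The flip rate below uses a simple attribute swap, which is a simplification and not necessarily how Clarify computes its flip test.

```python
def positive_rate(predictions):
    """Fraction of predictions equal to the positive outcome (1)."""
    return sum(predictions) / len(predictions)


def disparate_impact(preds_disadvantaged, preds_advantaged):
    """Ratio of positive-prediction rates: disadvantaged / advantaged."""
    return positive_rate(preds_disadvantaged) / positive_rate(preds_advantaged)


def flip_rate(model, rows, attribute, swap):
    """Share of rows whose prediction changes when only `attribute` is swapped."""
    flips = 0
    for row in rows:
        counterfactual = dict(row)
        counterfactual[attribute] = swap[row[attribute]]
        if model(row) != model(counterfactual):
            flips += 1
    return flips / len(rows)


def toy_model(row):
    """Hypothetical biased model: Group A faces a lower income threshold."""
    threshold = 50 if row["group"] == "A" else 70
    return int(row["income"] >= threshold)


# Hypothetical model outputs: 8 of 10 approved in Group A, 4 of 10 in Group B.
preds_a = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
preds_b = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
di = disparate_impact(preds_b, preds_a)

rows = [
    {"group": "A", "income": 60},
    {"group": "A", "income": 80},
    {"group": "B", "income": 60},
    {"group": "B", "income": 40},
]
flips = flip_rate(toy_model, rows, "group", {"A": "B", "B": "A"})

print(f"Disparate Impact: {di:.2f}")
print(f"Flip rate: {flips:.2f}")
```

In this toy data, Group B's approval rate is half of Group A's, and the two applicants with an income of 60 receive a different decision when only their group changes.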
SHAP Values (SHapley Additive exPlanations): Clarify uses SHAP to explain individual predictions by showing which features contributed most and in which direction. This is critical for interpretability: you can tell a loan applicant not just that they were rejected, but which factors (income, credit history, employment length) drove the decision and by how much.
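The idea behind a Shapley attribution can be sketched with a brute-force calculation on a tiny model. This is an illustration of the concept only, not Clarify's implementation, which approximates these values rather than enumerating every feature ordering. The `toy_score` model, the applicant, and the baseline are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features switch from baseline to instance."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(row):
    """Hypothetical linear credit score, used only for illustration."""
    return 2 * row["income"] + 3 * row["credit_history"] + row["employment_years"]


applicant = {"income": 5, "credit_history": 4, "employment_years": 2}
baseline = {"income": 3, "credit_history": 3, "employment_years": 6}

values = shapley_values(toy_score, applicant, baseline)
for name, value in values.items():
    print(f"{name}: {value:+.1f}")
```

The attributions sum to the difference between the applicant's score and the baseline's score, which is what lets you say how much each factor moved the decision and in which direction. Here short employment pulls the score down while income and credit history push it up.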
Partial Dependence Plots (PDPs): Show how a single feature's value affects predictions on average, across the entire dataset. Useful for understanding the learned relationship between a feature and the target.
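A partial dependence curve can be computed by hand: fix the feature of interest at each grid value for every row, then average the model's predictions. The sketch below assumes a hypothetical `toy_score` model and a two-row dataset, so it shows the mechanics rather than Clarify's own output format.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and
    average the model's predictions across the dataset."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)
            modified[feature] = value
            total += model(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_score(row):
    """Hypothetical model: income counts twice as much as credit."""
    return 2 * row["income"] + row["credit"]


rows = [
    {"income": 1, "credit": 2},
    {"income": 3, "credit": 4},
]

curve = partial_dependence(toy_score, rows, "income", [0, 1, 2])
for value, average in curve:
    print(f"income={value}: average prediction {average}")
```

Plotting the grid values against the averages gives the PDP. For this toy model the line rises by 2 for each unit of income, which matches the coefficient the model applies to that feature.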
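⚠️ Exam Trap: Clarify's pre-training bias detection (data-level) and post-training bias detection (model-level) use different metrics. Pre-training uses CI (Class Imbalance) and DPL (Difference in Proportions of Labels), which describe the data distribution. Post-training uses DI and CDD, which describe prediction outcomes. Note that CDD comes in two variants: one computed on observed labels (CDDL, pre-training) and one computed on predicted labels (CDDPL, post-training). A question about "bias in the training data" points to pre-training metrics. A question about "biased predictions" points to post-training metrics.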
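The pre-training metrics need only the dataset's labels and group membership, with no model involved. A minimal sketch on invented labels, assuming the usual definitions (CI as the normalized difference in group sizes, DPL as the difference in positive-label rates):

```python
def class_imbalance(n_advantaged, n_disadvantaged):
    """CI: normalized difference in group sizes, in the range [-1, 1]."""
    return (n_advantaged - n_disadvantaged) / (n_advantaged + n_disadvantaged)


def difference_in_proportions_of_labels(labels_advantaged, labels_disadvantaged):
    """DPL: positive-label rate of the advantaged group minus that of the
    disadvantaged group, computed on observed labels (not predictions)."""
    rate_advantaged = sum(labels_advantaged) / len(labels_advantaged)
    rate_disadvantaged = sum(labels_disadvantaged) / len(labels_disadvantaged)
    return rate_advantaged - rate_disadvantaged


# Hypothetical training labels (1 = loan approved historically).
labels_a = [1, 0, 1, 0, 1, 0]  # 6 rows, half approved
labels_d = [1, 0, 0, 0]        # 4 rows, a quarter approved

ci = class_imbalance(len(labels_a), len(labels_d))
dpl = difference_in_proportions_of_labels(labels_a, labels_d)

print(f"CI:  {ci:.2f}")
print(f"DPL: {dpl:.2f}")
```

CI reflects how many rows each group has, while DPL reflects how often each group receives the positive label. Neither says anything about what a trained model will predict, which is why post-training metrics are measured separately.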
Reflection Question: A lending model has DPL of 0.0 (positive labels occur at the same rate in both demographic groups in the training data) but Disparate Impact of 0.6 (the model approves loans for Group B at only 0.6× the rate of Group A, so Group A is approved about 1.67× as often). How is this possible, and what does it mean?