Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.1.1. Data Drift and Model Drift Detection

💡 First Principle: When the real world changes, the statistical patterns your model learned become stale. Data drift means the inputs have changed; model drift means the relationship between inputs and outputs has changed. Distinguishing between them determines whether you need to retrain the model, fix the data pipeline, or both.

Data drift and model drift are related but distinct problems, and the exam tests whether you know the difference:

Data drift (also called feature drift or covariate shift) occurs when the distribution of input features changes. For example, if your model was trained on data where average customer age was 35 and production traffic shifts to average age 55, that's data drift. The model hasn't degraded—the world moved away from the training data.

Model drift (also called concept drift) occurs when the relationship between features and the target variable changes. In fraud detection, new fraud techniques emerge that create patterns the model has never seen. The features may look the same, but what constitutes "fraudulent" has changed.

Prediction drift occurs when the distribution of model outputs changes, even if inputs appear stable. This often signals a subtler issue—perhaps a preprocessing bug is silently transforming data differently, or an upstream system changed its output format.
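To make these distinctions concrete, the same statistical test can flag all three kinds of drift depending on what you feed it: input features (data drift), model outputs (prediction drift), or per-segment error rates (a symptom of concept drift). Below is a minimal, library-free sketch of a two-sample Kolmogorov-Smirnov statistic, the kind of distribution-distance check drift monitors rely on. The function name and thresholds are illustrative, not any specific tool's API.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs. 0.0 = identical distributions,
    1.0 = completely disjoint distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    for v in sorted(set(a) | set(b)):
        # Empirical CDF value at v for each sample
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Training-time customer ages centered near 35; production traffic near 55
training_ages = [30 + i % 10 for i in range(100)]
production_ages = [50 + i % 10 for i in range(100)]
print(ks_statistic(training_ages, production_ages))  # large gap -> data drift
```

Running the same check against the model's score distribution instead of its input features turns it into a prediction-drift detector, which is why pinpointing *which* distribution moved is the first diagnostic step.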

SageMaker Model Monitor detects drift by comparing production data against a baseline—a statistical profile of the training data. When feature distributions shift beyond a configurable threshold, it records constraint violations and emits CloudWatch metrics, which you can wire to a CloudWatch alarm. SageMaker Clarify extends this to detect bias drift—whether the model's fairness metrics have degraded for specific demographic groups.
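In practice you configure Model Monitor through the SageMaker SDK rather than writing statistics by hand, but the baseline-versus-production comparison it performs can be illustrated with a common drift metric, the population stability index (PSI). Everything below—the function name, the binning scheme, and the 0.2 threshold rule of thumb—is an illustrative sketch, not Model Monitor's internal implementation.

```python
import math

def population_stability_index(baseline, production, num_bins=10, eps=1e-4):
    """PSI between a baseline (training) sample and a production sample.
    Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 significant drift."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant baseline

    def bin_fractions(values):
        counts = [0] * num_bins
        for v in values:
            # Clamp out-of-range production values into the edge bins
            idx = min(max(int((v - lo) / width), 0), num_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the log term stays finite
        return [max(c / len(values), eps) for c in counts]

    b = bin_fractions(baseline)
    p = bin_fractions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

DRIFT_THRESHOLD = 0.2  # beyond this, raise an alert (e.g., a CloudWatch alarm)

baseline = [i % 100 for i in range(1000)]
production = [(i % 100) + 60 for i in range(1000)]
if population_stability_index(baseline, production) > DRIFT_THRESHOLD:
    print("Feature distribution drifted beyond threshold")
```

The design mirrors Model Monitor's workflow: compute a baseline profile once from training data, then evaluate each production window against it and alert only when the divergence crosses a threshold, rather than on every small fluctuation.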

⚠️ Exam Trap: A question describing "model accuracy degradation over time" is testing drift detection (Model Monitor), not model debugging (SageMaker Debugger). Debugger is for training-time convergence issues; Model Monitor is for production-time drift. The exam frequently tests this distinction.

Reflection Question: Your e-commerce recommendation model's click-through rate has dropped 15% over two months. How would you determine whether this is caused by data drift, concept drift, or a data pipeline issue? Which specific SageMaker tools would you use?

Written by Alvin Varughese, Founder — 15 professional certifications