2.3.3. Data Labeling with Ground Truth and Mechanical Turk
💡 First Principle: Supervised learning requires labeled data, and label quality directly determines model quality. Poor labels create a ceiling on model performance that no amount of algorithm tuning can break through. The exam tests your knowledge of AWS tools that create, manage, and quality-control the labeling process.
Getting accurate labels at scale is one of the most expensive and time-consuming parts of ML. Consider labeling 100,000 medical images for tumor detection: you need domain experts (radiologists), quality controls (inter-annotator agreement), and scale (you can't have one radiologist label everything). AWS provides tools to manage this three-way challenge.
| Service | What It Does | Labelers | Best For |
|---|---|---|---|
| SageMaker Ground Truth | Managed labeling workflows with auto-labeling | Private workforce, third-party, or Mechanical Turk | Image classification, object detection, text classification, semantic segmentation |
| Amazon Mechanical Turk | Crowdsourced human intelligence marketplace | Global crowd workforce | High-volume simple tasks (sentiment, image tagging, transcription) |
| Amazon A2I (Augmented AI) | Human review of ML predictions | Custom workforce | Reviewing low-confidence model predictions, compliance review |
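To make the Ground Truth workflow concrete, here is a trimmed sketch of the request a `boto3` `create_labeling_job` call takes. All names, bucket paths, and ARNs are hypothetical placeholders, and a real job needs additional required fields (the pre-annotation Lambda ARN and the annotation-consolidation config for the chosen task type) that are omitted here for brevity:

```python
# Sketch of a SageMaker Ground Truth labeling job request (partial).
# Every name, S3 path, and ARN below is a hypothetical placeholder.
labeling_job_params = {
    "LabelingJobName": "product-image-classification",   # hypothetical job name
    "LabelAttributeName": "category",                    # where labels are stored
    "InputConfig": {
        "DataSource": {
            "S3DataSource": {
                # Manifest listing the unlabeled images (placeholder path)
                "ManifestS3Uri": "s3://example-bucket/manifests/input.manifest"
            }
        }
    },
    "OutputConfig": {"S3OutputPath": "s3://example-bucket/labels/"},
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleGroundTruthRole",
    "HumanTaskConfig": {
        # Private workforce here; could also be a vendor or Mechanical Turk workteam
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/example-team",
        "TaskTitle": "Classify product images",
        "TaskDescription": "Pick the category that best matches the image",
        # Multiple workers per item enables inter-annotator agreement checks
        "NumberOfHumanWorkersPerDataObject": 3,
        "TaskTimeLimitInSeconds": 300,
    },
}

# With valid credentials and the omitted fields filled in, you would submit it with:
# boto3.client("sagemaker").create_labeling_job(**labeling_job_params)
```

Note `NumberOfHumanWorkersPerDataObject`: sending each item to several labelers and consolidating their answers is how Ground Truth implements the quality control (inter-annotator agreement) discussed above.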
Ground Truth's Auto-Labeling: Ground Truth starts with human labelers for an initial batch, then trains an internal model to auto-label the remaining data. Only samples the model is uncertain about are sent to humans. This "active learning" approach can reduce labeling costs by 40-70%.
A2I (Augmented AI) is different from Ground Truth: it's not for creating training data but for reviewing model predictions in production. When an inference endpoint returns a low-confidence prediction, A2I routes it to a human reviewer. The reviewed result is returned to the caller, and the human-corrected labels can feed back into training data.
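The A2I decision point reduces to a confidence check at inference time. Below is a minimal sketch of that routing logic with a stub standing in for the real A2I `StartHumanLoop` API call; the 0.80 threshold is an assumed business rule, not an A2I default:

```python
CONFIDENCE_THRESHOLD = 0.80  # assumed threshold; A2I lets you choose your own

def handle_prediction(label, confidence, start_human_loop):
    """Return the model's answer when confident; otherwise trigger a
    human review loop (A2I-style) and flag the result as pending review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"label": label, "source": "model"}
    # In production this would call the sagemaker-a2i-runtime StartHumanLoop
    # API; here a plain callback stands in for it.
    start_human_loop({"label": label, "confidence": confidence})
    return {"label": label, "source": "pending_human_review"}

# Usage with an in-memory stub in place of the A2I client:
review_queue = []
uncertain = handle_prediction("tumor", 0.55, review_queue.append)
confident = handle_prediction("no_tumor", 0.97, review_queue.append)
```

The reviewed items in the queue are exactly the human-corrected labels the paragraph above describes feeding back into the training set.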
⚠️ Exam Trap: Ground Truth and A2I serve different purposes in the lifecycle. Ground Truth creates labeled training data (Stage 1). A2I reviews production predictions (Stage 4, between Deploy and Monitor). If a question asks about labeling data for model training, the answer is Ground Truth. If it asks about having humans review uncertain predictions, the answer is A2I.
Reflection Question: A company needs to label 500,000 product images across 50 categories. Budget is limited, and accuracy requirements are high. Which combination of AWS labeling tools would minimize cost while maintaining quality?