Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

7. Glossary

A/B Testing — Comparing two model versions in production by splitting live traffic between them. See §5.1.3.

Amazon Bedrock — Managed service for accessing and customizing foundation models from multiple providers. See §3.1.3. High exam relevance.

Amazon Comprehend — NLP service for sentiment analysis, entity recognition, and topic modeling. See §3.1.4.

Amazon Comprehend Medical — Specialized NLP service for extracting medical entities from clinical text. See §3.1.4.

Amazon EMR — Managed Hadoop/Spark framework for large-scale data processing. See §2.2.4.

Amazon Kinesis — Managed streaming data platform for real-time data ingestion and processing. See §2.1.3. High exam relevance.

Amazon Macie — Service that uses ML to discover and protect sensitive data (PII/PHI) in S3. See §5.3.4.

Amazon Rekognition — Pre-built computer vision service for image and video analysis. See §3.1.4.

Amazon SageMaker — Comprehensive ML platform covering data prep through deployment and monitoring. See §1.3. Highest exam relevance — appears in 60-80% of questions.

Amazon Textract — Service for extracting text, tables, and forms from scanned documents. See §3.1.4.

Asynchronous Endpoint — SageMaker endpoint type for large payloads (up to 1 GB) and long processing times (up to 1 hour), with auto-scale-to-zero capability. See §4.1.1. High exam relevance.

Auto Scaling — Automatically adjusting compute resources based on demand metrics. For SageMaker endpoints, commonly based on InvocationsPerInstance. See §4.2.1.
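A target-tracking policy on InvocationsPerInstance can be sketched as a config dict in the shape Application Auto Scaling expects; the endpoint and variant names here are placeholders, and the target value of 70 is illustrative:

```python
# Sketch: target-tracking scaling config for a SageMaker endpoint variant,
# mirroring the Application Auto Scaling put_scaling_policy request shape.
def invocations_scaling_policy(endpoint_name, variant_name, target_per_instance=70):
    return {
        "PolicyName": f"{endpoint_name}-invocations-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_per_instance,  # avg InvocationsPerInstance to hold
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # scale in cautiously
            "ScaleOutCooldown": 60,  # scale out quickly under load
        },
    }

policy = invocations_scaling_policy("demo-endpoint", "AllTraffic")
```

After registering the variant as a scalable target, this dict would be passed to `boto3.client("application-autoscaling").put_scaling_policy(**policy)`.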

Automatic Model Tuning (AMT) — SageMaker's hyperparameter optimization service using Bayesian optimization, random search, or grid search. See §3.2.2.
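A minimal sketch of the tuning-job config, in the shape the `CreateHyperParameterTuningJob` API expects; the metric name and parameter range are placeholders:

```python
# Sketch: HyperParameterTuningJobConfig using AMT's default Bayesian strategy.
tuning_config = {
    "Strategy": "Bayesian",  # alternatives: "Random", "Grid", "Hyperband"
    "HyperParameterTuningJobObjective": {
        "Type": "Minimize",
        "MetricName": "validation:rmse",  # placeholder objective metric
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        # Fewer parallel jobs lets Bayesian search learn more from each round.
        "MaxParallelTrainingJobs": 2,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "learning_rate", "MinValue": "0.001",
             "MaxValue": "0.1", "ScalingType": "Logarithmic"}
        ]
    },
}
```

Note that min/max values are strings in this API, and log scaling suits parameters like learning rate that span orders of magnitude.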

AWS CloudFormation — Infrastructure as code service using declarative JSON/YAML templates. See §4.2.2.

AWS CloudTrail — Service that logs all API calls in an AWS account for audit and compliance. See §5.2.1, §5.3.4. High exam relevance for security questions.

AWS CodeBuild — Managed build service for compiling code, running tests, and producing artifacts. See §4.3.1.

AWS CodeDeploy — Service for automating application deployments with strategies like blue/green and canary. See §4.3.1.

AWS CodePipeline — CI/CD orchestration service for automating release pipelines. See §4.3.1.

AWS Config — Service that monitors and evaluates resource configurations against compliance rules. See §5.3.4.

AWS Cost Explorer — Tool for visualizing and analyzing AWS spending. See §5.2.2.

AWS Glue — Serverless ETL service for data integration, built on Apache Spark. See §2.2.4. High exam relevance.

AWS Glue DataBrew — Visual data preparation tool with 250+ built-in transformations. See §2.2.4.

AWS Glue Data Quality — Service for defining and monitoring data quality rules. See §2.3.2.

AWS IAM — Identity and Access Management service for controlling access to AWS resources. See §5.3.1. High exam relevance for security questions.

AWS KMS — Key Management Service for creating and controlling encryption keys. See §5.3.3. High exam relevance.

AWS Lambda — Serverless compute service for running code without managing servers. See §4.1.1.

Batch Transform — SageMaker feature for large-scale offline inference without maintaining a persistent endpoint. See §4.1.1.

Bayesian Optimization — Hyperparameter search strategy that uses past results to intelligently choose next parameter combinations. Default strategy for SageMaker AMT. See §3.2.2.

Bias Drift — Change in model fairness metrics over time in production, detected by Model Monitor with Clarify. See §5.1.1.

Blue/Green Deployment — Deployment strategy maintaining two environments; traffic switches from old (blue) to new (green) with instant rollback capability. See §4.3.3. High exam relevance.

Bring Your Own Container (BYOC) — Custom Docker container for SageMaker training or inference when pre-built containers don't support your framework. See §4.1.3.

Canary Deployment — Deployment strategy that routes a small percentage of traffic to the new version, monitoring for errors before increasing. See §4.3.3.
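For SageMaker endpoints, a canary shift is expressed through a DeploymentConfig; this sketch follows that API's shape, with the alarm name as a placeholder:

```python
# Sketch: SageMaker DeploymentConfig for a canary traffic shift.
# 10% of traffic moves to the new fleet first; a CloudWatch alarm
# firing during the wait interval triggers automatic rollback.
def canary_deployment_config(alarm_name, canary_percent=10, bake_seconds=600):
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": bake_seconds,  # monitor before full shift
            },
            "TerminationWaitInSeconds": 300,  # keep old fleet briefly for rollback
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": alarm_name}]
        },
    }

cfg = canary_deployment_config("endpoint-5xx-alarm")
```

Setting `Type` to `"ALL_AT_ONCE"` instead would give the plain blue/green shift described above.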

Class Imbalance (CI) — Pre-training bias metric measuring uneven distribution of target classes in training data. See §2.3.1.

CloudWatch — Monitoring service for AWS resources, providing metrics, logs, and alarms. See §5.2.1. High exam relevance.

Concept Drift — Change in the relationship between input features and target variable over time. See §5.1.1.

Confusion Matrix — Table comparing predicted vs. actual classifications, enabling calculation of precision, recall, and F1 score. See §3.3.1.
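The derived metrics follow directly from the four cell counts, as this small illustration shows:

```python
# Precision, recall, F1, and accuracy from binary confusion-matrix counts.
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

m = classification_metrics(tp=80, fp=20, fn=20, tn=880)
# → precision 0.8, recall 0.8, f1 0.8, accuracy 0.96
```

Note how accuracy (0.96) flatters the model on this imbalanced data while precision/recall/F1 tell the real story.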

Customer-Managed Key (CMK) — KMS key created and managed by the customer, providing full control over key policies. See §5.3.3.

Data Drift — Change in the statistical distribution of input features compared to training data. See §5.1.1. High exam relevance.

Data Wrangler — SageMaker visual tool for data exploration, transformation, and feature engineering with minimal code. See §2.2.4, §1.3.2. High exam relevance.

Difference in Proportions of Labels (DPL) — Pre-training bias metric measuring label imbalance between demographic groups. See §2.3.1.

Distributed Training — Splitting training across multiple GPUs or instances using data parallelism or model parallelism. See §3.2.4.

Dropout — Regularization technique that randomly disables neurons during training to prevent overfitting. See §3.2.3.
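The mechanism can be sketched in a few lines (the "inverted dropout" variant, which rescales survivors so inference needs no adjustment):

```python
import random

# Inverted dropout: zero each activation with probability p, scale survivors
# by 1/(1-p) so the expected activation is unchanged. Applied only in training.
def dropout(activations, p=0.5, rng=None):
    rng = rng or random.Random(0)  # fixed seed here for reproducibility
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([0.2, 0.9, 0.5, 0.1], p=0.5)
```

Each output is either 0.0 (dropped) or double its input (kept and rescaled, since keep = 0.5).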

Early Stopping — Training technique that halts training when validation loss stops improving, preventing overfitting and saving compute. See §3.2.2.
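The usual "patience" formulation can be sketched as follows, with an illustrative loss curve:

```python
# Early stopping with patience: halt when validation loss hasn't improved
# for `patience` consecutive epochs; keep the best epoch seen so far.
def train_with_early_stopping(val_losses, patience=3):
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # loss has stalled; stop training
    return best_epoch, best

# Loss bottoms out at epoch 3 then climbs, so training halts at epoch 6.
epoch, loss = train_with_early_stopping([0.9, 0.6, 0.5, 0.45, 0.47, 0.50, 0.52])
# → (3, 0.45)
```

In SageMaker, AMT offers the same idea per training job via its early-stopping option.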

Epoch — One complete pass through the entire training dataset. See §3.2.1.

F1 Score — Harmonic mean of precision and recall, balancing both metrics. See §3.3.1.

Feature Engineering — Process of creating, transforming, and selecting input variables to improve model performance. See §2.2.2.

Feature Store — SageMaker service for storing, sharing, and reusing features across teams and models. See §1.3.2, §2.2.4.

Foundation Model — Large pre-trained model that can be fine-tuned for specific tasks (e.g., via Bedrock or JumpStart). See §3.1.3.

Ground Truth — SageMaker service for creating labeled training datasets using human annotators. See §2.3.3.

Hyperparameter — Model configuration value set before training (e.g., learning rate, number of trees). See §3.2.2.

Inference Recommender — SageMaker tool that load-tests models across instance types to find optimal cost-performance. See §5.2.2.

JumpStart — SageMaker hub of pre-trained models, solution templates, and example notebooks. See §3.1.3.

L1/L2 Regularization — Techniques that add penalty terms to the loss function to prevent overfitting. L1 promotes sparsity; L2 penalizes large weights. See §3.2.3.
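The two penalties differ only in how weights enter the term added to the loss:

```python
# L1 sums |w| (can drive weights to exactly zero, i.e. sparsity);
# L2 sums w^2 (shrinks large weights smoothly but rarely to zero).
def l1_penalty(weights, lam=0.01):
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.01):
    return lam * sum(w * w for w in weights)

w = [3.0, -4.0, 0.5]
# l1_penalty(w) → 0.075   (0.01 * 7.5)
# l2_penalty(w) → 0.2525  (0.01 * 25.25)
```

Note how L2 punishes the large weight (-4.0) far more heavily than L1 does, which is why L2 discourages large weights while L1 promotes sparsity.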

Managed Spot Training — SageMaker feature for using Spot Instances for training with automatic checkpointing and resumption. See §5.2.2.

Model Monitor — SageMaker service for continuous monitoring of deployed models, detecting data quality, model quality, bias, and feature attribution drift. See §5.1.2. High exam relevance.

Model Registry — SageMaker service for versioning, cataloging, and managing approval workflows for models. See §3.2.5.

Network Isolation — SageMaker configuration that prevents containers from making any outbound network calls. See §5.3.2.

One-Hot Encoding — Encoding technique that converts categorical values into binary vectors. See §2.2.3.
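A minimal illustration, one binary indicator column per distinct category:

```python
# One-hot encode a categorical column: each value becomes a binary vector
# with a 1 in the position of its category (categories sorted for stability).
def one_hot(values):
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

encoded = one_hot(["red", "green", "red", "blue"])
# categories: [blue, green, red]
# → [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Unlike label encoding, this avoids implying a false ordering between categories.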

Overfitting — Model performs well on training data but poorly on unseen data due to memorizing noise. See §3.2.3.

Parquet — Columnar data format optimized for analytics and ML workloads with efficient compression. See §2.1.1.

Pipe Mode — SageMaker training mode that streams data directly from S3 rather than downloading to local disk. See §3.2.4.

Production Variant — A model version deployed behind a SageMaker endpoint that receives a configurable percentage of traffic. See §5.1.3.

Regularization — Techniques to prevent model overfitting by constraining model complexity. See §3.2.3.

RMSE (Root Mean Square Error) — Regression metric measuring the square root of average squared prediction errors. See §3.3.1.
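The computation in a few lines, with illustrative values:

```python
import math

# RMSE: square the errors, average, then take the root. The result is in
# the same units as the target, and large errors are penalized quadratically.
def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

error = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
# errors 1, 0, -2 → mean square 5/3 → RMSE ≈ 1.291
```

The squaring step is why RMSE is more sensitive to outliers than mean absolute error.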

ROC-AUC — Classification metric measuring the area under the receiver operating characteristic curve, evaluating model discrimination ability. See §3.3.1.

SageMaker Clarify — Service for detecting bias in data and models, and explaining model predictions using SHAP values. See §3.3.2, §5.1.2. High exam relevance.

SageMaker Debugger — Service for debugging training convergence issues by capturing tensor data during training. See §3.3.3.

SageMaker Neo — Service for compiling and optimizing models for deployment on edge devices. See §4.1.4.

SageMaker Pipelines — ML workflow orchestration service for building, automating, and managing end-to-end ML pipelines. See §4.3.2. High exam relevance.

Script Mode — SageMaker training approach where you provide your own training script with a supported framework (TensorFlow, PyTorch). See §3.2.1.

Serverless Endpoint — SageMaker endpoint type that auto-scales to zero when idle, suitable for intermittent traffic. See §4.1.1.

Shadow Variant — A model version that receives production traffic copies for evaluation but doesn't serve predictions to users. See §5.1.3.

SHAP Values (Shapley Values) — Method for explaining individual predictions by quantifying each feature's contribution. See §3.3.2.
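The definition can be shown exactly on a toy model: a feature's Shapley value is its marginal contribution to the prediction, averaged over all orderings in which features are "switched on". (Clarify and practical SHAP libraries approximate this; the exhaustive version below is only feasible for a handful of features.)

```python
from itertools import permutations

# Exact Shapley values for a tiny model: average each feature's marginal
# contribution over all feature orderings, starting from a baseline input.
def shapley_values(model, x, baseline):
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        current = list(baseline)
        for i in order:
            before = model(current)
            current[i] = x[i]               # switch feature i on
            phi[i] += model(current) - before
    return [p / len(perms) for p in phi]

# Toy additive model (an assumption for illustration): weighted feature sum.
model = lambda f: 2 * f[0] + 3 * f[1] + f[2]
phi = shapley_values(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# → [2.0, 3.0, 1.0]; contributions sum to model(x) - model(baseline)
```

The additivity property visible here (contributions sum to the prediction gap versus the baseline) holds for any model, which is what makes SHAP attributions easy to interpret.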

SMOTE — Synthetic Minority Over-sampling Technique for addressing class imbalance by generating synthetic training examples. See §2.3.1.
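The core idea is interpolation between minority-class samples; this simplified sketch picks a random other minority sample where real SMOTE would use one of the k nearest neighbors:

```python
import random

# Simplified SMOTE: synthesize a minority point by interpolating between a
# sample and another minority sample (stand-in for a k-NN neighbor).
def smote_oversample(minority, n_new, rng=None):
    rng = rng or random.Random(42)  # fixed seed for reproducibility
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice([m for m in minority if m is not a])
        t = rng.random()  # position along the segment from a to b
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

new_points = smote_oversample([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]], n_new=5)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the minority region rather than merely duplicating existing rows, which is SMOTE's advantage over naive oversampling.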

Spot Instances — EC2 instances available at up to 90% discount but subject to 2-minute interruption notice. See §5.2.2.

Underfitting — Model is too simple to capture underlying data patterns, performing poorly on both training and test data. See §3.2.3.

VPC Mode — SageMaker configuration where training jobs and endpoints run inside a customer's VPC for network isolation. See §5.3.2. High exam relevance.

Written by Alvin Varughese, Founder. 15 professional certifications.