Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

6.3. Mixed-Topic Practice Questions

These questions span multiple domains to simulate the actual exam experience. Each includes a detailed rationale.

Question 1. A media company trains a content recommendation model weekly on user interaction data stored in Amazon S3. The training job takes 6 hours on an ml.p3.2xlarge instance. The team wants to reduce training costs by at least 50% without significantly impacting training time. Which approach meets this requirement?

A) Switch to an ml.m5.xlarge instance to reduce per-hour cost
B) Use SageMaker managed Spot Training with checkpointing enabled
C) Purchase a 1-year Reserved Instance for ml.p3.2xlarge
D) Move the training data to Amazon EBS for faster I/O

Answer: B. Managed Spot Training uses Spot Instances, which typically cost 70-90% less than On-Demand, with SageMaker automatically handling interruptions and resuming from checkpoints. Option A would lower the per-hour cost but dramatically increase training time (CPU vs. GPU for ML training). Option C typically saves 30-40%, short of the 50% target. Option D addresses I/O performance, not cost.
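The mechanics behind option B can be sketched as the Spot-related fields of a CreateTrainingJob request. The bucket name and time limits below are illustrative assumptions; the key points are EnableManagedSpotTraining, a MaxWaitTimeInSeconds at least as large as the runtime limit, and a CheckpointConfig so interrupted jobs can resume.

```python
# Sketch of the Spot-related portion of a SageMaker CreateTrainingJob
# request. Values here are assumptions for illustration.
def spot_training_config(max_run_secs, max_wait_secs, checkpoint_s3_uri):
    """Build the fields that enable managed Spot Training with checkpoints."""
    if max_wait_secs < max_run_secs:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_secs,
            "MaxWaitTimeInSeconds": max_wait_secs,
        },
        # S3 location where the training script writes/reads checkpoints
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }

cfg = spot_training_config(
    max_run_secs=6 * 3600,   # the 6-hour job from the question
    max_wait_secs=8 * 3600,  # extra headroom for Spot interruptions
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # hypothetical bucket
)
```

These fields would be merged into the full request passed to `boto3.client("sagemaker").create_training_job(...)` along with the algorithm, role, and data channels.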


Question 2. A financial services company deploys a fraud detection model on a SageMaker real-time endpoint. After three months, the model's false positive rate increases from 2% to 8%, while the false negative rate remains stable. No code changes have been made. What is the most likely cause and the appropriate first diagnostic step?

A) Model overfitting — retrain with more regularization
B) Data drift in input features — check SageMaker Model Monitor data quality reports
C) Concept drift — collect new labeled data and retrain
D) Infrastructure issue — check CloudWatch endpoint metrics

Answer: B. The false positive rate increased while the false negative rate stayed stable, suggesting input feature distributions have changed (data drift) rather than the underlying fraud patterns changing (concept drift). Model Monitor data quality reports would reveal which features have drifted. Option A is incorrect because overfitting is a training-time issue. Option C is plausible but premature—diagnose before retraining. Option D is incorrect because infrastructure issues would affect latency/availability, not prediction accuracy.
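The kind of check Model Monitor's data quality reports automate can be illustrated with a simplified drift test: compare a feature's live distribution against its training-time baseline. The sample values and the alert threshold below are assumptions for illustration, not Model Monitor's actual statistics.

```python
import statistics

# Simplified illustration of a data-drift check: how far has the live
# mean moved, measured in baseline standard deviations?
def mean_shift(baseline, current):
    """Shift of the current mean relative to the baseline std deviation."""
    b_mean = statistics.mean(baseline)
    b_std = statistics.stdev(baseline)
    return abs(statistics.mean(current) - b_mean) / b_std

baseline = [100, 110, 95, 105, 102, 98]   # captured at training time
current = [160, 170, 155, 165, 162, 158]  # drifted live values
drifted = mean_shift(baseline, current) > 3.0  # assumed alert threshold
```

In practice, Model Monitor computes baseline statistics from the training data automatically and flags violations per feature, which is exactly what the diagnostic step in option B would surface.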


Question 3. A healthcare startup wants to extract medical entities (diagnoses, medications, procedures) from unstructured clinical notes. They have no ML expertise on the team and need a solution deployed within two weeks. Which approach is most appropriate?

A) Train a custom NER model using SageMaker with a labeled clinical dataset
B) Fine-tune a foundation model on Amazon Bedrock with clinical examples
C) Use Amazon Comprehend Medical to extract medical entities
D) Use Amazon Textract to extract text and then classify entities with Lambda

Answer: C. Amazon Comprehend Medical is a pre-built AI service specifically designed to extract medical entities from clinical text—no ML expertise or training data required, deployable immediately. Option A requires ML expertise and labeled data (weeks to months of effort). Option B requires some ML knowledge for fine-tuning. Option D extracts text from documents but doesn't understand medical entities.
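Comprehend Medical's DetectEntitiesV2 API returns entities tagged with categories such as MEDICAL_CONDITION and MEDICATION. A minimal sketch of consuming that response: the sample below is hand-built to match the API's response shape rather than captured from a live call, and the live invocation is shown only in a comment.

```python
# Group a DetectEntitiesV2-style response by entity category.
def group_entities(response):
    grouped = {}
    for ent in response["Entities"]:
        grouped.setdefault(ent["Category"], []).append(ent["Text"])
    return grouped

# A live call would look like:
#   client = boto3.client("comprehendmedical")
#   response = client.detect_entities_v2(Text=clinical_note)
sample = {  # hand-built sample shaped like the real response
    "Entities": [
        {"Text": "type 2 diabetes", "Category": "MEDICAL_CONDITION"},
        {"Text": "metformin", "Category": "MEDICATION"},
    ]
}
by_category = group_entities(sample)
```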


Question 4. An ML team has a SageMaker Pipeline that retrains a model daily. They want to ensure that a new model is only deployed to production if it outperforms the current production model on a held-out test set. The deployment should gradually shift traffic to the new model. Which combination of services achieves this?

A) SageMaker Model Registry with approval gates + CodeDeploy with canary deployment
B) SageMaker Pipelines with a condition step + SageMaker endpoint with production variants
C) SageMaker Experiments to compare runs + manual deployment approval
D) Lambda function to compare metrics + CloudFormation to update the endpoint

Answer: B. SageMaker Pipelines supports condition steps that evaluate whether a trained model's metrics exceed a threshold—this gates deployment automatically. Combined with production variants on the SageMaker endpoint, traffic can be gradually shifted to the new model (canary-style). Option A is close but adds unnecessary complexity with CodeDeploy when SageMaker's built-in variant traffic shifting achieves the same result. Option C requires manual intervention. Option D is over-engineered.
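The two mechanisms in option B can be sketched in plain Python: a metric gate playing the role of a Pipelines condition step, and the sequence of variant weights used to shift traffic gradually on a multi-variant endpoint. The metric values and step count are assumptions for illustration.

```python
# Gate: deploy only if the candidate beats production on the held-out set.
def passes_gate(candidate_auc, production_auc, min_improvement=0.0):
    return candidate_auc > production_auc + min_improvement

# Canary schedule: (current_variant_weight, new_variant_weight) pairs
# for a gradual traffic shift.
def canary_weights(steps):
    for i in range(1, steps + 1):
        new = i / steps
        yield round(1 - new, 2), round(new, 2)

schedule = []
if passes_gate(candidate_auc=0.91, production_auc=0.88):  # assumed metrics
    schedule = list(canary_weights(steps=4))
    # Each pair would be applied to the endpoint with
    # boto3.client("sagemaker").update_endpoint_weights_and_capacities(...)
```

In the real pipeline, the gate is expressed declaratively as a ConditionStep over the evaluation step's metrics, and the weight updates are applied to the endpoint's production variants.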


Question 5. A company processes customer support tickets using a text classification model. The model runs on a SageMaker serverless endpoint. During business hours (9 AM – 6 PM), the endpoint receives 100+ requests per minute. Outside business hours, it receives fewer than 1 request per hour. Users report that the first request each morning takes 15 seconds while subsequent requests complete in under 500ms. What should you do to improve morning response times while minimizing cost?

A) Switch to a real-time endpoint with scheduled scaling: scale up at 8:45 AM, scale down at 6:15 PM
B) Switch to a real-time endpoint with auto scaling based on InvocationsPerInstance
C) Keep the serverless endpoint but increase the provisioned concurrency
D) Switch to a real-time endpoint running 24/7 with Savings Plans

Answer: A. The traffic pattern is highly predictable (business hours only), so scheduled scaling is ideal. Scaling up at 8:45 AM ensures warm instances by 9 AM, eliminating cold starts. Scaling down at 6:15 PM avoids paying for idle compute overnight. Option B would still have cold starts on the first morning request before auto scaling kicks in. Option C applies provisioned concurrency to serverless, which helps but is less cost-effective than scheduled scaling for this predictable pattern. Option D wastes money on 14 hours of unused compute per day.
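Option A maps to two Application Auto Scaling scheduled actions on the endpoint's variant. The endpoint name, variant name, and capacities below are assumptions for illustration; note the overnight floor stays at 1 because a real-time endpoint cannot scale to zero instances.

```python
# Build the parameter dicts for two scheduled scaling actions on a
# SageMaker endpoint variant. Names and capacities are assumptions.
def scheduled_action(name, cron, min_cap, max_cap,
                     endpoint="ticket-classifier", variant="AllTraffic"):
    return {
        "ServiceNamespace": "sagemaker",
        "ScheduledActionName": name,
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "Schedule": f"cron({cron})",
        "ScalableTargetAction": {"MinCapacity": min_cap, "MaxCapacity": max_cap},
    }

scale_up = scheduled_action("scale-up-morning", "45 8 * * ? *", 2, 4)
scale_down = scheduled_action("scale-down-evening", "15 18 * * ? *", 1, 1)
# Each dict would be passed to
#   boto3.client("application-autoscaling").put_scheduled_action(**action)
```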

Written by Alvin Varughese, Founder, 15 professional certifications