Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.1.2. SageMaker Built-in Algorithms and When to Apply Them

Each specialized algorithm has its own data-format contract and hyperparameter vocabulary. DeepAR expects JSON Lines with a "start" timestamp and a "target" array per time series — feeding it tabular CSV causes cryptic failures. BlazingText in unsupervised (Word2Vec) mode needs plain text with one sentence per line, while supervised mode needs lines prefixed with __label__. Object Detection requires RecordIO files or raw images accompanied by JSON annotation files. Understanding these input contracts prevents the most common training failures and saves hours of debugging. When the exam mentions "time-series forecasting with related time series," DeepAR is almost always the answer. When it mentions "fast text classification with millions of categories," BlazingText supervised mode is the target.
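To make those input contracts concrete, here is a minimal sketch of one DeepAR training record and one BlazingText supervised line. The field names follow the documented DeepAR JSON Lines schema ("start", "target", optional "cat"); the actual values and label are made up for illustration.

```python
import json

# DeepAR: one JSON object per line ("JSON Lines"). Each object describes
# one time series: a "start" timestamp and a "target" array of values.
deepar_record = {
    "start": "2023-01-01 00:00:00",          # first timestamp of the series
    "target": [105.0, 98.0, 112.0, 107.0],   # one observed value per step
    "cat": [3],                              # optional categorical feature
}
deepar_line = json.dumps(deepar_record)

# BlazingText supervised mode: space-tokenized text with one or more
# __label__ prefixes per line (label and sentence here are illustrative).
blazingtext_line = "__label__positive the checkout flow was fast and simple"

print(deepar_line)
print(blazingtext_line)
```

Handing DeepAR a CSV, or BlazingText a line without the `__label__` prefix in supervised mode, is exactly the kind of contract violation that produces the cryptic training failures mentioned above.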

💡 First Principle: SageMaker built-in algorithms are pre-optimized for AWS infrastructure—they scale automatically, support distributed training out of the box, and work with SageMaker's training/hosting pipeline with minimal configuration. The trade-off is less flexibility compared to custom code. The exam tests whether you know which built-in algorithm matches a given problem.

| Algorithm | Problem Type | Data Type | Key Exam Signal |
|---|---|---|---|
| Linear Learner | Classification / Regression | Tabular | "Linear relationship," "simple classification," "fast training" |
| XGBoost | Classification / Regression | Tabular | "Tabular data," "feature importance," "gradient boosting," "structured data" |
| K-Means | Clustering | Tabular | "Group similar items," "segmentation," "no labels" |
| Random Cut Forest | Anomaly Detection | Tabular / Time-series | "Anomaly," "outlier," "unusual patterns" |
| DeepAR | Time-series Forecasting | Time-series | "Forecast," "multiple related time-series," "demand prediction" |
| BlazingText | Text Classification / Word2Vec | Text | "Text classification," "word embeddings," "fast NLP" |
| Image Classification | Image Classification | Images | "Classify images," "ResNet," "image labels" |
| Object Detection | Object Detection | Images | "Locate objects," "bounding boxes," "detect items in image" |
| Semantic Segmentation | Pixel-level Classification | Images | "Pixel-level labeling," "segment regions," "autonomous driving" |
| Factorization Machines | Recommendation / Sparse data | Sparse tabular | "Recommendation," "sparse features," "click-through prediction" |
| LDA (Latent Dirichlet Allocation) | Topic Modeling | Text | "Discover topics," "topic modeling," "document themes" |
| IP Insights | Anomaly Detection | IP usage patterns | "Unusual IP activity," "login anomalies" |

When NOT to use built-in algorithms: If your problem requires a custom neural network architecture, a framework not natively supported, or an algorithm with hyperparameters that SageMaker's built-in version doesn't expose, use Script Mode or BYOC (Bring Your Own Container) instead. The built-in algorithms are also less suitable when you need to customize the training loop itself (e.g., curriculum learning, custom loss functions).
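A Script Mode training script is an ordinary Python program. The sketch below shows the skeleton of a hypothetical entry point: SageMaker passes hyperparameters as command-line arguments and injects data/model paths through `SM_*` environment variables, so the same script also runs locally with defaults. The argument names and training logic placeholders are illustrative.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters (CLI args) and SageMaker-injected paths."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments.
    parser.add_argument("--max-depth", type=int, default=6)
    parser.add_argument("--eta", type=float, default=0.3)
    # SageMaker sets these environment variables inside the container;
    # the fallbacks let the script run outside SageMaker too.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN",
                                               "/opt/ml/input/data/train"))
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print("training with max_depth =", args.max_depth)
    # Here you would load data from args.train, run your custom training
    # loop (custom loss, curriculum learning, etc.), and write model
    # artifacts to args.model_dir for SageMaker to upload to S3.
```

This is precisely the flexibility the built-in algorithms trade away: inside this script you own the entire training loop.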

⚠️ Exam Trap: XGBoost in SageMaker comes in two flavors: the built-in algorithm (container managed by AWS, limited hyperparameters) and the open-source version run through Script Mode (full XGBoost API). If a question asks about "SageMaker's XGBoost" without qualification, it means the built-in. If it mentions "custom XGBoost configuration" or "XGBoost script mode," it means the open-source version running in a managed container.
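To see the built-in flavor's restricted surface, here is a sketch of a hyperparameter set for the built-in XGBoost container. The names are documented built-in XGBoost hyperparameters (`num_round` is required by the built-in container); the values are illustrative, not tuned recommendations.

```python
# Built-in SageMaker XGBoost: hyperparameters are a flat mapping set on
# the estimator. num_round is required by the built-in container.
builtin_xgb_hyperparameters = {
    "objective": "binary:logistic",  # standard XGBoost learning objective
    "num_round": "100",              # required: number of boosting rounds
    "max_depth": "5",
    "eta": "0.2",
    "eval_metric": "auc",
}

# Script Mode, by contrast, forwards arbitrary arguments to your own
# script, where the full open-source XGBoost API is available.
print(sorted(builtin_xgb_hyperparameters))
```

If a question hinges on a capability missing from this fixed vocabulary (custom callbacks, custom objectives), that is the signal for Script Mode.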

Reflection Question: A team needs to forecast daily demand for 500 products across 20 stores. Each product-store combination has 2 years of daily history. Which SageMaker built-in algorithm is designed for this exact scenario, and what makes it superior to training 10,000 individual models?

Written by Alvin Varughese, Founder (15 professional certifications)