3.1.1. Mapping Business Problems to ML Algorithms
💡 First Principle: ML problems fall into a small number of categories—classification, regression, clustering, anomaly detection, recommendation, and forecasting—and each category has algorithms that are well-suited to it. The first step in model selection is always identifying the problem category, not the algorithm.
Interpretability considerations: Some business contexts require model interpretability—you need to explain why the model made a prediction. Linear models and shallow decision trees are inherently interpretable; neural networks (and large tree ensembles like XGBoost) are not. If a question mentions "regulatory requirements," "explain decisions to customers," or "audit trail for predictions," interpretability should drive your model choice toward simpler, explainable models—or toward using SageMaker Clarify's SHAP values to explain a complex model.
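To make "inherently interpretable" concrete, here is a minimal sketch of why a linear model needs no post-hoc explainer: its coefficients directly state how each feature moves the prediction. The feature names and synthetic data are hypothetical, for illustration only.

```python
# Sketch: a linear model's coefficients ARE its explanation.
# Feature names and data are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # columns: tenure, monthly_spend, support_tickets
# Churn rises with support tickets and falls with tenure (synthetic rule).
y = (2 * X[:, 2] - X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Each coefficient is a direct, auditable statement of how a feature
# shifts the log-odds of churn -- no SHAP explainer required.
for name, coef in zip(["tenure", "monthly_spend", "support_tickets"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```

A deep neural network offers no such per-feature summary, which is exactly the gap SageMaker Clarify's SHAP values fill for complex models.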
⚠️ Exam Trap: When a question describes a "prediction problem," determine whether it's classification or regression before selecting an algorithm. "Predict whether a customer will churn" is classification (binary output). "Predict how much a customer will spend" is regression (continuous output). The same algorithm family (XGBoost, Linear Learner) can do both, but the configuration differs—and the evaluation metrics differ entirely.
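The trap above can be sketched in code: the same gradient-boosted tree family handles both framings, but the estimator class, target type, and evaluation metric all change. This uses scikit-learn's gradient boosting as a stand-in for XGBoost; the data is synthetic, for illustration only.

```python
# Sketch: one algorithm family, two problem types, two metric families.
# Synthetic data stands in for real customer features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))

# Classification: "will the customer churn?" -> binary label, accuracy/AUC/F1
y_churn = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = GradientBoostingClassifier().fit(X, y_churn)
print("accuracy:", accuracy_score(y_churn, clf.predict(X)))

# Regression: "how much will the customer spend?" -> continuous target, RMSE/MAE
y_spend = 50 + 10 * X[:, 0] + rng.normal(scale=2.0, size=300)
reg = GradientBoostingRegressor().fit(X, y_spend)
rmse = mean_squared_error(y_spend, reg.predict(X)) ** 0.5
print("RMSE:", rmse)
```

Note that accuracy would be meaningless for the spending model and RMSE meaningless for the churn model—identifying the problem category first is what makes the metric choice obvious.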
Reflection Question: A retail company wants to predict which products a customer will buy next, estimate their total quarterly spending, and identify fraudulent return patterns. Classify each as a problem type and recommend an algorithm.