3.1. Choosing a Modeling Approach
💡 First Principle: The right model is the simplest model that solves the business problem. Starting with the most complex architecture is tempting but wasteful—a logistic regression that achieves 92% accuracy on tabular data deploys faster, costs less, and is easier to explain than a deep neural network that achieves 93%. The exam tests whether you can match problem types to appropriate approaches on the complexity spectrum.
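To make the "simplest model" point concrete, here is a toy logistic regression written in plain Python with no framework at all. The feature names and the four-row dataset are invented for illustration; the point is that a linear model on tabular data is a few dozen lines of transparent, debuggable logic:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500):
    """Plain batch gradient descent. w has one weight per feature plus a bias (w[-1])."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # derivative of log loss w.r.t. z
            for j in range(n_feat):
                grad[j] += err * xi[j]
            grad[-1] += err
        for j in range(n_feat + 1):
            w[j] -= lr * grad[j] / len(X)
    return w

def predict(w, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0

# Invented toy churn data: columns are [tenure, support_tickets], both scaled to [0, 1].
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [1, 1, 0, 0]  # short tenure + many tickets -> churn

w = train_logreg(X, y)
```

Every weight in `w` maps directly to a named feature, which is exactly why a model like this is easier to explain to stakeholders than a neural network whose behavior is spread across millions of parameters.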
What fails when you choose the wrong modeling approach? Consider a team that spends three months fine-tuning a transformer model for tabular customer churn prediction. XGBoost, trainable in minutes, would have achieved the same accuracy. The team burned time and compute on an approach mismatched to the data type. Worse, the transformer model is harder to debug, explain to stakeholders, and monitor in production. Choosing wrong doesn't just waste resources—it creates ongoing operational debt.
Think of model selection like choosing transportation. A bicycle is perfect for a mile-long commute—adding a jet engine doesn't help. A freight ship is perfect for moving 10,000 containers across an ocean—a fleet of bicycles won't cut it. The exam gives you the distance (problem complexity) and cargo (data type) and asks you to pick the vehicle.
⚠️ Common Misconception: "Use a foundation model" is not always the right answer, even though Bedrock and JumpStart are heavily featured in AWS marketing. Foundation models excel at NLP and generative tasks but are overkill for structured tabular predictions. If the scenario describes tabular data with labels, XGBoost or Linear Learner is almost always preferred. The exam tests whether you can resist the "newest = best" bias.
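The matching skill the exam tests can be sketched as a simple lookup from (data type, task) to a default model family. The table below is a hypothetical study aid, not an official AWS decision matrix; it just encodes the rules of thumb from this section:

```python
# Hypothetical decision helper encoding the rules of thumb above.
# The mapping is a study aid, not an exhaustive or official AWS matrix.
MODEL_CHOICES = {
    ("tabular", "classification"): "XGBoost or Linear Learner",
    ("tabular", "regression"): "XGBoost or Linear Learner",
    ("text", "generation"): "Foundation model (Bedrock or JumpStart)",
    ("text", "summarization"): "Foundation model (Bedrock or JumpStart)",
}

def pick_model(data_type: str, task: str) -> str:
    """Return a default model family, or a prompt to analyze further."""
    return MODEL_CHOICES.get(
        (data_type, task),
        "no rule of thumb: analyze the scenario further",
    )
```

Reading a scenario this way, "labeled tabular data" should trigger `pick_model("tabular", "classification")` before any foundation-model option is even considered.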