1.4. Core Trade-offs in ML Systems
💡 First Principle: Every ML architecture decision involves trade-offs, and the exam is fundamentally a trade-off exam. Think of it like a budget: you can't maximize cost savings, latency, and accuracy simultaneously, so the "correct" answer is the one that best satisfies the constraints described in the scenario. What happens when you ignore trade-offs? You over-engineer solutions, burn budget on unnecessary GPU instances, or deploy real-time endpoints for batch workloads. Can you identify which constraint matters most in a given scenario? Learning to read those constraints is the meta-skill the exam tests.
In traditional certification exams, questions often have a single objectively correct answer. ML engineering questions are different: the "right" service depends on latency requirements, budget, data volume, team expertise, and regulatory constraints. The exam gives you these constraints in the question stem and expects you to weigh them. If you've internalized the trade-offs, you can eliminate 2-3 options immediately.
Think of ML system design like planning a road trip. You can optimize for speed (highway, expensive tolls), cost (back roads, more time), comfort (rest stops, longer trip), or scenery (scenic route, unpredictable timing). No single route is "correct"; the best route depends on your priorities. The exam gives you the priorities and expects you to pick the route.
⚠️ Common Misconception: Candidates often default to the most powerful (and most expensive) option: GPU instances, real-time endpoints, distributed training, assuming "best performance" is always correct. The exam deliberately includes budget and frequency constraints that make simpler, cheaper solutions the right answer. If the scenario says "once-daily batch scoring of 500 records," a serverless endpoint or even Lambda beats a provisioned GPU endpoint every time.
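To make the intuition concrete, here is a back-of-envelope cost comparison for that once-daily scenario. All rates below are illustrative assumptions for the sake of the arithmetic, not actual AWS list prices; the point is the order-of-magnitude gap between an always-on instance and pay-per-use compute.

```python
# Rough monthly cost: always-on GPU endpoint vs. on-demand batch job.
# All dollar rates are assumed placeholders, not real AWS pricing.

HOURS_PER_MONTH = 730  # average hours in a month

# Provisioned real-time endpoint: billed every hour it exists,
# even though it serves one short batch job per day.
gpu_hourly_rate = 1.20  # $/hour, assumed
gpu_monthly = gpu_hourly_rate * HOURS_PER_MONTH

# On-demand (serverless/Lambda-style) compute: billed only while
# the daily job runs.
job_minutes_per_day = 5          # assumed time to score 500 records
rate_per_compute_minute = 0.02   # $/minute, assumed
serverless_monthly = job_minutes_per_day * rate_per_compute_minute * 30

print(f"Provisioned GPU endpoint: ${gpu_monthly:.2f}/month")
print(f"On-demand batch compute:  ${serverless_monthly:.2f}/month")
```

With these assumed rates the always-on endpoint costs hundreds of dollars per month while the on-demand job costs a few dollars, which is exactly the kind of gap the exam expects you to spot from the words "once-daily" and "500 records" alone.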