1.2.1. Core Service Categories and When They Apply
💡 First Principle: Services exist to solve specific problems at specific scales. Choosing the right service means matching the problem's characteristics (volume, velocity, access pattern, and cost constraints) to the service designed for that profile.
The most heavily tested services on this exam are AWS Glue (appears across ingestion, transformation, cataloging, and data quality), Amazon S3 (foundational to nearly every architecture), Amazon Redshift (the go-to warehouse, with Redshift Spectrum, federated queries, and materialized views), Amazon Kinesis (streaming ingestion), and AWS Lake Formation (governance and fine-grained access control). If you understand these five deeply, you can answer a majority of exam questions.
For service selection on the exam, use this decision pattern: What is the data's velocity? → What is the data's volume? → What access pattern is needed? → What is the budget constraint? This narrows the candidates to 2-3 services, and the scenario details will eliminate all but one.
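One way to practice the decision pattern is to treat each question as a filter applied in order. The sketch below is only an illustration: the candidate names (`service_a` through `service_f`), their attributes, and the cost tiers are made-up placeholders, not real AWS services or limits. It shows how the four questions shrink the candidate list, leaving the last elimination to the scenario details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    velocity: str        # "batch" or "streaming"
    max_volume_gb: int   # hypothetical comfortable daily volume
    access: str          # "sql", "object", or "stream"
    cost_tier: int       # 1 = cheapest, 3 = most expensive (made-up scale)


@dataclass(frozen=True)
class Scenario:
    velocity: str
    volume_gb: int
    access: str
    max_cost_tier: int


# The four questions, in the order the decision pattern asks them.
QUESTIONS = [
    ("velocity", lambda c, s: c.velocity == s.velocity),
    ("volume", lambda c, s: c.max_volume_gb >= s.volume_gb),
    ("access pattern", lambda c, s: c.access == s.access),
    ("budget", lambda c, s: c.cost_tier <= s.max_cost_tier),
]


def narrow(candidates, scenario):
    """Apply each question in turn; return survivors plus a per-step trace."""
    remaining = list(candidates)
    trace = []
    for label, fits in QUESTIONS:
        remaining = [c for c in remaining if fits(c, scenario)]
        trace.append((label, [c.name for c in remaining]))
    return remaining, trace


# Placeholder candidates; none of these values describe a real service.
CANDIDATES = [
    Candidate("service_a", "batch", 100, "sql", 1),
    Candidate("service_b", "batch", 10_000, "sql", 2),
    Candidate("service_c", "streaming", 10_000, "stream", 2),
    Candidate("service_d", "batch", 10_000, "object", 1),
    Candidate("service_e", "batch", 10_000, "sql", 2),
    Candidate("service_f", "batch", 10_000, "sql", 3),
]

scenario = Scenario(velocity="batch", volume_gb=500, access="sql", max_cost_tier=2)
survivors, trace = narrow(CANDIDATES, scenario)
for label, names in trace:
    print(f"after {label}: {names}")
```

In this made-up scenario two candidates survive all four questions, which mirrors the exam experience: the pattern gets you to a short list, and a detail in the question stem decides between the finalists.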
⚠️ Exam Trap: AWS Glue and Amazon EMR both run Apache Spark. The distinction that matters on the exam: Glue is serverless (no cluster management, billed per DPU-hour and metered by the second) while EMR gives you full cluster control (custom JARs, multiple frameworks, persistent or transient clusters). If the question mentions "least operational overhead," it's Glue. If it mentions "custom Spark libraries" or "Hadoop ecosystem," it's EMR.
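As a study aid, the cue phrases in this trap can be turned into a tiny lookup. This is a rough sketch only: the function name `spark_service_hint` and the "unclear" fallback are hypothetical conventions, and the cue lists contain nothing beyond the three phrases quoted above. Real exam questions need the full scenario read, not just keyword matching.

```python
# Cue phrases copied from the exam-trap text; extend them with your own notes.
GLUE_CUES = ("least operational overhead",)
EMR_CUES = ("custom spark libraries", "hadoop ecosystem")


def spark_service_hint(question: str) -> str:
    """Return a first-guess service for a Spark question based on cue phrases."""
    text = question.lower()
    glue = any(cue in text for cue in GLUE_CUES)
    emr = any(cue in text for cue in EMR_CUES)
    if glue and not emr:
        return "AWS Glue"
    if emr and not glue:
        return "Amazon EMR"
    # Both or neither cue set matched: the keywords alone do not decide.
    return "unclear"


print(spark_service_hint("Which option has the LEAST operational overhead?"))
print(spark_service_hint("The team depends on the Hadoop ecosystem."))
print(spark_service_hint("Which service runs Apache Spark?"))
```

The "unclear" branch is deliberate: when a question carries cues for both services, or for neither, fall back to the velocity, volume, access pattern, and budget questions rather than guessing.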
Reflection Question: A startup needs to process 50 GB of CSV files nightly and load results into S3 as Parquet. They have no data engineering team. Which transformation service minimizes operational burden?