Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

1.2.1. Core Service Categories and When They Apply

💡 First Principle: Services exist to solve specific problems at specific scales. Choosing the right service means matching the problem's characteristics (volume, velocity, access pattern, and cost constraints) to the service designed for that profile.

The most heavily tested services on this exam are AWS Glue (which appears across ingestion, transformation, cataloging, and data quality), Amazon S3 (foundational to nearly every architecture), Amazon Redshift (the go-to warehouse, with Redshift Spectrum, federated queries, and materialized views), Amazon Kinesis (streaming ingestion), and AWS Lake Formation (governance and fine-grained access control). If you understand these five deeply, you can answer a large share of the exam's questions.

For service selection on the exam, use this decision pattern: What is the data's velocity? → What is its volume? → What access pattern is needed? → What is the budget constraint? These questions usually narrow the field to two or three candidate services, and the scenario's details then eliminate all but one.
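The decision pattern can be sketched as a filter that asks each question in order and keeps only the services that still fit. The profiles below are simplified, hypothetical tags invented for this illustration, not an AWS classification. Volume is deliberately left untagged, so in this toy it never eliminates anything. The budget question is modeled as operational budget (`ops`: serverless versus self-managed cluster), which is an assumption of the sketch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified fit tags for illustration only (not an official AWS
# classification). A dimension missing from a profile means "fits any value".
PROFILES: Dict[str, Dict[str, Set[str]]] = {
    "Kinesis": {"velocity": {"streaming"}, "access": {"ingest"}},
    "Glue": {"velocity": {"batch", "streaming"}, "access": {"transform"}, "ops": {"serverless"}},
    "EMR": {"velocity": {"batch", "streaming"}, "access": {"transform"}, "ops": {"cluster"}},
    "Redshift": {"velocity": {"batch", "streaming"}, "access": {"sql_warehouse"}},
}

# Questions are asked in the order the decision pattern prescribes.
ORDER = ["velocity", "volume", "access", "ops"]


def narrow(scenario: Dict[str, str]) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Filter candidates one dimension at a time and record each step."""
    candidates = sorted(PROFILES)
    trace: List[Tuple[str, List[str]]] = []
    for dim in ORDER:
        if dim not in scenario:
            continue  # the scenario says nothing about this dimension
        wanted = scenario[dim]
        # A missing tag defaults to {wanted}, i.e. "fits any value".
        candidates = [
            name for name in candidates
            if wanted in PROFILES[name].get(dim, {wanted})
        ]
        trace.append((dim, candidates))
    return candidates, trace


final, trace = narrow({"velocity": "batch", "volume": "large", "access": "sql_warehouse"})
for dim, left in trace:
    print(dim, "->", left)
# velocity -> ['EMR', 'Glue', 'Redshift']
# volume -> ['EMR', 'Glue', 'Redshift']
# access -> ['Redshift']
```

The trace shows the velocity question cutting the field to three candidates and the access-pattern detail deciding the answer, which is the narrowing behavior the pattern relies on.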

āš ļø Exam Trap: AWS Glue and Amazon EMR both run Apache Spark. The distinction that matters on the exam: Glue is serverless (no cluster management, pay per DPU-second) while EMR gives you full cluster control (custom JARs, multiple frameworks, persistent or transient clusters). If the question mentions "least operational overhead," it's Glue. If it mentions "custom Spark libraries" or "Hadoop ecosystem," it's EMR.

Reflection Question: A startup needs to process 50 GB of CSV files nightly and load results into S3 as Parquet. They have no data engineering team. Which transformation service minimizes operational burden?

Written by Alvin Varughese
Founder • 15 professional certifications