Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

4.2.1. SQL Analysis with Athena and Redshift

šŸ’” First Principle: Athena and Redshift both execute SQL, but their architectures serve different query patterns. Athena is serverless and scans S3 directly — ideal for infrequent, ad-hoc queries on data lake data with no upfront cost. Redshift maintains its own storage and query engine — ideal for frequent, complex, concurrent queries where sub-second response time and caching matter.

Athena key features: partition projection (calculate partitions mathematically instead of querying the Glue Catalog), workgroups (isolate queries by team with separate cost tracking and limits), CTAS (create tables from query results), and federated query (query data in RDS, DynamoDB, and other sources via Lambda connectors). Views in Athena create reusable query abstractions without materializing data.

Redshift SQL supports stored procedures for complex transformation logic, materialized views for precomputed aggregation, and workload management (WLM) queues for prioritizing different query types. The v1.1 syllabus explicitly tests SQL in both Redshift and Athena.

Decision FactorChoose AthenaChoose Redshift
Query frequencyOccasional, ad-hocFrequent, repeated
Data locationS3 (data lake)Loaded into Redshift or via Spectrum
ConcurrencyLow-moderateHigh (hundreds of users)
LatencySeconds to minutesSub-second (cached)
Cost modelPer TB scannedPer cluster-hour or RPU-second
SetupNone (serverless)Cluster or serverless configuration

āš ļø Exam Trap: Athena's MSCK REPAIR TABLE and partition projection both handle partitions but differently. MSCK updates the Glue Catalog by scanning S3 (slow for many partitions). Partition projection calculates partitions at query time without the catalog (fast, no crawl needed). For time-series data with many date partitions, partition projection is strongly preferred.

Reflection Question: An analytics team runs 20 standard reports every morning at 8 AM, each joining 3 tables with billions of rows. Throughout the day, they also run occasional ad-hoc queries on historical data in S3. What's the optimal architecture?

Alvin Varughese
Written byAlvin Varughese
Founder•15 professional certifications