Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.6. Reflection Checkpoint

Without a solid grasp of storage services, you risk confusing scenarios where Redshift Spectrum and Athena seem interchangeable; missing the key differentiator (Spectrum queries S3 from within an existing Redshift cluster and shines when external data must join local Redshift tables, while Athena is fully serverless and standalone) can cost you several questions. Think of storage selection like choosing the right container: a filing cabinet, a warehouse, and a vault all store things, but each serves a fundamentally different purpose.

Key Takeaways

Before proceeding, ensure you can:

  • Select the right data store based on access patterns: S3 for data lakes, Redshift for analytics, DynamoDB for key-value, RDS/Aurora for transactional
  • Explain when to use Redshift Spectrum, when to load data into Redshift, and when to use federated queries
  • Design DynamoDB tables with appropriate partition keys, sort keys, and GSIs
  • Describe Apache Iceberg's value proposition (ACID, time travel, schema evolution) over plain S3 data lakes
  • Distinguish between technical catalogs (Glue Data Catalog) and business catalogs (SageMaker Catalog)
  • Configure S3 lifecycle policies to optimize storage costs across hot/warm/cold tiers
  • Design Redshift distribution keys and sort keys for join performance
  • Explain vector indexes (HNSW, IVF) and when vector search applies to data engineering
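
To make the lifecycle-policy takeaway concrete, here is a minimal sketch of an S3 lifecycle configuration that tiers data from hot to warm to cold and then expires it after a 7-year retention window. The bucket name, prefix, and rule ID are hypothetical, and the boto3 call is shown in comments since it requires live AWS credentials:

```python
# Minimal S3 lifecycle configuration sketch: Standard -> Standard-IA at
# 30 days (warm), Glacier at 90 days (cold), expiration after ~7 years.
# All names (rule ID, prefix, bucket) are illustrative.

LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "tier-and-expire-viewing-history",
            "Filter": {"Prefix": "viewing-history/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm tier
                {"Days": 90, "StorageClass": "GLACIER"},      # cold tier
            ],
            "Expiration": {"Days": 7 * 365},  # ~7-year retention
        }
    ]
}

# Applying it would look like this (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake",
#       LifecycleConfiguration=LIFECYCLE_CONFIG,
#   )
```

Note that transitions apply per object based on age, so a single rule covers both the "warm after 30 days" and "cold after 90 days" steps without separate jobs.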

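The DynamoDB takeaway can be sketched the same way. This hypothetical table keys viewing history by user (partition key) and timestamp (sort key), so the recommendation engine can fetch a user's most recent views with one Query; a GSI inverts the access pattern to ask "who watched this content?" All table, index, and attribute names are illustrative:

```python
# Sketch of a DynamoDB table definition for per-user viewing history.
# Partition key user_id groups each user's items; sort key viewed_at
# lets a Query with ScanIndexForward=False return the latest N views.
# The GSI keys on content_id to support the inverse lookup.

TABLE_SPEC = {
    "TableName": "ViewingHistory",
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},     # partition key
        {"AttributeName": "viewed_at", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "viewed_at", "AttributeType": "S"},
        {"AttributeName": "content_id", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "content-index",
            "KeySchema": [
                {"AttributeName": "content_id", "KeyType": "HASH"},
                {"AttributeName": "viewed_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# Creating it would look like this (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**TABLE_SPEC)
```

Choosing a high-cardinality partition key like user_id is what spreads load evenly; a low-cardinality key (e.g. country) would create hot partitions.
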
Connecting Forward

In Phase 4, you'll learn how to operate and maintain the pipelines and data stores you've designed — monitoring with CloudWatch, troubleshooting Glue and EMR failures, ensuring data quality, and analyzing data with Athena and QuickSight. The data store knowledge from this phase directly informs how you monitor performance and optimize queries in Phase 4.

Self-Check Questions

  1. A media streaming company stores 500 TB of user viewing history. The data analytics team runs complex aggregation queries daily. The recommendation engine needs sub-millisecond access to each user's last 50 views. The compliance team needs 7-year retention of all data with infrequent access. Design the data store architecture using at least three AWS services, explaining why each is chosen.

  2. A company's data lake on S3 has grown to 200 TB. They need to correct historical records (GDPR deletion requests), track data lineage for audit, and support schema changes without rewriting files. Currently they use plain Parquet files. What AWS technology solves these requirements, and what specific capabilities does it provide?

Written by Alvin Varughese
Founder, 15 professional certifications