Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

6.3. Mixed-Topic Practice Questions

Question 1 (Domain 1 + Domain 2)

A retail company processes 10 TB of daily sales data. Raw CSV files arrive in S3 from 200 stores overnight. The data must be: (a) converted to Parquet, (b) partitioned by store and date, (c) cataloged for Athena queries, and (d) loaded as aggregated daily summaries into Redshift for dashboards. The team wants minimal operational overhead.

Which architecture meets all requirements?

A) S3 → EMR Spark job → S3 Parquet → Glue crawler → Redshift COPY
B) S3 → Glue ETL job → S3 Parquet → Glue crawler → Redshift COPY
C) S3 → Lambda → S3 Parquet → Athena CTAS → Redshift federated query
D) S3 → Kinesis Firehose → S3 Parquet → Glue crawler → Redshift Spectrum

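Answer: B. A Glue ETL job provides serverless Spark transformation (CSV → Parquet, partitioned by store and date) and can also write the aggregated daily summaries. A Glue crawler catalogs the Parquet output for Athena, and Redshift COPY loads the summaries. This option has the least operational overhead.
A) EMR works but requires cluster management.
C) Lambda can't process 10 TB per day within its memory and 15-minute timeout limits.
D) Firehose is for streaming ingestion, not batch file processing, and Redshift Spectrum queries data in S3 rather than loading it into Redshift.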
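
The "partitioned by store and date" requirement usually means a Hive-style key layout in S3, which Glue crawlers and Athena recognize as partitions. The following is a minimal sketch of that layout only, not of the Glue job itself; the bucket name, prefix, and the partition key names `store` and `date` are assumptions chosen for illustration.

```python
from datetime import date


def partition_prefix(base: str, store_id: str, sale_date: date) -> str:
    """Build a Hive-style S3 prefix partitioned by store, then by date.

    The key names "store" and "date" are illustrative assumptions; the
    question only says the data is partitioned by store and date.
    """
    return f"{base.rstrip('/')}/store={store_id}/date={sale_date.isoformat()}/"


# Hypothetical bucket and store ID, used only to show the resulting layout.
prefix = partition_prefix("s3://example-sales-lake/parquet", "0042", date(2026, 1, 15))
print(prefix)
```

With this layout, a query that filters on store and date only needs to scan the objects under the matching prefixes.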


Question 2 (Domain 3 + Domain 4)

A healthcare company stores patient records in a data lake on S3. The security team requires: column-level access control (analysts cannot see SSN), all data access logged, and PII columns automatically detected when new datasets arrive. Which combination of services meets these requirements?

A) IAM policies for column access + CloudTrail for logging + Glue crawlers for PII detection
B) Lake Formation for column-level security + CloudTrail for logging + Amazon Macie for PII detection
C) Redshift row-level security + CloudWatch Logs for logging + Amazon Comprehend for PII detection
D) S3 bucket policies for column access + AWS Config for logging + Amazon Macie for PII detection

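Answer: B. Lake Formation enforces column-level permissions on data lake tables, so analysts can be granted access that excludes the SSN column. CloudTrail records API activity (S3 object-level access requires data events to be enabled). Macie automatically discovers PII in S3 buckets.
A) IAM policies can't enforce column-level access on S3/Athena, and Glue crawlers infer schemas rather than detect PII.
C) Redshift RLS is row-level, not column-level, and the data is in S3, not Redshift.
D) S3 bucket policies operate at the object level, not the column level, and AWS Config tracks resource configuration, not data access.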
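
To make the column-level grant concrete, here is a sketch that builds the request for a Lake Formation SELECT grant excluding the SSN column. It only constructs the parameters and does not call AWS. The role ARN, database, table, and column names are hypothetical, and the intent is that the dictionary could be passed to the boto3 `lakeformation` client's `grant_permissions` call.

```python
def column_grant_request(principal_arn, database, table, excluded_columns):
    """Build parameters for a SELECT grant on every column except the excluded ones."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Grant all columns except those listed here.
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


# All identifiers below are hypothetical placeholders.
request = column_grant_request(
    "arn:aws:iam::111111111111:role/AnalystRole",
    "patient_lake",
    "patient_records",
    ["ssn"],
)
# Assumed usage: boto3.client("lakeformation").grant_permissions(**request)
print(request["Resource"]["TableWithColumns"]["ColumnWildcard"])
```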


Question 3 (Domain 1 + Domain 3)

A financial services company needs to capture real-time changes from an Aurora PostgreSQL database and make them queryable in their S3 data lake within 5 minutes. They also need the ability to update and delete individual records in the data lake for GDPR compliance. The team wants minimal custom code.

Which architecture meets these requirements?

A) DMS CDC → Kinesis Data Streams → Lambda → S3 Parquet files
B) Aurora PostgreSQL logical replication → EC2 consumer → S3 with versioning
C) DMS CDC → S3 (Parquet) → Apache Iceberg table managed by Glue
D) DynamoDB Streams → Kinesis Firehose → S3 → Athena

Answer: C. DMS CDC captures changes from Aurora and writes them to S3. Apache Iceberg enables row-level updates and deletes on S3 data, which GDPR erasure requests require, and Glue manages the Iceberg table.
A) Lambda appending Parquet files to S3 doesn't support updates or deletes on existing records, and it requires custom code.
B) Requires custom code (the EC2 consumer), and S3 versioning does not provide record-level updates or deletes.
D) DynamoDB Streams captures changes from DynamoDB, not Aurora.
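
The reason a table format with row-level operations matters can be seen in a small in-memory model of applying change records. This is only an illustration of the merge semantics, not of Iceberg or DMS themselves. It assumes each change carries an `Op` field of `I`, `U`, or `D` (the convention DMS uses for CDC output to S3) and a primary key named `id`, which is a hypothetical column name.

```python
def apply_cdc(table: dict, changes: list) -> dict:
    """Apply insert/update/delete change records to a table keyed by primary key."""
    result = dict(table)  # leave the caller's table untouched
    for change in changes:
        op, key = change["Op"], change["id"]
        if op in ("I", "U"):
            # Inserts and updates both upsert the latest row image.
            result[key] = {k: v for k, v in change.items() if k != "Op"}
        elif op == "D":
            # Deletes remove the row, e.g. for a GDPR erasure request.
            result.pop(key, None)
    return result


current = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}
changes = [
    {"Op": "U", "id": 1, "name": "Ada L."},
    {"Op": "D", "id": 2},
    {"Op": "I", "id": 3, "name": "Cy"},
]
merged = apply_cdc(current, changes)
print(sorted(merged))
```

Plain Parquet files in S3 are immutable, so without a table format the same result would require rewriting whole files by hand.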


Question 4 (Domain 2 + Domain 4)

A data engineering team manages a Glue ETL job that processes customer data from S3 and loads it into Redshift. The job needs to read from an S3 bucket in Account A, access the Glue Data Catalog in Account A, and write to a Redshift cluster in Account B. All data must be encrypted with a customer-managed KMS key. What IAM and encryption configuration is required?

A) Create a Glue service role in Account A with cross-account S3 and Redshift access. Use SSE-S3 for encryption.
B) Create a Glue service role in Account A with S3 and Catalog access. Create a Redshift IAM role in Account B. Use SSE-KMS with key policies granting both accounts access.
C) Create identical Glue service roles in both accounts. Use SSE-C with customer-provided keys.
D) Use a single IAM user with access keys shared between accounts. Use SSE-KMS with the default AWS managed key.

Answer: B. The Glue role in Account A accesses S3 and the Data Catalog. The Redshift IAM role in Account B enables the COPY command. SSE-KMS with a customer-managed key whose key policy grants both accounts access meets the encryption requirement.
A) SSE-S3 uses Amazon S3 managed keys, not customer-managed KMS keys.
C) Identical roles in both accounts are unnecessary, and SSE-C uses customer-provided keys outside KMS, which adds key-handling complexity.
D) An IAM user with shared access keys violates security best practices, and the AWS managed key is not customer-managed.
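
A cross-account key policy grant can be sketched as a single policy statement. The example below builds one as a Python dictionary and prints it as JSON. The account ID and `Sid` are hypothetical, and the action list is one common set for using a key to encrypt and decrypt, not necessarily the minimum this scenario needs.

```python
import json


def cross_account_key_statement(account_id: str) -> dict:
    """Build a KMS key policy statement letting another account use the key."""
    return {
        "Sid": "AllowUseOfKeyFromOtherAccount",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Hypothetical ID standing in for Account B.
statement = cross_account_key_statement("222222222222")
print(json.dumps(statement, indent=2))
```

The key policy alone is not sufficient: the role in the other account also needs an IAM policy that allows the same KMS actions on the key.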


Question 5 (Domain 1)

A streaming data pipeline uses Kinesis Data Streams with 10 shards. Three consumer applications process the stream independently. During peak load, Consumer B falls behind and the Kinesis IteratorAge metric spikes to 30 minutes. Consumers A and C are processing normally. What is the most effective solution?

A) Add more shards to the stream to increase throughput.
B) Enable enhanced fan-out for all three consumers.
C) Enable enhanced fan-out for Consumer B only.
D) Increase the stream retention period to 7 days.

Answer: C. Enhanced fan-out gives Consumer B a dedicated 2 MB/s of read throughput per shard, eliminating contention with Consumers A and C. Since A and C are processing normally, the issue is B's share of the shared read throughput, not overall stream capacity.
A) Adding shards increases total throughput but doesn't solve the consumer-level bottleneck.
B) Enabling it for all three consumers adds unnecessary cost.
D) Longer retention prevents data loss but doesn't fix the processing delay.
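
The throughput reasoning can be checked with rough arithmetic. The sketch below assumes the 2 MB/s per-shard read limit is split evenly among standard consumers, which is an idealization (real shares depend on each consumer's polling behavior); the function names are illustrative.

```python
SHARD_READ_MBPS = 2.0  # read throughput per shard, shared by all standard consumers


def standard_share_mbps(shards: int, standard_consumers: int) -> float:
    """Idealized per-consumer read throughput when standard consumers split evenly."""
    return shards * SHARD_READ_MBPS / standard_consumers


def enhanced_fanout_mbps(shards: int) -> float:
    """Dedicated read throughput for one enhanced fan-out consumer."""
    return shards * SHARD_READ_MBPS


before_b = standard_share_mbps(10, 3)   # A, B, and C share the standard limit
after_b = enhanced_fanout_mbps(10)      # B moves to its own dedicated pipe
after_a_c = standard_share_mbps(10, 2)  # A and C now share the standard limit

print(f"B before: {before_b:.2f} MB/s, B after: {after_b:.2f} MB/s")
print(f"A and C after: {after_a_c:.2f} MB/s each")
```

Under these assumptions, moving only Consumer B to enhanced fan-out roughly triples its available read throughput and also leaves more of the shared limit for A and C.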

Written by Alvin Varughese
Founder, 15 professional certifications