Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.1.4. Change Data Capture: DynamoDB Streams and AWS DMS

šŸ’” First Principle: Change data capture (CDC) turns database mutations into a stream of events. Instead of asking "what does the database look like now?" you capture "what just changed?" — enabling real-time synchronization, event-driven architectures, and incremental data lake loading without polling or full table scans.

CDC is the bridge between transactional databases and data pipelines. Every insert, update, or delete becomes an event that downstream systems can react to. This is fundamentally more efficient than periodic full exports — imagine dumping a 500 GB database every hour just to capture the 1,000 rows that changed.

DynamoDB Streams captures item-level changes in a DynamoDB table and stores them as a time-ordered sequence of change records for 24 hours. What each record carries depends on the table's stream view type (KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES): the item's key alone, the old image (before the change), the new image (after), or both images. You process the stream with Lambda triggers or with a Kinesis Client Library (KCL) application using the DynamoDB Streams adapter. Common patterns: replicating DynamoDB data to Redshift for analytics, triggering notifications on data changes, and maintaining materialized views.
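The Lambda-trigger pattern looks like this in practice. Below is a minimal sketch of a handler that walks the stream records in the event Lambda delivers; it assumes a table configured with the NEW_AND_OLD_IMAGES view type, and the table and attribute names are hypothetical. Values arrive in DynamoDB's attribute-value format (e.g. `{"N": "42"}` for a number).

```python
import json

def handler(event, context):
    """Summarize DynamoDB Streams records delivered by a Lambda trigger.

    Assumes the stream view type is NEW_AND_OLD_IMAGES, so MODIFY
    records carry both the before and after images.
    """
    changes = []
    for record in event["Records"]:
        ddb = record["dynamodb"]
        changes.append({
            "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
            "keys": ddb["Keys"],
            "old": ddb.get("OldImage"),     # absent on INSERT
            "new": ddb.get("NewImage"),     # absent on REMOVE
        })
    return changes

# Hypothetical stream event containing a single MODIFY record.
sample_event = {
    "Records": [{
        "eventName": "MODIFY",
        "dynamodb": {
            "Keys": {"pk": {"S": "user#1"}},
            "OldImage": {"pk": {"S": "user#1"}, "score": {"N": "10"}},
            "NewImage": {"pk": {"S": "user#1"}, "score": {"N": "42"}},
        },
    }]
}

print(json.dumps(handler(sample_event, None), indent=2))
```

A real handler would forward these change summaries to a downstream target (Redshift staging, SNS, a materialized-view table) instead of returning them.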

AWS DMS (Database Migration Service) captures changes from relational databases (RDS, Aurora, on-premises MySQL/PostgreSQL/Oracle/SQL Server) and replicates them to targets like S3, Redshift, Kinesis, or Kafka. DMS supports both full-load migration and ongoing CDC replication. For data engineering, the ongoing replication mode is the CDC workhorse — it reads the database's transaction log and streams changes to the target in near real time.

Key DMS concepts for the exam: a replication instance runs the migration tasks, source and target endpoints define the connections, and a replication task specifies what to migrate (full load, CDC, or both). DMS can output CDC events to S3 as CSV or Parquet files, or to Kinesis Data Streams for real-time processing.
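These concepts map directly onto the DMS API. The sketch below, using boto3, builds a table-mapping rule and creates a replication task that runs a full load followed by ongoing CDC. The schema/table names and ARNs are hypothetical placeholders; in a real setup they come from your replication instance and the source/target endpoints you created.

```python
import json

# Hypothetical example: replicate changes from the sales.orders table.

def orders_table_mappings():
    """DMS table-mapping rules selecting sales.orders for replication."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders",
            "object-locator": {"schema-name": "sales", "table-name": "orders"},
            "rule-action": "include",
        }]
    })

def start_cdc_task(source_arn, target_arn, instance_arn):
    """Create a replication task that full-loads, then streams CDC events."""
    import boto3  # imported here so the mapping helper stays dependency-free
    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier="orders-cdc-task",
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",  # or "full-load" / "cdc" alone
        TableMappings=orders_table_mappings(),
    )
```

With an S3 target endpoint configured for Parquet output, this task lands the initial snapshot plus every subsequent change as files in the data lake, ready for Athena or Glue to query.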

āš ļø Exam Trap: DynamoDB Streams and Kinesis Data Streams are different services despite similar names. DynamoDB Streams is specifically for DynamoDB change events with 24-hour retention and Lambda integration. Kinesis Data Streams is a general-purpose streaming service. If a question asks about capturing DynamoDB changes, the answer is DynamoDB Streams — not "sending data to Kinesis." (However, you can use Kinesis Data Streams as a DynamoDB Streams adapter for enhanced fan-out scenarios.)

Reflection Question: A company has a production Aurora PostgreSQL database and needs to load changes into their S3 data lake every few minutes without impacting production performance. Which service and approach do you recommend?

Written by Alvin Varughese
Founder • 15 professional certifications