2.1.1. Amazon Kinesis Data Streams
First Principle: Kinesis Data Streams gives you a durable, ordered buffer between data producers and consumers, like a multi-lane highway with guaranteed ordering within each lane. Records enter a shard (lane), are stored for a configurable retention period, and multiple consumers can independently read the same data at their own pace.
Kinesis Data Streams (KDS) is the foundational AWS streaming service. Producers write records with a partition key, which determines which shard the record lands in. Within a shard, records maintain strict ordering. Consumers (Lambda functions, KCL applications, or Managed Apache Flink) read from shards and process records.
The key architectural concepts you need for the exam:
Shards and capacity. Each shard supports 1 MB/s (or 1,000 records/s) of ingest and 2 MB/s of output, shared among standard consumers. If you need more throughput, add more shards. On-demand mode eliminates manual shard management: KDS scales automatically up to the account's shard limit.
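The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name `shards_needed` is hypothetical, "1 MB" is treated as 1,000,000 bytes for simplicity, and the 1,000 records/s figure is the per-shard write limit from the AWS quotas. It also assumes every standard consumer reads the full stream, so read demand grows with the number of consumers.

```python
import math

# Per-shard limits (assumption: decimal megabytes, provisioned mode).
SHARD_INGEST_BYTES_PER_SEC = 1_000_000   # 1 MB/s write
SHARD_INGEST_RECORDS_PER_SEC = 1_000     # 1,000 records/s write
SHARD_OUTPUT_BYTES_PER_SEC = 2_000_000   # 2 MB/s read, shared by standard consumers


def shards_needed(records_per_sec, record_bytes, standard_consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    ingest_bytes = records_per_sec * record_bytes
    by_write_bytes = math.ceil(ingest_bytes / SHARD_INGEST_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_INGEST_RECORDS_PER_SEC)
    # Each standard consumer reads every record, so read demand is multiplied.
    by_read_bytes = math.ceil(
        ingest_bytes * standard_consumers / SHARD_OUTPUT_BYTES_PER_SEC
    )
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# 2,000 records/s at 1,500 bytes each is 3 MB/s of ingest.
print(shards_needed(2_000, 1_500))                        # one consumer
print(shards_needed(2_000, 1_500, standard_consumers=3))  # three standard consumers
```

With one consumer the write limit dominates; with three standard consumers the shared 2 MB/s read limit becomes the bottleneck and pushes the shard count higher.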
Partition keys and ordering. The partition key determines which shard receives a record. Records with the same partition key always go to the same shard, guaranteeing order. For IoT scenarios, use the device ID as the partition key; for clickstream, use the session ID. Poor partition key choice creates hot shards: one overloaded shard while others sit idle.
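To make the key-to-shard mapping concrete, here is a rough simulation. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch assumes the shards split the hash space evenly (true for a freshly created stream, not necessarily after resharding) and that the digest is read as a big-endian integer; `hash_key` and `shard_for` are illustrative names, not AWS APIs.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (evenly split) hash range contains the key."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Many distinct device IDs spread across the shards.
spread = Counter(shard_for(f"device-{i}", 4) for i in range(1000))
print("per-device keys:", dict(spread))

# A single constant key sends every record to one shard: a hot shard.
hot = Counter(shard_for("all-traffic", 4) for _ in range(1000))
print("constant key:   ", dict(hot))
```

The same key always lands on the same shard, which is what preserves per-key ordering and also what makes a low-cardinality key dangerous.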
Consumers and fan-out. Standard consumers share the 2 MB/s of read throughput per shard. Enhanced fan-out gives each registered consumer its own dedicated 2 MB/s per shard, delivered via HTTP/2 push, so consumers no longer interfere with one another. The exam loves this distinction: if a question mentions multiple consumers needing independent, low-latency reads, enhanced fan-out is the signal.
Retention. Default is 24 hours, extendable to 365 days. Extended retention makes streams replayable: consumers can rewind and reprocess historical data. If a question mentions reprocessing or replaying stream data, retention configuration is relevant.
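As a sketch of what "rewinding" looks like in practice, a consumer can ask for a shard iterator positioned at a point in time. The helper below only builds the request parameters (the keys match the Kinesis `GetShardIterator` API with the `AT_TIMESTAMP` iterator type) and checks that the requested time is still inside the retention window. The helper name, stream name, and shard ID are hypothetical, and no AWS call is made here.

```python
from datetime import datetime, timedelta, timezone


def replay_iterator_request(stream_name, shard_id, replay_from,
                            retention_hours=24, now=None):
    """Build GetShardIterator parameters for replaying from a timestamp."""
    now = now or datetime.now(timezone.utc)
    oldest_available = now - timedelta(hours=retention_hours)
    if replay_from < oldest_available:
        # Records older than the retention period have already expired.
        raise ValueError("requested start is older than the retention window")
    return {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": "AT_TIMESTAMP",
        "Timestamp": replay_from,
    }


# Hypothetical stream and shard, replaying the last six hours.
now = datetime.now(timezone.utc)
params = replay_iterator_request(
    "sensor-events", "shardId-000000000000", now - timedelta(hours=6), now=now
)
print(params["ShardIteratorType"], params["Timestamp"].isoformat())
```

With boto3, a dictionary like this could be passed as `kinesis_client.get_shard_iterator(**params)`. Replaying from two days ago would fail with the default 24-hour retention but succeed once retention is extended, for example to 168 hours.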
| Feature | Standard Consumer | Enhanced Fan-Out |
|---|---|---|
| Throughput | 2 MB/s shared per shard | 2 MB/s dedicated per consumer |
| Latency | ~200ms (polling) | ~70ms (push) |
| Cost | Lower | Higher per consumer |
| Use when | 1–2 consumers per stream | Multiple consumers need independent reads |
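The throughput row of the table can be turned into a quick check of whether each consumer can keep up with a shard's ingest rate. This is a simplified model: it assumes standard consumers split the 2 MB/s evenly, treats 1 MB as 1,000,000 bytes, and ignores the separate per-shard limit on read calls. Both function names are illustrative.

```python
SHARD_READ_BYTES_PER_SEC = 2_000_000  # 2 MB/s per shard


def per_consumer_read_rate(consumers, enhanced_fan_out):
    """Read throughput each consumer can expect from a single shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated pipe per shard.
        return SHARD_READ_BYTES_PER_SEC
    # Standard consumers share one pipe per shard (assumed even split).
    return SHARD_READ_BYTES_PER_SEC / consumers


def can_keep_up(shard_ingest_bytes_per_sec, consumers, enhanced_fan_out):
    """True if every consumer can read at least as fast as the shard fills."""
    rate = per_consumer_read_rate(consumers, enhanced_fan_out)
    return rate >= shard_ingest_bytes_per_sec


# A shard ingesting 900 KB/s with different consumer setups.
for consumers, efo in [(2, False), (3, False), (3, True)]:
    mode = "enhanced fan-out" if efo else "standard"
    print(consumers, mode, can_keep_up(900_000, consumers, efo))
```

Two standard consumers still fit within the shared 2 MB/s; a third pushes each consumer's share below the ingest rate, which is the point at which enhanced fan-out (or more shards) becomes necessary.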
⚠️ Exam Trap: Kinesis Data Streams does NOT deliver data directly to S3 or Redshift; it's a buffer, not a delivery mechanism. You need a consumer (Lambda, KCL, Managed Flink) or Kinesis Data Firehose to move data from Streams to a destination. If a question asks for "delivering streaming data to S3 with minimal code," Firehose (possibly fed from Streams) is the answer, not Streams alone.
Reflection Question: An application produces 500 records/second, each 5 KB, with 3 downstream consumers. Should you use standard consumers or enhanced fan-out? How many shards do you need?