2.1.3. Streaming Data Ingestion with Kinesis and Kafka
💡 First Principle: ML systems that consume real-time data need a buffer between the data source and the processing layer. Without this buffer, spikes in data volume overwhelm processing, gaps in processing lose data, and the entire pipeline becomes fragile. Streaming services are that buffer—and the exam tests when to use each one.
Consider a fraud detection model that needs to score transactions in real time. Transactions arrive at variable rates—10 per second at 3 AM, 10,000 per second during a flash sale. Without a streaming buffer, the processing system must be provisioned for peak load (expensive) or risk dropping data during spikes (dangerous). Kinesis and Kafka decouple ingestion from processing, letting each scale independently.
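To make the ingestion side concrete, here is a minimal producer sketch. The helper builds the keyword arguments for a Kinesis Data Streams `put_record` call; the stream name and the `card_id` partition key are illustrative assumptions, not part of the scenario above, and the actual boto3 call is shown in a comment so the sketch runs without AWS credentials.

```python
import json

def build_kinesis_record(transaction: dict) -> dict:
    """Build the kwargs for a Kinesis Data Streams put_record call.

    Partitioning by card_id (an assumed field) keeps all of one card's
    transactions on the same shard, so a per-card fraud consumer sees
    them in order.
    """
    return {
        "StreamName": "fraud-transactions",  # hypothetical stream name
        "Data": json.dumps(transaction).encode("utf-8"),
        "PartitionKey": str(transaction["card_id"]),
    }

# With boto3 (omitted here), the producer side would be:
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(**build_kinesis_record(txn))

record = build_kinesis_record({"card_id": 42, "amount": 19.99})
```

Because the stream absorbs bursts, the producer never needs to know whether it is 3 AM or a flash sale; only the shard count (and thus aggregate throughput) has to be sized for peak write volume.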
| Service | Type | Key Characteristics | Best For |
|---|---|---|---|
| Kinesis Data Streams | Managed streaming | Shard-based, 1 MB/sec/shard write, 2 MB/sec/shard read, 24-hour to 365-day retention | Real-time ML feature ingestion, custom processing |
| Amazon Data Firehose | Managed delivery | Auto-scales, no shards, near-real-time (60s buffer), transforms with Lambda | Delivering streaming data to S3/Redshift/OpenSearch for batch ML |
| Amazon MSK (Managed Kafka) | Managed Kafka | Open source, topic-based, unlimited retention, consumer groups | Teams already using Kafka, complex event processing |
| Amazon Managed Flink | Stream processing | SQL or Java/Scala, windowed aggregations, exactly-once semantics | Real-time feature computation, streaming ML inference |
Kinesis Data Streams vs. Firehose is one of the most frequently tested distinctions. Kinesis Data Streams gives you full control: you manage shards, write custom consumers, and process records in real time. Firehose is simpler: it auto-scales and delivers data to destinations (S3, Redshift, OpenSearch) with optional Lambda transformations, but it buffers data (minimum 60 seconds) so it's near-real-time, not truly real-time.
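The control-vs-simplicity trade-off shows up directly in the two services' `put_record` request shapes. The sketch below builds the boto3 keyword arguments for each (stream names are placeholders): Data Streams requires you to choose a partition key, because you own the shards and the consumers; Firehose has no shards, so there is no key to choose.

```python
import json

def streams_put_record(stream: str, payload: dict, partition_key: str) -> dict:
    # Kinesis Data Streams: you pick the partition key, which decides
    # the shard, and your own consumer reads the record in real time.
    return {
        "StreamName": stream,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }

def firehose_put_record(delivery_stream: str, payload: dict) -> dict:
    # Firehose: no shards, no partition key. The service buffers records
    # by size or time and delivers batches to the configured destination.
    return {
        "DeliveryStreamName": delivery_stream,
        "Record": {"Data": json.dumps(payload).encode("utf-8")},
    }

# With boto3 (omitted), these dicts feed straight into the clients:
#   boto3.client("kinesis").put_record(**streams_put_record(...))
#   boto3.client("firehose").put_record(**firehose_put_record(...))
```

The absence of `PartitionKey` in the Firehose request is the API-level fingerprint of the exam distinction: no shards to manage means simpler, but also no custom real-time consumer.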
Streaming → Feature Store Pattern: A common architecture ingests streaming data through Kinesis, computes features with Flink or Lambda, and writes them to SageMaker Feature Store's online store for low-latency feature lookup during inference. The same features are also written to the offline store for batch training. This dual-write pattern ensures training and inference use consistent features.
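The write step of this pattern can be sketched as follows. The SageMaker Feature Store runtime `put_record` API takes a `Record` that is a list of `{FeatureName, ValueAsString}` pairs; the helper below converts computed features into that shape. The feature group name and feature names are illustrative assumptions, and the boto3 call is left in a comment so the sketch runs standalone.

```python
import time

def to_feature_store_record(features: dict) -> list:
    """Convert computed features into the Record format used by the
    sagemaker-featurestore-runtime put_record API."""
    record = [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]
    # An event-time feature lets the offline store order and
    # deduplicate rows for training-set reconstruction.
    record.append(
        {"FeatureName": "event_time", "ValueAsString": str(int(time.time()))}
    )
    return record

# With boto3 (omitted), the online-store write from Flink/Lambda would be:
#   runtime = boto3.client("sagemaker-featurestore-runtime")
#   runtime.put_record(
#       FeatureGroupName="surge-features",  # hypothetical feature group
#       Record=to_feature_store_record({"avg_speed": 12.5}),
#   )
```

Because the same record shape feeds both the online store (for inference lookups) and the offline store (for training), the conversion logic lives in one place, which is exactly what keeps training and serving features consistent.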
⚠️ Exam Trap: When a question says "real-time" data delivery to S3, the answer is usually Firehose—even though Firehose has a buffer delay. In AWS terminology, "real-time delivery to storage" means Firehose. "Real-time processing" with custom logic means Kinesis Data Streams. The exam uses these terms precisely.
Reflection Question: A ride-sharing company needs to compute real-time surge pricing features from GPS and trip data, serve them during inference, and also use them for weekly model retraining. Which combination of streaming and storage services would you use?