Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.1.3. Streaming Data Ingestion with Kinesis and Kafka

💡 First Principle: ML systems that consume real-time data need a buffer between the data source and the processing layer. Without this buffer, spikes in data volume overwhelm processing, gaps in processing lose data, and the entire pipeline becomes fragile. Streaming services are that buffer—and the exam tests when to use each one.

Consider a fraud detection model that needs to score transactions in real time. Transactions arrive at variable rates—10 per second at 3 AM, 10,000 per second during a flash sale. Without a streaming buffer, the processing system must be provisioned for peak load (expensive) or risk dropping data during spikes (dangerous). Kinesis and Kafka decouple ingestion from processing, letting each scale independently.
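The peak-versus-average provisioning tradeoff can be made concrete with Kinesis shard math. A minimal sketch, using the transaction rates above and an assumed 500-byte record size; each shard accepts up to 1 MB/sec or 1,000 records/sec, whichever limit binds first:

```python
import math

def shards_for_throughput(records_per_sec: int, record_bytes: int) -> int:
    """Estimate the Kinesis shards needed for a write workload.

    A shard accepts up to 1 MB/sec of data OR 1,000 records/sec;
    capacity is set by whichever limit is hit first.
    """
    by_bandwidth = (records_per_sec * record_bytes) / 1_000_000  # 1 MB/sec/shard
    by_record_rate = records_per_sec / 1_000                     # 1,000 records/sec/shard
    return max(1, math.ceil(max(by_bandwidth, by_record_rate)))

# Assumed 500-byte transaction records:
quiet = shards_for_throughput(10, 500)      # 3 AM traffic -> 1 shard
peak = shards_for_throughput(10_000, 500)   # flash sale -> 10 shards (record-rate bound)
```

Note that at peak the record-rate limit (10 shards) binds before the bandwidth limit (5 shards), which is why small, frequent records can need more shards than raw MB/sec would suggest.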

| Service | Type | Key Characteristics | Best For |
| --- | --- | --- | --- |
| Kinesis Data Streams | Managed streaming | Shard-based, 1 MB/sec/shard write, 2 MB/sec/shard read, 24h–365 day retention | Real-time ML feature ingestion, custom processing |
| Amazon Data Firehose | Managed delivery | Auto-scales, no shards, near-real-time (60s buffer), transforms with Lambda | Delivering streaming data to S3/Redshift/OpenSearch for batch ML |
| Amazon MSK (Managed Kafka) | Managed Kafka | Open source, topic-based, unlimited retention, consumer groups | Teams already using Kafka, complex event processing |
| Amazon Managed Flink | Stream processing | SQL or Java/Scala, windowed aggregations, exactly-once semantics | Real-time feature computation, streaming ML inference |
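The "shard-based" row deserves a closer look: Kinesis routes each record by taking the MD5 hash of its partition key and sending it to the shard that owns that slice of the 128-bit hash-key space. A simplified model, assuming shards split the space evenly:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Simplified model of Kinesis partition-key routing.

    Kinesis interprets the MD5 hash of the partition key as a
    128-bit integer and delivers the record to the shard owning
    that range of the hash-key space. With evenly split shards,
    this reduces to dividing the space into num_shards equal ranges.
    """
    hash_key = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return hash_key * num_shards // 2**128
```

Because the mapping is deterministic, records sharing a partition key (say, one card's transactions) always land on the same shard and are read in order, which matters for fraud features like "transactions in the last 5 minutes."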

Kinesis Data Streams vs. Firehose is one of the most frequently tested distinctions. Kinesis Data Streams gives you full control: you manage shards, write custom consumers, and process records in real time. Firehose is simpler: it auto-scales and delivers data to destinations (S3, Redshift, OpenSearch) with optional Lambda transformations, but it buffers data (minimum 60 seconds) so it's near-real-time, not truly real-time.

Streaming → Feature Store Pattern: A common architecture ingests streaming data through Kinesis, computes features with Flink or Lambda, and writes them to SageMaker Feature Store's online store for low-latency feature lookup during inference. The same features are also written to the offline store for batch training. This dual-write pattern ensures training and inference use consistent features.
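The dual-write pattern can be sketched with in-memory stand-ins for the two stores (the dict and list below are toy substitutes for Feature Store's online and offline stores, not the SageMaker API):

```python
from datetime import datetime, timezone

# Toy stand-ins for SageMaker Feature Store's two stores:
online_store: dict[str, dict] = {}  # latest value per entity, for inference lookup
offline_store: list[dict] = []      # append-only history, for batch training

def write_feature(entity_id: str, features: dict) -> None:
    """Dual-write: one computed feature record goes to both stores.

    Writing the identical record to both stores is what keeps
    training (offline) and inference (online) features consistent.
    """
    record = {
        "entity_id": entity_id,
        "event_time": datetime.now(timezone.utc).isoformat(),
        **features,
    }
    online_store[entity_id] = record  # overwrite: latest value wins
    offline_store.append(record)      # append: full history retained

# A streaming consumer (Flink or Lambda in the real pipeline) would call,
# with hypothetical feature names:
write_feature("card-123", {"txn_count_5min": 7, "avg_amount_5min": 42.50})
```

At inference time the model reads only `online_store` (one low-latency lookup per entity); the weekly retraining job reads the full `offline_store` history, so both paths see features computed by the same code.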

⚠️ Exam Trap: When a question says "real-time" data delivery to S3, the answer is usually Firehose—even though Firehose has a buffer delay. In AWS terminology, "real-time delivery to storage" means Firehose. "Real-time processing" with custom logic means Kinesis Data Streams. The exam uses these terms precisely.

Reflection Question: A ride-sharing company needs to compute real-time surge pricing features from GPS and trip data, serve them during inference, and also use them for weekly model retraining. Which combination of streaming and storage services would you use?

Written by Alvin Varughese, Founder (15 professional certifications)