2.1.3. Amazon MSK (Managed Streaming for Apache Kafka)
💡 First Principle: Amazon MSK exists for organizations that already have Apache Kafka expertise or need Kafka-specific capabilities. Think of it as "bring your Kafka knowledge to AWS" — you get a fully managed Kafka cluster without the operational burden of managing ZooKeeper, broker failover, and storage scaling yourself.
Apache Kafka is an open-source distributed event streaming platform that has become an industry standard. MSK manages the Kafka infrastructure while preserving full compatibility with the Kafka API: existing Kafka producers, consumers, connectors, and Streams applications run without code changes, and typically only connection settings (bootstrap servers, encryption, authentication) need updating.
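As a rough illustration of how small the client-side change tends to be, the sketch below treats a producer's configuration as a plain dictionary and swaps only the connection settings. The broker hostnames are hypothetical placeholders, and the TLS listener on port 9094 is an assumption about how the cluster is configured; the property names follow the standard Kafka client configuration.

```python
# Sketch: pointing an existing Kafka client config at MSK.
# All hostnames below are hypothetical placeholders, not real endpoints.

ON_PREM_CONFIG = {
    "bootstrap.servers": "kafka1.corp.example:9092,kafka2.corp.example:9092",
    "security.protocol": "PLAINTEXT",
    "acks": "all",
    "compression.type": "lz4",
    "client.id": "orders-producer",
}


def migrate_to_msk(config, msk_bootstrap_servers):
    """Return a copy of config pointing at MSK; producer tuning is left untouched."""
    migrated = dict(config)
    migrated["bootstrap.servers"] = msk_bootstrap_servers
    migrated["security.protocol"] = "SSL"  # assumption: TLS-encrypted listener
    return migrated


MSK_BROKERS = (
    "b-1.example.kafka.us-east-1.amazonaws.com:9094,"
    "b-2.example.kafka.us-east-1.amazonaws.com:9094"
)
msk_config = migrate_to_msk(ON_PREM_CONFIG, MSK_BROKERS)

# List the settings that differ between the two environments.
changed_keys = sorted(k for k in msk_config if msk_config[k] != ON_PREM_CONFIG[k])
print(changed_keys)  # ['bootstrap.servers', 'security.protocol']
```

The producer tuning (`acks`, `compression.type`, `client.id`) carries over unchanged, which is the point the exam is testing when it mentions "existing Kafka client libraries."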
When does the exam point you toward MSK over Kinesis? Look for these signals: the team has existing Kafka expertise, the architecture uses Kafka Connect for source/sink integration, the scenario requires Kafka-specific features (log compaction, exactly-once semantics, or custom partitioning), or data must be retained longer than the 365-day Kinesis maximum or more cheaply than Kinesis extended retention allows.
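As a study aid, the signals above can be encoded as a simple lookup. This is only a sketch of the heuristic, not an official decision procedure, and the signal names are hypothetical labels invented for the example.

```python
# Study aid: treat the exam signals as a set and check a scenario against it.
# The signal names are hypothetical labels chosen for this sketch.

MSK_SIGNALS = {
    "existing_kafka_expertise",
    "uses_kafka_connect",
    "needs_log_compaction",
    "needs_exactly_once",
    "needs_custom_partitioning",
    "retention_beyond_kinesis",
}


def suggest_service(scenario_signals):
    """Return (service, matched_signals) for a set of scenario signals."""
    matched = sorted(MSK_SIGNALS & set(scenario_signals))
    service = "Amazon MSK" if matched else "Kinesis Data Streams"
    return service, matched


print(suggest_service({"uses_kafka_connect", "aws_native_team"}))
# ('Amazon MSK', ['uses_kafka_connect'])
print(suggest_service({"aws_native_team"}))
# ('Kinesis Data Streams', [])
```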
MSK Serverless vs Provisioned. MSK Serverless automatically manages cluster capacity, and you pay based on usage (such as data throughput and storage) rather than for broker instances you size yourself. MSK Provisioned gives you control over broker instance types, number of brokers, and storage volumes. The exam uses "least operational overhead" to signal Serverless, and "full control over cluster configuration" to signal Provisioned.
MSK Connect. A fully managed framework for running Kafka Connect connectors. It streams data between Kafka topics and external systems (databases, S3, OpenSearch) without custom code. If a question mentions integrating Kafka with other data stores, MSK Connect is often the answer.
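To make "without custom code" concrete, here is a minimal sketch of the kind of configuration a connector takes: a flat map of settings rather than application logic. It assumes the Confluent S3 sink connector plugin is the one in use; the topic and bucket names are hypothetical, and the helper function exists only for this example.

```python
# Sketch of a Kafka Connect sink configuration of the kind MSK Connect accepts.
# Assumptions: the Confluent S3 sink connector plugin is installed; the topic
# and bucket names are hypothetical.

def s3_sink_config(topic, bucket, region, tasks=2, flush_size=1000):
    """Build a connector configuration as a flat map of string keys to string values."""
    settings = {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": tasks,
        "topics": topic,
        "s3.bucket.name": bucket,
        "s3.region": region,
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": flush_size,
    }
    # Connector configurations are passed as string-to-string maps.
    return {key: str(value) for key, value in settings.items()}


config = s3_sink_config("orders", "example-orders-archive", "us-east-1")
for key, value in config.items():
    print(f"{key}={value}")
```

Everything that moves records from the topic into S3 lives in the connector plugin; the team supplies only these settings.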
| Feature | Kinesis Data Streams | Amazon MSK |
|---|---|---|
| Protocol | AWS proprietary API | Apache Kafka API |
| Scaling unit | Shard (1 MB/s or 1,000 records/s in; 2 MB/s out) | Partition (throughput depends on broker size) |
| Ordering | Per shard (partition key) | Per partition (message key) |
| Retention | 24 hours to 365 days | Configurable; effectively unlimited with tiered storage |
| Ecosystem | KPL, KCL, Lambda, Flink | Kafka clients, Connect, Streams |
| Operational model | Serverless (on-demand or provisioned capacity mode) | Managed brokers (Provisioned) or Serverless |
| Best for | AWS-native architectures | Kafka-native teams, existing Kafka workloads |
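The Ordering row in the table rests on the same idea in both services: a key is hashed to pick a shard or partition, so all records sharing a key land in one ordered log. The sketch below illustrates this with `zlib.crc32` as a stand-in hash chosen for the example; neither Kafka's default partitioner nor Kinesis uses this exact function, and the event data is made up.

```python
import zlib

# Sketch of key-based partition assignment.
# zlib.crc32 is a stand-in hash chosen for this example; Kafka's default
# partitioner and Kinesis each use their own hash function.

def assign_partition(key, num_partitions):
    """Map a message key to a partition index in [0, num_partitions)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "purchase"),
    ("user-1", "logout"),
]

partitions = {}
for key, action in events:
    partitions.setdefault(assign_partition(key, 6), []).append((key, action))

# Every event for a given key lands in one partition, in arrival order.
for index, log in sorted(partitions.items()):
    print(index, log)
```

Ordering is guaranteed only within a partition, not across partitions. With this modulo scheme, changing the partition count can also remap keys, which is one reason partition counts are usually planned up front.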
⚠️ Exam Trap: Don't assume MSK is always better because Kafka is "more powerful." For AWS-native architectures where the team doesn't have Kafka expertise, Kinesis is simpler and integrates more naturally with Lambda, Firehose, and other AWS services. The exam rewards matching the solution to the team's skills and the architecture's needs, not picking the most feature-rich option.
Reflection Question: A company migrating from on-premises to AWS has existing Kafka producers writing to an on-premises Kafka cluster. They want to continue using their Kafka client libraries. Which AWS service should they choose, and why?