Amazon Managed Streaming for Apache Kafka (MSK) is a managed service that enables applications to process streaming data using Apache Kafka.

What is Apache Kafka?

Apache Kafka is an open-source stream-processing platform, written in Scala and Java, that provides a unified, high-throughput, low-latency system for handling real-time data feeds.

Kafka has three key functions:

  1. Publish and subscribe to streams of records
  2. Store streams of records durably, in the order they were received (Kafka keeps records in an append-only log; ordering is guaranteed within a partition)
  3. Process streams of records in real time
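To make these three functions concrete, here is a toy, single-partition model in Python. It is not the Kafka client API: `TopicLog`, `Subscriber`, and their methods are hypothetical names used only for illustration. It shows records being appended in arrival order, subscribers reading by offset without removing records, and each subscriber processing new records as they arrive.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Record:
    key: str
    value: str


@dataclass
class TopicLog:
    """Hypothetical in-memory stand-in for a single-partition topic.

    Records are only ever appended, so each record's offset (its index)
    reflects the order in which it was received.
    """
    records: List[Record] = field(default_factory=list)

    def publish(self, key: str, value: str) -> int:
        # Append the record and return its offset.
        self.records.append(Record(key, value))
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Record]:
        # Reading does not remove records, so independent subscribers
        # can each consume the full stream at their own pace.
        return self.records[offset:]


class Subscriber:
    """Hypothetical subscriber that tracks its own position (offset) in the log."""

    def __init__(self, log: TopicLog, handler: Callable[[Record], str]) -> None:
        self.log = log
        self.handler = handler
        self.next_offset = 0

    def poll(self) -> List[str]:
        # Process every record not yet seen, then advance past them.
        new_records = self.log.read_from(self.next_offset)
        self.next_offset += len(new_records)
        return [self.handler(r) for r in new_records]


log = TopicLog()
log.publish("sensor-1", "21.5")
log.publish("sensor-2", "19.0")

formatter = Subscriber(log, lambda r: f"{r.key}={r.value}")
print(formatter.poll())  # ['sensor-1=21.5', 'sensor-2=19.0']

log.publish("sensor-1", "22.1")
print(formatter.poll())  # ['sensor-1=22.1']

# A second subscriber starts at offset 0 and still sees every record.
auditor = Subscriber(log, lambda r: r.value)
print(auditor.poll())  # ['21.5', '19.0', '22.1']
```

Real Kafka adds partitioning, replication, and retention limits on top of this idea, but the core contract is the same: producers append, and each consumer tracks its own offset.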

Kafka architecture:

[Figure: Kafka architecture]

There are five major APIs in Kafka:

  • Producer API – enables applications to publish streams of records to topics
  • Consumer API – enables applications to subscribe to topics and process streams of records
  • Connect API – enables continuous import of data from source systems into Kafka, and export of data from Kafka to sink systems
  • Streams API – enables transformation of data streams from input topics to output topics
  • Admin API – enables managing and inspecting topics, brokers, and other Kafka objects
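As an illustration of what the Streams API is for, the Python sketch below models a topology similar in spirit to the WordCount example in the Kafka Streams documentation. It reads text lines from an input topic and emits an updated (word, count) record for each word it sees. Plain Python lists stand in for topics, and `word_count_stream` is a hypothetical name; the real Streams API is a Java library with its own DSL.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def word_count_stream(input_topic: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for a Streams API topology.

    Consumes text lines from an input topic and yields an updated
    (word, count) record for the output topic every time a word is seen.
    """
    counts: Counter = Counter()  # local state kept by the stream processor
    for line in input_topic:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


input_topic: List[str] = ["kafka streams", "Kafka connect"]
output_topic = list(word_count_stream(input_topic))
print(output_topic)
# [('kafka', 1), ('streams', 1), ('kafka', 2), ('connect', 1)]
```

The output is a stream of updates rather than a final table, which is how a stream processor can keep results current as new input records keep arriving.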

Kafka with Amazon MSK

Amazon MSK handles many of the operational aspects of running Kafka, so applications can use the native Apache Kafka APIs without having to deal with the complexity of setting up and operating Kafka clusters.

The following diagram shows a simplified view of how Amazon MSK works:

[Figure: Amazon Managed Streaming for Apache Kafka (MSK) - How it works. Image courtesy of AWS.]

Key Points

  • Amazon MSK is a fully managed service – it provisions and operates the underlying infrastructure for Kafka.
    • It handles patching and automatically recovers from common failures.
  • Amazon MSK also manages the Apache ZooKeeper nodes.
    • ZooKeeper is an open-source project that provides centralized services for distributed configuration, synchronization, and naming registry.
  • You can migrate existing Kafka applications to Amazon MSK and continue to use the Kafka APIs they already use.
  • Encryption – Amazon MSK encrypts data at rest with AWS KMS, using either an AWS managed key or your own customer managed key (CMK). It also encrypts data in transit with TLS.
  • Amazon MSK is PCI, ISO, SOC 1/2/3 compliant, and HIPAA eligible.
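Because Amazon MSK exposes the standard Kafka APIs, pointing a client at an MSK cluster with TLS in transit is often mostly a matter of connection settings. The Python sketch below builds such a configuration as a plain dictionary. `tls_client_config` and the broker hostnames are hypothetical; `bootstrap.servers`, `security.protocol`, `group.id`, and `auto.offset.reset` are standard Kafka client property names. A dictionary of this shape is, for example, what the confluent-kafka Python client's `Consumer` accepts; use the actual TLS bootstrap broker string for your cluster.

```python
from typing import Dict, List


def tls_client_config(bootstrap_brokers: List[str], group_id: str) -> Dict[str, str]:
    """Build a hypothetical consumer configuration for a TLS-enabled cluster.

    The property names are standard Kafka client settings; only the
    connection-related values differ from a self-managed Kafka setup.
    """
    if not bootstrap_brokers:
        raise ValueError("at least one bootstrap broker is required")
    return {
        # Comma-separated host:port list of the cluster's TLS bootstrap brokers.
        "bootstrap.servers": ",".join(bootstrap_brokers),
        # Encrypt client-broker traffic with TLS.
        "security.protocol": "SSL",
        # Ordinary consumer settings carry over unchanged.
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    }


brokers = [  # hypothetical broker endpoints
    "b-1.example-cluster.abc123.kafka.us-east-1.amazonaws.com:9094",
    "b-2.example-cluster.abc123.kafka.us-east-1.amazonaws.com:9094",
]
config = tls_client_config(brokers, group_id="orders-service")
print(config["security.protocol"])  # SSL
```

Keeping connection settings separate from application logic is what lets the same producer and consumer code run against either a self-managed cluster or MSK.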

Pricing

Amazon MSK is billed for the following components:

  • Broker Instances – per hour (based on instance type)
  • Broker Storage – per GB per month
  • Data Transfer in and out of Amazon MSK clusters – standard AWS data transfer charges apply
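The three billing components combine roughly as sketched below. All rates in this Python sketch are hypothetical placeholders, not real MSK prices. Treating data transfer as a single flat per-GB rate is also a simplification, since actual AWS data transfer pricing depends on source, destination, and volume.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation used for monthly estimates


@dataclass
class MskRates:
    """Hypothetical example rates; check MSK pricing for your Region."""
    broker_per_hour: float       # USD per broker instance-hour (depends on instance type)
    storage_per_gb_month: float  # USD per GB-month of broker storage
    transfer_per_gb: float       # USD per GB of billable data transfer (simplified)


def estimate_monthly_cost(rates: MskRates, brokers: int, storage_gb: float,
                          transfer_gb: float, hours: float = HOURS_PER_MONTH) -> float:
    """Sum the three billed components for one month, rounded to cents."""
    broker_cost = brokers * hours * rates.broker_per_hour
    storage_cost = storage_gb * rates.storage_per_gb_month
    transfer_cost = transfer_gb * rates.transfer_per_gb
    return round(broker_cost + storage_cost + transfer_cost, 2)


# Hypothetical cluster: 3 brokers, 300 GB storage in total, 500 GB billable transfer.
rates = MskRates(broker_per_hour=0.20, storage_per_gb_month=0.10, transfer_per_gb=0.02)
print(estimate_monthly_cost(rates, brokers=3, storage_gb=300, transfer_gb=500))  # 478.0
```

In this made-up example, broker hours (3 × 730 × 0.20 = 438) dominate storage (30) and transfer (10). This is why the choice of instance type and broker count usually matters most.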

Resources