Amazon Kinesis Data Analytics is a managed service that enables transformation and analytics of streaming data in real time, using Apache Flink.
What is Apache Flink?
Apache Flink is an open-source, unified stream-processing and batch-processing framework. Its streaming data-flow engine is written in Java and Scala.
- The Kinesis Data Analytics service manages several aspects of running Flink, which reduces the complexity of using Flink with other AWS services.
The following diagram shows a simplified view of how Kinesis Data Analytics works:
Image courtesy of AWS
Key Components of Kinesis Data Analytics
Kinesis Data Analytics Application
This is the application that continuously reads and processes streaming data in real time.
- You write the application in SQL, or in a language supported by Apache Flink, such as Java, Scala, or Python.
The Kinesis Data Analytics Application consists of three components:
- Input – this is the streaming source for your application.
- Application Code – a series of Apache Flink operators or SQL statements that process input and produce output.
- Output – the result produced by executing the application code.
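The three components above can be sketched in plain Python. This is only an illustration of the input, application code, and output structure, not the Apache Flink or Kinesis API: the event shape, the function names (`input_stream`, `count_per_window`, `output_sink`), and the 60-second tumbling window are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, ticker symbol).
Event = Tuple[int, str]
WindowResult = Tuple[int, Dict[str, int]]


def input_stream() -> Iterator[Event]:
    """Input: stands in for a streaming source such as a Kinesis data stream."""
    yield from [
        (0, "AMZN"), (12, "MSFT"), (45, "AMZN"),
        (61, "AMZN"), (75, "MSFT"), (130, "AMZN"),
    ]


def count_per_window(events: Iterable[Event],
                     window_seconds: int = 60) -> Iterator[WindowResult]:
    """Application code: count events per key in tumbling windows.

    Assumes events arrive in timestamp order, which a real stream
    does not guarantee.
    """
    current_window = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_window is not None and window_start != current_window:
            # The previous window is complete, so emit its counts.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        # Emit the last, still-open window when the input ends.
        yield current_window, dict(counts)


def output_sink(results: Iterable[WindowResult]) -> List[WindowResult]:
    """Output: collects results; a real application would write to a destination."""
    return list(results)


results = output_sink(count_per_window(input_stream()))
for window_start, counts in results:
    print(window_start, counts)
```

In a real application the managed service and Flink would handle the parts this sketch glosses over, such as out-of-order events, state storage, and recovery.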
Key Points
- Kinesis Data Analytics is a fully managed service that uses Apache Flink.
- Kinesis Data Analytics supports standard SQL.
- You can use the provided templates and an interactive editor to build SQL queries.
- Use Cases – Streaming ETL, Real-time analytics, Stateful Event Processing (e.g., in online gaming)
Pricing
Kinesis Data Analytics is billed for the following components:
- Kinesis Processing Unit (KPU) – per hour
  - A single KPU is 1 vCPU and 4 GB memory
- Running Application Storage – per GB per month
  - Used for stateful processing capabilities
- (Optional) Durable Application Backups – per GB per month
  - Used for storing snapshots for point-in-time recovery of applications
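As a rough sketch of how these billing components combine, the following Python snippet estimates a monthly bill. The rates are placeholder values chosen for easy arithmetic, not actual AWS prices, and the `Rates` class, the function names, and the 730-hours-per-month approximation are assumptions for this example. It covers only the three components listed above, so check the official pricing page for current rates and any additional charges.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # Assumption: a common approximation of one month.


@dataclass
class Rates:
    """Placeholder rates in USD; NOT actual AWS prices."""
    kpu_per_hour: float
    running_storage_per_gb_month: float
    backup_per_gb_month: float


def kpu_resources(kpus: int) -> dict:
    """Resources behind a KPU count: 1 vCPU and 4 GB memory per KPU."""
    return {"vcpu": kpus * 1, "memory_gb": kpus * 4}


def estimate_monthly_cost(rates: Rates,
                          kpus: int,
                          running_storage_gb: float,
                          backup_gb: float = 0.0,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Sum the three billed components for one month."""
    compute = kpus * hours * rates.kpu_per_hour
    storage = running_storage_gb * rates.running_storage_per_gb_month
    backups = backup_gb * rates.backup_per_gb_month  # Optional component.
    return round(compute + storage + backups, 2)


# Hypothetical rates and workload, for illustration only.
example_rates = Rates(kpu_per_hour=0.10,
                      running_storage_per_gb_month=0.10,
                      backup_per_gb_month=0.02)
monthly_cost = estimate_monthly_cost(example_rates, kpus=2,
                                     running_storage_gb=100, backup_gb=50)
print(kpu_resources(2))
print(monthly_cost)
```

With these placeholder numbers, the KPU hours dominate the total, which is why sizing the number of KPUs is usually the first thing to look at when estimating cost.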
External Resources
- Amazon Kinesis Data Analytics Site
- Amazon Kinesis Data Analytics – Apache Flink Developer Guide
- Apache Flink Site