Amazon Kinesis Data Firehose is a managed service for delivering real-time streaming data to storage and analytics destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Splunk, and custom HTTP endpoints.
The following diagram shows a simplified view of how Kinesis Data Firehose works:
Image courtesy of AWS
Key Components of Kinesis Data Firehose
Delivery Stream
- The underlying entity of Kinesis Data Firehose to which you send streaming data.
- Record – data is loaded into a delivery stream in units called records.
- Each record can be up to 1,000 KB.
- Firehose buffers incoming streaming data up to a configured size and/or for a configured period of time before delivering it to the specified destination(s). A minimal producer sketch follows this list.
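As a hedged illustration, here is a minimal Python (boto3) producer that puts a single record onto a delivery stream. The stream name, region, and payload are hypothetical placeholders.

```python
# A minimal sketch of a producer, assuming boto3 is configured with
# credentials and a delivery stream named "my-delivery-stream" exists
# (both the stream name and the payload are hypothetical).
import json

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

record = {"device_id": "sensor-42", "temperature": 21.7}  # sample payload

response = firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    # Data must be a byte blob; each record can be up to 1,000 KB.
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
print(response["RecordId"])  # Firehose assigns an ID to each record
```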
(Data) Source / Producer
- An application or device that generates streaming data and puts it into the delivery stream.
- Examples – IoT devices, servers generating log data.
Destination
- A data store where your data is to be delivered.
- Firehose currently supports:
- S3
- Redshift
- Elasticsearch Service (ES)
- Splunk
- Custom HTTP endpoints
Key Points
- Kinesis Data Firehose is a fully managed service that automatically scales to match the throughput of your data.
- It can batch, compress, transform, and encrypt data streams before loading into storage.
- Delivery is near-real-time (within about 60 seconds, depending on the configured buffer interval).
- You can write to Kinesis Data Firehose using:
- Kinesis Data Streams
- Kinesis Agent
- AWS SDK (see the batching sketch after this list)
- CloudWatch Logs
- CloudWatch Events
- AWS IoT
- Use cases: IoT analytics, log and clickstream analytics, security monitoring
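Writing via the AWS SDK typically uses PutRecord for single records or PutRecordBatch for higher throughput. Below is a hedged Python (boto3) sketch of the batch call; the stream name and payloads are hypothetical.

```python
# A hedged sketch of batching writes with boto3 (stream name and payloads
# are hypothetical). One PutRecordBatch call accepts up to 500 records
# and 4 MiB in total.
import json

import boto3

firehose = boto3.client("firehose")

events = [{"seq": i, "page": f"/item/{i}"} for i in range(100)]

response = firehose.put_record_batch(
    DeliveryStreamName="my-delivery-stream",
    Records=[{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events],
)

# The call can succeed overall while individual records fail, so always
# check FailedPutCount and retry the failed entries if it is non-zero.
print(response["FailedPutCount"])
```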
Data Flow Scenarios for Various Destinations
Delivery to S3
- Data is delivered directly to S3.
- If transformation is performed, the original source data can be backed up to another S3 bucket. (A configuration sketch for the S3 scenario follows the diagram.)
Image courtesy of AWS
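As a hedged sketch, the delivery stream for this scenario could be created with boto3 as follows; the role ARN, bucket ARN, and stream name are hypothetical, and the buffering hints correspond to the size/time thresholds described earlier.

```python
# A minimal sketch of creating a delivery stream with an S3 destination,
# assuming the IAM role and bucket below already exist (ARNs are
# hypothetical).
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="logs-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-s3-role",
        "BucketARN": "arn:aws:s3:::my-firehose-bucket",
        # Deliver when 5 MB accumulate or 300 seconds elapse, whichever
        # comes first.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",  # compress objects before writing to S3
    },
)
```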
Delivery to Redshift
- Data is first delivered to S3 and then loaded into the Redshift cluster via the COPY command (see the configuration sketch after the diagram).
- If transformation is performed, the original data may be backed up to an S3 bucket.
Image courtesy of AWS
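A hedged boto3 sketch of the Redshift scenario is below. It shows how the intermediate S3 staging location and the COPY command are wired together; every ARN, the JDBC URL, the credentials, and the table name are hypothetical placeholders.

```python
# A hedged sketch of the Redshift scenario: Firehose stages data in S3
# and issues a COPY into the target table. All values are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="events-to-redshift",
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-redshift-role",
        "ClusterJDBCURL": (
            "jdbc:redshift://my-cluster.example.us-east-1"
            ".redshift.amazonaws.com:5439/mydb"
        ),
        "CopyCommand": {
            "DataTableName": "events",         # target Redshift table
            "CopyOptions": "json 'auto' gzip"  # options appended to COPY
        },
        "Username": "firehose_user",
        "Password": "REPLACE_ME",
        # Intermediate S3 location used for staging before the COPY runs
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-s3-role",
            "BucketARN": "arn:aws:s3:::my-staging-bucket",
            "CompressionFormat": "GZIP",
        },
    },
)
```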
Delivery to Elasticsearch Service (ES)
- Data is delivered directly to ES.
- If transformation is performed, the original data may be backed up to an S3 bucket (the transformation step itself is sketched after the Splunk scenario).
Image courtesy of AWS
Delivery to Splunk
- Data is delivered directly to Splunk.
- Original data may be backed up to an S3 bucket.
Image courtesy of AWS
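In all of the scenarios above, the optional transformation step is performed by an AWS Lambda function that Firehose invokes with a batch of records. The function must return each record with its original recordId, a result status, and the re-encoded data. A minimal sketch, with a stand-in uppercasing transformation:

```python
# A minimal sketch of a Firehose transformation Lambda. Firehose passes a
# batch of base64-encoded records; each must be returned with its
# original recordId and a result status. The uppercasing logic is just a
# placeholder for real processing.
import base64

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],  # must echo the incoming ID
            "result": "Ok",                  # Ok | Dropped | ProcessingFailed
            "data": base64.b64encode(
                transformed.encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```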
Pricing
Kinesis Data Firehose is billed for the following components (a worked example follows the list):
- Data ingested – per GB
- Rates are tiered (e.g., first 500 TB, next 1.5 PB, etc.)
- Data transformed – per GB
- Data delivered to a VPC:
- Amount of data – per GB, and
- Duration of active delivery stream – per hour, per AZ
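As a hedged back-of-the-envelope example (the per-GB rates below are illustrative placeholders, not current AWS prices; check the Kinesis Data Firehose pricing page for your region):

```python
# A back-of-the-envelope cost sketch. Rates and volume are hypothetical
# placeholders, not actual AWS prices.
INGEST_RATE_PER_GB = 0.029      # assumed first-tier ingestion rate (USD)
TRANSFORM_RATE_PER_GB = 0.018   # assumed data-transformation rate (USD)

gb_per_month = 1_000            # hypothetical monthly volume

monthly_cost = gb_per_month * (INGEST_RATE_PER_GB + TRANSFORM_RATE_PER_GB)
print(f"~${monthly_cost:.2f}/month")  # ≈ $47.00 for this example
```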