Amazon Athena is a serverless, fully managed service that runs interactive, standard SQL queries on data stored in Amazon S3.
Key Points for Amazon Athena
- Fully managed – there are no servers to manage, and you do not handle provisioning, patching, or failure recovery.
- Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3.
- No ETL required
- You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena.
- Athena is built on Presto (a high performance distributed SQL query engine), and supports standard ANSI SQL.
- Athena works with a wide variety of data formats, including CSV, JSON, and columnar formats such as Apache Parquet and Apache ORC.
- Athena uses Apache Hive DDL to define tables.
- Athena uses an internal Data Catalog to store information about the databases.
- You can also enable AWS Glue Data Catalog, which provides additional features.
- Athena can be accessed via the AWS Management Console, the API, or an ODBC / JDBC driver.
- Athena integrates with Amazon QuickSight for easy data visualization.
- It can also be used with standard BI tools through ODBC / JDBC drivers.
- You pay only for the queries that you run.
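To make the points about Hive DDL and API access more concrete, here is a minimal Python sketch. It builds a Hive-style `CREATE EXTERNAL TABLE` statement for delimited text files and submits SQL through a client object. The table name, columns, bucket paths, and the helper names `build_create_table_ddl` and `submit_query` are hypothetical and chosen only for illustration. The client is assumed to behave like the boto3 Athena client, whose `start_query_execution` call takes a query string and a result location.

```python
def build_create_table_ddl(table, columns, s3_location, delimiter=","):
    """Return a Hive-style CREATE EXTERNAL TABLE statement for delimited text data."""
    # Each column is a (name, hive_type) pair, for example ("status", "int").
    column_defs = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{s3_location}'"
    )


def submit_query(athena_client, sql, output_location):
    """Start a query through an Athena-style client and return its execution ID."""
    # Assumes the client exposes start_query_execution, as the boto3 Athena client does.
    response = athena_client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical table and bucket, used only to show the generated statement.
ddl = build_create_table_ddl(
    "weblogs",
    [("request_time", "string"), ("status", "int")],
    "s3://example-bucket/logs/",
)
print(ddl)
```

With boto3 installed and credentials configured, the client could be created with `boto3.client("athena")` and passed to `submit_query` together with an S3 path for the query results. The table only describes where the data lives and how to read it; the data itself stays in S3 and is not loaded into Athena.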
The following diagram shows a conceptual view of how Athena interacts with data in S3:
Pricing
Athena is charged based on the amount of data scanned by each query you run:
- Data scanned – per TB
Notes:
- You can save 30-90% on your query costs and get better performance by compressing, partitioning, and converting your data into columnar formats.
- You are not charged for failed queries.
- There is an additional charge if you use the AWS Glue Data Catalog.
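The per-TB pricing model can be sketched as a small calculation. The following is a rough Python estimate under stated assumptions: the price per TB is passed in as a parameter (the value used here is a placeholder, so check the current pricing page for your region), a TB is treated as 10^12 bytes, and any minimum charge or rounding applied per query is ignored. The function names are hypothetical.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes; check how your bill defines a TB


def estimate_query_cost(bytes_scanned, price_per_tb):
    """Estimate the charge for one query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def savings_fraction(baseline_cost, optimized_cost):
    """Return the fraction of cost saved relative to the baseline."""
    return 1 - optimized_cost / baseline_cost


# Illustrative inputs only: the price and scan sizes are made up for the example.
price_per_tb = 5.0
raw_cost = estimate_query_cost(1 * BYTES_PER_TB, price_per_tb)
columnar_cost = estimate_query_cost(BYTES_PER_TB // 4, price_per_tb)
print(raw_cost, columnar_cost, savings_fraction(raw_cost, columnar_cost))
```

In this made-up case, a query that scans a quarter of the original bytes after compression, partitioning, or conversion to a columnar format costs 75% less, which falls inside the 30-90% range mentioned in the notes above.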
External Resources