Amazon Athena is a serverless, fully managed service that lets you run interactive, standard SQL queries on data stored in Amazon S3.

Key Points for Amazon Athena

  • Fully managed: there are no servers to manage, and AWS handles provisioning, patching, and failure recovery.
  • Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3.
    • No ETL required
    • You can run ad-hoc ANSI SQL queries directly against data in S3, without first aggregating it or loading it into Athena.
  • Athena is built on Presto (a high-performance distributed SQL query engine) and supports standard ANSI SQL.
  • Athena works with a wide variety of data formats, including:
    • CSV, JSON, and columnar formats such as Apache Parquet and Apache ORC.
  • Athena uses Apache Hive DDL to define tables.
  • Athena uses an internal Data Catalog to store metadata about databases and tables.
    • You can instead use the AWS Glue Data Catalog, which adds features such as crawlers that infer schemas and a central catalog shared with other AWS services.
  • Athena can be accessed through the AWS Management Console, the API, or the ODBC/JDBC drivers.
  • Athena integrates with Amazon QuickSight for easy data visualization.
    • It can also be used with standard BI tools through the ODBC/JDBC drivers.
  • You pay only for the queries that you run.
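
As a minimal sketch of API access, the Python function below submits a query and polls until it reaches a terminal state. It assumes a client object that behaves like boto3's Athena client (`start_query_execution` and `get_query_execution`). The database name, table definition, and S3 locations are hypothetical. The DDL string illustrates the Hive-style `CREATE EXTERNAL TABLE` statement Athena uses to define a table over files already in S3.

```python
import time
from typing import Any, Dict

# States after which Athena's GetQueryExecution reports no further progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

# Hypothetical table over a hypothetical bucket, defined with Hive DDL.
# The data stays in S3; Athena only records the schema and location.
CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  request_time string,
  status int,
  bytes_sent bigint
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://example-bucket/web_logs/'
"""


def run_query(client: Any, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> Dict[str, Any]:
    """Submit `sql` and block until Athena reports a terminal state.

    `client` is assumed to behave like boto3.client("athena").
    Returns the final QueryExecution dict; callers should check
    execution["Status"]["State"] before reading results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query succeeds, fails, or is cancelled.
    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        if execution["Status"]["State"] in TERMINAL_STATES:
            return execution
        time.sleep(poll_seconds)
```

With a real client, a call might look like `run_query(boto3.client("athena"), CREATE_TABLE_DDL, "analytics", "s3://example-bucket/athena-results/")`. For a partitioned table stored in Hive-style folders such as `dt=2024-01-01/`, partitions must be registered (for example with `MSCK REPAIR TABLE web_logs`) before queries return data. Simple polling keeps the sketch short; production code would also cap the number of polls.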

The following diagram shows a conceptual view of how Athena interacts with data in S3:

Amazon Athena Conceptual View

Pricing

Athena charges based on the amount of data each query scans:

  • Data scanned: priced per TB
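
The per-TB price can be turned into a rough per-query estimate. This sketch assumes the billing rules AWS has published (bytes scanned rounded up to the nearest megabyte, a 10 MB minimum per query, no charge for failed queries) and treats 1 MB as 2^20 bytes and 1 TB as 2^40 bytes. The price is a parameter because it varies by region and can change; check the current pricing page before relying on any number.

```python
import math

# Assumed billing rules; verify against current Athena pricing.
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_MB = 10


def query_cost(bytes_scanned: int, price_per_tb: float,
               billable: bool = True) -> float:
    """Estimate the charge for one query from its scanned-byte count.

    Pass billable=False for queries that are not charged, such as failed ones.
    """
    if not billable:
        return 0.0
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / BYTES_PER_MB))
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * price_per_tb
```

For example, `query_cost(1024 ** 4, price_per_tb=5.0)` returns `5.0` for an illustrative price of $5 per TB. The byte count for a real query can be taken from the `DataScannedInBytes` statistic that Athena reports for each query execution.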

Notes:

  • You can save 30-90% on your query costs and get better performance by compressing, partitioning, and converting your data into columnar formats.
  • You are not charged for failed queries.
  • There is an additional charge if you use the AWS Glue Data Catalog.
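
The savings in the first note come largely from reading fewer bytes. The simplified model below ignores compression and file metadata. It contrasts a full scan of row-oriented files with a partitioned, columnar layout in which only the needed partitions and columns are read. The partition keys, column names, and sizes are hypothetical.

```python
from typing import Iterable, Mapping

# A layout maps each partition value (e.g. a date) to per-column sizes in bytes.
Layout = Mapping[str, Mapping[str, int]]


def full_scan_bytes(layout: Layout) -> int:
    """Row-oriented, unpartitioned baseline: every byte of every file is read."""
    return sum(sum(columns.values()) for columns in layout.values())


def pruned_scan_bytes(layout: Layout, partitions: Iterable[str],
                      columns: Iterable[str]) -> int:
    """Partitioned, columnar case: read only the chosen columns of the
    chosen partitions (partition pruning plus column projection)."""
    wanted = set(columns)
    total = 0
    for key in partitions:
        total += sum(size for col, size in layout[key].items() if col in wanted)
    return total
```

With two hypothetical daily partitions of 1,000 bytes each, where the `status` column takes 100 bytes per partition, a query that filters on one day and selects only `status` reads 100 bytes instead of 2,000 under this model.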

External Resources