Amazon CloudWatch is a monitoring and management service that captures monitoring and operational data (logs, metrics, and events) in one centralized location, for both AWS and on-premises resources and services.

 

Key Points

  • CloudWatch can natively collect metrics from most AWS services and resources
  • You can use the CloudWatch Agent or the API to collect metrics from on-premises services and resources
  • CloudWatch allows up to 1-second visibility of metrics and log data and up to 15 months of data retention
  • Data retention depends on granularity; when a tier's retention window ends, its data points are aggregated into the next, coarser tier:
    • Data points with a period of less than 60 seconds are kept for 3 hours, then aggregated into 1-minute metrics
    • 1-minute data points are kept for 15 days, then aggregated into 5-minute metrics
    • 5-minute data points are kept for 63 days, then aggregated into 1-hour metrics
    • 1-hour data points are kept for 455 days (15 months)
    • Note: you cannot delete metrics data. It simply expires at the end of the retention period.
  • EC2 Standard monitoring is performed at 5 minute intervals, but detailed monitoring allows monitoring to be done at 1 minute intervals (at an extra cost)
  • CloudWatch Alarms can be created to trigger alerts
  • You can use IAM to specify which CloudWatch actions a user can perform.
    • You cannot limit access to CloudWatch data for specific resources. When you grant access to CloudWatch data, the grant covers all of it; you cannot, for example, expose data from some EC2 instances but not others.
    • You cannot use IAM roles with CloudWatch command line tools
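The retention tiers listed above can be read as a lookup from the age of a data point to the finest granularity still available for it. The following is a minimal sketch of that reading, not CloudWatch code: the function name is hypothetical, and treating each boundary as inclusive is an assumption.

```python
from datetime import timedelta

# Retention tiers from the list above:
# (data point period in seconds, how long that granularity is kept).
RETENTION_TIERS = [
    (1, timedelta(hours=3)),      # sub-minute (high-resolution) data points
    (60, timedelta(days=15)),     # 1-minute data points
    (300, timedelta(days=63)),    # 5-minute data points
    (3600, timedelta(days=455)),  # 1-hour data points
]


def finest_available_period(age):
    """Return the smallest period (in seconds) still retained for data of
    the given age, or None once the data has expired entirely.

    Hypothetical helper; boundaries are treated as inclusive (assumption).
    """
    for period_seconds, kept_for in RETENTION_TIERS:
        if age <= kept_for:
            return period_seconds
    return None


print(finest_available_period(timedelta(days=30)))  # 5-minute tier
```

For example, a data point that is 30 days old has already lost its 1-minute resolution and can only be read back at 5-minute granularity or coarser.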

 


Key Components of Amazon CloudWatch

 

CloudWatch Logs

CloudWatch Logs provides a centralized place to collect, monitor, and analyze logs from multiple sources, such as AWS services, your applications, and third parties.

  • You can retain your logs and specify the retention period per log group (a logical grouping of related logs)
  • You can query your log data using CloudWatch Logs Insights
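As a rough illustration of per-log-group retention, the sketch below builds the arguments for the CloudWatch Logs PutRetentionPolicy call. The helper name and the log group name are hypothetical, and the list of accepted retention values is an assumption based on the API documentation that should be checked against the current docs.

```python
# Retention values (in days) believed to be accepted by PutRetentionPolicy.
# Assumption: verify against the current API documentation before use.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def retention_policy_params(log_group_name, desired_days):
    """Round desired_days up to the nearest accepted value and return the
    keyword arguments for a put_retention_policy call (hypothetical helper)."""
    if desired_days < 1:
        raise ValueError("desired_days must be at least 1")
    for days in ALLOWED_RETENTION_DAYS:
        if days >= desired_days:
            return {"logGroupName": log_group_name, "retentionInDays": days}
    raise ValueError("no accepted retention value is that long")


params = retention_policy_params("/my-app/api", 45)  # hypothetical log group
print(params)
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("logs").put_retention_policy(**params)`.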

 

CloudWatch Alarms

You can create CloudWatch alarms that monitor specific CloudWatch metrics and trigger a notification when a specified threshold is breached.

  • A Metric Alarm watches a single CloudWatch metric, or the result of an expression based on metrics
  • A Composite Alarm evaluates a rule expression that takes into account the states of multiple other alarms
  • Alarm history is available for 14 days
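To make the composite alarm rule expression more concrete, here is a small sketch that builds a rule requiring every listed alarm to be in the ALARM state. The helper names and alarm names are hypothetical; the `ALARM("name")` and `AND` syntax follows the composite alarm rule format.

```python
def all_in_alarm_rule(alarm_names):
    """Build a rule expression that is true only when every listed alarm
    is in the ALARM state (hypothetical helper)."""
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_alarm_params(name, child_alarm_names):
    """Keyword arguments for a put_composite_alarm call."""
    return {"AlarmName": name, "AlarmRule": all_in_alarm_rule(child_alarm_names)}


# Hypothetical alarm names, for illustration only.
params = composite_alarm_params(
    "checkout-degraded", ["checkout-high-cpu", "checkout-high-latency"]
)
print(params["AlarmRule"])
```

Combining alarms this way is one option for cutting noise: the composite alarm only fires when CPU and latency are both breaching, rather than on either one alone.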

 

Configuring an Alarm requires the following settings

  • Period – the length of time, in seconds, over which the metric or expression is evaluated to produce each data point
  • Evaluation Periods – the number of most recent periods, or data points, to evaluate when determining the alarm state
  • Datapoints to Alarm – the number of data points within the Evaluation Periods that must be breaching for the alarm to go to the ALARM state
  • Additionally, you can specify how to treat missing data points when evaluating an alarm.
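The settings above map onto the parameters of the PutMetricAlarm API. The sketch below builds one possible set of arguments for an EC2 CPU alarm; the helper name, alarm name, threshold, and the 3-out-of-5 choice are illustrative assumptions, not recommendations.

```python
def metric_alarm_params(instance_id, threshold_percent=80.0):
    """Keyword arguments for a put_metric_alarm call (hypothetical helper).

    The alarm goes to ALARM when at least 3 of the last 5 five-minute
    data points have average CPU above the threshold.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # Period, in seconds
        "EvaluationPeriods": 5,     # Evaluation Periods
        "DatapointsToAlarm": 3,     # Datapoints to Alarm
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",  # how missing data points are treated
    }


params = metric_alarm_params("i-0123456789abcdef0")  # placeholder instance ID
# Total window the alarm looks back over, in seconds.
print(params["Period"] * params["EvaluationPeriods"])
```

With boto3 available, these arguments could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Here the alarm considers a 25-minute window (5 periods of 300 seconds).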

 

Alarm States

  • OK – the metric or expression is within the defined threshold
  • ALARM – the metric or expression is outside of the defined threshold
  • INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the state
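The following is a deliberately simplified model of how the three states relate to Evaluation Periods and Datapoints to Alarm. It is a sketch for intuition only: the function is hypothetical, it assumes a "greater than threshold" comparison, and it ignores the finer points of how CloudWatch treats missing data.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified alarm evaluation (not the actual CloudWatch algorithm).

    datapoints: values ordered oldest to newest; None marks a missing point.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = datapoints[-evaluation_periods:]
    present = [value for value in recent if value is not None]
    if not present:
        # Nothing to evaluate in the window.
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in present if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Three of the last five data points exceed 80, so a 3-out-of-5 alarm fires.
print(alarm_state([50, 85, 90, 70, 95], 80, 5, 3))
```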

 

CloudWatch Events (CWE)

CloudWatch Events is a stream of system events describing changes in your AWS resources.

  • This is in addition to existing CloudWatch Metrics and Logs from these resources
  • Currently only these resources are supported:
    • EC2, Auto Scaling, and CloudTrail
    • Via CloudTrail, mutating API calls (that is, calls other than Describe, List, and Get) across all services are also visible in CloudWatch Events
  • You can create rules to trigger actions based on specific CloudWatch Events
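A rule selects events with an event pattern, a JSON document listing the accepted values for each field. The sketch below shows a pattern for stopped EC2 instances together with a simplified local matcher. The matcher is a hypothetical illustration that handles only exact-value lists and nested objects, not the full pattern syntax of the service.

```python
import json

# Pattern matching EC2 instances that enter the "stopped" state.
STOPPED_INSTANCE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and the event's value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Sample event with a placeholder instance ID.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(STOPPED_INSTANCE_PATTERN, event))
print(json.dumps(STOPPED_INSTANCE_PATTERN))
```

The JSON string printed last is the form in which the pattern would be supplied as the `EventPattern` argument when creating a rule, for example through `put_rule` on the boto3 `events` client.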

 


Multi-dimensional usage of Amazon CloudWatch

 

Collect

  • Logs – three primary categories
    • Vended Logs – natively published logs (currently only from VPC Flow Logs and Route 53)
    • (AWS Service) Logs – published by AWS services (a fair number of AWS services support this)
    • Custom Logs – published by your applications and resources from within the AWS environment, or from on-premises (via CloudWatch Agent or API)
  • Metrics – most AWS services support capturing key metrics (specific to that service)
  • Custom Metrics – from your applications and resources
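As an illustration of publishing a custom metric through the API, the sketch below assembles the arguments for a PutMetricData call. The helper name, namespace, metric name, and dimension are hypothetical; `StorageResolution` of 1 requests high-resolution (sub-minute) storage, while 60 is the standard resolution.

```python
from datetime import datetime, timezone


def custom_metric_params(namespace, name, value, unit="Count",
                         high_resolution=False, dimensions=None):
    """Keyword arguments for a put_metric_data call (hypothetical helper)."""
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": name,
            "Dimensions": [
                {"Name": key, "Value": val}
                for key, val in (dimensions or {}).items()
            ],
            "Timestamp": datetime.now(timezone.utc),
            "Value": float(value),
            "Unit": unit,
            # 1 = high-resolution storage, 60 = standard resolution.
            "StorageResolution": 1 if high_resolution else 60,
        }],
    }


# Hypothetical namespace, metric, and dimension names.
params = custom_metric_params(
    "MyApp/Checkout", "OrdersPlaced", 3,
    high_resolution=True, dimensions={"Environment": "prod"},
)
print(params["MetricData"][0]["StorageResolution"])
```

With boto3 available, the result could be sent as `boto3.client("cloudwatch").put_metric_data(**params)`, from inside AWS or from an on-premises host with suitable credentials.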

 

Monitor

  • CloudWatch Dashboards provide a customizable visual workspace for viewing metrics and logs for easy analysis
  • CloudWatch Alarms let you set threshold-based triggers and actions on metrics
  • Container Insights provides automatic dashboards for various metrics of deployed containers
  • CloudWatch Anomaly Detection uses machine-learning algorithms to analyze collected metrics and trigger actions
  • CloudWatch ServiceLens enhances the observability of your services and applications by letting you integrate traces, metrics, logs, and alarms in one place.
  • CloudWatch Synthetics allows you to create scripts (called canaries) that run on a schedule, mimicking your customers' actions, to monitor your endpoints and APIs
    • Canaries are Node.js scripts that run as Lambda functions

 

Act

  • Auto Scaling can be triggered based on CloudWatch alarms
  • CloudWatch Events can trigger actions enabling automation

 

Analyze

  • CloudWatch metrics data can be analyzed in near real time, or you can analyze months' worth of captured data for seasonality trends
  • CloudWatch Logs Insights enables customized queries with aggregations, filters, and regular expressions to gain useful insight from your captured log data
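A Logs Insights query combining a regular-expression filter with an aggregation might look like the sketch below, which prepares the arguments for a StartQuery call. The helper name and log group are hypothetical, and the query assumes that error lines contain the literal text ERROR.

```python
import time

# Count log lines containing "ERROR", grouped into 5-minute buckets.
ERROR_COUNT_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| stats count(*) as errors by bin(5m)"
)


def error_count_query_params(log_group_name, hours_back=1, now=None):
    """Keyword arguments for a start_query call (hypothetical helper).

    startTime and endTime are expressed in seconds since the Unix epoch.
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group_name,
        "startTime": end - hours_back * 3600,
        "endTime": end,
        "queryString": ERROR_COUNT_QUERY,
    }


params = error_count_query_params("/my-app/api", hours_back=1, now=1_700_000_000)
print(params["startTime"], params["endTime"])
```

With boto3 available, the query could be started with `boto3.client("logs").start_query(**params)` and its results fetched afterwards with `get_query_results`.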

 


Notes on Monitoring vs Observability:

  • Monitoring is focused on operations (of an application, a resource, or interaction) to determine state (good / bad / warning) or to detect behavioral deviation
    • Focuses on the State (and its variations), thus focusing on the “effects”
  • Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
    • Focuses on influencers of the State, thus enabling focus on the “causes”

 


Pricing

A fair amount of CloudWatch usage (metrics, alarms, and so on) is covered by the Free Tier, as listed below.

  • Metrics
    • Basic Monitoring Metrics (at 5-minute frequency)
    • 10 Detailed Monitoring Metrics (at 1-minute frequency)
    • 1 Million API requests (not applicable to GetMetricData and GetMetricWidgetImage)
  • Dashboard: 3 Dashboards for up to 50 metrics per month
  • Alarms: 10 Alarm metrics (not applicable to high-resolution alarms)
  • Logs: 5GB Data (ingestion, archive storage, and data scanned by Logs Insights queries)
  • Events: All events except custom events are included
  • Contributor Insights
    • 1 Contributor Insights rule per month
    • The first one million log events that match the rule per month
  • Synthetics: 100 canary runs per month

 

Please visit the following page for detailed pricing on usage beyond the free tier described above:

AWS CloudWatch Pricing

 


External Resources