Amazon CloudWatch

Amazon CloudWatch is a monitoring and management service that captures monitoring and operational data as logs, metrics, and events in one centralized location, for both AWS and on-premises resources and services.

Key Points

CloudWatch natively collects metrics from most AWS services and resources
You can use the CloudWatch Agent or the API to collect metrics from on-premises services and resources
CloudWatch offers metric and log visibility down to 1-second granularity, and up to 15 months of data retention
Data retention depends on granularity; when a tier's retention period ends, its data points are aggregated into the next, coarser tier:
- Data points with a period of less than 60 seconds are kept for 3 hours, then aggregated into 1-minute metrics
- 1-minute data points are kept for 15 days, then aggregated into 5-minute metrics
- 5-minute data points are kept for 63 days, then aggregated into 1-hour metrics
- 1-hour data points are kept for 455 days (15 months)
- Note: you cannot delete metric data. It simply expires at the end of its retention period.
EC2 basic (standard) monitoring collects metrics at 5-minute intervals; detailed monitoring collects them at 1-minute intervals (at an extra cost)
CloudWatch Alarms can be created to trigger alerts
You can use IAM to specify which CloudWatch actions a user can perform.
- You cannot limit access to CloudWatch data for specific resources. When you grant access to CloudWatch data, it covers all of the data (you cannot, for example, expose data from some EC2 instances but not others)
- You cannot use IAM roles with the legacy CloudWatch command line tools (the unified AWS CLI does support IAM roles)
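
The retention tiers listed under Key Points can be read as a lookup: given the age of a data point, what is the finest granularity still retained? The following is a minimal sketch in plain Python. The name `finest_retained_period` is hypothetical, the tiers are copied from the list above, and a period of 1 stands in for any sub-minute (high-resolution) data, which only exists if the metric was published at that resolution.

```python
from datetime import timedelta

# (maximum age, finest period in seconds still retained at that age).
# Values come from the retention list above; 1 represents sub-minute data.
RETENTION_TIERS = [
    (timedelta(hours=3), 1),
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_retained_period(age):
    """Return the finest period (in seconds) retained for data of this age.

    Returns None when the data is older than the longest retention period.
    """
    for max_age, period_seconds in RETENTION_TIERS:
        if age <= max_age:
            return period_seconds
    return None  # older than 455 days: the data has expired


print(finest_retained_period(timedelta(hours=1)))   # sub-minute data still available
print(finest_retained_period(timedelta(days=30)))   # only 5-minute aggregates remain
print(finest_retained_period(timedelta(days=500)))  # expired
```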

Key Components of Amazon CloudWatch

CloudWatch Logs

CloudWatch Logs provides a centralized place to collect, monitor, and analyze logs from multiple sources, such as AWS services, your applications, and third parties.

You can retain your logs and specify a retention period per log group (a logical grouping of related logs)
You can query your log data using CloudWatch Logs Insights
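
As an illustration of per-log-group retention, here is a minimal sketch that assumes the boto3 SDK. The helper name `set_log_retention` and the log group name used below are hypothetical; `put_retention_policy` is the CloudWatch Logs operation being called. The client is passed in so the helper can be exercised without AWS credentials.

```python
def set_log_retention(logs_client, log_group_name, retention_days):
    """Apply a retention period (in days) to a single log group.

    logs_client is assumed to be a boto3 CloudWatch Logs client,
    for example boto3.client("logs").
    """
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=retention_days,
    )
```

With boto3 installed and credentials configured, a call such as `set_log_retention(boto3.client("logs"), "/my-app/production", 30)` would keep that group's logs for 30 days. The service only accepts certain retention values, so check the API reference before choosing one.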

CloudWatch Alarms

You can create CloudWatch alarms that monitor specific CloudWatch metrics and trigger a notification when a specified threshold is breached.

A Metric Alarm watches a single CloudWatch metric, or the result of a math expression based on CloudWatch metrics
A Composite Alarm evaluates a rule expression that takes into account the alarm states of multiple other alarms
Alarm history is available for 14 days
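
The rule expression behind a Composite Alarm can be sketched as follows, assuming the boto3 SDK. The helper names and alarm names are hypothetical; `put_composite_alarm` and the `ALARM("name")` rule syntax are part of the CloudWatch API. This sketch only builds an "all children in ALARM" rule, which is one of several possible combinations.

```python
def all_in_alarm_rule(alarm_names):
    """Build a rule that is true only when every listed alarm is in ALARM state."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def create_composite_alarm(cloudwatch_client, name, child_alarm_names):
    """Create a composite alarm over existing alarms and return the rule used.

    cloudwatch_client is assumed to be boto3.client("cloudwatch").
    """
    rule = all_in_alarm_rule(child_alarm_names)
    cloudwatch_client.put_composite_alarm(AlarmName=name, AlarmRule=rule)
    return rule


print(all_in_alarm_rule(["cpu-high", "latency-high"]))
# ALARM("cpu-high") AND ALARM("latency-high")
```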

Configuring an alarm requires the following settings

Period – the length of time, in seconds, over which the metric or expression is evaluated to produce each data point
Evaluation Periods – the number of most recent periods, or data points, to evaluate when determining the alarm state
Datapoints to Alarm – the number of data points within the Evaluation Periods that must be breaching for the alarm to go to the ALARM state
Additionally, you can specify how to treat missing data points when evaluating an alarm.
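
To show where these settings go, here is a minimal sketch of creating a metric alarm, assuming the boto3 SDK. The helper name, the alarm name, the 80% threshold, and the 2-out-of-3 choice are illustrative assumptions; the parameter names are those of the `put_metric_alarm` operation.

```python
def create_cpu_alarm(cloudwatch_client, instance_id, topic_arn):
    """Create an alarm on EC2 CPU utilization and return the parameters used.

    cloudwatch_client is assumed to be boto3.client("cloudwatch");
    topic_arn is assumed to be an SNS topic that receives the notification.
    """
    params = {
        "AlarmName": f"cpu-high-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # each data point covers 5 minutes
        "EvaluationPeriods": 3,       # look at the 3 most recent data points
        "DatapointsToAlarm": 2,       # 2 of those 3 must breach the threshold
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",  # how missing data points are treated
        "AlarmActions": [topic_arn],
    }
    cloudwatch_client.put_metric_alarm(**params)
    return params
```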

Alarm States

OK – the metric or expression is within the defined threshold
ALARM – the metric or expression is outside of the defined threshold
INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the state
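
How the settings and states fit together can be illustrated with a deliberately simplified model in plain Python. This is not CloudWatch's actual evaluation algorithm: the function name is hypothetical, only a "greater than threshold" comparison is modeled, and only three missing-data treatments are handled.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods,
                   datapoints_to_alarm, treat_missing="missing"):
    """Return "OK", "ALARM", or "INSUFFICIENT_DATA" for a list of data points.

    datapoints is ordered oldest to newest; None marks a missing data point.
    treat_missing is one of "missing", "breaching", or "notBreaching".
    """
    if treat_missing not in ("missing", "breaching", "notBreaching"):
        raise ValueError(f"unsupported treat_missing: {treat_missing}")

    window = datapoints[-evaluation_periods:]  # most recent periods only
    counted = 0     # data points that take part in the evaluation
    breaching = 0   # data points counted as breaching the threshold
    for value in window:
        if value is None:
            if treat_missing == "breaching":
                counted += 1
                breaching += 1
            elif treat_missing == "notBreaching":
                counted += 1
            # "missing": the data point is simply left out of the evaluation
            continue
        counted += 1
        if value > threshold:
            breaching += 1

    if counted == 0:
        return "INSUFFICIENT_DATA"
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


print(evaluate_alarm([50, 90, 85], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))
print(evaluate_alarm([50, 90, 60], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))
print(evaluate_alarm([None, None, None], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))
```

The three calls print ALARM, OK, and INSUFFICIENT_DATA respectively: two of three points breach in the first case, only one in the second, and no data is available in the third.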

CloudWatch Events (CWE)

CloudWatch Events delivers a stream of system events that describe changes in your AWS resources.

These events are in addition to the existing CloudWatch Metrics and Logs from these resources
Currently only these resources are supported:
- EC2, Auto Scaling, and CloudTrail
- Via CloudTrail, mutating API calls (that is, calls other than Describe, List, and Get) across all services are also visible in CloudWatch Events
You can create rules to trigger actions based on specific CloudWatch Events
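
A rule of this kind can be sketched as follows, assuming the boto3 SDK. The helper names, the rule name, and the target Id `notify` are hypothetical; `put_rule` and `put_targets` are operations of the events client, and the pattern matches EC2 instance state-change events.

```python
import json


def ec2_state_change_pattern(states):
    """Build an event pattern matching EC2 instances entering the given states."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    }


def create_state_change_rule(events_client, rule_name, target_arn,
                             states=("stopped", "terminated")):
    """Create a rule for the pattern and attach one target to it.

    events_client is assumed to be boto3.client("events"); target_arn is
    assumed to be something the rule may invoke, such as an SNS topic.
    """
    pattern = ec2_state_change_pattern(states)
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify", "Arn": target_arn}],
    )
    return pattern
```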

Multi-dimensional usage of Amazon CloudWatch

Collect

Logs – three primary categories
- Vended Logs – natively published logs (currently only from VPC Flow Logs and Route 53)
- (AWS Service) Logs – published by AWS services (a fair number of AWS services support this)
- Custom Logs – published by your applications and resources, either from within the AWS environment or from on-premises (via the CloudWatch Agent or API)
Metrics – most AWS services support capturing key metrics (specific to that service)
Custom Metrics – from your applications and resources
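
Publishing a custom metric through the API can be sketched as follows, assuming the boto3 SDK. The helper name, namespace, and metric names are hypothetical; `put_metric_data` and the `StorageResolution` field (1 for high-resolution, 60 for standard) are part of the CloudWatch API.

```python
from datetime import datetime, timezone


def publish_custom_metric(cloudwatch_client, namespace, name, value,
                          unit="Count", dimensions=None, high_resolution=False):
    """Publish one data point for a custom metric and return the datum sent.

    cloudwatch_client is assumed to be boto3.client("cloudwatch");
    dimensions is an optional dict of dimension name to value.
    """
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
        # 1 = high-resolution (sub-minute) metric, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in dimensions.items()
        ]
    cloudwatch_client.put_metric_data(Namespace=namespace, MetricData=[datum])
    return datum
```

For example, `publish_custom_metric(client, "MyApp", "OrdersPlaced", 3, dimensions={"Environment": "prod"})` would send one standard-resolution data point to a hypothetical `MyApp` namespace.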

Monitor

CloudWatch Dashboards provide a customizable visual playground for viewing metrics and logs for easy analysis
CloudWatch Alarms let you set threshold-based triggers and actions on metrics
Container Insights provides automatic dashboards for various metrics of deployed containers
CloudWatch Anomaly Detection uses machine-learning algorithms to analyze collected metrics and trigger actions
CloudWatch ServiceLens enhances the observability of your services and applications by integrating traces, metrics, logs, and alarms in one place
CloudWatch Synthetics lets you create scripts (called canaries) that run on a schedule and mimic your customers' actions to monitor your endpoints and APIs
- Canaries are Node.js or Python scripts that run as Lambda functions

Act

Auto Scaling can be triggered based on CloudWatch alarms
CloudWatch Events can trigger actions enabling automation

Analyze

CloudWatch metrics data can be analyzed in near real time, or you can analyze months' worth of captured data for seasonality trends
CloudWatch Logs Insights enables customized queries with aggregations, filters, and regular expressions to gain useful insight from your captured log data
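
A Logs Insights query with a filter, a regular expression, and an aggregation can be sketched as follows, assuming the boto3 SDK. The helper name and the example query are illustrative assumptions; `start_query` and `get_query_results` are operations of the CloudWatch Logs client, and the query is polled until it completes.

```python
import time

# Example query: count log lines containing "ERROR" in 5-minute buckets.
ERRORS_PER_5_MIN = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| stats count(*) as errors by bin(5m)"
)


def run_insights_query(logs_client, log_group_name, query,
                       start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=30):
    """Run a Logs Insights query and return its rows as a list of dicts.

    logs_client is assumed to be boto3.client("logs"); start_epoch and
    end_epoch are the query time range in seconds since the Unix epoch.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group_name,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row arrives as a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete in time")
```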

Notes on Monitoring vs Observability:

Monitoring is focused on operations (of an application, a resource, or interaction) to determine state (good / bad / warning) or to detect behavioral deviation
- Focuses on the State (and its variations), thus focusing on the “effects”
Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
- Focuses on influencers of the State, thus enabling focus on the “causes”

Pricing

A fair amount of CloudWatch usage (metrics, alarms, and so on) is covered by the Free Tier, as listed below.

Metrics – Basic Monitoring Metrics (at 5-minute frequency); 10 Detailed Monitoring Metrics (at 1-minute frequency); 1 million API requests (not applicable to GetMetricData and GetMetricWidgetImage)
Dashboards – 3 dashboards for up to 50 metrics per month
Alarms – 10 alarm metrics (not applicable to high-resolution alarms)
Logs – 5 GB of data (ingestion, archive storage, and data scanned by Logs Insights queries)
Events – all events except custom events are included
Contributor Insights – 1 Contributor Insights rule per month; the first one million log events that match the rule per month
Synthetics – 100 canary runs per month

Please visit the following page for detailed pricing on usage beyond the free tier described above:

AWS CloudWatch Pricing

External Resources

October 22, 2020

Every Bit Cloud
