Overview of Amazon EC2 Auto Scaling

Amazon EC2 Auto Scaling is a fully managed service that helps your application maintain availability at the desired performance level by automatically scaling out (adding instances) or scaling in (terminating instances) according to the configuration you define.


Benefits of EC2 Auto Scaling

  • Increased application availability (at or near desired performance)
  • Enhanced fault tolerance – through health checks and replacements
  • Optimized cost – by keeping capacity close to actual demand instead of provisioning for peak load
  • It can be (and usually is) used together with Elastic Load Balancing


Key Terms for EC2 Auto Scaling


Fleet Management

  • A fleet is the set of EC2 instances that your application runs on.
  • Fleet management keeps that set of EC2 instances at the desired capacity; it continuously checks the health of the instances and replaces unhealthy ones with healthy instances
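
The fleet-management behavior described above can be sketched as a small simulation. This is a toy model and not the service's implementation: the `Instance` class, the `reconcile` function, and the instance IDs are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def reconcile(fleet, desired_capacity, new_ids):
    """Return a fleet with unhealthy instances dropped and capacity restored."""
    # Keep only the instances that pass their health check.
    survivors = [inst for inst in fleet if inst.healthy]
    # Launch replacements until the fleet is back at the desired capacity.
    while len(survivors) < desired_capacity:
        survivors.append(Instance(instance_id=f"i-{next(new_ids):04d}"))
    # Drop extras if the fleet is above the desired capacity.
    return survivors[:desired_capacity]


ids = count(1)
fleet = [Instance(f"i-{next(ids):04d}") for _ in range(3)]
fleet[1].healthy = False  # simulate a failed health check

fleet = reconcile(fleet, desired_capacity=3, new_ids=ids)
print([inst.instance_id for inst in fleet])
```

The unhealthy instance is removed and a new one takes its place, so the fleet ends up at the desired capacity of three again.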


Scheduled Scaling

  • Scheduled Scaling allows you to scale out or in as a function of date and time.
  • It is useful when your traffic varies on a known, periodic schedule (e.g., your site gets a 200% spike at 8 am on Mondays)
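
As a rough sketch of the Monday-morning example, the helper below builds the parameters for a recurring scheduled action. The group name `web-asg`, the action name, the capacities, and the choice to scale out at 07:45 are all assumptions made for illustration. The dictionary keys follow the boto3 `put_scheduled_update_group_action` call, which is shown only in a comment so the sketch runs without AWS credentials.

```python
def monday_morning_action(group_name, normal_capacity, peak_capacity):
    """Build parameters for a recurring scale-out ahead of a Monday 8 am spike."""
    if peak_capacity < normal_capacity:
        raise ValueError("peak_capacity must be at least normal_capacity")
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "monday-morning-scale-out",  # hypothetical name
        # Cron fields: minute, hour, day of month, month, day of week.
        # "45 7 * * 1" means 07:45 every Monday (UTC unless a TimeZone is set).
        "Recurrence": "45 7 * * 1",
        "MinSize": normal_capacity,
        "MaxSize": peak_capacity,
        "DesiredCapacity": peak_capacity,
    }


params = monday_morning_action("web-asg", normal_capacity=2, peak_capacity=6)
print(params)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("autoscaling").put_scheduled_update_group_action(**params)
```

A second scheduled action with a later recurrence would be needed to bring the capacity back down once the spike has passed.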


Dynamic Scaling

  • Dynamic Scaling automatically scales the number of instances out or in based on the configuration you define (e.g., adding or removing a specific number of instances when CPU utilization crosses a threshold)


Predictive Scaling

  • Predictive Scaling predicts future traffic and accordingly provisions the number of EC2 instances.
  • This feature uses machine learning to analyze traffic patterns. It can start generating forecasts with as little as one day of historical data, and it uses up to two weeks of history to improve its predictions.
  • It works best for sites that have periodic fluctuations in traffic. It does not work well for totally random spikes.
  • Once the initial set of predictions has been made and the scaling plans are in place, the plans are updated daily and forecasts are made for the following two days.
    • That is – every 24 hours, Predictive Scaling forecasts 48 hours into the future, and schedules capacity changes for those 48 hours.
  • Predictive scaling maintains the minimum capacity based on historical demand; this ensures that any gaps in the metrics won’t cause an inadvertent scale-in.
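
To make the forecasting idea concrete, here is a deliberately naive sketch. It is not the machine-learning model the service uses; it simply averages past load per hour of day, which is an assumption chosen for illustration. The function names, the load figures, and the `load_per_instance` value are all hypothetical.

```python
import math
from collections import defaultdict


def forecast_next_48_hours(hourly_load, start_hour=0):
    """Forecast 48 hourly values using the historical mean for each hour of day."""
    if len(hourly_load) < 24:
        raise ValueError("need at least one day (24 hourly samples) of history")
    by_hour = defaultdict(list)
    for offset, load in enumerate(hourly_load):
        by_hour[(start_hour + offset) % 24].append(load)
    first_forecast_hour = (start_hour + len(hourly_load)) % 24
    forecast = []
    for step in range(48):
        samples = by_hour[(first_forecast_hour + step) % 24]
        forecast.append(sum(samples) / len(samples))
    return forecast


def capacity_plan(forecast, load_per_instance, min_capacity):
    """Convert forecast load into instance counts, never below min_capacity."""
    return [max(min_capacity, math.ceil(load / load_per_instance)) for load in forecast]


# Hypothetical history: two days of requests per second, busy from 09:00 to 18:00.
daily_pattern = [400 if 9 <= hour < 18 else 100 for hour in range(24)]
history = daily_pattern * 2

plan = capacity_plan(forecast_next_48_hours(history), load_per_instance=100, min_capacity=2)
print(plan[:24])
```

Because the history is perfectly periodic, the plan repeats the daily pattern: the minimum of two instances off-peak and four instances during the busy hours. Random spikes would not show up in such a per-hour average, which matches the limitation noted above.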


Auto Scaling Group (ASG)

  • An EC2 Auto Scaling group (ASG) is a set of EC2 instances that share similar characteristics; fleet management and dynamic scaling treat this set as a single logical group.
  • A Launch Configuration or Launch Template defines the characteristics of the EC2 instances in the group.
  • An ASG can span multiple Availability Zones (AZs) but must stay within a single Region
  • You can use Lifecycle Hooks to perform custom actions when an instance launches (such as installing key software) or terminates (such as downloading log files)
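
The pieces above can be sketched as request parameters for an ASG and two lifecycle hooks. The group, template, subnet, and hook names are hypothetical placeholders. The dictionary keys follow the boto3 `create_auto_scaling_group` and `put_lifecycle_hook` calls, which appear only in a comment so the example runs without an AWS account.

```python
LAUNCHING = "autoscaling:EC2_INSTANCE_LAUNCHING"
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def asg_request(name, template_name, subnet_ids, min_size, max_size):
    """Parameters for a group whose subnets sit in several AZs of one Region."""
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": template_name, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # Subnet IDs are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


def lifecycle_hook_request(group_name, hook_name, transition, timeout_seconds=300):
    """Parameters for a hook that pauses an instance during launch or termination."""
    if transition not in (LAUNCHING, TERMINATING):
        raise ValueError(f"unsupported transition: {transition}")
    return {
        "AutoScalingGroupName": group_name,
        "LifecycleHookName": hook_name,
        "LifecycleTransition": transition,
        "HeartbeatTimeout": timeout_seconds,
        "DefaultResult": "CONTINUE",
    }


group = asg_request("web-asg", "web-template", ["subnet-aaa", "subnet-bbb"], 2, 6)
install_hook = lifecycle_hook_request("web-asg", "install-software", LAUNCHING)
log_hook = lifecycle_hook_request("web-asg", "download-logs", TERMINATING)
print(group["VPCZoneIdentifier"], install_hook["LifecycleTransition"])

# With AWS credentials configured, these could be sent as:
#   client = boto3.client("autoscaling")
#   client.create_auto_scaling_group(**group)
#   client.put_lifecycle_hook(**install_hook)
```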


Scaling Policies for EC2 Auto Scaling

Amazon EC2 Auto Scaling supports three main scaling policy types:

Target Tracking Scaling

  • Increase or decrease the current capacity based on a target value for a specified load metric (such as CPU utilization).
  • It works like a thermostat – you don’t specify how much to increase or decrease at a given time; you specify a target, and the feature adjusts capacity to keep the load metric at or near that target.
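
The thermostat analogy can be illustrated with a simplified proportional rule. This rule is an assumption of the sketch and not necessarily how the service computes its adjustments; the function name `estimate_capacity` is hypothetical.

```python
import math


def estimate_capacity(current_capacity, metric_value, target_value):
    """Estimate the instance count that would bring the metric near the target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # If 4 instances average 75% CPU, the same work spread over
    # 6 instances would average roughly 50% CPU.
    return math.ceil(current_capacity * metric_value / target_value)


print(estimate_capacity(current_capacity=4, metric_value=75, target_value=50))
print(estimate_capacity(current_capacity=6, metric_value=20, target_value=50))
```

The caller only supplies the target (50% CPU here); the size of each change falls out of how far the metric is from that target.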


Simple Scaling

  • Increase or decrease the current capacity based on a single scaling adjustment
  • You create CloudWatch alarms that define the high and low thresholds, and you specify the number of instances to add or remove on each threshold breach
  • After a scaling activity is triggered, the next one can start only after the previous scaling activity or health check replacement has completed and the cooldown period has expired
    • The cooldown period is intended to let you observe the effect of the previous change before the next action is taken
    • The cooldown period can, however, unnecessarily delay scaling out or in when the metric has moved far past the threshold
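
The effect of the cooldown period can be seen in a short simulation. This is a simplified model under stated assumptions: scaling activities complete instantly, time is measured in seconds, and the `SimpleScalingPolicy` class is a hypothetical name introduced for illustration.

```python
class SimpleScalingPolicy:
    def __init__(self, adjustment, cooldown_seconds):
        self.adjustment = adjustment
        self.cooldown_seconds = cooldown_seconds
        self.last_activity_at = None

    def on_alarm(self, now, capacity):
        """Return the new capacity, or the old one if still cooling down."""
        in_cooldown = (
            self.last_activity_at is not None
            and now - self.last_activity_at < self.cooldown_seconds
        )
        if in_cooldown:
            return capacity  # the alarm is ignored during the cooldown
        self.last_activity_at = now
        return capacity + self.adjustment


policy = SimpleScalingPolicy(adjustment=1, cooldown_seconds=300)
capacity = 2
history = []
for t in (0, 60, 120, 300, 360):  # the alarm fires at each of these times
    capacity = policy.on_alarm(now=t, capacity=capacity)
    history.append((t, capacity))
print(history)
```

Five alarm breaches produce only two scale-out actions, because the breaches at 60, 120, and 360 seconds fall inside a cooldown window.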


Step Scaling

  • Increase or decrease the current capacity based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
  • It differs from Simple Scaling in one key respect – the size of the increase or decrease depends on the size of the breach. For each step adjustment, you specify:
    • A lower bound for the metric value
    • An upper bound for the metric value
    • The amount by which to scale, based on the scaling adjustment type
  • Step Scaling helps avoid the unnecessary delay that the cooldown period causes with Simple Scaling, because it keeps responding to additional alarm breaches while a scaling activity is still in progress
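
A minimal sketch of how step adjustments map the size of a breach to a scaling amount. The 60% threshold, the step table, and the function name are hypothetical. Bounds are expressed relative to the alarm threshold, and the sketch treats the lower bound as inclusive and the upper bound as exclusive.

```python
def step_adjustment(metric_value, threshold, steps):
    """Pick the adjustment whose interval contains the size of the breach.

    Each step is (lower_bound, upper_bound, adjustment); None means unbounded.
    """
    breach = metric_value - threshold
    for lower, upper, adjustment in steps:
        above_lower = lower is None or breach >= lower
        below_upper = upper is None or breach < upper
        if above_lower and below_upper:
            return adjustment
    return 0  # no step matches, so capacity is left unchanged


# Hypothetical scale-out steps for an alarm threshold of 60% CPU.
SCALE_OUT_STEPS = [
    (0, 10, 1),     # 60% <= CPU < 70%: add 1 instance
    (10, 20, 2),    # 70% <= CPU < 80%: add 2 instances
    (20, None, 4),  # CPU >= 80%: add 4 instances
]

for cpu in (50, 65, 75, 95):
    print(cpu, step_adjustment(cpu, threshold=60, steps=SCALE_OUT_STEPS))
```

A large breach (95% CPU) adds four instances in one action, where a simple scaling policy would add a fixed amount and then wait out the cooldown.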


Functioning of Scaling Policies

  • A Scaling Policy tells EC2 Auto Scaling which CloudWatch metric to monitor and what action to take when the associated CloudWatch alarm is triggered
  • EC2 Auto Scaling ensures that scaling policy calculations never push capacity outside the group's minimum and maximum size limits
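
The minimum and maximum size guarantee amounts to clamping the computed capacity. A minimal sketch, with a hypothetical function name and example limits:

```python
def apply_adjustment(current, adjustment, min_size, max_size):
    """Apply a scaling adjustment, keeping the result within the group limits."""
    return max(min_size, min(max_size, current + adjustment))


print(apply_adjustment(current=5, adjustment=4, min_size=2, max_size=6))
print(apply_adjustment(current=3, adjustment=-5, min_size=2, max_size=6))
```

In the first call the policy asks for nine instances but the group stops at its maximum of six; in the second the group stops at its minimum of two.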



Pricing for EC2 Auto Scaling

There is no extra charge for using EC2 Auto Scaling. You pay only for the underlying resources you use (such as EC2 instances, CloudWatch monitoring, and notifications).

Cost Optimization

  • You can create an ASG that scales EC2 capacity across different instance types, AZs, and purchase options (On-Demand, Reserved, and Spot Instances) to maximize cost savings

