Step-by-Step Guide to EC2 Auto Scaling and Load Balancing on AWS


Introduction

As businesses and applications grow, the demand for computing power fluctuates. Traditionally, managing this meant investing in expensive hardware and scaling through manual intervention. With cloud computing, however, you can dynamically scale resources based on real-time needs. Auto Scaling keeps your applications available and responsive, optimizing both cost and performance by automatically adjusting compute resources to handle varying traffic loads.

A key feature of AWS’s elasticity is the ability to use Auto Scaling Groups (ASGs) to automatically add or remove compute resources based on demand. This allows your applications to stay highly available even during traffic spikes while also scaling down during periods of low activity to save costs.

In this tutorial, you'll set up the foundation for a scalable architecture by creating a Virtual Private Cloud (VPC) with three public subnets spread across different availability zones (AZs). This ensures high availability by distributing resources across isolated locations, reducing the risk of downtime due to failures in a single zone. After configuring the VPC, you'll introduce an Auto Scaling Group and an Application Load Balancer (ALB) to distribute traffic across multiple EC2 instances, ensuring that your infrastructure dynamically adapts to changes in traffic demand.


By the end of this guide, you will learn how to:

  1. Create a VPC with public subnets in different availability zones for high availability.
  2. Launch EC2 instances within those subnets running a web application (Apache web server).
  3. Set up an Application Load Balancer (ALB) to distribute traffic across instances.
  4. Configure Auto Scaling Groups (ASG) to automatically scale instances based on CPU utilization.
  5. Test Auto Scaling by simulating high traffic to observe how your infrastructure adapts.

Requirements

Before getting started, ensure that you have the following prerequisites in place to successfully implement Auto Scaling and Load Balancing for your web application. First, you'll need an active AWS account, which you can set up by creating a free tier AWS account. The free tier provides limited, cost-free access to the AWS services used in this guide. It's also helpful to have basic familiarity with the AWS Management Console so you can follow the configurations more easily. Finally, for visualizing your infrastructure, you can use draw.io, an easy-to-use online tool for creating architecture diagrams that can help you map out your environment.


Architecture Overview

Before diving into the setup, here’s a high-level diagram of the architecture you’ll be building:

Architecture Diagram of the Scalable Webapp


Fig: Architecture Diagram of the Scalable Web Application

The components of the architecture include:

  • A VPC with a CIDR block of 10.0.0.0/16: This provides an isolated network environment for your resources in AWS.
  • Three Public Subnets: Each subnet resides in a different availability zone (AZ) to ensure high availability and fault tolerance.
    • Public Subnet 1: 10.0.10.0/24
    • Public Subnet 2: 10.0.20.0/24
    • Public Subnet 3: 10.0.30.0/24
  • An Internet Gateway (IGW): Allows instances within the public subnets to access the internet.
  • A Route Table: Configures routing for internet-bound traffic from the subnets to the Internet Gateway.
  • EC2 Instances: These instances will run an Apache web server, deployed across the public subnets to handle web traffic.
  • An Auto Scaling Group (ASG): Automatically adjusts the number of EC2 instances based on traffic demand, using CPU utilization as a scaling metric.
  • An Application Load Balancer (ALB): Distributes incoming traffic across the EC2 instances to ensure load balancing and even request distribution.
  • A Target Group: Registers the EC2 instances behind the ALB and defines health check parameters, ensuring that only healthy instances receive traffic.

Step 1: Creating the VPC

Accessing the VPC Dashboard

To begin, you’ll create a VPC based on the architecture diagram above.

  1. Log into AWS and navigate to the VPC Dashboard by selecting Services > VPC.
  2. On the left-hand menu, click Create VPC.

VPC Dashboard


Fig: VPC Dashboard


Step 2: Configure the VPC

You’ll now configure the VPC to form the core network environment for your resources.

  1. On the Create VPC page, select the VPC and more option.
  2. Under VPC settings:
    • Name Tag: Enter a name for your VPC (e.g., WebAppVPC).
    • IPv4 CIDR Block: Enter 10.0.0.0/16 to give the VPC 65,536 IP addresses.
    • IPv6 CIDR Block: Leave this option disabled unless IPv6 is required.
    • Tenancy: Select Default unless dedicated hardware is needed.

VPC Configuration


Fig: VPC Configuration


Step 3: Configure Subnets

Next, you will configure three public subnets, each in a different availability zone for high availability.

  1. Number of Availability Zones (AZs): Choose 3 to ensure high availability.
  2. Number of Public Subnets: Enter 3.
  3. Customize Subnet CIDR Blocks:
    • Public Subnet 1: 10.0.10.0/24 in us-east-1a
    • Public Subnet 2: 10.0.20.0/24 in us-east-1b
    • Public Subnet 3: 10.0.30.0/24 in us-east-1c

These subnets will reside in different availability zones to ensure high availability across your infrastructure.

Subnet Configuration


Fig: Subnet Configuration
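If you prefer to script this step, the same VPC and subnets can be created with the AWS CLI. This is a minimal sketch; vpc-xxxx is a placeholder for the VPC ID returned by create-vpc:

# Create the VPC with a Name tag (returns the new VPC ID)
aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=WebAppVPC}]'

# Create one public subnet per availability zone
aws ec2 create-subnet --vpc-id vpc-xxxx --cidr-block 10.0.10.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-xxxx --cidr-block 10.0.20.0/24 --availability-zone us-east-1b
aws ec2 create-subnet --vpc-id vpc-xxxx --cidr-block 10.0.30.0/24 --availability-zone us-east-1c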


Step 4: Configure Additional Resources

Route Table and Internet Gateway

When using the VPC wizard (the VPC and More option), AWS automatically handles the creation of route tables and an Internet Gateway. The wizard not only generates the required route table and Internet Gateway for your VPC but also sets up the necessary routes for internet-bound traffic. This ensures that instances in the public subnets can communicate with the internet.
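For reference, here is roughly what the wizard does behind the scenes, expressed as AWS CLI calls (a sketch with placeholder IDs; repeat the route table association for each public subnet):

# Create and attach an Internet Gateway to the VPC
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-xxxx --vpc-id vpc-xxxx

# Create a route table with a default route to the Internet Gateway
aws ec2 create-route-table --vpc-id vpc-xxxx
aws ec2 create-route --route-table-id rtb-xxxx --destination-cidr-block 0.0.0.0/0 --gateway-id igw-xxxx

# Associate the route table with each public subnet
aws ec2 associate-route-table --route-table-id rtb-xxxx --subnet-id subnet-xxxx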


Step 5: Review and Launch

After configuring the VPC, subnets, Internet Gateway, and route table, you’re ready to launch the VPC.

  1. Review all the configurations on the Summary page.
  2. Click Create VPC to launch the network environment.

VPC Creation Workflow and Preview


Fig: VPC Creation Workflow and Preview

Your VPC is now created and ready for use, laying the foundation for a highly available, scalable web application architecture.


Section 2: Deploying an EC2 Instance

With the VPC setup complete, it's time to launch EC2 instances into one of the public subnets. These instances will serve as web servers for hosting your applications, allowing you to test connectivity and functionality within your newly created VPC.

Step 1: Launch an EC2 Instance

To launch an EC2 instance, follow these steps:

1. Navigate to the EC2 Dashboard

  • Log into the AWS Management Console.
  • Select EC2 from the list of services.
  • Click on Launch Instance.

EC2 Dashboard


Fig: EC2 Dashboard

2. Configure the Instance Details

  • Name and Tags: Assign a meaningful name to your instance (e.g., WebServer).
  • AMI Selection: Choose the Amazon Linux 2023 AMI. Amazon Linux is free-tier eligible and supports package management through yum.
  • Instance Type: Select t2.micro, which is free-tier eligible and suitable for testing and small applications.

Instance Type Selection


Fig: Instance Type Selection

Step 2: Key Pair Setup

A key pair is required to securely access your instance via SSH. If you don’t have an existing key pair, create a new one:

  1. Key Pair: Choose an existing key pair or create a new one.
    • If creating a new key pair:
      • Enter a name for the key pair.
      • Download the .pem file and store it securely. You will need this file to SSH into your EC2 instance later.

Key Pair Setup


Fig: Key Pair Setup

Key Pair Setup


Fig: Key Pair Setup

Step 3: Configure Network Settings

  • VPC: Select the VPC you created in the previous section (e.g., WebAppVPC).
  • Subnet: Choose Public Subnet 1 (e.g., 10.0.10.0/24). This ensures that your instance has internet access via the Internet Gateway.
  • Auto-assign Public IP: Ensure that this option is enabled. This automatically assigns a public IP to the instance, which is essential for testing your web server over the internet.

Network Settings


Fig: Network Settings

Step 4: Configure Security Groups

Security groups control inbound and outbound traffic to your instance. Here's how to configure them:

  1. Create or Choose a Security Group:

    • Under the Network Settings section, select Create security group.
    • Name it webapp-security-group.
    • Inbound Rules:
      • Allow SSH (Port 22): Enables SSH access to your instance.
      • Source: For enhanced security, restrict to your IP address. For learning purposes, you can set it to 0.0.0.0/0 to allow access from anywhere.
      • Allow HTTP (Port 80): Permits web traffic to your Apache server.
      • Source: Set to 0.0.0.0/0 to allow access from the internet.
  2. Outbound Rules: By default, the security group allows all outbound traffic. No changes are necessary unless specific outbound restrictions are required.

Security Group Setup


Fig: Security Group Setup
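The equivalent security group can also be created from the AWS CLI. A minimal sketch, assuming the placeholder VPC and group IDs are replaced with your own:

# Create the security group in the WebApp VPC
aws ec2 create-security-group --group-name webapp-security-group \
  --description "Security group for the web app" --vpc-id vpc-xxxx

# Allow SSH (22) and HTTP (80) from anywhere (tighten the SSH source in production)
aws ec2 authorize-security-group-ingress --group-id sg-xxxx --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-xxxx --protocol tcp --port 80 --cidr 0.0.0.0/0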

Step 5: User Data Configuration

Add the following User Data script under Advanced Details. This script will automatically install and start the Apache web server when the instance launches:

#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello world! This is $(hostname -f)</h1>" > /var/www/html/index.html


This script:

  • Updates the instance's packages.
  • Installs the Apache web server (httpd).
  • Starts the Apache service and enables it to start automatically on boot.
  • Creates a simple HTML page that displays the instance's hostname.

User Data Setup


Fig: User Data Setup

Step 6: Launch the Instance

After configuring the above settings, click Launch Instance.

Launching EC2 Instance


Fig: Launching EC2 Instance
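If you would rather launch the same instance from the CLI, a sketch along these lines should work (the AMI, subnet, security group, and key pair values are placeholders, and userdata.sh contains the script from Step 5):

# Launch one t2.micro web server into Public Subnet 1 with a public IP
aws ec2 run-instances \
  --image-id ami-xxxx \
  --instance-type t2.micro \
  --key-name webapp-key-pair \
  --security-group-ids sg-xxxx \
  --subnet-id subnet-xxxx \
  --associate-public-ip-address \
  --user-data file://userdata.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=WebServer}]'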

Step 7: Testing the EC2 Instance

Once the instance has launched, you can test the setup by selecting the WebServer instance and copying its public IP address.

  1. Testing the Apache Web Server:
    • Open a web browser and navigate to the instance's public IP.
    • You should see the message: Hello world! This is <hostname> displayed on the webpage, confirming that Apache is running and serving the HTML file created using the User Data script.

Testing Apache Web Server


Fig: Testing Apache Web Server
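You can also verify the server from a terminal instead of a browser (replace the placeholder with your instance's public IP):

# Should print: <h1>Hello world! This is <hostname></h1>
curl http://<Public-IP-of-EC2-Instance>/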


Summary of What We've Done So Far

At this stage, we've successfully set up a single EC2 instance running an Apache web server within a public subnet of our VPC. By accessing this instance through its public IP, we confirmed that the web server is running and serving content. However, all incoming traffic currently hits just this one instance, meaning that if demand increases beyond its capacity, performance could degrade, or the server might even become unresponsive.

To address this, the next step is to set up Auto Scaling. This will allow us to automatically launch new EC2 instances as demand increases and terminate them when traffic reduces, ensuring that our application remains responsive and cost-efficient. We'll also configure an Application Load Balancer (ALB) to distribute traffic across multiple instances, ensuring even distribution and redundancy.

Section 3: Setting Up Auto Scaling and Load Balancing

Now that you’ve successfully deployed an EC2 instance, the next step is to implement Auto Scaling and an Application Load Balancer (ALB). This will ensure that your architecture dynamically scales in response to traffic demands, maintaining optimal performance and fault tolerance.

We’ll start by creating the Application Load Balancer (ALB), which will distribute incoming traffic across multiple instances. Then, we’ll configure the Auto Scaling Group (ASG) to automatically add or remove instances based on CPU utilization. For this tutorial, we'll use the default region us-east-1.


Setting Up a Target Group

1. Create a New Target Group

The ALB uses a Target Group to route traffic to specific EC2 instances. Let’s set one up:

  • Target Group Name: Enter a name for the target group (e.g., WebApp-TG).
  • Target Type: Select Instance.
  • Protocol: Choose HTTP.
  • Port: Enter 80 (for HTTP traffic).
  • VPC: Choose the same VPC (WebAppVPC).
  • Health Checks: Leave the default settings. The ALB will regularly check the health of your instances to ensure that only healthy instances receive traffic. Click Next and proceed to add targets.

Configure Target Group

Configure Target Group


Fig: Configuring Target Group
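The same target group can be created with the AWS CLI, for example (a sketch; vpc-xxxx is a placeholder for your WebAppVPC ID):

# Create an instance target group for HTTP on port 80
aws elbv2 create-target-group \
  --name WebApp-TG \
  --protocol HTTP \
  --port 80 \
  --target-type instance \
  --vpc-id vpc-xxxx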

2. Register Targets

  • Register Existing Instances: Manually register your existing WebServer EC2 instance; the Auto Scaling Group (ASG) will handle registration automatically in the future. Click Next to register the targets and create the Target Group.

Registering Existing Instance to Target Group


Fig: Registering Existing Instance to Target Group
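From the CLI, registering the existing instance would look roughly like this (the ARN and instance ID are placeholders):

# Register the existing WebServer instance with the target group
aws elbv2 register-targets \
  --target-group-arn <target-group-arn> \
  --targets Id=<instance-id>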

3. Review and Create

  • Review your Target Group configuration.
  • Click Create Target Group to finalize the setup.

Target Group Dashboard


Fig: Target Group Dashboard

  • Now that your Target Group is created, you can proceed to create your load balancer, which will route traffic to the instances in your Target Group.

Setting Up an Application Load Balancer (ALB)

An ALB is essential for distributing traffic across multiple instances, ensuring that no single instance becomes overwhelmed. Here’s how to set it up:

1. Access the EC2 Dashboard

  • In the AWS Management Console, go to the EC2 Dashboard.
  • On the left-hand side, under Load Balancing, select Load Balancers.

2. Create a New Load Balancer

  • Click on Create Load Balancer and select Application Load Balancer.

Create Application Load Balancer


Fig: Creating an Application Load Balancer

3. Configure Load Balancer Settings

  • Name: Enter a name for your load balancer (e.g., WebApp-ALB).
  • Scheme: Choose Internet-facing to make the ALB accessible over the internet.
  • IP Address Type: Select IPv4.
  • VPC: Choose the VPC you created earlier (e.g., WebAppVPC).
  • Availability Zones: Select the availability zones and subnets where your EC2 instances are deployed (in this case, the three public subnets you created: 10.0.10.0/24, 10.0.20.0/24, and 10.0.30.0/24).

Configure Load Balancer Settings


Fig: Configuring Load Balancer Settings

4. Configure Security Settings

Since you’re working with HTTP (port 80), there’s no need to configure an SSL certificate. You can skip this section for now. However, in production environments, you should consider enabling SSL for secure connections (HTTPS).

5. Configure Security Groups

  • Select or create a security group for your ALB. Ensure that it allows inbound HTTP (port 80) traffic from anywhere (0.0.0.0/0).
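For reference, the CLI equivalent of creating the ALB and its HTTP listener looks roughly like this (subnet IDs, the security group ID, and the ARNs are placeholders):

# Create an internet-facing Application Load Balancer across the three public subnets
aws elbv2 create-load-balancer \
  --name WebApp-ALB \
  --type application \
  --scheme internet-facing \
  --ip-address-type ipv4 \
  --subnets subnet-aaaa subnet-bbbb subnet-cccc \
  --security-groups sg-xxxx

# Forward HTTP traffic on port 80 to the WebApp-TG target group
aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> \
  --protocol HTTP --port 80 \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>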

Once your ALB is created, it will start listening for HTTP requests and distributing them across the instances in its target group. Before that, however, you need to create an AMI for the launch template that your Auto Scaling Group will use.


Creating an AMI from Your Running EC2 Instance

  1. Go to EC2 Dashboard: Navigate to Instances and select your running instance.

2.1. Actions: Click on Actions > Image and templates > Create Image.

Creating an AMI


Fig: Creating an AMI

2.2. Image Name: Provide a name for your AMI (e.g., webapp-server-image).

2.3. Instance Volumes: Review the instance storage configuration.

2.4. Create Image: Click Create Image to start the process.

2.5. Find AMI ID: Once the image is created, go to AMIs in the EC2 Dashboard to find your new AMI ID. You'll use this ID when configuring your launch template.
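The same image can be created from the CLI (a sketch; the instance ID is a placeholder, and --no-reboot is optional if you prefer not to restart the instance):

# Create an AMI from the running web server instance
aws ec2 create-image \
  --instance-id <instance-id> \
  --name webapp-server-image \
  --no-reboot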

This AMI will be the base image for your Auto Scaling instances. Now that you have the AMI, you can continue to create the Launch Template for your Auto Scaling Group.


Creating a Launch Template for Auto Scaling

  1. Go to EC2 Dashboard: Navigate to Launch Templates and click Create Launch Template.

2.1 Launch Template Name: Enter a name (e.g., WebApp-LT).

Launch Template Creation


Fig: Launch Template Creation

2.2 AMI ID: Under My AMIs, select the AMI you created earlier (webapp-server-image).

AMI ID Selection


Fig: AMI ID Selection

2.3 Instance Type: Select an instance type (e.g., t2.micro).

2.4 Key Pair: Choose your existing key pair for SSH access (webapp-key-pair).

Key Pair Selection


Fig: Key Pair Selection

2.5 Security Groups: Attach the security group created earlier for your EC2 instances (webapp-security-group).

Security Group Attachment


Fig: Security Group Attachment

2.6 Auto-assign Public IP: Under Advanced Network Configurations, enable public IP assignment.

Auto-assign Public IP


Fig: Auto-assign Public IP

2.7 User Data: Add the user data for the instances that the Auto Scaling Group will create. In this demo, we use the same user data script as our existing web server.

Launch Template User data


Fig: Launch Template User data

2.8 Create Template: Click Create Launch Template.

This template will serve as the blueprint for launching instances in your Auto Scaling Group.
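If you script this step instead, the launch template data is passed as JSON. A minimal sketch, with placeholder AMI and security group IDs (note that UserData must be base64-encoded, for example with the base64 utility):

# Create the launch template used by the Auto Scaling Group
aws ec2 create-launch-template \
  --launch-template-name WebApp-LT \
  --launch-template-data '{
    "ImageId": "ami-xxxx",
    "InstanceType": "t2.micro",
    "KeyName": "webapp-key-pair",
    "NetworkInterfaces": [
      {"DeviceIndex": 0, "AssociatePublicIpAddress": true, "Groups": ["sg-xxxx"]}
    ],
    "UserData": "<base64-encoded user data script>"
  }'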

Configuring the Auto Scaling Group (ASG)

Next, we’ll set up the Auto Scaling Group using our launch template to automatically adjust the number of EC2 instances based on CPU utilization, ensuring the application can handle traffic surges.

1. Navigate to Auto Scaling

  • In the AWS Management Console, go to EC2 Dashboard, and on the left-hand menu, under Auto Scaling, click on Auto Scaling Groups.

2. Create Auto Scaling Group

  • Click on Create Auto Scaling Group.

3. Configure Auto Scaling Group Settings

  • Auto Scaling Group Name: Enter a name for the group (e.g., WebApp-ASG).

Auto Scaling Group Settings


Fig: Auto Scaling Group Settings

4. Configure Network Settings

  • VPC: Select your VPC (WebAppVPC).
  • Subnets: Choose the three public subnets you created (10.0.10.0/24, 10.0.20.0/24, and 10.0.30.0/24).

Auto Scaling Group Network Settings


Fig: Auto Scaling Group Network Settings

5. Attach the Load Balancer

  • Select Attach to an existing load balancer.
  • Choose the Application Load Balancer created earlier (WebApp-ALB).
  • Select the Target Group (WebApp-TG) created for the ALB.
  • Turn on Elastic Load Balancing health checks.
  • Under Additional settings, enable a default instance warmup of 60 seconds, then click Next to continue.

Attach Load Balancer to Auto Scaling Group

Attach Load Balancer to Auto Scaling Group


Fig: Attach Load Balancer to Auto Scaling Group

6. Configure Instance Numbers

  • Desired Capacity: Set to 1 (the ASG launches and maintains one instance to start).
  • Minimum Capacity: Set to 1 (the ASG will never scale below one running instance).
  • Maximum Capacity: Set to 4 (Auto Scaling can scale out to at most 4 instances if needed).

Configure Instance Numbers


Fig: Configuring Instance Numbers
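Steps 3 to 6 can be combined into a single CLI call if you prefer (a sketch; the subnet IDs and target group ARN are placeholders):

# Create the ASG from the launch template, spanning the three public subnets,
# attached to the WebApp-TG target group with ELB health checks and a 60s warmup
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name WebApp-ASG \
  --launch-template LaunchTemplateName=WebApp-LT,Version='$Latest' \
  --min-size 1 --max-size 4 --desired-capacity 1 \
  --vpc-zone-identifier "subnet-aaaa,subnet-bbbb,subnet-cccc" \
  --target-group-arns <target-group-arn> \
  --health-check-type ELB \
  --health-check-grace-period 60 \
  --default-instance-warmup 60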

7. Configure Scaling Policies

Now we’ll configure the scaling policies that will trigger instance creation or termination based on CPU utilization.

  • Policy Type: Choose Target tracking scaling policy.
    • Metric Type: Average CPU utilization.
    • Target Value: Enter 25 (new instances are launched when average CPU utilization rises above 25%, and instances are terminated when it falls back below that target). Leave the remaining settings at their defaults and continue.

Auto Scaling Policies


Fig: Configuring Auto Scaling Policies
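The equivalent target tracking policy can also be attached from the CLI, roughly like this (the policy name is an arbitrary example):

# Keep average CPU utilization of the group around 25%
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name WebApp-ASG \
  --policy-name cpu-target-25 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
    "TargetValue": 25.0
  }'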

8. Review and Create

  • Review the configuration, then click Create Auto Scaling Group.

Auto Scaling Group Dashboard


Fig: Auto Scaling Group Dashboard

The ASG will now automatically adjust the number of instances based on CPU utilization and distribute traffic across them using the ALB. We can evaluate this by testing the auto scaling using CPU stress.


Section 4: Testing Auto Scaling

To verify that your Auto Scaling setup is functioning correctly, you can simulate high traffic or load on the instances to trigger the creation of new instances.

Step 1: Connect to an EC2 Instance

  • SSH into Your EC2 Instance using EC2 Connect or the AWS CLI with the following command:

    ssh -i /path/to/your-key.pem ec2-user@<Public-IP-of-EC2-Instance>
    

Step 2: Install the Stress Tool

  • Once connected, install the stress tool to simulate high CPU load:

    sudo yum install -y stress
    

Installing the Stress Tool


Fig: Installing the Stress Tool

Step 3: Simulate High CPU Load

  • Run the stress command to simulate high CPU load for a set period (e.g., 100 seconds):

    stress -c 1 -i 1 -m 1 --vm-bytes 128M -t 100s
    

Simulating CPU Load


Fig: Simulating High CPU Load

This command starts one CPU worker, one I/O worker, and one memory worker (allocating 128 MB) for 100 seconds. Because a t2.micro has only 1 vCPU, CPU utilization should exceed the 25% target set in the Auto Scaling Group (ASG), triggering the ASG to launch new instances to handle the increased load.

Auto Scaling in Action


Fig: Auto Scaling Group Scaling Activity

Step 4: Verify Auto Scaling

In this section, we will verify that Auto Scaling is functioning as expected by simulating a CPU load, monitoring the Auto Scaling Group (ASG) activity, and checking the Load Balancer's traffic distribution across instances.

1. Monitor Auto Scaling Group Activity

Once you've triggered a high CPU load using the stress tool, Auto Scaling should kick in when the CPU utilization exceeds the threshold of 25%. You can monitor the ASG activity to confirm that new instances are being launched.

  • Navigate to Auto Scaling Groups in the AWS Management Console.
  • Select your Auto Scaling Group (WebApp-ASG) and go to the Activity tab.
  • You should observe scaling activities showing that new instances are being created because the CPU utilization exceeded the threshold.

Auto Scaling Group Activity


Fig: Auto Scaling Group Activity Triggered by High CPU Usage

This activity confirms that Auto Scaling is working as intended, scaling the number of instances up in response to increased CPU usage.
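You can also follow the same scaling activity from the CLI (a sketch):

# Show the most recent scaling activities for the group
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name WebApp-ASG \
  --max-items 5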

2. Monitor Instance CPU Usage

Next, we can validate the specific CPU spike caused by the stress tool. The CPU usage of the instance will show a spike above 25%, which is what triggered the Auto Scaling.

  • Navigate to the EC2 Dashboard and select the instance running the stress test.
  • Go to the Monitoring tab and view the CPU utilization metrics.
  • You should observe a CPU spike to 40.9% during the stress test, which caused Auto Scaling to add additional instances.

EC2 Instance CPU Monitoring


Fig: CPU Usage Spike to 40.9% Due to the Stress Test

This shows that the instance was overloaded, which triggered the ASG to create additional instances.

3. Verify Load Balancer Traffic Distribution

Once multiple instances are running in the ASG, the Load Balancer should evenly distribute traffic across them. You can confirm this by checking the IP addresses the Load Balancer directs traffic to.

  • Open a web browser and navigate to the DNS name of your Application Load Balancer (ALB).
  • Refresh the page multiple times and observe the hostname (which embeds the instance's private IP) shown in the response.
  • You should see the response alternate between two or more instances, confirming that the Load Balancer is properly distributing traffic across multiple servers in the target group.

Load Balancer Distributing Traffic

Load Balancer Distributing Traffic


Fig: Load Balancer Distributing Traffic Across different Instances running the WebApp

This demonstrates that the Load Balancer is effectively balancing incoming requests across the instances, ensuring high availability.
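The same check can be run from a terminal by sending repeated requests to the ALB's DNS name (placeholder below) and watching the hostname in the response change:

# Each response should come from one of the instances behind the ALB
for i in $(seq 1 10); do
  curl -s http://<ALB-DNS-name>/
done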

4. Terminate Instances and Verify Auto Scaling

To further test the ASG's functionality, you can terminate or stop all running instances and observe how the ASG automatically replaces them to maintain the desired number of instances.

  • Navigate to the EC2 Dashboard and select all instances.
  • Terminate or stop all the instances.

Once terminated, the Auto Scaling Group will detect that no instances are running and automatically launch a new one to maintain the minimum instance count.

  • Go back to the Auto Scaling Group Activity in the EC2 dashboard to confirm that the ASG is performing checks and launching new instances.

New Instance Created by ASG After Termination


Fig: All running Instances manually stopped or Terminated

New Instance Created by ASG After Termination


Fig: New Instance Created by ASG After Termination

This confirms that the ASG is functioning correctly by ensuring that there is always at least one instance running, even if instances are terminated or fail.


This concludes the verification of your Auto Scaling Group, Load Balancer, and EC2 instances. You've successfully demonstrated the dynamic scaling of instances based on CPU usage, proper traffic distribution across multiple instances, and the resilience of your setup in automatically recovering from instance terminations.


Conclusion

In this tutorial, you've successfully set up a highly available and scalable web application architecture on AWS. By creating a Virtual Private Cloud (VPC) with three public subnets across different availability zones, deploying EC2 instances running an Apache web server, and configuring an Application Load Balancer (ALB) and Auto Scaling Group (ASG), you've ensured that your application can dynamically scale to handle varying traffic loads while optimizing costs and maintaining performance.

Key Takeaways

  • High Availability: Distributing resources across multiple availability zones ensures that your application remains available even if one zone experiences failures.
  • Scalability: Auto Scaling automatically adjusts the number of EC2 instances based on demand, ensuring your application can handle traffic spikes and scale down during low usage periods.
  • Load Balancing: The ALB efficiently distributes incoming traffic across multiple instances, preventing any single instance from becoming a bottleneck.
  • Cost Optimization: By only using the necessary resources, you optimize costs while maintaining performance and availability.

Important: Clean Up Your Environment

To avoid incurring unnecessary charges, it's essential to clean up the resources you created for this tutorial once you've completed the setup and testing. Follow these steps to terminate resources:

  1. Terminate EC2 Instances:

    • Navigate to the EC2 Dashboard.
    • Select all instances created for this tutorial.
    • Click Actions > Instance State > Terminate.
  2. Delete Auto Scaling Group:

    • Navigate to Auto Scaling Groups in the EC2 Dashboard.
    • Select the ASG (WebApp-ASG).
    • Click Delete.
  3. Delete Load Balancer and Target Group:

    • Navigate to Load Balancers.
    • Select the ALB (WebApp-ALB).
    • Click Actions > Delete.
    • Navigate to Target Groups.
    • Select the Target Group (WebApp-TG).
    • Click Actions > Delete.
  4. Delete VPC:

    • Navigate to the VPC Dashboard.
    • Select the VPC (WebAppVPC).
    • Ensure all associated resources (subnets, gateways, route tables) are deleted.
    • Click Actions > Delete VPC.
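If you prefer to clean up from the CLI, here is a sketch of the same steps (the ARNs and IDs are placeholders; deleting the ASG also terminates its instances, and the VPC can only be deleted after its subnets and Internet Gateway are removed):

# Delete the Auto Scaling Group and terminate its instances
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name WebApp-ASG --force-delete

# Delete the load balancer and target group
aws elbv2 delete-load-balancer --load-balancer-arn <alb-arn>
aws elbv2 delete-target-group --target-group-arn <target-group-arn>

# Terminate the standalone WebServer instance, then delete the VPC
aws ec2 terminate-instances --instance-ids <instance-id>
aws ec2 delete-vpc --vpc-id <vpc-id>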

By cleaning up your environment, you ensure that you won't be billed for resources you no longer need, making your learning experience both effective and cost-efficient.


This completes the setup of an Auto Scaling Group with an Application Load Balancer to dynamically handle traffic spikes for a web application.
