AWS Auto Scaling Strategies in High-Load Systems

Businesses usually strive for a growing number of users and requests. However, heavy loads may become an issue for your product or online service. Sudden and drastic load spikes can cause crashes and downtime that may lead to reputational damage and user dissatisfaction — and may potentially cause significant financial losses. This is why any software should have an efficient scaling strategy in place.

Creating a flexible scaling mechanism is quite challenging, as the load may be unpredictable. At Apriorit, we have tested and deployed dozens of products with various load requirements and patterns and discovered that the most reliable way to mitigate these risks for high-load systems is to implement autoscaling.

In this article, we use a practical example to test three AWS scaling services and estimate their cost and performance. Then, we compare them side by side so you can easily determine which service is best for your product.

This article will be useful for businesses that plan to launch a new cloud application and need to choose the right scaling service for managing unpredictable or high loads.

The importance of software scaling for businesses

A high-quality software product should be able to handle an increasing volume of users, requests, or data without compromising performance, reliability, or cost-efficiency. Proper scaling is essential to ensure that your solution can grow to meet demand and adapt to varying workloads.

To implement scaling, you need to:

Analyze current workloads and performance characteristics
Set scaling metrics like CPU usage or request count that will trigger scaling actions
Create scaling policies that align with your application’s requirements, considering factors like response time objectives and cost constraints
Test scaling configurations under various scenarios to ensure they perform as expected
Monitor your application’s performance and adjust scaling policies as needed to optimize resource allocation

If your product has a high or unpredictable load, performing these tasks manually is quite challenging. Traffic spikes are sudden, and human intervention is often slow and error-prone. Additionally, manual scaling often requires over-provisioning resources to handle unexpected spikes, leading to wasted resources and increased costs. That’s why automatic scaling, or autoscaling, is a more suitable approach.

Autoscaling is a cloud computing technique that automatically adjusts the number of resources allocated to an application based on changes in demand. Autoscaling is better than traditional manual scaling because it offers real-time adaptability to demand changes, ensuring optimal resource utilization, cost savings, and high availability.

Using AWS for autoscaling high-load systems

The largest and most popular cloud providers, including Google, IBM, Amazon, Oracle, and Microsoft, offer automation features that allow you to set up autoscaling for your product. In this article, we focus on Amazon Web Services (AWS) and review the options it offers to businesses.

AWS offers several important benefits, including:

A wide range of services within the AWS ecosystem
Global resource distribution
Robust security features
Cost optimization tools and pay-as-you-go pricing

There are multiple types of autoscaling in AWS, reflected in different services. For this article, we’ve chosen the following AWS autoscaling services:

AWS Lambda
Auto Scaling EC2
Warm pool Amazon EC2 Auto Scaling

In our practical example, we test these AWS autoscaling methods and perform calculations for an API server under these conditions:

30 million requests per month
30 seconds average request processing time
200 parallel requests from users (maximum)

We will calculate only the cost of capacity, without taking into account the use of other AWS services. Let’s learn about these services and their performance.

1. AWS Lambda

AWS Lambda is a serverless, event-driven computing service that allows you to run code for virtually any type of application or service without provisioning and maintaining servers.

This service is used to quickly deploy and scale applications on AWS using horizontal scaling. As traffic increases, Lambda increases the number of concurrently running function instances. By default, the concurrency limit is 1,000 concurrent executions per account in a Region. If you need more, you can request a quota increase from AWS.

AWS Lambda functions have access to various AWS services through the AWS SDK. This SDK allows your Lambda functions to interact with other AWS services, such as Amazon S3 for storage, Amazon DynamoDB for databases, Amazon SQS for message queues, and many more.

The advantage of AWS Lambda is that you pay only for used Lambda resources and AWS services that are connected to it.

We will deploy our API server as a Lambda function that is invoked to handle incoming requests.

The average time to fulfill a user request is 30 seconds. To calculate the price, we use the AWS Pricing Calculator.

According to the calculator, we will pay $1,874 to process 30 million monthly requests. Note that this only includes the costs of using the Lambda service.
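The arithmetic behind this estimate can be sketched in a few lines. This is a rough approximation rather than the calculator's exact logic: the 128 MB memory size, the per-GB-second and per-request rates, and the free tier allowances are all assumptions that are not stated in this article and may differ for your Region, architecture, or date.

```python
# Rough sketch of a monthly AWS Lambda cost estimate.
# All constants below are ASSUMPTIONS for illustration, not values from the article.
MEMORY_GB = 0.125                      # assumed 128 MB function memory
PRICE_PER_GB_SECOND = 0.0000166667     # assumed x86 on-demand rate, USD
PRICE_PER_MILLION_REQUESTS = 0.20      # assumed request rate, USD
FREE_GB_SECONDS = 400_000              # assumed monthly free tier (compute)
FREE_REQUESTS = 1_000_000              # assumed monthly free tier (requests)


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_gb: float) -> float:
    """Return the estimated monthly cost in USD for the given workload."""
    gb_seconds = requests * avg_duration_s * memory_gb
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, 0)
    billable_requests = max(requests - FREE_REQUESTS, 0)
    compute_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute_cost + request_cost


if __name__ == "__main__":
    # Workload from the article: 30 million requests, 30 seconds each.
    cost = lambda_monthly_cost(30_000_000, 30, MEMORY_GB)
    print(f"Estimated monthly Lambda cost: ${cost:,.0f}")
```

Under these assumptions, the estimate comes out at roughly $1,874. Almost all of it is compute time rather than request fees, which is why long-running requests are expensive on Lambda.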

Advantages of AWS Lambda include:

Fully autonomous system
Fast scaling speed
Payment only for used capacity

However, there are some downsides as well:

Need to constantly monitor the number of concurrent units
Significant cost increase for hosting servers with long request execution times

Lambda is serverless and executes events and requests as short-lived functions. This makes it different from another service we’ll talk about.

2. Auto Scaling EC2

Amazon Elastic Compute Cloud (EC2) offers a deep and broad computing platform with over 600 instance types and a choice of the latest processors, storage, networking, operating systems, and purchase models. This allows EC2 to meet the needs of a wide range of systems and workloads. It also provides features like AWS predictive scaling, which uses machine learning to predict changes in demand.

We’ll host our API server on EC2 instances that support the Auto Scaling option, allowing us to scale our system horizontally.

To scale our system, we added a load balancer that distributes requests among instances.

During testing, we found that T4g series EC2 instances were suitable for hosting our API. Instances of this type handled 15–20% more requests than the other instance types we tested for this article. For our experiment, we chose t4g.large, an instance type within the T4g series that has two vCPUs and 8 GB of memory. It can handle about 400 requests at a time, so to cover 30 million requests per month, we need two to three instances of this type.

During load hours, the instances scale up, and during idle hours only one instance is running to optimize costs. When configuring an Auto Scaling group, you can specify the minimum and maximum number of instances that can run depending on your traffic.

We tested two AWS autoscaling strategies: scaling by the number of requests and scaling by the level of CPU utilization. Let’s see how the system scales in each case.

EC2 Auto Scaling by number of requests

To test this AWS autoscaling strategy, we configured the system to launch another instance once the load reaches 300 requests per instance within a minute. In our test, scaling took about four minutes in total: after about one minute, we got an alarm and the system started scaling, and it then took three minutes to launch and add an instance.
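The request-count rule described above can be approximated with a small capacity calculation. This is a simplified model, not the exact logic of AWS scaling policies: the function name is hypothetical, and the minimum of one and maximum of three instances are assumptions based on the instance counts mentioned earlier in this article.

```python
import math


def desired_instances(
    total_requests_per_minute: int,
    target_per_instance: int = 300,   # threshold from the article
    min_instances: int = 1,           # assumed group minimum
    max_instances: int = 3,           # assumed group maximum
) -> int:
    """Return how many instances are needed to keep each one at or below the target."""
    needed = math.ceil(total_requests_per_minute / target_per_instance)
    # Clamp the result to the group's configured bounds.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    for load in (0, 250, 450, 2000):
        print(load, "requests/min ->", desired_instances(load), "instance(s)")
```

Note that this calculation counts every request, which is why a burst of bot traffic would trigger scaling just as real user traffic would.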

We also noticed an issue: this type of system can start scaling even under a bot attack, since the load balancer counts every request sent to our route. After that, we tried scaling by CPU utilization.

EC2 Auto Scaling by CPU utilization

To set up automatic scaling by CPU utilization, we chose the following settings:

>=60% CPU load – 1 EC2 instance
>=70% CPU load – 2 EC2 instances
>=90% CPU load – 3 EC2 instances
<40% CPU load – all extra instances are disabled

In this case, a minimum number of instances runs at all times, and extra instances are added depending on the CPU load.
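The thresholds above behave like a step function with a gap between the scale-out and scale-in levels. The sketch below is one possible reading of these settings, assuming that each threshold sets the number of extra instances and that nothing changes between 40% and 60% CPU load. The function name is hypothetical.

```python
def target_extra_instances(cpu_percent: float, current_extra: int) -> int:
    """Return the number of extra instances to run for the given CPU load.

    Scale-out steps never reduce capacity; capacity is only released
    when the load drops below the 40% scale-in threshold.
    """
    if cpu_percent < 40:
        return 0                      # disable all extra instances
    if cpu_percent >= 90:
        step = 3
    elif cpu_percent >= 70:
        step = 2
    elif cpu_percent >= 60:
        step = 1
    else:
        step = 0                      # 40-60%: hold the current capacity
    return max(current_extra, step)


if __name__ == "__main__":
    extra = 0
    for cpu in (55, 65, 92, 50, 35):
        extra = target_extra_instances(cpu, extra)
        print(f"CPU {cpu}% -> {extra} extra instance(s)")
```

The gap between the 60% scale-out and 40% scale-in thresholds keeps the group from adding and removing instances repeatedly when the load hovers around a single value.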

When requests to our API start coming in and the load increases, it takes a minute and a half for the system to trigger an alarm and start scaling up. The next instance is up and running three minutes later. Given this, scaling by CPU utilization looks more predictable and stable.

These are the cost calculations for this implementation per month:

Application Load Balancer — starting from $16.50
EC2 t4g.large — $48.50 for one instance

The total monthly capacity costs for EC2 Auto Scaling by CPU utilization are $119.50 at the time of writing. You can check the current price using the AWS calculator.

Although this implementation is quite cost-efficient, it has a significant downside: the scaling time. If the number of requests suddenly outgrows the capacity of one instance, it won’t have time to process them all. As a result, we’ll need to cancel requests so they don’t overload the system.

This means that users won’t be able to access the service and execute requests during the four minutes needed for scaling — and this can significantly harm the service’s reputation, as users expect the service to work at all times.

To sum up, the advantages of EC2 Auto Scaling are:

Autonomous and stable work
Cost-efficiency

The single major disadvantage of this configuration is the long scaling time. To solve this challenge, we can use warm pools, a feature of Amazon EC2 Auto Scaling that accelerates scaling. Let’s calculate the cost and speed of this implementation next.

3. Warm pool Amazon EC2 Auto Scaling

A warm pool allows you to maintain a pool of pre-initialized EC2 instances that are ready to be brought into service quickly when needed. When an application needs to scale, the Auto Scaling group can engage the warm pool to provide the required capacity. This ensures that instances are ready to start serving application traffic quickly, which speeds up the response to a scaling event.

Pre-initialized instances kept in a hibernated state start much faster because their memory contents are preserved. While instances are hibernating, you pay only for Amazon Elastic Block Store (EBS) volumes, including storage for RAM contents, and for Elastic IP addresses attached to the instances.

The default warm pool size is calculated as the difference between the maximum and desired capacity of the Auto Scaling Group.
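As a quick illustration of this rule, here is a minimal sketch. The function name and the example capacities are hypothetical and are not taken from our test setup.

```python
def default_warm_pool_size(max_capacity: int, desired_capacity: int) -> int:
    """Return the default warm pool size: maximum capacity minus desired capacity."""
    if desired_capacity > max_capacity:
        raise ValueError("desired capacity cannot exceed maximum capacity")
    return max_capacity - desired_capacity


if __name__ == "__main__":
    # Hypothetical group: up to 3 instances, 1 currently desired.
    print(default_warm_pool_size(max_capacity=3, desired_capacity=1))
```

With these example values, two pre-initialized instances wait in the pool. As the desired capacity grows, the pool shrinks, because instances move from the pool into service.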

To work with a warm pool, we will use the CPU-based scaling algorithm.

Instead of T4g, we’ll use t3a.large, as it’s more suitable for the warm pool option. Our test showed that after the alarm was triggered, the instance was up in 25 to 35 seconds, which is much faster than the three minutes it took with Auto Scaling without a warm pool.

At the time of writing, the approximate cost of capacity for warm pool Amazon EC2 Auto Scaling is $163 a month, plus minor expenses for storage and an Elastic IP address. The cost is higher than autoscaling without a warm pool because t3a.large processes requests 15–20% slower than t4g.large, so you have to use one more EC2 instance. However, it’s still much more cost-efficient than using AWS Lambda. You can use the AWS calculator to discover your actual cost.

In summary, the warm pool feature provides these advantages:

Fast scaling
Cost-efficiency (compared to AWS Lambda)

Among the disadvantages are:

Slightly higher cost than Auto Scaling EC2 without the warm pool feature
Additional management complexity

Now that we’ve conducted our experiments with all three options, let’s compare them and find out which option is optimal for your product.

Which AWS service to choose for your product: side-by-side comparison

We compared AWS Lambda, EC2 Auto Scaling, and Warm pool EC2 Auto Scaling according to these criteria:

Scaling time
Cost per month
Administrative requirements
General advantages

Let’s look at these AWS autoscaling tools and discuss what they’re best suited for.

AWS Lambda is a fast service that scales quickly and requires no administration.

It’s most suitable for short-lived operations such as:

Event data processing
Scheduled tasks

AWS Lambda has the following use cases:

Event handling. Events can be triggered by various AWS services or external sources, and when an event occurs, Lambda can execute the designated function in response to it.
Microservices. AWS Lambda has a serverless nature that makes it well-suited for building microservices, which are small, independent, and loosely coupled application components. Each Lambda function can represent a microservice with specific functionality, and functions can be developed, deployed, and scaled independently.
API servers. You can use AWS Lambda to create API servers, which are essential for building RESTful or GraphQL APIs. Lambda functions can handle incoming HTTP requests and execute the necessary logic as well as automatically scale the number of processed requests depending on the increase or decrease in traffic.
IoT applications. Lambda can process data from IoT devices, execute business logic based on sensor inputs, and manage device communication.

AWS Lambda is a good choice for event-driven applications in a wide range of domains, as it’s cost-efficient, highly responsive, and flexible.

Auto Scaling EC2 is a stable and cost-efficient system with great performance. It features a convenient administration panel you can use to monitor and manage your EC2 instances and their scaling policies.

Auto Scaling EC2 is suitable for hosting servers that can tolerate moderate scaling times.

Here is what you can use Auto Scaling EC2 for in your product:

Web server deployment
Database deployment
Data processing or computational tasks
Auto-scalable applications

On the downside, Auto Scaling EC2 takes minutes to scale, which can be unacceptable for products with rapid traffic increases. This system will not allow you to scale immediately, and this may harm your product’s user experience and reputation.

A warm pool is a powerful Amazon EC2 Auto Scaling feature that adds an extra layer of speed and responsiveness to your application’s scalability. It’s cost-efficient because during idle periods, you don’t have to pay for a large number of running machines.

Auto Scaling EC2 with warm pool is highly suitable for applications that require rapid, on-demand scaling and are sensitive to response times. Applications with such demands are common in domains like e-commerce, gaming, streaming, and real-time analytics.

In conclusion, Auto Scaling EC2 with warm pool is an ideal solution when response time is critical to your application’s success.

As you can see, every service has its pros and cons, and in some cases, one solution may not be enough. If your application experiences regular and sudden load spikes, you may need to combine several AWS autoscaling services to ensure your system’s stable and autonomous work.

Consolidation of services for smooth system operation

Sometimes, a service doesn’t have time to scale. A high volume of requests can quickly overload it, meaning that incoming requests need to be canceled. While the system is scaling, it’s unavailable to users. Combining several services can solve these issues and protect your company’s reputation and user satisfaction.

Here’s an example of a potential multi-service solution using AWS autoscaling best practices: when the number of incoming requests exceeds what your running EC2 instances can handle, you can redirect the excess requests to AWS Lambda. This takes some of the load off EC2 while the system scales, giving you more stable performance. You won’t have to cancel requests while waiting for AWS to bring more EC2 instances online.

This solution will be more expensive than Auto Scaling EC2 with a warm pool, but not by much: AWS Lambda only needs to handle the overflow for the 15 to 30 seconds it takes to bring up another EC2 instance and offload the system.
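The overflow idea can be sketched as a simple routing decision. This is a minimal illustration under stated assumptions, not a production router: the class, function, and backend labels are hypothetical, and the per-instance capacity of 400 concurrent requests is the approximate figure we measured for t4g.large earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class FleetState:
    """Snapshot of the EC2 fleet at the moment a request arrives."""
    healthy_instances: int
    in_flight_requests: int
    per_instance_capacity: int = 400  # approximate figure for t4g.large


def choose_backend(state: FleetState) -> str:
    """Send a request to EC2 while there is spare capacity, otherwise to Lambda."""
    ec2_capacity = state.healthy_instances * state.per_instance_capacity
    if state.in_flight_requests < ec2_capacity:
        return "ec2"
    # EC2 is saturated and still scaling: overflow goes to Lambda.
    return "lambda"


if __name__ == "__main__":
    print(choose_backend(FleetState(healthy_instances=1, in_flight_requests=120)))
    print(choose_backend(FleetState(healthy_instances=1, in_flight_requests=400)))
    print(choose_backend(FleetState(healthy_instances=2, in_flight_requests=400)))
```

Once a new EC2 instance becomes healthy, the fleet capacity grows and the same check starts routing requests back to EC2, so Lambda is billed only for the short overflow window.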

Conclusion

Scaling is a crucial aspect of any server-side product, as it allows you to meet growing demand and ensure that your services remain accessible and responsive. When a software product experiences sudden and drastic traffic spikes, manual scaling can’t keep up, so you need to set up automated scaling. It allows you to adapt swiftly to changing workloads, all while optimizing costs.

You can use autoscaling services in AWS separately or combine them to set up efficient autoscaling for your product and ensure constant access for your users.

The Apriorit team is ready to help you choose and deploy the right scaling service or combination of services to ensure the stable and cost-efficient functioning of your application.
