Scalability vs. Elasticity

ChunTing Wu - Sep 5 '22 - Dev Community

In system design, there are two words that are often confused: scalability and elasticity. They are so similar that we frequently fail to distinguish them correctly.

However, when we want to solve the issues caused by these two non-functional requirements individually, we need completely different approaches.

Therefore, in order to design the right approach, we have to understand them and recognize their properties. In this article, we will discuss what scalability and elasticity are and how to address each of them.

Scalability

We often talk about horizontal scalability and vertical scalability, i.e., scale-out and scale-up.

But what do good and bad scalability actually look like?

Let's take a look at an example.

CPU usage grows over time. This is normal. As the system grows, the number of users increases, the feature requirements increase, and thus resource consumption increases as well. In this case, the resource is CPU, but the same applies to other metrics, such as memory, response time, etc.

So, if this diagram indicates a normal situation, what diagram indicates a problem?

When resource usage rises not linearly but exponentially, something is wrong. It could be a bottleneck that introduces additional overhead, or a dramatic drop in performance once the amount of data exceeds a certain threshold.

On the other hand, what kind of shape represents good scalability?

Similarly, resource usage rises over time, but the rate of increase shrinks, which means scalability is getting better. The ideal state is a horizontal line, where resource usage remains constant regardless of the number of users and the growth in feature requirements.

In other words, keeping this line as straight and as close to horizontal as possible is the core problem to be addressed when solving for scalability.

Elasticity

After talking about scalability, let's look at an example of elasticity.

First, let's also look at the normal situation of a system.

Resource usage fluctuates dramatically over time, with spikes occurring at certain times. It is easy to understand why this happens. Take an e-commerce site, for example: when a Black Friday sale or Christmas is coming up, system usage will be much higher than usual. Or a ticketing site: when a top singer announces a concert, a large number of people will come to grab tickets at once.

So what is good elasticity? It's about eliminating those spikes, of course.

There are two key points in this diagram.

  1. The red line is more stable than the blue line, but the red line is not a horizontal line.
  2. The red line is significantly higher than the blue line after the occurrence of a spike.

The reason for the first point is simple: system usage is not fixed, so resource usage varies with it. In general, though, it can still be kept in a stable state without significant ups and downs.

The second point, I believe, is not difficult to understand: the way we flatten the blue line as much as possible is not by rejecting all user requests, but by postponing the instantaneous usage through "a sort of mechanism". Therefore, after the spike there is always a period of digestion, which keeps the red line higher than the blue line.

You may ask, though: if the system can already handle this many users without elasticity, is it still necessary to improve elasticity to achieve the effect of the red line?

The answer is, definitely. Why?

Because the highest point of the blue line is the limit of what the system can record: rejected user requests never enter the system and are not recorded at all. At the moment of the spike, the system's maximum capacity is defined there, and all requests exceeding it are rejected.

On the other hand, a more elastic system has a smoother resource usage and can handle more user requests without crashing the system.

The Myth of Autoscaling

Now that we understand scalability and elasticity, can you answer one question?

Does autoscaling address the need for scalability or elasticity?

The answer is, scalability.

First of all, the principle of autoscaling is to track specific metrics by sampling and to start scaling when a threshold has been exceeded for a period of time.

Therefore, by the time scaling can begin, the spike has already been underway for a while. In addition, extra capacity is not immediately available; even containerized applications have to go through several steps to scale, such as loading the container image and cold-starting it on a new instance. A complex application can take a few minutes from the decision to scale until the new instance is serving.
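To make the sampling-and-threshold principle concrete, here is a minimal sketch in Python. All names and numbers (the threshold, the window size, the replica count) are illustrative assumptions, not a real autoscaler's API. Note how the design itself forces the lag described above: scaling is triggered only after the metric has been hot for an entire window of samples.

```python
from collections import deque

class Autoscaler:
    """Minimal threshold-based autoscaler sketch: scale out only after the
    metric has stayed above the threshold for `window` consecutive samples."""

    def __init__(self, threshold=0.8, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)
        self.replicas = 1

    def record(self, cpu_utilization):
        self.samples.append(cpu_utilization)
        # Scale only when the whole window is above threshold,
        # i.e. some time after the spike has already begun.
        if (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples)):
            self.replicas += 1
            self.samples.clear()  # reset the window after a scaling decision
        return self.replicas
```

Even in this toy version, three sampling intervals pass before the first scale-out, and the cold-start time of the new instance would come on top of that.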

As a result, autoscaling is not about elasticity, but about scalability.

More importantly, autoscaling does not improve the system itself. In other words, autoscaling only allows the system to keep growing with its original scalability; it does not make that scalability any better.

One exception is that if elastic requirements can be predicted, then autoscaling can be effectively applied.

For example, if the system receives a large number of requests at the beginning of the workday (e.g., 9:00 a.m.), then autoscaling can be triggered on a schedule. In this example, scheduling the scale-out between 8:30 a.m. and 9:30 a.m. can effectively solve the elasticity problem faced during that window.
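A schedule-based policy like this boils down to a time-window check. The sketch below is an illustration only; the window boundaries and replica counts are assumptions, not recommendations:

```python
from datetime import time, datetime

# Hypothetical schedule: pre-scale before the 9:00 a.m. rush described
# above and scale back down afterward.
SCALE_WINDOW = (time(8, 30), time(9, 30))
BASELINE_REPLICAS = 2
PEAK_REPLICAS = 6

def desired_replicas(now: datetime) -> int:
    """Return the replica count for a predictable daily spike."""
    start, end = SCALE_WINDOW
    return PEAK_REPLICAS if start <= now.time() <= end else BASELINE_REPLICAS
```

Because the scale-out happens before the spike rather than in reaction to it, the cold-start delay no longer matters; this is what makes predictable elasticity the one case autoscaling handles well.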

However, in general, autoscaling addresses the issue of scalability.

Conclusion

In this article, we introduced scalability and elasticity and explained what each looks like after improvement.

Nevertheless, this article does not describe how to improve scalability, because scalability is a systemic issue and there is no one specific solution that can solve it all at once.

First, bottlenecks must be identified, then the root causes must be figured out, and finally different solutions must be proposed for each bottleneck. This requires an extra investment of time and human resources to let the system evolve, and autoscaling is a compromise to buy time. By spending additional money, the system is given time to evolve rather than being shut down by its inability to scale.

But autoscaling is not a long-term solution, and additional analysis, design and implementation costs are required to fundamentally solve the scalability issue.

On the other hand, how to solve the elasticity issue?

The answer is relatively simple: through caching or messaging. In order to tolerate a large number of requests in a moment, there must be a mechanism to postpone the current spike, and the most commonly used practice is caching.
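For the messaging variant, the "sort of mechanism" that postpones a spike can be sketched as a bounded in-process queue: requests are accepted into a buffer instantly and digested later at the backend's own pace. This is a minimal illustration with made-up names, not a production message broker:

```python
import queue
import threading

# Bursts of requests land in this buffer instead of hitting the backend
# directly; the buffer absorbs the spike.
buffer = queue.Queue(maxsize=1000)
processed = []

def accept(request) -> bool:
    """Enqueue a request; reject only when the buffer itself is full."""
    try:
        buffer.put_nowait(request)
        return True
    except queue.Full:
        return False

def worker():
    """Drain the buffer at the backend's sustainable pace."""
    while True:
        req = buffer.get()
        if req is None:        # sentinel to stop the worker
            break
        processed.append(req)  # stand-in for the real, slower handling
        buffer.task_done()
```

This digestion-after-the-spike is exactly why the red line stays higher than the blue line for a while. In production, the buffer would typically be an external broker such as Kafka or RabbitMQ, so queued requests survive process restarts.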

However, there are many different ways to implement caching, and the balance between caching and consistency is an important issue. I have introduced several caching practices.

  1. Consistency between Cache and Database, Part 1
  2. Consistency between Cache and Database, Part 2

These are the basic versions of caching, but there are many more advanced caching practices that will be mentioned in subsequent articles.

The purpose of this article is only to explain these two easily confused words; correctly understanding scalability and elasticity is the only way to find a feasible solution among the many approaches.
