API slowdowns hurt the user experience and can even erode trust, so we want to detect rising latency as early as possible.
That is why it is common to integrate performance tests and benchmark measurements into automated testing.
However, the concept of "slow" is difficult to evaluate.
- Did it happen by accident? Or is it an ongoing status?
- Is it caused by the experimental environment? Or is it a side effect of the code?
- Is it the result of increased features? Or is it a defect?
It is difficult to have a clear answer to these questions.
Suppose we have a way to detect "slow" and we send an alarm or a report whenever it triggers. If the alarm is just a coincidence, or a false positive caused by the test environment, people will eventually start ignoring these signals.
In addition, it is expected that APIs slow down as features are added. With more parameters, more business logic, and more if-else branches, it is no surprise that the response time grows.
Furthermore, if we use a fixed latency threshold to work around these problems, then by the time we actually detect an API slowdown we will have no idea what its root cause is.
All these factors make the task of detecting API slowdowns extremely hard.
It's time for statistics to come into play
To address these issues, let's summarize the requirements.
- to be able to know specifically when the problem occurs
- to be able to tolerate accidental outliers
- to be able to ignore the normal, slow upward trend
- to be able to issue alarms
First, let's look at a diagram.
- The horizontal axis represents each test run; it can be thought of as time or release version, or, if the tests are integrated into a CI/CD pipeline, as each commit.
- The vertical axis is the latency of the target tests.
- The red line is the main character of this article: linear regression.
From the diagram, we can see that before a certain point the latency increases slowly, but after that point it increases rapidly. That point in time is exactly what we want to identify.
You might say it is easy to spot this with the naked eye just by looking at the numbers each time; after all, it looks quite intuitive.
The key is the scale of the latency. If the average of the earlier group of data points is 0.1 seconds and the later group is 0.2 seconds, would it still feel that intuitive?
Therefore, we need an objective, brain-friendly standard: linear regression.
Through linear regression, we can tell that the overall trend of the data set is upward. However, this is not enough: knowing that the trend is up does not let us judge whether it is normal or abnormal.
The dense group of data points at the front is also trending up, so how can we decide that the later trend is abnormal while the earlier one is normal?
Let's look at another diagram.
Question: Is the latency of this diagram normal or abnormal?
It looks very similar to the previous diagram: there is a clear upward trend, so it should be abnormal, right? Really?
Let's add some reference points and zoom out a bit.
Did you notice? The diagram above is actually a normal section of the original diagram, just a slow rise.
By running linear regression on a single data set in isolation, we may draw the wrong conclusion: without a benchmark for comparison, we cannot tell whether the rise is a natural consequence of added features or a problem caused by a defect.
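To see why, here is a minimal sketch with made-up numbers (not data from the diagrams above): even a perfectly healthy, slowly rising latency series produces a positive slope when we regress over a recent window in isolation.

import numpy as np

# Hypothetical healthy API: latency drifts up slowly (+0.2 ms per run) plus noise
rng = np.random.default_rng(0)
runs = np.arange(200)
latency = 100 + 0.2 * runs + rng.normal(0, 1, size=runs.size)  # milliseconds

# Fit a line to only the most recent 30 runs, as a single report would do
recent_slope = np.polyfit(runs[-30:], latency[-30:], 1)[0]
print(f"slope of the recent window: {recent_slope:.3f} ms/run")  # positive, yet nothing is wrong

The slope alone cannot distinguish a harmless drift from a real defect; we need something to compare it against.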
Solution
Back to our question, how can we detect the problem as early as possible without false positives?
First, we choose a reference point, e.g., one day ago. Then we perform a linear regression on the entire data set to get the red line, and another linear regression on the data before the reference point to get the green line.
The slopes of the red line and the green line are used to calculate the angle between them, which becomes our judgment basis: the larger the angle between the two lines, the more likely the behavior is abnormal.
Therefore, the threshold for setting an alarm is neither the latency itself nor the slope, but the angle between the two lines.
Given that the slopes of the two lines are k1 and k2, the formula for the angle θ is as follows.
θ = arctan( |k2 - k1| / |1 + k1 * k2| )
In Python, it would be:
import math
# k1 and k2 are the slopes of the two regression lines
theta = math.degrees(math.atan2(abs(k2 - k1), abs(1 + k1 * k2)))
# atan2 already handles the case where the denominator is zero,
# whereas with atan we would first have to handle the case k1 * k2 == -1
We can do several experiments with simulated values to get our ideal alarm threshold.
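For instance, a quick experiment with hypothetical slope values (the numbers below are made up purely for illustration) shows how the angle reacts to a sudden change in trend:

import math

def angle_between(k1, k2):
    # angle in degrees between two lines with slopes k1 and k2
    return math.degrees(math.atan2(abs(k2 - k1), abs(1 + k1 * k2)))

# the green line rises gently in both cases; the red line steepens when a defect appears
print(angle_between(0.001, 0.002))  # normal drift -> roughly 0.06 degrees
print(angle_between(0.001, 0.5))    # sudden slowdown -> roughly 26.5 degrees

A threshold somewhere between those two extremes, tuned on our own historical data, becomes the alarm condition.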
Finally, in addition to the angle threshold, we also need another alarm indicator, which is the upper bound of the allowed latency.
Why do we need to set an upper bound?
Because if our latency keeps increasing along the green-line trend, one day it will reach an unacceptable value. If we only set the angle threshold, we will never know how bad the situation is until there is a disaster in production.
Finally, the following is an example of how to compute linear regression in Python.
import numpy as np
from sklearn.linear_model import LinearRegression

# raw_x and raw_y are the results of each test
# raw_x can be a normalized time, version, or commit index
# raw_y are the latencies
x = np.array(raw_x)[:, np.newaxis]
y = np.array(raw_y)[:, np.newaxis]

bfl = LinearRegression()   # best-fit line
bfl.fit(x, y)
y_pred = bfl.predict(x)    # points on the fitted line, handy for plotting
slope = bfl.coef_[0, 0]    # the slope k of the fitted line
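Putting everything together, here is a minimal sketch of the whole check. The function names, the choice of reference point, and the threshold values are assumptions for illustration, not a fixed recipe.

import math
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_slope(xs, ys):
    # fit a best-fit line and return its slope
    x = np.array(xs)[:, np.newaxis]
    y = np.array(ys)[:, np.newaxis]
    return LinearRegression().fit(x, y).coef_[0, 0]

def check_latency(raw_x, raw_y, ref_index, angle_threshold=5.0, max_latency=0.5):
    # ref_index: position of the reference point, e.g. the first test run of today
    # angle_threshold (degrees) and max_latency (seconds) are hypothetical values
    k_green = fit_slope(raw_x[:ref_index], raw_y[:ref_index])  # data before the reference point
    k_red = fit_slope(raw_x, raw_y)                            # the entire data set
    angle = math.degrees(math.atan2(abs(k_red - k_green), abs(1 + k_red * k_green)))

    alerts = []
    if angle > angle_threshold:
        alerts.append(f"trend changed: angle is {angle:.1f} degrees")
    if raw_y[-1] > max_latency:
        alerts.append(f"latest latency {raw_y[-1]:.3f}s exceeds the upper bound")
    return alerts

Any non-empty result is a signal worth investigating; tuning angle_threshold and max_latency still requires the experiments mentioned above.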
Conclusion
How do we know the API is slowing down?
We need two alerts.
- the angle, after linear regression, between the line fitted to all current data and the line fitted to the data before the reference point.
- an upper bound on the acceptable latency.
When we have these two alarms, we can know exactly when the code starts causing problems and whether it is time to refactor the system.
In fact, this is a statistical approach to the goal, but it still has some drawbacks. How should we set the angle threshold? How should we choose the reference point? How much data should be included in the analysis? All these questions require experiments to answer.
Therefore, the most effective way is to introduce machine learning into latency analysis and let a prediction model judge whether the current result is normal or abnormal, rather than relying on hand-tuned settings; making such judgments is much harder for humans than for machines.
However, building a machine learning model requires the corresponding domain knowledge and skills, which may not be practical for small organizations, so linear regression remains a reliable, low-cost solution.