Caching is a must-have feature for any API that aims to scale. It reduces the load on your servers and improves the response time for your users by serving the same content multiple times, without having to process the request again.

The issue is that... implementing cache systems is hard, tricky and can cause a lot of headaches when it comes to invalidating the cache, purging and keeping the cache fresh with the latest data.

In this article we will explore the concept of stale cache, test different scenarios with examples and see how CDN providers like Bunny CDN can help us to keep the cache fresh without having to purge it with every new request.

Let's start by breaking down the pieces.

⌛️ Response time

When you make a request to an API, the response time is what it takes for the server to process the request and send a response back.

This time varies with several factors, such as the server's load, the network latency and the complexity of the request. Before the request even reaches the server, we have:

Network latency: the time it takes for the request to travel from the client to the server.
DNS lookup time: what it takes for the client to resolve the server's domain name to an IP address.
Connections and SSL handshakes: the TCP connection setup and the TLS (SSL) negotiation that the client and the server must complete before any data is exchanged.

And when the request finally arrives, on the server side we have:

Processing time: what it takes for the server to process the request and send back the response.
Response size: this affects the time it takes to transfer the data over the network.

The processing time on the server is something we can control by optimizing the code, while most of the other factors are out of our control. Even so, we can reduce the time spent processing requests by using a cache system.

There are other factors, like the "cold-start" time on serverless implementations, that also increase the time it takes to process the request, but they are out of the scope of this article.

Now, how does the cache system work?

A client requests information from a server; the server processes the request and sends back the response body, HTTP status code and headers. Two of these headers are key to understanding how the cache system works:

Cache-Control: dictates who can cache the response, how, and for how long.
ETag: a unique identifier for a specific version of the response.
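To make the Cache-Control header less abstract, here is a minimal sketch of how its value can be broken down into directives. The function name parseCacheControl is hypothetical, and the parsing is deliberately simplified: it assumes no quoted values that contain commas, which the full header grammar does allow.

```javascript
// Parse a Cache-Control header value into an object of directives.
// Directives without a value (e.g. "public") map to true;
// purely numeric values (e.g. max-age) are converted to numbers.
// Simplification: quoted values containing commas are not handled.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const [rawName, rawValue] = trimmed.split("=");
    const name = rawName.trim().toLowerCase();
    if (rawValue === undefined) {
      directives[name] = true;
    } else {
      const value = rawValue.trim().replace(/^"|"$/g, "");
      directives[name] = /^\d+$/.test(value) ? Number(value) : value;
    }
  }
  return directives;
}

console.log(parseCacheControl("public, max-age=86400"));
```

A cache (browser or CDN) does something along these lines before deciding whether it may store the response and for how long.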

Currently, the HTTP Caching specification (RFC 9111) defines how the Cache-Control header should work, telling the client (or intermediary caches like CDNs) how to behave. The same directives apply whether the response travels over HTTP/1.1 or HTTP/2.

On the other hand, the ETag is used like an index key: if the value hasn't changed, the cached copy can be reused; if it has, a fresh response is needed. It solves the problem of knowing whether the data has changed, but it can't solve the problem of keeping the cache fresh with the latest data. This is where the concept of stale cache comes in.
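The ETag comparison described above can be sketched as a small function. This is only an illustration: respondWithEtag is a hypothetical name, and it assumes the client sends a single strong ETag in the If-None-Match header, while real requests may carry a list of values or weak validators.

```javascript
// Decide how to answer a request given the ETag the client already holds.
// `ifNoneMatch` is the value of the If-None-Match request header (or undefined).
// Simplification: only a single, exact ETag value is compared.
function respondWithEtag(ifNoneMatch, currentEtag, body) {
  if (ifNoneMatch !== undefined && ifNoneMatch === currentEtag) {
    // Nothing changed: the client can reuse its cached copy.
    return { statusCode: 304, headers: { ETag: currentEtag }, body: "" };
  }
  // First visit, or the data changed: send the full response again.
  return { statusCode: 200, headers: { ETag: currentEtag }, body };
}

console.log(respondWithEtag('"v1"', '"v1"', "42").statusCode);
console.log(respondWithEtag('"v1"', '"v2"', "43").statusCode);
```

Note that the origin still has to be contacted to compare the values, which is why the ETag alone does not remove the round trip.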

Stale cache

With this concept, an intermediary serves the cached response to the client even if it has expired, while it requests the latest data from the server in the background. When the server responds, the cache is updated with the new data, which is served fresh to the client on the next request.

The Cache-Control header can be extended with the functionality described above. This is done via the stale-while-revalidate directive, which tells the client (or intermediary) to serve the stale cache while it revalidates in the background.

The directive takes a value in seconds. This value is how long the stale cache can be served while the server processes the request for the latest data. For instance, a header with this directive can look like this:

Cache-Control: max-age=604800, stale-while-revalidate=86400

Which translates to: "serve the cache for 7 days; once it has expired, keep serving the stale cache for up to 1 more day while the latest data is fetched in the background".
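The two time windows in that header can be expressed as a small decision function. This is a sketch under assumptions: cacheState and its three return labels are hypothetical names chosen for illustration, and the age of the cached response is passed in directly instead of being computed from response headers.

```javascript
// Classify a cached response by its age (all values in seconds).
//   "fresh":   still inside max-age, serve from cache
//   "stale":   expired, but inside the stale-while-revalidate window,
//              serve from cache and revalidate in the background
//   "expired": outside both windows, must go to the origin first
function cacheState(ageSeconds, maxAge, staleWhileRevalidate) {
  if (ageSeconds < maxAge) return "fresh";
  if (ageSeconds < maxAge + staleWhileRevalidate) return "stale";
  return "expired";
}

const MAX_AGE = 604800; // 7 days
const SWR = 86400; // 1 day

console.log(cacheState(3600, MAX_AGE, SWR)); // one hour old
console.log(cacheState(MAX_AGE + 60, MAX_AGE, SWR)); // one minute past max-age
console.log(cacheState(MAX_AGE + SWR + 1, MAX_AGE, SWR)); // past both windows
```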

This level of customization is then used by CDNs to keep the cache fresh without having to purge it every time a new request is made. It basically always returns the cache but, when a new version is available, it updates the cache and serves that new version to any new request. Simply like magic.
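To see that behavior end to end, here is a toy in-memory simulation. It is not how any particular CDN is implemented: createSwrCache, the injected clock and the "STALE" status label are all assumptions made for the sketch, and the refresh runs synchronously where a real intermediary would do it in the background.

```javascript
// A toy stale-while-revalidate cache with an injectable clock (in seconds).
// `fetchOrigin` stands in for the slow request to the origin server.
function createSwrCache({ maxAge, staleWhileRevalidate, fetchOrigin, now }) {
  let entry = null; // { value, storedAt }

  function refresh() {
    entry = { value: fetchOrigin(), storedAt: now() };
    return entry.value;
  }

  return function get() {
    if (entry === null) return { value: refresh(), status: "MISS" };
    const age = now() - entry.storedAt;
    if (age < maxAge) return { value: entry.value, status: "HIT" };
    if (age < maxAge + staleWhileRevalidate) {
      const staleValue = entry.value;
      refresh(); // a real CDN would do this in the background
      return { value: staleValue, status: "STALE" };
    }
    // Too old even for the stale window: wait for the origin.
    return { value: refresh(), status: "MISS" };
  };
}

// Usage with a fake clock and a counter standing in for the origin.
let clock = 0;
let version = 0;
const get = createSwrCache({
  maxAge: 10,
  staleWhileRevalidate: 5,
  fetchOrigin: () => ++version,
  now: () => clock,
});

console.log(get()); // first request goes to the origin
clock = 12;
console.log(get()); // expired but inside the stale window: old value is served
console.log(get()); // the refreshed copy is served from now on
```

The second call is the interesting one: the client gets an immediate answer from the cache, and the cost of talking to the origin is paid outside of its request.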

🔒 `private` / `public` directives

The Cache-Control header also supports a basic notion of privacy. The public directive means the response may be stored in a shared cache (such as a CDN), while private means it may only be stored in the client's own cache (a local cache or the browser).

Basically, the public directive is used when the response is the same for all users, while the private directive is used when the response is specific to one user and shouldn't be stored by a CDN.
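As a rough illustration of that rule, an API could pick the directive per response. The helper below is hypothetical (cacheControlFor and its userSpecific flag are names invented for this sketch), and it assumes the caller already knows whether the payload depends on the user.

```javascript
// Build a Cache-Control value depending on who is allowed to store the response.
// userSpecific = true  -> "private": only the client's own cache may store it
// userSpecific = false -> "public": shared caches such as a CDN may store it
function cacheControlFor({ userSpecific, maxAge }) {
  const scope = userSpecific ? "private" : "public";
  return `${scope}, max-age=${maxAge}`;
}

console.log(cacheControlFor({ userSpecific: false, maxAge: 86400 }));
console.log(cacheControlFor({ userSpecific: true, maxAge: 300 }));
```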

Having understood the concept of stale cache, let's see how we can implement this in a real-world scenario.

⚡️ Some testing

Let's test this concept with a simple API that returns a random number, deployed in the us-east-2 region on AWS API Gateway.

Making a GET request to the endpoint /random returns a random number between 1 and 100. Due to the cold-start time of serverless functions, the first request will take a little longer to respond, but the next requests will be faster.

This example shows some of the factors explained at the beginning of the article. Subsequent requests take between 200ms and 450ms depending on the Lambda invocation, even with some phases such as the DNS lookup, TCP handshake and SSL handshake already cached. This is pretty normal, as the requests originate from South America and edge locations are not configured.

So, how can we improve this response time? Let's add a cache system to the API.

Let's introduce bunny.net 🐇

Bunny is a global content delivery platform that helps us deliver cached content faster. In this example, we will use Bunny to cache the responses of the API, see how the cache system works and how the stale cache can be implemented. They have a lot of features and capabilities, but for this example we will focus on the CDN.

With an account already created, we can create a new "pull zone". This is the layer that sits in front of the API and acts as an intermediary, caching everything being requested. Given a name, a domain like xxxxxxx.b-cdn.net will be created; we also need to set the "Origin URL", which is the API endpoint we want to cache.

While creating the pull zone, we can choose in which zones we want to store the cached data, so the content is served from the closest region instead of traveling all the way to the origin. We will select all the zones available so the cache is everywhere 🤯.

With the pull zone created, it's ready to be used as a proxy for the API. Now let's move back to the endpoint code to customize the headers so Bunny knows what to do with our origin responses.

We can start making requests to the new pull zone domain, so anything requested from the CDN will be forwarded behind the scenes to the API, with the response cached along the way.

// instead of
GET https://api.example.com/random

// use the pull zone domain
GET https://mynewpullzone.b-cdn.net/random

By default our Cache-Control header is set to no-cache, so every time we hit the pull zone URL it will make a request to the origin. Let's change that behavior by adding the max-age directive to the header.

"Cache-Control": `public, max-age=86400`,

This tells the CDN to cache the response for 1 day, so subsequent requests will be served from the cache instead of going to the origin. And with the regions enabled earlier, the request will be served from the edge location closest to the client, getting a lightning fast response.
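For context, this is roughly where that header could live in the endpoint code. The sketch assumes a Node.js Lambda function behind API Gateway that returns a proxy-style response object; the handler name and the JSON body shape ({ number }) are assumptions, since the original endpoint code is not shown in this article.

```javascript
// Hypothetical handler for GET /random (assumed Lambda proxy-style response).
const handler = async () => {
  const number = Math.floor(Math.random() * 100) + 1; // integer in [1, 100]
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
      // Allow shared caches (the CDN) to keep this response for 1 day.
      "Cache-Control": `public, max-age=86400`,
    },
    body: JSON.stringify({ number }), // assumed body shape
  };
};

handler().then((response) => console.log(response));
```

In a real deployment the function would be exported as the Lambda entry point instead of being called directly.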

🔥 18ms of response time, that's a huge improvement from the previous 200ms - 450ms, and this is just the beginning.

The CDN-Cache header returned in the response tells us whether the request was served from the CDN cache: if the value is MISS, the request went to the origin; if the value is HIT, it was served from the cache.
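When testing from a script, that check can be wrapped in a small helper. This is only a sketch: servedFromCdnCache is a hypothetical name, the headers are assumed to be available as a plain object, and only the HIT and MISS values mentioned above are considered.

```javascript
// Return true when the response headers report a CDN cache hit.
// Header names are matched case-insensitively; any value other than
// HIT (or a missing header) is treated as "not served from the cache".
function servedFromCdnCache(headers) {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "cdn-cache") {
      return String(value).trim().toUpperCase() === "HIT";
    }
  }
  return false; // header absent
}

console.log(servedFromCdnCache({ "CDN-Cache": "HIT" }));
console.log(servedFromCdnCache({ "cdn-cache": "MISS" }));
```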

There are other features that can be enabled. Bunny provides features like "Query String Sorting": when enabled, the query parameters are automatically sorted into a consistent order before checking the cache. For example, this will treat /random?width=200&height=100 and /random?height=100&width=200 as the same path.
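The idea behind that feature can be illustrated with a few lines of code. This is not Bunny's implementation, just a sketch of the normalization it describes; normalizedCacheKey is a hypothetical name and the placeholder base URL exists only so the relative path can be parsed.

```javascript
// Build a cache key in which the query parameters always appear in the same order.
function normalizedCacheKey(pathWithQuery) {
  // The base is only needed so URL can parse a relative path; it is discarded.
  const url = new URL(pathWithQuery, "https://placeholder.invalid");
  url.searchParams.sort(); // sorts parameters by name
  const query = url.searchParams.toString();
  return query === "" ? url.pathname : `${url.pathname}?${query}`;
}

const first = normalizedCacheKey("/random?width=200&height=100");
const second = normalizedCacheKey("/random?height=100&width=200");
console.log(first === second);
```

Without this normalization, each ordering of the same parameters would be stored as a separate cache entry and would trigger its own request to the origin.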

But there is still something missing. With this implementation, once the expiration time is reached the cached response has to be updated: the CDN sends a request to the origin, and the client that triggered it has to wait for the slow origin response. This is not the best approach, so let's see how we can improve this.

Bunny supports an option that can be enabled from the dashboard, inside the pull zone cache options: "Stale Cache", which basically "allows you to temporarily serve stale files from the CDN cache, while updating resources in the background". We have two available options here:

While Origin Offline: serves the stale cache while the origin is offline.
While Updating: the one we are interested in. This serves the stale cache and updates it behind the scenes with the latest data, so responses stay blazing fast thanks to their edge locations while the cache keeps being refreshed.

Wrapping up

Implementing a cache system is crucial for improving the performance and scalability of your API services. By leveraging the concept of stale cache and using CDNs like Bunny.net, you can significantly reduce response times and server load. The Cache-Control header, with directives like stale-while-revalidate, allows you to serve stale content while fetching fresh data in the background, ensuring that your users always get the best possible experience. Remember, while Bunny.net is used in this example, these caching principles can be applied with any CDN provider that supports similar capabilities.

Happy caching! 🍻
