This blog post demonstrates how to auto-scale your Redis based applications on Kubernetes. Redis is a widely used (and loved!) database which supports a rich set of data structures (String, Hash, Streams, Geospatial), as well as other features such as pub/sub messaging, clustering (HA) etc. One such data structure is a List which supports operations such as inserts (LPUSH, RPUSH, LINSERT etc.), reads (LRANGE), deletes (LREM, LPOP etc.) etc. But that's not all!

Redis Lists are quite versatile and used as the backbone for implementing scalable architectural patterns such as consumer-producer (based on queues), where producer applications push items into a List, and consumers (also called workers) process those items. Popular projects such as resque, sidekiq, celery etc. use Redis behind the scenes to implement background jobs.

In this blog, you will learn how to automatically scale your Celery workers that use Redis as the broker. There are multiple ways to achieve this - this blog uses a Kubernetes-based Event Driven Autoscaler (KEDA) to do the heavy lifting, including scaling up the worker fleet based on workload and also scaling it back to zero if there are no tasks in the queue!

Please note that this blog post uses a Golang application (thanks to gocelery!) as an example, but the same applies to Python or any other application that uses the Celery protocol.

It covers the following topics:

Start off with the basics, overview of the application
Setup the infra (AKS, Redis) and deploy the worker application along with kEDA
Test the end to end auto-scaling in action

The sample code is available in this GitHub repository

To start off, here is a quick round of introductions!

Celery

In a nutshell, Celery is a distributed message processing system. It uses brokers to orchestrate communication between clients and workers. Client applications add messages (tasks) to the broker, which is then delivered to one or more workers - this setup is horizontal scalable (and highly available) since you can have multiple workers to share the processing load.

Although Celery is written in Python, the good thing is that the protocol can be implemented in any language. This means that you can have client and worker applications written in completely different programming languages (a Node.js based client and a Python based worker app), but they will be able to inter-operate, given they speak the Celery protocol!

KEDA

KEDA can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. It adopts a plug-and-play architecture and builds on top of (extends) existing Kubernetes primitives such as Horizontal Pod Autoscaler.

A KEDA scaler is responsible for integrating with an external service to fetch the metrics that drives auto scaling. We will be using the KEDA scaler for Redis, that auto scales applications based on the length (number of items) of a Redis List.

KEDA deep-dive is out of scope of this blog post. To learn more, please refer to the following resources:

Before we dive into the nitty-gritty, here is a high level overview of the application.

High-level architecture

The application includes the following components:

Redis (used as the Celery broker)
A producer which simulates a client application that submits tasks
The worker application (running in Kubernetes) which processes the tasks

Producer application

The producer is a Go application that submits tasks to Redis (using gocelery library). You can check the code on GitHub, but here is a snippet:

    go func() {
        for !closed {
            _, err := celeryClient.Delay(taskName, rdata.FullName(rdata.RandomGender), rdata.Email(), rdata.Address())
            if err != nil {
                panic(err)
            }
            time.Sleep(1 * time.Second)
        }
    }()

It runs a loop (as a goroutine) and sends randomly generate data (full name, email and address).

The producer application is available as a pre-built Docker image abhirockzz/celery-go-producer, however, you could choose to build another image using the Dockerfile provided in the repo.

Celery worker

The Celery worker application processes this information (via the Redis job queue). In this case, the processing logic involves storing data in a Redis HASH (but it could be anything). You can check the code on GitHub, but here is a snippet:

    save := func(name, email, address string) string {
        sleepFor := rand.Intn(9) + 1
        time.Sleep(time.Duration(sleepFor) * time.Second)

        info := map[string]string{"name": name, "email": email, "address": address, "worker": workerID, "processed_at": time.Now().UTC().String()}

        hashName := hashNamePrefix + strconv.Itoa(rand.Intn(1000)+1)

        _, err := redisPool.Get().Do("HSET", redis.Args{}.Add(hashName).AddFlat(info)...)
        if err != nil {
            log.Fatal(err)
        }
        return hashName
    }

The sleep has been added on purpose to allow the worker application can pause anywhere between 0 to 10 seconds (this is random). This will help simulate a "high load" scenario and will help demonstrate the auto-scaling (details in the upcoming sections).

The worker application is available as a pre-built Docker image abhirockzz/celery-go-worker, however, you could choose to build another image using the Dockerfile provided in the repo.

KEDA `ScaledObject`

A ScaledObject associates the Deployment we want to auto scale (in this case, its the Celery worker application) with the source of the metrics (length of a Redis List):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: celery-worker
  pollingInterval: 15
  cooldownPeriod: 200
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        addressFromEnv: REDIS_HOST
        passwordFromEnv: REDIS_PASSWORD
        enableTLS: "true"
        listName: celery
        listLength: "10"

Here is a summary of the attributes used in the manifest:

(spec.scaleTargetRef.deploymentName) specifies which Deployment to target for auto scale.
The trigger type is redis and the triggers.metadata section is used to provide additional details:
- the values for address in this example is REDIS_HOST, which is the name of the environment variable which is expected to be present in the Deployment (at runtime)
- listName is the name of the Redis List whose pending items are used to drive auto scale process
- listLength is the threshold (number of List items) over which a new Pod (for the specified Deployment) is created. In this example, a new Pod will be created for every 10 pending items in the Redis List (the number has been kept low for ease of testing)
maxReplicaCount defines the upper limit to which the application can scale i.e. it is maximum number of Pods that can be created, irrespective of the scale criteria

It's time to move on to the practical stuff. But, before you go there, make sure you have the following ready:

Pre-requisites

To work through the application in this blog, you will need:

An Azure account. You can create a free account to get 12 months of free services.
Docker
Kubernetes cluster along with kubectl: I have used Azure Kubernetes Service, although minikube should work just as well.
Install Azure CLI
Redis: I have used Azure Cache for Redis, but feel free to explore other options e.g. you can install one in your Kubernetes cluster using a Helm chart).

In the upcoming sections, we will:

Install KEDA
Deploy individual components - Celery worker, ScaledObject etc.
Generate load and test auto scaling

Base setup

To start off, please make sure to:

Install KEDA

KEDA allows multiple installation options. I will be using the YAML directly

KEDA components will be installed into the keda namespace.

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.1.0/keda-2.1.0.yaml

This will start KEDA Operator and the Metrics API server as separate Deployments

kubectl get deployment -n keda

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
keda-operator                     1/1     1            1           1h
keda-operator-metrics-apiserver   1/1     1            1           1h

Before you proceed further, wait for the Deployments to be READY

We can now deploy the components required to auto scale our application. Start by cloning this repository and change to the correct folder:

git clone https://github.com/abhirockzz/redis-celery-kubernetes-keda
cd redis-celery-kubernetes-keda

Deploy the Celery worker and KEDA `ScaledObject`

We need to deploy the Secret first since its used by the Celery worker Deployment. First, encode (base64) the password for your Redis instance (check Access Keys in Azure Portal) in order to store it as a Secret.

echo 'enter_redis_password' | base64

Replace this in the credentials attribute in secret.yaml. For example, if the password is foobared:

echo -n 'foobared' | base64

//output: Zm9vYmFyZWQ=

The final version of secret.yaml will look like this (notice the encoded password in the credentials attribute):

apiVersion: v1
kind: Secret
metadata:
  name: redis-password
type: Opaque
data:
  credentials: Zm9vYmFyZWQ=

Create the Secret:

kubectl apply -f deploy/secret.yaml

We are almost ready to deploy the Celery worker application. Before that, please update the consumer.yaml file with the Redis host and port. Here is the snippet:

...
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-password
                  key: credentials
            - name: REDIS_HOST
              value: [replace with redis host and port e.g. foobar.redis.cache.windows.net:6380]
            - name: REDIS_LIST
              value: celery
...

celery is the default name of the Redis LIST created by the Celery worker - please leave it unchanged.

Deploy the worker app, check the Pod and wait for status to transition to Running:

kubectl apply -f deploy/consumer.yaml

kubectl get pods -l=app=celery-worker -w

NAME                              READY   STATUS    RESTARTS   AGE
celery-worker-5b644c6688-m8nf4   1/1     Running   0          20s

You can check the logs using kubectl logs <pod_name>

To deploy the KEDA ScaledObject:

kubectl apply -f deploy/redis-scaledobject.yaml

Auto-scaling in action

We are all set to test the end-to-end setup!

Scale to zero 💥💥

Check the Celery worker Pod:

kubectl get pods -l=app=celery-worker

//output: No resources found

No resources found ??? Wait, we had a consumer application Pod ready, what just happened? Don't worry, this is KEDA in action! Because there are no items in the Redis List right now (hence no work for the worker), KEDA made sure that there are no idle Pods running.

This behavior can be controlled by the minReplicaCount attribute in the ScaledObject (refer to the KEDA documentation

KEDA uses the information in the ScaledObject to create a Horizontal Pod Autoscaler object:

kubectl get hpa 

NAME                          REFERENCE                  TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-redis-scaledobject   Deployment/celery-worker   <unknown>/10 (avg)   1         10        0          2m51s

Scale UP ⬆️

Let's run the Celery producer application and simulate some work by pushing items into the Redis List. Before you do that, switch to another terminal and start watching the consumer Deployment to track auto-scaling:

kubectl get pods -l=app=celery-worker -w

Go back to the previous terminal and run the application:

export REDIS_HOST=[replace with redis host and post info e.g. foobar.redis.cache.windows.net:6380]
export REDIS_PASSWORD=[replace with redis password]

docker run --rm -e REDIS_HOST=$REDIS_HOST -e REDIS_PASSWORD=$REDIS_PASSWORD abhirockzz/celery-go-producer

//output:
celery producer started...

Wait for a few seconds. In the other terminal, you will notice that Celery worker Pods are being gradually created:

celery-worker-5b644c6688-2zk5c   0/1   ContainerCreating   0     0s
celery-worker-5b644c6688-2zk5c   1/1   Running             0     4s
celery-worker-5b644c6688-h22hp   0/1   Pending             0     0s
celery-worker-5b644c6688-h22hp   0/1   Pending             0     0s
celery-worker-5b644c6688-h22hp   0/1   ContainerCreating   0     0s
celery-worker-5b644c6688-h22hp   1/1   Running             0     4s
celery-worker-5b644c6688-r2m48   0/1   Pending             0     0s
celery-worker-5b644c6688-r2m48   0/1   Pending             0     0s
celery-worker-5b644c6688-r2m48   0/1   ContainerCreating   0     0s
celery-worker-5b644c6688-r2m48   1/1   Running             0     3s

If you check the Deployment (kubectl get deployment/celery-worker), you will see something similar to this (depending upon how many Pods have been created):

NAME            READY   UP-TO-DATE   AVAILABLE   AGE
celery-worker   3/3     3            3           9m51s

You can check the Horizontal Pod Autoscaler as well. It should reflect the same stats:

kubectl get hpa

NAME                          REFERENCE                  TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-redis-scaledobject   Deployment/celery-worker   9/10 (avg)   1         10        3          8m15s

If you happen to check the logs from one of the worker application Pods, you will log output such as this:

...
2021/03/01 10:05:36 got info -  Benjamin Moore liammiller233@test.com 9 Franklin Circle,
Burrton, WY, 37213
2021/03/01 10:05:36 worker b2928e0f-2c79-a227-7547-7bd2acdaacba sleeping for 3
2021/03/01 10:05:39 saved hash info users:674
2021/03/01 10:05:39 got info -  Lily Smith jacobwilliams126@example.net 84 Jackson Ter,
New Deal, FM, 53234
2021/03/01 10:05:39 worker b2928e0f-2c79-a227-7547-7bd2acdaacba sleeping for 7
2021/03/01 10:05:46 saved hash info users:473
2021/03/01 10:05:46 got info -  William Williams joshuadavis821@example.com 32 Washington Rdg,
Baldock, MN, 60018
2021/03/01 10:05:46 worker b2928e0f-2c79-a227-7547-7bd2acdaacba sleeping for 9
2021/03/01 10:05:55 saved hash info users:275
...

While our workers are happily churning along, let's check Redis as well. Use redis-cli:

redis-cli -h [redis host e.g. foobar.redis.cache.windows.net] -p 6380 -a [redis password] --tls

First, check the length of the Redis List (named celery in this case). The output will reflect the number jobs that have been pushed in by the producer application and have not been processed yet.

llen celery

(integer) 10

The worker application creates HASH with user info (based on the random data it receives from the producer application). To check, use SCAN:

scan 0 match users*

1) "960"
2) 1) "users:169"
   2) "users:272"
   3) "users:855"
   4) "users:429"

Check a few entries (using hgetall). For e.g.

hgetall users:169
 1) "name"
 2) "Natalie White"
 3) "email"
 4) "ethanjackson245@test.net"
 5) "address"
 6) "20 Jefferson Ter,\nDerby Center, ME, 18270"
 7) "worker"
 8) "6769253c-9dc3-9232-1860-4bc01ce760a3"
 9) "processed_at"
10) "2021-03-01 10:13:11.230070643 +0000 UTC"

In addition to the user details, notice that the ID of the worker that processed this record is also available. This is to confirm that different worker instances are sharing the workload.

We had set 10 as the listLength in the ScaledObject manifest and specified maxReplicaCount as 10 (so the no. of Pods will be capped to this number).

Scale DOWN ⬇️

Stop the producer application.

Once all the items in the list are consumed and it's empty, the Deployment will be scaled down after the cooldown period is reached (200 seconds in this example). Eventually, the number of Pods will go back to zero. You can "rinse and repeat" this again and experiment with different values for the number of messages you want send (simulate load), the no. of replicas you want to scale to, a different thresholdCount etc.

Clean up

Once you're done, don't forget to delete the resources you created:

Delete the Celery worker app, ScaledObject and Secret: kubectl delete -f deploy
To uninstall KEDA: kubectl delete -f https://github.com/kedacore/keda/releases/download/v2.1.0/keda-2.1.0.yaml
Delete the AKS cluster if you don't need it anymore: az aks delete --name <cluster name> --resource-group <group name>
Delete the Azure Cache for Redis instance: az redis delete --name <cache name> --resource-group <group name>

Conclusion

We covered Redis Scaler in this blog, but KEDA offers many such scalers. KEDA deals with auto scaling your applications, but, what if you could run all of these application instances on an infrastructure other than the Kubernetes cluster nodes for e.g. a Serverless platform?

If this sounds interesting, do check out Virtual Nodes in Azure Kubernetes Service to see how you can use them to seamlessly scale your applications to Azure Container Instances and benefit from quick provisioning of pods, and only pay per second for their execution time. The virtual nodes add-on for AKS, is based on the open source project Virtual Kubelet which is an open source Kubernetes kubelet implementation.

Autoscaling Redis applications on Kubernetes 🚀🚀