Migrating CI/CD from Jenkins to Argo


Bertrand Quenin is a Staff Software Engineer at Intuit and Caelan Urquhart is the co-founder of Pipekit. This article is based on a presentation Caelan and Bertrand gave on November 6, 2023 at ArgoCon North America 2023. You can watch the entire session video here. For more information on this session, you can read the event details or download the slide deck.

This post is the first in a two-part series that looks at scaling cloud-native CI/CD with Jenkins and Argo. In this first post, we focus on Argo Workflows and the CI side of the pipeline. In Part Two, we’ll look at how to use Argo CD for the CD side of the pipeline.

Here in Part One, we will:

  • Help you understand the challenges of running Jenkins on top of Kubernetes at scale.
  • Show you how to use Argo Workflows alongside Argo CD to run your CI/CD pipelines. We'll cover Jenkins and Argo Workflows, see how they map to one another, and briefly walk through an example that shows the difference between the two.
  • Look at key considerations if you decide to migrate from Jenkins to Argo Workflows.

Let’s start by introducing Intuit. Intuit's goal is to empower our customers to make the best financial decisions using our AI-driven platform. What does CI/CD at Intuit look like? We are currently running Jenkins on top of Kubernetes at scale. This means:

  • We have nearly 6,000 developers running 100,000 build jobs daily.
  • To support this, we run a Kubernetes cluster with about 150 nodes, on which run roughly 200 Jenkins controllers.
  • We run between 1,000 and 1,500 build agents at any given time to execute those builds.

The Challenges of Running Jenkins on Kubernetes at Scale

At Intuit

Working with Jenkins at Intuit has been successful but also has its challenges. One of the most common complaints we get is that it’s hard to figure out what's going on when your build fails. The UI is not very easy to use, and it can be slow. There is definitely room for improvement on the user experience side.

What about operational considerations, such as high availability and disaster recovery? We use the open-source version of Jenkins, which doesn't come with those features built in, so we had to implement our own. Unfortunately, for the big Jenkins servers, it can take up to an hour to fail a server over to another region. This definitely doesn’t meet our SLAs.

There is no unified control plane with Jenkins. We’re running about 200 Jenkins servers. Even though we’ve automated as much as possible, what happens every time we need to roll out a new Jenkins upgrade or a new plugin upgrade? It's a tedious task because we have 200 servers that we need to take care of like pets.

When it comes to cost and efficiency, Jenkins is not a cloud-native product. To run on top of Kubernetes, the execution model adopted was one pod per build, with that pod holding multiple containers. However, the pod and its containers run for the whole duration of your build. This means that when the build is idle—such as waiting for user input to proceed to the next stage—the pod and its containers keep running, wasting cluster resources.


At Pipekit

At Pipekit, we face similar challenges. As a startup, our focus is on having a lean and adaptable CI approach. Since Pipekit is a control plane for Argo Workflows, our value proposition is delivering a centralized place for customers to manage their workflows and any tools or integrations that they plug into those workflows. We manage multi-cluster workflows and even integrate with several SSO providers for RBAC.


As we've shipped more and more integrations and features, our CI quickly expanded. We wanted lean and adaptable CI pipelines. We wanted the ability to iterate and remix pipelines easily. We wanted to autoscale to minimize our costs as a startup. From a maintenance standpoint, we wanted a CI approach that was a bit more “set it and forget it”, reducing the amount of work we were doing to tune our Jenkins pipelines. Finally, since we were going to deploy with Argo CD, we wanted a tool that would easily integrate with rollouts and events.

With Jenkins, the challenges at Pipekit were similar to those at Intuit.


Our builds were running too long, with CI and test pipelines taking quite a while for each PR, slowing down our team. We wanted to get PRs reviewed faster.

Also, because we were running all of our containers in a single pod, the size of some of our pipelines was limited and cloud resources were wasted. We didn't feel like we could fully leverage spot nodes to drive down costs.

Finally, although getting started with plugins was easy, the maintenance costs increased over time. Whenever we ran into an issue with our pipeline, we had to deal with trying to figure out which plugin caused it or whether a plugin update had a security vulnerability. All of these complexities started to add up.

As a team, we were already using Argo Workflows for data processing and other infrastructure automation. So, we asked: What could we accomplish by using Argo Workflows as our CI engine?

Why Argo Workflows for CI

At Pipekit, we took a hard look at what Argo has to offer and what we needed.


First, the big benefits stemmed from running each step in a pod—by default. That unlocked some downstream benefits, like dynamically provisioning resources for each of the steps in our pipeline. This was a big win for us. We could get more granular with each pipeline step, provisioning the right resources and scaling them down once the step is done. If a build is waiting on someone for approval, we can just spin that down until the approval arrives, and then spin up a pod to complete the pipeline.
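
For illustration, here's a minimal sketch (our own example, not from the talk) of how a single Argo Workflows step requests its own resources, so only the steps that need big nodes get them:

- name: build
  container:
    image: golang:1.22
    command: [sh, -c]
    args: ["go build ./..."]
    resources:
      requests:
        cpu: "2"        # only this step asks for the larger request
        memory: 4Gi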

The other significant benefit we had with Argo was parallelism by default. We could define dependencies wherever they exist throughout the pipeline, and Argo would automatically run steps that don't have dependencies in parallel. That helped us speed up our pipelines without much effort. If we were to do this in Jenkins, we would have to be more prescriptive about where to use parallelism; and if you change your mind about that down the road, you end up with tech debt you’ll need to refactor. With Argo, we just declare dependencies wherever we see them, and Argo runs things as it sees fit.
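
As a hypothetical sketch of what that looks like in practice: the first two tasks below declare no dependency on each other, so Argo runs them in parallel, while the third waits for both.

- name: main
  dag:
    tasks:
      - name: lint
        template: lint          # no depends: starts immediately
      - name: unit-tests
        template: unit-tests    # no depends: runs in parallel with lint
      - name: package
        template: package
        depends: lint && unit-tests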

On the maintenance side, Argo was lightweight to deploy, since it's just another Kubernetes resource on our cluster. Without so many plugin dependencies, it was a lot easier to maintain.

Of course, being in the Argo ecosystem was a benefit, as we wanted to transition seamlessly into deployment or running rollouts.

Finally, not everybody on the Pipekit team was familiar with Groovy and writing Jenkins pipelines. So, it helped to have a tool that we could write with YAML. Also, Python developers could use the Hera SDK to spin up CI.

Migration Considerations

This brings us to consider the pros and cons of Jenkins and Argo Workflows for CI.

We’d still like to give Jenkins its due. It has been around for well over a decade, and the community is really strong. There are a lot of great resources out there, so getting questions answered can be very quick. Argo has a strong community now, but there's still not as much documentation online.

From a UI/UX standpoint, Jenkins is great. It’s built for a CI experience. We were used to some of those primitives, whereas Argo Workflows is more generic. We were also using Argo Workflows for data and other infrastructure automation. If we were going to migrate, we would encounter a UX difference.


For us at Pipekit, we felt like migrating to Argo led to great autoscaling and parallelism benefits. Granted, we needed to think about how we would pass objects between steps in our pipelines. However, that extra effort in the beginning—to figure that out by using volumes or artifacts—ends up benefiting you, as you can achieve better scale and efficiency in the pipeline.

Mapping between Jenkins and Argo Workflows

Before we dive into an example, let’s briefly cover how a Jenkins pipeline maps to an Argo Workflows pipeline.


To start, the Jenkinsfile (Groovy) maps to the Argo Workflow definition, which is either YAML or Python if you use one of the SDKs, like Hera.

A step in Jenkins maps to a task in Argo.

A stage in Jenkins maps to a template in Argo. Templates come in different flavors:

  • The DAG template, which is the most popular.
  • A steps template, which declares a sequence of linear steps.
  • A script template, where you can pass a quick script (such as a Python test) to be run as part of a step in the pipeline (see the sketch below).
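
For instance, a script template might look like this minimal, hypothetical sketch:

- name: quick-check
  script:
    image: python:3.11
    command: [python]
    source: |
      # A short inline check, run as its own pod.
      print("tests would run here")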

The shared library in Jenkins maps well to what's called WorkflowTemplate in Argo Workflows. With this, we can parameterize and remix our pipelines better than we could with Jenkins.
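
As a rough sketch of the idea (the input parameters here are illustrative, not taken from the repository), a WorkflowTemplate is applied to the cluster once and can then be referenced from any pipeline:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: git-checkout
spec:
  templates:
    - name: main
      inputs:
        parameters:
          - name: repo
          - name: branch
      container:
        image: alpine/git
        command: [sh, -c]
        args:
          - git clone --branch {{inputs.parameters.branch}} {{inputs.parameters.repo}} /work/src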

With Jenkins plugins, there isn’t much of a one-to-one mapping to Argo. Yes, there are Argo Workflows plugins to be aware of, but they're not built like Jenkins plugins. Argo Workflows does have exit handlers, which we can use to integrate with third-party tools.
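
For example, an exit handler is declared with onExit and runs whether the workflow succeeds or fails, which makes it handy for notifications. A hedged sketch (the webhook URL is a placeholder):

spec:
  entrypoint: main
  onExit: notify          # runs after main completes, success or failure
  templates:
    - name: notify
      container:
        image: curlimages/curl
        args:
          - "-s"
          - "-X"
          - "POST"
          - "https://hooks.example.com/ci?status={{workflow.status}}"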

A CI/CD Pipeline Example

Now, let’s demonstrate what a standard CI/CD pipeline can look like with Jenkins and Argo Workflows.


Our example is fairly straightforward. It starts with building a container. Then, you publish that container to your container registry. After that comes a checks stage with three steps that can run in parallel:

  1. Publish test coverage
  2. Run security scans on the container
  3. Perform static code analysis

After these are completed, you usually want to deploy. At Intuit, we like to keep track of our deployments, so we would first create a Jira ticket for that deployment. Then, we would deploy the new container using Argo CD. If that succeeds, we close the Jira ticket.

What does this look like concretely in Jenkins, and then in an Argo Workflow? Let’s compare by looking at some code snippets from this repository.

The Jenkinsfile

We’ll start by looking at this Jenkinsfile. The first line references the shared library that we're going to use along that pipeline.

@Library(value = 'cicd-shared-lib', changelog = false) _

In the first section here, we instruct Jenkins where to run the agent. In this case, it's a Kubernetes agent, and this is typically where you define your pod specification. This relates to what we mentioned above: multiple containers running for the whole duration of your build.

agent {
    kubernetes {
        yamlFile 'KubernetesPods.yaml'
    }
}
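
The KubernetesPods.yaml file isn't shown here, but it would hold a pod spec along these lines (a hypothetical sketch, not the actual file), with every container kept alive for the entire build:

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: podman
      image: quay.io/podman/stable
      command: [sleep]
      args: ["infinity"]   # idles until a stage uses container('podman')
    - name: test
      image: maven:3.9-eclipse-temurin-17
      command: [sleep]
      args: ["infinity"]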

Through the rest of the Jenkinsfile, we define our stages, and stages can be nested. Notice that there's no need to specify a git clone or git checkout—it's already part of your Jenkins pipeline. It's been set up for you.

In our stage steps, we have functions such as podmanBuild or podmanMount. These functions will be defined in the shared library.

steps {
    container('podman') {
        podmanBuild("--rm=false --build-arg=\"build=${env.BUILD_URL}\" --build-arg appVersion=${config.version} -t ${config.image_full_name} .")
        podmanMount(podmanFindImage([image: 'build', build: env.BUILD_URL]), {steps,mount ->
            sh(label: 'copy outputs to workspace', script: "cp -r ${mount}/usr/src/app/target ${env.WORKSPACE}")
        })
    }
}

In the first stage (lines 12-19), we are building the container. We invoke the podmanBuild and podmanMount functions. The podmanMount call (line 16) is there to extract the test coverage. One nice thing about Jenkins is that all the files in your workspace are available to every step of your pipeline. These files are going to be reused later.

In the next stage (lines 23-30), we publish our image, again using podman.

Finally, we have a “Container Checks” stage (lines 32-58). In the case of Jenkins, if you want to run something in parallel, it must be explicit. It looks like this:

stage('Container Checks') {
    parallel {
        stage('Report Coverage & Unit Test Results') {
            steps {
                junit '**/surefire-reports/**/*.xml'
                jacoco()
                codeCov(config)
            }
        }

        stage('Security Scan') {
            steps {
                container('cpd2') {
                    intuitCPD2Podman(config, "-i ${config.image_full_name} --buildfile Dockerfile")
                }
            }
        }

        stage('Static Code Analysis') {
            steps {
                container('test') {
                    echo 'Running static Code analysis: from JenkinsFile'
                    reportSonarQube(config)
                }
            }
        }
    }
}

We have a parallel section where we reuse the test coverage files that we extracted with the podmanMount call on line 16. They're still in the workspace, and we can use them here. Then, we run some security scans and some static code analysis.

Finally, we have the deployment stage (lines 63-85) where we start by creating a Jira ticket. Then, we deploy using Argo CD.

stage('Deploy') {
    steps {
        container('cdtools') {
            gitOpsDeploy(config, 'qa-usw2', config.image_full_name)
        }
    }
}

We use a gitOpsDeploy function, which uses Argo CD under the hood, to deploy to our QA environment with the new image name.

Once the deployment is completed, we close the Jira ticket.

The Argo Workflow

For the Argo Workflow, let’s look at argo-workflow.yaml, breaking it out into a few sections.

First, we have all of the arguments that we want to pass through the workflow (lines 7-20), including the Git branch, our container tag, Jira ticket number, and so on.

parameters:
  - name: app_repo
    value: ""
  - name: git_branch
    value: ""
  - name: target_branch
    value: ""
  - name: is_pr
    value: ""
  - name: container_tag
    value: ""
  - name: jira_ticket_number
    value: ""

Next, we have the volume claim (lines 24-32), since we need to be a bit more intentional about setting up object passing in our workflow. We define the directory on the cluster where we’ll store objects as we pass them between steps. Then, we set the access mode to ReadWriteMany so that we can enable parallel reads and writes throughout the workflow and achieve a higher level of parallelism as we run our pipeline.

volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: nfs
      resources:
        requests:
          storage: 1Gi
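
Each step that needs those shared files then mounts the claim by name. A minimal, hypothetical sketch:

- name: unit-tests
  container:
    image: golang:1.22
    command: [sh, -c]
    args: ["cd /work/src && go test ./..."]
    volumeMounts:
      - name: workdir      # the claim from volumeClaimTemplates above
        mountPath: /work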

When it comes to the actual pipeline and how this maps compared to Jenkins, this is the templates section (lines 35-115), where our DAG lives and the pipeline is defined. Here, we set up a DAG template with several tasks.

We use what's called a templateRef to reference WorkflowTemplates. If we've applied all those WorkflowTemplates to our cluster, workflows will automatically reference them. We have a directory with our Argo Workflow templates, where we have everything defined, and we simply reference that in our workflow manifest. So, for example, in our DAG, we define a git-checkout task and a get-git-info task to get our SHAs.

- name: git-checkout
  templateRef:
    name: git-checkout
    template: main
- name: get-git-info
  templateRef:
    name: get-git
    template: main

These each reference a templateRef, which we have defined in our Argo Workflow templates directory. For example, the get-git templateRef is defined in this workflow template. This approach makes it much easier to iterate on a new pipeline by just referring to the workflow templates and passing in different parameters depending on what that pipeline needs.

Moving on to the container-build task, we see that it's also using a workflow template, and it depends on the git-checkout and get-git-info tasks to run. That's how we declare the shape of the pipeline and the order.

- name: container-build
  arguments:
    parameters:
      - name: container_tag
        value: "{{workflow.parameters.container_tag}}-{{tasks.get-git.outputs.parameters.release-version}}"
      - name: container_image
        value: "pipekit-internal/foo"
      - name: dockerfile
        value: "Dockerfile"
      - name: path
        value: "/"
  templateRef:
    name: container-build
    template: main
  depends: git-checkout && get-git-info

If a task doesn’t use the depends key, it will simply run in parallel automatically.

Next, we run our unit tests (lines 62-65), container scan (lines 67-71), and code analysis (lines 72-87). All of these tasks ultimately depend on git-checkout and get-git-info.

Finally, we create the Jira ticket (lines 88-92). Similar to previous tasks, we have an update-jira workflow template. In that template, we define how we want to update Jira, pass in parameters (for opening, updating, closing, etc.), and use that template throughout the pipeline whenever it's needed.
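
A task invoking that template might look like the following sketch (the template and parameter names here are illustrative, not taken from the repository):

- name: create-jira-ticket
  templateRef:
    name: update-jira
    template: main
  arguments:
    parameters:
      - name: action       # illustrative: open / update / close
        value: "open"
      - name: summary
        value: "Deploy {{workflow.parameters.container_tag}}"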

Then, we have our deployment task.

- name: deploy-application-preprod
  templateRef:
    name: deploy-application
    template: deploy-application
  arguments:
    parameters:
      - name: app_type
        value: "preprod"
      - name: gh_tag
        value: "{{tasks.get-git.outputs.parameters.release-version}}"
  depends: create-jira-ticket
  when: "{{workflow.parameters.is_pr}} == true && {{workflow.parameters.target_branch}} == master"

This, of course, points to our deploy-application workflow template. We pass in our arguments here for Argo CD to use to then run the deploy. That's where we effectively have a nice, seamless integration from Argo Workflows to Argo CD in our deployment.

Lastly, we wrap up by updating Jira again at the end (lines 105-115).

Optional: Workflow Metrics and Prometheus

We also have a simple, native way to optionally emit Prometheus metrics from our Argo Workflow. We've done that (in lines 117-157) by adding a duration metric for how long our pipeline runs, as well as counters for successful and failed runs.
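
In Argo Workflows, these are declared under the workflow spec's metrics.prometheus key. A trimmed sketch of the pattern (the metric names are our own):

metrics:
  prometheus:
    - name: ci_pipeline_duration
      help: "How long the pipeline runs"
      gauge:
        realtime: true
        value: "{{duration}}"
    - name: ci_pipeline_result_total
      help: "Count of pipeline outcomes by status"
      labels:
        - key: status
          value: "{{status}}"
      counter:
        value: "1"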

Final Tips and Wrap-up

As a reminder, take a look at the GitHub repository, which has the workflow templates and structure, so you can see how Argo Workflows works for this use case. You can also check out a working example of the code to run it locally or even on your cluster.

Migrating your Jenkins pipelines to Argo might seem a bit daunting, but have no fear! You don’t have to pull a sleepless weekend to make it happen. We recommend taking a piecemeal approach.

At Pipekit, our approach was first to just trigger Argo jobs from Jenkins, rather than migrating off Jenkins all at once. This let us get a feel for how it would work and how stable it would be, and it built the buy-in and confidence to move over larger pipelines.

When we did start moving over, we started with some of the simpler tasks and pipelines. That made it easier to figure out how we would want to migrate everything else—like our complex Jenkins pipelines.

Other tips to use when you're migrating include:

  • Adopt workflow templates and parameterization from the very beginning.
  • Think of each step as something that you will reuse down the road; this will accelerate your migration as you get more familiar with Argo Workflows.
  • Don't forget to tap into the Argo community.


At Pipekit, we also have a virtual cluster where you can set up a pipeline of your own with pre-built examples. So, you won’t need to worry about configuring your local cluster or anything like that. Just spin up a virtual cluster and run that on your own.

Intuit loves open source and is a major contributor to Argo CD and Argo Rollouts. We encourage you to check out those projects as well.
