Demystifying Terraform

Denis - Oct 4 - Dev Community

For software engineers who are new to Terraform and curious about how it works.

What’s this all about

In this post we'll try to figure out what Terraform is, why it was built and how it works. If you are a software engineer who's about to start with Terraform and/or the Infrastructure as Code concept in general, or you already have a little experience with it, then this post is mainly for you. However, it's not going to be just a sketchy walk-through that shows the basics of this wonderful piece of software. The idea here is to actually go a bit deeper, see what Terraform is under the hood and try to build a simple provider for it.

Infrastructure as Code

or IaC for short, is not a new concept. Surely, in the era of the cloud, this term and the idea behind it are pretty much everywhere. Most of the software that runs in the cloud uses IaC in one way or another. Even if you just host a simple WordPress with MySQL on a single free-tier AWS EC2 instance and you launched it via the AWS Console UI, you still used IaC indirectly, because AWS itself relies heavily on this concept. In fact, the AWS Console is just a front-end that uses the AWS API to interact with its infrastructure.

But a regular EC2 instance itself is nothing but a virtual machine, and these VMs are certainly not managed manually by human operators at that scale. All that cloud infrastructure is backed by things like Puppet, a configuration management platform that aims to configure and manage servers and other IT infrastructure via code.

Having infrastructure represented and managed as code is good for a number of things:

  • code is a specification of the resource. It's the best spec documentation you can have, because it most likely reflects the actual state of said resource;

  • you can expect that when you create multiple instances of some resource from the same code - these instances will have the same configuration. In other words, the code is a blueprint for an infra and it can be re-used. Pretty much like classes and objects in programming languages;

  • it’s code! That means you can apply the same practices as you usually do to any other sort of code - put it into git repo, apply changes from branches via PRs, review changes, run some CI over it and so on.

There are a number of different IaC tools and systems that either have their own language or re-use existing programming or markup languages. For instance, AWS CloudFormation uses JSON and YAML, which are surely quite well-known, but you'd have to use 3rd-party tools to add dynamic features there, such as loops, variables, string interpolation, etc. Terraform has its own language that comes with a number of the above-mentioned features.

What most of these tools have in common, though, is that their language models are usually declarative. That basically means that instead of telling how to do things, you just specify what you want to get. Of course, behind the scenes there's most probably a fairly complex piece of software written in an imperative language. However, there are also some IaC systems, like Pulumi, that utilize more feature-rich general-purpose languages. But let's not spend too much time talking about the variety of different infrastructure-as-code solutions and focus on Terraform.

Basics

Let's start with a simple task - creating an AWS S3 bucket. That's something that a lot of engineers who have worked with AWS have come across at some point in their careers.

For now, let’s just do it with the AWS CLI:

> aws --version

aws-cli/2.15.21 Python/3.11.6 Linux/6.6.10-76060610-generic exe/x86_64.pop.22 prompt/off 

> aws s3api create-bucket \
     --bucket devoops-test-bucket \
     --region eu-central-1 \
     --create-bucket-configuration LocationConstraint=eu-central-1

{
    "Location": "http://devoops-test-bucket.s3.amazonaws.com/"
}

Quite easy, right? The first command just outputs the version of the CLI that I'm using; the second one actually creates a bucket with the name devoops-test-bucket in the eu-central-1 region.

Now let's say we want to enable versioning on our newly created bucket. There's another CLI method for doing that:

> aws s3api put-bucket-versioning \
     --versioning-configuration "Status=Enabled" \
     --bucket devoops-test-bucket 

The put-bucket-versioning method expects the bucket name to be specified, as well as a versioning configuration. So once we've run it, our bucket has versioning enabled.
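
To double-check, s3api also has a read counterpart for this setting; its output should look roughly like this:

> aws s3api get-bucket-versioning \
     --bucket devoops-test-bucket

{
    "Status": "Enabled"
}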

It doesn't seem too shabby to do it with the CLI. It does the trick quite efficiently. However, the problem with this approach starts with scale. When you only need one bucket, to just host some static page for example, you will do just fine using the CLI or the AWS Console. But when you need hundreds of buckets or other resources, like virtual machines, databases, etc., it becomes a problem. At that scale you lose count of all the resources out there, what their properties are, which of those buckets have versioning enabled and so on. You can, of course, use the same CLI or Console to list them, go one by one and check their configurations, but wouldn't it be nicer to have all of them as code in some repository? Wouldn't it be easier to simply open that project in your editor and look things up right there?


Let's see how we can do the same thing as above, but with Terraform. We're going to start by creating a new directory for our very tiny project and adding a main.tf file there.

> mkdir very-tiny-project 
> cd very-tiny-project
> touch main.tf

Terraform doesn't necessarily need a main.tf file - you can name the file differently or have multiple files - but we'll just stick with main.tf for now.
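
For illustration, a common convention is to split a project into a few files like this:

.
├── main.tf       # providers, backend and resources
├── variables.tf  # input variables
└── outputs.tf    # output values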

Now, before jumping to the code, if you are not familiar with the terraform language, please check the official documentation regarding the syntax.

So let's start writing the code. First, we should configure Terraform's behavior within the project itself. Terraform provides an eponymous block for this, called terraform. There we want to declare that within this project we're going to work with AWS, and set up our backend.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "local" {
    path = "./terraform.tfstate"
  }
}

Let's break it down a bit:

  1. required_providers specifies the list of API providers that we are going to use. In our case it's only AWS, but it's allowed to have more than one provider. We're going to discuss providers in more depth further on and will even build our own.
  2. backend "local" tells Terraform where to store the state. In this case we just use a local file, but it's possible to use a number of alternatives, like Postgres, S3, Consul, etc. (an S3 example is sketched right below). Don't worry if you are not familiar with the concept of state, as we'll also discuss it later. For now, just think of it as everything that Terraform knows about the resources it created/manages within the project. Here we'll store information about the S3 bucket that we're going to create.
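
For instance, a remote S3 backend (the bucket name and key below are purely illustrative) would be declared like this:

terraform {
  backend "s3" {
    # A pre-existing bucket that stores the state; the name is hypothetical
    bucket = "my-terraform-states"
    # Path to the state object inside that bucket
    key    = "very-tiny-project/terraform.tfstate"
    region = "eu-central-1"
  }
}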

Actually, state and providers are two of the core concepts that Terraform was built upon, and a big part of what makes it so flexible and widely adopted.

The next step is to configure the specified providers.
We already told Terraform that we want to work with AWS, but now we have to configure the provider itself - basically, specify the parameters that AWS needs to know, like region, authentication data and so on. Each provider defines its own set of configuration parameters. We can look up the AWS provider documentation in the Terraform registry. In my case, I only want to specify the region and let the AWS provider take everything else from my default local AWS config at ~/.aws.

provider "aws" {
  region = "eu-central-1"
}

At this point, we can already initialize our project by running terraform init. This command pulls the providers and modules that are used in the code to your machine. Also, if you are using VS Code, Neovim with the Terraform LSP, or another editor that supports the Terraform language server, it's better to pull the providers and modules as soon as possible, because you will also get related autocompletion and suggestions.

After running the command, you'll see a bunch of text in the output.

> terraform init
Initializing the backend...

Successfully configured the backend "local"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.70.0...
- Installed hashicorp/aws v5.70.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

This output tells us that Terraform has successfully pulled the dependencies and is ready to run some actions. If you look into the root of your project directory, you'll see a new .terraform/ directory along with a lock file.
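Here is a sketch of the typical layout after init (the provider version and platform directory will differ depending on your setup):

> tree -a -L 6 .
.
├── .terraform
│   ├── providers
│   │   └── registry.terraform.io
│   │       └── hashicorp
│   │           └── aws
│   │               └── 5.70.0
│   └── terraform.tfstate
├── .terraform.lock.hcl
└── main.tf
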
Inside .terraform/ we'll be able to find a couple of important things:

  1. providers directory contains all the providers that are used within the project. The contents of this folder are structured in the following order: Registry > Issuer > Provider > Version > Platform > Binary and licenses. Basically, each provider is just a compiled Go binary that implements the interfaces used by Terraform.
  2. terraform.tfstate contains some backend configuration for our future state.

Apart from providers, you might also find things like modules in this directory. To put it simply, modules are just other Terraform projects that you can re-use in your own project, like public packages in Go. We are not going to use them in this example, but you can read more about them in the documentation.
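
Just to give a taste, pulling a module from the registry looks like this. The module name and inputs below are illustrative - check the registry for the actual interface:

module "s3_bucket" {
  # A community-maintained S3 module from the public registry (for illustration)
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 4.0"

  bucket = "devoops-test-module-bucket"
}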

In the project root directory we also have a lockfile. It locks the dependency versions, the same way lots of package managers, like npm, Cargo or Go modules, do when you install dependencies.
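
An entry in .terraform.lock.hcl looks roughly like this (checksums omitted):

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.70.0"
  constraints = "~> 5.0"
  hashes = [
    # checksums of the provider packages for each supported platform
  ]
}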

Ok, now let's create the bucket itself. First we should reach out to the AWS Provider documentation to see how the S3 resource is structured.

On the left-hand side of the documentation we can see a list of the AWS provider APIs. In this case they are structured mainly by AWS resource.
Let's locate S3. In the Argument Reference section we can see all the various inputs that can be passed to the S3 resource. For now, let's only pass the name.

resource "aws_s3_bucket" "devoops_test_bucket" {
  bucket = "devoops-test-bucket"
}

The resource block here has two labels apart from the inputs body: aws_s3_bucket and devoops_test_bucket. The first one is the resource type according to the AWS provider, and the second one is the local name of this resource. This name is used to identify the resource of its type in the Terraform state, as well as inside the code.

Now if we run terraform plan, we'll be able to see what Terraform will try to create for us:

> terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_s3_bucket.devoops_test_bucket will be created
  + resource "aws_s3_bucket" "devoops_test_bucket" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "devoops-test-bucket"
...

There'll be a number of parameters that will be set by default; some of them will be known only after apply. But we definitely know that the name that will be used is devoops-test-bucket, the one we specified in the resource block. Let's apply it:


> terraform apply
...
Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_s3_bucket.devoops_test_bucket: Creating...
aws_s3_bucket.devoops_test_bucket: Creation complete after 2s [id=devoops-test-bucket]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

This time, together with showing the plan, Terraform asked whether we actually want to perform the described actions. After typing yes, it created the S3 bucket for us on AWS.

Amazing! Now, following our CLI example, we'd also like to enable versioning for the S3 bucket we've just created. According to the provider documentation, to enable versioning we should create another resource called aws_s3_bucket_versioning. This resource expects a bucket identifier and a versioning configuration block from us. Let's create it:

resource "aws_s3_bucket_versioning" "devoops_test_bucket_versioning" {
  bucket = aws_s3_bucket.devoops_test_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

Now let's run the apply command and see that Terraform doesn't try to create the bucket again, since it knows that the bucket already exists; the plan shows that only the versioning resource will be created:

# aws_s3_bucket_versioning.devoops_test_bucket_versioning will be created
  + resource "aws_s3_bucket_versioning" "devoops_test_bucket_versioning" {
      + bucket = "devoops-test-bucket"
      + id     = (known after apply)

      + versioning_configuration {
          + mfa_delete = (known after apply)
          + status     = "Enabled"
        }
    }
Plan: 1 to add, 0 to change, 0 to destroy.

Notice how we didn't specify the bucket name as a string inside the versioning resource, but used aws_s3_bucket.devoops_test_bucket.id instead. This is one of the coolest features in Terraform - the ability to create dependencies between resources and reference them across the project. By using aws_s3_bucket.devoops_test_bucket.id inside the versioning resource block, we actually tell Terraform that it should first create the S3 bucket resource itself, then take its output and use the resulting bucket ID to create the versioning resource.
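
For the rare cases where there's no attribute to reference, Terraform also offers the explicit depends_on meta-argument. It would be redundant here, since the reference already creates an implicit dependency, but just for illustration:

resource "aws_s3_bucket_versioning" "devoops_test_bucket_versioning" {
  # Redundant: the bucket reference below already implies this dependency
  depends_on = [aws_s3_bucket.devoops_test_bucket]

  bucket = aws_s3_bucket.devoops_test_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}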


At this point you might object that the Terraform way involved much more overhead than the CLI approach we took earlier. And you would be completely right about this one! However, what if you want to create 100 buckets, named devoops-test-bucket-{bucket number}? It might seem like a weird business requirement, but, overall, it's quite a common case to need X instances of some resource in your infra with identical configuration.

To accomplish that with the CLI, you could either call the aws s3api create-bucket command 100 times, and then 100 more times to enable versioning, or you could write a script that does it for you:

for bucket_idx in {0..99}; do
  aws s3api create-bucket \
     --bucket "devoops-test-bucket-${bucket_idx}" \
     --region eu-central-1 \
     --create-bucket-configuration LocationConstraint=eu-central-1

  aws s3api put-bucket-versioning \
     --versioning-configuration "Status=Enabled" \
     --bucket "devoops-test-bucket-${bucket_idx}"
done

With Terraform, you could instead use the count and for_each meta-arguments inside the resources:

resource "aws_s3_bucket" "devoops_test_bucket" {
  count  = 100
  bucket = "devoops-test-bucket-${count.index}"
}

resource "aws_s3_bucket_versioning" "devoops_test_bucket_versioning" {
  for_each = { for i, bucket in aws_s3_bucket.devoops_test_bucket : i => bucket }

  bucket = each.value.id
  versioning_configuration {
    status = "Enabled"
  }
}

The syntax is a bit odd, especially if you've never worked with functional programming languages before, but you can read about meta-arguments in the Terraform documentation; they are quite useful.
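
For what it's worth, the same pairing can also be expressed with count on both resources, which some find easier to read:

resource "aws_s3_bucket_versioning" "devoops_test_bucket_versioning" {
  count = 100
  # Index into the list of buckets created by the count-ed resource above
  bucket = aws_s3_bucket.devoops_test_bucket[count.index].id

  versioning_configuration {
    status = "Enabled"
  }
}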

"So what?", you might think, "the bash script looks even easier". And again you would be right. But if you decide to delete these buckets, with terraform you won't need to write any more code to do that. It'll be only enough to run terraform destroy and it will first destroy 100 versioning resources and then those 100 buckets. And that's where the declarative approach shines! In bash script we were writing what needs to be done to create 100 buckets, but with terraform we wrote that we need 100 buckets with given parameters and never specified how to create them. Terraform and AWS provider figured its way of doing it and stored information about created resources into state (terraform.tfstate file in our case). So now, when we want to delete them, terraform will use the knowledge it has about these buckets from the state and just performs the destruction without any extra inputs required from us. And at this point it would be a good time to figure out what terraform state actually is.

Terraform state

Terraform state is an object that contains all the relevant information about the resources that were created by Terraform. It has a standardized structure, and each resource contributes values to the state according to what its provider considers relevant.
Let's look at the state and understand what's in there.

{
  "version": 4,
  "terraform_version": "1.9.7",
  "serial": 9,
  "lineage": "ac4e0d14-e07e-0d35-37c3-79ecd3a1c1d9",
  "outputs": {},
  "resources": [ 
  ],
  "check_results": null
}

At the top level there's project-level metadata.

version represents the version of the state format itself. During Terraform's development there were a number of changes to the state data representation, and thus it is versioned.

terraform_version is pretty obvious - just the version of Terraform that was used to run this project.

serial - this value increases every time Terraform makes changes to the resources, and therefore to their state representation.

lineage - a unique ID assigned to a state when it is created. If the lineage differs, the states were created at different times and it's very likely you're modifying a different state. Terraform will not allow this.

outputs specifies all the values that your project wants to export. For example, if you are building a module that will be used by other Terraform projects, you will likely want to export related values, like IDs, names, ARNs, etc. If we add the following to our main.tf:

output "bucket_name" {
  value = aws_s3_bucket.devoops_test_bucket.id
}

then the outputs object in the state will look like this:

{
  "bucket_name": {
    "value": "devoops-test-bucket",
    "type": "string"
  }
}
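Once applied, the value can also be read back from the CLI:

> terraform output bucket_name
"devoops-test-bucket"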

resources - this is an array that contains information about all the remote objects that are managed by Terraform in this project.
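
For our bucket, an entry in that array looks roughly like this (heavily abridged; the real entry carries every attribute the provider tracks):

{
  "mode": "managed",
  "type": "aws_s3_bucket",
  "name": "devoops_test_bucket",
  "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "arn": "arn:aws:s3:::devoops-test-bucket",
        "bucket": "devoops-test-bucket",
        "id": "devoops-test-bucket"
      }
    }
  ]
}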

Providers
