Introduction

We will create a new Ansible role and a playbook to automate the installation of the command-line tools I always install on Ubuntu servers. Having the installer and the Ansible role is not enough, though. It is always good practice to document what the role is for and how people can use it, so we will discuss that too.

The new features we will learn about today are the following:

Using multiple tasks files and including a tasks file in main.yml
Using Ansible facts, disabling them and gathering a subset of the available facts
Creating a symbolic link
Updating the APT repository cache
Using the "vars" folder in addition to "defaults"
Using regular expressions in Ansible

We will start from the source code of the 7th episode:
https://github.com/rimelek/homelab/tree/tutorial.episode.7

Before you begin
Ansible playbook and optional APT cache update
Ansible role
Documenting Ansible roles
Conclusion

Before you begin

Requirements

The project requires Nix, which we discussed in "Install Ansible 8 on Ubuntu 20.04 LTS using Nix".
You will also need an Ubuntu remote server. I recommend an Ubuntu 22.04 virtual machine.

Download the already written code of the previous episode

If you started the tutorial with this episode, clone the project from GitHub:

git clone https://github.com/rimelek/homelab.git
cd homelab

If you cloned the project now, or you want to make sure you are using the exact same code I did, switch to the previous episode in a new branch:

git checkout -b tutorial.episode.7b tutorial.episode.7

Have the inventory file

Copy the inventory template:

cp inventory-example.yml inventory.yml

Change ansible_host to the IP address of the Ubuntu server that you use for this tutorial,
and change ansible_user to the username on the remote server that Ansible can use to log in.
If you still don't have an SSH private key, read the "Generate an SSH key" part of "Ansible playbook and SSH keys".
If you want to run the playbook called playbook-lxd-install.yml, you will need to configure a physical or virtual disk, which I wrote about in "The simplest way to install LXD using Ansible". If you don't have a usable physical disk, look for "truncate -s 50G <PATH>/lxd-default.img" in that post to create a virtual disk.
You will need an encrypted secret file, which I wrote about in the "Encrypt a file" section of "Use SOPS in Ansible to read your secrets".

Activate the Python virtual environment

How you activate the virtual environment depends on how you created it. The episode "The first Ansible playbook" describes how to create and activate the virtual environment using the "venv" Python module. In the episode "The first Ansible role", we also created helper scripts. If you haven't created the environment yet, you can create it by running:

./create-nix-env.sh venv

Optionally, start an SSH agent:

ssh-agent $SHELL

and activate the environment with

source homelab-env.sh

Ansible playbook and optional APT cache update

Before we can talk about the role, we have to start with a playbook. Previously, we only had playbooks for specific tasks like installing and removing LXD. The goal now is a playbook that installs the common dependencies you need to experiment on the remote servers even without Ansible. That way, when you try something new, you don't have to start by writing YAML files before you even know what you want to achieve. Let's call this playbook file "playbook-system-base.yml" and, for now, add only the role that we will create soon.

- hosts: all
  roles:
    - role: cli_tools

We still assume that all our machines that we configure in the inventory file are targets. It will change, but not in this post.

This Ansible role will install lots of APT packages. Other roles may want to install APT packages as well, so we also want to make sure the APT cache is up to date. Updating the cache in every role would be a waste of time, so we will update it in a pre-task:

- hosts: all
  pre_tasks:
    - name: APT update
      become: true
      changed_when: false
      when: config_apt_update | default(false, true) | bool
      ansible.builtin.apt:
        update_cache: true
  roles:
    - role: cli_tools

Here we use the built-in "apt" module to update the cache without installing anything. However, updating the cache makes the task report a change every time, so we add changed_when: false to the task. We also want a way to skip this pre-task. When you run a playbook 10 times in two minutes while developing it, updating the cache every time is simply unnecessary. So we add a condition based on the new config_apt_update variable. If the variable is not defined in the inventory file, or it is empty, we use "false" as the default value. You can always override it from the command line:

./run.sh playbook-system-base.yml \
  -e config_apt_update=true
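To see how the condition of the pre-task behaves with different inputs, here is a small, hypothetical test playbook. It is not part of the project. It runs on localhost and checks the same filter chain with ansible.builtin.assert. The sample values are assumptions chosen to mimic an empty inventory value and strings passed with the -e option.

```yaml
# Hypothetical test playbook: checks the filter chain used in the
# "when" condition of the "APT update" pre-task.
- hosts: localhost
  gather_facts: false
  vars:
    samples:
      - { value: "", expected: false }       # empty value: default(false, true) applies
      - { value: "true", expected: true }    # string from -e config_apt_update=true
      - { value: "false", expected: false }  # string from -e config_apt_update=false
  tasks:
    - name: The filter chain turns each sample into the expected boolean
      ansible.builtin.assert:
        that:
          - (item.value | default(false, true) | bool) == item.expected
      loop: "{{ samples }}"

    - name: An undefined variable also falls back to false
      ansible.builtin.assert:
        that:
          - not (undefined_example_var | default(false, true) | bool)
```

The second argument of default() (true) makes the filter replace falsy values such as an empty string, not only undefined variables. The bool filter then converts the strings that come from the command line.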

I will define it in my inventory, so this is what the global vars section looks like now:

all:
  vars:
    ansible_user: ansible-homelab
    sops: "{{ lookup('community.sops.sops', 'secrets.yml') | ansible.builtin.from_yaml }}"
    config_apt_update: true
    config_lxd_zfs_pool_disks:
      - /dev/disk/by-id/scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RBNX0J103301N-part6

Ansible facts

There is one more line we need to add to the playbook.

When you run a playbook, Ansible's very first step is to detect devices and collect information, for example about the networks and the version of the Linux distribution. The collected information is available through variables, and these are the facts. Sometimes you don't need these facts and want to speed up the execution of the playbook. This matters especially when you run it on 100 servers, or on just a couple of servers but very often during development. If that is the case, you can set gather_facts: false in the playbook like this:

- hosts: all
  gather_facts: false
  pre_tasks:
    - name: APT update
      become: true
      changed_when: false
      when: config_apt_update | default(false, true) | bool
      ansible.builtin.apt:
        update_cache: true
  roles:
    - role: cli_tools

If you use roles you didn't write, and you don't want to find out what facts they need, just leave the facts gathering enabled.

Now you may think you understand the difference between the variables we used before and facts, but you can also define facts yourself using the built-in set_fact module. So another important difference is the scope. A variable defined in the vars of a task is not available in the next task. Facts, on the other hand, belong to the host and remain available in later tasks and plays of the same run. You can also cache them. If you define a fact, run the playbook, remove the definition and rerun the playbook, you can still read the fact from the cache. Of course, this depends on the cache plugin in use. The default plugin is "memory", so by default facts are not available when you run a playbook a second time. If you want to see how persistent fact caching works, the following example shows it.

Run the following commands in a terminal:

export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION="$PWD/var/cache"

Use the following playbook:

- hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.set_fact:
        cacheable: true
        mytest: hello
    - ansible.builtin.debug:
        var: ansible_facts.mytest

After running the playbook, you will find a file named "localhost" in the folder you set in ANSIBLE_CACHE_PLUGIN_CONNECTION, with the following content:

{
    "mytest": "hello"
}

Then run the following playbook:

- hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.debug:
        var: ansible_facts.mytest

And Ansible will still remember the value of "mytest":

ok: [localhost] => {
    "ansible_facts.mytest": "hello"
}
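The scope difference mentioned earlier can be checked the same way. The following hypothetical playbook uses made-up variable names. It shows that a variable defined in the vars of a task disappears after that task, while a fact set with set_fact is still available in a later task. Since cacheable is not set here, the fact is not written to the cache.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: A task-level variable is visible only inside its own task
      vars:
        task_scoped_example: hello
      ansible.builtin.debug:
        var: task_scoped_example

    - name: The task-level variable is no longer defined
      ansible.builtin.assert:
        that:
          - task_scoped_example is not defined

    - name: Set a fact for the current host (not cacheable)
      ansible.builtin.set_fact:
        host_scoped_example: hello

    - name: The fact is still available in a later task
      ansible.builtin.assert:
        that:
          - host_scoped_example == 'hello'
```

To clear the jsonfile cache from the previous example, you can delete the var/cache folder or run ansible-playbook with the --flush-cache option.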

Ansible role

Ansible role overview

The new Ansible role will be called "cli_tools". The structure of the role will be the following:

defaults/
- main.yml: The place for default parameter values.
vars/
- main.yml: A file to store helper variables that are not intended to be changed by the user. It is useful when the alternative would be storing the variables in the tasks file, which would require creating a block only for those variables.
tasks/
- main.yml: The default tasks file that we have always used
- yq.yml: An additional tasks file which we can refer to and load in main.yml.
README.md: This is basically the documentation of the role. It contains everything that helps users understand how the role can be used, what it expects to be already installed and so on. We will discuss it in more detail later.

Creating a symbolic link

Most of the packages I install on a Debian-based Linux distribution can be installed from an APT repository. We have already used the built-in apt module, though, so let's jump to the interesting part. Sometimes I just want an alias for a command, and that's when I create a symbolic link, like the one below that points to the pygmentize command. The built-in file module can create a symbolic link if the state parameter is "link".

    - name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.file:
        state: link
        src: /usr/bin/pygmentize
        dest: "{{ cli_tools_highlight_dest }}"

The destination could have been static, but I wanted to make it changeable, so I will have a default value for that in defaults/main.yml.
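A minimal sketch of defaults/main.yml at this point could look like the following. The path /usr/local/bin/highlight is an assumption for illustration, and the value in the repository may be different.

```yaml
# roles/cli_tools/defaults/main.yml (sketch)
# Destination of the "highlight" symbolic link created by the role.
# The path is an assumption; any directory in PATH would work.
cli_tools_highlight_dest: /usr/local/bin/highlight
```

Because this is a default, a playbook or the inventory can override it without touching the role.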

Using multiple tasks files

Installing yq will be complicated, but I don't want to complicate my main tasks file.

The built-in include_tasks module expects the name of another tasks file, loads that file, and executes the tasks in it.

- name: Include tasks from another file
  ansible.builtin.include_tasks: file.yml

It can be useful in different situations. In this case, I simply didn't want to keep the most complicated installation process in the main file, and main.yml stays shorter this way too. For more details about how this module can be used, check the documentation of ansible.builtin.include_tasks.

The following code is the part of main.yml in the cli_tools role that shows all three modules I used in it, grouped in a block. Most of the tasks will be familiar, since we used the apt module before and I have already shared the symlink part. The last task, however, is an include.

roles/cli_tools/tasks/main.yml

- name: Install formatting tools for scripting and user friendly outputs
  block:

    - name: APT packages | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.apt:
        name:
          - jq # to handle json files
          - python3-pygments # to highlight codes with "pygmentize"

    - name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.file:
        state: link
        src: /usr/bin/pygmentize
        dest: "{{ cli_tools_highlight_dest }}"

    - ansible.builtin.include_tasks: yq.yml

I didn't use the "name" parameter in the last task, because the tasks in the included file will have names, so it wouldn't really help to understand the role better and wouldn't add more value to the logs either. It was not my idea. I named every single task until I read about this point of view and I agreed. Unfortunately, I don't have a link to the source.
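Since main.yml loads yq.yml with include_tasks, it may help to compare it with import_tasks. include_tasks is dynamic, so a when condition on it is evaluated once, at runtime, for the include itself. import_tasks is static, so the file is inlined when the playbook is parsed, and the condition is copied to every imported task. The following is a hedged sketch of the two alternatives. The cli_tools_install_yq variable is hypothetical and does not exist in the role.

```yaml
# Option 1 (dynamic): the condition decides whether yq.yml is loaded at all.
- ansible.builtin.include_tasks: yq.yml
  when: cli_tools_install_yq | default(true) | bool

# Option 2 (static): the tasks of yq.yml are inlined at parse time,
# and the condition is added to each of them.
# Use only one of the two options, otherwise the tasks run twice.
# - ansible.builtin.import_tasks: yq.yml
#   when: cli_tools_install_yq | default(true) | bool
```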

Install the latest yq from GitHub

Using the GitHub API to get the latest release

We can finally discuss the most interesting part. I want to install "yq" from GitHub, which will require two more default variables in defaults/main.yml:

cli_tools_yq_version:
cli_tools_yq_dest: /usr/local/bin/yq

The version number is empty, which means that I want to install the latest version. I tried to find a link pointing directly to the latest release, but I couldn't find one. However, the GitHub API can tell us which release is the latest. If you just want to get the URL to download the latest version, you can try the following in a terminal:

curl -sL https://api.github.com/repos/mikefarah/yq/releases/latest

It returns a JSON document that is too long to show here, so let's look at only the relevant part:

{
  "html_url": "https://github.com/mikefarah/yq/releases/tag/v4.40.5",
  "assets": [
    {
      "browser_download_url": "https://github.com/mikefarah/yq/releases/download/v4.40.5/yq_linux_amd64"
    },
    {
      "browser_download_url": "https://github.com/mikefarah/yq/releases/download/v4.40.5/yq_darwin_arm64"
    }
  ]
}

This is important, because it contains all the information we need, and some of it more than once. For example, the version number appears in html_url and in every download URL.

Let's see how you can call the API endpoint from Ansible:

- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest

The built-in uri module allows us to call the endpoint and save the json response into a variable. Of course we want to do that only if the requested version number is empty, that's why we compare the version number to an empty string.
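If you want to be more explicit, the uri module also accepts request headers and the expected status code. The following variant is only a sketch. The Accept header is the media type recommended in the GitHub REST API documentation, and the debug task is an extra step added for illustration. Keep in mind that unauthenticated requests to the GitHub API are limited to 60 requests per hour per IP address, which may matter when you run the playbook against many servers.

```yaml
- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
    headers:
      Accept: application/vnd.github+json
    status_code: 200  # any other status makes the task fail
  register: _yq_latest

# Extra step for illustration: print the tag of the latest release
- name: Show the tag of the latest release
  when: _yq_latest is not skipped
  ansible.builtin.debug:
    msg: "{{ _yq_latest.json.tag_name }}"
```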

Get the version number of the latest release

In the previous section, you could see that the JSON response contains the download URLs. Each URL contains the version number, the operating system and the architecture, and the response also shows that these are the only differences between the download URLs. The download URL is the only thing we need, but sometimes we want to specify the version number instead of using the latest one. So instead of filtering the asset list for a URL whose format we already know, we can build the URL ourselves. The first important part of that URL is the version number. It can also be found in the html_url field, so we don't have to go through the list of release files.

Assuming you already have jq on the server, you can run the following:

curl 'https://api.github.com/repos/mikefarah/yq/releases/latest' -s  \
  | jq -r '.html_url' \
  | xargs -- basename \
  | sed 's/^v//'

Output:

4.40.5

We need the version number in Ansible as well. We registered the JSON response in _yq_latest, which has a property called "json". This property is not a string. It is the decoded version of the JSON response, since the uri module recognized JSON from the Content-Type header of the HTTP response. The above bash command can be replaced with the following Jinja template in Ansible:

    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
        | basename
        | regex_replace('^v(.*)', '\\1')
      }}"

We also used a very simple regular expression telling Ansible to remove the leading "v" from the version number. Removing the "v" is not really important. It was just my preference to work with only the numbers.
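You can try the filter chain on localhost before using it in the role. In this hypothetical playbook, the sample URL is the html_url from the API response shown earlier. The template uses the same filters and quoting as the role.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Sample value taken from the API response above
    sample_html_url: https://github.com/mikefarah/yq/releases/tag/v4.40.5
    # Same filter chain as _yq_latest_version_number in the role
    sample_version_number: "{{
      sample_html_url
        | basename
        | regex_replace('^v(.*)', '\\1')
      }}"
  tasks:
    - name: The leading "v" is removed from the tag
      ansible.builtin.assert:
        that:
          - sample_version_number == '4.40.5'
```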

We now have the latest version number, and we know that we want to use that as the default value and also be able to override it. This is how you do it:

_yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"

Get the architecture and operating system of the server

The next important thing after the version number is the release name. It always starts with "yq_", followed by the operating system and the architecture. We will need the uname command to get the name of the operating system (Darwin on macOS and Linux on Linux) and the arch command to get the CPU architecture. Unfortunately, amd64 can also be called x86_64, and arm64 can also be called aarch64, so we will use sed later to fix that.

uname
arch

Output:

Linux
x86_64

While we could use the uname and the arch commands to get the operating system and the CPU architecture in the terminal, we can use facts in Ansible. Since we disabled the fact gathering, we have to use the built-in setup module to get the architecture and the operating system.

- name: Collect architecture facts
  ansible.builtin.setup:
    gather_subset: architecture

After that, you can get the operating system and the architecture from the ansible_facts variable.

- vars:
    info:
      os: "{{ ansible_facts.system }}"
      arch: "{{ ansible_facts.architecture }}"
  ansible.builtin.debug:
    var: info

I prefer using ansible_facts, because it makes it easy to search for the places where I use facts. However, you could also use the variables prefixed with ansible_:

- vars:
    info:
      os: "{{ ansible_system }}"
      arch: "{{ ansible_architecture }}"
  ansible.builtin.debug:
    var: info
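A quick way to confirm that both notations return the same values is an assert task like the following sketch. It assumes that the INJECT_FACTS_AS_VARS setting is left at its default (enabled). Otherwise, the ansible_-prefixed variables are not created.

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Collect architecture facts
      ansible.builtin.setup:
        gather_subset: architecture

    - name: Both notations refer to the same facts
      ansible.builtin.assert:
        that:
          - ansible_facts.system == ansible_system
          - ansible_facts.architecture == ansible_architecture
```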

Saving helper variables in addition to defaults

Variables in Ansible can be defined in many places. In a role, we can have defaults. We can also have variables that are not meant to be changed by the user (although they could be overridden too) and only help to organize our templates, so we don't have to define all the variables in the tasks files.

The architecture and the operating system are the two most important pieces of information for building the final URL, but we have to convert them to the format used in the download URL.

uname | tr '[:upper:]' '[:lower:]'
arch \
  | sed 's/x86_64/amd64/' \
  | sed 's/aarch64/arm64/'

We converted the name of the operating system to lowercase and replaced the architecture names with the alternative names used in the release files. Yes, in this project we support only these two architectures.

Output:

linux
amd64

In Ansible, we will save these templates in vars/main.yml, in another folder called "vars/" at the same level as "defaults/".

cli_tools_yq_archs:
  x86_64:  amd64
  amd64:   amd64
  aarch64: arm64
  arm64:   arm64

cli_tools_yq_os: "{{ ansible_facts.system | lower }}"
cli_tools_yq_arch: "{{ cli_tools_yq_archs[ansible_facts.architecture] }}"
cli_tools_yq_release_name: "{{ 'yq_' + cli_tools_yq_os + '_' + cli_tools_yq_arch }}"

This is how we always get arm64 or amd64. Since we get the OS name from a fact, this part would also work on macOS. That doesn't mean the whole role would work there, since we also use the APT package manager, but you could move the yq installation into a separate role. Whether you use multiple tasks files or a new role is up to you.

The last thing we did was defining the full release name.
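Looking up cli_tools_yq_archs[ansible_facts.architecture] fails with a rather obscure templating error on any architecture missing from the dictionary, for example armv7l. A hypothetical guard task can fail early with a clearer message. It is not part of the original role and could be placed in tasks/yq.yml right after the "Collect architecture facts" task.

```yaml
# Hypothetical guard task (not part of the original role).
- name: Make sure the architecture has a matching yq release name
  ansible.builtin.assert:
    that:
      # "in" checks the keys of the cli_tools_yq_archs dictionary
      - ansible_facts.architecture in cli_tools_yq_archs
    fail_msg: >-
      Unsupported architecture "{{ ansible_facts.architecture }}".
      Supported values: {{ cli_tools_yq_archs | list | join(', ') }}
```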

Installing the desired version of yq

We finally have all the information to build the download URL:

_url_base: https://github.com/mikefarah/yq/releases/download/
_url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
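The following hypothetical localhost playbook shows how the pieces fit together. It fills in example values instead of facts and API data: the version and the release name from the API response above, which are assumptions about your server. It then checks that the rendered URL equals the browser_download_url shown earlier.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Example inputs; in the role they come from the API and from facts
    _yq_desired_version_number: "4.40.5"
    cli_tools_yq_release_name: yq_linux_amd64
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  tasks:
    - name: The built URL matches the browser_download_url from the API response
      ansible.builtin.assert:
        that:
          - _url == 'https://github.com/mikefarah/yq/releases/download/v4.40.5/yq_linux_amd64'
```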

We can use the following task to download the file. This time, we choose the built-in get_url module instead of uri.

- name: Install yq
  become: true
  register: _yq_install
  failed_when: _yq_install.status_code not in [200, 304]
  vars:
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
        | basename
        | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true

There is one parameter I have to explain.

force: true

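Without this parameter, we couldn't update an already installed yq. It tells Ansible to download the file every time and replace the existing file if the content has changed. Without it, the file is downloaded only when the destination doesn't exist yet.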

Skip downloading when the existing version is the desired one

Previously, we always overwrote the installed version, which required downloading the file every time. To avoid that, we need the version of the already installed yq, which we can only get if yq is already installed.

To find out if the file is already downloaded, we can use the built-in stat module.

- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check

Now the boolean _yq_existing_dest_check.stat.exists variable tells you whether it exists or not. In the terminal, you would get the version number like this:

yq --version

Output:

yq (https://github.com/mikefarah/yq/) version v4.40.5

It's not just a version number, so we will use a regular expression again, but first we get the version info in Ansible:

- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info

I used the cli_tools_yq_dest parameter so that the task works even if the folder containing yq is missing from the PATH environment variable.

We need to apply the following filter on the version info:

regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1')

As a template variable:

_yq_existing_version_number: "{{ _yq_existing_version_info.stdout | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"

We will also need to add the following condition to the task:

  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
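Before putting everything together, you can check the regular expression on localhost. This hypothetical playbook applies it to the sample output of "yq --version" shown above, using the same quoting as the role.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Sample output of "yq --version" shown above
    sample_stdout: "yq (https://github.com/mikefarah/yq/) version v4.40.5"
    # Same regular expression as _yq_existing_version_number in the role
    sample_version_number: "{{ sample_stdout | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
  tasks:
    - name: Only the version number is kept
      ansible.builtin.assert:
        that:
          - sample_version_number == '4.40.5'
```

If the desired version is also 4.40.5, the condition of the install task is false, so the download is skipped.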

The final task is below:

- name: Install yq
  become: true
  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
  vars:
    _yq_existing_version_number: "{{ _yq_existing_version_info.stdout | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
        | basename
        | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true

Full yq tasks file

Now let's see what yq.yml looks like:

roles/cli_tools/tasks/yq.yml

- name: Collect architecture facts
  ansible.builtin.setup:
    gather_subset: architecture

- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest

- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check

- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info

- name: Install yq
  become: true
  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
  vars:
    _yq_existing_version_number: "{{ _yq_existing_version_info.stdout | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
        | basename
        | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true

Run the final playbook:

./run.sh playbook-system-base.yml \
  -e config_apt_update=true

Documenting Ansible roles

When you write an Ansible role, you may later forget its parameters and how they can be used. You may also forget requirements that must be met before you use the role. That's why it is good practice to have a README file in the root folder of the role. If you want to share the role, it is even more important.

The README file can have any structure, but the recommended one is the following Markdown structure, which is similar to the README that "ansible-galaxy role init" generates:

role_name
=========

Description

Requirements
------------

List of requirements like the supported operating systems

Role variables
--------------

```yaml
role_variable: value
```

Description of the above variable

Dependencies
------------

List of dependencies like other roles

Example playbook
----------------

```yaml
- hosts: all
  roles:
    - role: role_name
      role_variable: value
```

License
-------

The name of the license

Author information
------------------

Your name or the name of your team and optional email address.

The description part is usually short, but I thought it would be a good idea to describe all the tools that the role would install, so mine is really long. I don't want to share the whole documentation, but you can find it on GitHub.

Conclusion

This is how a very simple task becomes a very complicated one. I wanted to show you which command-line tools I usually install on my Linux servers, and that became a separate article. In Ansible, it required talking about facts and organizing our variables better. In my original role, I never overwrote the existing yq binary. When I needed a new version, I just removed the binary on the server and reran the playbook. If you have many servers, it is better to check automatically whether you have the desired version. This episode also demonstrated how you can detect the existing state when there is no module to do that for you.

Now that we have a role to install the most important command-line tools, we can reuse it later. For example, we could use it when we run new virtual machines with Ansible in which we also want these tools and more. That is coming in a later tutorial.

The final source code of this episode can be found on GitHub:

https://github.com/rimelek/homelab/tree/tutorial.episode.8

rimelek / homelab

Source code to create a home lab. Part of a video tutorial

Name: Using facts and the GitHub API in Ansible
Author: rimelek

This project was created to help you build your own home lab where you can test your applications and configurations without breaking your workstation, so you can learn on cheap devices without paying for more expensive cloud services.

The project contains code written for the tutorial, but you can also use parts of it if you refer to this repository.

Tutorial on YouTube in English: https://www.youtube.com/watch?v=K9grKS335Mo&list=PLzMwEMzC_9o7VN1qlfh-avKsgmiU8Jofv

Tutorial on YouTube in Hungarian: https://www.youtube.com/watch?v=dmg7lYsj374&list=PLUHwLCacitP4DU2v_DEHQI0U2tQg0a421

Note: The inventory.yml file is not shared since that depends on the actual environment so it will be different for everyone. If you want to learn more about the inventory file watch the videos on YouTube or read the written version on https://dev.to. Links in the video descriptions on YouTube.

You can also find an example inventory file in the project root. You can copy that and change the content, so you will use your IP…
