We will create a new Ansible role and a playbook to automate the installation of the command-line tools I always install on Ubuntu servers. Having the installer and the Ansible role is not enough, though. It is always good practice to document the role, what it is for and how people can use it, so we will discuss that too.
The new features we will learn about today are the following:
Using multiple tasks files and including a tasks file in the main.yml.
Using Ansible facts, disabling them and gathering a subset of the available facts.
If you want to run the playbook called playbook-lxd-install.yml, you will need to configure a physical or virtual disk, which I wrote about in "The simplest way to install LXD using Ansible". If you don't have a usable physical disk, look for truncate -s 50G <PATH>/lxd-default.img there to create a virtual disk.
How you activate the virtual environment depends on how you created it. The episode "The first Ansible playbook" describes how to create and activate the virtual environment using the "venv" Python module, and in the episode "The first Ansible role" we created helper scripts as well, so if you haven't created it yet, you can create the environment by running
Before we can talk about the role, we have to start with a playbook. Previously, we only had playbooks for specific tasks like installing and removing LXD. The goal now is a playbook that installs the common dependencies you can play with on the remote servers even without Ansible, so when you are trying to do something new, you don't have to start with YAML files without even knowing what you want to do in the end. Let's call this playbook file "playbook-system-base.yml", and for now, add only the role that we will create soon.
- hosts: all
  roles:
    - role: cli_tools
We still assume that all our machines that we configure in the inventory file are targets. It will change, but not in this post.
This Ansible role will contain the installation of lots of APT packages. We could have other roles that want to install APT packages, so we also want to make sure the APT cache is up-to-date. It would be a waste of time to update the cache in every role, so we will update it in a pre task:
In this case we use the built-in "apt" module to update the cache without installing anything. Apparently, updating the cache also means that the task always reports a change. To disable that, we add changed_when: false to the task. We also want a way to skip the updater pre task. When you have to run a playbook 10 times in two minutes while you are developing it, updating the cache every time is simply not necessary. We add a condition which uses the new config_apt_update variable. If it is not defined in the inventory file, we use "false" as the default value, but you can always override it from the command line.
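Putting these pieces together, the playbook could look roughly like the following sketch. The task name is an assumption, and the bool filter is added because a value passed as -e config_apt_update=true arrives as a string.

```yaml
# playbook-system-base.yml (sketch)
- hosts: all
  pre_tasks:
    - name: Update APT cache  # task name is an assumption
      become: true
      # Skipped unless config_apt_update is set to a true value
      when: config_apt_update | default(false) | bool
      # Refreshing the cache should not be reported as a change
      changed_when: false
      ansible.builtin.apt:
        update_cache: true
  roles:
    - role: cli_tools
```

To force the update for a single run, something like ansible-playbook -i inventory.yml playbook-system-base.yml -e config_apt_update=true should work, assuming your inventory file is called inventory.yml.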
There is one more line we need to add to the playbook.
When you run a playbook, as the very first step, Ansible detects devices and collects information, for example about networks and the version of the Linux distribution. The collected information is available through variables, and these are the facts. Sometimes you don't need these facts and you want to speed up the execution of the playbook, especially when you have to run it on 100 servers, or on just a couple of servers but very often during development. If that is the case, you can set gather_facts: false in the playbook like this:
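As a minimal sketch, assuming the playbook-system-base.yml file from above, the setting goes on the play level next to hosts:

```yaml
- hosts: all
  gather_facts: false  # skip the automatic fact gathering for this play
  roles:
    - role: cli_tools
```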
If you use roles you didn't write, and you don't want to find out what facts they need, just leave the facts gathering enabled.
Now you may think you understand the difference between the variables we used before and the facts, but in fact, you can also define facts yourself using the built-in set_fact module. So one more important difference is the scope. You can define a variable in a task, but that variable will not be available in the next task. Facts are available everywhere, and you can also cache them, so when you define a fact, run the playbook, remove the definition and rerun the playbook, you can still read the fact from the cache. Of course, it depends on the cache plugin in use, and the default is memory. So by default, the facts are not available when you run a playbook the second time. If you want to see how persistent fact caching works, the following example can show it.
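A minimal sketch of persistent fact caching could look like the following. It assumes the "jsonfile" cache plugin, enabled through the ANSIBLE_CACHE_PLUGIN and ANSIBLE_CACHE_PLUGIN_CONNECTION environment variables. The file name, the cache folder and the fact name demo_cached_fact are made up for this example.

```yaml
# playbook-fact-cache-demo.yml (hypothetical file name)
#
# Run it with a persistent cache plugin, for example:
#   ANSIBLE_CACHE_PLUGIN=jsonfile \
#   ANSIBLE_CACHE_PLUGIN_CONNECTION=./.fact-cache \
#   ansible-playbook -i inventory.yml playbook-fact-cache-demo.yml
- hosts: all
  gather_facts: false
  tasks:
    # Remove this task after the first run to see the cached value survive
    - name: Define a fact and allow it to be cached
      ansible.builtin.set_fact:
        demo_cached_fact: saved during the first run
        cacheable: true

    - name: Read the fact
      ansible.builtin.debug:
        var: ansible_facts['demo_cached_fact']
```

With the default memory plugin, the second task should report an undefined variable once the first task is removed.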
The new Ansible role will be called "cli_tools". The structure of the role will be the following:
defaults/
  main.yml: The place for default parameter values.
vars/
  main.yml: A file to store helper variables which are not intended to be changed by the user. You can use this file if the alternative is storing the variables in the tasks file, which requires creating a block only for those variables.
tasks/
  main.yml: The default tasks file that we have always used.
  yq.yml: An additional tasks file which we can refer to and load in main.yml.
README.md: This is basically the documentation of the role, containing everything that helps the user understand how the role can be used, what it expects to be already installed and so on. We will discuss it in more detail later.
Most of the packages I install on a Debian-based Linux can be installed from an APT repository, but using the built-in apt module that we already used before is not really interesting, so let's just jump to the interesting part. Sometimes, I just want to have an alias for a command, and in that case I create a symbolic link, like the one below that points to the pygmentize command. The built-in file module can create a symbolic link if the state field is "link".
- name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
  become: true
  ansible.builtin.file:
    state: link
    src: /usr/bin/pygmentize
    dest: "{{ cli_tools_highlight_dest }}"
The destination could have been static, but I wanted to make it changeable, so I will have a default value for that in defaults/main.yml.
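For illustration, the default could be defined like this. The path is only an assumption, so pick whatever folder is in your PATH:

```yaml
# roles/cli_tools/defaults/main.yml
cli_tools_highlight_dest: /usr/local/bin/highlight  # assumed path
```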
Installing yq will be complicated, but I don't want to complicate my main tasks file.
The built-in include_tasks module expects the name of another tasks file, loads it and executes the tasks in it.
- name: Include tasks from another file
  ansible.builtin.include_tasks: file.yml
It can be useful in different situations, but in this case, I didn't want to keep the most complicated installation process in the main file. The main.yml can also be shorter this way. For more details about how this module can be used, don't forget to check the documentation I linked above.
The following code is the part of main.yml in the cli_tools role which shows all three modules I used in main.yml, and it also includes a block. Most of the tasks will be familiar since we used the apt module before and I also shared the symlink part, but the last task is an include.
roles/cli_tools/tasks/main.yml
- name: Install formatting tools for scripting and user friendly outputs
  block:
    - name: APT packages | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.apt:
        name:
          - jq # to handle json files
          - python3-pygments # to highlight codes with "pygmentize"

    - name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.file:
        state: link
        src: /usr/bin/pygmentize
        dest: "{{ cli_tools_highlight_dest }}"

    - ansible.builtin.include_tasks: yq.yml
I didn't use the "name" parameter in the last task, because the tasks in the included file will have names, so it wouldn't really help to understand the role better and wouldn't add more value to the logs either. It was not my idea. I named every single task until I read about this point of view and I agreed. Unfortunately, I don't have a link to the source.
We can finally discuss the most interesting part. I want to install "yq" from GitHub, which will require two more default variables in defaults/main.yml:
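The two variable names below are the ones used by the tasks in this post. The destination path is only an assumed example value.

```yaml
# roles/cli_tools/defaults/main.yml
# An empty string means "install the latest release"
cli_tools_yq_version: ""
# Where the yq binary will be saved (assumed path)
cli_tools_yq_dest: /usr/local/bin/yq
```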
The version number is empty, which will mean that I want to install the latest version. I tried to find a link directly to the latest release, but it turned out, there was no such link. However, the GitHub API can tell us which one is the latest. If you just want to get the URL to download the latest version, you can try the following in the terminal:
This will be really important, because it has all the information we need, and it has it more than once.
Let's see how you can call the API endpoint from Ansible:
- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest
The built-in uri module allows us to call the endpoint and save the json response into a variable. Of course we want to do that only if the requested version number is empty, that's why we compare the version number to an empty string.
In the previous section, you could see that we could get the download URL from the json response, and that the URL contains the version number, the architecture and also the operating system. The response also shows that these are the only differences between the download URLs. The download URL is the only thing we need, but sometimes we want to specify the version number instead of getting the latest version. So instead of using the above information to filter for a URL whose format we already know exactly, we can just build the URL from scratch. The first important part of that URL is the version number, and the version number can also be found in the html_url field, so we don't have to list the release files.
Assuming you already have jq on the server, you can run the following:
We need the version number in Ansible. We registered the json response in _yq_latest. It will have a property called "json", which is not a string. It is in fact a decoded version of the json string, since the "uri" module recognized json in the HTTP response header. The above bash command can be replaced with the following Jinja template in Ansible:
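The expression below is the same one that appears later in the vars of the install task. To make the sketch runnable on its own, _yq_latest is replaced by a hand-written sample instead of a real API response.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Hand-written sample mimicking the registered result of the uri task
    _yq_latest:
      json:
        html_url: https://github.com/mikefarah/yq/releases/tag/v4.40.5
  tasks:
    - name: Show the latest version number without the leading "v"
      ansible.builtin.debug:
        # basename keeps "v4.40.5", regex_replace drops the "v" -> 4.40.5
        msg: "{{ _yq_latest.json.html_url | basename | regex_replace('^v(.*)', '\\1') }}"
```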
We also used a very simple regular expression telling Ansible to remove the leading "v" from the version number. Removing the "v" is not really important. It was just my preference to work with only the numbers.
We now have the latest version number, and we know that we want to use that as the default value and also be able to override it. This is how you do it:
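A sketch of that fallback logic, using the default filter with its second argument set to true so that an empty string also counts as "not set". The literal version numbers are sample values only.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    cli_tools_yq_version: ""             # empty: fall back to the latest
    _yq_latest_version_number: "4.40.5"  # sample value instead of the API result
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
  tasks:
    - name: Show which version would be installed
      ansible.builtin.debug:
        var: _yq_desired_version_number
```

Running it with an extra variable such as -e cli_tools_yq_version=1.2.3 (a made-up number) should print the overridden value instead.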
The next important thing after the version number is the release name. The release name always starts with "yq_" followed by the operating system and the architecture. We will need the uname command to get the name of the operating system (darwin on macOS and linux on Linux) and the arch command to get the CPU architecture. Unfortunately, amd64 can also be called x86_64 and arm64 can also be called aarch64, so we will use sed to fix that.
uname
arch
Output:
Linux
x86_64
While we could use the uname and the arch commands to get the operating system and the CPU architecture in the terminal, we can use facts in Ansible. Since we disabled the fact gathering, we have to use the built-in setup module to get the architecture and the operating system.
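The first task below is the one used later in yq.yml. The debug task is only added here to illustrate which two facts we are after.

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Collect architecture facts
      ansible.builtin.setup:
        gather_subset: architecture

    - name: Show the two facts needed for the release name
      ansible.builtin.debug:
        msg: "{{ ansible_facts.system }} / {{ ansible_facts.architecture }}"
```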
Variables in Ansible can be defined in many places. In a role, we can have defaults, but we can also have variables which are not meant to be changed (although we could change them too) and only help us organize our templates, so we don't have to define all the variables in the tasks files.
The architecture and the operating system are the two most important pieces of information to build the final URL. We have to convert those to a format that can be used in the download URL.
We converted the name of the operating system to lowercase, and replaced the architecture with the alternative names. Yes, in this project we support only these two.
Output:
linux
amd64
In Ansible, we will save the templates in vars/main.yml, so it is another folder called "vars/" at the same level as "defaults/".
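Take the following as one possible sketch of such a vars file. Only cli_tools_yq_release_name is a name used by the tasks in this post. The two helper variable names are made up for the example.

```yaml
# roles/cli_tools/vars/main.yml
# "Linux" -> "linux", "Darwin" -> "darwin"
_cli_tools_yq_os: "{{ ansible_facts.system | lower }}"
# aarch64 and arm64 become arm64, everything else is treated as amd64
_cli_tools_yq_arch: "{{ 'arm64' if ansible_facts.architecture in ['aarch64', 'arm64'] else 'amd64' }}"
# For example yq_linux_amd64
cli_tools_yq_release_name: "yq_{{ _cli_tools_yq_os }}_{{ _cli_tools_yq_arch }}"
```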
This is how we will always get arm64 or amd64. Since we get the OS name from a fact, it would also work on macOS. It doesn't mean the whole role would work, since we also use the APT package manager, but you could try to move the yq installation into a separate role. Whether you want to use multiple tasks files or a new role is up to you.
The last thing we did was defining the full release name.
To download the release, we can use the following task. This time we choose the built-in get_url module instead of uri.
- name: Install yq
  become: true
  failed_when: _yq_install.status_code not in [200, 304]
  vars:
    _yq_latest_version_number: "{{ _yq_latest.json.html_url | basename | regex_replace('^v(.*)', '\\1') }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true
  # Registered so that failed_when above has a result to inspect
  register: _yq_install
There is one parameter I have to explain.
force: true
Without this parameter we couldn't update an already installed yq. It tells Ansible to overwrite the downloaded file.
Skip downloading when the existing version is the desired one
Previously, we always overwrote the installed version, which required downloading the file every time. To avoid that, we need the version of the already installed yq, and we can ask for it only if yq is already installed.
To find out if the file is already downloaded, we can use the built-in stat module.
- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check
Now the boolean _yq_existing_dest_check.stat.exists variable tells you whether it exists or not. In the terminal, you would get the version number like this:
yq --version
Output:
yq (https://github.com/mikefarah/yq/) version v4.40.5
It's not just a version number, so we will use a regular expression again, but first we get the version info in Ansible:
- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info
I used the cli_tools_yq_dest parameter so the task will work even if the folder that contains yq is missing from the PATH environment variable.
We need to apply the following filter on the version info:
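To try the regular expression on its own, the sketch below applies it to a sample string copied from the yq --version output shown earlier, rather than to the registered result of the command task.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Sample value copied from the "yq --version" output
    sample_version_info: "yq (https://github.com/mikefarah/yq/) version v4.40.5"
  tasks:
    - name: Extract the bare version number
      ansible.builtin.debug:
        # Everything except the captured digits is dropped -> 4.40.5
        msg: "{{ sample_version_info | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
```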
roles/cli_tools/tasks/yq.yml
- name: Collect architecture facts
  ansible.builtin.setup:
    gather_subset: architecture

- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest

- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check

- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info

- name: Install yq
  become: true
  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
  vars:
    _yq_existing_version_number: "{{ _yq_existing_version_info | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
    _yq_latest_version_number: "{{ _yq_latest.json.html_url | basename | regex_replace('^v(.*)', '\\1') }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true
When you write an Ansible role, you can forget about the parameters and how you can use them. You can forget about some requirements which are needed before you use the role. It is a good practice to have a README file in the root folder of the role. If you want to share the role, it is even more important.
The README file could have any structure, but the recommended one is the following markdown structure:
role_name
=========

Description

Requirements
------------

List of requirements like the supported operating systems

Role variables
--------------

```yaml
role_variable: value
```

Description of the above variable

Dependencies
------------

List of dependencies like other roles

Example playbook
----------------

```yaml
- hosts: all
  roles:
    - role: role_name
      role_variable: value
```

License
-------

The name of the license

Author information
------------------

Your name or the name of your team and optional email address.
The description part is usually short, but I thought it would be a good idea to describe all the tools that the role would install, so mine is really long. I don't want to share the whole documentation, but you can find it on GitHub.
This is how a very simple task becomes a very complicated one. I wanted to show you what command-line tools I usually install on my Linux servers, which became a separate article. In Ansible, it required talking about Ansible facts and organizing our variables better. In my original role, I never overwrote the existing yq binary, and when I needed a new version, I could just remove the binary on the server and rerun the playbook. If you have many servers, it is better to automatically check whether you have the desired version or not. It also demonstrated what it means to detect the existing state if there is no module to do that for you.
Now that we have a role to install the most important command line tools, we can reuse it later. For example, when we use Ansible to run new virtual machines in which we also want to have these tools and more. Coming soon in a following tutorial.
The final source code of this episode can be found on GitHub:
Source code to create a home lab. Part of a video tutorial