This is the 4th article in my series about infrastructure monitoring. Let’s recap briefly. I discussed the motivation for automating my infrastructure and defined several sets of requirements, among them infrastructure management, application management and service discovery. The first two articles explained my infrastructure, the steps to install the basic OS, and using Ansible for system management and package installation.
This article is a tutorial about writing a complex Ansible role that will install the service monitoring and discovery software HashiCorp Consul. Consul will fulfill the following application management requirement:
- AM4: An interface in which I see the status of each application (health, exposed endpoints)
This article originally appeared at my blog.
Ansible Roles
Roles are a concept that enables reusability. A role defines a set of tasks that are executed against a host to achieve a desired state. This is not different from what a playbook does, but roles provide a standard structure and can be reused in different playbooks or even by other users [^1].
Roles that should be executed on hosts are named inside a playbook. For example, if we want to install the new_role on all hosts, we can define the playbook as follows [^2].
- name: Install new role
  hosts: all
  roles:
    - new_role
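To make the execution order concrete, here is a minimal sketch of a playbook that combines a role with surrounding tasks. The two debug tasks and their messages are illustrative assumptions, not part of my actual playbook. Ansible runs pre_tasks first, then the roles, and the entries under tasks last.

```yaml
- name: Install new role
  hosts: all
  pre_tasks:
    # Runs before any role is applied
    - name: Print hostname before the role
      debug:
        msg: "Preparing {{ inventory_hostname }}"
  roles:
    - new_role
  tasks:
    # Runs after all roles have finished
    - name: Print hostname after the role
      debug:
        msg: "Finished {{ inventory_hostname }}"
```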
Roles have a basic directory structure. By executing ansible-galaxy role init new_role, you will get the following directory layout.
new_role
├── README.md
├── defaults
├── files
├── handlers
├── meta
├── tasks
├── templates
├── tests
└── vars
What goes into these directories? The tasks and handlers directories contain the concrete tasks that your role executes. In files, you put static configuration files, and in templates, you put files that include variables. Finally, vars contains the dynamic values that your role needs, and defaults contains variables that other roles or playbooks might overwrite. If you want to share your role on Ansible Galaxy, include the relevant information in meta.
Now let’s apply this knowledge to write an Ansible role that installs Consul on all our nodes.
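As a small illustration of the defaults directory, the following sketch shows what a defaults/main.yml for the Consul role built in this article might look like. The variable names are the ones used in the tasks and templates further down; the values are assumptions for illustration (the version and datacenter name are taken from the command output shown near the end, the directory and IP address are placeholders).

```yaml
# defaults/main.yml
# All values are illustrative assumptions; override them in your
# playbook or inventory as needed.
consul_version: "1.7.1"
consul_base_directory: /etc/consul
consul_datacenter: infrastructure_at_home
consul_main_server_ip: 192.168.2.202
# Hosts are clients unless the inventory marks them as servers
consul_role: client
```

Because these live in defaults, any playbook or inventory variable with the same name takes precedence over them.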
Consul
HashiCorp's Consul is a service monitoring and discovery software. It helps us to define, monitor and connect services that run on our nodes. A service is any software, for example a web server or an application, and it can run on the node itself or inside a container (Docker). Consul provides a command line interface, HTTP API endpoints, and a Web UI to monitor all systems.
Install Consul with a Playbook
In order to install Consul, we need to perform these steps on each node:
- Determine the correct Linux distribution and hardware
- Create consul group, user and directories
- Download and link the consul binary
- Copy configuration files
- Enable Consul with systemd
All these steps will be fully automated with Ansible!
Step 1 - Determine the correct Linux Distribution and Hardware
We actually need to differentiate three versions:
- raspi-0: armelv5
- raspi-4 and raspi-3: armhfv6
- linux-server: linux amd64
Here is the corresponding Ansible task. The consul version is configured as a role variable. Then, for each host, I define the src file name and the src url to use them in later tasks.
- name: Set vars when architecture is armv7l
  set_fact:
    consul_src_file: "consul_{{ consul_version }}_linux_armhfv6.zip"
    consul_src_url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_armhfv6.zip"
    consul_src_hash: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_SHA256SUMS"
  when: ansible_facts.architecture == 'armv7l'
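The task above covers only one architecture. One possible way to cover all three versions in a single task is a lookup table, sketched below. This is not how my role does it. The names consul_arch_map and consul_arch are hypothetical, and the keys assume that the Raspberry Pi Zero reports armv6l and the Linux server reports x86_64 in ansible_facts.architecture; check the facts of your own hosts before relying on this.

```yaml
- name: Set Consul download facts for the detected architecture
  vars:
    # Maps the architecture reported by Ansible to the suffix used in
    # the Consul release file names (keys are assumptions, see above)
    consul_arch_map:
      armv6l: armelv5
      armv7l: armhfv6
      x86_64: amd64
    consul_arch: "{{ consul_arch_map[ansible_facts.architecture] }}"
  set_fact:
    consul_src_file: "consul_{{ consul_version }}_linux_{{ consul_arch }}.zip"
    consul_src_url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_{{ consul_arch }}.zip"
    consul_src_hash: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_SHA256SUMS"
```

A host with an architecture that is missing from the map makes the task fail, which is arguably better than silently downloading the wrong binary.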
Step 2 - Create Consul Group, User and Directories
For security reasons, it is advisable to create dedicated users that will own the consul configuration files and binaries. We can use the Ansible modules group and user. Finally, we create the directory.
- name: Create consul group
  group:
    name: consul
    state: present
    system: true
- name: Create consul user
  user:
    name: consul
    group: consul
    shell: /bin/false
    home: /etc/consul/
    state: present
- name: Create consul dir
  file:
    state: directory
    path: "{{ consul_base_directory }}"
    owner: consul
    mode: 0755
Step 3 - Download and Link the Consul Binary
We will download the binary to a temporary directory, unzip it to the target directory, and provide a symlink. When downloading, Ansible provides an option to check the download result with a SHA256 checksum - we will also use this.
- name: Get consul binary
  get_url:
    url: "{{ hostvars[inventory_hostname].consul_src_url }}"
    dest: /tmp
    checksum: "sha256:{{ hostvars[inventory_hostname].consul_src_hash }}"
- name: Unzip consul binary
  unarchive:
    src: "/tmp/{{ hostvars[inventory_hostname].consul_src_file }}"
    dest: "{{ consul_base_directory }}"
    remote_src: true
    mode: 0744
- name: Create symlink
  file:
    src: "{{ consul_base_directory }}/consul"
    dest: /usr/local/bin/consul
    state: link
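If you want the play to confirm that the linked binary actually runs on the target architecture, a check along the following lines could be appended. It is an optional sketch and not part of my role; the registered variable name consul_version_output is an assumption.

```yaml
- name: Check that the linked consul binary runs
  command: /usr/local/bin/consul version
  register: consul_version_output
  # Only reads information, so never report a change
  changed_when: false

- name: Show the installed Consul version
  debug:
    msg: "{{ consul_version_output.stdout_lines | first }}"
```

A binary for the wrong architecture would make the first task fail, which surfaces the problem during the play instead of at service start.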
Step 4 - Copy Configuration Files
In the config file, we specify important information like the name of the datacenter, the data directory, the node name and the IP address of the central server that the nodes will join. Here is the template file; its variables are replaced with concrete values when the tasks are executed.
{
  "datacenter": "{{ consul_datacenter }}",
  "data_dir": "{{ consul_base_directory }}",
  "log_level": "INFO",
  "enable_local_script_checks": true,
  "enable_syslog": true,
  "node_name": "{{ ansible_facts.hostname }}",
  "retry_join": ["{{ consul_main_server_ip }}"]
}
Consul reads all configuration files from a directory in alphabetical order. We will copy the basic configuration config.json to all nodes, and additionally the server.json to the server nodes.
- name: Copy consul config
  template:
    src: config.json.j2
    dest: "{{ consul_base_directory }}/config.json"
    mode: 0644
  notify: Restart consul
- name: Copy consul server config
  template:
    src: server.json.j2
    dest: "{{ consul_base_directory }}/server.json"
    mode: 0644
  when: consul_role == 'server'
  notify: Restart consul
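The condition on consul_role assumes that every host has this variable set. One way to provide it is through inventory groups, as in the following sketch. The file name, the group names consul_servers and consul_clients, and the assignment of hosts to groups are assumptions for illustration; the host names are the ones that appear in the consul members output later in this article.

```yaml
# inventory.yml (hypothetical layout)
all:
  children:
    consul_servers:
      hosts:
        raspi-3-2:
        raspi-4-1:
        raspi-4-2:
      vars:
        consul_role: server
    consul_clients:
      hosts:
        raspi-3-1:
      vars:
        consul_role: client
```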
Step 5 - Enable Consul with Systemd
Consul is just a binary. If we want to start it at boot time, we need to provide a systemd service file. We will use the Consul service file template, copy it to the host, and enable the service.
- name: Copy consul service file
  template:
    src: consul.service.j2
    dest: /etc/systemd/system/consul.service
    mode: 0644
  notify: Restart consul
- name: Set file permissions
  file:
    path: "{{ consul_base_directory }}"
    owner: consul
    group: consul
    recurse: true
- name: Enable consul service
  systemd:
    service: consul.service
    enabled: true
  notify: Restart consul
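Several tasks above notify a handler called Restart consul, which lives in the handlers directory of the role. A minimal sketch of such a handler could look like this; the daemon_reload option is my assumption here, added so that systemd picks up a changed service file before restarting.

```yaml
# handlers/main.yml
- name: Restart consul
  systemd:
    name: consul.service
    state: restarted
    # Re-read unit files in case consul.service itself was changed
    daemon_reload: true
```

Handlers run once at the end of the play, no matter how many tasks notified them, so Consul is restarted a single time even when several files changed.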
Running the Playbook
We can finally run the playbook. Executed on my machine, it looks like this:
Consul will be automatically installed on all nodes. The very first time I ran this playbook, I understood the power of Ansible to apply complex installation steps to an arbitrary number of nodes! My infrastructure only consists of 5 nodes, but the playbook could configure 50 nodes too.
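For reference, a playbook that applies the role can be as short as the following sketch. The role name consul and the use of become are assumptions; privilege escalation is included because the tasks write to /etc/systemd/system and /usr/local/bin.

```yaml
- name: Install Consul
  hosts: all
  # Tasks create users and write system files, so run them as root
  become: true
  roles:
    - consul
```

Assuming the playbook is saved as consul.yml next to an inventory file, it could be started with ansible-playbook -i inventory.yml consul.yml (both file names are placeholders).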
Using Consul
Now that Consul is running, let’s see some of its capabilities. In Consul, you should define 3 or 5 server nodes, and any number of client nodes. Server nodes take on the role of monitoring and exchanging the cluster state: things like the number of nodes, the services running on these nodes, and more.
Consul offers a rich set of commands on the command line. For now, I will only show how nodes can connect to each other to form the cluster.
Let’s manually start a Consul agent on the command line. We see important properties like the node name, datacenter, and IP address.
$ consul agent
==> Starting Consul agent…
Version: 'v1.5.2'
Node ID: '094163f6-9215-6f2c-c930-9e84600029da'
Node name: 'raspi-3-1+'
Datacenter: 'infrastructure_at_home' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-E
Another command shows all members of the same cluster.
$ consul members
Node       Address             Status  Type    Build  Protocol  DC                      Segment
raspi-3-2  192.168.2.202:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-4-1  192.168.2.203:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-4-2  192.168.2.204:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-3-1  192.168.2.201:8301  alive   client  1.7.1  2         infrastructure_at_home  <default>
Because of the config file that we used, all nodes are already connected. Nodes can join a cluster manually by using consul join <ipaddress>.
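The same membership information is available through the HTTP API, so it can also be checked from Ansible. The following sketch queries the local agent on the default HTTP port 8500 shown in the agent output above. It assumes the agent is running on the target host and that the API is reachable without an ACL token; the registered variable name consul_members is hypothetical.

```yaml
- name: Query cluster members from the local Consul agent
  uri:
    url: http://127.0.0.1:8500/v1/agent/members
    return_content: true
  register: consul_members

- name: List the names of all known members
  debug:
    msg: "{{ consul_members.json | map(attribute='Name') | list }}"
```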
Finally, Consul also provides a Web UI that shows the nodes in the cluster, their health status, and the services running on them.
Conclusion
This article explained how to write a complex Ansible role for installing Consul. We walked through the five installation steps: determining the correct binary for each host, creating the required user and group, downloading and linking the binary, copying templated config files (which differentiate server and client nodes), and enabling the service so that it starts at boot.
Consul provides the management view of all nodes in my cluster. For now, we barely scratched the surface of what is possible. In the next post, we will continue to set up the infrastructure by installing Docker and HashiCorp Nomad, a job scheduler.
Footnotes
[^1]: On Ansible Galaxy you can search and download roles for a vast set of tasks and operating systems.
[^2]: Important note: Roles are executed before any other tasks in your playbook! So any task that you list in the tasks section of the playbook will run after the role. If you need tasks to run beforehand, use the pre_tasks declaration.