Ansible is an extremely powerful data center automation tool: most of its power comes from not being too strict in enforcing a structure - this enables it to be used in extremely complex scenarios as well as to be set up very quickly in quite trivial ones.
But this is a double-edged sword: too many times I have seen adoption POCs performed with too poor requirements, with teams thinking they could reuse what they experimented with as a baseline for structuring Ansible: this is a very harmful error that quickly leads to unmaintainable real-life environments with duplicated code and settings, often stored in structures without a consistent logic or naming, losing most of the benefits of such a great automation tool.
Ansible inventory best practices: caveats and pitfalls is the post where we begin exploring how to properly structure Ansible to get all of its power without compromises, organizing things in an easy and straightforward way suitable for almost every operating scenario.
Gather The Requirements
As every solutions architect knows, the very first thing to do when evaluating a product is not to start using it or to read trivial tutorials - that is the perfect recipe for delivering a mess.
The very first thing to do is define the requirements.
Use case Scenarios
The only way to define requirements is having a clear vision of the use cases.
Let's start from answering the question "What":
What
- Configuring infrastructural services, this spans from configuring the OS environment of Virtual Machines and bare metal to configuring services such as Database Engines, Application Servers, Web Servers, Load Balancers and even Networking Devices or Appliances
- Delivering Services: mind that a service is the sum of single configuration items such as Database Instances, Virtual Hosts, Application Server instances and GIT Repositories, but it also has infrastructural requirements, such as firewall exceptions, that must be addressed before its delivery. In addition to that, an automation tool must also implement rolling releases with strategies such as canary or blue-green releases.
We must also answer “Who”, that is the actors - this is very important for adding all the necessary security measures to the design.
Who
In this fictional example, users are from the below teams:
- Networking Team (operate only networking devices)
- IT Operation Team (operates the Virtualization Environment, the OS environment, Application Servers and Load Balancers)
- Database Team (operate only databases)
- Security Team (manages the PKI, Identity Services, Proxies and Web Application Firewalls)
- Services Team (specialists in each service: delivery, monitoring, troubleshooting, ….)
The next question is “When”.
When
Some of them operate during working days, others 24x7: since the solution we are delivering (Ansible) must work for all of them, the answer to this question, which is actually the availability level, is “24x7”.
Last question for defining requirements is “Why” - the question "How" is answered by the actual solution, so it is not part of the requirements.
Why
Because we want to improve the time-to-market of service delivery, using a solution that is straightforward enough to limit human errors to the bare minimum and to promote a quick onboarding of new staff.
Now that we have the requirements, we can start investigating the tool's features to see if they are enough and how to use them to match the requirements.
Ansible Configurations Sources - A Quick Walkthrough
Since the Inventory is not the only configuration source, in order to learn how to structure it we must of course know what the others are, how they work and how they relate to the Inventory: only this way can we draw up an optimal structure that does not overlap or, even worse, clash with the other configuration sources.
The Ansible Inventory
The Ansible Inventory is a configuration source capable of providing both topological and playbook settings: it provides them as a document, most of the time formatted as INI or YAML. The actual source can be a single file (often called "hosts" or "hosts.yml") as well as the result of concatenating multiple files within a directory: it is even possible to use an application or script as the inventory source - this enables generating inventories on the fly based on subsets of the CMDB that make sense for each specific run, a feature exploited for example by Katello (the Red Hat Network Satellite Server's upstream project).
The inventory contains:
- Target Hosts
- Hostgroups
- Variables
Target Hosts
Any host listed in the inventory is called a target host.
An inventory can be as simple as follows:
localhost
pgsql-ca-up1a001
git-ca-up1a001
In this very trivial inventory we declare 2 target hosts ("pgsql-ca-up1a001" and "git-ca-up1a001") along with the "localhost" target.
In real life things are of course much more complex, and Inventories also define groups of target hosts.
Hostgroups
Entries in the inventory enclosed in square brackets (stanza definitions, in INI format terms) are used to define groups of target hosts - these are called hostgroups.
The purpose of hostgroups is twofold:
- they provide targets when running Ansible, avoiding having to list hosts one by one
- they are used to bind variables - binding variables to a hostgroup binds them to every target host that is a direct or indirect member of that hostgroup, sparing us from having to manually list them for each single host - we will discuss this shortly
Figuring out a good strategy for defining hostgroups looks trivial, but it isn't.
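As an illustrative sketch (hostgroup and host names here are made up), a hostgroup is an INI stanza listing its members, and hostgroups can be nested through the ":children" suffix:

```ini
# two leaf hostgroups listing target hosts
[webservers]
web-001
web-002

[dbservers]
db-001

# a parent hostgroup whose members are other hostgroups
[datacenter1:children]
webservers
dbservers
```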
Host_vars
Variables provided by the inventory are called host_vars: the most common and easiest way of declaring them is listing them in a file with the same name as the target host the variables are bound to, stored beneath the "host_vars" directory within the inventory's directory tree.
The following snippet for example defines the "ansible_connection" host_var with the value "local":
ansible_connection: local
It must be put in the "environment/host_vars/localhost.yml" file so that it is assigned only to the "localhost" target host. With this variable set, Ansible does not use any kind of connection while running plays affecting "localhost" as target: it just runs the tasks.
Group_vars
To avoid unnecessary repetitions that are cumbersome to maintain, it is possible to list variables common to every host that is a member (directly or indirectly) of a specific hostgroup by putting them into a file with the name of that hostgroup, stored beneath the "group_vars" directory - these variables are indeed commonly called group_vars.
Mind the special built-in group "all": its purpose is to define variables for every target host in the inventory.
For example, we can assign a variable with the name of the current environment to every host in the inventory by adding the following contents to the "environment/group_vars/all.yml" file:
environment: lab
Listing The Ansible Inventory's Contents
If necessary, it is possible to inspect the contents of an Inventory processed by Ansible by running:
ansible-inventory --list -i /ansible/environment
or, if you prefer to display the generated graph:
ansible-inventory --graph --vars -i /ansible/environment
Vars Files
The second most used configuration source are vars_files: these YAML formatted files provide variables that are specific to each run - for example, re-defining performance settings of a service managed by Ansible for performance tuning. They are loaded during the play.
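As a minimal sketch (the file name and the variable are made up), vars_files are referenced at play level and their variables become available when the play starts:

```yaml
# hypothetical play loading run-specific tuning settings
- name: Tune PostgreSQL
  hosts: pgsql_ca_up1
  vars_files:
    - vars/pgsql-tuning.yml   # e.g. contains "postgresql_max_connections: 200"
  tasks:
    - name: Show the loaded value
      ansible.builtin.debug:
        var: postgresql_max_connections
```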
Other Configuration Sources
Other ways of providing configuration to Ansible are:
- command line variables: these variables are set while running the "ansible-playbook" statement
- lookups: this method is used to get values from third party external services, such as secrets
- environment variables: these are variables fetched from the shell environment using lookups
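A quick sketch of these three sources (variable names and the secret path are made up; the Vault lookup assumes the "community.hashi_vault" collection is installed):

```yaml
# command line variable, set when running the playbook:
#   ansible-playbook site.yml -e "postgresql_version=14"

# lookup fetching a secret from an external service
# (path and mount point are hypothetical):
db_password: "{{ lookup('community.hashi_vault.vault_kv2_get', 'pgsql/prod').secret.password }}"

# environment variable fetched from the shell environment through a lookup:
proxy_url: "{{ lookup('ansible.builtin.env', 'HTTPS_PROXY') }}"
```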
Facts
Ansible also has some special variables that are called "facts". These special variables can be:
- host facts: they are gathered when connecting to the host, or fetched from a cache containing the previously discovered ones
- runtime facts: they are defined - or redefined - on the fly during the play (the same way you define and redefine variables in any programming language)
- local facts: they are loaded from JSON or INI formatted files stored beneath the target host's "/etc/ansible/facts.d" directory
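For example, a runtime fact can be defined mid-play with the "ansible.builtin.set_fact" module (the variable name is made up):

```yaml
- name: Record that the service has been configured
  ansible.builtin.set_fact:
    service_state: configured

# from here on, "service_state" is available like any other variable,
# e.g. in a condition:
- name: Restart only when just configured
  ansible.builtin.service:
    name: postgresql
    state: restarted
  when: service_state == 'configured'
```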
Ansible Configuration Objects
Now that we know the configuration sources, we can focus on which are the configuration objects and formats supported by Ansible - these are:
- strings - mind that strings containing only numbers or only booleans are automatically mapped to numbers or booleans
- dictionaries
- lists
- lists of dictionaries
These objects are most of the time provided using YAML format, despite JSON and INI formats being supported if necessary.
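In YAML terms, the four kinds of objects look as follows (all names and values are illustrative):

```yaml
# string - a value such as "14" or "true" would be mapped to a number or a boolean
postgresql_backups_dir: "/srv/backups"

# dictionary
postgresql_service:
  name: postgresql
  port: 5432

# list
postgresql_extensions:
  - pg_stat_statements
  - postgis

# list of dictionaries
postgresql_users:
  - { name: app1, role: readwrite }
  - { name: app2, role: readonly }
```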
The Solution
We are ready to draw up a solution that fits every requirement we have defined so far: instead of just describing the solution, in this section I'm providing a fully functional example, describing the rationale behind each specific choice.
A good use case that hits every requirement is delivering PostgreSQL on target hosts. This use case indeed requires:
- to be able to specify the PostgreSQL version to install
- to be able to provide host specific performance tweaks
- to be able to provide cluster specific topological settings, such as rules for the system firewall
- to be able to create database specific for single deliverables
As per best practices, instead of writing a full playbook with all the necessary tasks on our own, we first have a look at the online Ansible Galaxy to see if there is an already available, well supported Ansible role.
A quick check shows that the "galaxyproject.postgresql" Ansible role exists: since it looks like an official (and so well maintained) one, we can just use this off-the-shelf Ansible role, including it in the playbook we are writing.
Let's install the "galaxyproject.postgresql" Ansible role using the "ansible-galaxy" command line utility as follows:
ansible-galaxy role install -p /ansible/roles galaxyproject.postgresql
the above statement runs "ansible-galaxy", specifying to install the downloaded role in the "/ansible/roles" directory within the container ("-p" command line parameter).
After installing it, have a look at the "ansible/roles/galaxyproject.postgresql/README.md" file to see the variables that can be passed to the role and that therefore must be put into the configuration structure we are about to describe and implement.
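A minimal sketch of a playbook wiring the role to the PostgreSQL hostgroup (the playbook file name is an assumption; the role's variables come from the inventory structure we build below):

```yaml
# pgsql.yml - minimal playbook applying the Galaxy role
- name: Deploy PostgreSQL
  hosts: pgsql_ca_up1
  become: true
  roles:
    - galaxyproject.postgresql
```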
Initial Bare-minimum Configuration
We must of course start with the bare minimum configuration - first we must create the inventory's directory:
mkdir -m 755 ansible/environment
once done, we configure the bare minimal "host_vars" - create the host_vars sub-directory:
mkdir -m 755 ansible/environment/host_vars
and define the "ansible_connection" host_var only for the "localhost" target host - just create the "ansible/environment/host_vars/localhost.yml" file with the following contents:
ansible_connection: local
as we said, this is a special variable that tells Ansible not to use any kind of connection while running plays affecting localhost as target, but to just run the tasks.
Addressing IT Operations And Networking Team's Needs
Let's start by seeing how to address the IT Operations team and the Networking team common needs.
Infrastructure's Slice Targets
The first need is to have meaningful targets that enable running statements on a large scale.
From an operational perspective, an infrastructure can typically be sliced in different purpose-specific ways:
- target hosts belonging to the same availability zone
- target hosts belonging to the same datacenter
- target hosts belonging to the same cluster
All of the above can be further grouped by OS family.
When doing operational tasks, it is relatively common to perform operations on hosts belonging to these slices or to subsets generated by their intersections.
The best practice is, besides defining hostgroups for every cluster (or you won't be able to define group_vars), also providing hostgroups for each of the above kinds of slices: this enables being extremely quick when performing time sensitive operations such as shutting down every host of an availability zone during an incident, quickly cordoning a datacenter, and so on.
As an example, create the "ansible/environment/hosts" inventory file with the following contents:
localhost
# security tier: 1, environment: prod, os: unix/linux, svc: load-balancers, cluster: 0
[lb_ca_up1]
lb-ca-up1a001
lb-ca-up1b002
# security tier: 1, environment: prod, os: unix/linux, svc: git, cluster: 0
[git_ca_up1]
git-ca-up1a001
git-ca-up1b002
# security tier: 1, environment: prod, os: unix/linux, svc: postgresql, cluster: 0
[pgsql_ca_up1]
pgsql-ca-up1a001
pgsql-ca-up1b002
pgsql-ca-up1c003
# security tier: 1, environment: prod, os: unix/linux, availability-zone: a
[ca_up1a]
pgsql-ca-up1a001
git-ca-up1a001
lb-ca-up1a001
# security tier: 1, environment: prod, os: unix/linux, availability-zone: b
[ca_up1b]
pgsql-ca-up1b002
git-ca-up1b002
lb-ca-up1b002
# security tier: 1, environment: prod, os: unix/linux, availability-zone: c
[ca_up1c]
pgsql-ca-up1c003
# os: unix/linux, environment: prod, availability-zone: a
[ca_upNa:children]
ca_up1a
# os: unix/linux, environment: prod, availability-zone: b
[ca_upNb:children]
ca_up1b
# os: unix/linux, environment: prod, availability-zone: c
[ca_upNc:children]
ca_up1c
# os: unix/linux
[ca_up:children]
ca_upNa
ca_upNb
ca_upNc
We can limit the actual targets by specifying the "--limit" option, using pattern matching and booleans.
To test it, and so verify the actual targets, we can run the "ansible" statement with the "--list-hosts" option.
For example, to limit the targets to every unix machine in the "ca" datacenter not belonging to the availability zone "a":
ansible all -i /ansible/environment/hosts --list-hosts -l 'ca_up:!ca_upNa'
the output is as follows:
  hosts (4):
    lb-ca-up1b002
    git-ca-up1b002
    pgsql-ca-up1b002
    pgsql-ca-up1c003
Target Specific Settings
A very common practice - not only in Ansible - is to assign labels to objects. I strongly recommend doing this because it provides several benefits, including implementing safety measures to prevent playbooks from running tasks on the wrong hosts by mistake.
For example, it is possible to create the "host_labels" list for the "pgsql-ca-up1a001" host and assign the "postgresql" label - just create the "ansible/environment/host_vars/pgsql-ca-up1a001.yml" file with the following contents:
host_labels:
- postgresql
To exploit it, it is enough to add a "when" condition to the PostgreSQL related playbook to run tasks only when the "host_labels" list contains the "postgresql" label.
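A sketch of such a guard, assuming the playbook applies the role through a task (the "default([])" filter keeps the condition safe on hosts where the list is not defined):

```yaml
- name: Apply the PostgreSQL role only on labeled hosts
  ansible.builtin.include_role:
    name: galaxyproject.postgresql
  when: "'postgresql' in host_labels | default([])"
```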
Both the IT Operations Team and the Networking Team need to deliver settings typically related to the infrastructure's topology, such as:
- network devices settings
- operating system settings (sysctl tweaks, system-firewall rules, corporate-wide PKI's trust-stores, ...)
- service-specific settings, such as performance tweaks, cluster-level settings and so on
These settings are then merged with templates to generate the actual managed configuration files.
These kinds of settings are perfect candidates for being stored within the inventory as host_vars or group_vars.
To see it in action, create the "group_vars" sub-directory as follows:
mkdir -m 755 ansible/environment/group_vars
In our Lab, as an example, we install PostgreSQL 14: a quick look at the "ansible/roles/galaxyproject.postgresql/README.md" file shows us that the PostgreSQL version to install is set by the "postgresql_version" variable, so we assign the value "14" to the "postgresql_version" group_var of the "pgsql_ca_up1" hostgroup, so as to have it inherited by every host belonging to it - just create the "ansible/environment/group_vars/pgsql_ca_up1.yml" file with the following contents:
postgresql_version: 14
We can of course also add other settings, such as:
postgresql_backups_dir: "/srv/backups"
In this case "postgresql_backups_dir" is the path to the default directory where to store PostgreSQL's database backups.
We can also add the system firewall rules:
firewall:
  # from subnet apps_p1_a_s0 to service pgsql
  # ticket: NET-54271
  - rule: apps_p1_a_s0_to_pgsql
    src_ip: 192.168.254.0/24
    service: postgresql
    action: accept
    state: enabled
in this example we are granting access to the PostgreSQL service from the whole 192.168.254.0/24 subnet.
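A hypothetical consumer of this list, assuming the target hosts run firewalld and the "ansible.posix" collection is available; the mapping from the list's fields to a rich rule is a sketch, not the author's actual implementation:

```yaml
- name: Apply the declared system firewall rules
  ansible.posix.firewalld:
    rich_rule: "rule family=ipv4 source address={{ item.src_ip }} service name={{ item.service }} {{ item.action }}"
    permanent: true
    immediate: true
    state: "{{ item.state }}"
  loop: "{{ firewall | default([]) }}"
```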
Addressing Database Administrators Team's Needs
Database Administrators very often do not really get along with automation tools such as Ansible (not all of them, of course) - in my personal experience, most of them like to operate the old way, directly on their systems.
A way to let them operate as they like while still having automation is agreeing that, instead of directly editing the database engine's configuration file, they modify a configuration file that Ansible uses to generate the actual configuration file used by the database engine, and then just run the playbook to apply the changes.
This can be achieved by setting local facts directly on the database hosts.
Local Facts
Local facts are JSON or INI formatted files containing settings that Ansible reads directly from the target host. It is possible to provide JSON formatted files compatible with the Ansible roles, so that the only thing the Database Administrators have to do is configure these JSON files and ask someone from the IT Operations Team to run Ansible on their behalf.
This approach is often a win-win:
- the central inventory does not grow with these settings
- DBAs do not need to learn how to write Ansible roles and playbooks: they just need to learn the syntax of these JSON files
As an example, we can create some local facts on the "pgsql-ca-up1a001" host: connect to the host and create the "/etc/ansible/facts.d" directory:
sudo mkdir -m 755 /etc/ansible /etc/ansible/facts.d
now create the "/etc/ansible/facts.d/postgresql.fact" facts file with the following contents:
{
  "conf": [
    { "listen_addresses": "'*'" },
    { "max_connections": 50 }
  ],
  "pg_hba_conf": [
    "host all all all md5"
  ]
}
we can check the outcome by running:
ansible pgsql-ca-up1a001 -m ansible.builtin.setup -a "filter=ansible_local"
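Local facts surface under the "ansible_local" variable, named after the fact file: our "postgresql.fact" file becomes "ansible_local.postgresql". Assuming the role accepts "postgresql_conf" and "postgresql_pg_hba_conf" variables (check the role's README before relying on these names), a sketch of the group_vars glue could be:

```yaml
# map the DBA-maintained local facts onto the role's variables;
# "default([])" keeps hosts without fact files working
postgresql_conf: "{{ ansible_local['postgresql']['conf'] | default([]) }}"
postgresql_pg_hba_conf: "{{ ansible_local['postgresql']['pg_hba_conf'] | default([]) }}"
```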
Playbooks
Playbooks are a complex topic that deserves a thorough explanation in a dedicated post, again covering best practices, caveats and pitfalls - there is no room for a bottom-up approach without a well defined standard, especially about naming and structure, unless you like working in a mess (sadly, I have seen a lot of messes around).
Playbooks can be classified by grouping them by purpose - there exist:
- delivery targeted playbooks - these are playbooks aimed at delivering configurations (such as firewall rules) or configuration items (such as database schemas) to existing services
- deploy targeted playbooks - these are playbooks aimed at deploying services
These playbooks can be further classified as follows:
- playbooks aimed at deploying a service without dependencies
- playbooks aimed at deploying a service with dependencies - for example an application that requires a database schema. In this case it is very convenient to develop them so that they are configured using a blueprint: this eases configuration management, since everything is in the same place and the configuration structure itself enables an easy understanding of the overall service details. In addition to that, it makes it very easy to deploy other service instances by simply creating a copy of the blueprint and modifying the settings as necessary.
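As an illustrative sketch (every key here is made up), such a blueprint could be a single YAML document describing the whole service and its dependencies:

```yaml
# hypothetical blueprint for a service with a database dependency
service:
  name: billing-app
  instances: 2
  database:
    engine: postgresql
    schema: billing
  firewall:
    - rule: apps_to_billing
      src_ip: 10.0.0.0/8
      service: https
      action: accept
      state: enabled
```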
The "Ansible playbooks best practices: caveats and pitfalls" post goes through all of this in detail.
Footnotes
Structuring the Ansible inventory the proper way is the first step towards proficiently working with Ansible - in this post we saw an example that addresses a quite complex real-life use case. But as usual things must always be tailored to the specific needs, so always use your own brain - take enough time to gather requirements, challenge them, define standards and then do a proper design.
I have seen a lot of semi-official and even official, blazoned frameworks that are very specific in tracking the progress of development (“more governance for everyone!” - TM) while completely forgetting the design process (they jump from gathering user stories directly to development) - use that approach with Ansible, and you will soon realize how fragile and risky blindly following them is.