One of HAProxy's strengths is that it is not very strict about its configuration structure, making it possible to write configurations that fit very messy scenarios. Sadly, this is also its biggest maintainability pitfall: especially if you want to automate its configuration using automation tools and templates, it is up to you to define the standard configuration structure that best fits your needs.
The "HAProxy Tutorial - A Clean And Tidy Configuration Structure" post is an insight providing guidelines on how to structure the HAProxy configuration in an effective way, promoting the sharing of floating IP addresses and using easy to edit maps for load balancing the traffic forwarding it to the correct destination. In addition to that, it also provides a way for splitting the statistics so to have them displayed only for the scope of each specific balanced service instead of as a whole.
Why Best Practices Are Needed
In an ideal world everything is always perfect and works as expected, but in real life I have too often seen poorly designed solutions, not based on standard patterns and, even worse, not compliant with best practices. This is a very bad attitude, often due to a poor understanding of Agile principles and frameworks, and often poorly justified by blaming too short milestones. The only sure outcome of this attitude is a dramatic increase of the operational risks, pushing all the responsibilities onto the operational teams.
Sadly, this messy approach often has a severe impact on the tidiness of the HAProxy configuration too. HAProxy has intentionally been designed to enable combining directives in multiple ways: it is of course a handy feature, since it enables writing configurations that can address very complex and dirty scenarios, but the drawback of this versatility is that the more the HAProxy settings grow, the more the configuration becomes complex, difficult to understand and error prone.
Best Practices
Before going further with the post's contents, I want to state what, in my personal experience, are the best practices:
- A load balancer is just, ... a load balancer: it is not a firewall and, most of all, it is not a web application firewall - it is best to run these specific security tools on other hosts or appliances. You can of course perform minimal protocol modifications on the load balancer, such as adding headers and setting cookies, but avoid overly complex setups that just hurt performance and make the whole setup difficult to understand and maintain (a bad habit that also raises the risk of human error, by the way).
- Set up the load balancer in a high-availability fashion, so that it does not itself become a single point of failure.
- Try to keep the number of floating IP addresses small: always try to share the same floating IP address, dynamically choosing the load balancing destination using matching criteria such as the source IP address, the TLS Server Name Indication (SNI) set by the client, the HTTP Host header and so on.
- Always try to make backend routing decisions using lower level protocols - for example, when dealing with HTTPS, unless you really need to inspect it (for example for adding headers or cookies), use the TLS Server Name Indication (SNI) set by the client as the matching criterion for choosing the backend.
- Size and fine-tune your system to boost performance, and define connection limits carefully, evaluating your system's capacity to avoid resource starvation.
- When dealing with TLS, enable OCSP stapling whenever possible, especially if the balanced services are publicly available.
- Design a standard configuration structure using patterns: besides making the configuration easier to understand, this also enables generating it automatically from manifests and templates.
This post mostly focuses on providing a standard, pattern-based approach that can be used as a guideline for structuring the configuration of an HAProxy that can bear a huge number of balanced services: the obvious advantage is that everything remains simple to understand, manage and even automate. Of course there can be individual corner cases you may have to address manually, but as said they must be very few (otherwise the problem is not technical, it is your solution architect), and you must try to address them without deviating too much from the proposed standard structure and design pattern.
Basic Concepts
Defining a standard always starts from defining some concepts and agreeing on some naming, just to avoid misunderstandings.
Cluster Name
Generally speaking, a cluster is just a group of hosts. The same way a hostname is assigned to each cluster member, the best practice is to assign a name to the cluster too: this is really helpful, since this name can be used for example as an identifier for groups of settings common to the whole cluster when dealing with automations, or just as a name to put in the CMDB.
The best practice with cluster names is to derive them from the hostnames, creating a summary name using only the common part. For example, if we have the hostnames:
- haproxy-ca-up1a001
- haproxy-ca-up1a002
a few words about my naming scheme:
- "haproxy" is the technology running on the host
- "ca" is the corporate branch
- "u" stands for Unix-like - it is used for a quick guessing of the responsible team by simply looking at the hostname
- "p1" stands for production, security tier 1
- "a" is the datacenter
- "0" is the cluster number
- "01" is the progressive number
having said that, the summary name - that is the cluster name - is:
haproxy-ca-up1a-0
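When dealing with automations, the cluster name can also be derived programmatically from any member's hostname - here is a minimal shell sketch, assuming the naming scheme above (the hostname ends with the cluster number followed by the two-digit progressive number) and GNU sed:
echo haproxy-ca-up1a001 | sed -E 's/([0-9])[0-9]{2}$/-\1/'
# prints haproxy-ca-up1a-0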
High-Available IP Addresses
Floating IP addresses are high-available IP addresses: although they can be used only by one node at a time, they are actually owned by multiple nodes, enabling them to float to another node if the one currently owning them crashes or becomes unreachable. The temporary assignment and floating of the IP addresses is often implemented using the VRRP protocol, for example by running services such as Keepalived.
Just as an example, let's suppose the following high-available IP addresses:
- 10.100.100.250
- 10.100.100.251
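Just to give an idea of how these floating IP addresses can be implemented, here is a minimal Keepalived sketch - the interface name, virtual_router_id and priority are assumptions you must adapt to your environment; the full setup is covered in the "High Available HAProxy Tutorial With Keepalived" post:
vrrp_instance haproxy_vips {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 200
    advert_int 1
    virtual_ipaddress {
        10.100.100.250/24
        10.100.100.251/24
    }
}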
High-Available IP's FQDNs
High-available FQDNs are Fully Qualified Domain Names assigned (using a DNS A record) to the floating IP addresses. Since these IP addresses do not belong to a single node, but to the whole HAProxy cluster, their name must make it possible to guess which cluster they belong to.
For this reason, the best practice is to add a progressive number to the existing cluster name, and then add the domain.
For example, if the cluster name is:
haproxy-ca-up1a-0
and the domain name is:
p1.carcano.corp
then the floating High Available IP's FQDNs can be:
- haproxy-ca-up1a-0-1.p1.carcano.corp - in DNS configure an "A" record resolving to 10.100.100.250
- haproxy-ca-up1a-0-2.p1.carcano.corp - in DNS configure an "A" record resolving to 10.100.100.251
and so on.
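As a sketch, assuming a BIND-style zone file for the "p1.carcano.corp" domain, the corresponding records would look like this:
; excerpt from the p1.carcano.corp zone file
haproxy-ca-up1a-0-1    IN    A    10.100.100.250
haproxy-ca-up1a-0-2    IN    A    10.100.100.251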
Backend
A backend is an object aimed at spreading the traffic over a pool of instances of the same service running on different hosts in different availability zones. The backend is also configured to run health checks, excluding failed service instances from the pool and admitting them back again when they resume operations. Backends of course have other fine-grained tunings, such as performance/rate limiting, sticky sessions and much more.
High-Available Service
It is the service defined on the load balancer: it provides load balancing and high availability by spreading the traffic over the backend's member services. It is set up by binding a listener to a single high-available IP address and port (or a set of contiguous ports) and providing all the settings necessary for balancing and (if necessary) inspecting the traffic. If the protocol enables selecting backends using matching criteria (such as the TLS Hello Server Name Indication (SNI), the HTTP Host header, ...), then one listener can be shared among multiple high-available services; otherwise it is necessary to define an additional instance of the listener, binding it to another high-available IP address.
High-Available Services FQDN
It is the FQDN of each high-available service. For example, if load balancing a git instance, the high-available FQDN can be:
git-ca-up1a-0-1.p1.carcano.corp
The high-available FQDN must resolve to the high-available IP's FQDN of the high-available IP address the high-available service is bound to. In DNS terms, continuing from the previous examples, if the high-available IP address to which the high-available service is bound is "10.100.100.250", then you must add a DNS CNAME record "git-ca-up1a-0-1.p1.carcano.corp" resolving to "haproxy-ca-up1a-0-1.p1.carcano.corp".
Using a CNAME has the advantage that if the high-available IP ever changes, it is only necessary to change the high-available IP's FQDN: all the high-available services' FQDNs will be preserved, since they resolve to the high-available IP's FQDN. Of course it has the disadvantage of two distinct DNS calls when resolving the high-available service FQDN. You can of course directly use A records also for the high-available services' FQDNs, but you will lose this maintainability benefit - it's up to you to decide which way is best for you.
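Continuing the BIND-style zone file sketch from before, the CNAME record would look like this:
; excerpt from the p1.carcano.corp zone file
git-ca-up1a-0-1    IN    CNAME    haproxy-ca-up1a-0-1.p1.carcano.corp.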
HAProxy Statistics
HAProxy provides the nice feature of displaying its statistics in a web UI: the feature is very simple to activate - it is just a matter of defining a frontend and enabling statistics. But here also comes the operational shortcoming: especially on setups with a huge number of balanced services, there can be thousands of objects with a statistics table, leading to a very long page. Of course this page has a filtering textbox, but it is not a handy way of working at all: you have to type the exact name of the listener, frontend or backend you want to filter, multiple times, until you can eventually figure out the statistics of the single flow you are interested in. Having statistics pages focused on each specific balanced service would be much more handy.
Luckily there is a way to limit the scope of the objects shown, so it is possible for example to add statistics to the backends themselves - this way the statistics will display only the objects traversed by the flow that leads to that backend. But sadly there is still a problem: pass-through flows (such as HTTPS pass-through) traverse "tcp mode" backends, whereas statistics can be enabled only on "http mode" backends. The way out is having a last resort "http mode" backend dedicated only to statistics, reachable only when the traffic does not match any other backend - so no SNI or Host FQDN match. This is achievable only with an FQDN not related to any high-available service, such as the high-available IP's FQDN: since it is related to the HAProxy cluster itself, and never to any high-available service, it will never match any backend but the last one, that is the one we reserve for statistics.
To clarify this concept, let's assume we have the following IMAPS high-available services (port 993):
- imaps.fancytools.org
- imaps.foobar.net
Both of them are bound to and share the high-available IP having haproxy-ca-up1a-0-1.p1.carcano.corp as high-available IP's FQDN. When connecting with a mail client, either to "imaps.fancytools.org" or to "imaps.foobar.net", the connection is routed to the backend matching the specified SNI.
But if we connect to the same port, this time to "haproxy-ca-up1a-0-1.p1.carcano.corp" using a web browser (so either HTTP or HTTPS), it does not match any SNI, so the connection falls through to the statistics backend, showing the statistics data for that specific listener and all the backends that can be traversed through it.
HAProxy's Configuration Sections
Describing all the configuration settings in this post would be pointless - it would be a copy of the original manual - so I'm just providing the link to the official HAProxy manual, where it is possible to find the settings for every HAProxy version. Remember: my aim with this post is just to provide a style, with a pattern-based standard configuration structure.
The HAProxy configuration is generated by concatenating one or more settings files divided into sections.
Global Section
The global section is used for configuring settings of the HAProxy service itself, such as the logging facility to use, the path of the PID file, the chroot path and so on.
Defaults Section
The defaults section is used for providing defaults that are inherited by all the Proxy sections.
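Just to fix ideas, here is a minimal sketch of both sections - every value is just a placeholder you must size against your system's actual capacity, as recommended in the best practices above:
global
log /dev/log local0
chroot /var/lib/haproxy
maxconn 20000

defaults
log global
timeout connect 5s
timeout client 30s
timeout server 30s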
Proxy Sections
HAProxy provides two distinct styles for configuring a Proxy:
- defining a Listener
- defining a Frontend with one or more Backends - mind that Backends may also be shared among multiple Frontends
High Available Service's Traffic Flow - From The Listener To The Last Resort Backend
Before going further, we must describe the stages of the traffic flowing through HAProxy: the traffic coming from clients requesting a load balanced service is processed by the following objects, in the flow order listed below:
- Listener
- Proxy
- Statistics backend - it displays the statistics reports about every listener, every proxy and every backend that can be reached during this flow, so not only the ones actually traversed.
Configuration Directory Structure
We can now define the first part of the configuration standard: the configuration directory structure. In the "High Available HAProxy Tutorial With Keepalived" post, we modified the HAProxy Systemd service unit not only to download the settings from a remote Git repository, but also to load its settings from several "/etc/haproxy" sub-directories, creating them as necessary if they are missing.
These directories are:
ca
directory containing the Certification Authority certificates (for example, the bundles used to validate the certificates presented by TLS enabled backends)
certs
directory containing the TLS server certificates (and their keys) served by the HTTPS proxies - this is the directory referenced by the "crt" keyword of the "bind" directive
listeners
directory containing the definitions of the TCP (or TCPS) listeners. Each listener must be defined in a dedicated file with the filename starting by the port number, followed by (depending on the listener type) either "-tcp" or "-tcps", and ending by ".cfg". Examples of file names are "6379-tcp.cfg", "6443-tcps.cfg" and so on.
proxies
directory containing the definitions of the HTTP (or HTTPS) proxies: these are connection-terminating listening endpoints bound only to the loopback IP address (127.0.0.1), receiving only the traffic forwarded by the listeners. Each proxy must be defined in a dedicated file with the filename starting by the port number, followed by (depending on the proxy type) either "-http" or "-https", and ending by ".cfg". Examples of file names are "80-http.cfg", "443-https.cfg" and so on.
maps
directory containing the map files dedicated to each specific listener instance or proxy. These files are just look-up tables used to choose the backend matching the lookup criteria. Each map file must have the same name as the listener or proxy that uses it for lookups. Examples of file names are "80-tcps.1", "80-http", "443-https" and so on (the ".1" suffix identifies which instance of the listener using port 80 the map is used by).
backends
directory containing the definitions of the backends. Each backend must be defined in a dedicated file with the filename starting by a meaningful name that makes it easy to guess which are the pool's member servers, and ending by ".cfg". Examples of file names are "www-ca-up1a-0.cfg", "git-ca-up1a-0.cfg" and so on.
stats
directory containing the definitions of the statistics backends. Each statistics backend must be defined in a dedicated file with the filename starting by the statistics backend's name and ending by ".cfg". Examples of file names are "80-tcp-stats.cfg", "443-tcps-stats.cfg" and so on.
In addition to the above directories, there are:
- the "haproxy.cfg" file, containing only the "global" and "default" sections
- the "crt-list.txt" containing the options to apply to each TLS certificate (since HAProxy 2.8r1 or above you can also enable OCSP from here).
The Configuration Pattern
As we saw with the configuration directory structure, we are using a configuration pattern that groups components by purpose and type, leveraging a flow going from the TCP (or TCPS) listener through the HTTP (or HTTPS) proxies, the backends and the statistics backends, using map files for lookups.
Listener
Listeners are the actual endpoints clients connect to when requesting load balanced services: they must be bound only to the floating IP addresses - mind that, while it is OK to bind multiple floating IP addresses, listeners must be bound to one port or to a range of contiguous ports only (this is not a real HAProxy constraint, but not respecting this rule may lead to messy setups). In order to perform the backend selection on both layer 4 protocols (such as TCP) and layer 7 protocols (such as HTTP), listeners must always be set in "tcp mode", and always forward to an HTTP proxy as the last resort.
Listeners can process plain text TCP connections or TLS protected ones: in the latter case we refer to them as TCPS listeners.
They behave in slightly different ways:
- TCP listeners can use only one backend, since it is not possible to write matching rules for dynamic backend selection
- TCPS listeners can instead use multiple backends.
They also have a slightly different conditional flow:
- TCP listeners check whether the incoming connection is an HTTP connection: if it isn't, they redirect the flow to the backend using the map file related to the listener itself
- TCPS listeners check whether the incoming connection is a TLS one: if so, they check the Server Name Indication (SNI) exchanged during the TLS Hello and redirect the flow to the backend using the map file related to the listener itself
So both of them rely on a dedicated map file to select the proper backend - the only difference is that a TCP listener's map file contains only one row, with the "*" character used as a wildcard, to let people know that the backend in the map file matches any request. Conversely, TCPS listeners' map files contain one entry for each Server Name Indication (SNI), with the related backend to forward the traffic to.
The naming standard for listeners is a name starting with the port number, followed by (depending on the listener type) either "-tcp." or "-tcps.", followed by the instance number for that specific port. For example "8012-tcp.1", "8012-tcp.2", "8054-tcps.1" and so on.
As we just said, it is not possible to have dynamic backend selection on TCP listeners, because it is not possible to write matching criteria. When there is such a need, the only way out is to have distinct listeners bound to the same port - this of course requires having multiple dedicated floating IP addresses. This leads to multiple definitions for the same listener so, to avoid naming collisions, we must always put the instance's progressive number in the listener's name.
Example TCP Listener
This example TCP listener is bound to the high-available IP address 10.100.100.250 on port 6379. Please mind that this is a TCP listener, so it supports only unencrypted connections. Since the IP address is shared among multiple high-available services, the incoming connection can be either TCP or HTTP, depending on the high-available service type. Regarding this, mind that since HAProxy cannot inspect protocols other than HTTP, we can have only one high-available TCP service (every non-HTTP connection is always forwarded to the same backend). When dealing with HTTP high-available services, the traffic is instead forwarded to the HTTP proxy, which takes care of forwarding it to the proper backend.
The example snippet requires the "proc.tcp_listener_default_backend" variable to have already been defined: just add the following entry to the "global" section in the main "haproxy.cfg":
set-var proc.tcp_listener_default_backend str("*")
Here is the example snippet - create it as "/etc/haproxy/listeners/6379-tcp.cfg":
listen 6379-tcp.1
mode tcp
description bound to 10.100.100.250:6379
bind 10.100.100.250:6379
option tcplog
# inspection of the protocol to detect if it is an HTTP connection
tcp-request inspect-delay 5s
tcp-request content accept if HTTP
# if it is NOT an HTTP connection, lookup the backend to use.
# in this case, since the protocol cannot be inspected, we don't really have a matching criteria,
# so the lookup is for the "*" entry - that is the only entry of the table
use_backend %[var(proc.tcp_listener_default_backend),lower,map(/etc/haproxy/maps/6379-tcp.1)] unless { req_proto_http }
# if we are here, the incoming connection is HTTP, so we fall back to the
# HTTP proxy running on the same port of this listener
server to_proxy 127.0.0.1:6379 check
You are certainly wondering why we are using a lookup table that can have one and only one entry for selecting the only suitable TCP backend - it is to comply with the listener's design pattern: TCPS listeners indeed use the same mechanism, but can really have multiple entries in the lookup table. In addition, the pro of this approach is that when you want to alter the destination backend you always operate only on the map file: when dealing with automations, it is much easier and less risky to alter a row in a map than an entry in a configuration settings file.
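As a side note, a map entry can even be changed at runtime, without reloading HAProxy, through the Runtime API - here is a minimal sketch, assuming a "stats socket /var/run/haproxy.sock mode 600 level admin" entry in the "global" section and a hypothetical alternate backend name; mind that this alters only the in-memory copy of the map, so to make the change persistent you must also edit the map file itself:
echo "set map /etc/haproxy/maps/6379-tcp.1 * redis-ca-up1a-1.p1.carcano.group:6379" | socat stdio /var/run/haproxy.sock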
Example TCPS Listener
This example TCPS listener is bound to the high-available IP address 10.100.100.250 on port 6443, supporting both TLS pass-through and HTTPS terminated connections: the former are directly forwarded to the related backend, whereas the latter are forwarded to the HTTPS proxy running on the same port the listener is bound to. If the incoming connection is plain text and uses the HTTP protocol, it is instead immediately forwarded to the statistics backend.
Create it as "/etc/haproxy/listeners/6443-tcps.cfg":
listen 6443-tcps.1
mode tcp
description bound to 10.100.100.250:6443 - SNI matching using /etc/haproxy/maps/6443-tcps.1 for lookups
bind 10.100.100.250:6443
option tcplog
# inspection of the TLS hello from the client to get the Server Name Indication (SNI)
tcp-request inspect-delay 5s
tcp-request content capture req.ssl_sni len 100
tcp-request content accept if { req_ssl_hello_type 1 }
# since this is a TLS endpoint, every connection must use TLS.
# the only expected plain text connections are the HTTP connections for viewing statistics
# In this case the traffic is immediately forwarded to the statistics backend
use_backend 6443-tcps-stats if !{ req.ssl_sni -m found }
# if we are here, the connection is using TLS and the client provided the Server Name Identifier (SNI)
# so we lookup for a TLS pass-through backend matching the supplied SNI
use_backend %[req.ssl_sni,lower,map(/etc/haproxy/maps/6443-tcps.1)] if { req.ssl_sni -m found }
# if we are here, the supplied SNI didn't match any pass-through backend, so we fall back to the
# HTTPS proxy running on the same port of this listener
server to_proxy 127.0.0.1:6443 check
Proxy
Proxies are endpoints bound only to the loopback IP address (127.0.0.1), receiving only the traffic forwarded by the listeners, and capable of inspecting the specific protocol used by the traffic.
HTTP
HTTP proxies are endpoints bound only to the loopback IP address (127.0.0.1), set in "http mode" so they can be used to inspect HTTP requests. This kind of proxy makes forwarding decisions by inspecting the content of the HTTP Host header, falling back to the stats backend if there are no matches.
Each http proxy must be defined in a dedicated file with the filename starting by the port number, followed by "-http", and ending by ".cfg". Examples of file names are "80-http.cfg", "6379-http.cfg" and so on.
The following example is the snippet of the http proxy continuing the flow of the previous TCP listener example - create it as "/etc/haproxy/proxies/6379-http.cfg":
frontend 6379-http
mode http
description match by Host header using /etc/haproxy/maps/6379-http for lookups
bind 127.0.0.1:6379
http-request set-header X-Forwarded-Proto http
# we look up the backend matching the supplied HTTP Host header
use_backend %[req.hdr(host),lower,map(/etc/haproxy/maps/6379-http)]
# if we are here, the supplied HTTP Host header didn't match any backend, so we fall back to the
# statistics backend
default_backend 6379-tcp-stats
HTTPS
HTTPS proxies behave exactly the same way as HTTP proxies, but they provide a TLS server certificate and terminate the TLS connection - mind that this means that the connections to the backend's pool members will be new connections originating from the HAProxy node itself: this of course has benefits, but it also brings drawbacks, such as the caller client not being able to perform mutual TLS authentication with the pool member.
Each https proxy must be defined in a dedicated file with the filename starting by the port number, followed by "-https", and ending by ".cfg". Examples of file names are "443-https.cfg", "6443-https.cfg" and so on.
The following example is the snippet of the https proxy continuing the flow of the previous TCPS listener example - create it as "/etc/haproxy/proxies/6443-https.cfg":
frontend 6443-https
mode http
description match by Host header using /etc/haproxy/maps/6443-https for lookups
# here we also define the path to the certificates directory (HAProxy takes care
# of selecting the certificate that best matches the Server Name Indication (SNI)
# passed by the client)
# we also specify the path to the crt-list file, which contains TLS settings specific
# to each single certificate - such as enabling OCSP stapling
bind 127.0.0.1:6443 ssl crt /etc/haproxy/certs crt-list /etc/haproxy/crt-list.txt
# we look up the backend matching the supplied HTTP Host header
use_backend %[req.hdr(host),lower,map(/etc/haproxy/maps/6443-https)]
# if we are here, the supplied HTTP Host header didn't match any backend, so we fall back to the
# statistics backend
default_backend 6443-tcps-stats
For the sake of completeness, this is the contents of the "/etc/haproxy/crt-list.txt" file:
/etc/haproxy/certs/carcano.com.pem [alpn h2,http/1.1 ocsp-update on]
Other Protocols Proxies
As we said, HAProxy can inspect only HTTP and HTTPS. Anyway, you can exploit this configuration pattern to chain third party servers into the flow and have them process other protocols.
For example, if you need an IMAP proxy, you can:
- configure a TCP listener on port 143, forwarding non-http connections to port 143 on the loopback interface
- run an NGINX instance bound to the loopback address configured only for proxying IMAP connections
Since the listener is managed by HAProxy, you will have connection statistics among the HAProxy stats - IMAP specific details will instead be available in the NGINX log files.
Since talking about NGINX would be off-topic in this post, there are no further details about it: I'm just mentioning that this trick exists - the HAProxy side of it is sketched below.
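Here is a minimal sketch of the HAProxy side, following the same listener pattern used throughout this post - the "imap-nginx-ca-up1a-0" backend name and the "143-tcp-stats" statistics backend are assumptions, and NGINX is supposed to be bound to 127.0.0.1:143:
listen 143-tcp.1
mode tcp
description bound to 10.100.100.250:143 - IMAP handled by a local NGINX mail proxy
bind 10.100.100.250:143
option tcplog
# inspection of the protocol to tell the actual IMAP clients apart from HTTP requests
tcp-request inspect-delay 5s
tcp-request content accept if HTTP
# non-HTTP connections (the actual IMAP clients) are handed over to NGINX on the loopback
use_backend imap-nginx-ca-up1a-0 unless HTTP
# HTTP connections fall through to the statistics backend as usual
use_backend 143-tcp-stats if HTTP

backend imap-nginx-ca-up1a-0
mode tcp
server nginx 127.0.0.1:143 check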
Map Files
Map files are text based key-value files used by listeners and by HTTP or HTTPS proxies to perform lookups for choosing the backend to forward the incoming request to: when looking up using the criteria specific to that map file, if there is a match, the value - that is, the destination backend's name - is returned.
The lookup uses the following matching criteria:
- a TCPS listener's map file key is the Server Name Indication (SNI) the client must provide at TLS Hello time
- an HTTP or HTTPS proxy's map file key is the HTTP Host header the client must provide in the HTTP request
- a TCP listener's map file key can only be "*", used as a catch-all - this is because in this use case we cannot have a real matching criterion. The obvious consequence is that, conversely from all the other map files, a TCP listener's map file contains only one row, having "*" as the key, followed by the only usable destination backend's name as the value.
To keep their size small even on systems with a huge number of settings, each listener or proxy has its own dedicated map file. Each map file must have the same name as the listener or proxy that uses it for lookups. Examples of file names are "80-tcps.1", "80-http", "443-https" and so on (the ".1" suffix identifies which instance of the listener using port 80 the map is used by).
Example TCP Listener Map File
This snippet is the content of the "/etc/haproxy/maps/6379-tcp.1" used in the TCP Listener example.
* redis-ca-up1a-0.p1.carcano.group:6379
As we said, this map file can contain one and only one entry, and the key must be the wildcard "*": in this example it returns the name of a tcp backend with a pool of Redis servers.
Example TCPS Listener Map File
This snippet is the content of the "/etc/haproxy/maps/6443-tcps.1" used in the TCPS Listener example: in the TCPS use case, the key column (the first one) is the TLS Server Name Identifier (SNI), whereas the value column(the second one) is the name of the related tcps backend.
kubernetes-0.p1.carcano.com kube-ca-up1a-0.p1.carcano.group:6443
kubernetes-1.p1.carcano.com kube-ca-up1a-1.p1.carcano.group:6443
In this example, depending on the specified SNI, it returns the name of a tcps backend with a pool of Kubernetes API servers.
Example HTTP Proxy Map File
This snippet is the content of the "/etc/haproxy/maps/6379-http" used in the HTTP Listener example: in the HTTP use case, the key column (the first one) is the HTTP Host Header, whereas the value column(the second one) is the name of the related http backend.
fancy-xml-0.p1.carcano.com tomcat-ca-up1a-0.p1.carcano.group:6379
fancy-xml-1.p1.carcano.com tomcat-ca-up1a-1.p1.carcano.group:6379
In this example, depending on the specified HTTP Host header, it returns the name of an http backend with a pool of Fancy XML services (a custom REST API for transforming XML documents into various formats).
Example HTTPS Proxy Map File
This snippet is the content of the "/etc/haproxy/maps/6443-https" used in the HTTPS Listener example: in the HTTPS use case, the key column (the first one) is the HTTP Host Header got after HTTPS inspection, whereas the value column(the second one) is the name of the related https backend.
arcgis-0.p1.carcano.com arcgis-ca-up1a-0.p1.carcano.group:6443
In this example, if "arcgis-0.p1.carcano.com" is HTTP Host Header specified as the key to match, it returns the name of a http backend with a pool of ArcGIS services (a comprehensive geospatial platform) .
Backends
Backends are configuration objects providing the settings for spreading the load of a specific service running on a pool of member servers. Their operational mode must be the same as the mode of the object referring to them. This means that:
- backends used by TCP or TCPS listeners must be in "tcp mode"
- backends used by HTTP or HTTPS proxies must be in "http mode"
The backend object does not only enable the user to choose the load balancing algorithm and the health checks: it also provides fine-grained control over a lot of different settings, such as performance/rate limiting, sticky sessions and much more.
Each backend must have a meaningful name that makes it easy to guess which are the pool's member servers, along with the actual target port. For example, if the members are "tomcat-ca-up1a001", "tomcat-ca-up1a002" and "tomcat-ca-up1a003" (all belonging to the "p1.carcano.group" domain), and the target port is https (port 443), the backend name is "tomcat-ca-up1a-0.p1.carcano.group:443". If the target port is http (port 80), the backend name is "tomcat-ca-up1a-0.p1.carcano.group:80", and so on.
Backend definitions must be grouped in files having as filename the backend's name without the part referencing the target port. As usual, the filename must end with ".cfg". For example, the above described "tomcat-ca-up1a-0.p1.carcano.group:80" and "tomcat-ca-up1a-0.p1.carcano.group:443" backends must both be defined in the "tomcat-ca-up1a-0.p1.carcano.group.cfg" file.
TCP Backends
This snippet is the content of the the "/etc/haproxy/backends/redis-ca-up1a-0.p1.carcano.group.cfg" file used in the TCP Listener example:
backend redis-ca-up1a-0.p1.carcano.group:6379
description Redis on cluster redis-ca-up1a-0.p1.carcano.group
mode tcp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send "info replication\r\n"
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
timeout connect 3s
timeout server 20s
server redis-ca-up1a001 10.100.28.10:6379 check inter 1s
server redis-ca-up1a002 10.100.29.10:6379 check inter 1s
server redis-ca-up1a003 10.100.30.10:6379 check inter 1s
This backend defines a pool of Redis services, detecting which one is the master and sending the traffic only to it (the other nodes, despite being operational, will be marked as failed). If the master fails, after the new master election in the Redis cluster, the backend detects the new master, forwarding traffic to it instead of the failed one.
This snippet is the content of the the "/etc/haproxy/backends/kube-ca-up1a-0.p1.carcano.group.cfg" file used in the TCPS Listener example:
backend kube-ca-up1a-0.p1.carcano.group:6443
mode tcp
description Kubernetes masters on cluster kube-ca-up1a-0.p1.carcano.group
option tcp-check
balance roundrobin
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server kube-ca-up1a001 10.100.85.30:6443 check
server kube-ca-up1a002 10.100.86.30:6443 check
server kube-ca-up1a003 10.100.87.30:6443 check
This backend defines a pool of Kubernetes API services, round-robin spreading the load on them.
HTTP Backends
This snippet is the content of the "/etc/haproxy/backends/tomcat-ca-up1a-0.p1.carcano.group.cfg" used in the HTTP Proxy example:
backend tomcat-ca-up1a-0.p1.carcano.group:6379
description Tomcat http on cluster tomcat-ca-up1a-0.p1.carcano.group
mode http
stats enable
stats uri /haproxy
stats scope 6379-tcp.1
stats scope 6379-http
stats scope tomcat-ca-up1a-0.p1.carcano.group:6379
option httpchk HEAD /healthz HTTP/1.0
server tomcat-ca-up1a001 10.100.77.51:6379 check weight 1 maxconn 1024
server tomcat-ca-up1a002 10.100.78.51:6379 check weight 1 maxconn 1024
server tomcat-ca-up1a003 10.100.79.51:6379 check weight 1 maxconn 1024
Besides declaring the pool members and configuring a specific HTTP check along with some fine-grained settings, it also enables statistics for the "/haproxy" URI: this means that if "/haproxy" is specified when connecting to the high-available service, a limited-scope statistics page with the statistics of the "6379-tcp.1" listener, of the "6379-http" proxy and of the "tomcat-ca-up1a-0.p1.carcano.group:6379" backend itself gets displayed.
This snippet is the content of the "/etc/haproxy/backends/arcgis-ca-up1a-0.p1.carcano.group.cfg" used in the HTTPS Proxy example:
backend arcgis-ca-up1a-0.p1.carcano.group:6443
description ArcGIS https on cluster arcgis-ca-up1a-0.p1.carcano.group
mode http
stats enable
stats uri /haproxy
stats scope 6443-tcps.1
stats scope 6443-https
stats scope arcgis-ca-up1a-0.p1.carcano.group:6443
option httpchk HEAD /healthz HTTP/1.0
server arcgis-ca-up1a001 10.100.67.65:6443 check weight 1 maxconn 1024 ssl verify none
server arcgis-ca-up1a002 10.100.68.65:6443 check weight 1 maxconn 1024 ssl verify none
server arcgis-ca-up1a003 10.100.69.65:6443 check weight 1 maxconn 1024 ssl verify none
This backend, besides using TLS, works exactly the same way as the previous one: besides declaring the pool members and configuring a specific HTTP check along with some fine-grained settings, it also enables statistics for the "/haproxy" URI: this means that if "/haproxy" is specified when connecting to the high-available service, the limited-scope statistics page with the statistics of the "6443-tcps.1" listener, of the "6443-https" proxy and of the "arcgis-ca-up1a-0.p1.carcano.group:6443" backend itself gets displayed.
Statistics Backends
Statistics backends are "http mode" backends configured with the sole purpose of displaying the statistics of every listener bound to a specific port, along with every proxy and backend that can be reached within the traffic flow originating from it.
The name of a statistics backend is derived by combining the port number of the listeners it refers to with the listeners' protocol and a trailing "-stats". For example, the name of the backend for the listeners "6379-tcp.1", "6379-tcp.2" and all the other possible instances using port 6379 and protocol "tcp" is "6379-tcp-stats". Each statistics backend must be defined in a dedicated file with the filename starting by the statistics backend's name and ending by ".cfg". Examples of file names are "6379-tcp-stats.cfg", "6443-tcps-stats.cfg" and so on.
Since the purpose is showing the statistics of every configuration object traversed by the flow starting from the linked listeners, then, besides the listener itself and the http (or https) proxy, it must provide the statistics of every backend listed in the map files of the listener and of the http (or https) proxy traversed.
To complete the previous example, create the "/etc/haproxy/stats/6379-tcp-stats.cfg" file with the definition of the backend assembling statistics for the traffic incoming on port 6379, protocol "tcp" - in our example the only listener matching this port and protocol is the "6379-tcp.1" listener:
backend 6379-tcp-stats
description statistics for 6379-tcp
mode http
stats enable
stats uri /haproxy
stats scope 6379-tcp.1
stats scope 6379-http
stats scope redis-ca-up1a-0.p1.carcano.group:6379
stats scope tomcat-ca-up1a-0.p1.carcano.group:6379
stats scope tomcat-ca-up1a-1.p1.carcano.group:6379
As you see, besides the listeners statistics ("6370-tcp.1"), it also shows
- the backend for non-HTTP traffic traversing the "6379-tcp.1" listener ("redis-ca-up1a-0.p1.carcano.group:6379")
- the statistics of the only http proxy that can be traversed ("6379-http")
- the statistics of the HTTP backends that can be traversed from the "6379-http" proxy ("tomcat-ca-up1a-0.p1.carcano.group:6379" and "tomcat-ca-up1a-1.p1.carcano.group:6379")
Then, create the "/etc/haproxy/stats/6443-tcps-stats.cfg" file with the definition of the backend assembling statistics for the traffic incoming on port 6443, protocol "tcps" - in our example the only listener matching this port and protocol is the "6443-tcps.1" listener:
backend 6443-tcps-stats
description statistics for 6443-tcps
mode http
stats enable
stats uri /haproxy
stats scope 6443-tcps.1
stats scope 6443-https
stats scope kube-ca-up1a-0.p1.carcano.group:6443
stats scope kube-ca-up1a-1.p1.carcano.group:6443
stats scope arcgis-ca-up1a-0.p1.carcano.group:6443
As you see, besides the listeners statistics ("6443-tcps.1"), it also shows
- the backends for TLS-passthrough traffic traversing the "6443-tcps.1" listener ("kube-ca-up1a-0.p1.carcano.group:6443" and "kube-ca-up1a-1.p1.carcano.group:6443")
- the statistics of the only https proxy that can be traversed ("6443-https")
- the statistics of the HTTPS backend that can be traversed from the "6443-https" proxy ("arcgis-ca-up1a-0.p1.carcano.group:6443")
Footnotes
Here ends our tutorial on structuring HAProxy's configuration in a clean and tidy way: the proposed standard still has a few shortcomings, such as not being able to use both plain text and TLS traffic on the same endpoint - this can actually be done by increasing the complexity a little, at the cost of a negative maintainability impact, so I preferred to avoid it. The perfect solution to avoid increasing complexity would be to apply ACLs to the "server" directives in the listener, but sadly HAProxy does not support this (yet? We will see).
If you appreciate this effort, and if you like this post and the other ones, just share them on LinkedIn - sharing and comments are an inexpensive way to push me into going on writing: this blog makes sense only if it gets visited.