The Ultimate AWK Tutorial For Professionals

AWK is a powerful pattern scanning and processing language developed by Alfred Aho, Peter Weinberger and Brian Kernighan at Bell Labs - the name of the tool is indeed derived by concatenating the initial letters of their surnames. It is one of those tools that every Linux professional (not only the more seasoned ones) must be skilled with, since it is broadly used in a lot of shell scripts that very often are inherited from predecessors and must be maintained: the sad truth is that very often it is not worth the effort to rewrite them in other, more modern languages, so knowing how to deal with AWK can really ease your life. And anyway, sometimes it takes much less time to code an AWK one-liner than a Python script, so knowing how and when to use AWK is certainly still a valuable skill nowadays.
The aim of "The Ultimate AWK Tutorial For Professionals" is not to provide a complete explanation of how to code with AWK - there are more modern and handy languages such as Python nowadays: I just want to provide a very quick yet comprehensive walkthrough, focusing on how to write the AWK one-liners that are often embedded in shell scripts or that you can use to sort out common system administration tasks. That's why I'm also showing some real-life use cases where AWK one-liners very quickly and easily sort things out.

Getting Acquainted With AWK

Unlike sed, which is simply a utility, AWK is a line-oriented pattern scanning and processing language - this means that you can write statements, even with conditional and loop blocks, that execute actions against streams of textual data. Besides plain-text pattern matching, AWK can make extensive use of regular expressions.

AWK is a huge topic: of course we cannot explore everything about it, and of course you do not actually need to know everything about it. The aim of this post is just to give some hints to help you understand what AWK is and when it is worth the effort to use it. If you want to know more about it, there's its official manual.

Before going on let's create the "gods-of-it.csv" file with the following contents:

Dennis,Ritchie,UNIX and C
Ken,Thompson,UNIX and C
Bjarne,Stroustrup,C++
Richard,Stallman,GNU
Timothy John,Berners-Lee,World Wide Web
Linus,Torvalds,Linux
Theo,de Raadt,OpenSSH and OpenBSD and NetBSD
Phil,Zimmermann,PGP
Brian Jhan,Fox,GNU BASH
Larry,Wall,Perl
Guido,Van Rossum,Python

we use this file to do some basic processing hands-on with AWK.

Fields Variables

When processing a line (often called a "record") AWK splits it into fields by tokenizing the string using the field separator (by default, any run of spaces and tabs) and assigns each field to a special variable referenced by the $ character followed by a number - so:

$1 is the first field
$2 is the second field

and so on.

The whole line itself is stored into the $0 variable.

Mind that the field separator can also be set using the -F command line option.
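
As a minimal sketch of the difference the separator makes, the snippet below feeds one record of gods-of-it.csv to AWK through printf, so it does not depend on the file. The shell variable names are only illustrative, and NF (the number of fields, covered later among the built-in variables) is used just to count the tokens:

```shell
# One record from gods-of-it.csv, used inline so no file is needed.
record='Timothy John,Berners-Lee,World Wide Web'

# Default separator: the record is split on runs of blanks.
default_count=$(printf '%s\n' "$record" | awk '{ print NF }')

# Comma separator set with -F: the record has exactly three fields.
comma_count=$(printf '%s\n' "$record" | awk -F ',' '{ print NF }')
surname=$(printf '%s\n' "$record" | awk -F ',' '{ print $2 }')

printf '%s %s %s\n' "$default_count" "$comma_count" "$surname"
```

With the default separator the record is cut at every blank, so "Timothy" and "John,Berners-Lee,World" end up in different fields: that is why CSV-like data needs -F.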

Shell Invocation

The most common invocation of AWK is using a one liner.

One Liner

As an example, look at the output of the lsscsi command:

[0:0:0:0] disk ATA VBOX HARDDISK 1.0 /dev/sda 
[0:0:1:0] disk ATA VBOX HARDDISK 1.0 /dev/sdb

we can easily get the disk device with SCSI path "0:0:1:0" with the following AWK one liner:

lsscsi | awk '{if($1=="[0:0:1:0]") print $7 }'

the output is as follows:

/dev/sdb

the above statement:

runs lsscsi shell utility
pipes the output to AWK

AWK reads the contents coming from the pipeline line by line: if the contents of the 1st field of the line is "[0:0:1:0]", AWK prints the contents of the 7th field.

Loading From Statements Files

One-liners are suitable for small statements, a use case typical of shell scripts; anyway, although developing full AWK programs is not fashionable nowadays, you can also make AWK load the statements to execute from a statement file.

For example, we can run the same statement loading it from a file:

create the foo.awk statement file with the same AWK statement we just ran as a one-liner:

{
    if ( $1=="[0:0:1:0]" ) print $7
}

now let's run the same shell pipeline, but specifying the -f command line option that makes AWK load the statements from the "foo.awk" file:

lsscsi | awk -f foo.awk

the output is still

/dev/sdb

Please mind that you can specify the -f option multiple times to process more than one statement file.

Statement files are processed in the same order they are specified in the command line.

AWK basic syntax

Now that we know AWK's purpose, the time has come to see the basic syntax of its statements.

Pattern-Action Statements

AWK processes a set of

pattern-action statements
optional function definitions

the most basic structure of an AWK script has the following syntax:

pattern { actions }

Whenever AWK reads a line from the input, it checks whether its contents match the pattern.

The actions are performed when:

the matching is true
the pattern is omitted (since it is logically the same as an always-matching pattern)

you may of course need to:

specify more than just one pattern-actions statement
specify actions that must be performed before beginning to read the lines (simply specify the BEGIN pattern keyword)
specify actions that must be performed after reading all the lines (simply specify the END pattern keyword)

So the full structure of an AWK script may look as follows:

BEGIN { actions }
pattern { actions }
pattern { actions }
...
pattern { actions }
END { actions }
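
To make the structure concrete, here is a small sketch that runs a BEGIN block, one ordinary pattern and an END block against three made-up input lines (the input words and the "hits" variable are just illustrative):

```shell
report=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN    { print "start" }                            # runs once, before any input is read
/^[ab]/  { hits++; print "match: " $0 }               # runs for lines starting with a or b
END      { print hits " of " NR " lines matched" }    # runs once, after the last line
')
printf '%s\n' "$report"
```

The BEGIN and END blocks run exactly once each, while the middle statement is evaluated for every input line.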

Patterns

Patterns are conditions, most often regular expressions, that must match to have the actions triggered.

The default action is to print the matching line, so running awk with only a pattern makes it behave a little bit like grep.

Line matching pattern

This makes AWK look for the match in the whole line - for example, let's try to get the list of filesystems set to be dumped by the dump utility (these are the lines with the 5th field set to "1" - see "man fstab" for more details on this topic, if interested):

awk '/ 1/' /etc/fstab

the output on my system is:

UUID=a62c5b49-755e-41b0-9d36-de3d95e17232 / ext3 defaults 0 1
LABEL=pgsql_data /var/lib/pgsql xfs defaults,noatime 1 0

As you see, two lines match, but only the second one actually belongs to a filesystem set to be dumped by the dump utility. You may be tempted to turn the matching pattern into "/ 1 /", but you risk the same error if the first line has a trailing white space by mistake.

The problem here is that we are using a line-matching pattern, while we need a field matching pattern.

Field-matching pattern

AWK provides a more tailored pattern matching system that is targeted to specific fields.

For example, we can select only the lines of the "/etc/fstab" file whose 5th field (the field dedicated to the dump utility) is equal to "1" as follows:

awk '$5=="1"' /etc/fstab

this time the output is only:

LABEL=pgsql_data /var/lib/pgsql xfs defaults,noatime 1 0

so this time we performed a more fine-grained lookup, restricting the output to only the lines that actually contain dump enabled file-systems.
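
The difference between the two kinds of pattern can be reproduced without touching the real /etc/fstab. The sketch below uses two fstab-like lines as made-up sample data and counts the matches of each approach:

```shell
# Two fstab-like lines (made-up sample data, not a real /etc/fstab).
sample='UUID=a62c5b49 / ext3 defaults 0 1
LABEL=pgsql_data /var/lib/pgsql xfs defaults,noatime 1 0'

# Line-matching: " 1" appears somewhere in both lines.
line_matches=$(printf '%s\n' "$sample" | awk '/ 1/ { n++ } END { print n + 0 }')

# Field-matching: only the second line has "1" in its 5th field.
field_matches=$(printf '%s\n' "$sample" | awk '$5 == "1" { print $2 }')

printf '%s %s\n' "$line_matches" "$field_matches"
```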

We can also use regular expressions to match a specific field.

For example, to match the filesystems that are mounted with the "noatime" mount option:

awk '$4 ~ /.*noatime.*/' /etc/fstab

this time the output is only:

LABEL=pgsql_data /var/lib/pgsql xfs defaults,noatime 1 0

you can of course specify a regular expression that matches the whole line as follows:

awk '$0 ~ /.*noatime.*/' /etc/fstab

Negating the pattern

If we need to run actions on lines that do not match the pattern, we can negate it by using the ! character as follows:

!/pattern1/ { actions }

for example:

awk '!/ swap/' /etc/fstab

Please note that when matching a regular expression against a field with the "~" operator, you have to negate the match itself, so the "!" must be put right before the "~" character.

For example, to print every entry configured in /etc/fstab except the ones with the "noatime" option:

awk '$4 !~ /.*noatime.*/' /etc/fstab
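
Both forms of negation can be tried on inline data. In this sketch (the three sample lines are made up) the first awk drops the comment line with a negated line pattern, and the second one keeps only the entries whose 4th field does not contain "noatime":

```shell
sample='# fstab style sample (made-up data)
/dev/sda1 / xfs defaults 0 0
/dev/sdb1 /data xfs defaults,noatime 0 0'

# !/^#/ negates a line pattern, !~ negates a field match.
kept=$(printf '%s\n' "$sample" | awk '!/^#/' | awk '$4 !~ /noatime/ { print $2 }')
printf '%s\n' "$kept"
```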

Row number

A matching pattern may also be the row number: for an example, see the NR special variable.

Logical Operators

Sometimes a single pattern is not enough to uniquely identify the lines we need to run actions on, so we need to specify multiple patterns combined with a logical OR or a logical AND.

Logical AND

You may need to run actions only when multiple matching criteria match at the same time.

This time we must specify multiple patterns bound with a logical AND.

The syntax to use is as shown by the following snippet:

/pattern1/ && /pattern2/ && /pattern3/ {actions}

for example, to get the XFS formatted filesystems that are set to be mounted with the noatime option:

awk '$3 == "xfs" && $4 ~ /.*noatime.*/' /etc/fstab

Logical OR

You may need to run actions when any of the matching patterns matches: you can achieve this by using the || logical OR:

/pattern1/ || /pattern2/ || /pattern3/ { actions }

for example, to get both the XFS formatted filesystems and the swap partitions:

awk '$3 == "xfs" || $3 == "swap"' /etc/fstab

you can actually achieve the same outcome also simply by applying a logical OR within a single matching pattern with multiple criteria:

/criteria1|criteria2|criteria3/ { actions }

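This is straightforward by the way, since in regular expressions "|" is the alternation (logical OR) operator.

So the previous statement can be rewritten as follows (mind that an unanchored regular expression also matches fields that merely contain "xfs" or "swap"):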

awk '$3 ~ /xfs|swap/' /etc/fstab
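
The regular expression form is not strictly identical to the two equality tests, because an unanchored expression matches anywhere in the field. A small sketch of the difference follows; "myxfs" is a made-up filesystem type used only to show it:

```shell
# Filesystem type values; "myxfs" is hypothetical.
types='xfs
swap
ext4
myxfs'

# Unanchored alternation: matches any value containing xfs or swap.
loose=$(printf '%s\n' "$types" | awk '$1 ~ /xfs|swap/ { n++ } END { print n + 0 }')

# Anchored alternation: matches only values that are exactly xfs or swap.
strict=$(printf '%s\n' "$types" | awk '$1 ~ /^(xfs|swap)$/ { n++ } END { print n + 0 }')

printf '%s %s\n' "$loose" "$strict"
```

Anchoring with ^ and $ makes the regular expression behave like the == comparisons.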

This tutorial is a direct excerpt from my Apress book

Special Patterns

There are two special patterns that "match" before and after processing the lines (and so not while processing lines):

BEGIN

This keyword means that the actions are executed before starting to read the lines:

BEGIN { actions }

you can for example exploit this pattern to

print a header for the processed output
alter the behavior of the line-matching patterns

For example, the following statement makes matching case-insensitive (mind that IGNORECASE is specific to GNU AWK, gawk):

awk 'BEGIN{IGNORECASE=1}$3 == "sWAp"' /etc/fstab

END

This keyword means that the actions are executed after having read all the input lines:

END { actions }

for example, the following statement exploits the END pattern to print the number of lines in the /etc/fstab file:

awk 'END{print NR}' /etc/fstab

the NR variable contains the number of lines processed so far - since we are printing it at the end, it contains the number of lines in the file.

The following example instead shows both of them together:

awk 'BEGIN {print "Authors of UNIX Operating Systems:"} /NIX/ {print} END {print "May Dennis rest in peace, we all owe him a lot"}' gods-of-it.csv

produces the following output:

Authors of UNIX Operating Systems:
Dennis,Ritchie,UNIX and C
Ken,Thompson,UNIX and C
May Dennis rest in peace, we all owe him a lot

The match() built-in function

AWK has built-in functions too: for example the match(string, regex) function returns the position in the string where the regular expression matches, or 0 if it does not match.

The function sets the following variables:

RSTART: position of the first character of the match, counted from the beginning of the string (the first character is at position 1)
RLENGTH: length of the matching substring

For example:

awk -F , 'match($3, /N.*X/) {print $3 " matches at position "RSTART" and is "RLENGTH" in length"}' gods-of-it.csv

gives the following output:

UNIX and C matches at position 2 and is 3 in length
UNIX and C matches at position 2 and is 3 in length

Please note that besides the "pattern" position, match() can of course also be used in the "actions" position.
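
As a sketch of match() used inside an action, the snippet below extracts the first number from a made-up input line by combining RSTART and RLENGTH with the substr function (described later in this post), and also shows what happens when nothing matches:

```shell
# Extract the first run of digits from the line.
extracted=$(printf 'mtu 1500 qdisc\n' |
  awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

# When nothing matches, match() returns 0 and RLENGTH is set to -1.
nomatch=$(awk 'BEGIN { pos = match("abc", /[0-9]+/); print pos, RLENGTH }')

printf '%s\n%s\n' "$extracted" "$nomatch"
```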

Actions

Actions are statements that get executed when the pattern matches. As you can guess from the preceding snippets, AWK actually is a programming language (mind that it is aged, so it does not provide the amazing features typical of the modern ones): this means that it has control structures such as conditional blocks and loops.

Control Statements

AWK has a few basic control structures:

Conditional Blocks

The basic syntax to declare conditional blocks is as depicted by the following snippet:

if ( … ) { … } else if ( … && … ) { … } else if ( … || … ) { … } else { … }

Please note how inside a condition you can combine boolean operators (&& for logical AND, and || for logical OR) and group them using parentheses.

For example:

awk '{ if($3=="xfs" && $4 ~ /.*noatime.*/) { print } }' /etc/fstab

AWK also supports the definition of a switch conditional block using the switch statement (mind that it is a GNU AWK extension, so it may not be available in other AWK implementations).

The syntax is as follows:

switch (expression) {
case value or regular expression:
   case-body
   break
case other_value or regular expression:
   case-body
   break
default:
   default-body
   break
}

For example, this is a quite weird way to print only the lines of the /etc/fstab file that are not commented out:

awk '{ switch($0) { case /^#/: break; default: print; break; } }' /etc/fstab

Loops

It is of course also possible to define loops:

Finite Loop

You can either type a for-loop statement:

for (initialization; condition; increment/decrement) {
   statements
}

for example:

echo iteration | awk '{ for (i=0; i<7; i++) { print $0" "i } }'

or a for-in statement:

for (key in array) {
   print array[key]
}

Note that, unlike in other languages, the loop variable does not receive the element itself but rather its index in the array; mind also that the order in which the indexes are visited is not guaranteed.

As in many other languages, you can specify the:

break keyword to immediately exit the loop.
continue keyword to skip to the next iteration of the loop.
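
A minimal sketch of both keywords at work: the loop below skips the odd numbers with continue and stops altogether with break once the counter goes past 6 (the variable names are only illustrative):

```shell
evens=$(awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 1) continue   # skip odd numbers
    if (i > 6) break           # leave the loop altogether
    out = out sep i; sep = " "
  }
  print out
}')
printf '%s\n' "$evens"
```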

Conditional Loop

This is the syntax to be used for a conditional loop:

do { statements } while ( condition )

for example, the following loop keeps running as long as i is lower than 3:

echo iteration | awk '{ i=0; do { i++ } while (i<3); print $0" "i }'

if you need to check the condition before executing the loop, you can use a while loop:

while ( condition ) { statements }
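
The practical difference between the two forms shows up when the condition is false from the start. In this sketch (counter names are made up) the while body never runs, whereas the do-while body runs once before the condition is checked:

```shell
runs=$(awk 'BEGIN {
  i = 5
  while (i < 3) { while_runs++; i++ }      # condition is false: body never runs
  j = 5
  do { do_runs++; j++ } while (j < 3)      # body runs once before the check
  print while_runs + 0, do_runs + 0
}')
printf '%s\n' "$runs"
```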

Infinite Loop

Sometimes an infinite loop is needed, so this is the syntax to be used:

do { statements } while (1)

Exit

The exit statement causes AWK to terminate, returning control to the shell that launched it.
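
Two details of exit are worth a quick sketch: when it is called inside a rule the END block still runs, and an optional expression becomes the exit status seen by the shell. The input lines and the status value 3 are arbitrary:

```shell
# exit inside a rule stops reading input, but the END block still runs.
last_line=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit } END { print NR }')

# exit with an expression sets the exit status seen by the shell.
status=0
awk 'BEGIN { exit 3 }' || status=$?

printf '%s %s\n' "$last_line" "$status"
```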

Return

The return statement is used inside custom functions to terminate the function and return a value. See "Custom Functions" for more information on this topic.

Variables

Obviously AWK does allow you to declare variables.

Declare Variables From The Command Line

Simply specify the variable and its value using the -v command line option (note that it can be specified multiple times to set more variables).

For example, to filter from the 'gods-of-it.csv' file only the Gods of UNIX and C:

awk -F ',' -v FILTER='UNIX and C' '{ if($3==FILTER) {print $1" "$2} }' gods-of-it.csv

or again, to see the father of the World Wide Web:

awk -F ',' -v FILTER='World Wide Web' '{ if($3==FILTER) {print $1" "$2} }' gods-of-it.csv

Declare Variables Within The Statements

You can of course declare variables within the statements themselves - the syntax is as follows:

variable_name=value

As in every programming language, we can assign to a variable the value of another variable, and so on.

For example, we can easily swap two fields of the "gods-of-it.csv" by declaring a "swap" variable and reassigning the value of the fields as follows:

awk -F , '{swap=$1; $1=$2; $2=swap; print $1" "$2}' gods-of-it.csv

the above one-liner has no matching pattern (so it matches every line of input) and specifies the following actions:

swap=$1 action that assigns the value of the first field to the "swap" variable
$1=$2 action that assigns the value of the second field to the first field
$2=swap action that assigns the value of the swap variable (that was the old value of the first field) to the second field.
print $1" "$2 action that prints the two fields, with their new swapped values

Note how AWK, unlike the shells, does not expand variables inside double quotes. For example, "$3 value is $4" does not work in AWK: you have to concatenate the strings and the variables, as in $3" value is "$4.
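
The following sketch shows the three cases side by side on a made-up input line: the quoted string is printed literally, while concatenation and printf both produce the intended text:

```shell
# The awk program is single-quoted, so the shell leaves $3 and $4 alone.
literal=$(echo 'a b c d' | awk '{ print "$3 value is $4" }')
joined=$(echo 'a b c d' | awk '{ print $3 " value is " $4 }')
formatted=$(echo 'a b c d' | awk '{ printf("%s value is %s\n", $3, $4) }')

printf '%s\n%s\n%s\n' "$literal" "$joined" "$formatted"
```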

Incrementing And Decrementing Values

When dealing with numeric values, you can increment or decrement them as follows:

var++ - increment the value of var by 1
var-- - decrement the value of var by 1
var+=5 - increment the value of var by 5
var-=5 - decrement the value of var by 5
var*=2 - multiply the value by 2
var/=2 - divide the value by 2

Modulus

The modulus operator is '%'. For example:

remainder = 10 % 3

stores the remainder of the division of 10 by 3 (that is, 1) into the remainder variable

Arrays

AWK does support arrays: you can declare an array and add members to it as follows:

array[index]=value

For example, this oneliner populates the "info" array and prints its contents:

awk 'BEGIN {info[0] = "Marco Antonio"; info[1] = "Carcano"; info[2] = "https://grimoire.carcano.ch"; for (i in info) {print info[i]}}'

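the output is (mind that the iteration order of for-in is not guaranteed, so other AWK implementations may print the items in a different order):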

Marco Antonio
Carcano
https://grimoire.carcano.ch

If you need to remove an element from an array, simply use the delete keyword. For example, to delete the 3rd item from the info array (the one with index 2, since we started numbering from 0):

delete info[2]

in this example, we split the IP address of the eth0 interface into 4 octets and finally delete the last octet:

ip -4 a show eth0 |grep inet | awk '{ sub("/.*","",$2); len=split($2,octets,"."); delete octets[4]; for (i=1;i<len;i++) {print "octet["i"]:"octets[i]} }'

the output is:

octet[1]:10
octet[2]:1
octet[3]:0
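
AWK arrays are associative, so the index does not have to be a number. As a minimal sketch, the snippet below uses the 3rd field of three records taken from gods-of-it.csv as a string index to count occurrences (the "seen" and "has_python" names are only illustrative):

```shell
counts=$(printf '%s\n' 'Dennis,Ritchie,UNIX and C' 'Ken,Thompson,UNIX and C' 'Larry,Wall,Perl' |
  awk -F ',' '
    { seen[$3]++ }                       # the 3rd field is used as a string index
    END {
      has_python = ("Python" in seen)    # "in" tests for an index without creating it
      print seen["UNIX and C"], seen["Perl"], has_python
    }')
printf '%s\n' "$counts"
```

Counting with a string-indexed array is probably the most common AWK idiom for building quick per-key summaries.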

Built-in Variables

AWK also has a few built-in variables:

ARGV[n]

An array containing the command line arguments (options and the program text excluded): the first element, 0, is the name of the awk executable itself, the second is the first file specified to be processed and so on.

For example, the command:

awk -F ',' -v FILTER='World Wide Web' '{ if($3==FILTER) {print $1" "$2} }' gods-of-it.csv

sets the following variables:

ARGV[0] = awk
ARGV[1] = gods-of-it.csv

ARGC

The length of the ARGV array.
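
A quick way to inspect both variables is a BEGIN-only program. In this sketch the two file names are hypothetical: since the program has only a BEGIN block, AWK never tries to open them.

```shell
# first.txt and second.txt are made-up names; a BEGIN-only program reads no input.
args=$(awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print i ": " ARGV[i] }' first.txt second.txt)
printf '%s\n' "$args"
```

ARGC is 3 here because ARGV[0], the name of the executable, is counted too.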

ENVIRON

An associative array containing the values of the environment variables. For example, to print the value of the "PWD" environment variable:

awk 'BEGIN {print ENVIRON["PWD"]}'

NR

The number of the line that is currently being processed. For example, to print the number of each line along with its contents:

awk '{print NR" "$0}' gods-of-it.csv

We can exploit the NR variable also as a matching pattern. For example, to print the 3rd line of the gods-of-it.csv file:

awk -F ',' 'NR == 3 {print $0}' gods-of-it.csv
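
NR can also be compared with other operators. A common use, sketched below on inline data, is skipping a header line; mind that the "name,surname" header is a made-up addition, since gods-of-it.csv has none:

```shell
rows=$(printf 'name,surname\nDennis,Ritchie\nKen,Thompson\n' |
  awk -F ',' 'NR > 1 { print NR ": " $1 }')
printf '%s\n' "$rows"
```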

FS

It contains the field separator: it is up to you to decide whether to set this variable or supply the -F command line option.

For example:

awk -v FILTER='UNIX and C' 'BEGIN {FS=","}{ if($3==FILTER) print $1" "$2 }' gods-of-it.csv

RS

It contains the record separator, which defaults to the newline character.

echo "Dennis Ritchie,UNIX and C#Ken Thompson,UNIX and C" | awk -v RS=# '{print}'

the output is two different lines, as follows:

Dennis Ritchie,UNIX and C
Ken Thompson,UNIX and C
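
RS combines naturally with the field separator. The sketch below is a variation of the command above that also splits each record on commas; it uses printf without a trailing newline, since with RS set to "#" a final newline would become part of the last record:

```shell
records=$(printf 'Dennis Ritchie,UNIX and C#Ken Thompson,UNIX and C' |
  awk -v RS='#' -F ',' '{ print NR ": " $1 }')
printf '%s\n' "$records"
```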

NF

The number of fields in the current input record: you can use it for example to find out the number of fields of the file.

awk -F ',' 'NR == 1 {print NF}' gods-of-it.csv

note how we specified NR == 1 as matching pattern, so as to limit the action of printing the number of fields to the first line only.
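
NF is also handy as a sanity check and, prefixed with $, as a shortcut for the last field. A minimal sketch on made-up data that reports the records not having exactly three fields:

```shell
check=$(printf 'a,b,c\nd,e\n' |
  awk -F ',' 'NF != 3 { print "line " NR " has " NF " fields, last is " $NF }')
printf '%s\n' "$check"
```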

AWK functions

AWK provides a lot of functions, but there's actually not enough room to explain them thoroughly as they would deserve within a single blog post.

Mind that functions can be used either in "pattern" or in "action" position: this means that the outcome of a function can be used as a matching criterion for the pattern part of the statement.

I'm just showing a few examples of the most commonly used functions and how to invoke them.

Generate a random number

A common need that sometimes pops up is generating random numbers: for this purpose, AWK provides the rand function:

awk -v seed=${RANDOM} 'BEGIN {srand(seed); print rand()}'

the output is, for example:

0.689777

Convert to Lowercase or Uppercase

The tolower(str) function can be used to convert a string to lowercase. For example:

echo "Hello" | awk '{print tolower($1)}'

the output is:

hello

conversely, you can exploit the toupper(str) function to convert a string to uppercase. For example:

echo "Hello" | awk '{print toupper($1)}'

the output is:

HELLO

Case-Insensitive Comparisons

You can exploit these functions also to do case insensitive string comparisons.

For example:

awk '{if(toupper($2)==toupper("/tMp")) {print}}' /etc/fstab

on my system, the output is:

/dev/mapper/system-tmp /tmp xfs defaults 0 0

Substring

A very common need is extracting a substring from a string - this is exactly what can be achieved using the substr(str, first, length) function. Mind that in AWK the first character of a string is at position 1.

The following example prints the first 4 characters of the 3rd field (that is, starting from character 1).

awk -F , '$3 ~ /U.*X/ {print substr($3, 1, 4)}' gods-of-it.csv

the output is:

UNIX
UNIX

You can of course exploit this function also in conditional statements, for example:

awk -F , '{if(substr($3, 1, 4)=="UNIX") {print}}' gods-of-it.csv

prints the whole row if the first 4 characters of the 3rd field (starting from character 1) exactly match "UNIX".

The output is:

Dennis,Ritchie,UNIX and C
Ken,Thompson,UNIX and C
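
Since positions in AWK strings start from 1, here is a short sketch of the most common substr calls on a literal string, including the form without the length argument, which returns everything up to the end of the string:

```shell
parts=$(awk 'BEGIN {
  s = "UNIX and C"
  print substr(s, 1, 4)           # first four characters
  print substr(s, 6)              # from the 6th character to the end
  print substr(s, length(s), 1)   # last character
}')
printf '%s\n' "$parts"
```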

Length of strings

Another common need may be getting the length of a string: this is achieved using the length(str) function.

For example, to print the length of the names contained in each of the rows of the "gods-of-it.csv" file (the first field):

awk -F , '{print "the length of \""$1"\" is "length($1)}' gods-of-it.csv

since we have not specified any pattern to be matched, the above one-liner prints the length of the names for every record: the output indeed is:

the length of "Dennis" is 6
the length of "Ken" is 3
the length of "Bjarne" is 6
the length of "Richard" is 7
the length of "Timothy John" is 12
the length of "Linus" is 5
the length of "Theo" is 4
the length of "Phil" is 4
the length of "Brian Jhan" is 10
the length of "Larry" is 5
the length of "Guido" is 5

The following example shows how to use length(str) as a matching criterion for the pattern part of the statement to print the length of the names (the first field) only when they are longer than six characters:

awk -F , 'length($1) > 6 {print "the length of \""$1"\" is "length($1)}' gods-of-it.csv

the output is:

the length of "Richard" is 7
the length of "Timothy John" is 12
the length of "Brian Jhan" is 10

Splitting (Tokenizing) Into An Array

As we previously saw, AWK does support arrays: we can split a string into tokens that are members of an array using the split(str, array, regex) function as follows:

awk -F : '$1 == "nobody" {len=split($0,arr,":");for (i=1;i<=len;i++) {print arr[i]}}' /etc/passwd

the output is the following:

nobody
x
99
99
Nobody
/
/sbin/nologin

as you see, the split function returns the number of items resulting from the split - then we use it in the for loop to put a cap to the iteration.

Sorting an Array

You can sort an array using the asort(source_array [, destination_array [, sorting_criteria] ]) function (a GNU AWK extension): it sorts the array by the value of its items; note that there is also the asorti function, which sorts the array by the index of its items.

Please mind that:

if destination_array is not specified, source_array gets overwritten by the outcome of the sort operation
sorting criteria, if specified, can be a custom function that implements customized sorting
sorting honors the value of the IGNORECASE special variable - we met it when talking about Special Patterns.

For example, to load the names of the Gods of IT into an array, sort it and finally print it, just issue:

awk -F , '{names[NR]=$1} END{asort(names); for (name in names) l++; for (i=1;i<=l;i++) {print names[i]}}' gods-of-it.csv

the output is:

Bjarne
Brian Jhan
Dennis
Guido
Ken
Larry
Linus
Phil
Richard
Theo
Timothy John

Please note that we calculate the size of the array with the for block "for (name in names) l++;", storing it in the "l" variable. This is not strictly necessary: asort() itself returns the number of elements of the array, and GNU AWK also accepts length(names).

For more information on this function, see the official GNU AWK manual.

Strings Substitution

A common use case is replacing a substring: AWK provides both the:

sub(regex, substitution, string)
gsub(regex, substitution, string)

functions to achieve this (sub replaces only the first match, while gsub is a variant that performs a global substitution, replacing every match).

For example, let's say you have an entry like this in the /etc/passwd file:

ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

and you want to turn "FTP User" into "FTP Chrooted User", simply issue:

awk -F : '$1 == "ftp" { sub(" User"," Chrooted User",$5); print $1":"$2":"$3":"$4":"$5":"$6":"$7}' /etc/passwd

the output indeed is:

ftp:x:14:50:FTP Chrooted User:/var/ftp:/sbin/nologin
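
The same substitution can be written without listing every field in the print statement. The sketch below, run on the sample entry instead of the real /etc/passwd, relies on the OFS built-in variable (the output field separator, not covered above): when a field is modified, AWK rebuilds the record joining the fields with OFS. It also shows that gsub returns the number of replacements and works on the whole line when no target is given.

```shell
entry='ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin'

# Setting OFS as well as FS lets a plain print rebuild the record with colons.
patched=$(printf '%s\n' "$entry" |
  awk 'BEGIN { FS = OFS = ":" } $1 == "ftp" { sub(/ User/, " Chrooted User", $5); print }')

# gsub returns how many replacements it made; without a target it works on $0.
replaced=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, "+"); print n, $0 }')

printf '%s\n%s\n' "$patched" "$replaced"
```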

Time Functions

Sometimes you need to format or manipulate time values: for example you may need to write a condition to operate on records or fields that match a specific date, time or timestamp.

GNU AWK provides the following time functions:

mktime(datespec [, utc-flag ]) - turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the function of the same name in ISO C. The argument, datespec, is a string of the form "YYYY MM DD HH MM SS [DST]".
strftime([format [, timestamp [, utc-flag] ] ]) - format the time specified by timestamp based on the contents of the format string and return the result. It is similar to the function of the same name in ISO C.
systime() - return the current time as the number of seconds since the system epoch

Just to provide you an example:

LC_ALL=en_US.UTF-8 awk 'BEGIN {print strftime("Today is %B %d, %Y %H:%M:%S", systime())}'

the output is:

Today is June 14, 2022 12:54:32

Thoroughly documenting these functions in this blog post would make no sense, since it would just duplicate what you can easily find in the official AWK documentation.

Running a Shell command

OK, this may sound like quite a funny use case - you put an AWK one-liner inside a shell script, and it runs another shell command within it - but you may really face a situation like this: in this case, use the system(command) function.

In the following example, we check the status of the postfix systemd unit:

awk 'BEGIN { ret = system("systemctl status postfix"); print "Outcome=" ret }'

on my system I previously stopped the postfix service, so the output is:

● postfix.service - Postfix Mail Transport Agent
   Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since sab 2022-06-25 13:27:33 CEST; 2s ago
  Process: 2404 ExecStop=/usr/sbin/postfix stop (code=exited, status=0/SUCCESS)
  Process: 1190 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
  Process: 1188 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
  Process: 1177 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
 Main PID: 1266 (code=killed, signal=TERM)

giu 14 09:29:15 www-ci-ud1a001.s1.dev.ch-zh.carcano.local systemd[1]: Startin...
giu 14 09:29:16 www-ci-ud1a001.s1.dev.ch-zh.carcano.local postfix/postfix-script[1264]: ...
giu 14 09:29:16 www-ci-ud1a001.s1.dev.ch-zh.carcano.local postfix/master[1266]: ...
giu 14 09:29:16 www-ci-ud1a001.s1.dev.ch-zh.carcano.local systemd[1]: Started...
giu 14 13:27:33 www-ci-ud1a001.s1.dev.ch-zh.carcano.local systemd[1]: Stoppin...
giu 14 13:27:33 www-ci-ud1a001.s1.dev.ch-zh.carcano.local systemd[1]: Stopped...
Hint: Some lines were ellipsized, use -l to show in full.
Outcome=3

As you see, the stdout of the command is not captured into the "ret" variable, which instead gets the exit code returned by the launched command when it finishes.
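
If what you need is the output of the command rather than its exit code, a possible approach is piping the command into getline. The following is just a sketch using harmless stand-in commands ("echo hello" and "exit 3") instead of systemctl:

```shell
captured=$(awk 'BEGIN {
  cmd = "echo hello"
  cmd | getline line     # read the first line of the command output into "line"
  close(cmd)             # close the pipe once done
  rc = system("exit 3")  # system() only hands back the exit status
  print line, rc
}')
printf '%s\n' "$captured"
```

Mind that each getline call reads a single line: to read the whole output of a command you would call it inside a while loop.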

Custom Functions

AWK lets you of course define your own functions and call them.

For example, here we define a function that returns whether a number is odd or even:

awk 'function odd_or_even(num) {
  if (num % 2 == 0) return "even"
  return "odd"
}
BEGIN {
  num=10
  res = odd_or_even(num)
  print num " is " res
}'

the output is:

10 is even
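
One thing to be aware of when writing functions is that variables in AWK are global by default. The usual idiom to get local variables is declaring them as extra parameters that the caller never passes; the sketch below (the sum_to function is made up for the purpose) shows that the global i is left untouched:

```shell
result=$(awk '
function sum_to(n,    i, total) {   # i and total are extra parameters, hence local
  for (i = 1; i <= n; i++) total += i
  return total
}
BEGIN {
  i = 99                 # global i, untouched by the function
  print sum_to(4), i
}')
printf '%s\n' "$result"
```

The extra spaces before the local names in the parameter list are only a convention to tell them apart from the real parameters.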

Real life examples

Finally, we have learned enough to go through some real-life examples of what we can do by unleashing the power of AWK one-liners.

Guess the amount of space consumed by old files

The following command calculates the amount of space in MiB used by files that have not been modified for at least 30 days:

find /opt -type f -mtime +30 -exec du -sk {} \;|cut -f1 | awk '{total=total+$1}END{print "Total size is "total/1024" MiB"}'

the output is as follows:

Total size is 18.332 MiB

One-liners like the above are very useful for housekeeping, since you can find out how much space is used by stale files on a per-age basis, and so decide the maximum age of the files you can keep for a certain amount of free space.

Pretty-printing SLAB information

Monitoring slabs usage can tell you a lot about who is consuming the memory cache and to what degree. Since many slabs have a name that matches their purpose, we can use an AWK one liner to get how many slabs are consumed for a specific purpose.

For example, to see the memory cache consumed by the xfs filesystem simply type:

sudo grep "xfs_" /proc/slabinfo

the output is as follows:

xfs_dqtrx 0 0 528 15 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_dquot 0 0 504 16 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_rui_item 0 0 696 23 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_rud_item 0 0 176 23 1 : tunables 0 0 0 : slabdata 0 0 0
xfs_inode 3270 3456 1024 16 4 : tunables 0 0 0 : slabdata 216 216 0
xfs_efd_item 36 36 440 18 2 : tunables 0 0 0 : slabdata 2 2 0
xfs_buf_item 90 90 272 15 1 : tunables 0 0 0 : slabdata 6 6 0
xfs_trans 34 34 232 17 1 : tunables 0 0 0 : slabdata 2 2 0
xfs_log_ticket 44 44 184 22 1 : tunables 0 0 0 : slabdata 2 2 0

Now let's say we want to show how much space each XFS slab type is consuming, "pretty printing" it to make it easier to understand:

sudo egrep "xfs_" /proc/slabinfo | awk '{printf("%s:\t%8d objects of %4d B\n",$1,$2,$4)}'

this is the outcome:

xfs_dqtrx: 0 objects of 528 B
xfs_dquot: 0 objects of 504 B
xfs_rui_item: 0 objects of 696 B
xfs_rud_item: 0 objects of 176 B
xfs_inode: 3231 objects of 1024 B
xfs_efd_item: 36 objects of 440 B
xfs_buf_item: 90 objects of 272 B
xfs_trans: 34 objects of 232 B
xfs_log_ticket: 44 objects of 184 B

We are using the printf function to pretty print things: in the first argument, placeholders like %s (string) or %d (integer number) mark where to substitute the values specified by the other arguments of the list.

In addition to that we right-justify the number of objects (%8d modifier, where 8 is the minimum field width, padded with spaces on the left) and right-justify the object size (%4d). "\t" is the escape sequence for a TAB character, and "\n" for a newline.
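
The effect of the width modifiers is easier to see with visible delimiters. In this sketch the square brackets and the values are arbitrary; it also shows the "-" flag, which left-justifies, and that a value wider than the field is printed in full rather than truncated:

```shell
padded=$(awk 'BEGIN { printf("[%8d][%-8s][%4d]\n", 42, "ab", 123456) }')
printf '%s\n' "$padded"
```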

We can improve it further:

Omit Slab Types With No Objects (0)

This is achieved by enclosing the printf statement within an if block like the following:

if ($2>0) { … }
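As a side note, the same filter can also be written as the pattern of the rule instead of an if block inside the action. The sketch below compares the two forms on a few sample lines copied from the listing above (sample_slabs is a hypothetical helper used only to make the example self-contained):

```shell
# Sample lines in /proc/slabinfo layout (values copied from the output above).
sample_slabs() {
  cat <<'EOF'
xfs_dquot 0 0 504 16 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_inode 3270 3456 1024 16 4 : tunables 0 0 0 : slabdata 216 216 0
xfs_trans 34 34 232 17 1 : tunables 0 0 0 : slabdata 2 2 0
EOF
}

# Variant 1: the condition is an if block inside the action.
sample_slabs | awk '{ if ($2 > 0) { print $1 } }'

# Variant 2: the same condition used as the pattern of the rule.
sample_slabs | awk '$2 > 0 { print $1 }'
```

Both variants print only xfs_inode and xfs_trans; which one to use is mostly a matter of taste, although the pattern form is usually shorter in one-liners.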

Compute The Overall Space Used By Each Slab Type

This is achieved by prepending the math operations "used_mem=$2*$4; total=total+used_mem;" to the printf statement, within the if block.

Note that, since we are using multiple statements, we terminate each of them with the ";" character.

The following statement computes the space used by the slab type on the current line (number of active objects multiplied by the object size in bytes) and stores it in the used_mem variable:

used_mem=$2*$4;

The following statement calculates the overall used memory by summing used_mem of each line:

total=total+used_mem;

Print A Summary

We achieve this by enclosing its printf statement within an END block, which AWK runs only once, after the last line of input has been processed.
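A minimal sketch of this accumulate-then-summarize idiom, using made-up input of two numeric columns (sum_products is a hypothetical name). Note that the accumulator needs no initialization: an unset AWK variable counts as zero in arithmetic, and "total += x" is a shorthand for "total = total + x".

```shell
# Multiply the two columns of each line, accumulate, and print the sum at the end.
sum_products() {
  awk '{ total += $1 * $2 } END { printf("total=%d\n", total) }'
}

# 2*10 + 3*20 = 80
printf '2 10\n3 20\n' | sum_products
```

The END block runs even when there is no input at all, in which case this sketch prints total=0.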

Put Everything Together

This is the whole statement after putting everything together:

sudo grep "xfs_" /proc/slabinfo | awk '{if($2>0) { used_mem=$2*$4; total=total+used_mem; printf("%s:\t%10d B\n",$1,used_mem)}} END {printf("\nXFS total usage:%10.2f MB\n",total/1024/1024)}'

the new output is:

xfs_inode: 4268032 B
xfs_efd_item: 15840 B
xfs_buf_item: 24480 B
xfs_trans: 7888 B
xfs_log_ticket: 8096 B

XFS total usage: 4.12 MB

That looks like a big improvement over the previous one.
To really see it working, let's stress things a bit and make XFS cache some data:

sudo find / > /dev/null

and now let's see the amount of data cached by XFS – it is the same command as above:

sudo grep "xfs_" /proc/slabinfo | awk '{if($2>0) { used_mem=$2*$4; total=total+used_mem; printf("%s:\t%10d B\n",$1,used_mem)}} END {printf("\nXFS total usage:%10.2f MB\n",total/1024/1024)}'

the output is

xfs_inode: 48160768 B
xfs_efd_item: 15840 B
xfs_buf_item: 24480 B
xfs_trans: 7888 B
xfs_log_ticket: 8096 B

XFS total usage: 45.98 MB

Well, it seems that we have just written a small utility that shows how much RAM the XFS slab caches are consuming.
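If you want to reuse it for other slab families, the one-liner can be wrapped into a shell function. The following is only a sketch under a few assumptions: slab_usage is a hypothetical name, the function reads slabinfo-formatted text from standard input, and the prefix is passed to AWK with -v and matched as a plain string at the beginning of the slab name.

```shell
# slab_usage PREFIX: hypothetical helper, not part of any package.
# Reads /proc/slabinfo-formatted text from standard input.
slab_usage() {
  awk -v prefix="$1" '
    # index() == 1 means the slab name starts with the prefix (plain string, no regex)
    index($1, prefix) == 1 && $2 > 0 {
      used_mem = $2 * $4      # active objects * object size in bytes
      total += used_mem
      printf("%s:\t%10d B\n", $1, used_mem)
    }
    END { printf("\n%s total usage:%10.2f MB\n", prefix, total / 1024 / 1024) }
  '
}
```

Since sudo cannot run a shell function directly, a possible invocation is "sudo cat /proc/slabinfo | slab_usage xfs_".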

Get detailed information on process memory usage

Let's see another example: the following statement shows the memory map of the agetty process:

sudo pmap $(pgrep agetty)

the output is

725: /sbin/agetty -o -p -- \u --noclear tty1 linux
000055fc25065000 56K r-x-- agetty
000055fc25272000 4K r---- agetty
000055fc25273000 4K rw--- agetty
000055fc25274000 8K rw--- [ anon ]
000055fc25a19000 132K rw--- [ anon ]
00007fdd8edff000 6780K r--s- group
00007fdd8f49e000 40K r-x-- libnss_sss.so.2
00007fdd8f4a8000 2044K ----- libnss_sss.so.2
00007fdd8f6a7000 4K r---- libnss_sss.so.2
00007fdd8f6a8000 4K rw--- libnss_sss.so.2
00007fdd8f6a9000 1764K r-x-- libc-2.28.so
00007fdd8f862000 2048K ----- libc-2.28.so
00007fdd8fa62000 16K r---- libc-2.28.so
00007fdd8fa66000 8K rw--- libc-2.28.so
00007fdd8fa68000 16K rw--- [ anon ]
00007fdd8fa6c000 164K r-x-- ld-2.28.so
00007fdd8fc34000 332K r---- LC_CTYPE
00007fdd8fc87000 16K rw--- [ anon ]
00007fdd8fc8d000 28K r--s- gconv-modules.cache
00007fdd8fc94000 4K r---- ld-2.28.so
00007fdd8fc95000 4K rw--- ld-2.28.so
00007fdd8fc96000 4K rw--- [ anon ]
00007ffe4492a000 132K rw--- [ stack ]
00007ffe4495c000 12K r---- [ anon ]
00007ffe4495f000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 13636K

We can use AWK to perform more focused computations on this output.

Amount Of Memory Used By The Process Itself

sudo pmap $(pgrep agetty) | awk '/ agetty/ {total=total+substr($2, 1, length($2)-1); print} END{print "Total memory used by agetty alone is "total"KB"}'

the output is:

0000000000400000 44K r-x-- agetty
000000000060a000 4K r---- agetty
000000000060b000 4K rw--- agetty
Total memory used by agetty alone is 52KB
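The substr($2, 1, length($2)-1) expression used above strips the trailing "K" from the size column so that the value can be summed. As a side note, and only as a sketch on a single sample line, AWK's string-to-number conversion gives the same result here, since it reads only the leading numeric part of a string:

```shell
# Print the size column converted in both ways: explicit substr() and numeric coercion.
echo "000055fc25065000 56K r-x-- agetty" |
  awk '{ print substr($2, 1, length($2) - 1), $2 + 0 }'
```

Both values are 56; the substr form is more explicit about what is being removed, while "$2 + 0" is shorter.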

Amount Of Memory Used by The Stack

sudo pmap $(pgrep agetty) | awk '/\[ stack \]/ {total=total+substr($2, 1, length($2)-1); print} END{print "Total memory used by stack is "total"KB"}'

the output is:

00007ffd6c9e4000 132K rw--- [ stack ]
Total memory used by stack is 132KB

The Amount Of Anonymous Memory

sudo pmap $(pgrep agetty) | awk '$0 ~ /\[ anon \]$/ {total=total+substr($2, 1, length($2)-1); print} END{print "Total anonymous memory is "total"KB"}'

the output is:

000000000060c000 8K rw--- [ anon ]
0000000001365000 132K rw--- [ anon ]
00007fdb5a049000 24K rw--- [ anon ]
00007fdb5a418000 20K rw--- [ anon ]
00007fdb5a633000 12K rw--- [ anon ]
00007fdb5a63d000 4K rw--- [ anon ]
00007fdb5a640000 4K rw--- [ anon ]
00007ffd6cb0d000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
Total anonymous memory is 216KB

The Amount Of Memory Used By Shared Libraries

sudo pmap $(pgrep agetty) | awk '$0 !~ /\[ anon \]/ && !/\[ stack \]$/ && !/agetty/ && $1 != "total" {total=total+substr($2, 1, length($2)-1); print} END{print "Total memory used by shared libraries is "total"KB"}'

Note the additional $1 != "total" condition: without it, the summary line printed by pmap itself would match too, and its value would be added to the sum, roughly doubling the result.

the output is:

00007fdb538f9000 103692K r---- locale-archive
00007fdb59e3c000 48K r-x-- libnss_files-2.17.so
00007fdb59e48000 2044K ----- libnss_files-2.17.so
00007fdb5a047000 4K r---- libnss_files-2.17.so
00007fdb5a048000 4K rw--- libnss_files-2.17.so
00007fdb5a04f000 1808K r-x-- libc-2.17.so
00007fdb5a213000 2044K ----- libc-2.17.so
00007fdb5a412000 16K r---- libc-2.17.so
00007fdb5a416000 8K rw--- libc-2.17.so
00007fdb5a41d000 136K r-x-- ld-2.17.so
00007fdb5a63e000 4K r---- ld-2.17.so
00007fdb5a63f000 4K rw--- ld-2.17.so
Total memory used by shared libraries is 109812KB
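The four one-liners of this section each scan the pmap output once. As a sketch of an alternative, the same breakdown can be computed in a single pass by classifying each line and accumulating the sizes in an associative array. The names sample_pmap and classify_maps are hypothetical, and the sample data is a shortened version of the listing shown earlier, with the total adjusted accordingly.

```shell
# Sample pmap output (shortened from the listing earlier in this section).
sample_pmap() {
  cat <<'EOF'
725: /sbin/agetty -o -p -- \u --noclear tty1 linux
000055fc25065000 56K r-x-- agetty
000055fc25272000 4K r---- agetty
000055fc25274000 8K rw--- [ anon ]
00007fdd8f6a9000 1764K r-x-- libc-2.28.so
00007fdd8fa62000 16K r---- libc-2.28.so
00007ffe4492a000 132K rw--- [ stack ]
 total 1980K
EOF
}

# classify_maps NAME: hypothetical helper that sums the mappings per category.
classify_maps() {
  awk -v name="$1" '
    $1 ~ /:$/ || $1 == "total" { next }   # skip the "PID:" header and the total line
    {
      size = $2 + 0                        # "56K" -> 56
      if ($0 ~ /\[ anon \]$/)       kind = "anon"
      else if ($0 ~ /\[ stack \]$/) kind = "stack"
      else if ($NF == name)         kind = "process"
      else                          kind = "other"
      sum[kind] += size
    }
    END {
      printf("process=%dK stack=%dK anon=%dK other=%dK\n", sum["process"], sum["stack"], sum["anon"], sum["other"])
    }
  '
}

sample_pmap | classify_maps agetty
```

With the sample data this prints "process=60K stack=132K anon=8K other=1780K", and the four values add up to the 1980K total. The "other" bucket holds shared libraries together with any other mapped file, such as locale-archive in the output above.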

Beyond Data Parsing: Building the DevOps and DevSecOps Foundations

Successfully tokenizing unstructured system streams into sorted arrays, splitting multi-octet variables, and computing real-time process memory footprints via terminal pipelines are exceptional systems administration skills. Yet, manual record orchestration represents just one single technical brick in the massive wall of modern DevOps and DevSecOps engineering on Linux. In complex enterprise networks, an expert cannot operate within fragmented text tools: you must know how to translate your automation logic into complete software supply chains, deep kernel defensive configurations, and cloud-native continuous delivery models without vendor lock-in.

When you evaluate your daily development workflows, are you fully confident across the entire engineering stack running under the hood of your automation scripts? Or do you feel you have technical gaps holding your infrastructure designs back?

My book, "DevSecOps and DevOps for Linux: The Foundations", published by Apress, was specifically designed as a comprehensive, lab-driven blueprint to bridge these exact domains. This very tutorial on AWK serves as a direct excerpt and practical foundation for the advanced data parsing operations (XML, JSON, YAML) using Python, jq, and yq, alongside the robust, reusable shell scripting architectures engineered inside the volume. Through intensive, hands-on exercises built entirely on open-source, cloud-agnostic architectures, you will discover how to tie your configuration management repositories, automated compliance engines, and cluster deployments together into a unified, secure infrastructure hosted natively on Kubernetes.

Key insights covered in this volume:

The Holistic Skills Set Brick: Bridge technical engineering with team management frameworks. Master Scrum, Kanban, and Lean methodologies to design system architectures aligned with real corporate workflows.
The Shell Scripting & Unix Tools Brick: Build rigorous operational foundations. Master advanced Bash shell scripting architecture while learning how to combine core Unix tools into robust, repeatable, and enterprise-ready host automations.
The Version Control Engineering Brick: Move past basic commits. Dive deep into Git version control, mastering feature-branch workflows, repository lifecycle management, and complex conflict resolution.
The Data & Core Automation Brick: Build bulletproof data processing setups. Learn advanced RegEx, how to operate using evergreen tools such as Grep, Sed, and AWK, and how to master structured data parsing (XML, JSON, YAML) using Python and tools like xmlstarlet, jq, and yq.
The Modern Python & Automation Brick: Develop a modern Python project using pyproject.toml with pytest-based unit tests, governing the project with GNU Make for testing, building, and digitally signing RPM packages. The project is presented in an evolving fashion, showing how features are added step by step, highlighting how a properly structured Python project can be improved and evolved with minimal or no rework at all.
The Linux OS Hardening & PKI Brick: Learn the real mechanics of security. Implement X.509/PKI architectures, TLS configurations, and GPG encryption and signing, while mastering low-level kernel defenses like SELinux and Linux Capabilities.
The Compliance Check and Shift-Left Security Brick: Learn how to leverage the pre-commit framework to automate compliance checks with Pylint and Flake8, and perform security scans with Bandit and Safety, extending the security audit to the full software supply chain.
The Application Integration Brick: Master the foundational protocols used to securely interconnect enterprise microservices, including HTTP, REST, OpenAPI, SOAP, and LDAP/LDAPS.
The Infrastructure Delivery Brick: Put theory into practice with vertical, real-world labs. Move from basic scripts to engineering Ansible architectures, rootless Podman setups, image creation via Buildah, and complete Pulp3 deployments using Docker Compose.
The Enterprise GitOps Pipeline Brick: Tie everything together by automating your software supply chain. Build complete continuous deployment workflows using Gitea CI pipelines hosted natively on Kubernetes (RKE2).

Footnotes

This tutorial on AWK ends here: as we saw, it is still a handy tool, and it is often worth using to quickly write one-liners that perform pattern scanning. In addition, being skilled with it helps to maintain the many scripts embedding AWK one-liners that system administrators very often inherit from their predecessors and that, most of the time, are not worth rewriting in more modern languages.

I hate blogs with pop-ups, ads and all the (even worse) other stuff that distracts from the topics you're reading and violates your privacy. I want to offer my readers the best experience possible for free, ... but please be aware that for me it's not really free: on top of the raw costs of running the blog, I usually spend on average 50-60 hours writing each post. I offer all this for free because I think it's nice to help people, but if you think something in this blog has helped you professionally and you want to give concrete support, your contribution is very much appreciated: you can just use the above button.
