JSON is a must-have skill for IT professionals, since it is probably the most widely used document format when dealing with AJAX and REST web services: both are so common on the web that sooner or later you will have to deal with it.
Being skilled in JSON does not only mean being able to write JSON documents, but also knowing how to use tools such as jq to extract values, or even a whole subset, from a JSON document.
Having these skills makes your life easier not only if you are a developer, but also if you are involved in the system integration field.
This post is an overview of what you need to know about JSON and how to work with it using jq.
By the way, this post is part of a trilogy of posts dedicated to markup and serialization formats, so be sure not to miss
What is JSON
JavaScript Object Notation (JSON) is a lightweight, easy-to-parse, language-independent data-interchange format for the serialization and exchange of structured data.
Its syntax is derived from a subset of JavaScript and is standardized as ECMA-404 (and as RFC 8259): it defines a small set of formatting rules for the portable representation of structured data. The official JSON website is https://www.json.org.
A JSON document typically consists of a collection of name/value pairs separated by commas: values can be numbers, strings, lists or other nested collections. These collections map easily to the data structures available in the vast majority of programming languages.
Therefore, JSON can be easily integrated with any language.
To make learning the JSON syntax easier, I introduce it step by step, with a hands-on approach that relies on the jq command line utility.
Using VIM with JSON
Of course we have to see how to use jq, but if you are ancient like me, you certainly cannot do without the old-fashioned vi, so let's install the indentLine plugin.
For convenience we can install it using vim-plug: let's download the vim-plug plugin manager as follows:
curl -fLo ~/.vim/autoload/plug.vim --create-dirs https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
now let's modify our "~/.vimrc" file adding the following snippet:
call plug#begin()
Plug 'https://github.com/Yggdroot/indentLine'
call plug#end()
then let's launch vim as usual:
vim
while in command mode, type the command to install the plugin we listed into "~/.vimrc":
:PlugInstall
if everything went smoothly we can verify the status of the plugins as follows:
:PlugStatus
now simply add the following snippet to your "~/.vimrc" file:
au! BufNewFile,BufReadPost *.{json} set filetype=json
autocmd FileType json setlocal ft=javascript ts=2 sts=2 sw=2 expandtab
let g:indentLine_char = '|'
let g:indentLine_color_term = 239
here we defined the "json" file type and bound the ".json" files to it. Then we configured some help with indentation.
This is the outcome when editing a JSON file:
it's much more readable and usable, isn't it?
JQ - An Overview
jq is a really handy command line tool for managing JSON objects in shell scripts. On Fedora, RHEL and other dnf-based distributions we can easily install it as follows:
sudo dnf -y install jq
Its most basic usage is simply parsing and pretty-printing a JSON object. For example:
echo '{"version": 1, "kind": "myobject" }' | jq .
pretty-prints the output as follows:
{
  "version": 1,
  "kind": "myobject"
}
as you see, jq output is colored and properly indented: this makes it much more human-readable.
The main aim of jq, however, is manipulating JSON.
It relies on filters, such as the ".current" path filter used in the examples below, and on functions, such as "keys", "length" and "sort", which we will meet later in this post.
Filters and functions can be freely combined by separating them with a pipe "|".
Let's see a few examples - consider the following JSON:
{
  "status": "WARN",
  "message": "Free disk space too low",
  "threshold": "10%",
  "current": "8%"
}
we can easily get the value of the "current" attribute by specifying the ".current" filter:
RESULT='{"status": "WARN", "message": "Free disk space too low", "threshold": "10%", "current": "8%"}'
echo "${RESULT}" | jq '.current'
the output is:
"8%"
if necessary, we can even embed the filters in a string, using the \(...) interpolation syntax:
echo "${RESULT}" | jq -r '"Current consumed percentage is \(.current) while threshold is \(.threshold)"'
the outcome is the same string with the values fetched from the JSON:
Current consumed percentage is 8% while threshold is 10%
we can exploit the above feature to export the fetched values into shell variables: for example, the string that assigns the two filters to variables is as follows:
CURRENT=\(.current) THRESHOLD=\(.threshold)
so the whole command is:
export RESULT
export $(echo "${RESULT}" | jq -r '"CURRENT=\(.current) THRESHOLD=\(.threshold)"')
let's try it:
echo ${CURRENT}
echo ${THRESHOLD}
the outcome is:
8%
10%
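Mind that the export trick above relies on the shell splitting the jq output into words, so it only holds as long as the fetched values contain no whitespace. A minimal sketch of a more defensive variant, assuming one jq call per value is acceptable (MESSAGE is just an illustrative variable name):

```shell
RESULT='{"status": "WARN", "message": "Free disk space too low", "threshold": "10%", "current": "8%"}'

# One jq call per value: -r prints the raw string, without the JSON quotes
CURRENT=$(echo "${RESULT}" | jq -r '.current')
THRESHOLD=$(echo "${RESULT}" | jq -r '.threshold')
# This value contains spaces, so it would not survive the word-splitting export
MESSAGE=$(echo "${RESULT}" | jq -r '.message')

echo "${MESSAGE}: ${CURRENT} (threshold ${THRESHOLD})"
```

It costs a few more processes, but every value lands in its variable untouched.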
Of course I'm not advocating writing shell scripts to manage JSON: these are just my two cents for when you have to modify existing legacy scripts so that they can parse JSON documents generated by modern software, without having to re-code the whole script, saving time and therefore money.
JSON Serialization
Writing valid JSON documents requires knowing JSON syntax: we are about to see it along with the jq commands that can be used to query the sample JSON documents.
JSON Objects
A JSON object is a comma-separated list of key/value pairs enclosed in curly braces.
The value of a key can be a simple type, such as a number, a string or one of the literals described later on, or a structured type, such as a list or another JSON object.
For example, this is the syntax to specify a JSON object as the value of the "OS" key:
"OS": {
  "OS-Family": "Linux",
  "Vendor": "Red Hat",
  "Version": "8.0"
}
accessing the values of the "OS" JSON object requires minding the nesting level: the syntax to get the "Version" element of the "OS" dictionary is:
.OS.Version
Let's see how to use jq to get the value of the "OS-Family" element of the "OS" dictionary: you may guess that the syntax is ".OS.OS-Family", but this time the dash character makes things harder, so the actual syntax is the one shown in the following snippet:
JSON_DICT='"OS":{"OS-Family":"Linux","Vendor": "Red Hat","Version": "8.0"}'
echo "{$JSON_DICT}" | jq '.OS."OS-Family"'
the output indeed is:
"Linux"
The quotes around "OS-Family" are necessary, otherwise jq gets confused by the dash "-" character, which it would parse as a minus sign.
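As an aside, jq also accepts a bracket form for keys that are not plain identifiers. A small sketch, reusing the same sample object (the FAMILY variable is only there for illustration):

```shell
JSON_DICT='"OS":{"OS-Family":"Linux","Vendor": "Red Hat","Version": "8.0"}'

# Bracket form: it selects the same element as .OS."OS-Family"
FAMILY=$(echo "{$JSON_DICT}" | jq -r '.OS["OS-Family"]')
echo "${FAMILY}"
```

Which of the two forms to use is mostly a matter of taste.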
A less common need is to list just the "keys" of the JSON object: we can achieve this by piping the output of the filter to the "keys" function as follows:
echo "{$JSON_DICT}" | jq '.OS|keys'
the output is as follows:
[
  "OS-Family",
  "Vendor",
  "Version"
]
Comments
The JSON format does not provide a syntax for comments: you can of course define your own key whose value your application treats as a comment, but keep in mind that this is not a standard convention.
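Just to give an idea, here is a sketch of such a home-made convention: the "_comment" key name is purely my assumption, and jq's del() function is used to strip it before handing the document over to a consumer that does not expect it:

```shell
# "_comment" is a made-up key name: JSON itself gives it no special meaning
CONFIG='{"_comment": "listen port of the service", "port": 8080}'

# del() drops the key, leaving only the real settings (-c prints compact output)
CLEAN=$(echo "${CONFIG}" | jq -c 'del(._comment)')
echo "${CLEAN}"
```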
null value and booleans
As stated by one of the changes introduced by RFC 7159 (which obsoleted RFC 4627 and was in turn obsoleted by RFC 8259), a JSON text is a serialized value.
This means that a bare null, true or false (like a bare number or string) is a valid JSON text on its own, and not only when used as a value inside an object or a list.
The syntax for specifying null and booleans is shown in the following JSON snippet:
{
  "myCount": null,
  "enabled": true,
  "autoload": false
}
jq recognizes all of these literals, and so it correctly detects their type:
echo '{"myCount": null,"enabled": true,"autoload": false}' | jq 'map(type)'
as expected, the outcome is:
[
  "null",
  "boolean",
  "boolean"
]
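Since a JSON text is a serialized value, the literals can also be fed to jq on their own, without any enclosing object. A quick sketch (the T_* variable names are only illustrative):

```shell
# Each input below is a complete JSON text made of a single literal
T_NULL=$(echo 'null' | jq -r 'type')
T_TRUE=$(echo 'true' | jq -r 'type')
T_FALSE=$(echo 'false' | jq -r 'type')

echo "${T_NULL} ${T_TRUE} ${T_FALSE}"
```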
Numbers
Numbers can be either integers or floating-point values; jq handles both as double-precision floating-point numbers. The actual syntax is shown in the following JSON snippet:
{
  "aninteger": 42,
  "floating-point": 1.56,
  "other-floating": 0.9,
  "exponent": 10E-2
}
valid exponent markers are "e", "e+", "e-", "E", "E+", "E-".
Let's check the types detected by jq:
echo '{"aninteger": 42, "floating-point": 1.56, "other-floating": 0.9, "exponent": 10E-2}' | jq 'map(type)'
as expected, the outcome is:
[
  "number",
  "number",
  "number",
  "number"
]
Strings
Strings are declared by enclosing the value in double quotes, as shown by the following JSON snippet:
{
  "FullName": "Marco Antonio Carcano"
}
let's check the type detected by jq:
echo '{"FullName": "Marco Antonio Carcano"}' | jq 'map(type)'
as expected, the outcome is:
[
  "string"
]
Be aware also that, according to the specification:
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F. There are two-character escape sequence representations of some characters.
The Unicode control characters from U+0000 to U+001F include, among others:
- U+0000 <control-0000> (NUL: NULL) (used in null-terminated strings)
- U+0009 <control-0009> (HT: HORIZONTAL TABULATION) (inserted by the tab key)
- U+000A <control-000A> (LF: LINE FEED) (used as a line break)
- U+000C <control-000C> (FF: FORM FEED) (denotes a page break in a plain text file)
- U+000D <control-000D> (CR: CARRIAGE RETURN) (used in some line-breaking conventions)
The consequence is that you cannot insert into a string any of the control characters in that range, nor the quotation mark (U+0022) or the reverse solidus (U+005C), without escaping them first.
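In practice you rarely have to escape by hand: a sketch of letting jq do it, passing a raw shell string with the --arg option (the RAW sample text and the "text" key are made up for the occasion):

```shell
# A raw string containing a reverse solidus and two quotation marks
RAW='C:\temp "quoted"'

# --arg exposes the shell string as the jq variable $text; -n means no input is read
ESCAPED=$(jq -cn --arg text "${RAW}" '{text: $text}')
printf '%s\n' "${ESCAPED}"

# Reading the value back with -r gives the original text again
BACK=$(printf '%s\n' "${ESCAPED}" | jq -r '.text')
printf '%s\n' "${BACK}"
```

Here printf is used instead of echo, since some shells have an echo that interprets backslashes on its own.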
Multiline strings
A JSON string cannot contain a raw line break, since the line feed (U+000A) is one of the control characters that must be escaped, so it's up to the application to decide how to deal with multi-line strings.
There are two possible approaches:
The first one is to escape each line feed as "\n", which is one of the two-character escape sequences defined by the specification, so that the deserializer turns it back into a newline:
{
  "Multiline-string": "first line\nSecond line"
}
otherwise, if you prefer to keep one line of text per row in the document, you can try the dirty workaround of splitting the multi-line string into an array (a list - more on this later) of single line strings and have your application treat them as a single multi-line string:
{
  "Multiline-string": [
    "first line",
    "Second line"
  ]
}
let's check the type detected by jq:
echo '{"Multiline-string": [ "first line", "Second line" ]}' | jq 'map(type)'
this time the outcome is:
[
  "array"
]
an array is actually a list, and is indeed often called that.
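To see both approaches in action, here is a sketch that rebuilds the multi-line text with jq, assuming the first variant stores the line feed as the standard "\n" escape sequence (the variable names are only illustrative):

```shell
# printf '%s\n' is used instead of echo so that the shell leaves the backslash untouched
ESCAPED_DOC='{"Multiline-string": "first line\nSecond line"}'
FROM_ESCAPE=$(printf '%s\n' "${ESCAPED_DOC}" | jq -r '."Multiline-string"')

# Array workaround: join() glues the items back together with a line feed
ARRAY_DOC='{"Multiline-string": [ "first line", "Second line" ]}'
FROM_ARRAY=$(printf '%s\n' "${ARRAY_DOC}" | jq -r '."Multiline-string" | join("\n")')

printf '%s\n' "${FROM_ESCAPE}"
printf '%s\n' "${FROM_ARRAY}"
```

Both commands print the same two lines of text.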
Lists
This data structure is what many programming languages call an array; the following example shows the JSON syntax of a list called "disks":
"disks": [
  "/dev/sda",
  "/dev/sdb",
  "/dev/sdc"
]
Let's see some examples of how to use jq to deal with common needs on JSON lists.
Number Of Elements Using the Length Filter
A common need is certainly getting the number of elements in the list - we can pipe the output of the filter to the "length" function:
JSON_LIST='"disks": [ "/dev/sda", "/dev/sdb", "/dev/sdc" ]'
echo "{$JSON_LIST}" | jq '.disks | length'
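For the record, a short sketch of what to expect from "length": besides lists, it also works on objects, where it counts the key/value pairs (COUNT and PAIRS are just illustrative variable names):

```shell
JSON_LIST='"disks": [ "/dev/sda", "/dev/sdb", "/dev/sdc" ]'

# Number of items in the "disks" list
COUNT=$(echo "{$JSON_LIST}" | jq '.disks | length')
echo "${COUNT}"

# On the enclosing object, length counts the key/value pairs (here only "disks")
PAIRS=$(echo "{$JSON_LIST}" | jq 'length')
echo "${PAIRS}"
```

The first command prints 3, the second one prints 1.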
Getting One Element Of The List
In the same way as with arrays in a programming language, elements of a list are accessed by index, starting from 0: jq filters work the same way. For example, we can use jq to access the 3rd element of the "disks" list as follows:
echo "{$JSON_LIST}" | jq '.disks[2]'
the output is as follows:
"/dev/sdc"
Getting A Slice Of The List
Another common need is getting a slice (that is, a subset) of a list. For example, the jq statement to get the items from the 2nd to the last one is:
echo "{$JSON_LIST}" | jq '.disks[1:]'
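Here is a sketch with a few more slicing variants on the same list; the -c option is only there to print each result on a single line, and the variable names are illustrative:

```shell
JSON_LIST='"disks": [ "/dev/sda", "/dev/sdb", "/dev/sdc" ]'

# From the 2nd item to the end of the list
REST=$(echo "{$JSON_LIST}" | jq -c '.disks[1:]')
# The first two items (the end index is exclusive)
FIRST_TWO=$(echo "{$JSON_LIST}" | jq -c '.disks[:2]')
# A negative index counts from the end of the list
LAST=$(echo "{$JSON_LIST}" | jq -r '.disks[-1]')

echo "${REST}"
echo "${FIRST_TWO}"
echo "${LAST}"
```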
Sorting a List
When dealing with the elements of a list, often we have to get them sorted - it is just a matter of piping the output of the filter to the "sort" function:
NUMBERS='"numbers": [ 1, 0.3, 2.5, -0.9 ]'
echo "{$NUMBERS}" | jq '.numbers | sort'
the output is as follows:
[
  -0.9,
  0.3,
  1,
  2.5
]
these are numbers, but of course we may have the same needs with strings:
NAMES='"names": [ "Marco", "Antonio"]'
echo "{$NAMES}" | jq '.names | sort'
the output is as follows:
[
  "Antonio",
  "Marco"
]
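If you need the opposite order, a possible sketch is to chain the "reverse" function after "sort" (DESC is just an illustrative variable name):

```shell
NUMBERS='"numbers": [ 1, 0.3, 2.5, -0.9 ]'

# sort puts the items in ascending order, reverse then flips the list
DESC=$(echo "{$NUMBERS}" | jq -c '.numbers | sort | reverse')
echo "${DESC}"
```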
Getting Min and Max
We can pipe the output of the filter to the "min" function to get the minimum element:
echo "{$NUMBERS}" | jq '.numbers | min'
it returns -0.9.
Conversely, if we need the maximum element we can pipe the output of the filter to the "max" function:
echo "{$NUMBERS}" | jq '.numbers | max'
it returns 2.5.
Converting A List Into a List Of Dictionaries
Although this is not very common, there may be situations that require you to convert a list into a list of dictionaries: for example, so that you can then use the "select" function to find out whether the list actually contains a given value.
You can achieve this using the "to_entries" function, followed by [] to iterate over the resulting entries:
echo "{$JSON_LIST}" | jq '.disks| to_entries[]'
the output is:
{
  "key": 0,
  "value": "/dev/sda"
}
{
  "key": 1,
  "value": "/dev/sdb"
}
{
  "key": 2,
  "value": "/dev/sdc"
}
Getting The Index Of An Item Of The list With a Certain Value
After turning the list into a list of dictionaries you can leverage the "select" function to get only the dictionaries with a given value.
For example, to get the dictionary with "/dev/sdb":
echo "{$JSON_LIST}" | jq '.disks| to_entries[] |select(.value=="/dev/sdb")'
the output is:
{
  "key": 1,
  "value": "/dev/sdb"
}
now, to get the key, simply add it to the filter:
echo "{$JSON_LIST}" | jq '.disks| to_entries[] |select(.value=="/dev/sdb") | .key'
the output is:
1
So the item with a value of "/dev/sdb" is the 2nd one (index=1).
The same statement can of course be used to see if a list contains a given value:
echo "{$JSON_LIST}" | jq '.disks| to_entries[] |select(.value=="/dev/sdd") | .key'
since the "disks" list does not contain "/dev/sdd", the above command produces no output.
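A shorter route to the same answers, sketched here as an alternative to the to_entries approach, is to use the "index" and "any" functions; the -e option makes jq's exit status reflect the result, which is handy in shell conditionals (POS and FOUND are illustrative names):

```shell
JSON_LIST='"disks": [ "/dev/sda", "/dev/sdb", "/dev/sdc" ]'

# index() returns the position of the first matching item, or null if there is none
POS=$(echo "{$JSON_LIST}" | jq '.disks | index("/dev/sdb")')
echo "${POS}"

# any() yields true or false; with -e, a false or null result means a non-zero exit status
if echo "{$JSON_LIST}" | jq -e '.disks | any(. == "/dev/sdd")' > /dev/null; then
  FOUND=yes
else
  FOUND=no
fi
echo "${FOUND}"
```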
List of JSON Objects
Since we just talked about them, let's see an example of nesting dictionaries as items of a list.
For example:
"foo_vg": [
  {
    "name": "bar_lv",
    "size": "2GiB",
    "type": "xfs",
    "mountpoint": "/opt/bar"
  },
  {
    "name": "baz_lv",
    "size": "1GiB",
    "type": "xfs",
    "mountpoint": "/opt/baz"
  }
]
we can get the entries from the list by specifying [] in the filter:
FS='"foo_vg":[{"name": "bar_lv","size":"2GiB","type":"xfs","mountpoint":"/opt/bar"},{"name": "baz_lv","size":"1GiB","type":"xfs","mountpoint":"/opt/baz"}]'
echo "{$FS}" | jq '.foo_vg[]'
the output is:
{
  "name": "bar_lv",
  "size": "2GiB",
  "type": "xfs",
  "mountpoint": "/opt/bar"
}
{
  "name": "baz_lv",
  "size": "1GiB",
  "type": "xfs",
  "mountpoint": "/opt/baz"
}
Selecting Elements Of A List
The "select" function can be used to get only the elements of a list that match given criteria. For example, to get the entries with "xfs" filesystem that are "1GiB" in size:
echo "{$FS}" | jq '.foo_vg[] | select(.type=="xfs" and .size=="1GiB")'
we can easily get the value of the "name" key as follows:
echo "{$FS}" | jq '.foo_vg[] | select(.type=="xfs" and .size=="1GiB")| .name'
the output is:
"baz_lv"
We can of course use a regular expression as parameter of the "select" function:
echo "{$FS}" | jq '.foo_vg[] | select(.mountpoint | test("^/opt."))'
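Strictly speaking, the commands above emit the matching items one after the other rather than as a list. A small sketch of how to keep the result wrapped in an array by using map(), followed by the names of the items matched by a regular expression (SMALL and NAMES are illustrative variable names, and the regular expression is a slightly stricter one of my own):

```shell
FS='"foo_vg":[{"name": "bar_lv","size":"2GiB","type":"xfs","mountpoint":"/opt/bar"},{"name": "baz_lv","size":"1GiB","type":"xfs","mountpoint":"/opt/baz"}]'

# map(select(...)) filters the items but keeps the surrounding array
SMALL=$(echo "{$FS}" | jq -c '.foo_vg | map(select(.size=="1GiB"))')
echo "${SMALL}"

# Names of the items whose mountpoint starts with /opt/
NAMES=$(echo "{$FS}" | jq -r '.foo_vg[] | select(.mountpoint | test("^/opt/")) | .name')
echo "${NAMES}"
```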
Select an Item From a List Of Dictionaries By Name
we can now check which dictionaries have a key called "mountpoint", using the "has" function:
FS='"foo_vg":[{"name": "bar_lv","size":"2GiB","type":"xfs","mountpoint":"/opt/bar"},{"name": "baz_lv","size":"1GiB","type":"xfs"}]'
echo "{$FS}" | jq '.foo_vg | map(has("mountpoint"))'
as you see from the output, the first item has the "mountpoint" attribute, whereas the second one hasn't:
[
  true,
  false
]
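To actually pick the matching dictionaries, rather than just getting a true/false flag for each of them, a possible sketch is to combine "has" with "select" (WITH_MP is an illustrative variable name):

```shell
FS='"foo_vg":[{"name": "bar_lv","size":"2GiB","type":"xfs","mountpoint":"/opt/bar"},{"name": "baz_lv","size":"1GiB","type":"xfs"}]'

# Keep only the items that have a "mountpoint" key, then print their names
WITH_MP=$(echo "{$FS}" | jq -r '.foo_vg[] | select(has("mountpoint")) | .name')
echo "${WITH_MP}"
```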
Select an Item From a List Of Dictionaries By Value
we can now select only the dictionary whose "name" key has "baz_lv" as its value:
echo "{$FS}" | jq '.foo_vg[] | select(.name=="baz_lv")'
the output is:
{
  "name": "baz_lv",
  "size": "1GiB",
  "type": "xfs"
}
Footnotes
Here ends our quick tour of the amazing world of JSON: instead of just showing the grammar, I preferred to use jq to show you things in action. However, be aware that, although we saw how to deal with JSON using jq, it is usually more convenient to avoid shell scripts and use instead scripting languages that can handle JSON natively, such as Python.
In my opinion jq is a powerful tool, but I use it only to run ad-hoc commands or to add JSON support to scripts that are not worth the effort of rewriting in another language.