How can I parse a YAML file from a Linux shell script?
I wish to provide a structured configuration file which is as easy as possible for a non-technical user to edit (unfortunately it has to be a file) and so I wanted to use YAML. I can't find any way of parsing this from a Unix shell script however.
Here is a bash-only parser that leverages sed and awk to parse simple YAML files:
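function parse_yaml {
   local prefix=$2
   # s: optional whitespace, w: a key name, fs: field separator (ASCII 28)
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   # Strip a leading ':' (Ruby symbols), then emit indent<fs>key<fs>value
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F$fs '{
      indent = length($1)/2;   # nesting level, assuming 2-space indentation
      vname[indent] = $2;
      # i+0 forces a numeric comparison (array subscripts are strings)
      for (i in vname) {if (i+0 > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}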
It understands files such as:
## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"
## output
output:
  file: "yes"
Which, when parsed using:
parse_yaml sample.yml
will output:
global_debug="yes"
global_verbose="no"
global_debugging_detailed="no"
global_debugging_header="debugging started"
output_file="yes"
It also understands YAML files generated by Ruby, which may include Ruby symbols, like:
---
:global:
  :debug: 'yes'
  :verbose: 'no'
  :debugging:
    :detailed: 'no'
    :header: debugging started
:output: 'yes'
and will output the same global_* settings as in the previous example.
Typical use within a script is:
eval $(parse_yaml sample.yml)
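Once the variables are in scope they are ordinary shell strings, so yes/no values have to be compared explicitly. A minimal sketch, assuming the variable names from the sample above; `is_enabled` is a hypothetical helper that is not part of parse_yaml, and the two assignments stand in for the eval line:

```shell
#!/usr/bin/env bash
# Stand-in for: eval "$(parse_yaml sample.yml)"
global_debug="yes"
global_verbose="no"

# Hypothetical helper: succeeds when a setting holds a "true"-like word.
is_enabled() {
    case "$1" in
        yes|true|on|1) return 0 ;;
        *)             return 1 ;;
    esac
}

if is_enabled "$global_debug"; then
    echo "debug output is on"
fi
if ! is_enabled "$global_verbose"; then
    echo "verbose output is off"
fi
```

Which words count as "true" is a choice made for this sketch; parse_yaml itself only passes the text through.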
parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which will reduce the risk of namespace collisions).
parse_yaml sample.yml "CONF_"
yields:
CONF_global_debug="yes"
CONF_global_verbose="no"
CONF_global_debugging_detailed="no"
CONF_global_debugging_header="debugging started"
CONF_output_file="yes"
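A common prefix also makes it easy to see everything that was imported. A small sketch, assuming bash (it relies on the `${!prefix@}` expansion and on indirect expansion `${!name}`); `dump_conf` is a hypothetical helper, and the assignments stand in for `eval "$(parse_yaml sample.yml "CONF_")"`:

```shell
#!/usr/bin/env bash
# Stand-in for: eval "$(parse_yaml sample.yml "CONF_")"
CONF_global_debug="yes"
CONF_output_file="yes"

# Hypothetical helper: print every variable whose name starts with CONF_,
# one name=value pair per line.
dump_conf() {
    local name
    for name in "${!CONF_@}"; do
        printf '%s=%s\n' "$name" "${!name}"
    done
}

dump_conf
```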
Note that previous settings in a file can be referred to by later settings:
## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"
## output
output:
  debug: $global_debug
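The reference works because the generated assignments are double-quoted and run through eval in order, so `$global_debug` is expanded by the shell at that point. A minimal sketch of the mechanism, assuming bash; the `generated` string stands in for the two relevant lines that parse_yaml would print for the file above:

```shell
#!/usr/bin/env bash
# Stand-in for the relevant lines printed by parse_yaml for the file above.
generated='global_debug="yes"
output_debug="$global_debug"'

# eval runs the assignments in order, expanding $global_debug inside the quotes.
eval "$generated"

echo "output_debug=$output_debug"
```

The same expansion applies to anything else the shell recognises inside double quotes, including `$(...)` command substitution, so this approach fits configuration files you trust.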
Another nice use is to parse a defaults file first and then the user settings, which works because later settings override earlier ones:
eval $(parse_yaml defaults.yml)
eval $(parse_yaml project.yml)
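The layering can be sketched without the YAML files themselves. In the example below the two strings stand in for the output of `parse_yaml defaults.yml` and `parse_yaml project.yml`; the setting names and values are made up for illustration:

```shell
#!/usr/bin/env bash
# Stand-ins for the parser output of defaults.yml and project.yml
# (hypothetical settings, chosen only for this sketch).
defaults='global_debug="no"
output_file="build.log"'
project='global_debug="yes"'

eval "$defaults"   # every setting gets its default value
eval "$project"    # settings present here replace the defaults

echo "global_debug=$global_debug"   # taken from the project settings
echo "output_file=$output_file"     # default left untouched
```

Settings that the second file does not mention keep their defaults, since eval only assigns the variables it is given.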
I've written shyaml in Python for querying YAML from the shell command line.
Overview:
$ pip install shyaml ## installation
Example YAML file (with complex features):
$ cat <<EOF > test.yaml
name: "MyName !!"
subvalue:
    how-much: 1.1
    things:
        - first
        - second
        - third
    other-things: [a, b, c]
    maintainer: "Valentin Lab"
    description: |
        Multiline description:
        Line 1
        Line 2
EOF
Basic query:
$ cat test.yaml | shyaml get-value subvalue.maintainer
Valentin Lab
More complex looping query on complex values:
$ cat test.yaml | shyaml values-0 | \
  while read -r -d $'\0' value; do
      echo "RECEIVED: '$value'"
  done
RECEIVED: '1.1'
RECEIVED: '- first
- second
- third'
RECEIVED: '2'
RECEIVED: 'Valentin Lab'
RECEIVED: 'Multiline description:
Line 1
Line 2'
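In the loop above, the `while` runs on the right-hand side of a pipe, so in bash any variable set inside it is normally gone once the loop ends. If the values are needed afterwards, one option is to feed the loop through process substitution and collect them in an array. A sketch, assuming bash; `emit_values` is a hypothetical stand-in for `shyaml values-0 < test.yaml`, so the example runs without shyaml installed:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: shyaml values-0 < test.yaml
# Each value is followed by a NUL byte, as in the -0 output shown above.
emit_values() {
    printf '%s\0' '1.1' $'- first\n- second\n- third' 'Valentin Lab'
}

# Read NUL-terminated records into an array; embedded newlines are preserved.
values=()
while IFS= read -r -d '' value; do
    values+=("$value")
done < <(emit_values)

echo "got ${#values[@]} values"
printf 'second value:\n%s\n' "${values[1]}"
```

`IFS=` keeps leading and trailing whitespace in each value, and `-d ''` makes `read` stop at the NUL byte instead of at a newline.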
A few key points:
- all YAML types and syntax oddities are correctly handled, such as multiline values, quoted strings, inline sequences...
- \0-padded output is available for solid multiline entry manipulation.
- simple dotted notation to select sub-values (i.e. subvalue.maintainer is a valid key).
- access by index is provided to sequences (i.e. subvalue.things.-1 is the last element of the subvalue.things sequence).
- access to all sequence/struct elements in one go for use in bash loops.
- you can output a whole subpart of a YAML file as ... YAML, which blends well with further manipulation by shyaml.
More samples and documentation are available on the shyaml GitHub page or the shyaml PyPI page.