Not the same output format from `df` in different Linux distributions

In Ubuntu the output of this command

df --exclude={tmpfs,devtmpfs,squashfs,overlay} | sed -e /^Filesystem/d | awk '{print $6 " " $1 " " $3 " " $4 " " $5}'

is:

/ /dev/mapper/dockerVG-rootLV 8110496 40591632 17%
/dockerssd /dev/mapper/ssdVG-ssdLV 214133656 274642488 44%
/dockerhdd /dev/mapper/hddVG-hddLV 83278236 1385191240 6%
/var/lib/docker /dev/mapper/hddVG-dockerLV 76046204 412729940 16%

That is what I need.

On CentOS 6 I get this output:

 /dev/mapper/vg_rproxy-lv_root
 51475068 43192316 12% /
/boot /dev/sda1 82688 379364 18%
 /dev/mapper/vg_rproxy-lv_home
 77349888 73119692 1% /home

It's a mess.

Full output from CentOS 6:

$ df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_rproxy-lv_root
                      51475068 5661336  43192292  12% /
tmpfs                   957140       0    957140   0% /dev/shm
/dev/sda1               487652   82688    379364  18% /boot
/dev/mapper/vg_rproxy-lv_home
                      77349888  294352  73119692   1% /home

What is the problem? How can I fix it?


tl;dr

Use df -P.


Full answer

/dev/mapper/vg_rproxy-lv_root and /dev/mapper/vg_rproxy-lv_home are relatively long strings. It appears that df in CentOS splits such entries across two lines, which breaks the logic when you want to parse the output further.
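
To see the mechanism, the sed and awk stages from the question can be fed the wrapped CentOS 6 lines directly. This is only an illustrative sketch: reorder is a hypothetical wrapper name, and the tmpfs line is left out because the original command excluded it.

```shell
# Hypothetical wrapper around the sed | awk stages from the question.
reorder() {
    sed -e '/^Filesystem/d' | awk '{print $6 " " $1 " " $3 " " $4 " " $5}'
}

# Wrapped lines copied from the CentOS 6 output in the question.
reorder <<'EOF'
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_rproxy-lv_root
                      51475068 5661336  43192292  12% /
/dev/sda1               487652   82688    379364  18% /boot
EOF
```

The device name arrives alone on its line, so awk sees it as $1 while $3 to $6 are empty. On the continuation line every number shifts left: $1 is the size and $5 is the mount point. Only the /boot entry, which fits on one line, comes out in the intended order.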

In narrow terminals this may be a good thing: it creates semi-columnized, human-readable output despite limited horizontal space. I would prefer this not to happen when df writes to a non-tty (a pipe in your case).

Maybe df in Ubuntu behaves similarly if entries in the Filesystem column are long; maybe you just didn't experience this because yours are relatively short. I don't know, and it is not important. What is important is that df is a POSIX tool and should follow the specification. But the specification explicitly states:

Historical df implementations vary considerably in their default output. It was therefore necessary to describe the default output in a loose manner to accommodate all known historical implementations and to add a portable option (-P) to provide information in a portable format.

About the option:

-P
Produce output in the format described in the STDOUT section.

And finally the relevant part of the STDOUT section (emphasis mine):

The implementation may adjust the spacing of the header line and the individual data lines so that the information is presented in orderly columns.

The remaining output with -P shall consist of one line of information for each specified file system. These lines shall be formatted as follows:

"%s %d %d %d %d%% %s\n", <file system name>, <total space>,
    <space used>, <space free>, <percentage used>,
    <file system root>

So df is allowed to output anything, unless you use -P. Without -P some implementations of df may produce predictable and parsable output, others… not so much. Their behavior may or may not be documented well enough. Therefore in general, when parsing the output of df you should always use -P.

Just adding -P will probably be enough to fix your specific problem.

Note that -P governs the format only. Overall, the POSIX specification applies only in the POSIX locale. Additionally, modern implementations of df tend to use 1024-byte blocks by default, while POSIX states the default is 512. On my Debian 10, df from GNU coreutils falls back to the POSIX default when POSIXLY_CORRECT is set in the environment. Portably, you can force 1024-byte blocks with -k.

This is a portable command that produces (almost) parsable output:

LC_ALL=POSIX df -Pk

Almost parsable, because entries in the Filesystem column may contain spaces, I think; although in a sanely configured OS they don't.
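
If you do need to cope with spaces, one possible approach is to locate the percentage field instead of counting fields from the left. The sketch below is an assumption-laden illustration, not a definitive parser: parse_df_p is a hypothetical name, it expects df -Pk output on stdin, it treats the first "NN%" field preceded by three integers as the Capacity column, and it skips lines that do not fit that pattern. Runs of several spaces inside a name are collapsed to one, because awk has already split the line into fields.

```shell
# Hypothetical helper: reads `df -Pk` output on stdin and prints
# mount point, filesystem, used, available and use%, separated by tabs.
parse_df_p() {
    awk '
    NR == 1 { next }    # skip the header, whatever its wording
    {
        # Look for a field like "12%" that follows three integer fields.
        p = 0
        for (i = 5; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/ && $(i-1) ~ /^[0-9]+$/ &&
                $(i-2) ~ /^[0-9]+$/ && $(i-3) ~ /^[0-9]+$/) {
                p = i
                break
            }
        }
        if (p == 0) next    # line does not match the -P layout

        # Everything before the three numbers is the filesystem name.
        fs = $1
        for (i = 2; i <= p - 4; i++) fs = fs " " $i

        # Everything after the percentage is the mount point.
        mnt = $(p + 1)
        for (i = p + 2; i <= NF; i++) mnt = mnt " " $i

        printf "%s\t%s\t%s\t%s\t%s\n", mnt, fs, $(p-2), $(p-1), $p
    }'
}
```

It would be used as LC_ALL=POSIX df -Pk | parse_df_p. Tabs are used as the output separator so that later stages can still tell the columns apart when a name contains spaces.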

You may omit LC_ALL=POSIX and still get expected results, but in general it should be there for parsing. E.g. in my Polish locale your sed -e /^Filesystem/d doesn't do its job because I get a Polish term for "filesystem" from my df. LC_ALL=POSIX fixes this. Still my personal preference is not to rely on anything in the header. I would use sed 1d or tail -n +2; or delegate the task to awk, since awk is already in your pipeline. This would be:

LC_ALL=POSIX df -Pk --exclude={tmpfs,devtmpfs,squashfs,overlay} \
| awk 'NR>1 {print $6 " " $1 " " $3 " " $4 " " $5}'

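Finally, --exclude= is not a portable option: it is an abbreviation of --exclude-type= from GNU df, and the {a,b} brace expansion that repeats it is a Bash-style shell feature that plain POSIX sh does not guarantee. Apparently it works for you on both systems in question, but it may not work on other systems.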
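
If portability matters more than exact filtering, one rough alternative is to drop --exclude= and filter in awk on the Filesystem column. This is a sketch under a loud assumption: df -P does not report the filesystem type, so the hypothetical helper below (real_devices_only is a made-up name) only keeps rows whose Filesystem entry starts with "/". That is not equivalent to excluding by type.

```shell
# Hypothetical stand-in for --exclude=: keep only rows whose Filesystem
# column starts with "/" (a heuristic on the name, not a type check).
real_devices_only() {
    awk 'NR > 1 && $1 ~ /^\// {print $6 " " $1 " " $3 " " $4 " " $5}'
}
```

It would be used as LC_ALL=POSIX df -Pk | real_devices_only. Entries such as tmpfs are dropped because their names have no leading slash, but a squashfs image mounted from a /dev/loop device would still pass, and a network share named like host:/export would be dropped, so adjust the pattern to your systems.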