Split output of command by columns using Bash?
I want to do this:
- run a command
- capture the output
- select a line
- select a column of that line
Just as an example, let's say I want to get the command name from a $PID
(please note this is just an example, I'm not suggesting this is the easiest way to get a command name from a process id - my real problem is with another command whose output format I can't control).
If I run ps, I get:
  PID TTY          TIME CMD
11383 pts/1    00:00:00 bash
11771 pts/1    00:00:00 ps
Now I do ps | egrep 11383 and get:
11383 pts/1    00:00:00 bash
Next step: ps | egrep 11383 | cut -d" " -f 4. The output is:
<absolutely nothing/>
The problem is that cut splits the output on single spaces, and since ps pads the gap between the 2nd and 3rd columns with extra spaces to keep some semblance of a table, cut picks an empty string. Of course, I could use cut to select the 7th field instead of the 4th, but how can I know that, especially when the output is variable and not known beforehand?
One easy way is to add a pass of tr to squeeze any repeated field separators out:
$ ps | egrep 11383 | tr -s ' ' | cut -d ' ' -f 4
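To see why the squeeze step matters, here is a small sketch that uses a fixed sample line instead of live ps output. The line and its four padding spaces are an assumption made so the result is reproducible; real ps padding varies.

```shell
# Sample line in the style of ps output (assumed: four spaces of padding).
line='11383 pts/1    00:00:00 bash'

# Without squeezing, field 4 is one of the empty strings between spaces.
raw=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

# After squeezing runs of spaces into one, field 4 is the command name.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 4)

printf 'raw=[%s] squeezed=[%s]\n' "$raw" "$squeezed"
```

With this sample the first result is empty and the second is bash.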
I think the simplest way is to use awk. Example:
$ echo "11383 pts/1 00:00:00 bash" | awk '{ print $4; }'
bash
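awk can also do the line selection, so the separate egrep step is not strictly needed. The following is a sketch on canned text standing in for ps output; the variable names sample, pid and cmd are only illustrative.

```shell
# Canned text standing in for `ps` output, so the example is reproducible.
sample='  PID TTY          TIME CMD
11383 pts/1    00:00:00 bash
11771 pts/1    00:00:00 ps'

# Select the line whose first column equals the PID, then print column 4.
# awk splits on runs of blanks by default, so the padding does not matter.
cmd=$(printf '%s\n' "$sample" | awk -v pid=11383 '$1 == pid { print $4 }')
echo "$cmd"
```

Comparing $1 against the PID also avoids accidental matches elsewhere on the line, which a plain egrep 11383 could produce.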
Please note that tr -s ' ' will not remove a single leading space. If your column is right-aligned (as with the ps pid)...
$ ps h -o pid,user -C ssh,sshd | tr -s " "
 1543 root
19645 root
19731 root
Then cutting will result in a blank line for some of those fields if it is the first column:
$ <previous command> | cut -d ' ' -f1

19645
19731
Unless you precede every line with a space, obviously:
$ <command> | sed -e "s/.*/ &/" | tr -s " "
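Once every line starts with exactly one space, the first real column is always field 2 for cut. A sketch on sample text, where the PIDs and user names are made up for illustration:

```shell
# Sample right-aligned output: only the first line has a leading space.
sample=' 1543 root
19645 root
19731 root'

# Prefix each line with a space, squeeze runs of spaces to one,
# then take field 2 (field 1 is now the empty string on every line).
pids=$(printf '%s\n' "$sample" | sed -e 's/.*/ &/' | tr -s ' ' | cut -d ' ' -f 2)
echo "$pids"
```

This prints all three PIDs, one per line, with no blank line for the short one.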
Now, for this particular case of pid numbers (not names), there is a command called pgrep:
$ pgrep ssh
Shell functions
However, in general it is actually still possible to use shell functions in a concise manner, because there is a neat thing about the read command:
$ <command> | while read a b; do echo $a; done
The first parameter to read, a, selects the first column, and if there is more, everything else will be put in b. As a result, you never need more variables than the number of your column + 1.
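A quick sketch of that behaviour on sample text (the two lines are made up to resemble ps output):

```shell
# Two sample lines standing in for command output.
sample='11383 pts/1 00:00:00 bash
11771 pts/1 00:00:00 ps'

# a receives the first column; b receives the rest of the line unsplit.
out=$(printf '%s\n' "$sample" | while read -r a b; do echo "$a => $b"; done)
echo "$out"
```

Each output line shows the first column on the left of the arrow and the remaining columns, still joined, on the right.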
So,
while read a b c d; do echo $c; done
will then output the 3rd column. As indicated in my comment...
A piped read runs in a subshell, so the variables it sets are not passed back to the calling script. Capture the output instead:
out=$(ps whatever | { read a b c d; echo $c; })
arr=($(ps whatever | { read a b c d; echo $c $b; }))
echo "${arr[1]}" # outputs the value of $b (the 2nd column)
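The subshell effect is easy to check. Below is a sketch, assuming Bash with default options (no lastpipe); sample_cmd is a hypothetical stand-in for the real command. Feeding read from a here-document instead of a pipe keeps it in the current shell, so the variables survive.

```shell
# Hypothetical stand-in for the real command.
sample_cmd() { printf '%s\n' '11383 pts/1 00:00:00 bash'; }

# Piped read: it runs in a subshell, so `third` is still empty afterwards.
third=''
sample_cmd | read -r a b third d
echo "after pipe: [$third]"

# Here-document: read runs in the current shell and the variables persist.
read -r pid tty cputime cmd <<EOF
$(sample_cmd)
EOF
echo "after here-document: [$cputime] [$cmd]"
```

In Bash the same can be written as read -r pid tty cputime cmd < <(sample_cmd), but the here-document form also works in plain POSIX shells.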
The Array Solution
So we then end up with the answer by @frayser, which is to use the shell variable IFS (which defaults to whitespace: space, tab and newline) to split the string into an array. It only works in Bash though; Dash and Ash do not support arrays. I have had a really hard time splitting a string into components on a Busybox system. It is easy enough to get a single component (e.g. using awk) and then repeat that for every parameter you need, but then you end up repeatedly calling awk on the same line, or repeatedly using a read block with echo on the same line, which is neither efficient nor pretty. So you end up splitting using ${name%% *} and so on. It makes you yearn for some Python skills, because shell scripting is not a lot of fun anymore when half or more of the features you are accustomed to are gone. But you can assume that even Python would not be installed on such a system, and it wasn't ;-).
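For reference, a sketch of what the ${name%% *} style of splitting can look like when arrays are unavailable. It assumes the fields are separated by exactly one space (for example after tr -s ' '); the variable names are illustrative.

```shell
# Sample line with single-space separators (assumed already squeezed).
line='11383 pts/1 00:00:00 bash'

# Peel fields off the front one at a time using POSIX parameter expansion.
rest=$line
first=${rest%% *}    # longest suffix starting at a space removed: field 1
rest=${rest#* }      # shortest prefix ending at a space removed: the remainder
second=${rest%% *}
rest=${rest#* }
third=${rest%% *}
fourth=${rest#* }    # whatever is left after the third separator

echo "$first | $second | $third | $fourth"
```

No external process is started, so this stays cheap even inside a loop, at the cost of one pair of expansions per field.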
try
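# Bash form: pipe straight into the loop.
# (ps |& followed by read -p is ksh coprocess syntax; in Bash, read -p sets a prompt.)
ps |
while read -r first second third fourth etc; do
    if [[ $first == '11383' ]]
    then
        echo "got: $fourth"
    fi
done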