How do I compute the mean from ASCII file data in bash?
In bash I can grep some time measurements from a log file like this:
grep "time:" myLogfile.txt | cut -d' ' -f 3 >> timeMeasurements.txt
#timeMeasurements.txt
2.5
3.5
2.0
...
Now I would like to compute the mean of the values in timeMeasurements.txt. What is the quickest way to do that in bash?
I know that there is gnuplot and R, but it seems like one has to write some lengthy script for either one of them.
Solution 1:
Obligatory GNU datamash version
$ datamash mean 1 < file
2.6666666666667
ASIDE: it feels like this really should be possible natively in bc (i.e. without using the shell, or an external program, to loop over input values). The GNU bc implementation includes a read() function; however, it appears to be frustratingly difficult to get it to detect end-of-input. The best I could come up with is:
#!/usr/bin/bc
scale = 6
while( (x = read()) ) {
s += x
c += 1
}
s/c
quit
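which you can then pipe file input to, provided you terminate the input with any non-numeric character (the loop stops as soon as read() returns 0, so a literal 0 in the data would also end it early), e.g.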
$ { cat file; echo '@'; } | ./mean.bc
2.666666
Solution 2:
You could use awk. Bash itself is not very good at maths: its built-in arithmetic is integer-only.
awk 'BEGIN { lines=0; total=0 } { lines++; total+=$1 } END { print total/lines }' timeMeasurements.txt
Notes
- lines=0; total=0: set both variables to 0
- lines++: increase lines by one for each line
- total+=$1: add the value in each line to the running total
- print total/lines: when done, divide the total by the number of values
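As a minimal sketch of a slightly more defensive variant of the awk command above: awk initializes unset variables to zero on its own, so the BEGIN block can be dropped, and a guard in END avoids a division by zero on empty input. The function name mean_of_first_column is a hypothetical helper introduced only for this illustration, and skipping blank lines via NF is an added assumption about the input, not something the original answer does.

```shell
# mean_of_first_column is a hypothetical helper name, not part of the original answer.
# It reads from the files given as arguments, or from stdin if there are none.
mean_of_first_column() {
    # NF skips blank lines; the END guard avoids dividing by zero on empty input.
    awk 'NF { total += $1; lines++ } END { if (lines) print total / lines }' "$@"
}

# Sample values from the question; awk's default output format prints 2.66667.
printf '2.5\n3.5\n2.0\n' | mean_of_first_column
```

With the file from the question this would be called as mean_of_first_column timeMeasurements.txt. On empty input it prints nothing rather than failing.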
Solution 3:
Another way, using sed and bc:
sed 's/^/n+=1;x+=/;$ascale=1;x/n' timeMeasurements.txt | bc
The sed expression converts the input to something like this:
n+=1;x+=2.5
n+=1;x+=3.5
n+=1;x+=2.0
scale=1;x/n
This is piped to bc, which evaluates it line by line.
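The one-line form of sed's a command used above is a GNU extension. A sketch of the same idea that avoids it, by appending the final bc statement with echo instead, might look like the following. The function name bc_mean_program is hypothetical, the scale of 4 is an arbitrary choice for illustration, and the input is assumed to contain exactly one number per line with no blank lines (a blank line would produce an incomplete bc statement).

```shell
# bc_mean_program is a hypothetical helper name used only for this sketch.
# It prints the bc program for the numbers on stdin (or in the given files).
bc_mean_program() {
    sed 's/^/n+=1;x+=/' "$@"   # one accumulation statement per input line
    echo 'scale=4;x/n'         # final statement: the mean, to 4 decimal places
}

# Sample values from the question; bc truncates rather than rounds, so this prints 2.6666.
printf '2.5\n3.5\n2.0\n' | bc_mean_program | bc
```

Running bc_mean_program without the trailing | bc shows the generated program, which can help when checking what bc is actually asked to evaluate.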