Command-line utility to print statistics of numbers in Linux
I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation and so forth.
Is there a command-line utility in Linux to do the same? I usually need to find the average, median, min, max and standard deviation.
This is a breeze with R. For a file that looks like this:
1
2
3
4
5
6
7
8
9
10
Use this:
R -q -e "x <- read.csv('nums.txt', header = F); summary(x); sd(x[ , 1])"
To get this:
       V1
 Min.   : 1.00
 1st Qu.: 3.25
 Median : 5.50
 Mean   : 5.50
 3rd Qu.: 7.75
 Max.   :10.00
[1] 3.02765
- The -q flag squelches R's startup licensing and help output.
- The -e flag tells R you'll be passing an expression from the terminal.
- x is a data.frame, which is basically a table. It's a structure that accommodates multiple vectors/columns of data, which is a little peculiar if you're just reading in a single vector. This has an impact on which functions you can use.
- Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
- But sd() can only take one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the SDs for all columns.
You can use "st" (https://github.com/nferraz/st):
$ st numbers.txt
N   min  max  sum  mean  stddev
10  1    10   55   5.5   3.02765
Or:
$ st numbers.txt --transpose
N       10
min     1
max     10
sum     55
mean    5.5
stddev  3.02765
(DISCLAIMER: I wrote this tool :))
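If neither R nor st is installed, the same figures can be approximated with sort and awk alone. The following is only a minimal sketch, not how either tool is implemented: the function name numstats is made up for illustration, and it assumes one number per line, input small enough to hold in memory, and the sample standard deviation (n - 1 denominator), which is what produces the 3.02765 shown above.

```shell
# numstats: hypothetical helper name, not part of R or st.
# Reads one number per line from the file given as $1, or from stdin.
numstats() {
    sort -n "${1:--}" | awk '
        NF == 0 { next }          # skip blank lines
        {
            v[++n] = $1           # input is sorted, so v[1] is the min and v[n] the max
            sum   += $1
            sumsq += $1 * $1
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean   = sum / n
            # median: middle value, or the mean of the two middle values
            median = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
            # sample variance (n - 1 denominator); this one-pass formula is
            # simple but can lose precision on large or very close values
            var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0  # guard against tiny negative rounding errors
            printf "N %d\nmin %s\nmax %s\nsum %s\nmean %s\nmedian %s\nstddev %.5f\n",
                   n, v[1], v[n], sum, mean, median, sqrt(var)
        }'
}
```

With the ten-line file from the example, numstats nums.txt prints N 10, min 1, max 10, sum 55, mean 5.5, median 5.5 and stddev 3.02765, one per line.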