Use awk to find average of a column [duplicate]
Solution 1:
awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'
Add the numbers in $2 (the second column) to sum (variables are auto-initialized to zero by awk) and increment the number of rows (which could also be handled via the built-in variable NR). At the end, if there was at least one value read, print the average.
awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'
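As a quick check of the one-liner, here is a minimal sketch run against made-up input. The three sample rows are an assumption for illustration only, not data from the question.

```shell
# Hypothetical sample data: a label in column 1, a number in column 2.
avg=$(printf 'a 10\nb 20\nc 30\n' |
    awk '{ sum += $2 } END { if (NR > 0) print sum / NR }')

echo "$avg"   # prints 20, since (10 + 20 + 30) / 3 = 20
```

Note that NR counts every input line, so a header row or blank line in real data would be included in the divisor.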
If you want to use the shebang notation, you could write:
#!/bin/awk -f
{ sum += $2 }
END { if (NR > 0) print sum / NR }
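To try the script form, you can save it to a file and pass it to awk with -f. The sketch below assumes a temporary file created with mktemp and two made-up input rows. When the file is run through awk -f, the shebang line is simply read as a comment.

```shell
# Write the awk program to a temporary file (the file is hypothetical).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/awk -f
{ sum += $2 }
END { if (NR > 0) print sum / NR }
EOF

# Run it against two sample rows: (4 + 6) / 2 = 5.
script_avg=$(printf 'a 4\nb 6\n' | awk -f "$script")
echo "$script_avg"   # prints 5

rm -f "$script"
```

To run the file directly instead (chmod +x and then ./file), the shebang line needs the -f option and the path to awk on your system, which may not be /bin/awk.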
You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).
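For instance, a sketch that prints the average with two decimal places. The "%.2f\n" format and the sample rows are my own choices for illustration, and any printf format would work the same way.

```shell
# Same hypothetical sample data as before; only the output format changes.
formatted=$(printf 'a 10\nb 20\nc 30\n' |
    awk '{ sum += $2 } END { if (NR > 0) printf("%.2f\n", sum / NR) }')

echo "$formatted"   # prints 20.00
```

Unlike print, printf does not add a newline on its own, which is why the format string ends in \n.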
You can also generalize the code to average the Nth column (with N=2 in this sample) using:
awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'
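A short sketch of the same idea applied to a different column. The input here is invented and has a third numeric column, so N=3 is passed instead.

```shell
# Hypothetical data with numbers in columns 2 and 3; average column 3.
avg3=$(printf 'a 10 1.5\nb 20 2.5\n' |
    awk -v N=3 '{ sum += $N } END { if (NR > 0) print sum / NR }')

echo "$avg3"   # prints 2, since (1.5 + 2.5) / 2 = 2
```

The -v option sets N before any input is read, and $N then refers to whichever field number N holds.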
Solution 2:
Your specific error is with line 11:
awk 'BEGIN{sum+=$2}'
This is a line where awk is invoked and its BEGIN block is specified, but you are already within an awk script, so you do not need to specify awk. Also, you want to run sum+=$2 on each line of input, so you do not want it within a BEGIN block. Hence the line should simply read:
sum+=$2
You also do not need the lines:
x=sum
read name
The first just creates a synonym for sum named x, and I'm not sure what the second does, but neither is needed.
This would make your awk script:
#!/bin/awk -f
### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.
# This block of code is executed for each line in the file
{
sum+=$2
# The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
# NR is a variable equal to the number of rows in the file
print "Average: " sum/ NR
# Change this to print the Average instead of just the number of rows
}
Jonathan Leffler's answer gives the awk one-liner that represents the same fixed code, with the addition of checking that there is at least one line of input (this prevents a divide-by-zero error).
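To see what the guard buys you, here is a minimal sketch that feeds the guarded one-liner an empty input. Using /dev/null as the empty file is my choice for the demonstration.

```shell
# With no input lines, NR is 0 in the END block, so nothing is printed.
out=$(awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' /dev/null)

if [ -z "$out" ]; then
    echo "no output for empty input"
fi
```

Without the NR > 0 test, the END block would divide by zero on empty input, and what happens then depends on the awk implementation.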
Solution 3:
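Try this:
ls -l | awk 'NR > 1 { sum += $5; n++ } END { if (n > 0) print "AVG=", sum / n }'
With awk's default whitespace splitting, $5 is the file-size column of ls -l. NR is an awk built-in variable that counts the records read so far; the NR > 1 condition skips the "total" line that ls -l prints first, and n counts the lines that were actually summed.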