Use awk to find average of a column [duplicate]

Solution 1:

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

Add the numbers in $2 (second column) in sum (variables are auto-initialized to zero by awk) and increment the number of rows (which could also be handled via built-in variable NR). At the end, if there was at least one value read, print the average.

awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'

If you want to use the shebang notation, you could write:

#!/bin/awk

{ sum += $2 }
END { if (NR > 0) print sum / NR }

You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).

You can also generalize the code to average the Nth column (with N=2 in this sample) using:

awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'

Solution 2:

Your specific error is with line 11:

awk 'BEGIN{sum+=$2}'

This is a line where awk is invoked, and its BEGIN block is specified - but you are already within a awk script, so you do not need to specify awk. Also you want to run sum+=$2 on each line of input, so you do not want it within a BEGIN block. Hence the line should simply read:

sum+=$2

You also do not need the lines:

x=sum
read name

the first just creates a synonym to sum named x and I'm not sure what the second does, but neither are needed.

This would make your awk script:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
    sum+=$2
    # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
    # NR is a variable equal to the number of rows in the file
    print "Average: " sum/ NR
    # Change this to print the Average instead of just the number of rows
}

Jonathan Leffler's answer gives the awk one liner which represents the same fixed code, with the addition of checking that there are at least 1 lines of input (this stops any divide by zero error). If

Solution 3:

Try this:

ls -l  | awk -F : '{sum+=$5} END {print "AVG=",sum/NR}'

NR is an AWK builtin variable to count the no. of records