Awk/Unix group by

$ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
joe 1
jim 1
mike 3
bob 2

EXPLANATIONS

-F, splits on ,
NR>1 treat lines after line 1
arr[$1]++ increment array arr (split with ,) with first column as key
END{} block is executed at the end of processing the file
for (a in arr) iterating over arr with a key
print a print key , arr[a] array with a key

Strip the header row, drop the age field, group the same names together (sort), count identical runs, output in desired format.

tail -n +2 txt.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'

output

bob 2
jim 1
joe 1
mike 3

It looks like you want sorted output. You could simply pipe or print into sort -nk 2:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file

Results:

jim 1
joe 1
bob 2
mike 3

However, if you have GNU awk installed, you can perform the sorting without coreutils. Here's the single process solution that will sort the array by it's values. The solution should still be quite quick. Run like:

awk -f script.awk file

Contents of script.awk:

BEGIN {
    FS=","
}

NR>1 {
    a[$1]++
}

END {
    for (i in a) {
        b[a[i],i] = i
    }

    n = asorti(b)

    for (i=1;i<=n;i++) {
        split (b[i], c, SUBSEP)
        d[++x] = c[2]
    }

    for (j=1;j<=n;j++) {
        print d[j], a[d[j]]
    }
}

Results:

jim 1
joe 1
bob 2
mike 3

Alternatively, here's the one-liner:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[a[i],i] = i; n = asorti(b); for (i=1;i<=n;i++) { split (b[i], c, SUBSEP); d[++x] = c[2] } for (j=1;j<=n;j++) print d[j], a[d[j]] }' file

A strictly awk solution...

BEGIN { FS = "," }
{ ++x[$1] }
END { for(i in x) print i, x[i] }

If name, age is really in the file, you could adjust the awk program to ignore it...

BEGIN   { FS = "," }
/[0-9]/ { ++x[$1] }
END     { for(i in x) print i, x[i] }

Awk/Unix group by

EXPLANATIONS

Related

Recent Posts