Awk/Unix group by
$ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
joe 1
jim 1
mike 3
bob 2
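EXPLANATIONS
- `-F,` splits each line into fields on `,`
- `NR>1` processes only the lines after line 1, skipping the header
- `arr[$1]++` increments the element of array `arr` keyed by the first column (the first field after splitting on `,`)
- `END{}` block is executed at the end of processing the file
- `for (a in arr)` iterates over `arr`, with `a` as the key
- `print a, arr[a]` prints the key `a` followed by `arr[a]`, the count stored under that key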
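To see the pieces above working together, here is a small sketch run against made-up input. The rows are an assumption (the original file.txt is not shown in the thread); they were picked only so the counts match the output above, and `sample_data` is a hypothetical helper.

```shell
# Hypothetical input: the real file.txt is not shown, so these rows are
# an assumption chosen to reproduce the counts above.
sample_data() {
    cat <<'EOF'
name,age
joe,25
jim,31
mike,40
bob,22
mike,35
bob,28
mike,19
EOF
}

# Skip the header (NR>1), count rows per first field, then print each key
# with its count. "for (a in arr)" visits keys in an unspecified order,
# so the result is piped through sort to make it stable.
counts=$(sample_data | awk -F, 'NR>1 { arr[$1]++ } END { for (a in arr) print a, arr[a] }' | sort)
printf '%s\n' "$counts"
```

Because the traversal order of an awk array is unspecified, the unsorted order may differ between awk implementations.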
Strip the header row, drop the age field, group the same names together (sort), count identical runs, output in desired format.
tail -n +2 txt.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
output
bob 2
jim 1
joe 1
mike 3
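Each stage of that pipeline can be inspected on its own. The sketch below uses made-up rows (an assumption, since the real input is not shown) and prints the intermediate `uniq -c` result before the final awk step swaps the two columns. `sample_data` is a hypothetical helper.

```shell
# Hypothetical input; the real file is not shown in the thread.
sample_data() {
    cat <<'EOF'
name,age
joe,25
jim,31
mike,40
bob,22
mike,35
bob,28
mike,19
EOF
}

# Drop the header, keep the name column, sort so equal names are adjacent.
names=$(sample_data | tail -n +2 | cut -d',' -f 1 | sort)

# uniq -c prefixes each distinct line with the length of its run
# (the amount of leading padding depends on the uniq implementation).
runs=$(printf '%s\n' "$names" | uniq -c)
printf '%s\n' "$runs"

# awk splits on whitespace, so the padding is ignored; swap the columns.
result=$(printf '%s\n' "$runs" | awk '{ print $2, $1 }')
printf '%s\n' "$result"
```

The sort step matters: `uniq` only collapses adjacent duplicates, so unsorted input would produce several separate runs for the same name.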
It looks like you want sorted output. You could simply pipe or print into `sort -nk 2`:
awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file
Results:
jim 1
joe 1
bob 2
mike 3
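The command above prints into sort from inside awk. A sketch of the other option, piping awk's output to sort in the shell, follows. The sample rows are made up, and the explicit `-k2,2n -k1,1` keys (count first, then name) are a choice of this sketch rather than the original answer's, made so that ties are ordered explicitly.

```shell
# Hypothetical input; the real file is not shown in the thread.
sample_data() {
    cat <<'EOF'
name,age
joe,25
jim,31
mike,40
bob,22
mike,35
bob,28
mike,19
EOF
}

# Count in awk, then sort in the shell: numerically on the count
# (field 2), breaking ties alphabetically on the name (field 1).
sorted=$(sample_data |
    awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] }' |
    sort -k2,2n -k1,1)
printf '%s\n' "$sorted"
```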
However, if you have GNU awk installed, you can perform the sorting without coreutils. Here's a single-process solution that sorts the array by its values. The solution should still be quite quick. Run like:
awk -f script.awk file
Contents of script.awk:
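BEGIN {
    FS = ","
}

# Skip the header line and count occurrences of each name.
NR > 1 {
    a[$1]++
}

END {
    # Build composite keys "count SUBSEP name" so that sorting the keys
    # orders the names by count.
    for (i in a) {
        b[a[i], i] = i
    }

    # asorti (GNU awk) sorts the indices of b and stores them in b[1..n].
    # Indices are compared as strings, so counts with different numbers
    # of digits (e.g. 2 and 10) would not sort numerically.
    n = asorti(b)

    # Recover the name from each sorted composite key.
    for (i = 1; i <= n; i++) {
        split(b[i], c, SUBSEP)
        d[++x] = c[2]
    }

    for (j = 1; j <= n; j++) {
        print d[j], a[d[j]]
    }
}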
Results:
jim 1
joe 1
bob 2
mike 3
Alternatively, here's the one-liner:
awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[a[i],i] = i; n = asorti(b); for (i=1;i<=n;i++) { split (b[i], c, SUBSEP); d[++x] = c[2] } for (j=1;j<=n;j++) print d[j], a[d[j]] }' file
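If your GNU awk is version 4.0 or later, a shorter route is to set `PROCINFO["sorted_in"]` so that `for (i in a)` walks the array in value order. This is a different technique from the `asorti` approach above, not a restatement of it, and it compares the counts as numbers. The sketch below uses made-up rows; `sample_data` and `count_by_value` are hypothetical names.

```shell
# Hypothetical input; the real file is not shown in the thread.
sample_data() {
    cat <<'EOF'
name,age
joe,25
jim,31
mike,40
bob,22
mike,35
bob,28
mike,19
EOF
}

# Requires GNU awk 4.0 or later; other awks ignore PROCINFO["sorted_in"].
count_by_value() {
    gawk -F, '
        NR > 1 { a[$1]++ }
        END {
            # Ask gawk to traverse the array by value, numerically ascending.
            PROCINFO["sorted_in"] = "@val_num_asc"
            for (i in a) print i, a[i]
        }
    '
}

# Only run the example when gawk is actually installed.
if command -v gawk >/dev/null 2>&1; then
    by_value=$(sample_data | count_by_value)
    printf '%s\n' "$by_value"
fi
```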
A strictly awk solution...
BEGIN { FS = "," }
{ ++x[$1] }
END { for(i in x) print i, x[i] }
If a name, age header line is really in the file, you could adjust the awk program to ignore it by counting only lines that contain a digit...
BEGIN { FS = "," }
/[0-9]/ { ++x[$1] }
END { for(i in x) print i, x[i] }
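A quick way to check what the `/[0-9]/` filter changes is to run both variants on the same input. The rows below are made up (the real file is not shown), `sample_data` is a hypothetical helper, and the output is piped through sort only to make the comparison stable.

```shell
# Hypothetical input; the real file is not shown in the thread.
sample_data() {
    cat <<'EOF'
name,age
joe,25
jim,31
mike,40
bob,22
mike,35
bob,28
mike,19
EOF
}

# Without a filter, the header line is counted as if it were a name.
unfiltered=$(sample_data | awk 'BEGIN { FS = "," } { ++x[$1] } END { for (i in x) print i, x[i] }' | sort)

# /[0-9]/ keeps only lines containing a digit, which drops "name,age".
filtered=$(sample_data | awk 'BEGIN { FS = "," } /[0-9]/ { ++x[$1] } END { for (i in x) print i, x[i] }' | sort)

printf 'unfiltered:\n%s\n' "$unfiltered"
printf 'filtered:\n%s\n' "$filtered"
```

The pattern relies on the header containing no digits and every data row containing at least one. `NR>1`, as used in the other answers, skips the first line regardless of its content.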