Get unique values and their counts in a column

I have a column like this.

Streptococcus
Ecoli
Bcoli
Ecoli
streptococcus
Streptococcus
Mycobacterium
Ecoli

I want a file like this (which includes all unique values and their corresponding counts)

Streptococcus 3
Ecoli 3
Bcoli 1
Mycobacterium 1

Can anyone please help in getting it in ubuntu 12.04?


Solution 1:

Using sort and uniq,

$ sort f | uniq -ci | awk '{print $2,$1}'
Bcoli 1
Ecoli 3
Mycobacterium 1
streptococcus 3

Solution 2:

Assuming each item appears on another line, here is a python solution. Of course you can write the result to a file instead of printing it.

#!/usr/bin/env python
sourcefile = "/path/to/sourcefile"

with open(sourcefile, "r") as germ:
    germ = [item.lower().replace("\n", "") for item in germ.readlines()]
for item in sorted(set(germ)):
    print item.title(), germ.count(item)

it gives:

Bcoli 1
Ecoli 3
Mycobacterium 1
Streptococcus 3

To use it

paste the code above in an empty textfile, save it as get_uniques.py, make it executable (for convenience reasons, so you don't need to type the language if you run it) and run it in a terminal window by the command:

/path/to/get_uniques.py

In case you'd like to write it to a file directly

Use the code below:

#!/usr/bin/env python
sourcefile = "/path/to/sourcefile"
destfile = "/path/to/destinationfile"

with open(sourcefile, "r") as germ:
    germ = [item.lower().replace("\n", "") for item in  germ.readlines()]
with open(destfile, "wt") as germcount:
    for item in sorted(set(germ)):
        germcount.write(item.title()+" "+str(germ.count(item))+"\n")