Get unique values and their counts in a column
I have a column like this.
Streptococcus
Ecoli
Bcoli
Ecoli
streptococcus
Streptococcus
Mycobacterium
Ecoli
I want a file like this (which includes all unique values and their corresponding counts)
Streptococcus 3
Ecoli 3
Bcoli 1
Mycobacterium 1
Can anyone please help in getting it in ubuntu 12.04?
Solution 1:
Using sort and uniq,
$ sort f | uniq -ci | awk '{print $2,$1}'
Bcoli 1
Ecoli 3
Mycobacterium 1
streptococcus 3
Solution 2:
Assuming each item appears on another line, here is a python solution. Of course you can write the result to a file instead of printing it.
#!/usr/bin/env python
sourcefile = "/path/to/sourcefile"
with open(sourcefile, "r") as germ:
germ = [item.lower().replace("\n", "") for item in germ.readlines()]
for item in sorted(set(germ)):
print item.title(), germ.count(item)
it gives:
Bcoli 1
Ecoli 3
Mycobacterium 1
Streptococcus 3
To use it
paste the code above in an empty textfile, save it as get_uniques.py
, make it executable (for convenience reasons, so you don't need to type the language if you run it) and run it in a terminal window by the command:
/path/to/get_uniques.py
In case you'd like to write it to a file directly
Use the code below:
#!/usr/bin/env python
sourcefile = "/path/to/sourcefile"
destfile = "/path/to/destinationfile"
with open(sourcefile, "r") as germ:
germ = [item.lower().replace("\n", "") for item in germ.readlines()]
with open(destfile, "wt") as germcount:
for item in sorted(set(germ)):
germcount.write(item.title()+" "+str(germ.count(item))+"\n")