How to count word frequencies within a file in Python
I have a .txt file with the following format,
C
V
EH
A
IRQ
C
C
H
IRG
V
Although obviously it's a lot bigger than that, this is essentially it. Each string is on a separate line, so the file is literally C\nV\nEH\n and so on. I'm trying to count how many times each individual string appears in the file. However, when I convert the file into a list and then use the count method, the strings get separated into individual characters, so 'IRQ' becomes ['I', 'R', 'Q', '\n'], and when I count I get the frequency of each individual letter rather than of each string.
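The splitting described above happens because `+=` on a list treats the right-hand string as an iterable of characters. A minimal demonstration (the sample strings are taken from the file format above):

```python
# `+=` on a list extends it element-wise, so a string is
# broken into its individual characters:
s = []
s += "IRQ\n"
print(s)   # ['I', 'R', 'Q', '\n']

# Appending the whole (stripped) line keeps each string intact:
s2 = []
for line in ["C\n", "IRQ\n"]:
    s2.append(line.strip())
print(s2)  # ['C', 'IRQ']
```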
Here is the code that I have written so far,
def countf():
    fh = open("C:/x.txt", "r")
    fh2 = open("C:/y.txt", "w")
    s = []
    for line in fh:
        s += line
    for x in s:
        fh2.write("{:<s} - {:<d}".format(x, s.count(x)))
What I want to end up with is an output file that looks something like this
C 10
V 32
EH 7
A 1
IRQ 9
H 8
Solution 1:
Use Counter() from the collections module, and use strip() to remove the trailing \n from each line:
from collections import Counter

with open('x.txt') as f1, open('y.txt', 'w') as f2:
    c = Counter(x.strip() for x in f1)
    for x in c:
        print(x, c[x])  # use f2.write() here if you want to write them to f2
output:
A 1
C 3
EH 1
IRQ 1
V 2
H 1
IRG 1
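If you also want the counts written to the output file in the "string - count" format from the question, here is one possible sketch. It creates a small x.txt first (mirroring the sample data in the question) so the example is self-contained; Counter.most_common() is used to order results by descending frequency, which is an assumption about the desired ordering:

```python
from collections import Counter

# Create a sample input file mirroring the question's format (for illustration)
with open('x.txt', 'w') as f:
    f.write("C\nV\nEH\nA\nIRQ\nC\nC\nH\nIRG\nV\n")

with open('x.txt') as f1, open('y.txt', 'w') as f2:
    # strip() removes the trailing newline so 'IRQ\n' is counted as 'IRQ';
    # the `if` skips any blank lines
    c = Counter(line.strip() for line in f1 if line.strip())
    # most_common() yields (string, count) pairs, highest count first
    for word, count in c.most_common():
        f2.write("{} - {}\n".format(word, count))
```

With the sample data above, y.txt starts with "C - 3" and "V - 2", followed by the strings that appear once.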