How can I count each type of character (and total them) in a text file?
Solution 1:
General count with wc
You can use wc to count lines, words, characters and bytes, but it cannot list the count for each separate character. See man wc.
Count the number of each separate character
If you want to list the number of each separate character, you can
- start by printing each character on a separate line with grep -o '.'
- then sort them with sort
- then use uniq -c to print the number of each kind
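The three steps above can be chained into one pipeline. Here is a minimal sketch on a short input string; the string 'hello' is made up purely for illustration, so substitute your own file or text.

```shell
# Count each character in a sample string (hypothetical input).
# grep -o . prints every character on its own line (the newline itself is not matched),
# sort puts identical characters next to each other, and uniq -c counts each group.
counts=$(printf 'hello\n' | grep -o . | sort | uniq -c)
# One line per character, e.g. "2 l" for the two l's (uniq -c pads the count with spaces).
printf '%s\n' "$counts"
```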
Examples
The examples assume that you have a dictionary file (word list) at /usr/share/dict/words.
$ wc --lines --words --chars --bytes /usr/share/dict/words
102305 102305 971304 971578 /usr/share/dict/words
There are more bytes than characters because some characters take more than one byte in UTF-8 (for example the accented letters in the list below, such as ä and ö).
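A small sketch of the byte/character difference: the letter ä (U+00E4) is written here as its two UTF-8 bytes so the script itself stays plain ASCII. The character count depends on your locale, so treat the comment below as an assumption about a UTF-8 locale.

```shell
# "ä" as the UTF-8 byte sequence C3 A4, given in octal for printf.
bytes=$(printf '\303\244' | wc -c)
chars=$(printf '\303\244' | wc -m)
# wc -c always counts bytes, so this is 2.
# wc -m counts characters for the current locale: 1 in a UTF-8 locale,
# but 2 in the C/POSIX locale, where every byte counts as a character.
# $((...)) strips any padding some wc implementations add.
echo "bytes=$((bytes)) chars=$((chars))"
```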
$ < /usr/share/dict/words grep -o '.' |sort |uniq -c
29105 '
65630 a
1438 A
12 á
6 â
14654 b
1481 B
31144 c
1636 C
5 ç
28422 d
844 D
90579 e
653 E
148 é
29 è
6 ê
10380 f
538 F
22501 g
852 G
19325 h
919 H
68343 i
361 I
2 í
1482 j
560 J
8188 k
680 K
41512 l
942 L
21488 m
1768 M
58328 n
587 N
8 ñ
50187 o
409 O
10 ó
2 ô
21691 p
1049 P
1492 q
72 Q
58312 r
782 R
92909 s
1656 S
53309 t
908 T
26773 u
140 U
3 û
7870 v
7281 w
352 V
533 W
2139 x
44 X
12896 y
154 Y
14 ü
3266 z
161 Z
3 å
2 Å
7 ä
17 ö
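The question also asks for totals. One way, sketched here on a made-up sample file (swap in /usr/share/dict/words or any text file), is to rank the uniq -c output with sort -rn and add up its first column with awk. Note that newlines are not included in the total, because grep -o . never matches the line terminator.

```shell
# Hypothetical sample file; replace with the file you want to analyse.
file=$(mktemp)
printf 'banana\nBandana\n' > "$file"

# Most frequent characters first: sort the uniq -c output numerically, descending.
ranked=$(grep -o . "$file" | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# Grand total: sum the counts in the first column of the uniq -c output.
total=$(grep -o . "$file" | sort | uniq -c | awk '{t += $1} END {print t}')
echo "total: $total"

rm -f "$file"
```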
Solution 2:
There is a very simple way to count each character in a text file. I used your question as a text file (called countc) and tested this command:
grep -o '.' countc | awk '{a[$0]++} END {for (i in a) print i, a[i]}'
and this is what you get:
' 1
h 9
u 6
46
v 1
i 7
j 2
w 5
k 1
x 1
l 10
y 4
m 3
n 16
a 14
. 2
o 19
p 1
c 12
I 2
d 9
r 14
e 28
f 8
s 8
g 5
t 21
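awk associative arrays are very useful for this kind of counting. The line without a visible character is the count of spaces. Because for (i in a) visits the keys in no particular order, append | sort if you want sorted output.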
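A variant of the awk approach that also prints the total, as the question asks. This is a sketch with a made-up sample file standing in for countc. It keys the array on $0 (the whole line, i.e. the character itself), so a space gets its own entry rather than an empty field.

```shell
# Hypothetical sample file standing in for "countc".
file=$(mktemp)
printf 'How can I count?\n' > "$file"

# a[$0] counts each character; total counts every character seen.
# The per-character lines come out in unspecified order; the total is always last.
result=$(grep -o . "$file" |
  awk '{ a[$0]++; total++ }
       END { for (c in a) printf "%s %d\n", c, a[c]
             printf "total %d\n", total }')
printf '%s\n' "$result"

rm -f "$file"
```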