How can I count each type of character (and total them) in a text file?

Solution 1:

General count with wc

wc counts lines, words, characters and bytes, but it cannot list the count for each distinct character. See man wc.

Count number of each separate character

To list the count for each distinct character you can

  • start by printing each character to a separate line with grep
  • then sort them with sort
  • then use uniq to print the number of each kind
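
The three steps above can be sketched as a small shell function. This is a minimal sketch: the name count_chars is only an illustration, and the sample word is made up so that the result is easy to check by hand.

```shell
# Hypothetical helper: count each character read from standard input.
count_chars() {
    grep -o . |   # print every character on its own line
    sort |        # bring identical characters together
    uniq -c       # prefix each distinct character with its count
}

# Small sample input instead of a real file.
printf 'banana\n' | count_chars
```

For this sample it should report a three times, b once and n twice. The width of the count column depends on the uniq implementation.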

Examples

These examples assume that a dictionary file (word list), or a link to one, is available at /usr/share/dict/words.

$ wc --lines --words --chars --bytes /usr/share/dict/words
102305 102305 971304 971578 /usr/share/dict/words

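There are more bytes than characters because some characters are encoded with more than one byte (for example the accented characters in the list below, such as å, ä and ö at the end). In UTF-8 each of these takes two bytes, and the 274 accented characters counted below match the difference 971578 - 971304 = 274.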
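
The difference between bytes and characters can be checked on a single character. This is a small sketch that assumes the text is UTF-8; the character count reported by wc -m depends on the current locale, so it is only 1 when a UTF-8 locale is active.

```shell
# 'ö' encoded as UTF-8 is the two bytes 0xC3 0xB6 (written in octal for printf).
printf '\303\266' | wc -c    # byte count: always 2
printf '\303\266' | wc -m    # character count: 1 in a UTF-8 locale
```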

  $ < /usr/share/dict/words grep -o '.' |sort |uniq -c
  29105 '
  65630 a
   1438 A
     12 á
      6 â
  14654 b
   1481 B
  31144 c
   1636 C
      5 ç
  28422 d
    844 D
  90579 e
    653 E
    148 é
     29 è
      6 ê
  10380 f
    538 F
  22501 g
    852 G
  19325 h
    919 H
  68343 i
    361 I
      2 í
   1482 j
    560 J
   8188 k
    680 K
  41512 l
    942 L
  21488 m
   1768 M
  58328 n
    587 N
      8 ñ
  50187 o
    409 O
     10 ó
      2 ô
  21691 p
   1049 P
   1492 q
     72 Q
  58312 r
    782 R
  92909 s
   1656 S
  53309 t
    908 T
  26773 u
    140 U
      3 û
   7870 v
   7281 w
    352 V
    533 W
   2139 x
     44 X
  12896 y
    154 Y
     14 ü
   3266 z
    161 Z
      3 å
      2 Å
      7 ä
     17 ö

Solution 2:

There is a very simple way of counting each character in a text file. I have used your own question as a text file (called countc) and tested this code:

grep '.' -o countc | awk '{a[$1]++} END {for (i in a) print i,a[i]}'

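and this is what you get (the order is arbitrary, and the line showing only 46 is the count of spaces, because $1 is empty for a line that holds only a space):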

' 1
h 9
u 6
 46
v 1
i 7
j 2
w 5
k 1
x 1
l 10
y 4
m 3
n 16
a 14
. 2
o 19
p 1
c 12
I 2
d 9
r 14
e 28
f 8
s 8
g 5
t 21

awk arrays are very useful for such operations.
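
Since the question also asks for a total, here is a possible variant of the same idea. It is only a sketch: the function name count_chars_total and the bracketed output format are assumptions made for illustration. It keys the array on $0 (the whole line) instead of $1, so a space is kept as its own key, and it adds a running total.

```shell
# Hypothetical helper: per-character counts plus a grand total.
# Reads the files given as arguments, or standard input if there are none.
count_chars_total() {
    grep -o . "$@" |
    awk '{ count[$0]++; total++ }    # $0 keeps a space as a real key
         END {
             # Brackets make whitespace characters visible in the output.
             for (c in count) print count[c], "[" c "]"
             print total + 0, "total"
         }'
}

# Small sample input instead of a real file.
printf 'a b a\n' | count_chars_total
```

The per-character lines come out in no particular order, so pipe them through sort if needed. Newlines are not counted, because grep -o . never matches them.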