How to count occurrences of each character?

For example, I have a file 1.txt that contains:

Moscow
Astana
Tokyo
Ottawa

I want to count the occurrences of every character, like this:

a - 4,
b - 0,
c - 1,
...
z - 0

You could use this:

sed 's/./&\n/g' 1.txt | sort | uniq -ic
  4  
  5 a
  1 c
  1 k
  1 M
  1 n
  5 o
  2 s
  4 t
  2 w
  1 y

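The sed part places a newline after every character. Then we sort the output alphabetically, and finally uniq counts the number of occurrences. The first output line, the one with no character next to its count, is the number of empty lines left over, one per input line. The -i flag of uniq can be omitted if you don't want case insensitivity.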
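
To make the sort-then-count idea concrete, here is a minimal Python sketch of the same pipeline. The name count_sorted and its ignore_case parameter are made up for this illustration, and unlike the sed version it drops the newlines instead of counting them:

```python
from itertools import groupby


def count_sorted(text, ignore_case=True):
    """Mimic `sort | uniq -c`: sort the characters, then count each run."""
    chars = [c for c in text if c != "\n"]      # drop line breaks
    key = str.lower if ignore_case else (lambda c: c)
    chars.sort(key=key)                          # plays the role of `sort`
    # groupby only merges adjacent equal items, which is why uniq needs sorted input too
    return [(k, len(list(run))) for k, run in groupby(chars, key=key)]


sample = "Moscow\nAstana\nTokyo\nOttawa\n"
for char, n in count_sorted(sample):
    print(f"{n:7d} {char}")
```

Passing ignore_case=False corresponds to leaving out the -i flag.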

A bit late, but to complete the set, here is another approach in Python 3, with a sorted result:

#!/usr/bin/env python3
import sys

chars = open(sys.argv[1]).read().strip().replace("\n", "")
[print(c+" -", chars.count(c)) for c in sorted(set([c for c in chars]))]

A - 1
M - 1
O - 1
T - 1
a - 4
c - 1
k - 1
n - 1
o - 4
s - 2
t - 3
w - 2
y - 1

Explanation

Read the file, strip leading and trailing whitespace, remove the newlines, and keep the result as one string of characters:

chars = open(sys.argv[1]).read().strip().replace("\n", "")

Create a (sorted) set of uniques:
```
sorted(set([c for c in chars]))
```
Count and print the occurrences of each of the characters:
```
print(c+" -", chars.count(c)) for c in <uniques>
```
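
As a side note, chars.count(c) rescans the whole string once for every distinct character. A possible variation, sketched here with collections.Counter from the standard library, tallies everything in a single pass. The function name count_chars is only illustrative:

```python
from collections import Counter


def count_chars(text):
    """Count every character except newlines in one pass over the text."""
    return Counter(text.replace("\n", ""))


counts = count_chars("Moscow\nAstana\nTokyo\nOttawa\n")
for char in sorted(counts):          # same sorted order as the script above
    print(f"{char} - {counts[char]}")
```

For a small file the difference is negligible; it only starts to matter on large inputs with many distinct characters.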

How to use

Paste the code into an empty file, save it as chars_count.py

Run it with the file as an argument by either:

/path/to/chars_count.py </path/to/file>

if the script is executable, or:

python3 /path/to/chars_count.py </path/to/file>

if it isn't

By default, awk's field separator (FS) is whitespace (spaces and tabs). Since we want to count each character, we redefine FS as the empty string (FS="") so that every character becomes its own field, count each field in an array, and print the totals in the END{..} block. Note that splitting on an empty FS works in gawk, but POSIX leaves the behavior unspecified, so other awk implementations may differ:

$ awk '{for (i=1;i<=NF;i++) a[$i]++} END{for (c in a) print c,a[c]}' FS="" file
A 1
M 1
O 1
T 1
a 4
c 1
k 1
n 1
o 4
s 2
t 3
w 2
y 1

In the {for (i=1;i<=NF;i++) a[$i]++} block, combined with FS="", we visit every character of the line and increment its counter in the array a. In the END{for (c in a) print c,a[c]} block we loop over the array a and print each saved character c together with its number of occurrences a[c].
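
For readers less familiar with awk, the same logic can be sketched in Python, with a plain dict playing the role of the associative array. The function name tally is invented for this example:

```python
def tally(lines):
    """Rough Python analogue of the awk program above."""
    a = {}                                   # awk's associative array `a`
    for line in lines:                       # awk processes one record (line) at a time
        for field in line.rstrip("\n"):      # with FS="" every character is a field
            a[field] = a.get(field, 0) + 1   # a[$i]++
    return a


a = tally(["Moscow\n", "Astana\n", "Tokyo\n", "Ottawa\n"])
for c in sorted(a):                          # sorted explicitly; awk's `for (c in a)` order is unspecified
    print(c, a[c])
```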

Loop over all the characters you want to count. For each one, use grep -io to print every occurrence of the character, ignoring case, pipe that to wc -l to count the matches, and print the result.

Like this:

#!/bin/bash

filename="1.txt"

for char in {a..z}
do
    echo "${char} - $(grep -io "${char}" "${filename}" | wc -l),"
done

The script outputs this:

a - 5,
b - 0,
c - 1,
d - 0,
e - 0,
f - 0,
g - 0,
h - 0,
i - 0,
j - 0,
k - 1,
l - 0,
m - 1,
n - 1,
o - 5,
p - 0,
q - 0,
r - 0,
s - 2,
t - 4,
u - 0,
v - 0,
w - 2,
x - 0,
y - 1,
z - 0,
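
The same per-letter loop, zero counts included, could look like this in Python. This is only a sketch: letter_counts is a hypothetical name, and the sample text is hard-coded instead of being read from 1.txt:

```python
import string


def letter_counts(text):
    """Return (letter, count) pairs for a..z, ignoring case, zeros included."""
    lowered = text.lower()                   # plays the role of grep's -i
    return [(ch, lowered.count(ch)) for ch in string.ascii_lowercase]


for ch, n in letter_counts("Moscow\nAstana\nTokyo\nOttawa\n"):
    print(f"{ch} - {n},")
```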

EDIT after comment

To create a loop for all printable characters you can do this:

#!/bin/bash

filename="1.txt"

for num in {32..126}
do
   char=$(printf "\x$(printf %x "${num}")")
   echo "${char} - $(grep -Fo -- "${char}" "${filename}" | wc -l),"
done

This counts all ASCII characters from 32 to 126, which are the printable ones. Note that this version is case-sensitive.

The output from this will be:

- 0,
! - 0,
" - 0,
# - 0,
$ - 0,
% - 0,
& - 0,
' - 0,
( - 0,
) - 0,
* - 0,
+ - 0,
, - 0,
- - 0,
. - 0,
/ - 0,
0 - 0,
1 - 0,
2 - 0,
3 - 0,
4 - 0,
5 - 0,
6 - 0,
7 - 0,
8 - 0,
9 - 0,
: - 0,
; - 0,
< - 0,
= - 0,
> - 0,
? - 0,
@ - 0,
A - 1,
B - 0,
C - 0,
D - 0,
E - 0,
F - 0,
G - 0,
H - 0,
I - 0,
J - 0,
K - 0,
L - 0,
M - 1,
N - 0,
O - 1,
P - 0,
Q - 0,
R - 0,
S - 0,
T - 1,
U - 0,
V - 0,
W - 0,
X - 0,
Y - 0,
Z - 0,
[ - 0,
\ - 0,
] - 0,
^ - 0,
_ - 0,
` - 0,
a - 4,
b - 0,
c - 1,
d - 0,
e - 0,
f - 0,
g - 0,
h - 0,
i - 0,
j - 0,
k - 1,
l - 0,
m - 0,
n - 1,
o - 4,
p - 0,
q - 0,
r - 0,
s - 2,
t - 3,
u - 0,
v - 0,
w - 2,
x - 0,
y - 1,
z - 0,
{ - 0,
| - 0,
} - 0,
~ - 0,
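
In Python, chr() turns a character code directly into the character, so the nested printf trick is not needed. A minimal sketch, assuming the text is already in memory (printable_counts is a made-up name):

```python
def printable_counts(text):
    """Count each printable ASCII character (codes 32..126), case-sensitively."""
    return [(chr(code), text.count(chr(code))) for code in range(32, 127)]


for char, n in printable_counts("Moscow\nAstana\nTokyo\nOttawa\n"):
    print(f"{char} - {n},")
```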
