Count line lengths in file using command line tools
Problem
If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?
Example:
file.txt
this
is
a
sample
file
with
several
lines
of
varying
length
Running count_line_lengths file.txt
would give:
Length Occurrences
1 1
2 2
4 3
5 1
6 2
7 2
Ideas?
This pipeline:
- counts the line lengths using awk,
- sorts the (numeric) line lengths using sort -n, and finally
- counts the unique line length values using uniq -c.
$ awk '{print length}' input.txt | sort -n | uniq -c
1 1
2 2
3 4
1 5
2 6
2 7
In the output, the first column is the number of lines with the given length, and the second column is the line length.
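That column order is the reverse of the "Length Occurrences" layout asked for in the question. A minimal sketch that swaps the columns back with one more awk call is below. It wraps the pipeline in a shell function called count_line_lengths, which is only the hypothetical command name used in the question, not an existing tool.

```shell
# Hypothetical count_line_lengths (name taken from the question):
# prints "length count" pairs, shortest lines first.
count_line_lengths() {
    # length with no argument is the length of the whole line ($0);
    # sort -n groups equal lengths, uniq -c prints "count length",
    # and the last awk swaps the two fields to "length count".
    awk '{print length}' "$1" | sort -n | uniq -c | awk '{print $2, $1}'
}
```

Running count_line_lengths file.txt on the sample file should then print the pairs in the same order as the table in the question.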
Pure awk
awk '{++a[length()]} END{for (i in a) print i, a[i]}' file.txt
4 3
5 1
6 2
7 2
1 1
2 2
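The output above is not sorted because awk's for (i in a) loop visits array keys in an unspecified order. One way to get ascending lengths, sketched here as a hypothetical count_line_lengths function, is to pipe the result through sort -n:

```shell
# Pure-awk count, then sort numerically on the first field (the length),
# because "for (i in a)" visits array keys in an unspecified order.
count_line_lengths() {
    awk '{ ++a[length($0)] } END { for (i in a) print i, a[i] }' "$1" | sort -n
}
```

With GNU awk specifically, assigning PROCINFO["sorted_in"] = "@ind_num_asc" at the start of the END block makes the loop visit keys in ascending numeric order, so the extra sort is not needed; other awk implementations do not honor that setting.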
Using bash arrays:
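#!/bin/bash
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal
while IFS= read -r line; do
    # ${#line} is the line length; use it as the array index
    ((histogram[${#line}]++))
done < file.txt
echo "Length Occurrence"
# the indices of an indexed array expand in ascending order
for length in "${!histogram[@]}"; do
    printf "%-6s %s\n" "${length}" "${histogram[$length]}"
done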
Example run:
$ ./t.sh
Length Occurrence
1 1
2 2
4 3
5 1
6 2
7 2
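The same bash-array approach can be packaged so it is called the way the question proposes, as count_line_lengths file.txt. The sketch below assumes bash (indexed arrays, [[ ]]); the function name is only the hypothetical one from the question. It also counts a final line that has no trailing newline, which a plain read loop would skip, and uses a pre-increment so the arithmetic command never returns a non-zero status.

```shell
# Hypothetical count_line_lengths (name from the question); requires bash.
count_line_lengths() {
    local line length
    local -a histogram=()
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal;
    # the [[ -n $line ]] test also handles a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # pre-increment evaluates to >= 1, so the (( )) status is 0
        ((++histogram[${#line}]))
    done < "$1"
    printf '%-6s %s\n' "Length" "Occurrences"
    # indices of an indexed array expand in ascending numeric order
    for length in "${!histogram[@]}"; do
        printf '%-6s %s\n' "$length" "${histogram[$length]}"
    done
}
```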
Using perl:
$ perl -lne '$c{length($_)}++ }{ print qq($_ $c{$_}) for (keys %c);' file.txt
Output:
6 2
1 1
4 3
7 2
2 2
5 1
You can accomplish this using only basic Unix utilities:
$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2
How it works:
- Here's the source file:
$ cat file.txt
this
is
a
sample
file
with
several
lines
of
varying
length
- Replace each line of the source file with its length:
$ for line in $(cat file.txt); do printf $line | wc -c; done
4
2
1
6
4
4
7
5
2
7
6
- Sort and count the number of length occurrences:
$ for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c
1 1
2 2
3 4
1 5
2 6
2 7
- Swap and format the numbers:
$ printf "%s %s\n" $(for line in $(cat file.txt); do printf $line | wc -c; done | sort -n | uniq -c | sed -E "s/([0-9]+)[^0-9]+([0-9]+)/\2 \1/")
1 1
2 2
4 3
5 1
6 2
7 2
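This works for the sample file because every line is a single word. In general, for line in $(cat file.txt) splits the file on any whitespace rather than on newlines, so a line such as "a b" would be counted as two separate words, and printf $line treats the text as a format string, so a % or backslash in the input gets interpreted. A line-safe sketch with the same utilities is below; it reads whole lines with read, prints them with a '%s' format, and swaps the columns with awk instead of the sed-plus-printf step. The function name is the question's hypothetical count_line_lengths.

```shell
# Hypothetical count_line_lengths built from basic utilities, reading
# whole lines instead of whitespace-separated words.
count_line_lengths() {
    # IFS= and -r keep each line exactly as written; '%s' prints it
    # literally, and wc -c counts its bytes (no newline is printed).
    while IFS= read -r line; do
        printf '%s' "$line" | wc -c
    done < "$1" |
        sort -n |
        uniq -c |
        awk '{print $2, $1}'
}
```

Note that wc -c counts bytes, so a line containing multibyte UTF-8 characters reports a larger length than its character count; wc -m counts characters according to the current locale.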