Find duplicate lines in a file and count how many time each line was duplicated?
Assuming there is one number per line:
sort <file> | uniq -c
You can use the more verbose --count
flag too with the GNU version, e.g., on Linux:
sort <file> | uniq --count
This will print duplicate lines only, with counts:
sort FILE | uniq -cd
or, with GNU long options (on Linux):
sort FILE | uniq --count --repeated
on BSD and OSX you have to use grep to filter out unique lines:
sort FILE | uniq -c | grep -v '^ *1 '
For the given example, the result would be:
3 123
2 234
If you want to print counts for all lines including those that appear only once:
sort FILE | uniq -c
or, with GNU long options (on Linux):
sort FILE | uniq --count
For the given input, the output is:
3 123
2 234
1 345
In order to sort the output with the most frequent lines on top, you can do the following (to get all results):
sort FILE | uniq -c | sort -nr
or, to get only duplicate lines, most frequent first:
sort FILE | uniq -cd | sort -nr
on OSX and BSD the final one becomes:
sort FILE | uniq -c | grep -v '^ *1 ' | sort -nr
To find and count duplicate lines in multiple files, you can try the following command:
sort <files> | uniq -c | sort -nr
or:
cat <files> | sort | uniq -c | sort -nr