Get the most commonly appearing lines from a file in Linux
Solution 1:
You can easily do this with built-in commands.
- Feed the contents of the file to `sort`. We need this for the next step.
- This goes to `uniq -c`, which counts the occurrences of each line. If similar lines were not adjacent, this wouldn't work, hence the sorting beforehand.
- Then, feed it to another `sort`, which now sorts in reversed order (`-r`) and based on a numeric (`-n`) interpretation of the `uniq` output. We need the numeric option since otherwise the spaces in front of the numbers would lead to wrong results (see GNU `sort`'s help for more).
- Finally, only show the first twelve lines with `head`.
The command would then be:
sort test.txt | uniq -c | sort -rn | head -n 12
The output here contains the actual count of the occurrences.
To only get the raw list of lines, you can pipe the output to `sed`:
sort test.txt | uniq -c | sort -rn | head -n 12 | sed -E 's/^ *[0-9]+ //g'
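To see what the `sed` step does in isolation, here is a small sketch that feeds it a couple of lines formatted the way `uniq -c` prints them (the sample text is made up):

```shell
# uniq -c prefixes each line with a right-padded count and a space,
# e.g. "      6 Hello there!". The sed expression strips that prefix.
printf '      6 Hello there!\n      3 Look at me!\n' |
  sed -E 's/^ *[0-9]+ //g'
# → Hello there!
# → Look at me!
```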
Example:
I'm not there very often
I'm not there very often
Look at me!
Look at me!
Look at me!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!
Output from the first command, but only showing the top two with `head -n 2`:
6 Hello there!
3 Look at me!
Output from the second command:
Hello there!
Look at me!
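The whole example can be reproduced in one self-contained run; this sketch writes the sample lines to a temporary file (`/tmp/test.txt` is just an illustrative path) and runs both pipelines:

```shell
# Build the sample file from the example above.
cat > /tmp/test.txt <<'EOF'
I'm not there very often
I'm not there very often
Look at me!
Look at me!
Look at me!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!
EOF

# With counts, top two lines:
sort /tmp/test.txt | uniq -c | sort -rn | head -n 2

# Raw lines only, counts stripped:
sort /tmp/test.txt | uniq -c | sort -rn | head -n 2 | sed -E 's/^ *[0-9]+ //g'
```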
Solution 2:
If your distro has `logtop`:
cat your_file | logtop
If your file is constantly growing, like a log file, try:
tail -f your_log | logtop