Why did the command uniq -c put a whitespace at the beginning?

The default behaviour of uniq -c is to right-justify the count in a field 7 characters wide, then separate it from the line with a single space.
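To make that layout concrete, here is a minimal Python sketch that reproduces the padding, assuming the GNU uniq layout described above (count right-justified in 7 characters, then one space). The function name uniq_c_line is hypothetical and only used for illustration.

```python
def uniq_c_line(count: int, item: str) -> str:
    """Mimic one line of `uniq -c` output (assumed GNU layout)."""
    # Right-justify the count in a field 7 characters wide,
    # then separate it from the item with a single space.
    return f"{count:>7} {item}"


print(repr(uniq_c_line(1, "test")))     # '      1 test'
print(repr(uniq_c_line(1234, "test")))  # '   1234 test'
```

A one-digit count therefore gets six spaces in front of it, which is exactly the leading whitespace in question. Counts wider than 7 digits simply widen the field.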

Source: https://www.thelinuxrain.com/articles/tweaking-uniq-c

Remove the leading spaces with sed:

$ sort input | uniq -c | sort -nr | sed 's/^\s*//' > output
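uniq only collapses adjacent identical lines, which is why the input is sorted first. The following Python sketch imitates sort input | uniq -c | sort -nr (without the padding) using itertools.groupby, which likewise only groups consecutive equal items. It is an illustration of the pipeline's logic, not how the coreutils tools are implemented, and lines with equal counts may come out in a different order than with sort -nr. The name count_sorted is hypothetical.

```python
import itertools


def count_sorted(lines):
    """Imitate `sort | uniq -c | sort -nr` (without the padding)."""
    # sort: bring identical lines next to each other
    ordered = sorted(lines)
    # uniq -c: count each run of consecutive identical lines
    counts = [(len(list(group)), line) for line, group in itertools.groupby(ordered)]
    # sort -nr: highest count first
    counts.sort(key=lambda pair: pair[0], reverse=True)
    return counts


print(count_sorted(["b", "a", "b", "c", "b", "a"]))  # [(3, 'b'), (2, 'a'), (1, 'c')]
```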

uniq -c adds leading whitespace. E.g.

$ echo test
test
$ echo test | uniq -c
      1 test

You could add a command at the end of the pipeline to remove it. E.g.

$ echo test | uniq -c | sed 's/^\s*//'
1 test
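For comparison, the same substitution can be sketched in Python: re.sub() with the pattern ^\s* deletes leading whitespace just like the sed expression, and str.lstrip() does the same without a regular expression. The function name strip_leading_ws is made up for this illustration.

```python
import re


def strip_leading_ws(line: str) -> str:
    # Same idea as sed 's/^\s*//': delete whitespace at the start of the line.
    return re.sub(r"^\s*", "", line)


line = "      1 test"
print(strip_leading_ws(line))  # 1 test
print(line.lstrip())           # 1 test
```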

FWIW, you can use a different tool for counting and sorting if you want more flexibility. Python is one such tool.

Source

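#!/usr/bin/python3
import sys, operator, collections

# Count each distinct input line (trailing newline removed) in a hash table.
counter = collections.Counter(map(operator.methodcaller('rstrip', '\n'), sys.stdin))
# Print "count item" pairs, most frequent first.
for item, count in counter.most_common():
    print(count, item)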

In theory this could even be faster than sort for large inputs, since the program above uses a hash table (collections.Counter) to find duplicate lines instead of sorting the whole input. (Alas, lines with the same count are not sorted among themselves: most_common() lists them in the order they were first encountered. This can be amended and, in principle, still be faster than two sort invocations.)
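One possible amendment, sketched under the assumption that "natural order" means ascending string order among lines with equal counts: sort the Counter's items by descending count, then by the line itself. This sorts only the distinct lines rather than the full input; whether it actually beats two sort invocations has not been measured here. The function name count_lines is hypothetical, and the sample input stands in for sys.stdin.

```python
import collections
import io


def count_lines(stream):
    """Count lines, most frequent first; ties broken by the line text."""
    # Hash-table counting, as in the script above.
    counter = collections.Counter(line.rstrip("\n") for line in stream)
    # Key (-count, item): descending count, then ascending item for ties.
    return sorted(counter.items(), key=lambda pair: (-pair[1], pair[0]))


sample = io.StringIO("b\na\nb\nc\na\n")
for item, count in count_lines(sample):
    print(count, item)
```

In the actual script you would pass sys.stdin instead of the sample stream.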

Output Format

If you want more control over the output format, look into the built-in print() and format() functions.

For instance, to print the count in octal, zero-padded to 8 digits (i.e. up to 7 leading zeros), followed by a tab instead of a space, and with a NUL character instead of a newline as the line terminator, replace the last line with:

    print(format(count, '08o'), item, sep='\t', end='\0')
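To see what that line produces, here is a small sketch that writes the same record into a string buffer instead of standard output. The count 10 and the item "foo" are made-up sample values; 10 in octal is 12.

```python
import io

buffer = io.StringIO()
count, item = 10, "foo"
# '08o': octal, zero-padded to 8 digits; tab separator; NUL terminator.
print(format(count, '08o'), item, sep='\t', end='\0', file=buffer)
record = buffer.getvalue()
print(repr(record))  # '00000012\tfoo\x00'
```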

Usage

Store the script in a file, say sort_count.py, and invoke it with Python:

python3 sort_count.py < input
