Sort --parallel isn't parallelizing

Solution 1:

sort doesn't create a thread unless it needs to, and for small files it's just too much overhead. Unfortunately, sort treats a pipe like a small file. To feed enough data to 24 threads, you need to tell sort to use a large internal buffer with -S (sort does that automatically when presented with large files). This is something we should improve upstream (at least in the documentation). So you'll want something like:

(export LC_ALL=C; grep -E  <files> | sort -S1G --parallel=24 -u | wc -m)

(Note I've set LC_ALL=C for all processes, since they'll all benefit from it with this data.)
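
If you run this kind of pipeline often, the same idea can be wrapped in a small function. This is only a sketch: the name sort_unique_parallel and its defaults are made up for illustration, and it assumes GNU coreutils (for sort -S, sort --parallel and nproc) and a shell that supports local, such as bash.

```shell
# Hypothetical helper (not part of coreutils): unique-sort stdin with an
# explicit buffer size, so sort has enough data in memory to use its threads.
# Assumes GNU sort and nproc are available.
sort_unique_parallel() {
    local buffer="${1:-1G}"          # passed to sort -S (internal buffer size)
    local threads="${2:-$(nproc)}"   # passed to sort --parallel
    LC_ALL=C sort -S "$buffer" --parallel="$threads" -u
}

# Usage (input comes from a pipe, so the buffer size is given explicitly):
#   some_command | sort_unique_parallel 1G 24 | wc -m
```

Tune the buffer to the memory you actually have; a larger buffer than the input needs doesn't buy anything.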

BTW you can monitor the sort threads with something like:

watch -n.1 ps -C sort -L -o pcpu

Solution 2:

With parsort you can sort big files faster on a multi-core machine.

On a 48-core machine you should see a speedup of about 3x over sort.

parsort is part of GNU Parallel and should be a drop-in replacement for sort.
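
Since it takes the same options as sort, trying it out can be as simple as swapping the command name. A rough sketch, assuming parsort is on your PATH when GNU Parallel is installed; the sort_file wrapper and the fallback to plain sort are my own additions for illustration, not something parsort provides:

```shell
# Pick parsort when it is installed, otherwise fall back to plain sort.
# (Hypothetical wrapper; the fallback is an assumption for convenience.)
if command -v parsort >/dev/null 2>&1; then
    sorter=parsort
else
    sorter=sort
fi

# Sort the given file(s), passing any sort options straight through.
sort_file() {
    LC_ALL=C "$sorter" "$@"
}

# Usage:
#   sort_file -u bigfile > bigfile.sorted
```

Time both variants on your own data before committing to one, since the gain depends on file size and core count.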