Sort --parallel isn't parallelizing
Solution 1:
sort doesn't create a thread unless it needs to, and for small files the overhead just isn't worth it. Unfortunately, sort treats a pipe like a small file. If you want to feed enough data to 24 threads, you'll need to tell sort to use a large internal buffer with -S (sort does that automatically when presented with large files). This is something we should improve upstream (at least in the documentation). So you'll want something like:
(export LC_ALL=C; grep -E <files> | sort -S1G --parallel=24 -u | wc -m)
Note that I've set LC_ALL=C for all processes, since they'll all benefit with this data.
BTW you can monitor the sort threads with something like:
watch -n.1 ps -C sort -L -o pcpu
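If you want to experiment with this on piped input, here is a minimal self-contained sketch of the pipeline above. It assumes GNU sort; the generated sample data, the 64M buffer, and the thread count of 4 are illustrative assumptions, not values from the question, so scale them up for real workloads.

```shell
#!/bin/sh
# Sketch only: compare piped sort with and without an explicit buffer.
# Assumes GNU sort (for -S and --parallel); the data below is made up.
export LC_ALL=C

tmp=$(mktemp)
# 200000 lines containing 50000 distinct numbers, in scrambled order.
seq 1 200000 | awk '{ print ($1 * 7919) % 50000 }' > "$tmp"

# Piped input with the defaults: sort treats the pipe like a small file.
piped_default=$(cat "$tmp" | sort -u | wc -l | tr -d ' ')

# Piped input with an explicit buffer size and thread count.
piped_tuned=$(cat "$tmp" | sort -S64M --parallel=4 -u | wc -l | tr -d ' ')

echo "default: $piped_default unique lines, tuned: $piped_tuned unique lines"
rm -f "$tmp"
```

Both runs should report the same count; only the speed and the number of threads differ, which you can confirm with the watch command above while a larger input is being sorted.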
Solution 2:
With parsort you can sort big files faster on a multi-core machine. On a 48 core machine you should see a speedup of 3x over sort. parsort is part of GNU Parallel and should be a drop-in replacement for sort.
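A minimal sketch of trying parsort as a replacement, assuming it is on your PATH (it ships with recent GNU Parallel releases). The sample data and file names are hypothetical, and the script falls back to plain sort when parsort is not installed so that it still runs.

```shell
#!/bin/sh
# Sketch only: use parsort where available and check it agrees with sort.
export LC_ALL=C

input=$(mktemp)
# Made-up sample data; parsort only pays off on much larger inputs.
seq 1 100000 | awk '{ print ($1 * 7919) % 100003 }' > "$input"

# Pick parsort if installed, otherwise fall back to sort.
if command -v parsort >/dev/null 2>&1; then
    sorter=parsort
else
    sorter=sort
fi

"$sorter" "$input" > "$input.fast"
sort "$input" > "$input.ref"

# The two outputs should be byte-for-byte identical.
if cmp -s "$input.fast" "$input.ref"; then
    result=same
else
    result=different
fi
echo "$sorter vs sort: $result"

rm -f "$input" "$input.fast" "$input.ref"
```

On an input this small you should not expect a speedup; the point is only to check that the output matches before switching a real job over.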