gnu-sort - what does manual mean when it says merge option does "not sort"

GNU sort (at least the version whose source code I looked at) sorts chunks of the file in memory and creates a set of temporary files (one temporary file per chunk). It also uses multiple threads during the in-memory sort phase (the --parallel option sets the maximum number of threads). Once all of the temporary files are created, it does 16-way merges of them (unless you override this with --batch-size) until it produces a single sorted file.

The point here is that you do not have to split the file into separate files first: GNU sort handles a large file automatically, creating sorted temporary files as needed and merging them into a single sorted file.
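As a rough sketch of what that looks like on the command line: the file name, the sizes, and the particular option values below are made up for illustration, and whether sort actually spills to temporary files depends on your version and available memory.

```shell
# Throwaway working directory (hypothetical paths throughout).
workdir=$(mktemp -d)

# Hypothetical input: 200000 integers in descending order, i.e. unsorted for -n.
seq 200000 -1 1 > "$workdir/big.txt"

# -S 1M            small in-memory buffer, so sort is likely to spill to temporary files
# -T "$workdir"    directory where those temporary files are written
# --parallel=2     upper limit on the number of sorting threads
# --batch-size=8   merge at most 8 inputs at once instead of the default mentioned above
sort -n -S 1M -T "$workdir" --parallel=2 --batch-size=8 \
    -o "$workdir/sorted.txt" "$workdir/big.txt"

# sort -c exits with status 0 only if its input is already in order.
if sort -n -c "$workdir/sorted.txt"; then sorted_ok=yes; else sorted_ok=no; fi
first_line=$(head -n 1 "$workdir/sorted.txt")
last_line=$(tail -n 1 "$workdir/sorted.txt")
echo "sorted: $sorted_ok, first: $first_line, last: $last_line"

rm -rf "$workdir"
```

No manual splitting is involved; the single sort invocation does the chunking and merging itself.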

The -m option is for a special case where multiple already sorted files are to be merged.


The -m option simply merges the files together, just as the merge step of mergesort does. It requires all of the input files to already be sorted according to the same order.

So, for sorting a very large file, what you're doing does work: split it into several smaller files and sort each one separately. At this point, if you just append the files to one another, you'll end up with something like 0 1 2 3 ... 0 1 2 3

The -m option merges them properly.
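Here is a small sketch of that split / sort / merge workflow, using a made-up six-line input and hypothetical chunk file names, to show the difference between appending the sorted pieces and merging them:

```shell
workdir=$(mktemp -d)

# Hypothetical input: six numbers in no particular order.
printf '%s\n' 5 2 4 1 6 3 > "$workdir/input.txt"

# Split into chunks of 3 lines each (split names them chunk.aa and chunk.ab).
split -l 3 "$workdir/input.txt" "$workdir/chunk."

# Sort each chunk in place.
for f in "$workdir"/chunk.*; do
    sort -n -o "$f" "$f"
done

# Appending the sorted chunks gives two sorted runs, not one sorted file.
appended=$(cat "$workdir"/chunk.* | paste -s -d ' ' -)
# Merging interleaves the runs into a single sorted sequence.
merged=$(sort -n -m "$workdir"/chunk.* | paste -s -d ' ' -)

echo "appended: $appended"   # 2 4 5 1 3 6
echo "merged:   $merged"     # 1 2 3 4 5 6

rm -rf "$workdir"
```

The paste call is only there to print the result on one line.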

For example, with these two files, a and b (one number per line; the outputs below are shown on a single line for brevity):

a  b
1  3
2  2
3  1

sort -m a b
# 1 2 3 3 2 1
sort -m a a
# 1 1 2 2 3 3
sort -m b b
# 3 2 1 3 2 1
sort -r -m b a
# 3 2 1 1 2 3

I suspect the conceptual problem is regarding what "merge" means. In the context of sorting algorithms, "merge" has a specific meaning. See https://en.wikipedia.org/wiki/Merge_algorithm for a discussion. A critical point is that while a merge operation does take a number of files as input, the items in any single input file have to be in properly sorted order for the merge to do what it is supposed to -- this is different from a sort operation. In this sense "merge does not sort".
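Since the merge relies on each input already being in order, it can be worth checking that first. A minimal sketch using sort -c, which only checks order and writes no sorted output; the files mirror the a and b from the example above, and the check helper is a made-up name:

```shell
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$workdir/a"   # ascending: in order for plain sort
printf '%s\n' 3 2 1 > "$workdir/b"   # descending: out of order for plain sort

# Hypothetical helper: report whether a file is in order for the given sort options.
# sort -c exits non-zero at the first out-of-order line.
check() {
    if sort -c "$@" 2>/dev/null; then echo ok; else echo unsorted; fi
}

a_status=$(check "$workdir/a")
b_status=$(check "$workdir/b")
# With -r the expected order is descending, so b passes.
b_reverse_status=$(check -r "$workdir/b")

echo "a: $a_status, b: $b_status, b with -r: $b_reverse_status"
rm -rf "$workdir"
```

A file that fails this check under the same options you pass to sort -m is one whose lines will come out of the merge in the wrong places.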

There is also a sorting algorithm called "merge sort", which uses as one of its components merge operations.