How do I sort very large files?
Solution 1:
That isn't really a Java-specific problem. You need an efficient algorithm for sorting data that cannot be read into memory all at once. An adaptation of merge sort, known as external sorting, can achieve this.
Take a look at this: http://en.wikipedia.org/wiki/Merge_sort
and: http://en.wikipedia.org/wiki/External_sorting
Basically, the idea is to break the file into pieces small enough to fit in memory, sort each piece (with merge sort or any other method), and then use the merge step of merge sort to combine the sorted pieces into the new, sorted file.
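As a rough sketch of that idea in Java (not any particular library's implementation): the class name ExternalSort, its method names, and the maxLinesPerChunk parameter are made up for illustration, and it assumes UTF-8 text with one record per line, compared as plain strings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; names and parameters are illustrative only.
public class ExternalSort {

    /** Sorts the lines of input into output, holding at most maxLinesPerChunk lines in memory. */
    public static void sort(Path input, Path output, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = splitIntoSortedChunks(input, maxLinesPerChunk);
        try {
            mergeChunks(chunks, output);
        } finally {
            for (Path chunk : chunks) {
                Files.deleteIfExists(chunk); // remove the temporary chunk files
            }
        }
    }

    // Phase 1: read a bounded number of lines, sort them in memory, write each batch to a temp file.
    static List<Path> splitIntoSortedChunks(Path input, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLinesPerChunk) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(writeSortedChunk(buffer)); // last, possibly shorter, chunk
        }
        return chunks;
    }

    static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("extsort-", ".chunk");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // One open chunk together with its current (smallest unread) line.
    private static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader, String line) {
            this.reader = reader;
            this.line = line;
        }
    }

    // Phase 2: k-way merge; the heap always yields the chunk whose current line is smallest.
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Cursor(reader, first));
                }
            }
            while (!heap.isEmpty()) {
                Cursor smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                smallest.line = smallest.reader.readLine();
                if (smallest.line != null) {
                    heap.add(smallest); // re-insert with the chunk's next line
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

You would call it as ExternalSort.sort(inputPath, outputPath, 1_000_000), picking a chunk size that fits comfortably in your heap. It keeps one open reader per chunk, so for a very large number of chunks you would merge in several passes instead of all at once.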
Solution 2:
Since your records are already in a flat text file, you can pipe them into UNIX sort(1), e.g.

    sort -n -t' ' -k1,1 < input > output

It will automatically chunk the data and perform a merge sort using available memory and temporary files in /tmp. If /tmp does not have enough free space for those temporary files, add -T /tmpdir to the command to point sort at a directory that does.
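If you still want to drive this from Java, here is a minimal sketch that just shells out to the same command. The class and method names (UnixSort, sortNumericFirstField) are hypothetical, and it assumes a sort binary that supports -T (GNU and BSD sort do) is on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical wrapper; assumes a GNU- or BSD-style sort binary is on the PATH.
public class UnixSort {

    /** Runs the equivalent of: sort -n -t' ' -k1,1 -T tmpDir < input > output */
    public static void sortNumericFirstField(Path input, Path output, Path tmpDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "sort", "-n", "-t", " ", "-k1,1", "-T", tmpDir.toString());
        pb.redirectInput(input.toFile());                  // same as "< input"
        pb.redirectOutput(output.toFile());                // same as "> output"
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show sort's own error messages
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException("sort exited with status " + exitCode);
        }
    }
}
```

Because the arguments are passed as a list rather than through a shell, the space separator needs no quoting.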
It's quite funny that everyone is telling you to download huge C# or Java libraries or implement merge sort yourself when you can use a tool that is available on virtually every Unix-like platform and has been around for decades.
Solution 3:
You need an external merge sort to do that. Here is a Java implementation of it that sorts very large files.