How to Split a Huge CSV File in Linux?
I have 60 TB of data spread across 12 CSV files.
The data will be loaded into a clustered database where the loading process is single-threaded. To improve load performance, I need to initiate a load process from each node.
So far so good from that point of view. My biggest problem is: how can I split this data? The files are zipped, and each CSV file holds around 5 TB of data! I tried split, but it takes too long!
Solution 1:
The easiest, though most likely not the fastest, way is:
unzip -p <zipfile> | split -C <size>
This streams the decompressed data to stdout (unzip -p) and pipes it into split, which cuts it into pieces of at most <size> bytes without ever writing the full uncompressed file to disk. The -C option splits at line boundaries, so no CSV row is cut in half.
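A minimal, self-contained sketch of the same pipeline. It uses gzip so the demo runs without an existing archive; for a real .zip file, substitute `unzip -p <zipfile>` for the `gzip -dc` stage. File names, the 10 KB chunk size, and the `part_` prefix are all illustrative.

```shell
# Build a small sample CSV and compress it (stand-in for the real archive).
seq 1 1000 | sed 's/$/,some_value/' > sample.csv
gzip -c sample.csv > sample.csv.gz

# Stream the decompressed data and split it into chunks of at most 10 KB,
# cutting only at line boundaries (-C) so no CSV row is broken in half.
# For a .zip archive this stage would be: unzip -p archive.zip | split -C 10k - part_
gzip -dc sample.csv.gz | split -C 10k - part_

# The parts concatenate back to the original file byte for byte.
cat part_* > recombined.csv
cmp sample.csv recombined.csv && echo "parts are lossless"
```

Because the split pieces are produced in sequence, each one can then be handed to a different database node for loading.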