Copy 10 million images in a single folder to another server
Now, I know you shouldn't ever put 10 million files into a single directory to begin with. Blame it on the developers, but as it stands, that's where I'm at. We will be fixing it and moving them into folder groups, but first we have to get them copied off the production box.
I first tried rsync, but it failed miserably. I assume the file list (names and paths) it keeps in memory grew larger than the available RAM and swap.
Then I tried to compress everything into a tar.gz, but it couldn't be extracted: "file too large" (the archive was about 60 GB).
I then tried a straight tar-to-tar extraction, but got "cannot open: file too large":
tar c images/ | tar x -C /mnt/coverimages/
Extra Info:
/mnt/coverimages/ is the NFS share we want to move the images to.
All files are images
OS: Gentoo
Solution 1:
If you install rsync version 3 or later, it builds the file list incrementally as it transfers, so it doesn't need to hold the entire list in memory. Going forward, you probably want to consider hashing the filenames and creating a directory structure based on parts of those hashes.
You can see this answer to get an idea of what I mean by the hashing.
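To make the rsync suggestion concrete, a minimal sketch assuming rsync 3.x is installed (the flags are illustrative; the paths are the ones from the question):

rsync -a --progress images/ /mnt/coverimages/

And a rough sketch of the hash-bucketing idea in shell, assuming a two-level layout keyed on an MD5 of the filename (the layout and the $name variable are hypothetical, not from the post):

# e.g. images/ab/cd/picture.jpg for a hash starting "abcd..."
h=$(printf '%s' "$name" | md5sum | cut -c1-4)
mkdir -p "images/${h:0:2}/${h:2:2}"
mv "images/$name" "images/${h:0:2}/${h:2:2}/$name"

Keeping any single directory down to a few thousand entries makes tools like ls and rsync far happier.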
Solution 2:
If I could arrange the downtime, I'd simply move the disk over temporarily.
Solution 3:
Have you tried using find with -exec (or xargs)? Something like:
find images/ -exec cp "{}" /mnt/coverimages/ \;
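For the xargs route, a minimal sketch (assuming GNU find and cp; the -type f filter and the -t option to cp are my additions, not from the original answer):

find images/ -type f -print0 | xargs -0 cp -t /mnt/coverimages/

This batches many files into each cp invocation instead of forking one cp per file, which matters when there are 10 million of them.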
Solution 4:
I don't think you have the "tar | tar" command quite right. Try this:
tar cf - images/ | (cd /mnt/coverimages && tar xf -)
Another option would be to stream over SSH (some CPU overhead for encryption):
tar cf - images/ | ssh user@desthost "cd /path/coverimages && tar xf -"
There's also cpio, which is a bit more obscure but offers similar functionality:
find images/ | cpio -pdm /mnt/coverimages/