Parallelizing rsync
Solution 1:
I just had a similar problem: I had to move several TB from one NAS to a different NAS, with no backup/restore capability that would let me simply feed one set to the other.
So I wrote this script to use xargs to run several rsyncs, one for each directory it encounters. It depends on being able to list the source directories (be careful to escape ARG 3), but I think you could set that stage with a non-recursive rsync that just copies files and directories to the appropriate level.
It also determines how many rsyncs to run based on the number of processors, but you might want to tweak that.
#!/bin/bash
# Usage: script.sh SRC_DIR DEST_DIR [DIR_LIST]
SRC_DIR=$1
DEST_DIR=$2
LIST=$3
CPU_CNT=$(grep -c ^processor /proc/cpuinfo)
# pseudo-random heuristic: run four rsync jobs per CPU
let JOB_CNT=CPU_CNT*4
# default to the top-level directories under SRC_DIR if no list is given
[ -z "$LIST" ] && LIST=$(ls "$SRC_DIR")
echo "rsyncing From=$SRC_DIR To=$DEST_DIR DIR_LIST=$LIST"
mkdir -p /{OLD,NEW}_NAS/home   # site-specific mount points; adjust for your setup
[ -z "$RSYNC_OPTS" ] && RSYNC_OPTS="-tPavW --delete-during --exclude .snapshot --exclude hourly.?"
cd "$SRC_DIR" || exit 1
echo $LIST | xargs -n1 echo | xargs -n1 -P "$JOB_CNT" -I% rsync ${RSYNC_OPTS} "${SRC_DIR}/%/" "${DEST_DIR}/%/"
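For illustration, a hypothetical invocation (the script name and the directory names are made up; ARG 3 lists the top-level directories to copy):
./parallel_rsync.sh /OLD_NAS/home /NEW_NAS/home "alice bob carol"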
Solution 2:
GNU Parallel has a solution.
I have moved 15 TB this way over a 1 Gbps link, and it can saturate the link.
The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:
# +100000 means files larger than 100000 512-byte blocks (roughly 50 MB);
# {//} is GNU Parallel's replacement string for the dirname of the input file.
cd src-dir; find . -type f -size +100000 | \
  parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
    rsync -s -Havessh {} fooserver:/dest-dir/{}
The directories created may end up with the wrong permissions, and smaller files are not transferred. To fix both, run rsync a final time:
rsync -Havessh src-dir/ fooserver:/dest-dir/
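If the data is spread across many directories rather than a few big files, the same idea can be applied per directory. This is a sketch rather than part of the original answer; it assumes GNU find, that /dest-dir already exists on fooserver, that directory names contain no whitespace, and that 8 concurrent jobs suit the link:
# one rsync per top-level directory, at most 8 running at a time
find src-dir/ -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | \
  parallel -j 8 rsync -a src-dir/{}/ fooserver:/dest-dir/{}/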
Solution 3:
Yes. Such a feature exists.
There is a utility called pssh that provides the described functionality.
This package provides parallel versions of the OpenSSH tools. Included in the distribution:
- Parallel ssh (pssh)
- Parallel scp (pscp)
- Parallel rsync (prsync)
- Parallel nuke (pnuke)
- Parallel slurp (pslurp)
I'm not sure how easy it is to set up, but it might just do the trick!
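For a sense of how prsync is invoked, here is a minimal sketch; hosts.txt, the user, and the paths are made up, and the flags used (-h host file, -l remote user, -r recursive, -a archive, -z compress) should be verified against the prsync man page on your system:
# hosts.txt lists one destination host per line
prsync -h hosts.txt -l backup -r -a -z /home/user/public_html /home/user/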
Solution 4:
I cannot comment, so I have added a new answer with slightly better code than the previous (nice and smart) answer.
Check the rsync line, because it contains an optional ionice tweak.
#!/bin/bash
start_time=$(date +%s.%N)
# Transfer files in parallel using rsync (simple script)
# MAXCONN: maximum number of "rsync" processes running at the same time
MAXCONN=6
# Source and destination base paths (they do not need to end with "/")
SRC_BASE=/home/user/public_html/images
DST_BASE=user@remotehost:/home/user/public_html/images
RSYNC_OPTS="-ah --partial"
# Main loop: one rsync per top-level directory, at most MAXCONN at a time
for FULLDIR in "$SRC_BASE"/*; do
  NUMRSYNC=$(ps -Ao comm | grep -c '^rsync$')
  while [ "$NUMRSYNC" -ge "$MAXCONN" ]; do
    NUMRSYNC=$(ps -Ao comm | grep -c '^rsync$')
    sleep 1
  done
  DIR=$(basename "$FULLDIR")
  echo "Start: $DIR"
  # ionice -c2 -n5: best-effort I/O class at low priority, to go easy on the disks
  ionice -c2 -n5 rsync $RSYNC_OPTS "$SRC_BASE/$DIR/" "$DST_BASE/$DIR/" &
  # rsync $RSYNC_OPTS "$SRC_BASE/$DIR/" "$DST_BASE/$DIR/" &
  sleep 5
done
execution_time=$(echo "$(date +%s.%N) - $start_time" | bc)
printf "Done. Execution time: %.6f seconds\n" "$execution_time"
Solution 5:
Looks like someone has written this utility for you. It breaks the transfer into parallel chunks. This is a better implementation than the "parallel big file" version listed under GNU Parallel:
https://gist.github.com/rcoup/5358786
Also, lftp can parallelize file transfers via ftp, ftps, http, https, hftp, fish, and sftp. There are often advantages to using lftp, because managing permissions, restricted access, and so on for rsync can be challenging.
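As a rough illustration of the lftp route (host, user, and paths here are placeholders; --parallel and --use-pget-n are standard options of lftp's mirror command):
# mirror a remote tree over sftp with up to 8 files in flight,
# splitting each file into 4 segments
lftp -e "mirror --parallel=8 --use-pget-n=4 /remote/dir /local/dir; quit" sftp://user@host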