A simple volume replication tool for large data sets?
I've used Doubletake Move to move large datasets across the Internet. It uses byte-level replication and keeps track of changes to files. There is also a nice bandwidth-throttling scheduler, so you can use less during the day and crank it up at night and on weekends. It also recovers fairly well if the connection breaks.
Now, I am assuming this is some sort of MSA attached to a physical machine, but if you are using a SAN, check with your SAN vendor for async replication options.
Whatever replication you use, there are a couple of things you want to think about:
- Bandwidth at the source and target side
- File rate of change
If the rate of change on the source side is too high and you don't have enough bandwidth to keep up with it, the backlog will only grow and you will never get a good, converged replica.
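To make that concrete, here's a rough back-of-envelope sketch in Python of the math I'd run before committing to a product. Every number in it (daily churn, link speed, throttling window) is a placeholder you'd replace with your own measurements:

```python
# Rough sanity check: can the link drain a day's worth of changed data?
# All numbers here are placeholders; plug in your own measurements.

GIB = 1024 ** 3

daily_churn_gib = 40          # data changed/added per day (measure this!)
link_mbps = 10                # raw WAN link speed in megabits/s
usable_fraction = 0.6         # share of the link replication is allowed to use
throttled_hours = 10          # business hours run at a reduced allowance
throttle_factor = 0.25        # fraction of the allowance during business hours

# Effective throughput in bytes/hour during open and throttled windows
full_rate = link_mbps * 1_000_000 / 8 * usable_fraction * 3600
open_hours = 24 - throttled_hours

bytes_per_day = full_rate * open_hours + full_rate * throttle_factor * throttled_hours
churn_bytes = daily_churn_gib * GIB

print(f"Link can move ~{bytes_per_day / GIB:.1f} GiB/day, churn is {daily_churn_gib} GiB/day")
if bytes_per_day <= churn_bytes:
    print("Backlog will grow forever -- replication will never converge.")
else:
    headroom = bytes_per_day / churn_bytes
    print(f"OK, with ~{headroom:.1f}x headroom (less if the tool can't do byte-level deltas).")
```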
Re-indexing databases, defrags, and bulk file moves/adds/deletes have all caused me headaches in the past.
Hopefully my past pain will help someone who reads this! :D
DFSR in 2003 R2 (2008 and 2008 R2 are likely much more scalable, not least from being x64) worked so wonderfully for us that I would recommend doing whatever you can to get DFSR to work. We had 75 servers, each on a different WAN link (1-10 Mb), syncing files totaling at least 500 GB per server over bad, slow, or saturated links. We found it easier to manage organization-wide than Veritas, though this was 2006-2007, just after 2003 R2 came out. GPOs controlling BITS bandwidth are your friend.
One thing to know about DFSR is that it keeps a staging folder with copies of all the changed blocks waiting to be sent out, so be generous with storage there. I am curious about the limits you quote, as I would think 8 million files would be fine given plenty of resources (RAM, CPU, disk). I don't know our own file count.
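For sizing that staging folder, the commonly cited Microsoft guidance (for 2008/2008 R2, if I remember it right) is to make the staging quota at least the combined size of the 32 largest files in the replicated folder. Here's a quick Python sketch of how you could estimate that number; the path and N=32 are assumptions to adjust for your environment and DFSR version:

```python
# Rough staging-quota estimate: sum of the N largest files under a replicated folder.
# The path and N=32 are placeholders; adjust for your environment and DFSR version.
import heapq
import os

REPLICATED_FOLDER = r"D:\Shares\Data"   # hypothetical path
N_LARGEST = 32

sizes = []
for root, _dirs, files in os.walk(REPLICATED_FOLDER):
    for name in files:
        try:
            sizes.append(os.path.getsize(os.path.join(root, name)))
        except OSError:
            pass  # skip files we can't stat (in use, broken links, etc.)

largest = heapq.nlargest(N_LARGEST, sizes)
quota_mb = sum(largest) / (1024 ** 2)
print(f"Scanned {len(sizes)} files; suggested staging quota >= {quota_mb:.0f} MB")
```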
Also, DFSR has in the past been incompatible with backup, A/V, and other software. In 2008-2009 we found that Netbackup didn't like DFSR and was reporting success on our file server backups while actually backing up NOTHING. Only when testing restores did we discover this horrible, horrible fault in Netbackup. If there's one thing you never want a backup system to do, it's report a successful backup while leaving you with an empty tape.
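The lesson is to test restores, not backup job status. Here's a minimal sketch (the paths and sample size are hypothetical) of the kind of spot check that would have caught our problem, comparing a test restore against the live source by checksum; expect some drift for files that changed after the backup ran:

```python
# Spot-check a test restore against the source: compare checksums of a sample of files.
# Paths are hypothetical; expect some drift for files modified after the backup ran.
import hashlib
import os
import random

SOURCE = r"D:\Shares\Data"
RESTORED = r"E:\RestoreTest\Data"
SAMPLE = 200  # how many files to spot-check

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

all_files = [os.path.relpath(os.path.join(r, f), SOURCE)
             for r, _d, fs in os.walk(SOURCE) for f in fs]
if not all_files:
    raise SystemExit("Source tree is empty -- something is already wrong.")

mismatches = 0
for rel in random.sample(all_files, min(SAMPLE, len(all_files))):
    src, dst = os.path.join(SOURCE, rel), os.path.join(RESTORED, rel)
    if not os.path.exists(dst):
        print(f"MISSING in restore: {rel}")
        mismatches += 1
    elif sha256(src) != sha256(dst):
        print(f"CHECKSUM differs: {rel}")
        mismatches += 1

print(f"Checked {min(SAMPLE, len(all_files))} files, {mismatches} problems")
```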
Anyway, I give a vote of confidence to DFSR, especially in its third version with 2008 R2, as something you should give a shot if you can't find a vendor's product that specifically says they've tested your scenario. What Microsoft officially supports is often much more conservative than what they know will work. Obviously your mileage may vary, and you have to determine how much risk you're willing to take.
The key when evaluating replication is to look for a "consistent set" of data that gets replicated. For example: a database and its corresponding log files should be replicated in a "consistent" manner, so the data is actually usable at the replica site.
The second important feature to look at is the time required to recover when the connection drops. Does replication resume from where it left off, or does it restart from scratch?
The third: how well (or poorly) does it perform under varying network conditions, and what is its bandwidth utilization?
The other important things to check: are file permissions maintained? Are other file attributes preserved, e.g. what happens to compressed folders? What happens to encrypted files? Are open files replicated? And so on.
Given all the above, block-based replication solutions are generally much better than file-based replication. Host-based block replication would be cheaper than "off-host" block-based replication.
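On that last set of questions (permissions and attributes), here's a small sketch of the kind of fidelity check I'd run against a replica. It only covers size, mtime, and POSIX mode bits; NTFS ACLs, compression, and EFS status need platform-specific tools such as icacls, compact, and cipher. The paths are hypothetical:

```python
# Compare basic metadata between a source tree and its replica.
# Note: this only covers size/mtime/mode -- NTFS ACLs, compression, and EFS
# attributes need platform-specific tooling (icacls, compact, cipher, ...).
import os
import stat

SOURCE = "/srv/data"            # hypothetical source path
REPLICA = "/mnt/replica/data"   # hypothetical replica path

for root, _dirs, files in os.walk(SOURCE):
    for name in files:
        src = os.path.join(root, name)
        dst = os.path.join(REPLICA, os.path.relpath(src, SOURCE))
        if not os.path.exists(dst):
            print(f"missing on replica: {src}")
            continue
        s, d = os.stat(src), os.stat(dst)
        if s.st_size != d.st_size:
            print(f"size differs: {src} ({s.st_size} vs {d.st_size})")
        if int(s.st_mtime) != int(d.st_mtime):
            print(f"mtime differs: {src}")
        if stat.S_IMODE(s.st_mode) != stat.S_IMODE(d.st_mode):
            print(f"permission bits differ: {src}")
```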