very fast file syncing software [closed]

I would recommend GlusterFS. It's a NAS-style filesystem that joins several servers into one filesystem, which you can then mount via FUSE. For added security you could also connect the servers over an IPsec tunnel; see Openswan, for example. There's a minimal setup sketch after the excerpt below.

From Wikipedia, about GlusterFS:
GlusterFS has a client and server component. Servers are typically deployed as storage bricks, with each server running a glusterfsd daemon to export a local file system as a volume. The glusterfs client process, which connects to servers with a custom protocol over TCP/IP, InfiniBand or SDP, composes composite virtual volumes from multiple remote servers using stackable translators. By default, files are stored whole, but striping of files across multiple remote volumes is also supported. The final volume may then be mounted by the client host through the FUSE mechanism or accessed via libglusterfs client library without incurring FUSE filesystem overhead. Most of the functionality of GlusterFS is implemented as translators, including:

  • File-based mirroring and replication
  • File-based striping
  • File-based load balancing
  • Volume failover
  • Scheduling and disk caching
  • Storage quotas
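
As a rough illustration, here's what a two-server replicated volume looks like with the gluster CLI. The hostnames, brick paths, and volume name are placeholders, and the exact commands vary by GlusterFS version:

    # On each server: install glusterfs-server and start the daemon. Then, from server1:
    gluster peer probe server2                      # add server2 to the trusted pool
    gluster volume create shared replica 2 \
        server1:/export/brick1 server2:/export/brick1
    gluster volume start shared

    # On a client: mount the composite volume via FUSE
    mount -t glusterfs server1:/shared /mnt/shared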

How many sites are you talking about? Personally I'd look at setting something up with my own servers, using something like DRBD or DFS (Windows uses DFS to keep shares synced over the network; DRBD is essentially a Linux block-level RAID 1 over IP solution). The clients would then connect to a share on the servers (or a mapped drive) and everything would be synced automatically.
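
To give a feel for DRBD, here is a rough two-node sketch; the hostnames, IPs, and disk devices are made-up placeholders, and the syntax differs a little between DRBD versions:

    # /etc/drbd.d/r0.res on both nodes (hypothetical hosts "alpha" and "bravo")
    resource r0 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      meta-disk internal;
      on alpha { address 10.0.0.1:7788; }
      on bravo { address 10.0.0.2:7788; }
    }

    # Then, roughly:
    drbdadm create-md r0            # initialise metadata (run on each node)
    drbdadm up r0                   # attach the disk and connect to the peer
    drbdadm primary --force r0      # on one node only, then mkfs /dev/drbd0 and mount it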

As a second option to research, you could rsync directories among the servers over SSH.
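
A one-way push from one server to another can be as simple as the following (the paths and hostname are placeholders):

    # Mirror /srv/share to the remote box over SSH; -a preserves permissions/times,
    # -z compresses in transit, --delete removes files that were deleted at the source
    rsync -az --delete -e ssh /srv/share/ user@remote.example.com:/srv/share/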

Otherwise you might be looking at rolling your own application and set of scripts to do what you want, which would probably not be simple, cheap, or easy.

Without knowing specifics (number of sites, control at the client sites you have, bandwidth, etc.) it's hard to make other suggestions.

EDIT - DRBD seems optimized for two servers; I don't know what it would take to "chain" the data across more. Also, are you going to sync data from one server out to the outlying sites? Have you planned the priority of sync paths? In other words, is there a central repo that everything syncs from, are you decentralized in where the data comes from and goes to, or are the outlying offices syncing to a central server? You could be looking at quite a complicated setup when you factor these things in. You would have to look at either running a sync utility (as suggested at http://billboebel.typepad.com/blog/2006/11/data_mirroring_.html) or rsync at particular times, or finding a cluster filesystem that handles multiple active "primary" peers without so much overhead that it slams your bandwidth.
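
If you go the "rsync at particular times" route, a cron entry on the central server is the usual way to schedule it; the schedule, paths, and hostname below are only examples, and it assumes passwordless SSH keys are set up:

    # In the central server's crontab (crontab -e): push the share to the branch office every 15 minutes
    */15 * * * * rsync -az --delete -e ssh /srv/share/ user@branch.example.com:/srv/share/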

You didn't mention the size of the data being edited or the reliability of your connection; if you're dealing with average office documents you'll face different corruption and conflict issues than if you're editing large graphics files.

Given the complexity of the setup you're looking at, I would also suggest considering a remote access solution as a potential fix. If you're running Linux, it's quite feasible to have a central server in the "primary" office and have people log in over SSH and run their sessions directly on the server, much like a Windows Terminal Services solution. This gives you more control over how the data is backed up, securely accessed, and audited, but you need a decent connection to do it. A very fast connection would allow SSH with X forwarding; on mediocre-to-fast connections you could set up a remote desktop solution more akin to VNC, with the client<->server connections inside an encrypted tunnel or VPN.
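
For the SSH/X-forwarding case, the client side can be as simple as this (the hostname and application are placeholders):

    # -X enables X11 forwarding, -C compresses the stream (helps on slower links)
    ssh -XC user@central-server.example.com
    # ...then start the application on the server; its window displays on the local desktop
    libreoffice &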

Another consideration is a VPN to a central site, with directories mounted via NFS or a FUSE module like SSHFS. Again, it depends on your bandwidth and connection stability.
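
SSHFS needs nothing on the server beyond an SSH account; a client-side mount might look like this (paths and hostname are placeholders):

    # Mount the remote directory over SSH; unmount later with `fusermount -u /mnt/share`
    sshfs user@central-server.example.com:/srv/share /mnt/share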

If you want to keep syncing data as the solution, you'll still have to consider file-locking problems and race conditions when data is updated from multiple sites, so you'll need to research which filesystems can handle those situations automatically.


If you're using Windows for your clients and servers, I suggest you investigate Distributed File System. Also take a look at offline caching with EFS. If you're not using Windows, please let us know what you're using.

Edit: Take a look at tsync (beta) for Linux.


ChironFS is a distributed filesystem designed for replication. It doesn't handle encryption itself, but you can run it underneath an encrypting filesystem like EncFS if you want each client to manage encryption, or use SSHFS to protect the data over the wire. I don't know whether it'll be suitable performance-wise.
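
As a rough sketch of the EncFS layering (the mount points are placeholders, and this assumes the replicated filesystem is already mounted at /mnt/replicated):

    # Keep EncFS's encrypted backing files on the replicated mount;
    # the decrypted view exists only on the local client
    encfs /mnt/replicated/.enc ~/secure
    # ...work in ~/secure; unmount with `fusermount -u ~/secure`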