How to keep load-balanced servers synced even with deleted files?

I've recently set up a load-balanced solution for our websites. We host about 200 sites; most run on our custom application, but some are WordPress blogs (in which files can be uploaded/deleted). The setup is basic:

          |-------------------> Apache1
          |
 HAProxy -|
          |
          |-------------------> Apache2

I've set up Apache1 as a 'master', so that most of the changes made on it are rsync'd over to Apache2 every minute using the following command:

rsync -av --delete apache1:/var/www/html/ /var/www/html/

The problem is that, as mentioned earlier, files are sometimes added or removed on Apache2. The only solution I've come up with so far is to have Apache1 rsync certain directories (wp-content, for instance) from Apache2 to itself without --delete, then push everything back to Apache2.

This has its flaws, the main ones being:

  • Files deleted on Apache2 are never really removed: they get copied back from Apache1, so both servers accumulate stale files over time
  • As I add more servers, the rsync script will take longer to complete.
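
To make the first flaw concrete, here is a toy model of the pull-then-push workaround. It is only a sketch: each docroot is modelled as a set of paths, and the two helper functions are hypothetical stand-ins for rsync without and with --delete, not real rsync calls.

```python
# Toy model: each server's docroot is a set of relative paths (contents ignored).

def pull_without_delete(dst: set, src: set) -> set:
    """Stand-in for rsync without --delete: copy everything, remove nothing."""
    return dst | src


def push_with_delete(src: set) -> set:
    """Stand-in for rsync --delete: the destination becomes a mirror of src."""
    return set(src)


apache1 = {"index.php", "wp-content/uploads/a.jpg"}
apache2 = set(apache1)

# A WordPress user lands on Apache2, deletes a.jpg and uploads b.jpg.
apache2.discard("wp-content/uploads/a.jpg")
apache2.add("wp-content/uploads/b.jpg")

# Step 1: Apache1 pulls from Apache2 without deleting anything.
apache1 = pull_without_delete(apache1, apache2)
# Step 2: Apache1 pushes its tree back to Apache2 with --delete.
apache2 = push_with_delete(apache1)

print(sorted(apache2))
# ['index.php', 'wp-content/uploads/a.jpg', 'wp-content/uploads/b.jpg']
```

The upload (b.jpg) propagates correctly, but a.jpg comes back on Apache2 because Apache1 has no way to tell "deleted on Apache2" apart from "never copied to Apache2".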

Is there any way to keep 2+ web servers synced, taking into account that files can be added, updated, and deleted on any of them?


Solution 1:

I'm using OCFS2 with DRBD.

A DRBD resource /etc/drbd.d/r0.res:

resource r0 {
    syncer { rate 1000M; }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    startup { become-primary-on both; }

    on s1 {
        device      /dev/drbd1;
        disk        /dev/sdc;
        address     ip1:7789;
        meta-disk   internal;
    }
    on s2 {
        device      /dev/drbd1;
        disk        /dev/xvdb2;
        address     ip2:7789;
        meta-disk   internal;
    }
}

/dev/drbd1 is formatted with an OCFS2 filesystem and mounted at /data/webroot (df output):

/dev/drbd1   ocfs2   100660180   7427076  93233104   8% /data/webroot

Configuration for OCFS2 without Pacemaker, in /etc/ocfs2/cluster.conf:

node:
    ip_port = 7777
    ip_address = ip1
    number = 0
    name = s1
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = ip2
    number = 1
    name = s2
    cluster = ocfs2

cluster:
    node_count = 2
    name = ocfs2

DRBD status can be checked with the drbd-overview utility:

# drbd-overview 
  1:r0  Connected Primary/Primary UpToDate/UpToDate C r---- /data/webroot ocfs2 96G 9.8G 87G 11% 

or from /proc/drbd:

# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:09

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:953133955 nr:42207234 dw:1185526354 dr:62396241 al:230084 bm:5853 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
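
If you want to watch for this automatically, the status line is easy to parse. The following is a minimal sketch that assumes the 8.3-style /proc/drbd layout shown above; the function name and the rule that "healthy" means Connected, Primary/Primary and UpToDate/UpToDate are my own choices for a dual-primary setup, not anything DRBD ships.

```python
import re

# Matches device lines such as:
#  1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
STATUS_RE = re.compile(
    r"^[ \t]*(?P<minor>\d+):[ \t]+cs:(?P<cs>\S+)[ \t]+ro:(?P<ro>\S+)[ \t]+ds:(?P<ds>\S+)",
    re.MULTILINE,
)


def drbd_problems(proc_drbd_text: str) -> list:
    """Return a list of problems; an empty list means every device looks healthy."""
    matches = list(STATUS_RE.finditer(proc_drbd_text))
    if not matches:
        return ["no DRBD devices found"]
    problems = []
    for m in matches:
        dev = "drbd" + m["minor"]
        if m["cs"] != "Connected":
            problems.append(f"{dev}: connection state is {m['cs']}")
        if m["ro"] != "Primary/Primary":
            problems.append(f"{dev}: roles are {m['ro']}")
        if m["ds"] != "UpToDate/UpToDate":
            problems.append(f"{dev}: disk states are {m['ds']}")
    return problems


# Sample text copied from the /proc/drbd output above.
SAMPLE = """version: 8.3.8 (api:88/proto:86-94)

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:953133955 nr:42207234 dw:1185526354 dr:62396241 al:230084 bm:5853 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

print(drbd_problems(SAMPLE))  # prints []
```

On a real node you would pass in the contents of /proc/drbd instead of SAMPLE and alert whenever the returned list is not empty.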

Solution 2:

We are currently using rsync also, but I'm not crazy about it.

We have been experimenting with fileconveyor, which can sync not only between two servers but also with S3, Cloudfiles, or other cloud storage. That gives us a lot more flexibility.

I don't have any config setups to share at this moment, but we are liking what we see.

Solution 3:

I have not used it in a server setup, but you might try Unison. It deals with changes on either side and automatically syncs anything that doesn't conflict. As far as I know, each Unison run synchronizes just two replicas, so going beyond two hosts means running it pairwise against a central hub, and I wouldn't expect that to scale much past your current setup.
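
The reason a tool like this can handle deletions where a plain rsync cannot is that it remembers the state at the last successful sync. Here is a toy sketch of that idea. It is not Unison's actual algorithm or API; the function name and the dict-of-versions model are made up for illustration.

```python
def reconcile(archive: dict, left: dict, right: dict):
    """Three-way merge of two replicas against the last synced state.

    Each dict maps a path to a content version; a missing key means the
    file does not exist. Returns (merged, conflicts): merged holds the
    agreed state for non-conflicting paths, conflicts lists the rest.
    """
    merged, conflicts = {}, []
    for path in sorted(set(archive) | set(left) | set(right)):
        base, l, r = archive.get(path), left.get(path), right.get(path)
        if l == r:            # both sides agree (or both deleted it)
            winner = l
        elif l == base:       # only the right side changed
            winner = r
        elif r == base:       # only the left side changed
            winner = l
        else:                 # both changed, differently: needs a human
            conflicts.append(path)
            continue
        if winner is not None:
            merged[path] = winner
    return merged, conflicts


archive = {"a.jpg": "v1", "b.jpg": "v1", "c.css": "v1"}
apache1 = {"a.jpg": "v1", "b.jpg": "v1", "c.css": "v2", "new.png": "v1"}
apache2 = {"b.jpg": "v1", "c.css": "v3"}  # a.jpg deleted, c.css edited

merged, conflicts = reconcile(archive, apache1, apache2)
print(merged)     # {'b.jpg': 'v1', 'new.png': 'v1'}
print(conflicts)  # ['c.css']
```

Because a.jpg is in the archive but missing on Apache2, it is treated as a deletion rather than a file that still needs copying, while new.png (absent from the archive) is treated as an addition. c.css was edited on both sides, so it is reported instead of being silently overwritten.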

The only way I know how to scale past 2 hosts would be to set up NFS, or some other shared/distributed filesystem.