DRBD experimentation and virtualization

This was one of those thoughts that were tickling the back of my mind.

I'm working on a home testbed for a high-availability cluster built from plain computers, with no SAN or NAS for storage. It's an "if I wanted a server or two that stayed available even when the hardware failed, and I had some old machines around to do it on, could I build it?" kind of thing. Think RAID-1 at the system-hardware level.

I was thinking of doing it by installing a Linux distro, setting up DRBD in primary/primary mode with Pacemaker/STONITH, then installing Xen to virtualize the server(s) that would actually provide the services to be replicated.
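
For concreteness, a dual-primary DRBD resource looks roughly like this. This is a minimal sketch against DRBD 8.x syntax; the node names, backing disks, and addresses are placeholders, not a tested config:

    # /etc/drbd.d/r0.res -- hypothetical two-node primary/primary resource
    resource r0 {
      # synchronous replication; required for dual-primary
      protocol C;
      startup {
        become-primary-on both;
      }
      net {
        # the primary/primary part, plus basic split-brain policies
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
      }
      on node-a {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.10.1:7788;
        meta-disk internal;
      }
      on node-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.10.2:7788;
        meta-disk internal;
      }
    }

Note that if both nodes actually mount the volume at once, dual-primary needs a cluster filesystem (OCFS2/GFS2) on top, which is part of why Pacemaker and STONITH are non-negotiable here.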

Recent setups at work with VMware ESXi had me wondering whether there might be an advantage to instead using ESXi to host Linux VMs on a couple of machines, then using DRBD and Pacemaker/STONITH to replicate the services between virtual machines on the two ESXi systems (removing Xen from the equation, since I could spin up other VMs on ESXi).

At the time, I think what I liked was the management interface's more or less straightforward stats on performance, disk use, etc. on the VM side, whereas I've seen nothing for managing Xen or DRBD other than the command line (although I hate having to use a Windows system to monitor the VMware server).

On second thought, though, that would add a layer of complexity and probably networking headaches. I could more easily run Linux/DRBD replication on dedicated hardware (each machine with one NIC to the switch and one NIC crossed over to the other machine for replication I/O), and I want to see what I can build for "free"...VMware's solutions beyond ESXi are definitely not cheap.
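
For what it's worth, the dedicated replication link is simple on bare metal: give the crossover NIC its own tiny subnet and point DRBD's address lines at it. A minimal sketch (Debian-style; the interface name and addressing are my own assumptions):

    # /etc/network/interfaces on the first node -- crossover link reserved for DRBD
    auto eth1
    iface eth1 inet static
        address 192.168.10.1
        netmask 255.255.255.252

The second node mirrors this with 192.168.10.2, and the address lines in the DRBD resource use these IPs so replication traffic never touches the switch.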

Has anyone else tried a configuration like this, running DRBD inside VMs instead of on bare-metal hardware? Are there configuration advantages beyond performance/management monitoring with the free vSphere client (or the "free" virtualization platform of your choice)?


Solution 1:

At least with Xen, my experience is that it's better to let the Dom0 handle the block devices. I haven't dealt with DRBD, but with iSCSI it's better to have Dom0 be the iSCSI initiator and then just have the DomU use the resulting block device.

DRBD doesn't care about the filesystem running on its volumes, so I would say this is probably best done in the Dom0. It also gives you the ability to have DRBD back up Windows DomUs.
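
As a sketch of what that looks like (guest name and device paths are hypothetical): Dom0 promotes and owns the DRBD device, and the guest config hands it to the DomU as an ordinary physical disk. That's also why it works for Windows guests; DRBD sits entirely below the filesystem.

    # In Dom0: make this node primary for the resource
    drbdadm primary r0

    # /etc/xen/guest1.cfg -- the DomU just sees a plain block device
    disk = [ 'phy:/dev/drbd0,xvda,w' ]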

You may also want to check out this question, as it addresses some of your questions about running Heartbeat on a VM.

Solution 2:

I have been setting this up in much the way you describe, and it works great! (XenServer)

I set up an old but capable server as the primary host; it runs a console-only VM dedicated to DRBD. That VM serves a "SharedDRBD" SR back to the Xen host via NFS, and the rest of the working VMs that provide services run on that SR. The DRBD VM's backing device is its own VDI on an MDADM RAID 1, and a larger local RAID 10 array handles bulk file storage.
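
A rough sketch of the plumbing, with paths, addresses, and names made up for illustration (not the exact commands from this setup): the storage VM mounts its DRBD device and exports it over NFS, and the host then attaches that export as the SR.

    # Inside the storage VM: mount the DRBD device and export it
    mount /dev/drbd0 /export/shareddrbd
    echo '/export/shareddrbd 192.168.10.0/24(rw,no_root_squash,sync)' >> /etc/exports
    exportfs -ra

    # On the XenServer host: attach the export as the "SharedDRBD" SR
    xe sr-create name-label="SharedDRBD" type=nfs shared=true content-type=user \
       device-config:server=192.168.10.5 device-config:serverpath=/export/shareddrbd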

All MDADM work is done by the host, but one side of the DRBD is in a VM.

The DRBD instance running in the VM syncs with a DRBD service running on the file backup server. The backup server is deliberately not virtualized so that we have bare-metal access to all files, given that XenServer is the biggest quirk we generally deal with.

There is a secondary server that is virtualized but has no local storage beyond what the host itself requires. This server is part of a Xen pool with the primary server to simplify failover. Failover is currently manual, but fast and easy.

First, all VMs on the SharedDRBD SR are shut down while the secondary XenServer host is powered on. The DRBD volume on the file backup server is made primary and mounted as needed. Then the SharedDRBD SR is repointed at the file backup server and the VMs are started on the secondary server; XenCenter doesn't even realize it's serving the VMs from a new location, because it sees the same SR with the same data. The VMs fire back up and everything is back up and running.
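
To give a sense of the moving parts, here is one plausible version of that sequence (these are not the exact commands from this setup; resource names, paths, and UUIDs are placeholders):

    # On the file backup server: promote its side of the DRBD and export it
    drbdadm primary r0
    mount /dev/drbd0 /export/shareddrbd
    exportfs -ra

    # On the surviving XenServer host: repoint the SR's PBD at the backup
    # server; the SR UUID never changes, so XenCenter sees the same SR
    xe pbd-unplug uuid=<old-pbd-uuid>
    xe pbd-destroy uuid=<old-pbd-uuid>
    xe pbd-create sr-uuid=<sr-uuid> host-uuid=<host-uuid> \
       device-config:server=<backup-server-ip> device-config:serverpath=/export/shareddrbd
    xe pbd-plug uuid=<new-pbd-uuid>

    # Then bring the VMs back
    xe vm-start uuid=<vm-uuid>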

There is a lot more to it in terms of configuration, arrays, network topology, etc., but the gist is that DRBD is served from a VM back to its own host.

Overall, it is HA enough for our SMB/home use. Downtime during a catastrophic failure of the primary server is 10-20 minutes or less to get fully back online, with no loss of data; DRBD means the VMs are up to date! Plus, outside of the primary server, which is pretty robust, there is a ton of overall redundancy. Most of the primary server is redundant in and of itself, so we effectively get triple redundancy or better for just about every piece of hardware you can think of (PSU, RAM, CPU, HDD, controllers, NICs, etc.), apart from the motherboards, which are only doubly redundant (primary/secondary Xen hosts).

And yes, XenCenter is sadly installed on Windows; the rest is all Linux.

I know, this question is 8 years old.