Why does WinDRBD become Diskless/StandAlone (both nodes)?

I have a question.

The OS is Windows Server 2019, and the volume configuration is RAID-5. The two servers are connected by a heartbeat network, and the two nodes are mirrored using WinDRBD. Both nodes have the same configuration. I left G: unformatted and set D: to be visible on the primary node.

My resource configuration is below:

include "global_common.conf";

resource "foo" {
    protocol    A;

    net {
        use-rle no;
    }
    on node1 {
        address     XXX.XXX.XXX.XXX:7600;
        node-id 1;
        volume 1 {
            disk        "G:";
            device      minor 1;
            meta-disk   internal;
        }
    }
    on node2 {
        address     XXX.XXX.XXX.XXX:7600;
        node-id 2;
        volume 1 {
            disk        "G:";
            device      minor 1;
            meta-disk   internal;
        }
    }
}

Both nodes worked normally, and I completed failover tests by switching roles (primary → secondary / secondary → primary).

However, a problem occurs after rebooting.

After booting, the status on both nodes looks like this:

foo role:Secondary
  volume:1 disk:Diskless
  node2 connection:StandAlone

I've thought about it and searched a lot, but couldn't find an answer.

There are a few things I'm suspicious about.

I wonder if it's because WinDRBD started before the drive letter G: was assigned. If my thinking is correct, is there a workaround?

And if my guess is wrong but the problem continues to occur, what could the cause be?

I was able to work around it once with the commands below, but I want to find the root cause and fix it properly.

drbdadm down foo
drbdadm up foo

Thanks in advance for your help.


Solution 1:

WinDRBD loses the heartbeat, which leads to the issue you're seeing. Unplug the heartbeat wires and you'll easily reproduce the issue. The writing is on the wall: don't use WinDRBD in production, at least not yet. It's fragile.

Solution 2:

@BaronSamedi1958 actually has a point: WinDRBD isn't a native Windows solution; it was ported to Windows straight from Linux with the help of wrappers emulating the Linux kernel APIs.

https://linbit.com/windrbd-replicated-disk-drives-for-windows/

"Technically, the WinDRBD Windows driver consists of a thin Linux compatibility layer that emulates the Linux kernel APIs used by the DRBD driver for the Windows platform. Inside this layer, the original DRBD engine (with a few compiler-specific patches) is working."

As a result, WinDRBD doesn't have much insight into what's going on around it. On initialization it tries to establish a network connection to the partner node and fails, because within the Windows kernel ecosystem the storage stack starts BEFORE the network stack is fully up and running! WinDRBD can't reach the partner node, so it assumes something has gone wrong and doesn't bring the volume up, in order to preserve it and avoid data corruption.
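You can usually confirm this from the System event log right after a reboot. Here's a quick PowerShell sketch; the "drbd" message filter is only an assumption about how the driver logs, so adjust it to the provider names you actually see (e.g. via Get-WinEvent -ListProvider *drbd*):

# List today's System-log events mentioning "drbd" to check whether the
# driver started, and failed to connect, before the network was up.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; StartTime = (Get-Date).Date } |
    Where-Object { $_.Message -match 'drbd' } |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, ProviderName, Message -AutoSize -Wrap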

There are a few ways to solve this issue:

  1. Put a start dependency for the windrbd driver on the NDIS miniport controlling the NICs used by WinDRBD.

https://docs.microsoft.com/en-us/windows-hardware/drivers/install/specifying-driver-load-order

It's a flaky approach, actually, as changing the network adapter will ruin the configuration. Plus, there's a whole bunch of drivers above the NDIS miniport, like NDIS filters (firewall?), NDIS protocol drivers (TCP/IP?), etc., that you don't know much about, and you'd have to traverse back through them with special tools you might not be familiar with. A minimal sketch of the dependency change follows the next link.

https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ndis-driver-stack
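If you want to try it anyway, one way to express the dependency is through the Service Control Manager. This is only a sketch: it assumes the WinDRBD kernel driver is registered as the service "windrbd" and that the NIC's miniport service is "e1i65x64" — both names are placeholders, so verify them with sc.exe qc and the adapter's driver details before applying anything.

# Make the windrbd driver start only after the NIC miniport driver has loaded.
# Both service names below are placeholders; verify yours first.
sc.exe config windrbd depend= e1i65x64

# Confirm the dependency was recorded:
sc.exe qc windrbd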

  2. Avoid automatic start for WinDRBD and use a logon-dependent script to start it pseudo-automatically. During the logon process all the kernel components are ready, and in the worst case you can log unsuccessful driver starts to a file, analyze it, script retries, and so on.

This can be done, of course, but it requires some tinkering with PowerShell; a sketch of such a script follows the discussion thread linked below, which is a good starting point.

https://stackoverflow.com/questions/27599287/powershell-disable-and-enable-a-driver
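For illustration, a minimal sketch of such a startup script, assuming the resource is named "foo" as in the question, that drbdadm is on the PATH, and that 192.0.2.2 is a placeholder for the peer's heartbeat address. It waits until the peer is reachable, then retries drbdadm up and logs every attempt:

# Placeholders: peer heartbeat IP and log file path; adjust both.
$peer    = '192.0.2.2'
$logFile = 'C:\windrbd-start.log'

function Log($msg) { "$(Get-Date -Format s) $msg" | Out-File -Append $logFile }

for ($try = 1; $try -le 30; $try++) {
    if (Test-Connection -ComputerName $peer -Count 1 -Quiet) {
        # If the driver service itself is set to manual start, load it first
        # ("windrbd" is an assumed service name; verify with Get-Service):
        # Start-Service windrbd

        # Network is up; try to bring the resource online.
        drbdadm up foo 2>&1 | ForEach-Object { Log $_ }
        if ($LASTEXITCODE -eq 0) { Log "drbdadm up foo succeeded (try $try)"; break }
        Log "drbdadm up foo failed with exit code $LASTEXITCODE (try $try)"
    } else {
        Log "peer $peer not reachable yet (try $try)"
    }
    Start-Sleep -Seconds 10
}

Instead of a logon script, you could also register this as a delayed startup task with Register-ScheduledTask, so it doesn't depend on somebody actually logging in.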

  3. Use something designed for Windows from scratch, e.g. StarWind vSAN Free or Microsoft's own SDS built into, say, Azure Stack HCI.

https://www.starwindsoftware.com/starwind-virtual-san-free

https://docs.microsoft.com/en-us/azure-stack/hci/overview