I'm attempting to move the Cluster Core Resources from one node to another in a 4-node WSFC (all four nodes are VMs running Windows Server 2012 R2 on Compute Engine in Google Cloud, each in a different subnet). I'm running:

Move-ClusterGroup -Name "Cluster Group" -Node mynode

And getting the error:

Move-ClusterGroup : An error occurred while moving the clustered role 'Cluster Group'. The operation failed because either the specified cluster node is not the owner of the group, or the node is not a possible owner of the group

I have moved the Available Storage cluster group this way successfully; it's just this operation that's failing. The cluster hosts a SQL Server Availability Group, which is online, working as expected, and has been failed over multiple times before.
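
For comparison, the equivalent move that does work (node name is a placeholder, as above):

Move-ClusterGroup -Name "Available Storage" -Node mynode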

The first time I tried to do this I got an error in the Cluster Events saying:

Cluster resource 'Cluster IP Address [ip of current host]' of type 'IP Address' in clustered role 'Cluster Group' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
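
As that message suggests, the resource states can be checked with:

Get-ClusterGroup -Name "Cluster Group" | Get-ClusterResource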

So I checked the IP resources for the cluster core resources and saw that each had all 4 nodes as possible owners, despite 3 of those nodes being in the wrong subnet for any given IP. It looks like the cluster was trying to bring the current IP up on the target host, which of course didn't work. I removed the 3 nodes in the "wrong" subnets from the possible owners of each IP resource, and since then I have been getting the first error message I've included here.
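
This is roughly what I did; the resource and node names here are placeholders for the ones in my cluster:

# Check which nodes can own a given core IP resource
Get-ClusterOwnerNode -Resource "Cluster IP Address 10.1.0.10"

# Restrict the IP to the one node that is actually in its subnet
Set-ClusterOwnerNode -Resource "Cluster IP Address 10.1.0.10" -Owners node1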

I ran Get-ClusterGroup -Name "Cluster Group" | Get-ClusterOwnerNode, which initially returned {} for OwnerNodes (i.e. no preferred owners set). I've since tried adding the current owner plus the node I'm trying to move to using Set-ClusterOwnerNode, and I can now see the two nodes I'd expect as possible owners, but it's made no difference to the move.
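
That is, something like this (node names are placeholders):

Set-ClusterOwnerNode -Group "Cluster Group" -Owners currentnode, targetnode

# Confirm the change
Get-ClusterGroup -Name "Cluster Group" | Get-ClusterOwnerNode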

I did wonder if this could be a DNS issue. I assume it's correct to have just one A record in DNS for the cluster name, holding the currently online IP, and that this record should get updated during a move (as opposed to having multiple A records, each with a different IP). I tried updating the security on this record, giving the 2 nodes full control for a bit, as well as checking the permissions on the cluster computer object (which already had permissions). I haven't done any more with AD/DNS because I don't want to screw things up.
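
Checking what the cluster name currently resolves to is straightforward (names are placeholders for mine):

Resolve-DnsName -Name mycluster.mydomain.local -Type A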

I've run cluster validation and it doesn't report anything I would consider a cause. There are warnings about: the cluster core IP resources no longer being ownable by every node (expected, given the change above), the HostRecordTTL and RegisterAllProvidersIP settings, unsigned drivers, and some differences in software between the 2 nodes (just updates that have been applied to the one I'm trying to move to).
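
For what it's worth, the name resource settings the validation report flagged can be read like this ("Cluster Name" is the default core network name resource; yours may differ):

Get-ClusterResource -Name "Cluster Name" | Get-ClusterParameter HostRecordTTL, RegisterAllProvidersIP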


Well I seem to have fixed it:

I added all nodes back as possible owners of all the IPs, based on the error from the Move-ClusterGroup cmdlet. Trying the move again, I then got the initial error of the cluster trying to bring the subnet1 IP up on a node in a different subnet. This time I repeated the failover until I exceeded the "maximum restarts in the specified period", so instead of just coming back online on the subnet1 node, the Cluster Group went into an Offline state. Once this happened I brought the subnet2 IP online manually through the GUI. This worked and brought the Cluster Group up on the intended node.
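
I did this through Failover Cluster Manager, but as far as I can tell the PowerShell equivalent would be roughly this (resource and node names are placeholders):

# Re-add every node as a possible owner of each core IP resource
Set-ClusterOwnerNode -Resource "Cluster IP Address 10.2.0.10" -Owners node1, node2, node3, node4

# Once the Cluster Group has gone Offline, bring the correct-subnet IP online by hand
Start-ClusterResource -Name "Cluster IP Address 10.2.0.10"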

Once I'd done this I could then use Move-ClusterGroup between these two nodes as I would expect. Moving to a node in a third subnet still failed, but the same trick of getting the Cluster Group offline and manually bringing that subnet's cluster IP online then worked for that node too.

I don't really know what happened here; I can only conclude it was some sort of metadata/registry corruption that got fixed when the IP was brought online manually. Perhaps someone else can enlighten me.