OS X network stack ignores IGMP membership queries

We have a remote site where the Macs are not responding to IGMP membership queries, but the Windows boxes do respond. Consequently, after about 10 minutes, the IGMP-aware network switch cuts off the multicast stream to the Macs.

Here is a Wireshark screenshot showing the problem:

[Screenshot: Wireshark IGMP packet capture]

The first packet is the app asking the network to start delivering the multicast stream for group 239.255.20.1 to the Mac. Then you see, about every 125 seconds thereafter, the network switch configured as the IGMP querier (10.1.254.254) asking whether we're still interested in that stream. Notice the conspicuous lack of response.

Here is what happens here on the local network, for comparison:

[Screenshot: good IGMP packet capture]

Here, about every 95 seconds the IGMP querier (172.20.0.2) asks if we still want that stream, and the Mac in question (172.20.0.144) says, "Yes, keep sending it."

The firewall is turned off in the GUI on the problem Macs, and I've verified that at the command line:

$ /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
Firewall is disabled. (State = 0)
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getblockall
Block all DISABLED! 
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getstealthmode
Stealth mode disabled 
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getappblocked /Applications/mumblemutter.app/...
The application is not part of the firewall 

The app doesn't matter, since the stack handles IGMP queries after the group has been joined.

The problem Macs are running 10.11.5, but I cannot believe the problem would be fixed by upgrading to the absolute latest, since that would mean that a BSD-based OS is fixing serious bugs in its network stack in 2016. Possible, but extremely low probability.


Solution 1:

The problem is shown in the first packet capture, where you will notice that the IGMP group join packet is an IGMPv2 packet, but the membership queries from the IGMP querier are all v3.
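If you want to check which IGMP version each packet in a capture uses without relying on a dissector, the rules are simple enough to sketch in C. This is a minimal, hypothetical helper (the function and constant names are mine, not from any library). It takes the raw IGMP message, starting after the IP header, and applies the RFC 3376 rule that queries (all type 0x11) are told apart by length and Max Resp Code, while reports and leaves are told apart by their type byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IGMP message type values (RFC 3376, RFC 2236). Names are illustrative. */
enum {
    IGMP_TYPE_QUERY     = 0x11, /* v1, v2 and v3 queries share this type */
    IGMP_TYPE_V1_REPORT = 0x12,
    IGMP_TYPE_V2_REPORT = 0x16, /* a v2 "join" */
    IGMP_TYPE_V2_LEAVE  = 0x17,
    IGMP_TYPE_V3_REPORT = 0x22
};

/* Return the IGMP version (1, 2 or 3) implied by an IGMP message, or 0 if
 * it is too short or unrecognized. `msg` points at the IGMP header and
 * `len` is the IGMP message length in bytes. */
int igmp_message_version(const uint8_t *msg, size_t len)
{
    if (msg == NULL || len < 8)
        return 0;

    switch (msg[0]) {
    case IGMP_TYPE_QUERY:
        /* RFC 3376 section 7.1: an 8-byte query is v1 if Max Resp Code
         * (byte 1) is zero, v2 otherwise; 12 bytes or more means v3. */
        if (len == 8)
            return msg[1] == 0 ? 1 : 2;
        if (len >= 12)
            return 3;
        return 0; /* e.g. a 10-byte query, which must be ignored */
    case IGMP_TYPE_V1_REPORT:
        return 1;
    case IGMP_TYPE_V2_REPORT:
    case IGMP_TYPE_V2_LEAVE:
        return 2;
    case IGMP_TYPE_V3_REPORT:
        return 3;
    default:
        return 0;
    }
}
```

A v2 join (type 0x16) classifies as 2, and any query of 12 bytes or more as 3, which is exactly the mismatch visible in the first capture.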

A v2 join answered by v3 queries may seem harmless, since macOS has supported IGMPv3 for a very long time now, but if you dig into the IGMP implementation in the Darwin open source kernel, down in igmp_input_v3_query(), you find this enlightening bit of code:

/*
 * Discard the v3 query if we're in Compatibility Mode.
 * The RFC is not obviously worded that hosts need to stay in
 * compatibility mode until the Old Version Querier Present
 * timer expires.
 */
if (igi->igi_version != IGMP_VERSION_3) {
    /* ...etc... */
}

What this means is that macOS is obeying the IGMPv3 spec (RFC 3376) and putting any network interface where it has seen IGMPv2 packets into "compatibility mode," meaning that it will neither answer IGMPv3 queries nor speak IGMPv3 on that interface. In terms of the code above, the interface is marked igi_version = 2 (IGMP_VERSION_2), so we hit this test and ignore the v3 group membership query on the theory that it is unsafe to speak v3 on this network, lest the v2 devices be unable to understand what's going on.
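The effect of that check can be modeled in a few lines. This is a simplified sketch, not the XNU code: struct igmp_iface and igmp_iface_on_query() are hypothetical names, only v2 and v3 queries are modeled, and it assumes, as described above, that hearing a v2 query is what switches the interface into v2 mode. The compatibility timer uses the RFC 3376 Older Version Querier Present formula (Robustness Variable times Query Interval, plus one Query Response Interval) with the RFC defaults of 2, 125 s and 10 s, which gives 260 s.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface state, loosely mirroring igi_version in XNU. */
struct igmp_iface {
    int    version;           /* Host Compatibility Mode: 2 or 3 */
    double v2_querier_expiry; /* time (s) at which the v2 querier timer runs out */
};

/* RFC 3376 default values. */
#define ROBUSTNESS_VARIABLE     2
#define QUERY_INTERVAL          125.0 /* seconds */
#define QUERY_RESPONSE_INTERVAL 10.0  /* seconds */

/* Older Version Querier Present timeout: 2 * 125 + 10 = 260 s here. */
static double older_querier_timeout(void)
{
    return ROBUSTNESS_VARIABLE * QUERY_INTERVAL + QUERY_RESPONSE_INTERVAL;
}

/* Process a membership query of version 2 or 3 heard at time `now`.
 * Returns true if the host would answer it, false if it is discarded. */
bool igmp_iface_on_query(struct igmp_iface *ifp, int query_version, double now)
{
    assert(query_version == 2 || query_version == 3);

    /* Leave compatibility mode once the older-version querier timer expires. */
    if (ifp->version == 2 && now >= ifp->v2_querier_expiry)
        ifp->version = 3;

    if (query_version == 2) {
        /* A v2 query forces v2 mode and (re)starts the timer. */
        ifp->version = 2;
        ifp->v2_querier_expiry = now + older_querier_timeout();
        return true;
    }

    /* Same idea as the quoted check: a v3 query is ignored unless in v3 mode. */
    return ifp->version == 3;
}
```

Because every further v2 query restarts the timer, a querier that keeps sending v2 queries more often than every 260 seconds keeps the interface in v2 mode indefinitely, and every v3 query in between is silently dropped.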

I see three workable remedies:

  1. Get those in charge of the network to reconfigure their switches to send back IGMPv2 queries to clients that asked for an IGMPv2 group join.

  2. Turn off IGMPv3 support in the IGMP-aware network switches entirely, so that they will only send out IGMPv2 membership queries.

  3. Monitor the network for IGMPv2 packets, find their source, and fix, upgrade, or remove them. If the network can't be made to speak v3 through and through, go with #1 or #2.

This is not something you can fix with an application code change. The IP_ADD_MEMBERSHIP option to setsockopt() doesn't include a version number, so the app is not in a position to demand IGMPv3. That decision is up to the stack.
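To illustrate, here is a minimal group join using the standard BSD sockets API. make_mreq() and join_group() are illustrative helpers of my own; the only real API involved is setsockopt() with IP_ADD_MEMBERSHIP and struct ip_mreq, which carries just a group address and an interface address.

```c
#define _DEFAULT_SOURCE /* expose BSD socket declarations on glibc; harmless elsewhere */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in the request for IP_ADD_MEMBERSHIP. struct ip_mreq holds only the
 * group and the local interface: there is no field for an IGMP version. */
int make_mreq(const char *group, struct ip_mreq *mreq)
{
    memset(mreq, 0, sizeof *mreq);
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;                                  /* not a valid IPv4 address */
    mreq->imr_interface.s_addr = htonl(INADDR_ANY); /* let the kernel pick the interface */
    return 0;
}

/* Join `group` on a new UDP socket; returns the socket or -1 on error.
 * Whether a v2 or v3 report goes on the wire is decided by the kernel. */
int join_group(const char *group)
{
    struct ip_mreq mreq;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (make_mreq(group, &mreq) != 0 ||
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling join_group("239.255.20.1") would join the group from the first capture, and nothing in that call expresses a preference between IGMPv2 and IGMPv3.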

While it is possible that there is an OS setting that would affect this, that could only be the case if the macOS IGMP implementation differs from what we see in the igmp.c linked above.

If you sniff the network for IGMP on a Windows box, you will see that it responds to IGMPv3 membership queries with v3 reports, despite the presence of v2 on the network. It is therefore in violation of the RFC. Some network admins will say, "Well, it works, doesn't it?", but the proper response is that because you cannot force macOS to also disregard the RFC, the solution remains to fix the network.

Solution 2:

This basically breaks IGMPv3, permanently, on most networks. Why? Because some very common mDNS implementations (such as the one on the iPhone) frequently and gratuitously leave and rejoin 224.0.0.251 (the mDNS group) more often than the IGMP compatibility-mode timeout runs out. If even a single device issues an IGMPv2 leave for 224.0.0.251, the querier will issue a group-specific IGMPv2 query for 224.0.0.251 to see if anybody is left. Everybody else in that group (which is most hosts these days) will see it and flip the interface to v2, keeping IGMPv3 queries from ever being answered by macOS.

All this even though the IGMP spec says that hosts should not explicitly leave and join the 224.0.0.0/24 block, and snooping switches should always forward it. And good luck getting Apple to fix this gratuitous leave/join problem on its zillions of devices.

You can sort of work around it by rebooting all your iOS devices after switching the querier to IGMPv3, and waiting. But if any device should ever issue an IGMPv2 message, macOS multicast breaks again.

This is a clear bug that cannot be blamed on anything but macOS ignoring IGMPv3 messages when it thinks an interface is v2.
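For reference, the 224.0.0.0/24 block mentioned above (the range RFC 4541 says IGMP snooping switches should flood on all ports) is easy to test for. A minimal, hypothetical helper, with a name of my own choosing, that takes an IPv4 address in host byte order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` (IPv4, host byte order) lies in 224.0.0.0/24, the
 * link-local multicast block that includes mDNS at 224.0.0.251. */
bool in_local_network_control_block(uint32_t addr)
{
    return (addr & 0xFFFFFF00u) == 0xE0000000u; /* compare the top 24 bits */
}
```

The mDNS group 224.0.0.251 (0xE00000FB) falls inside this block, while the stream group 239.255.20.1 from the question does not.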