Apple macOS appears to ignore the default route assigned by DHCP

My team has run into the following problem. We’ve discovered an operationally acceptable workaround which I will submit as an answer to this question, but we are hoping for a better solution, or at least a better explanation of what is going on.

We're trying to bring some Apple products into our test lab in order to define configurations for our customer. The examples and tests below were performed on a MacBook Air, but we have confirmed that the problem also exists on some older-model iPhones and iPads. We've put the MacBook on one of the lab’s Wireless Ethernet networks, where it receives DHCP settings from a Cisco 5921 ESR (software router) as expected, including IP address, netmask, and default route. The MacBook is in the 192.168.1.0/24 network, and in the following test cases, the remote network is 192.168.2.0/24.

Full output of the MacBook’s ifconfig and netstat -nr is at the end of this question.

The problem is that while the MacBook can communicate with other devices on its local subnet, any non-local communication that should go to the gateway fails. All types of network traffic to non-local destinations fail (ping, web, ssh). Wireshark running on the MacBook shows that the expected ARP and IP traffic occurs when communicating with local nodes, even with the router itself. But absolutely no traffic is generated when trying to reach non-local destinations, not even an ARP request. Yet the routing table seems to specify the correct interface and gateway address for the default entry.

Executing route get yields the following, first for a local address (which works) and then for a non-local address (which does not):

MacBook-Air:~ user1$ route get 192.168.1.1
   route to: 192.168.1.1
destination: 192.168.1.1
  interface: en0
      flags: <UP,HOST,DONE,LLINFO,WASCLONED,IFSCOPE,IFREF,ROUTER>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1175 


MacBook-Air:~ user1$ route get 192.168.2.80
route: writing to routing socket: not in table

And ping, first for a local address, then for a non-local one:

MacBook-Air:~ user1$ ping  192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=255 time=2.904 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=255 time=3.963 ms
^C
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.904/3.433/3.963/0.529 ms


MacBook-Air:~ user1$ ping  192.168.2.80
PING 192.168.2.80 (192.168.2.80): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
Request timeout for icmp_seq 0
ping: sendto: No route to host
Request timeout for icmp_seq 1
ping: sendto: No route to host
Request timeout for icmp_seq 2
^C
--- 192.168.2.80 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss

After checking the basics (netmask, default routes, firewalls), we did some deeper digging that led us to this question.

In it, the problem was that the default route was tagged with the letter I. This is also the case on my system, as seen in this partial output of netstat -nr:

Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            192.168.1.1        UGScI           4        0     en0

According to the question above, the letter I means

I       RTF_IFSCOPE      Route is associated with an interface scope

I can confirm this by running ping with the -b flag and route get with the -ifscope flag. (Noticing these flags is what led us to search for RTF_IFSCOPE, which in turn led to the Stack Exchange post above.) When these flags are used, both commands work as expected. Without their respective flags, both commands fail.

MacBook-Air:~ user1$ route get -ifscope en0 192.168.2.80
   route to: 192.168.2.80
destination: 192.168.2.80
    gateway: 192.168.1.1
  interface: en0
      flags: <UP,GATEWAY,HOST,DONE,WASCLONED,IFSCOPE,IFREF>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0 

MacBook-Air:~ user1$ ping -b en0  192.168.2.80
PING 192.168.2.80 (192.168.2.80): 56 data bytes
64 bytes from 192.168.2.80: icmp_seq=0 ttl=61 time=3.206 ms
64 bytes from 192.168.2.80: icmp_seq=1 ttl=61 time=2.713 ms
64 bytes from 192.168.2.80: icmp_seq=2 ttl=61 time=3.246 ms
64 bytes from 192.168.2.80: icmp_seq=3 ttl=61 time=3.001 ms
64 bytes from 192.168.2.80: icmp_seq=4 ttl=61 time=2.444 ms
64 bytes from 192.168.2.80: icmp_seq=5 ttl=61 time=2.426 ms
64 bytes from 192.168.2.80: icmp_seq=6 ttl=61 time=2.205 ms
64 bytes from 192.168.2.80: icmp_seq=7 ttl=61 time=4.701 ms
64 bytes from 192.168.2.80: icmp_seq=8 ttl=61 time=4.964 ms
64 bytes from 192.168.2.80: icmp_seq=9 ttl=61 time=2.674 ms
^C
--- 192.168.2.80 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.205/3.158/4.964/0.898 ms

So, RTF_IFSCOPE looks like the BSD version of Linux Network Namespaces or Cisco IOS VRFs (I realize it may predate those). But in my case, the default route is set up by DHCP, so it’s not clear how or why this came to be, nor how to change it to the default scope (or, alternatively, make all applications utilize the special scope).
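The behaviour observed so far can be summarised with a toy model. This is only a sketch of what the transcripts above suggest, not the actual kernel lookup algorithm: the Route class, the lookup function, and the rule "a scoped route is visible only to lookups that name its interface" are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    network: ipaddress.IPv4Network
    gateway: Optional[str]  # None means directly connected
    netif: str
    scoped: bool            # True models the "I" (RTF_IFSCOPE) flag


def lookup(table: List[Route], dest: str,
           ifscope: Optional[str] = None) -> Optional[Route]:
    """Longest-prefix match over the routes visible to this lookup.

    Toy rule (an assumption): a scoped route is only visible when the
    caller names the same interface, as ping -b and route get -ifscope do.
    """
    addr = ipaddress.ip_address(dest)
    candidates = [
        r for r in table
        if addr in r.network and (not r.scoped or r.netif == ifscope)
    ]
    return max(candidates, key=lambda r: r.network.prefixlen, default=None)


# The two relevant entries from the MacBook's routing table.
table = [
    Route(ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1", "en0", scoped=True),
    Route(ipaddress.ip_network("192.168.1.0/24"), None, "en0", scoped=False),
]

print(lookup(table, "192.168.1.1"))                   # connected route on en0
print(lookup(table, "192.168.2.80"))                  # None ("not in table")
print(lookup(table, "192.168.2.80", ifscope="en0"))   # scoped default via 192.168.1.1
```

In this model an unscoped lookup for 192.168.2.80 finds nothing, which matches the "not in table" and "No route to host" errors, while the same lookup scoped to en0 finds the default route.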

As a test, I manually added a second default route:

sudo route -r add default 192.168.1.1

The resulting route does not get an I flag associated with it:

Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            192.168.1.1        UGSc            4        0     en0
default            192.168.1.1        UGScI           4        0     en0

This does fix the problem, now allowing all non-local network communication to work. We don’t consider this an operationally acceptable workaround, especially for iPhones and iPads, since as far as we can tell it would require installing apps on the devices in order to add manual routes.
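For checking several machines, the Flags column can be tested mechanically. The following is a minimal sketch that assumes the netstat -nr column layout shown in this question (Destination, Gateway, Flags, Refs, Use, Netif); the names DefaultRoute, ipv4_default_routes, and has_unscoped_default are made up for this example, and the sample text is abridged from the tables above.

```python
from typing import List, NamedTuple


class DefaultRoute(NamedTuple):
    gateway: str
    flags: str
    netif: str

    @property
    def scoped(self) -> bool:
        # "I" in the Flags column is RTF_IFSCOPE.
        return "I" in self.flags


def ipv4_default_routes(netstat_output: str) -> List[DefaultRoute]:
    """Collect the default entries from the "Internet:" section only."""
    routes = []
    in_ipv4 = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Section headers are "Internet:" and "Internet6:".
            in_ipv4 = stripped == "Internet:"
            continue
        fields = stripped.split()
        if in_ipv4 and len(fields) >= 6 and fields[0] == "default":
            routes.append(DefaultRoute(fields[1], fields[2], fields[5]))
    return routes


def has_unscoped_default(routes: List[DefaultRoute]) -> bool:
    return any(not r.scoped for r in routes)


SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            192.168.1.1        UGSc            4        0     en0
default            192.168.1.1        UGScI           4        0     en0
192.168.1          link#8             UCS             3        0     en0

Internet6:
Destination        Gateway            Flags         Netif Expire
default            fe80::%utun0       UGcI          utun0
"""

routes = ipv4_default_routes(SAMPLE)
for r in routes:
    print(r.gateway, r.netif, "scoped" if r.scoped else "unscoped")
print("usable default route:", has_unscoped_default(routes))
```

On a Mac, the input could come from capturing the output of netstat -nr. Other macOS versions may print different columns, so the field positions would need adjusting there.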

We suspect that something about our network environment is causing this behavior. But we aren’t sure what that is.

The following output of ifconfig and netstat -nr is from the MacBook Air:

MacBook-Air:~ user1$ ifconfig 
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
    inet 127.0.0.1 netmask 0xff000000 
    inet6 ::1 prefixlen 128 
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
    nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
OHC4: flags=0<> mtu 0
EHC36: flags=0<> mtu 0
OHC6: flags=0<> mtu 0
EHC38: flags=0<> mtu 0
en0: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    ether 04:0c:ce:cf:be:ac 
    inet 192.168.1.52 netmask 0xffffff00 broadcast 192.168.1.255
    nd6 options=201<PERFORMNUD,DAD>
    media: autoselect
    status: active
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
    ether 06:0c:ce:cf:be:ac 
    media: autoselect
    status: inactive
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
    inet6 fe80::d380:8cb1:83eb:3753%utun0 prefixlen 64 scopeid 0xa 
    nd6 options=201<PERFORMNUD,DAD>


MacBook-Air:~ user1$ netstat -nr 
Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            192.168.1.1        UGScI           4        0     en0
127                127.0.0.1          UCS             0        0     lo0
127.0.0.1          127.0.0.1          UH              1      130     lo0
169.254            link#8             UCS             0        0     en0
192.168.1          link#8             UCS             3        0     en0
192.168.1.1/32     link#8             UCS             1        0     en0
192.168.1.1        0:1b:ac:1:79:9a    UHLWIir         3       60     en0
192.168.1.52/32    link#8             UCS             0        0     en0
192.168.1.53       2c:be:8:aa:47:98   UHLWI           0        2     en0    472
192.168.1.54       88:1f:a1:7a:6d:5a  UHLWI           0        0     en0   1200
192.168.1.56       2c:be:8:9d:ff:78   UHLWI           0       10     en0    817
224.0.0/4          link#8             UmCS            1        0     en0
224.0.0.251        1:0:5e:0:0:fb      UHmLWI          0        0     en0
255.255.255.255/32 link#8             UCS             0        0     en0

Internet6:
Destination                             Gateway                         Flags         Netif Expire
default                                 fe80::%utun0                    UGcI          utun0
::1                                     ::1                             UHL             lo0
fe80::%lo0/64                           fe80::1%lo0                     UcI             lo0
fe80::1%lo0                             link#1                          UHLI            lo0
fe80::%utun0/64                         fe80::d380:8cb1:83eb:3753%utun0 UcI           utun0
fe80::d380:8cb1:83eb:3753%utun0         link#10                         UHLI            lo0
ff01::%lo0/32                           ::1                             UmCI            lo0
ff01::%utun0/32                         fe80::d380:8cb1:83eb:3753%utun0 UmCI          utun0
ff02::%lo0/32                           ::1                             UmCI            lo0
ff02::%utun0/32                         fe80::d380:8cb1:83eb:3753%utun0 UmCI          utun0

Solution 1:

I think the missing piece in your puzzle is that Apple uses a slightly different way of handling interface priority than other popular operating systems.

For example, if we take a normal home user scenario, Apple products will try to prioritise traffic over the best available Internet connection. If you have an Internet connection over both Ethernet and WiFi, it will use the cabled connection rather than the wireless connection.

Technically, this prioritisation is done via the RTF_IFSCOPE flag you've encountered. Essentially, the system marks the lower-priority route with the RTF_IFSCOPE flag to ensure that it is only used if it is specifically requested (for example, a VPN program that needs a specific connection will still be able to use it).

You can observe the same thing if you connect to a WiFi without an Internet connection, or to a WiFi with a captive portal where you haven't signed in yet. On an iOS device, for example, you'll see that the WiFi symbol at the top changes to the waves overlaid with an exclamation mark (!). You can still use the interface and communicate with devices on that specific subnet, but it will not be used as a default gateway unless specifically requested.

So in your case it seems that you have already got a default gateway on the Mac before you connect it to the WiFi. As the DHCP server on the WiFi does not give out an option 6 (i.e. list of DNS servers), the system will give this connection a lower priority.

It seems that you should be able to solve the problem by not having an existing default route before you connect to the WiFi.

A more common solution would be to simply specify the prioritisation you want instead of relying on automatic defaults. This is done on the Mac in System Preferences > Network. Click the Gear icon below the list on the left, and choose "Set Service Order". Now drag the Wi-Fi interface above the other default route interface you have (for example Ethernet). This removes the RTF_IFSCOPE flag from your WiFi. You can also do this via the command line, or from a central management server, of course, but the GUI route is the easiest way to quickly test this.
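As a rough sketch of the command-line variant: macOS ships a networksetup tool whose -listnetworkserviceorder option prints the current order and whose -ordernetworkservices option takes the service names in the desired order. The helper below only builds that command line and does not run anything. The service names ("Ethernet", "Wi-Fi", "Bluetooth PAN") and the function names are assumptions for illustration; the real names should be taken from the list output on the machine in question.

```python
from typing import List


def promote_service(order: List[str], name: str) -> List[str]:
    """Return a copy of the service order with `name` moved to the top."""
    if name not in order:
        raise ValueError(f"unknown network service: {name!r}")
    return [name] + [s for s in order if s != name]


def order_command(order: List[str]) -> List[str]:
    """Build the argv for networksetup; all services are listed, in order."""
    return ["networksetup", "-ordernetworkservices", *order]


# Hypothetical current order, e.g. as read from
# `networksetup -listnetworkserviceorder`.
current = ["Ethernet", "Wi-Fi", "Bluetooth PAN"]

new_order = promote_service(current, "Wi-Fi")
print(order_command(new_order))
```

The resulting list could be passed to subprocess.run with administrator rights on the Mac. Building the argv as a list avoids quoting problems with service names that contain spaces.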

Apart from that, I would say that your setup sounds relatively uncommon. I don't agree with your choice of forgoing DNS servers altogether simply to avoid reconfiguring web servers so that they stop doing reverse lookups on IP addresses. The reverse lookups will still be done, but they'll just be handled locally instead.

It seems it would be a better choice to have DNS servers and configure web servers reasonably instead. You can configure a DNS server to statically reply to all reverse lookups instead of looking anything up on the Internet or otherwise spending time providing names.

Solution 2:

We discovered this reference when searching for more information on what causes the Apple products to behave this way.

For reasons that are not clear to me, when an Apple device using Wireless Ethernet does not receive a DNS server setting via DHCP, it places the routing information it receives into a network scope that is different from the default scope, thus preventing any non-local communication. When the DHCP offer contains a DNS Server option, this problem does not occur, and when the network is wired Ethernet, the problem does not occur regardless of whether there is a DNS Server setting in the DHCP offer.

Our network environment is standalone, and DNS name resolution is not used. So our DHCP servers do not provide DHCP option 6, aka “dns-server”. We were able to solve our problem by providing a bogus DNS server IP address via DHCP, and all the affected Apple devices are now communicating with non-local addresses as expected, and the “I” flag is now absent from the default route entry.
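For reference, option 6 is a simple code/length/value field inside the DHCP options area: one byte with the code 6, one byte with the length, then one or more 4-byte IPv4 addresses. The sketch below encodes such an option and looks for it in a raw options blob, which is one way to verify from a packet capture whether an offer carries it. The address 192.168.1.10 and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional

DHCP_OPTION_PAD = 0
DHCP_OPTION_DNS = 6
DHCP_OPTION_END = 255


def encode_dns_option(servers: Iterable[str]) -> bytes:
    """Encode DHCP option 6 as code, length, then packed IPv4 addresses."""
    packed = b"".join(ipaddress.IPv4Address(s).packed for s in servers)
    if not packed:
        raise ValueError("option 6 needs at least one server")
    if len(packed) > 255:
        raise ValueError("too many servers for a single option")
    return bytes([DHCP_OPTION_DNS, len(packed)]) + packed


def find_option(options: bytes, code: int) -> Optional[bytes]:
    """Return the value of the first option with this code, or None."""
    i = 0
    while i < len(options):
        current = options[i]
        if current == DHCP_OPTION_END:
            break
        if current == DHCP_OPTION_PAD:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option header
        length = options[i + 1]
        if current == code:
            return options[i + 2:i + 2 + length]
        i += 2 + length
    return None


# Options as offered before the change: subnet mask (1) and router (3) only.
without_dns = bytes([1, 4, 255, 255, 255, 0, 3, 4, 192, 168, 1, 1, 255])
with_dns = without_dns[:-1] + encode_dns_option(["192.168.1.10"]) + b"\xff"

print(find_option(without_dns, DHCP_OPTION_DNS))  # None
print(find_option(with_dns, DHCP_OPTION_DNS))     # 4 packed address bytes
```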

(We chose not to have DNS in this environment because it's not required for our application (all configurations are autogenerated from a common planning tool, so there’s no inherent benefit to using hostnames), and we have had experience with some software, such as web servers, that would try to reverse-resolve IPs back to hostnames in the course of logging client requests. They would become bogged down due to high rates of DNS queries to non-existent DNS servers, each of which would take a second or two to time out. Since there was no negative response, each log record would trigger at least one query that had to time out. We think it’s safer to remove DNS settings than to identify every possible piece of software that might act this way.)

So right now we have reconfigured the DHCP pool to provide, as the DNS server, the IP address of a system that is on the network but not running a DNS service. That system responds with ICMP port unreachable messages, which so far seems to avoid lag due to timeouts, but we remain concerned that our testing doesn’t uncover every possible error case that this configuration change may cause. So if we do not find a better solution to this Apple problem, we may have to actually run a DNS service that contains no data, just to give negative responses and avoid the possibility of DNS timeouts.
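A data-less DNS service of that kind can be very small. The sketch below is one possible shape, not something we have deployed: negative_reply turns any query into an NXDOMAIN response that echoes the question, and serve answers over UDP. It assumes plain, uncompressed questions, ignores TCP and EDNS, and the function names are made up for this example. Whether every client treats such a bare NXDOMAIN the way we want is untested.

```python
import socket
import struct

RCODE_NXDOMAIN = 3


def negative_reply(query: bytes, rcode: int = RCODE_NXDOMAIN) -> bytes:
    """Build a response with no records that echoes the question section."""
    if len(query) < 12:
        raise ValueError("shorter than a DNS header")
    qid, flags, qdcount = struct.unpack("!HHH", query[:6])
    if flags & 0x8000:
        raise ValueError("not a query")
    # Walk the question section to find where it ends.
    pos = 12
    for _ in range(qdcount):
        while True:
            if pos >= len(query):
                raise ValueError("truncated question")
            length = query[pos]
            pos += 1
            if length == 0:
                break
            if length > 63:
                raise ValueError("compressed or invalid label")
            pos += length
        pos += 4  # QTYPE and QCLASS
    if pos > len(query):
        raise ValueError("truncated question")
    # QR=1, copy opcode and RD from the query, AA=1, chosen RCODE.
    reply_flags = (0x8000 | (flags & 0x7800) | 0x0400
                   | (flags & 0x0100) | (rcode & 0x000F))
    header = struct.pack("!HHHHHH", qid, reply_flags, qdcount, 0, 0, 0)
    return header + query[12:pos]


def serve(bind_addr: str = "0.0.0.0", port: int = 53) -> None:
    """Answer every UDP query negatively (binding port 53 needs privileges)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, peer = sock.recvfrom(512)
            try:
                sock.sendto(negative_reply(data), peer)
            except ValueError:
                continue  # drop anything that is not a well-formed query
```

Calling serve() on the host whose address is handed out in option 6 would give clients an immediate negative answer instead of relying on ICMP port unreachable handling.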

We may also attempt to segregate the Apple products so they receive DHCP settings from a different pool, so only they receive DNS server options. I believe there's a way to identify Apple products based on OUI, although I've never tried that before.
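The OUI is the first three octets of the MAC address, so the matching itself is straightforward. Here is a minimal sketch, with two loud assumptions: the set of prefixes contains only the prefix seen on the MacBook's en0 above and is not a complete or verified list of Apple OUIs, and the function names are invented for this example. Devices that randomize their Wi-Fi MAC address would not match at all.

```python
from typing import Iterable


def oui(mac: str) -> str:
    """Return the zero-padded, lowercase first three octets of a MAC address."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    values = [int(o, 16) for o in octets]
    if any(v > 0xFF for v in values):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{v:02x}" for v in values[:3])


# Assumption: illustrative only, taken from the en0 address in this question.
EXAMPLE_APPLE_OUIS = {"04:0c:ce"}


def looks_like_apple(mac: str,
                     known: Iterable[str] = EXAMPLE_APPLE_OUIS) -> bool:
    return oui(mac) in set(known)


print(oui("0:1b:ac:1:79:9a"))                 # netstat omits leading zeros
print(looks_like_apple("04:0c:ce:cf:be:ac"))  # the MacBook's en0
print(looks_like_apple("0:1b:ac:1:79:9a"))    # the router's address
```

The normalisation step matters because netstat -nr prints addresses without leading zeros (0:1b:ac:1:79:9a), while ifconfig pads them.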

I’m hoping someone has a better solution that doesn’t require any kind of change to the network infrastructure.