Why does removing an unused IP address from an interface kill connections unrelated to that address?
Yesterday I did a quick reinstall of a physical server in the datacenter. I was short on time and had no easy access to our database, so I assigned it an IP address that I knew was available. That would let me log in later, assign the correct address, and continue provisioning from a warmer place.
Today I logged into the server (at 172.16.130.10/22) and did the following:
ip addr add 172.16.128.67/22 dev eth0
From a terminal on my local workstation I checked it responded to ping on the new address and logged in through it:
$ ping 172.16.128.67
PING 172.16.128.67 (172.16.128.67) 56(84) bytes of data.
64 bytes from 172.16.128.67: icmp_req=2 ttl=62 time=3.61 ms
64 bytes from 172.16.128.67: icmp_req=3 ttl=62 time=4.87 ms
^C
$ ssh 172.16.128.67
So far so good: I was connected through the new IP address and the old one was no longer necessary, so I went ahead and removed it:
ip addr del 172.16.130.10/22 dev eth0
But as soon as I hit Enter my SSH session froze and I was no longer able to connect. I had to request an on-site operator to reboot the server for me.
Where did I go wrong? Why would removing that address kill my connection?
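In Linux, IPv4 addresses on an interface are either 'primary' or 'secondary'. The primary is the first address you add in a given subnet; any further address added in that same subnet becomes a secondary of it. Here both 172.16.130.10/22 and 172.16.128.67/22 fall inside 172.16.128.0/22, so the original address was the primary and the new one was its secondary. By default, removing a primary address implicitly flushes all of its secondary addresses as well, which is why the address your SSH session was using disappeared.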
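Before deleting an address, it can help to check which addresses the kernel has marked as secondary. The following is a minimal sketch, assuming the one-line output format of ip -o -4 addr show, where the address is the fourth field and secondaries carry the word "secondary". The function name list_secondaries is hypothetical and not part of iproute2.

```shell
# Hypothetical helper (not part of iproute2): reads `ip -o -4 addr show`
# output on stdin and prints every address flagged as secondary.
list_secondaries() {
    awk '{
        for (i = 5; i <= NF; i++)
            if ($i == "secondary") { print $4; break }
    }'
}

# Usage on a live system (eth0 is the interface from the question):
#   ip -o -4 addr show dev eth0 | list_secondaries
```

If the address you are connected through shows up in that list, deleting its primary would take it down too unless promotion is enabled.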
You can avoid this behaviour by setting the sysctl net.ipv4.conf.all.promote_secondaries to 1, like so:
sysctl -w net.ipv4.conf.all.promote_secondaries=1
With this setting, removing a primary address no longer flushes the remaining addresses in its subnet. Instead, one of the secondary addresses is promoted to primary.
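As a safeguard on a remote machine, the delete can be made conditional on promotion actually being enabled. This is a minimal sketch, assuming the setting is readable at /proc/sys/net/ipv4/conf/all/promote_secondaries. The function name promote_enabled and its optional path argument are hypothetical, added only so the check can be tried against a plain file.

```shell
# Hypothetical helper: succeeds only if promote_secondaries is set to 1.
# $1 (optional): file to read instead of the live /proc entry.
promote_enabled() {
    f="${1:-/proc/sys/net/ipv4/conf/all/promote_secondaries}"
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}

# Usage on a live system: delete the old primary only when it is safe.
#   promote_enabled && ip addr del 172.16.130.10/22 dev eth0
```

Note that sysctl -w changes only the running kernel. To keep the setting across reboots, the same key can be placed in /etc/sysctl.conf or in a file under /etc/sysctl.d/, depending on the distribution.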