route cache filling with wrong entries

I have an issue with private network traffic not being masqueraded in very specific circumstances.

The network is a group of VMware guests using the 10.1.0.0/18 network.

The problematic host is 10.1.4.20 (netmask 255.255.192.0), and the only gateway it is configured to use is 10.1.63.254. The gateway server, 37.59.245.59, should be masquerading all outbound traffic and forwarding it through 37.59.245.62, but for some reason 10.1.4.20 occasionally ends up with 37.59.245.62 in its routing cache.

ip -s route show cache 199.16.156.40
199.16.156.40 from 10.1.4.20 via 37.59.245.62 dev eth0
    cache  used 149 age 17sec ipid 0x9e49
199.16.156.40 via 37.59.245.62 dev eth0  src 10.1.4.20
    cache  used 119 age 11sec ipid 0x9e49

ip route flush cache 199.16.156.40

ping api.twitter.com
PING api.twitter.com (199.16.156.40) 56(84) bytes of data.
64 bytes from 199.16.156.40: icmp_req=1 ttl=247 time=93.4 ms

ip -s route show cache 199.16.156.40
199.16.156.40 from 10.1.4.20 via 10.1.63.254 dev eth0
    cache  age 3sec
199.16.156.40 via 10.1.63.254 dev eth0  src 10.1.4.20
    cache  used 2 age 2sec

The question is, why am I seeing a public IP address in my routing cache on a private network?
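To compare snapshots of which next hop the cache currently holds, I use a throwaway helper that pulls the `via` address out of a cache line (`next_hop` is just a local sketch, not a standard tool):

```shell
# next_hop: print the token following "via" in an `ip route show cache` line.
next_hop() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }'
}

# With the first (bad) cache line from above:
echo '199.16.156.40 from 10.1.4.20 via 37.59.245.62 dev eth0' | next_hop
# prints 37.59.245.62
```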

Network information for the app server (without lo):

ifconfig

eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:48:20
          inet addr:10.1.4.20  Bcast:10.1.63.255  Mask:255.255.192.0
          inet6 addr: fe80::250:56ff:fea4:4820/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1523222895 errors:0 dropped:407 overruns:0 frame:0
          TX packets:1444207934 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1524116772058 (1.5 TB)  TX bytes:565691877505 (565.6 GB)

Network information for the VPN gateway (again without lo):

eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:37.59.245.59  Bcast:37.59.245.63  Mask:255.255.255.192
          inet6 addr: fe80::250:56ff:fea4:56e9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7030472688 errors:0 dropped:1802 overruns:0 frame:0
          TX packets:6959026084 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7777330931859 (7.7 TB)  TX bytes:7482143729162 (7.4 TB)

eth0:0    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:10.1.63.254  Bcast:10.1.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:1    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:10.1.127.254  Bcast:10.1.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.8.1.1  P-t-P:10.8.1.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:477047415 errors:0 dropped:0 overruns:0 frame:0
          TX packets:833650386 errors:0 dropped:101834 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:89948688258 (89.9 GB)  TX bytes:1050533566879 (1.0 TB)

eth0 leads to the outside world, and tun0 to an OpenVPN network of VMs on which the app server sits.

ip r for the VPN gateway:

default via 37.59.245.62 dev eth0  metric 100
10.1.0.0/18 dev eth0  proto kernel  scope link  src 10.1.63.254
10.1.64.0/18 dev eth0  proto kernel  scope link  src 10.1.127.254
10.8.1.0/24 via 10.8.1.2 dev tun0
10.8.1.2 dev tun0  proto kernel  scope link  src 10.8.1.1
10.9.0.0/28 via 10.8.1.2 dev tun0
37.59.245.0/26 dev eth0  proto kernel  scope link  src 37.59.245.59

ip r on the app server:

default via 10.1.63.254 dev eth0  metric 100
10.1.0.0/18 dev eth0  proto kernel  scope link  src 10.1.4.20

Firewall rules on the VPN gateway (nat table):

Chain PREROUTING (policy ACCEPT 380M packets, 400G bytes) 
pkts bytes target prot opt in out source destination 

Chain INPUT (policy ACCEPT 127M packets, 9401M bytes) 
pkts bytes target prot opt in out source destination 

Chain OUTPUT (policy ACCEPT 1876K packets, 137M bytes) 
pkts bytes target prot opt in out source destination 

Chain POSTROUTING (policy ACCEPT 223M packets, 389G bytes)
pkts bytes target prot opt in out source destination
32M 1921M MASQUERADE all -- * eth0 10.1.0.0/17 0.0.0.0/0
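For reference, that POSTROUTING rule corresponds to the following invocation (reconstructed from the counters output above, not necessarily the exact command originally run). Note that 10.1.0.0/17 spans both internal subnets, 10.1.0.0/18 and 10.1.64.0/18:

```shell
# Masquerade everything sourced from 10.1.0.0/17 (both internal /18 subnets)
# when it leaves via eth0 -- the nat/POSTROUTING rule shown above.
iptables -t nat -A POSTROUTING -s 10.1.0.0/17 -o eth0 -j MASQUERADE
```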

Solution 1:

Unfortunately, most of what you're seeing is caused by routing issues between external routers. Those routers obtain and update their routing information dynamically so they can steer traffic around problem areas, but when the routes change frequently (usually because of availability problems) it is called route flapping. That instability is being reflected down to you; end users normally never see any of this.

You could attempt to disable your route cache, as explained here (note the caveats: it doesn't seem to offer much upside), but I think you'd be better off talking to your local network admin(s), since it appears to be their routing that is really unstable.

I am, of course, assuming that you are not the one responsible for network administration.

Solution 2:

Have someone (or do it yourself) take a look at the router/L3 device serving 10.1.4.20. It looks like it might be receiving bad routes from an upstream peer that are then withdrawn and re-advertised.
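To give whoever runs the upstream gear something concrete, a timestamped log of the cached route helps correlate flaps with their events. A minimal sketch (`log_next_hop` is a hypothetical helper, not an existing tool):

```shell
# log_next_hop: prefix each cache line read from stdin with a UTC timestamp,
# so route-cache changes can be matched against upstream routing events.
log_next_hop() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%FT%TZ)" "$line"
    done
}

# Live use would poll the cache periodically, e.g. from cron or a loop:
#   ip route show cache 199.16.156.40 | log_next_hop >> rt-cache.log
```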