Timeouts on Cloud SQL and other external services when using NAT + IP Masquerade on GKE
I need to configure a static egress IP for one of my Pods, because a remote service outside my cluster only accepts requests from whitelisted (trusted) IPs.
I followed the documentation provided by Google:
https://cloud.google.com/nat/docs/overview?hl=es-419
https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
However, when I configure egress traffic through the Google Cloud NAT service in my GKE cluster, together with masquerading via the ip-masq-agent, I start getting timeouts and other problems when accessing remote services outside the cluster.
My cluster is on version 1.19.10-gke.1600.
I have tried these config files with the following results:
resyncInterval: 60s
Result:
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 172.16.0.0/12 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.168.0.0/16 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
The external services still see the wrong IP.
resyncInterval: 60s
masqLinkLocal: true
Result:
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 169.254.0.0/16 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 172.16.0.0/12 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.168.0.0/16 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in the chain) */
Same effect: my external services still see the wrong IP.
nonMasqueradeCIDRs:
- 0.0.0.0/0
resyncInterval: 60s
masqLinkLocal: true
Result:
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere anywhere /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in the chain) */
This seems to work better, because the external services now receive the correct IP, but I get connection problems and timeouts.
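As a simplified model (an illustration, not the agent's actual code), the IP-MASQ chain can be read as a first-match list: a destination inside any nonMasqueradeCIDRs entry hits a RETURN rule and keeps the Pod's source IP, and everything else falls through to the final MASQUERADE rule and takes the node's IP. The function name and the test addresses are arbitrary choices for this sketch, and the link-local rule controlled by masqLinkLocal is left out.

```python
import ipaddress

# RFC 1918 ranges, as in the first chain listing above.
DEFAULT_NON_MASQ = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def ip_masq_verdict(dst, non_masq_cidrs):
    """Return the IP-MASQ target a packet to `dst` would hit (simplified model)."""
    addr = ipaddress.ip_address(dst)
    for cidr in non_masq_cidrs:
        if addr in ipaddress.ip_network(cidr):
            return "RETURN"  # leaves the chain; the Pod IP stays as source
    return "MASQUERADE"  # final rule; source is rewritten to the node's IP


print(ip_masq_verdict("10.4.0.7", DEFAULT_NON_MASQ))  # RETURN
print(ip_masq_verdict("8.8.8.8", DEFAULT_NON_MASQ))   # MASQUERADE
print(ip_masq_verdict("8.8.8.8", ["0.0.0.0/0"]))      # RETURN
```

With 0.0.0.0/0 in the list every destination returns, so nothing is masqueraded and packets leave the node with their Pod IPs.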
This is my NAT configuration:
NAT mapping
- High availability: Yes
- Source subnets & IP ranges: All subnets' primary and secondary IP ranges
- NAT IP addresses: static-egress-ip XXX.XXX.XXX.XXX
I'm out of ideas. Can anyone give me some advice?
Update: following the answer I received, I added the IP ranges from the Google Cloud documentation to my config file, which now looks like this:
nonMasqueradeCIDRs:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 100.64.0.0/10
- 192.0.0.0/24
- 192.0.2.0/24
- 192.88.99.0/24
- 198.18.0.0/15
- 198.51.100.0/24
- 203.0.113.0/24
- 240.0.0.0/4
resyncInterval: 60s
masqLinkLocal: true
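When the config is maintained by hand, generating and validating it can avoid typos. The sketch below uses a hypothetical helper, render_ip_masq_config, to build config text in the same shape as above. ipaddress.ip_network(..., strict=True) rejects entries with host bits set, and the 0.0.0.0/0 check is an extra guard of our own, since that entry disables masquerading entirely. It assumes the text is then loaded into the ip-masq-agent ConfigMap in the kube-system namespace as described in the GKE guide.

```python
import ipaddress


def render_ip_masq_config(cidrs, resync_interval="60s", masq_link_local=True):
    """Build ip-masq-agent config text (hypothetical helper)."""
    lines = ["nonMasqueradeCIDRs:"]
    for cidr in cidrs:
        # strict=True raises ValueError for entries like "10.0.0.1/8".
        net = ipaddress.ip_network(cidr, strict=True)
        if net.prefixlen == 0:
            # Our own guard: 0.0.0.0/0 would stop all masquerading.
            raise ValueError(f"{cidr} matches every destination")
        lines.append(f"  - {net}")
    lines.append(f"resyncInterval: {resync_interval}")
    lines.append(f"masqLinkLocal: {'true' if masq_link_local else 'false'}")
    return "\n".join(lines) + "\n"


print(render_ip_masq_config(["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]), end="")
```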
The resulting iptables chain is:
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 172.16.0.0/12 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.168.0.0/16 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 100.64.0.0/10 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.0.0/24 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.2.0/24 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 192.88.99.0/24 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 198.18.0.0/15 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 198.51.100.0/24 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 203.0.113.0/24 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 240.0.0.0/4 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
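To confirm that the agent picked up every configured range, the chain listing can be compared with the config. This sketch assumes the text comes from something like iptables -t nat -L IP-MASQ (with or without -n), one rule per line, and relies on iptables -L printing 0.0.0.0/0 as "anywhere"; the function names are hypothetical.

```python
import ipaddress


def parse_ip_masq_chain(listing):
    """Return (RETURN destinations, last rule's target) from a chain listing."""
    return_dests, last_target = [], None
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "Chain ..." header, the column header and blank lines.
        if len(fields) < 5 or fields[2] != "--":
            continue
        target, dest = fields[0], fields[4]
        if dest == "anywhere":  # iptables -L shows 0.0.0.0/0 this way
            dest = "0.0.0.0/0"
        if target == "RETURN":
            return_dests.append(dest)
        last_target = target
    return return_dests, last_target


def check_chain(config_cidrs, listing):
    """Return (configured CIDRs with no RETURN rule, MASQUERADE is last?)."""
    return_dests, last_target = parse_ip_masq_chain(listing)
    present = {ipaddress.ip_network(d) for d in return_dests}
    missing = [c for c in config_cidrs if ipaddress.ip_network(c) not in present]
    return missing, last_target == "MASQUERADE"


sample = """Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
"""
print(check_chain(["10.0.0.0/8", "100.64.0.0/10"], sample))  # (['100.64.0.0/10'], True)
```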
But if I run curl checkip.amazonaws.com to see which IP the node is using, I get a different IP from the one defined in my Cloud NAT configuration, and the external services reject the requests as untrusted.
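The same check can be scripted and run from inside the affected Pod, assuming a Python interpreter is available in its image. The sketch asks checkip.amazonaws.com (the service used above) which source IP it sees and compares it, as an address rather than a string, with the expected static NAT IP; both function names are hypothetical.

```python
import ipaddress
import urllib.request

CHECKIP_URL = "https://checkip.amazonaws.com"  # service used in the question


def fetch_egress_ip(url=CHECKIP_URL, timeout=5.0):
    """Ask an external echo service which source IP it sees."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def egress_matches(observed, expected):
    """Compare two addresses as IPs, not as raw strings."""
    return ipaddress.ip_address(observed) == ipaddress.ip_address(expected)
```

For example, egress_matches(fetch_egress_ip(), expected_ip) should return True once the Pod's traffic leaves through Cloud NAT, where expected_ip is the real static-egress-ip address (masked as XXX.XXX.XXX.XXX above).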
It seems you have set nonMasqueradeCIDRs to 0.0.0.0/0, which prevents masquerading of all traffic. To fix this issue, update the nonMasqueradeCIDRs key in the config file with the ranges listed in the Default non-masquerade destinations paragraph [1], as given below:
nonMasqueradeCIDRs:
- 172.16.0.0/12
- 192.168.0.0/16
- 100.64.0.0/10
- 192.0.0.0/24
- 192.0.2.0/24
- 192.88.99.0/24
- 198.18.0.0/15
- 198.51.100.0/24
- 203.0.113.0/24
- 240.0.0.0/4
- 10.0.0.0/8
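Because each entry becomes a RETURN rule placed ahead of the final MASQUERADE rule, only the set of ranges matters, not their order. A quick way to compare a recommended list with the one already deployed is to compare them as sets; the two lists below are copied from this thread, and same_ranges is a hypothetical helper.

```python
import ipaddress

# The list recommended above and the list applied in the question's update.
RECOMMENDED = ["172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10", "192.0.0.0/24",
               "192.0.2.0/24", "192.88.99.0/24", "198.18.0.0/15", "198.51.100.0/24",
               "203.0.113.0/24", "240.0.0.0/4", "10.0.0.0/8"]
APPLIED = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10",
           "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24", "198.18.0.0/15",
           "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4"]


def same_ranges(a, b):
    """Compare two CIDR lists as sets of networks, ignoring order."""
    return {ipaddress.ip_network(c) for c in a} == {ipaddress.ip_network(c) for c in b}


print(same_ranges(RECOMMENDED, APPLIED))  # True
```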
Also note that the IPs shown in the iptables output are not wrong IPs; they are reserved ranges. 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 are the private ranges reserved by RFC 1918, and 169.254.0.0/16 is reserved for link-local addresses. By default, the agent does not masquerade traffic to these ranges, which is why their rules carry the comment 'ip-masq-agent: local traffic is not subject to MASQUERADE' [2].
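Python's standard ipaddress module encodes the same IANA reservations, which gives a quick way to check which category a range falls into. This is only a convenience check for readers; ip-masq-agent does not use it, and classify is a hypothetical helper.

```python
import ipaddress


def classify(cidr):
    """Report whether a network is private (e.g. RFC 1918) and/or link-local."""
    net = ipaddress.ip_network(cidr)
    return {"private": net.is_private, "link_local": net.is_link_local}


for cidr in ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16"]:
    print(cidr, classify(cidr))
```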
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent#default-non-masq-dests
[2] https://kubernetes.io/docs/tasks/administer-cluster/ip-masq-agent/#ip-masquerade-agent-user-guide
Regards, Anbu.