I am planning to set up multiple Virtual Private Clouds (VPCs) on AWS. These VPCs will be located in different geographic regions. Each VPC will have both public and private instances. I need an efficient routing strategy for all the instances running in all VPCs. Cross-VPC communication will happen over IPsec tunnels. Any suggestions regarding the following concerns will be very useful to me.

  1. Should I create a hub-and-spoke structure where each VPC has an IPsec tunnel to the hub VPC, or should I create an IPsec tunnel for each pair of VPCs, forming a full mesh (clique)?
  2. I have to keep an instance in each VPC that acts as the IPsec gateway, and there's a risk that this instance becomes a bottleneck or, worse, a single point of failure. Are there architecture options that avoid this?
  3. What IP addressing scheme should I follow so that, in the future, I can move an instance from a public subnet to a private subnet (and vice versa) without affecting the overall routing?

Please also provide links to relevant docs / case studies that you think might help me in this scenario.

Thanks.


First, you can't move an instance between subnets. Once the primary ENI has been assigned to an instance it can't be detached, so the instance will remain in that subnet for its lifetime. The best you can do is launch a new instance in the target subnet, stop both instances, and swap in the root EBS volume from the old instance. But of course, this will give you a new IP address.
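The volume swap can be scripted with the AWS CLI. This is only a sketch: all the IDs below are placeholders, and it assumes the AMI's root device name is `/dev/xvda` (check your AMI's actual root device first). Note that an EBS volume can only be attached to an instance in the same availability zone.

```shell
# Placeholder IDs -- substitute your own.
OLD_INSTANCE=i-0aaaaaaaaaaaaaaaa
NEW_INSTANCE=i-0bbbbbbbbbbbbbbbb
OLD_ROOT_VOL=vol-0ccccccccccccccccc
NEW_ROOT_VOL=vol-0ddddddddddddddddd
ROOT_DEV=/dev/xvda   # verify against your AMI's root device name

# Stop both instances; root volumes can only be detached while stopped.
aws ec2 stop-instances --instance-ids "$OLD_INSTANCE" "$NEW_INSTANCE"
aws ec2 wait instance-stopped --instance-ids "$OLD_INSTANCE" "$NEW_INSTANCE"

# Detach the root volume from each instance.
aws ec2 detach-volume --volume-id "$OLD_ROOT_VOL"
aws ec2 detach-volume --volume-id "$NEW_ROOT_VOL"
aws ec2 wait volume-available --volume-ids "$OLD_ROOT_VOL" "$NEW_ROOT_VOL"

# Attach the old root volume to the new instance, then boot it.
aws ec2 attach-volume --volume-id "$OLD_ROOT_VOL" \
    --instance-id "$NEW_INSTANCE" --device "$ROOT_DEV"
aws ec2 start-instances --instance-ids "$NEW_INSTANCE"
```

The new instance then boots from the old root volume, but with the IP address of its new subnet.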

Second, you should create a full mesh of VPN connections between the availability zones of different regions (AZs within one region share a VPC, so they don't need tunnels between them). Thus if you have two AZs in region alfa and two AZs in region bravo, each AZ in alfa gets a tunnel to each AZ in bravo, for 2 × 2 = 4 VPN tunnels. For what it's worth, the VPN gateway instances can also act as the outbound NAT instances for the private subnets in their availability zone. There is some documentation on this configuration.
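As a back-of-the-envelope check on how the two topologies from the question scale, here is a small sketch. The region names are just placeholders; the full-mesh count grows with the product of gateway counts across region pairs, while hub-and-spoke grows linearly:

```python
from itertools import combinations

def full_mesh_tunnels(az_counts):
    """Tunnels for a full mesh between AZ gateways in *different* regions.

    az_counts maps region name -> number of AZ gateways in that region.
    AZs within one region share a VPC, so no tunnel is needed there.
    """
    return sum(a * b for a, b in combinations(az_counts.values(), 2))

def hub_and_spoke_tunnels(az_counts, hub):
    """Tunnels if every non-hub AZ gateway connects only to the hub region."""
    return sum(n for region, n in az_counts.items() if region != hub)

regions = {"alfa": 2, "bravo": 2}
print(full_mesh_tunnels(regions))               # 2 * 2 = 4, as in the text above
print(hub_and_spoke_tunnels(regions, "alfa"))   # 2

# The mesh grows quadratically as regions are added:
print(full_mesh_tunnels({"alfa": 2, "bravo": 2, "charlie": 2}))  # 12
```

So a full mesh costs more tunnels as you add regions, but it avoids routing all cross-VPC traffic through one hub.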

Finally, the way to avoid a single point of failure for the VPN/NAT instances is to use autoscaling with scripted provisioning of the instances, so that if an instance fails it is automatically replaced by a new one. The trick is that each VPN/NAT instance needs a pre-allocated ENI that is re-attached each time the instance is replaced, so that tunnel endpoints and route table entries stay stable. This presentation from re:Invent 2013 has an overview of the process: video and slides.
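The re-attachment step can be sketched as instance user-data along these lines. The ENI ID and region are placeholders, and it assumes the instance's IAM role allows `ec2:AttachNetworkInterface` and `ec2:ModifyInstanceAttribute`:

```shell
#!/bin/bash
# Placeholder values -- one pre-allocated ENI per VPN/NAT role.
ENI_ID=eni-0eeeeeeeeeeeeeeeee
REGION=us-east-1

# Discover this instance's ID from the instance metadata service.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Re-attach the long-lived ENI as a secondary interface (device index 1),
# so the tunnel endpoint address survives instance replacement.
aws ec2 attach-network-interface --region "$REGION" \
    --network-interface-id "$ENI_ID" \
    --instance-id "$INSTANCE_ID" --device-index 1

# VPN/NAT instances forward traffic they didn't originate, so the
# source/destination check must be disabled.
aws ec2 modify-instance-attribute --region "$REGION" \
    --instance-id "$INSTANCE_ID" --no-source-dest-check
```

Point the private subnets' default route at that ENI (rather than at an instance ID), and routing keeps working across replacements.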