Amazon Web Services (AWS) EC2 instances: Unstable network bandwidth over long-distance connections

I'm trying to deploy a data processing system across multiple AWS regions. Before doing so, I've been profiling the network connections over a variety of distances, and I keep seeing frequent sudden drops and fluctuations in bandwidth on the long-distance links. What could be the root cause?

The cluster spans five regions: Oregon, Ohio, Canada (Central), Ireland, and Osaka. The screenshot below shows the measured bandwidths, and we're seeing a lot of fluctuation on each of the connections.

Any insights would be welcome!

Screenshot


I suspect your traffic is going across the public internet, where there are no bandwidth guarantees. Ideally you want to keep it on the AWS backbone network, which will be more consistent. Two ways to do this are AWS Transit Gateway and VPC peering; there are probably other ways. For Transit Gateway, read this blog post.

At re:Invent 2019 we announced that AWS Transit Gateway now supports inter-Region peering. This means that all the Amazon VPCs, Site-to-Site VPN connections, and Direct Connect gateways attached to an AWS Transit Gateway hosted in one AWS Region can exchange traffic with resources deployed in other AWS Regions. The inter-Region traffic is encrypted, traverses the AWS global network, and is not exposed to the public internet – thus reducing the attack surface.
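As a rough sketch of what the Transit Gateway route could look like with boto3 (my assumption for tooling; the console or CLI works too): one Transit Gateway per region, a peering attachment between two of them, and a static route pointing at the attachment. All IDs, CIDRs, and function names below are hypothetical placeholders, and the functions take an already-created EC2 client so the boto3 dependency stays at the call site.

```python
# Clients would be created with boto3 (third-party), for example:
#   import boto3
#   oregon = boto3.client("ec2", region_name="us-west-2")
#   ireland = boto3.client("ec2", region_name="eu-west-1")


def request_tgw_peering(requester_ec2, tgw_id, peer_tgw_id,
                        peer_account_id, peer_region):
    """Request a peering attachment from tgw_id to a Transit Gateway
    in another region and return the new attachment ID."""
    resp = requester_ec2.create_transit_gateway_peering_attachment(
        TransitGatewayId=tgw_id,
        PeerTransitGatewayId=peer_tgw_id,
        PeerAccountId=peer_account_id,
        PeerRegion=peer_region,
    )
    return resp["TransitGatewayPeeringAttachment"]["TransitGatewayAttachmentId"]


def accept_tgw_peering(accepter_ec2, attachment_id):
    """Accept the request on the peer side (client in the peer region).
    The attachment has to reach the pendingAcceptance state first, so a
    real script would poll before calling this."""
    resp = accepter_ec2.accept_transit_gateway_peering_attachment(
        TransitGatewayAttachmentId=attachment_id,
    )
    return resp["TransitGatewayPeeringAttachment"]["State"]


def add_tgw_route(ec2, route_table_id, remote_cidr, attachment_id):
    """Add a static route sending traffic for the remote VPC's CIDR
    over the peering attachment."""
    ec2.create_transit_gateway_route(
        DestinationCidrBlock=remote_cidr,
        TransitGatewayRouteTableId=route_table_id,
        TransitGatewayAttachmentId=attachment_id,
    )
```

A route like this is needed in the Transit Gateway route table on each side, and the VPC route tables also have to send the remote CIDR to the local Transit Gateway.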

VPC Peering is another way to do this. That traffic goes across the AWS backbone as well. See this blog post.

Inter-Region VPC Peering provides a simple and cost-effective way to share resources between regions or replicate data for geographic redundancy. Built on the same horizontally scaled, redundant, and highly available technology that powers VPC today, Inter-Region VPC Peering encrypts inter-region traffic with no single point of failure or bandwidth bottleneck. Traffic using Inter-Region VPC Peering always stays on the global AWS backbone and never traverses the public internet, thereby reducing threat vectors, such as common exploits and DDoS attacks.
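A minimal sketch of the VPC peering option, again assuming boto3 and hypothetical VPC IDs, route table IDs, and CIDR blocks. VPC peering is not transitive, so connecting every pair of your five regions directly means one peering connection per pair; the helper below just enumerates those pairs.

```python
from itertools import combinations

# Region codes for the five locations named in the question.
REGIONS = {
    "Oregon": "us-west-2",
    "Ohio": "us-east-2",
    "Canada (Central)": "ca-central-1",
    "Ireland": "eu-west-1",
    "Osaka": "ap-northeast-3",
}


def full_mesh(region_codes):
    """Return every unordered pair of regions (one peering each)."""
    return list(combinations(sorted(region_codes), 2))


def request_vpc_peering(requester_ec2, vpc_id, peer_vpc_id, peer_region):
    """Request a peering connection from vpc_id to a VPC in another
    region (same account assumed) and return the connection ID.
    requester_ec2 would be boto3.client("ec2", region_name=...)."""
    resp = requester_ec2.create_vpc_peering_connection(
        VpcId=vpc_id,
        PeerVpcId=peer_vpc_id,
        PeerRegion=peer_region,
    )
    return resp["VpcPeeringConnection"]["VpcPeeringConnectionId"]


def accept_vpc_peering(accepter_ec2, peering_id):
    """Accept the request using a client in the peer VPC's region."""
    accepter_ec2.accept_vpc_peering_connection(
        VpcPeeringConnectionId=peering_id,
    )


def add_peering_route(ec2, route_table_id, remote_cidr, peering_id):
    """Route traffic for the remote VPC's CIDR over the peering
    connection. Needed in the route tables on both sides."""
    ec2.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=remote_cidr,
        VpcPeeringConnectionId=peering_id,
    )
```

For five regions `full_mesh(REGIONS.values())` yields 10 pairs. The VPC CIDR blocks must not overlap, and the instances have to address each other by private IP for the traffic to follow the peering routes.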