Methodologies for performance-testing a WAN link

We have a pair of new diversely-routed 1Gbps Ethernet links between locations about 200 miles apart. The 'client' is a new reasonably-powerful machine (HP DL380 G6, dual E56xx Xeons, 48GB DDR3, R1 pair of 300GB 10krpm SAS disks, W2K8R2-x64) and the 'server' is a decent enough machine too (HP BL460c G6, dual E55xx Xeons, 72GB, R1 pair of 146GB 10krpm SAS disks, dual-port Emulex 4Gbps FC HBA linked to dual Cisco MDS9509s then onto dedicated HP EVA 8400 with 128 x 450GB 15krpm FC disks, RHEL 5.3-x64).

Using SFTP from the client we're only seeing about 40Kbps of throughput with large (>2GB) files. We've performed server-to-'other local server' tests and see around 500Mbps through the local switches (Cat 6509s); we're going to do the same on the client side but that's a day or so away.

What other testing methods would you use to prove to the link providers that the problem is theirs?


Tuning an Elephant:
This could require tuning, though as pQd says that's probably not the root issue here. This sort of link is known as a "Long Fat Pipe" or elephant (see RFC 1072). Because this is a fat gigabit pipe going over a distance (and distance really means time/latency in this case), the TCP receive window needs to be large (see TCP/IP Illustrated Volume 1, the TCP Extensions section, for pictures).
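
Before plugging numbers in, it's worth measuring the actual round-trip time between the two sites; a quick sketch (the address is just a placeholder for the far-end host):

ping -c 20 10.11.12.13   # note the average rtt; ~200 miles of fibre should only add a few milliseconds each way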

To figure out what the receive window needs to be, you calculate the bandwidth-delay product:

Bandwidth (bits/s) * Round-Trip Delay (s) = BDP (bits)

If there is 10ms of latency, a bandwidth-delay product calculator estimates you want a receive window of about 1.2 MBytes. We can do the calculation ourselves with the above formula:

echo $(( 1000000000 * 10 / 1000 / 8 ))   # 1 Gbit/s * 0.010 s RTT, converted to bytes
1250000
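
For the RHEL 5 server to actually advertise a window that large, the kernel's receive-buffer limits have to allow it; a minimal sketch of what to check, assuming the defaults are still in place (the 4 MB values are example ceilings, not recommendations):

sysctl net.ipv4.tcp_window_scaling   # should be 1
sysctl net.core.rmem_max             # hard cap on any socket receive buffer
sysctl net.ipv4.tcp_rmem             # min / default / max used by TCP autotuning
# if the maximums are below the ~1.25 MB BDP, raise them, for example:
sysctl -w net.core.rmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"

On the Windows 2008 R2 client, receive-window autotuning is on by default and can be checked with netsh interface tcp show global.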

So, once you've found whatever the larger problem is, you might want to run a packet dump to see whether TCP window scaling (the TCP extension that allows larger windows) is actually being negotiated, and tune from there.
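
As a sketch, something like this on either end will show whether the wscale option is present in the handshake (eth0 and the address are placeholders):

tcpdump -ni eth0 'tcp[tcpflags] & (tcp-syn) != 0 and host 10.11.12.13'
# look for "wscale N" in the options of both the SYN and the SYN-ACK;
# if either side omits it, the connection is stuck with a window of 64KB or less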

Window Bound:
If the problem is that you are window-size bound with no window scaling in place, then with about 200ms of latency I would expect the following result regardless of the pipe size:

Throughput = Receive Window / Round-Trip Time

So:

echo $(( 65536 * 1000 / 200 ))   # 64 KB window / 0.2 s RTT
327680 #Bytes/second

In order to get the results you are seeing you would just need to solve for latency, which would be:

RTT = RWIN/Throughput

So (For 40 kBytes/s):

echo "scale=2; 65536 / 40000" | bc
1.63 #Seconds of Latency

(Please check my math; these figures of course don't include any protocol/header overhead.)
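
To make the relationship concrete, here's a quick loop (plain shell plus bc; the RTT values are purely illustrative) showing what an unscaled 64 KB window gets you at various round-trip times:

for rtt in 0.01 0.05 0.2 1.6; do
    echo "$rtt s RTT -> $(echo "65536 / $rtt" | bc) bytes/s"
done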


40Kbps is very low, to the point that I would suspect faulty media converters or a duplex mismatch (but you have gigabit, so there is no place for half duplex!). There must be packet loss or very high jitter involved.

iperf is the first tool that comes to my mind to measure available throughput. Run on one side:

iperf -s 

and on the other:

iperf -t 60 -c 10.11.12.13

Then you can swap the client/server roles, use -d for a duplex test, etc. Run mtr between both machines before the test starts to see what latency / packet loss you get on the unused link, and how they change during the data transfer.

You would like to see very small jitter and no packet loss until the link is saturated at ninety-something percent of its capacity.
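
A rough sketch of how to quantify that (same placeholder address as above; the 100M rate is just a starting point to step up from towards line rate):

iperf -s -u                              # on the server: UDP mode reports jitter and loss directly
iperf -u -c 10.11.12.13 -b 100M -t 60    # on the client
mtr -n -r -c 100 10.11.12.13             # report mode: 100 numeric probes, summarised at the end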

iperf is available for both *nix and Windows.

mtr is available for both *nix and Windows too.


tracepath can show you routing problems between the two sites.
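
For example (placeholder address again; -n just skips DNS lookups):

tracepath -n 10.11.12.13   # watch the reported pmtu and any hop where latency suddenly jumps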

iperf, ttcp and bwping can give you useful information.

Do you know how this 1Gbps link is being provisioned? Are you bridging or routing over this link? What is your SLA for the link? Could your link provider be shaping your traffic?

If you're only getting 40Kbps then there is a serious problem. Are you sure that it's not a 1Mbps link rather than a 1Gbps link? You'll probably find that the speed of the link is not what you think it is :-)