How to find the bottleneck while transferring huge files between 2 hosts

We frequently need to transfer huge files (upwards of 50 GB) between two hosts, and the transfer rate never seems to reach the expected throughput for the network. There are several points which could be the bottleneck, but the theoretical upper limit of each is well above the actual transfer rate. Here's a typical setup:

Laptop --> 802.11n --> AP --> CAT 6 cable --> 10/100 Mbits router --> Desktop

In this connection, the bottleneck is clearly the router, which would limit the transfer rate to 100 Mbits/sec. Even then, I rarely see a transfer rate (with scp) exceeding 9.5 MB/s, which is 76 Mbits/sec, or only 76% of the theoretical maximum.
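The unit conversion above can be double-checked with a small shell helper. This is only a sketch: the function names are made up for illustration, and it assumes awk is available.

```shell
# Convert a rate in megabytes per second to megabits per second (1 byte = 8 bits).
mbytes_to_mbits() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 8 }'
}

# Express a measured rate as a percentage of the nominal link rate (both in Mbit/s).
percent_of_link() {
    awk -v measured="$1" -v link="$2" 'BEGIN { printf "%.0f\n", 100 * measured / link }'
}

mbytes_to_mbits 9.5       # scp rate quoted above, in MB/s; prints 76.0
percent_of_link 76 100    # share of a 100 Mbit/s port; prints 76
```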

Can there really be a 24% overhead at the access point, or is there something else limiting the speed? It could be disk I/O (although SATA is rated at 1.5 Gbps), or anything on the motherboard between the disk and the NIC (how can I measure that?).
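For a rough sense of how much overhead plain framing accounts for on the wired 100 Mbit hop, here is a back-of-the-envelope sketch. It assumes a standard 1500-byte MTU, 40 bytes of IPv4 and TCP headers with no options, and 38 bytes of per-frame Ethernet overhead (preamble, header, FCS and inter-frame gap). The function name is hypothetical, and wireless overhead is not modelled at all.

```shell
# Estimate the best-case TCP payload rate on an Ethernet link.
#   $1 = nominal link rate in Mbit/s
#   $2 = MTU in bytes (optional, defaults to 1500)
tcp_goodput_mbit() {
    awk -v link="$1" -v mtu="${2:-1500}" 'BEGIN {
        payload = mtu - 40    # MTU minus IPv4 (20) and TCP (20) headers
        on_wire = mtu + 38    # MTU plus Ethernet header, FCS, preamble, gap
        printf "%.1f\n", link * payload / on_wire
    }'
}

tcp_goodput_mbit 100    # prints 94.9
```

Under those assumptions, framing alone costs roughly 5% on the wired hop, so it would not explain a 24% gap by itself.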

Is there a way to know for sure(*) where the bottleneck is? If I can't get more than 76 Mbps from a 100 Mbps router, will upgrading the network to gigabit increase throughput or will I still get 76 Mbps because the bottleneck is elsewhere?

(*) or at least in a way convincing enough that a boss would agree to invest to upgrade that one part of the network


Solution 1:

Your problem is that you are testing too many things at once:

  • disk read speed
  • SSH encryption
  • wireless
  • SSH decryption
  • disk write speed

Since you mentioned SSH I am going to assume this is a unix system...

You can rule out any problems with disk read speed with a simple read test on the sending end:

dd if=yourfile of=/dev/null #or
pv yourfile > /dev/null

On the receiving end you can do a simple disk write test:

dd if=/dev/zero of=testfile bs=1M count=2000 # or
dd if=/dev/zero bs=1M count=2000 | pv > testfile

dd is not really a "benchmark", but since scp uses sequential I/O, it is close enough.
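One caveat with the dd tests is the page cache: a write can look much faster than the disk really is if the data is still sitting in RAM when dd reports. A hedged sketch of a wrapper that asks dd to sync before finishing is below. The function name and scratch file name are invented for illustration, and it assumes a dd that understands bs=1M and conv=fsync (GNU coreutils does).

```shell
# Sequential write, then read, of a scratch file in the given directory.
#   $1 = directory on the disk under test (default: current directory)
#   $2 = size in MiB (default: 2000, as in the commands above)
seq_disk_test() {
    dir=${1:-.}
    size_mib=${2:-2000}
    scratch=$(mktemp "$dir/ddtest.XXXXXX") || return 1

    # conv=fsync makes dd flush the data to disk before it prints its
    # summary, so the figure is not just the speed of filling the cache.
    dd if=/dev/zero of="$scratch" bs=1M count="$size_mib" conv=fsync 2>&1 | tail -n 1

    # Read it back. Unless the file is larger than RAM, this read may be
    # served from the page cache and look faster than the disk.
    dd if="$scratch" of=/dev/null bs=1M 2>&1 | tail -n 1

    rm -f "$scratch"
}
```

Called as `seq_disk_test /path/on/target/disk 2000`, it prints dd's summary line for the write and then for the read.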

You can also test SSH on its own, without touching either disk, by doing something like:

dd if=/dev/zero bs=1M count=100 | ssh server dd of=/dev/null # or
dd if=/dev/zero bs=1M count=100 | pv | ssh server dd of=/dev/null
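If the SSH test comes out much slower than the raw network, the cipher may be eating CPU on one of the hosts. A minimal sketch for comparing ciphers is below; the function name is made up, the cipher names in the usage line are only examples, and `ssh -Q cipher` lists what your OpenSSH client actually supports.

```shell
# Push 100 MiB of zeros through ssh once per cipher and let the remote dd
# report the rate it saw.
#   $1 = remote host, remaining arguments = cipher names to try
ssh_cipher_test() {
    host=$1
    shift
    for cipher in "$@"; do
        echo "== $cipher"
        # Local dd statistics are discarded; the summary that appears
        # comes from the dd running on the remote side.
        dd if=/dev/zero bs=1M count=100 2>/dev/null |
            ssh -c "$cipher" "$host" dd of=/dev/null
    done
}
```

Usage would be something like `ssh_cipher_test server aes128-ctr chacha20-poly1305@openssh.com`. Watching top on both hosts while it runs shows whether a core is pegged.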

Finally, to rule out SSH being the bottleneck, you can use nc to test the raw network performance:

server$ nc -l 1234 > /dev/null
client$ dd if=/dev/zero bs=1M count=100 | pv | nc server 1234 # or
client$ dd if=/dev/zero bs=1M count=100 | nc server 1234

If you really want to test the network properly, install and use something like iperf, but nc is a good start.
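For the iperf route, a minimal sketch using iperf3 might look like this. It assumes iperf3 is installed on both hosts and that its default port (5201) is reachable; the wrapper function names are invented for illustration.

```shell
# Run on the receiving host: listen for test connections.
net_server() {
    iperf3 -s
}

# Run on the sending host.
#   $1 = server host name, $2 = test length in seconds (default 30)
net_client() {
    [ -n "$1" ] || { echo "usage: net_client HOST [SECONDS]" >&2; return 2; }
    host=$1
    secs=${2:-30}
    iperf3 -c "$host" -t "$secs"       # client sends, server receives
    iperf3 -c "$host" -t "$secs" -R    # reverse mode: server sends instead
}
```

Running both directions is worthwhile because wireless links in particular are often asymmetric.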

I'd start with the nc test as that will rule out the most things. You should also definitely run the tests while not using wireless. 802.11n can easily max out a 100 Mbit port, but only if you have it set up properly.

(Ubuntu >= 12.04 defaults to netcat-openbsd. nc -l -p 1234 > /dev/null may be what you want if you're using netcat-traditional).
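Once each stage has been measured separately, the slowest one is the bottleneck. A small sketch for picking it out of a list is below. The function name is hypothetical and the numbers in the example are placeholders, not measurements; substitute the MB/s figures from your own runs.

```shell
# Read "stage rate" pairs on stdin and print the stage with the lowest rate.
find_bottleneck() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; name = $1 }
         END { print name, min }'
}

# Placeholder figures only, all in MB/s.
printf '%s\n' 'disk_read 85' 'ssh 40' 'network 11.2' 'disk_write 70' | find_bottleneck
```

With these placeholder values it prints `network 11.2`, which would point at the network rather than at disk or SSH.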

Solution 2:

Think of it this way:

You have a slow SATA disk (laptop disks are slow) running one file system or another, whose data is then handed to an IP-based file transfer protocol such as SMB. This then gets turned into wifi frames which hit an AP, then goes over wired Ethernet (which does require some reformatting) to a pretty slow switch/router, then on to a probably-quite-slow desktop, where it gets broken back out into your file system format of choice and finally written to the disk. All of this happens for every packet, and the receiver has to send acknowledgement packets back, although TCP's sliding window lets a number of packets be in flight before the sender has to wait.

I'm surprised you're seeing as much speed as you are!

Here's a clue: wire the laptop to the 100Mbps switch/router when you need to transfer the files. Seriously, it'll be much, much quicker. Also consider faster disks at each end and make sure you're using an efficient file transfer mechanism too.

Hope this helps.

Solution 3:

As Chopper3 alludes to, also try using rsync-over-ssh for files of that size as there's a goodly chance that something could go wrong; nothing sucks more than to get 45GB through a 50GB transfer and have it fail. It's possible it may also decrease your overhead but I've not personally tested it with filesizes this large.
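A minimal sketch of a resumable copy along those lines is below. The function name, retry count and delay are arbitrary choices for illustration. `--partial` tells rsync to keep a partially transferred file, so a later attempt can reuse it instead of starting from scratch.

```shell
# Retry an rsync-over-ssh transfer until it succeeds or we run out of tries.
#   $1 = source, $2 = destination (e.g. user@host:/path/), $3 = max attempts
resumable_copy() {
    src=$1
    dest=$2
    tries=${3:-5}
    delay=${RETRY_DELAY:-10}    # seconds to wait between attempts
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        rsync --partial --progress -e ssh "$src" "$dest" && return 0
        echo "attempt $attempt failed, retrying in $delay s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

Usage would be something like `resumable_copy bigfile.img user@desktop:/data/`, where the file name and destination are placeholders.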

When transferring thousands of small files, rsync can also decrease the overhead substantially: a 75K file/1500 dir/5.6K average filesize test I ran once took 12min with FTP and SFTP, 10min with SCP, but only 1min50sec with rsync-over-ssh due to the decreased setup/teardown overhead. Rsync w/o SSH was only roughly 20sec faster at 1min33sec.