Slow Transfers over Distance
Making sure the TCP window opens wide enough to cover the bandwidth-delay product (BDP) would have been my first guess too. Assuming that is configured properly (and supported by both ends), I would next examine a packet trace to confirm that the window really is opening up and that no hop in the path is stripping the window-scaling option. If all of that checks out, and you are certain you are not running into a bandwidth-constrained hop in the path, the likely cause of your problem is random packet drops. The duplicate ACKs you mentioned support this hypothesis, since duplicate ACKs are generally a direct result of lost data. Also note that with a large bandwidth-delay product, and therefore a large open sliding window, even low levels of random packet drops can significantly reduce the total throughput of the connection.
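To put rough numbers on this, here is a minimal Python sketch that is not part of the original answer. It computes the BDP, the smallest RFC 7323 window-scale shift needed to advertise a window that large, and an estimate of loss-limited throughput using the Mathis et al. approximation (throughput ≈ MSS/RTT × √(3/2)/√p). The Mathis model is a swapped-in, Reno-style approximation and only a rough bound (CUBIC behaves differently), and the link speed, RTT, MSS and loss rate in the usage lines are hypothetical.

```python
import math

MAX_UNSCALED_WINDOW = 65535  # largest window expressible without window scaling
MAX_WSCALE_SHIFT = 14        # RFC 7323 caps the window-scale shift at 14


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: data that must be in flight to fill the pipe."""
    return int(round(bandwidth_bps * rtt_s / 8))


def min_window_scale(window_bytes: int) -> int:
    """Smallest shift such that 65535 << shift covers window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes:
        shift += 1
    if shift > MAX_WSCALE_SHIFT:
        raise ValueError("window exceeds the largest scaled TCP window")
    return shift


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling on Reno-style throughput under random loss (Mathis et al.)."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Hypothetical path: 1 Gbit/s, 80 ms RTT, 1460-byte MSS, 0.01% random loss.
    window = bdp_bytes(1e9, 0.080)
    print("BDP (bytes):", window)
    print("window-scale shift needed:", min_window_scale(window))
    rate = mathis_throughput_bps(1460, 0.080, 1e-4)
    print("loss-limited throughput (Mbit/s): %.1f" % (rate / 1e6))
```

With these assumed numbers, a 1 Gbit/s path at 80 ms RTT needs about 10 MB in flight (a window-scale shift of 8), while a random loss rate of only 0.01% caps a Reno-style flow at roughly 18 Mbit/s, which is why small amounts of loss hurt so much on long, fat pipes.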
Side note: For bulk data transfers over TCP across a multi-hop WAN connection, there should be no need or reason to disable Nagle. In fact, that exact scenario is why Nagle exists. Generally, Nagle only needs to be disabled for interactive connections where sub-MTU-sized segments must be sent without any delay. For bulk transfers, you want as much data in each packet as possible.
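For reference, Nagle is controlled per socket through the TCP_NODELAY option. The Python sketch below (the helper names are hypothetical) leaves Nagle at its default for bulk-transfer sockets and disables it only for interactive ones, which is the split described above.

```python
import socket


def make_tcp_socket(interactive: bool) -> socket.socket:
    """Create a TCP socket, disabling Nagle only for interactive traffic."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if interactive:
        # TCP_NODELAY=1 disables Nagle: small writes are sent immediately
        # instead of being coalesced while earlier data is unacknowledged.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Bulk sockets keep the default (Nagle enabled) so segments are filled.
    return s


def nagle_enabled(s: socket.socket) -> bool:
    """Nagle is active when TCP_NODELAY is zero."""
    return s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


if __name__ == "__main__":
    with make_tcp_socket(interactive=False) as bulk, make_tcp_socket(interactive=True) as chat:
        print("bulk socket, Nagle on:", nagle_enabled(bulk))
        print("interactive socket, Nagle on:", nagle_enabled(chat))
```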
Did you tune your packet reordering threshold? On Linux, check tcp_reordering under /proc (/proc/sys/net/ipv4/tcp_reordering). On long paths, it is common for multipath effects (packets taking different routes and arriving out of order) to cause false packet-loss detection, retransmissions, and the drops in speed shown in your chart. Reordering also produces a lot of duplicate ACKs, so it is worth checking. Do not forget that you must tune both ends of the pipe to get good results, and use at least CUBIC congestion control. An interactive protocol such as FTP can undo any long-pipe TCP optimization you make, unless you are only transferring large files.
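A minimal Python sketch for inspecting the relevant Linux settings on each host, assuming the standard /proc/sys/net/ipv4 layout. The helper names and the hint rules are illustrative assumptions, not part of the original answer. It only reads values; changing them (for example with sysctl) requires root and should be done on both ends.

```python
from pathlib import Path
from typing import Dict, List, Optional

SYSCTL_ROOT = Path("/proc/sys/net/ipv4")
KEYS = (
    "tcp_reordering",                    # reordering tolerance before assuming loss
    "tcp_congestion_control",            # active congestion control algorithm
    "tcp_available_congestion_control",  # algorithms the kernel can use
    "tcp_window_scaling",                # 1 if RFC 7323 window scaling is enabled
)


def read_tcp_sysctls(root: Path = SYSCTL_ROOT) -> Dict[str, Optional[str]]:
    """Read each TCP setting as a string, or None if it cannot be read."""
    values: Dict[str, Optional[str]] = {}
    for key in KEYS:
        try:
            values[key] = (root / key).read_text().strip()
        except OSError:
            values[key] = None  # not Linux, or the key is absent on this kernel
    return values


def hints(values: Dict[str, Optional[str]]) -> List[str]:
    """Illustrative checks only; thresholds here are assumptions."""
    out = []
    if values.get("tcp_window_scaling") not in (None, "1"):
        out.append("window scaling is off: the window cannot exceed 64 KiB")
    if values.get("tcp_congestion_control") == "reno":
        out.append("congestion control is reno: consider at least cubic on both ends")
    return out


if __name__ == "__main__":
    current = read_tcp_sysctls()
    for key, value in current.items():
        print(f"{key} = {value}")
    for hint in hints(current):
        print("hint:", hint)
```

Run it on both hosts and compare the output, since a mismatch between the two ends can limit the gains from tuning only one side.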