Possibility of WAN Optimization for SSH traffic

While I understand that SSH by itself uses very little bandwidth, it sometimes chokes our bandwidth during peak hours in our office environment. I want to know whether it is possible to reduce/optimize SSH (rsync) traffic over the WAN.

I realize that Riverbed cannot do it. What other kinds of designs (proxies) could I consider, ignoring security issues like MITM, DPI, etc.? Could TrafficSqueezer, WANProxy, or OpenNOP be of any use here?

Also, please suggest any other ideas for backing up data besides rsync, if there are any. Would it even be feasible to decrypt the SSH traffic at a proxy server before it reaches the Riverbed and transport it over the WAN to the other end?

Sender (RSYNC) Server --> Proxy (Decrypt SSH) --> Sending Riverbed --> Receiving Riverbed --> Receiver Server

Current Topology:

100s of users (rsync) --> Source Riverbed --> (pass-through / unoptimized traffic) --> Destination Riverbed --> Remote Machine

I was initially going to be a naysayer to the idea of trying to "WAN optimize" rsync traffic, but the more I thought about it the more I decided that it was possible.

A TCP-based dictionary compressor (which I believe the Riverbed Steelhead appliances can do) would probably benefit an unencrypted rsync stream. Presumably the Riverbed devices can encrypt the "optimized" traffic, so running rsync unencrypted shouldn't compromise the integrity or confidentiality of traffic on the WAN. (The path between the source server and the Riverbed device may be a different story.)

You don't have to run rsync over SSH. It will run perfectly fine over TCP or any other reliable stream transport.
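
For instance, the receiving side can run an rsync daemon and clients can use the rsync:// transport instead of SSH, so the stream crosses the WAN in the clear where a dictionary compressor can see its redundancy. This is only a sketch; the module name, paths, and hostname are placeholders:

    # /etc/rsyncd.conf on the destination server (module name is hypothetical)
    [backups]
        path = /srv/backups
        read only = false

    # start the daemon on the destination (listens on TCP 873 by default)
    rsync --daemon

    # on the source: plain-TCP rsync, no SSH tunnel
    rsync -av /data/ rsync://backup.example.com/backups/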

It seems like a good WAN acceleration architecture runs somewhat counter to a good security architecture, since encrypted traffic is high-entropy, low-redundancy, and not at all conducive to compression. I think these are concerns you'd have to balance. I haven't kept up with Riverbed in a number of years, but this actually seems like a place where man-in-the-middle decryption of encrypted protocols might make good sense (though it turns the WAN accelerator into a huge target for attacks).

Edit:

I'm coming back to this answer a few hours later because, frankly, it's keeping me up at night.

I want to clarify some assumptions that I'm making. I am assuming:

  • You're working with WAN links that are significantly slower than a LAN (100Mbps or less).

  • You're performing backups across these WAN links that you'd like to speed up, in terms of wall-clock time.

  • The servers hosting the source and destination files have sufficient CPU and network connectivity to completely saturate the WAN link and the WAN really is the bottleneck.

  • You're using operating systems with TCP implementations that can reasonably scale the receive window to accommodate the bandwidth-delay product of your WAN link (see the quick check after this list).
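
As a quick sanity check on that last assumption (the numbers below are made up for illustration):

    # Hypothetical link: 400 Mbit/s with 70 ms RTT
    # BDP = 400,000,000 bit/s * 0.070 s / 8 = 3.5 MB
    # The TCP receive window needs to grow to roughly that size to keep the pipe full.
    sysctl net.ipv4.tcp_rmem    # on Linux: min/default/max receive buffer, in bytes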

If the servers can't saturate the network link then your bottleneck is somewhere else. Basically, I'm assuming that you've got a small pipe that your servers can saturate when running backups. If you're bottlenecking on CPU or I/O in the servers then no amount of network-related "magic" is going to help you.

Speaking rather bluntly, I feel a bit silly speaking positively about WAN acceleration appliances. I've been less than impressed with them in the past (mainly from an ROI and cost perspective) and wary of them after hearing numerous horror stories about application and operating system "strangeness" that would disappear when the WAN accelerators were disabled. I've been suspicious of them as a "technology" and generally have felt like they are a symptom of using poorly-engineered protocols, poor server-placement decisions, or poor network architecture.

I've spent the better part of two hours reading up on dictionary-based protocol compressors and playing around with rsync. I think that, depending on the amount of redundancy in the changes you're synchronizing with rsync, there's actually a potential for seeing some minor performance improvement using a dictionary-based WAN accelerator. It's going to depend a lot on what your data looks like.

I don't have any numbers using an actual WAN accelerator to back that up. Nor do I have any personal experience with WAN accelerators in production use, let alone "accelerating" the rsync protocol. I certainly wouldn't go out and buy something based on what I'm saying here, but if you already have something in place I'd consider running some unencrypted rsync traffic through it to see what it does.
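
If you do experiment, something as simple as timing the same transfer both ways should show whether the appliance is helping. A minimal sketch, with placeholder paths, hostname, and module name:

    # baseline: rsync over SSH (encrypted, opaque to the accelerator)
    time rsync -a /data/ user@remote.example.com:/data/

    # test: rsync over its native TCP protocol (unencrypted, visible to the accelerator)
    time rsync -a /data/ rsync://remote.example.com/backups/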


Definitely see the answers at: Why is my rsync so slow?

I've relied on rsync-based solutions for replication and backup for many years. I've never needed to use any form of WAN acceleration (using an appliance).

Over the years, my rsync approaches have evolved: from basic compression, to using "-e ssh -c arcfour" as a lighter-weight cipher, to using HPN-SSH to control TCP windows and disable encryption over longer-distance links, and most recently, wrapping rsync with UDR/UDT to get UDP-based rsync transfers. I should add that UDT is the core of some WAN-acceleration products on the market.
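
Roughly, those stages looked something like the commands below. Hostnames and paths are placeholders, arcfour has since been deprecated/removed in newer OpenSSH releases, and the NONE-cipher options assume HPN-SSH is installed on both ends:

    # basic compression
    rsync -az /data/ user@remote:/data/

    # lighter-weight cipher
    rsync -a -e "ssh -c arcfour" /data/ user@remote:/data/

    # HPN-SSH: bigger TCP buffers, NONE cipher on trusted links
    rsync -a -e "ssh -oNoneEnabled=yes -oNoneSwitch=yes" /data/ user@remote:/data/

    # UDR/UDT wrapper: UDP-based transfer
    udr rsync -a /data/ user@remote:/data/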

These are pretty esoteric solutions, but I'd really start by understanding what you're doing today. Let's see your current rsync command strings and whether they can be optimized.

  • What are you backing up?
  • How far is the target from the source?
  • What are your bandwidth capabilities/limitations?

Edit:

You're talking about binary data being transferred over a long distance with a bandwidth capability of 400Mbps. And this appears to be multiple streams of bidirectional traffic triggered at random times of the day.

If bandwidth saturation is your concern, couldn't you simply rate-limit your rsync transfers with:

--bwlimit=KBPS          limit I/O bandwidth; KBytes per second
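
For example, to cap a single transfer at roughly 5 MB/s (the value and paths are arbitrary):

    rsync -a --bwlimit=5000 /data/ user@remote:/data/    # limit this transfer to ~5000 KB/s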

Or if your network devices are capable, this could become a traffic-shaping or quality-of-service exercise. But in the end, it seems like it's a people/policy issue.
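
As one host-based alternative to device QoS (assuming, purely for illustration, a Linux box in the path; the interface name and rate are made up), a simple token-bucket shaper would look like:

    # Linux tc: cap outbound traffic on eth0 at 100 Mbit/s
    tc qdisc add dev eth0 root tbf rate 100mbit burst 256kbit latency 400ms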

Edit:

My suggestion of UDR sends rsync unencrypted by default over UDP ports 9000-9100. It doesn't require much of a change to the command line compared with running rsync in daemon mode. That is possibly within the realm of something that can be addressed by your Riverbed units. It wasn't clear from your initial question what the scope was, how many users are involved, or whether you even had the WAN accelerators in place.
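
If you do go the UDR route, the receiving side's firewall needs to allow that UDP range. Assuming an iptables-based firewall (purely illustrative):

    # permit UDR's default UDP port range on the receiving server
    iptables -A INPUT -p udp --dport 9000:9100 -j ACCEPT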