How to duplicate TCP traffic to one or multiple remote servers for benchmarking purposes?

Infrastructure: Servers in Datacenter, OS - Debian Squeeze, Webserver - Apache 2.2.16

Situation:

The live server is in use by our customers every day, which makes it impossible to test adjustments and improvements on it. Therefore we would like to duplicate the inbound HTTP traffic on the live server to one or multiple remote servers in real time. The traffic has to be passed to the local webserver (in this case Apache) AND to the remote server(s). That way we can adjust configurations and use different/updated code on the remote server(s) for benchmarking and comparison with the current live server. Currently the webserver is listening on approx. 60 additional ports besides 80 and 443, because of the client structure.

Question: How can this duplication to one or multiple remote servers be implemented?

We have already tried:

agnoster duplicator - this would require one open session per port which is not applicable. (https://github.com/agnoster/duplicator)
kklis proxy - only forwards traffic to the remote server, but does not pass it to the local webserver. (https://github.com/kklis/proxy)
iptables - DNAT does only forward the traffic, but does not pass it to the local webserver
iptables - TEE does only duplicate to servers in the local network -> the servers are not located in the same network due to the structure of the datacenter
suggested alternatives provided for the question "duplicate tcp traffic with a proxy" at stackoverflow (https://stackoverflow.com/questions/7247668/duplicate-tcp-traffic-with-a-proxy) were unsuccessful. As mentioned, TEE does not work with remote servers outside the local network. teeproxy is no longer available (https://github.com/chrislusf/tee-proxy) and we could not find it somewhere else.
We have added a second IP address (which is in the same network) and assigned it to eth0:0 (primary IP address is assigned to eth0). No success with combining this new IP or virtual interface eth0:0 with iptables TEE function or routes.
suggested alternatives provided for the question "duplicate incoming tcp traffic on debian squeeze" (Duplicate incoming TCP traffic on Debian Squeeze) were unsuccessful. The cat|nc sessions (cat /tmp/prodpipe | nc 127.0.0.1 12345 and cat /tmp/testpipe | nc 127.0.0.1 23456) are interrupted after every request/connect by a client without any notice or log. Keepalive did not change this situation. TCP packets were not transported to the remote system.
Additional tries with different options of socat (HowTo: http://www.cyberciti.biz/faq/linux-unix-tcp-port-forwarding/ , https://stackoverflow.com/questions/9024227/duplicate-input-unix-stream-to-multiple-tcp-clients-using-socat) and similar tools were unsuccessful, because the provided TEE function writes to the file system only.
Of course, googling and searching for this "problem" or setup was unsuccessful as well.

We are running out of options here.

Is there a method to disable the enforcement of "server in local network" of the TEE function when using IPTABLES?

Can our goal be achieved by different usage of IPTABLES or Routes?

Do you know a different tool for this purpose which has been tested and works for these specific circumstances?

Is there a different source for tee-proxy (which would fit our requirements perfectly, AFAIK)?

Thanks in advance for your replies.

----------

edit: 05.02.2014

Here is the Python script that works the way we need it to:

import socket
import sys, thread, time

def main(config, errorlog):
    sys.stderr = file(errorlog, 'a')

    for settings in parse(config):
        thread.start_new_thread(server, settings)

    while True:
        time.sleep(60)

def parse(configline):
    settings = list()
    for line in file(configline):
        parts = line.split()
        settings.append((int(parts[0]), int(parts[1]), parts[2], int(parts[3])))
    return settings

def server(*settings):
    try:
        dock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

        dock_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        dock_socket.bind(('', settings[0]))

        dock_socket.listen(5)

        while True:
            client_socket = dock_socket.accept()[0]

            client_data = client_socket.recv(1024)
            sys.stderr.write("[OK] Data received:\n %s \n" % client_data)

            print "Forward data to local port: %s" % (settings[1])
            local_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            local_socket.connect(('', settings[1]))
            local_socket.sendall(client_data)

            print "Get response from local socket"
            client_response = local_socket.recv(1024)
            local_socket.close()

            print "Send response to client"
            client_socket.sendall(client_response)
            print "Close client socket"
            client_socket.close()

            print "Forward data to remote server: %s:%s" % (settings[2],settings[3])
            remote_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            remote_socket.connect((settings[2], settings[3]))
            remote_socket.sendall(client_data)       

            print "Close remote sockets"
            remote_socket.close()
    except:
        print "[ERROR]: ",
        print sys.exc_info()
        raise

if __name__ == '__main__':
    main('multiforwarder.config', 'error.log')

Notes on using this script:
This script forwards a number of configured local ports to another local port and to a remote socket server.

Configuration:
Add to the config file port-forward.config lines with contents as follows:

Error messages are stored in file 'error.log'.

The script splits the parameters of the config file:
Split each config-line with spaces
0: local port to listen to
1: local port to forward to
2: remote IP address of destination server
3: remote port of destination server
and return settings
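As an illustration of the four-field line format described above, here is a minimal Python 3 sketch of a stricter parser. It is not the script's own parse function: the names Forward and parse_config are hypothetical, skipping blank lines and "#" comments is an added assumption, and the sample line in the usage note uses a documentation-only address.

```python
from typing import NamedTuple


class Forward(NamedTuple):
    """One forwarding rule (hypothetical name, fields as listed above)."""
    listen_port: int   # 0: local port to listen to
    local_port: int    # 1: local port to forward to
    remote_host: str   # 2: remote IP address of destination server
    remote_port: int   # 3: remote port of destination server


def parse_config(lines):
    """Parse whitespace-separated config lines into Forward rules.

    Assumption (not in the original script): blank lines and anything
    after a '#' are ignored.
    """
    rules = []
    for number, line in enumerate(lines, start=1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 4:
            raise ValueError(
                "line %d: expected 4 fields, got %d" % (number, len(parts))
            )
        rules.append(
            Forward(int(parts[0]), int(parts[1]), parts[2], int(parts[3]))
        )
    return rules
```

For example, parse_config(["8080 80 192.0.2.10 80"]) would yield one rule that listens on port 8080, forwards to local port 80 and copies to 192.0.2.10:80. Rejecting malformed lines early avoids an IndexError inside a worker thread later on.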

It is impossible. TCP is a stateful protocol. The end user's computer is involved in every step of the connection, and it will never answer two separate servers trying to communicate with it. All you can do is collect all HTTP requests on the webserver or some proxy and replay them. But that will not give the exact concurrency or traffic conditions of a live server.

From what you describe, GOR seems to fit your needs. https://github.com/buger/gor/ "HTTP traffic replay in real-time. Replay traffic from production to staging and dev environnements." ?

Teeproxy could be used to replicate traffic. The usage is really simple:

./teeproxy -l :80 -a localhost:9000 -b localhost:9001

a: production server
b: testing server

If you put HAProxy (with roundrobin) in front of your webserver, you can easily redirect 50% of your traffic to the testing site:

         /------------------> production
HAproxy /                 ^
        \                /
         \---- teeproxy -.....> test (responses ignored)

TCP, being a stateful protocol, isn't amenable to simply blasting copies of the packets at another host, as @KazimierasAliulis points out.

Picking up the packets at the layer of TCP termination and relaying them as a new TCP stream is reasonable. The duplicator tool you linked to looks like your best bet. It operates as a TCP proxy, allowing the TCP state machine to operate properly. The responses from your test machines will just be discarded. That sounds like it fits the bill for what you want exactly.
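To make the idea of terminating TCP and relaying the bytes as new streams more concrete, here is a rough Python 3 sketch. It is not the code of the duplicator tool or of teeproxy; the function names (_drain, _relay, handle, serve) are made up for illustration, and it assumes one thread per connection is acceptable for the load in question.

```python
import socket
import threading


def _drain(sock):
    """Read and throw away everything the test server sends back."""
    try:
        while sock.recv(4096):
            pass
    except OSError:
        pass
    finally:
        sock.close()


def _relay(src, dst, mirror=None):
    """Copy src -> dst until EOF; also copy each chunk to mirror if given."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
            if mirror is not None:
                try:
                    mirror.sendall(chunk)
                except OSError:
                    mirror = None  # test server went away; keep serving production
    except OSError:
        pass
    finally:
        for s in (dst, mirror):
            if s is not None:
                try:
                    s.shutdown(socket.SHUT_WR)  # pass the EOF on
                except OSError:
                    pass


def handle(client, primary_addr, mirror_addr):
    """Serve one client: production answers it, the test server only listens."""
    try:
        primary = socket.create_connection(primary_addr)
    except OSError:
        client.close()
        return
    try:
        mirror = socket.create_connection(mirror_addr, timeout=2)
        mirror.settimeout(None)
        threading.Thread(target=_drain, args=(mirror,), daemon=True).start()
    except OSError:
        mirror = None  # an unreachable test server must not break production
    up = threading.Thread(target=_relay, args=(client, primary, mirror), daemon=True)
    up.start()
    _relay(primary, client)  # only the production response reaches the client
    up.join()
    client.close()
    primary.close()


def serve(listen_addr, primary_addr, mirror_addr):
    """Accept connections forever and hand each one to its own thread."""
    with socket.create_server(listen_addr) as listener:
        while True:
            client, _ = listener.accept()
            threading.Thread(
                target=handle, args=(client, primary_addr, mirror_addr), daemon=True
            ).start()
```

Something like serve(("", 8080), ("127.0.0.1", 80), ("192.0.2.10", 80)) would listen on port 8080, pass requests to the local webserver and copy them to the test host (placeholder addresses). Each connection to the test server is a fresh TCP stream of its own, which is why the statefulness of TCP is not a problem here, but also why timing on the test side will not exactly match production.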

It's unclear to me why you've written off the duplicator tool as unacceptable. You will have to run multiple instances of the tool since it only listens on a single port but, presumably, you want to relay each of those different listening ports to different ports on the back-end system. If not, you could use iptables DNAT to direct all the listening ports to a single listening copy of the duplicator tool.

Unless the applications you're testing are dirt simple I expect that you're going to have problems with this testing methodology relating to timing and internal application state. What you want to do sounds deceptively simple-- I expect you're going to find a lot of edge cases.

I'm trying to do something similar, however, if you are simply trying to simulate the load on a server I would look at something like a load-testing framework. I've used locust.io in the past and it worked really well for simulating a load on a server. That should allow you to simulate a large number of clients and let you play with the configuration of the server without having to go through the painful process of forwarding traffic to another server.
