Monitoring HTTP traffic using tcpdump

To monitor HTTP traffic between a server and a web server, I'm currently using tcpdump. This works fine, but I'd like to get rid of some superfluous data in the output (I know about tcpflow and wireshark, but they're not readily available in my environment).

From the tcpdump man page:

To print all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain data, not, for example, SYN and FIN packets and ACK-only packets.

tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

This command

sudo tcpdump -A 'src example.com and tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

provides the following output:

19:44:03.529413 IP 192.0.32.10.http > 10.0.1.6.52369: Flags [P.], seq 918827135:918827862, ack 351213824, win 4316, options [nop,nop,TS val 4093273405 ecr 869959372], length 727

E.....@....... ....P..6.0.........D...... __..e=3...__HTTP/1.1 200 OK Server: Apache/2.2.3 (Red Hat) Content-Type: text/html; charset=UTF-8 Date: Sat, 14 Nov 2009 18:35:22 GMT Age: 7149
Content-Length: 438

<HTML> <HEAD> <TITLE>Example Web Page</TITLE> </HEAD> <body>
<p>You have reached this web page ...</p> </BODY> </HTML>

This is nearly perfect, except for the highlighted part. What is this, end -- more importantly -- how do I get rid of it? Maybe it's just a little tweak to the expression at the end of the command?


Solution 1:

tcpdump prints complete packets. "Garbage" you see are actually TCP package headers.

you can certainly massage the output with i.e. a perl script, but why not use tshark, the textual version of wireshark instead?

tshark 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

it takes the same arguments as tcpdump (same library) but since its an analyzer it can do deep packet inspection so you can refine your filters even more, i.e.

tshark 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -R'http.request.method == "GET" || http.request.method == "HEAD"'

Solution 2:

take a look at ngrep - it mighe be of some use for you.

as reference for others httpry [ server seems to be down now but i hope it's temporary ] and tshark are also useful for passive protocol analysis - first one just for http, second - for much more.

Solution 3:

Try httpry or justniffer

Justniffer works well on tcp packets reordering retrasmissions and ip fragmentation