slow response from squid proxy - optimization

Solution 1:

My first step would be to capture the traffic with tcpdump and load the capture into Wireshark to see where the delay is happening.

tcpdump -i any -s 0 -w /tmp/squid.pcap

(If you are running the capture over ssh, add "not port ssh" to the end of the command so that your own session does not end up in the capture.)

Once you load the capture file into Wireshark you should be able to see where the delay appears to be. I'd recommend capturing during a quiet time so there isn't too much traffic obscuring your view. If you can be the only person accessing the proxy at the time, even better.

Likely delays are:

  • Browser contacting proxy
  • Proxy contacting webserver
  • Proxy DNS requests
  • Proxy returning response to browser
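Before digging through a capture, a rough first split of these delays can be had from a command-line client. The sketch below assumes curl is installed (it is not mentioned above); proxy_timing is a made-up helper name, and both URLs in the usage line are placeholders.

```shell
# Hypothetical helper: time a single request made through the proxy.
#   $1 = proxy URL (placeholder), $2 = page URL (placeholder)
# -x sends the request via the proxy; -w prints curl's own timers.
proxy_timing() {
    curl -s -o /dev/null -x "$1" \
        -w 'lookup=%{time_namelookup}s connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n' \
        "$2"
}

# Usage (not run here):
#   proxy_timing http://proxy.example:3128 http://example.com/
```

When a proxy is used, lookup and connect refer to reaching the proxy itself. The gap between connect and first_byte is time spent waiting on the proxy (its DNS lookup and its fetch from the webserver), and the gap between first_byte and total is the response being returned. It is only a coarse guide; the packet capture remains the way to see which of the proxy-side steps is slow.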

Solution 2:

For some web pages, it is not possible to draw the page before nearly the entire page is downloaded, images and all. To speed up such a page, there are a few things you can do:

  • Use web cache (as you already are): this brings in images quicker.
  • Use faster machines: much of the time may be in "composition" - that is, laying out the web page for display, and not in getting the information.
  • Use faster browsers: this is the same as the above. If using IE5, try IE6 or IE7. If using Firefox 2, try Firefox 3 or Safari.

In days gone by, I used to browse with Internet Explorer for Macintosh (68k in those days). I well remember seeing the "newspaper" icon that told you to wait as IE was computing how to display the page (not getting data: computing...)

Another thing to be aware of: some pages explicitly request that they not be cached, and it is up to the cache administrator whether these requests are honoured. Typically these are pages that change often, or that the web admin does not want stored elsewhere. Such a page carries additional overhead: the web cache must still process the request on your behalf, even though the page is never served from the cache.
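To check whether a slow page is one of these, you can look at its response headers. This is a minimal sketch: cache_hint is a made-up helper name, and it only looks for the common opt-out markers (Cache-Control: no-cache, no-store or private, and Pragma: no-cache). It does not reproduce squid's actual caching decision, which also depends on the proxy configuration.

```shell
# Hypothetical helper: read HTTP response headers on stdin and report
# whether the page explicitly asks not to be cached.
cache_hint() {
    if grep -iEq '^(cache-control:.*(no-cache|no-store|private)|pragma:.*no-cache)'; then
        echo "asks-not-to-be-cached"
    else
        echo "no-explicit-opt-out"
    fi
}

# Usage (not run here; assumes curl, the URL is a placeholder):
#   curl -sI http://example.com/ | cache_hint
```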

I would agree that sniffing traffic is a good way to determine why things are being delayed. What part of the network stream is actually causing the delay?

Wireshark (and tcpdump) have a large set of filters that you can use to clean up the traffic, so the only real reason to wait for a quiet time is to avoid ending up with a massive capture file. However, you can get a reasonable set of data just by limiting yourself to direct-to-proxy network traffic:

tcpdump -s 0 -n -w tcpdump.dat port 3128

(Port 3128 is the standard squid port: use whatever is appropriate for you.)

In Wireshark you can filter on a single TCP stream (for example with Follow TCP Stream), so you don't have to worry about different streams being mixed together either.

Also look at the logs in /var/log/squid and examine what is happening to the request: is it coming from the cache? Is it coming from the remote site? Try repeated requests: does the page come up quicker after the first request?
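A quick way to answer the hit-or-miss question is to summarise the access log by result code. The sketch below assumes squid's default native log format, where the second field is the elapsed time in milliseconds and the fourth is the result code and HTTP status (e.g. TCP_MISS/200). squid_log_summary is a made-up helper name, and the file name access.log is the usual default rather than something confirmed above.

```shell
# Hypothetical helper: per result code, print the number of requests
# and the average elapsed time in milliseconds.
# Reads the log files given as arguments, or stdin if none are given.
squid_log_summary() {
    awk '{
        split($4, result, "/")        # "TCP_MISS/200" -> "TCP_MISS"
        count[result[1]]++
        elapsed[result[1]] += $2      # field 2: elapsed time in ms
    }
    END {
        for (code in count)
            printf "%s %d %.0f\n", code, count[code], elapsed[code] / count[code]
    }' "$@" | sort
}

# Usage (not run here):
#   squid_log_summary /var/log/squid/access.log
```

If the slow page shows up mostly under TCP_MISS with a long average, the time is going into the fetch from the remote site; if hits are slow too, look at the proxy machine or the path between browser and proxy.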