Troubleshooting Website problems within the local network

I have an external website which opens fine on some PCs, yet on others it hangs as if it were timing out, without ever actually timing out.

It seems to affect only some of our newer HP Pro 3305 MT workstations, all of which are running Win7 32-bit SP1 with all updates. Older PCs (Win7 32-bit SP1 & WinXP) are unaffected.

Using Google Chrome & Firefox makes no difference. Opening the website in IE9 Compatibility Mode has exactly the same symptoms.

All PCs are on the same local network (workgroup) and the same subnet, using the same in-house DNS server & gateway on the same internet connection. There is no proxy server, no content filtering, no load balancing, etc. The only group policy in effect (locally) is for update scheduling. Local firewalls are all the same (Kaspersky WP4), and our external-facing firewall has no IP-specific settings.

I have no control over the external website, and traceroute shows the same destination on all PCs. It is a fairly popular website in our industry (horticulture), and I'm not aware of anyone else (even other sites within our sister companies) with the same problem.

Update: I used Fiddler2 to monitor the HTTP request, and it seems it's not getting fulfilled for some reason.

Request sent:

GET http://www.rhs.org.uk/ HTTP/1.1
Host: www.rhs.org.uk
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

Log from Fiddler 2 of the request:

This session is not yet complete. Press F5 to refresh when session is complete for updated statistics.

Request Count:   1
Bytes Sent:      567        (headers:567; body:0)
Bytes Received:  0      (headers:0; body:0)

ACTUAL PERFORMANCE
--------------
ClientConnected:    17:02:33.720
ClientBeginRequest: 17:02:39.118
GotRequestHeaders:  17:02:39.118
ClientDoneRequest:  17:02:39.118
Determine Gateway:  0ms
DNS Lookup:         0ms
TCP/IP Connect: 46ms
HTTPS Handshake:    0ms
ServerConnected:    17:02:39.165
FiddlerBeginRequest:    17:02:39.165
ServerGotRequest:   17:02:39.165
ServerBeginResponse:    00:00:00.000
GotResponseHeaders: 00:00:00.000
ServerDoneResponse: 00:00:00.000
ClientBeginResponse:    00:00:00.000
ClientDoneResponse: 00:00:00.000


RESPONSE BYTES (by Content-Type)
--------------
~headers~:  0

Log of a successful request from a working PC (done this morning, excuse the timestamps being different from above):

Request Count:   1
Bytes Sent:      493        (headers:493; body:0)
Bytes Received:  20,413     (headers:525; body:19,888)

ACTUAL PERFORMANCE
--------------
ClientConnected:    08:22:47.766
ClientBeginRequest: 08:22:47.766
GotRequestHeaders:  08:22:47.766
ClientDoneRequest:  08:22:47.766
Determine Gateway:  0ms
DNS Lookup:         26ms
TCP/IP Connect: 30ms
HTTPS Handshake:    0ms
ServerConnected:    08:22:47.828
FiddlerBeginRequest:    08:22:47.828
ServerGotRequest:   08:22:47.828
ServerBeginResponse:    08:22:48.905
GotResponseHeaders: 08:22:48.905
ServerDoneResponse: 08:22:48.905
ClientBeginResponse:    08:22:48.905
ClientDoneResponse: 08:22:48.905

    Overall Elapsed:    00:00:01.1388020

RESPONSE BYTES (by Content-Type)
--------------
text/html:  19,888
~headers~:  525

So my question has evolved into:

What is the difference between the two requests, and how do I determine why one PC is not getting a reply to its GET request?

Update 2:

See my answer below. I may well accept it in the future, but without being able to reproduce the problem (or the fix) I'd like to leave this question open.


Solution 1:

If you want to know the difference in the HTTP GET request, download ZAP (Zed Attack Proxy) from OWASP, or some other intercepting proxy that lets you inspect each request before it is sent to the server. This will answer the question of "what is the difference between the two requests".
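Once you have the raw request text from both PCs, comparing the headers by hand gets tedious. Here is a minimal sketch of how the comparison could be automated. It assumes you have copied both requests out of your proxy as plain text; the function names `parse_headers` and `diff_requests` are made up for this example and are not part of ZAP or Fiddler.

```python
def parse_headers(raw_request):
    """Split a raw HTTP request into (request_line, {lowercased-name: value})."""
    lines = raw_request.strip().splitlines()
    request_line = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def diff_requests(raw_a, raw_b):
    """Return a list of human-readable differences between two requests."""
    line_a, headers_a = parse_headers(raw_a)
    line_b, headers_b = parse_headers(raw_b)
    differences = []
    if line_a != line_b:
        differences.append(f"request line: {line_a!r} != {line_b!r}")
    # Walk the union of header names so headers missing on one side show up.
    for name in sorted(set(headers_a) | set(headers_b)):
        value_a = headers_a.get(name)
        value_b = headers_b.get(name)
        if value_a != value_b:
            differences.append(f"{name}: {value_a!r} != {value_b!r}")
    return differences


if __name__ == "__main__":
    # The first request is trimmed from the question; the second is a
    # hypothetical stand-in for whatever the working PC actually sends.
    failing = "GET http://www.rhs.org.uk/ HTTP/1.1\nHost: www.rhs.org.uk\nConnection: keep-alive\n"
    working = "GET http://www.rhs.org.uk/ HTTP/1.1\nHost: www.rhs.org.uk\nConnection: close\n"
    for difference in diff_requests(failing, working):
        print(difference)
```

A header that exists on only one side is reported with `None` on the other, which is worth looking for here since the failing request is 567 bytes and the working one is 493.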

If the requests are the same, try another NIC.

Most likely your NIC is on-board. Try installing a PCI NIC with appropriate drivers and see if you can get there. It sounds like a hardware/driver issue at this point.

Solution 2:

I've never used Fiddler before, but "ServerGotRequest" being set while "ServerBeginResponse" is still unset in the failure scenario implies one of three things:

  1. The server hasn't received the full request from the workstation (i.e. the HTTP GET hasn't completed).
  2. The server received the request but didn't reply due to an error or other problem on the server.
  3. The server replied, but the reply packet didn't make it back.
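To double-check which timer is the first one left unset, you can run the pasted Fiddler statistics through a small script. This is only a sketch: it assumes that a value of `00:00:00.000` means "stage not reached yet", which is how the failing log above reads, and the names `parse_timers` and `first_unset_timer` are invented for the example.

```python
import re

# Assumption: Fiddler prints this placeholder for a stage that never happened.
UNSET = "00:00:00.000"

# Matches lines such as "ServerGotRequest:   17:02:39.165"; lines like
# "DNS Lookup: 0ms" or "Overall Elapsed: ..." deliberately do not match.
TIMER_RE = re.compile(r"^\s*(\w+):\s+(\d{2}:\d{2}:\d{2}\.\d{3})\s*$")


def parse_timers(stats_text):
    """Return (timer_name, timestamp) pairs in the order they appear."""
    timers = []
    for line in stats_text.splitlines():
        match = TIMER_RE.match(line)
        if match:
            timers.append((match.group(1), match.group(2)))
    return timers


def first_unset_timer(stats_text):
    """Return the name of the first timer still at the placeholder, or None."""
    for name, stamp in parse_timers(stats_text):
        if stamp == UNSET:
            return name
    return None


if __name__ == "__main__":
    failing_excerpt = """
    ServerConnected:    17:02:39.165
    ServerGotRequest:   17:02:39.165
    ServerBeginResponse:    00:00:00.000
    GotResponseHeaders: 00:00:00.000
    """
    print(first_unset_timer(failing_excerpt))
```

A session that genuinely started a stage at exactly midnight would be misread as unset, which is a limitation of the placeholder approach.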

I know this is a hosted server, but do you have access to look at the server logs, or the ability to run a sniffer on it (i.e. Wireshark) to capture data while you're testing? If so, watch the server log files for any errors, and run the sniffer until you get a failure scenario at the workstation, then look and see if the server received the full request and tried to respond.

After that, check the Kaspersky firewall logs to see if it dropped any packets. Is it possible to set up a sniffer in front of the firewall and see if the response from the server is making it back that far? If it makes it to the firewall, and Kaspersky doesn't note dropping anything, it's probably safe to assume it made it through.

During these tests, I'd suggest running Wireshark on one of the machines that fails. It will show the outbound connections, plus it should also show any responses the NIC receives. If it is a NIC issue, the sniffer trace should show whether the response packet is being received, and from there you can determine if that warrants a NIC and/or driver update.
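While the capture is running, it can help to take the browser out of the picture and send a bare GET straight from the failing machine. Below is a rough sketch using only Python's standard `socket` module. The function name `probe_http` and its return strings are my own invention, and it only tells you how far the exchange got, not why it stopped.

```python
import socket


def probe_http(host, port=80, path="/", timeout=10.0):
    """Send a bare HTTP GET and report how far the exchange got.

    Returns "connect_failed", "send_failed", "recv_failed", "no_response",
    "closed_without_data", or "response:<status line>".
    """
    try:
        # The timeout also applies to later send/recv calls on this socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect_failed"
    with sock:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        try:
            sock.sendall(request)
        except OSError:
            return "send_failed"
        try:
            data = sock.recv(4096)
        except socket.timeout:
            # TCP connected and the request was sent, but nothing came back.
            return "no_response"
        except OSError:
            return "recv_failed"
    if not data:
        return "closed_without_data"
    status_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return f"response:{status_line}"


if __name__ == "__main__":
    print(probe_http("www.rhs.org.uk"))
```

Run it on a failing PC and on a working one. If the failing PC reports `no_response` while Wireshark shows the request leaving the NIC, that matches what Fiddler is showing and points at the return path rather than the browser.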

Since you are unable to attach a sniffer to the outside of your firewall, you'll need to work with your ISP to have them set up monitoring for packets that leave your router but never receive a response.

Once the ISP has confirmed or refuted your hypothesis about where the packets are going, there are two possibilities. Option 1: the packet makes it to the firewall but does NOT go out to the ISP during a failed web connection attempt. Option 2: the packet makes it through the firewall onto the ISP network, but the response never comes.

For Option 1, it might be easiest to replace and/or reinstall the firewall, if possible. If it is an ISP-provided device, you'll want to have them save the current config but apply a very basic configuration on the new system to ensure it's not a configuration-related problem.

Option 2 would be nice because it puts the problem on the ISP to fix, but if they don't have the time to look into it then you're stuck with their answer. In this case, it could be that the packet leaves their network and goes out to their upstream Internet provider, which gets into a whole other can of worms trying to track down where a packet died.