High Instances of Zero Window Messages
During peak traffic I am seeing a high rate of zero window update messages (106 over ~13 seconds, or about 300,000 packets) sent from my web servers to my database servers.
Firmware is Updated:
I have updated the firmware and driver to the latest versions that Dell provides for the BCM5709C cards.
TCP Offload is enabled:
Since the Broadcom Advanced Control Suite (BACS) interface shows an active "Total Offload TCP Connections" count, I take it that TCP offloading is enabled. I also don't see the CPU pegging on the servers.
Window Scaling is enabled:
Window Scaling is enabled but not used much. I see 20 packets with Window Scaling set out of 300,000 packets.
Stats:
Average round-trip time is ~2 ms with a max of ~3 ms. CPU usage on the web servers is not peaking at all.
Questions:
- I don't believe that the buffers should be filling this much on the web servers.
- Are there other metrics besides CPU I should be looking at to see why the buffers are filling up?
- Given that everything is up to date, should I be looking into tuning the TCP parameters on my Windows Server 2008 R2 web servers? If so, what adjustments should I be making?
Solution 1:
The question is somewhat aged already. I am not sure whether it is still unresolved, but I will offer some troubleshooting advice nonetheless.
First of all, it is important to check where in the exchange the zero-window announcements occur. At certain points in the protocol exchange they may be perfectly valid: the web server may simply not expect any response data at that moment and may have set the receive buffer for that socket to 0, or the receive buffer may have filled up because the application has not fetched anything from it for a while. Debugging this requires knowledge of the protocol in use (better yet, of the implementations).
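To illustrate the second case (a receive buffer that fills up because the application is not reading from it), here is a minimal Python sketch over loopback. It is not the asker's setup: the function name `fill_until_stalled` and all of its parameters are made up for illustration. The receiver accepts the connection but never calls `recv()`, so the sender eventually cannot make progress. In a packet capture, that stall is typically where the receiver would be seen advertising a zero (or near-zero) window.

```python
import select
import socket


def fill_until_stalled(chunk_size=65536, stall_timeout=0.5,
                       max_bytes=256 * 1024 * 1024):
    """Send to a peer that never reads; return (bytes_sent, stalled).

    Hypothetical helper for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but never read from

    sender.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    stalled = False
    try:
        while sent < max_bytes:
            # Wait until the kernel will accept more data from the sender.
            _, writable, _ = select.select([], [sender], [], stall_timeout)
            if not writable:
                # No buffer space for stall_timeout seconds: the receive
                # buffer and the sender's own send buffer are both full.
                stalled = True
                break
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                continue  # lost the race for buffer space; wait again
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent, stalled


if __name__ == "__main__":
    sent, stalled = fill_until_stalled()
    print(f"sent {sent} bytes before stalling (stalled={stalled})")
```

The number of bytes sent before the stall depends on the operating system's buffer sizes and autotuning, so it will differ between machines.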
You should not need to tune any TCP parameters for a common LAN setup. TCP is largely self-tuning, except in extreme cases such as networks with variable latency or unpredictable packet loss.
Solution 2:
I've never run into this, but I have a hunch the problem is at the application layer. I would start by looking at the perfmon counters related to the web processes. The "Internet Information Services (IIS) 7.0 Resource Kit" and the "Internet Information Services (IIS) 7.0 Administrator's Pocket Consultant" both have information on performance monitoring and tuning; unfortunately, neither one is free.
http://www.microsoft.com/learning/en/us/book.aspx?ID=9550&locale=en-us
http://www.microsoft.com/learning/en/us/book.aspx?ID=10442&locale=en-us
EDIT:
One possible method of tracking this down (admittedly very crude) would be to temporarily stop the web services on the server, download a large file or a large number of small files to the web server, and see whether you get the same zero window condition. If you do, you can probably rule out resource issues with the web services as the cause. If you don't, you can focus all of your efforts on analyzing the resource usage of the web services to find the cause.
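If you want the download side of that test to read as fast as it can, so the client itself is not what fills the receive buffer, a small script can do it. This is only a sketch: `drain` is a made-up helper name, and it works on any object with a `read(n)` method. It reads to the end of the stream, discards the data, and reports how much arrived and how long it took.

```python
import io
import time


def drain(stream, chunk_size=64 * 1024):
    """Read a stream to EOF as fast as possible, discarding the data.

    Hypothetical helper: returns (total_bytes, elapsed_seconds).
    """
    total = 0
    start = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        total += len(chunk)
    elapsed = time.monotonic() - start
    return total, elapsed


if __name__ == "__main__":
    # Stand-in for a real download: one million zero bytes held in memory.
    total, elapsed = drain(io.BytesIO(b"\0" * 1_000_000))
    print(f"read {total} bytes in {elapsed:.3f} s")
```

For a real run you could pass it the response object from `urllib.request.urlopen()` pointed at a large file on another host on the same LAN (the URL is yours to supply), while capturing traffic on the web server to see whether the zero window messages still appear.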