How to improve Windows Server 2008 R2 to handle many connections?
I have been trying to solve this problem for a few days now. I run a website that averages 350,000 page views per day. Previously, ad management (tracking the clicks and impressions each ad has served) and content were both served from a single server with the following spec:
Server 1
- OS: Windows 2008 R2 64-Bit
- CPU: Intel® Core™ i5 - 4 cores
- RAM: 8 GB
- Storage: 2 x 1 TB hard drives
- Bandwidth: 10 TB per month
To improve our website speed, I decided to move the ads management script to another dedicated server, because we have 15 to 30 advertisers on each page.
Server 2
- OS: Windows 2008 R2 64-Bit
- CPU: Intel® Core™ i5 - 4 cores
- RAM: 4 GB
- Storage: 2 x 300 GB hard drives
- Bandwidth: 10 TB per month
The Problem
The problem is that Server 1 could handle both the content and the ads system. Now that I have taken the ads system away and put it on Server 2, Server 2 can barely serve the ads system alone.
Test
- First, I moved 75% of the ads to Server 2 and then pinged the server with ping -t xxxxx. [I ran the ping for 10 minutes and it followed a pattern similar to the one below.]
    Reply from xxxxx bytes=32 time=290ms TTL=116
    Reply from xxxxx bytes=32 time=289ms TTL=116
    Reply from xxxxx bytes=32 time=320ms TTL=116
    Reply from xxxxx bytes=32 time=286ms TTL=116
    Reply from xxxxx bytes=32 time=286ms TTL=116
    Reply from xxxxx bytes=32 time=348ms TTL=116
    Reply from xxxxx bytes=32 time=284ms TTL=116
- Then, I moved 100% of the ads to Server 2 and pinged the server again. [I ran the ping for 10 minutes and it followed a pattern similar to the one below.]
    Reply from xxxxx bytes=32 time=290ms TTL=116
    Request timed out
    Reply from xxxxx bytes=32 time=320ms TTL=116
    Reply from xxxxx bytes=32 time=286ms TTL=116
    Request timed out
    Request timed out
    Reply from xxxxx bytes=32 time=284ms TTL=116
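To put a number on the difference between the two runs, the ping output can be summarized with a short script. This is only a minimal sketch: the function name `summarize_ping` is made up for illustration, and the sample is just the seven lines quoted from the 100% test, not the full 10-minute run.

```python
import re

def summarize_ping(lines):
    """Return (sent, lost, loss_percent, avg_ms) for Windows ping output lines."""
    times = []
    lost = 0
    for line in lines:
        match = re.search(r"time[=<](\d+)ms", line)
        if match:
            times.append(int(match.group(1)))
        elif "Request timed out" in line:
            lost += 1
    sent = len(times) + lost
    loss_percent = 100.0 * lost / sent if sent else 0.0
    avg_ms = sum(times) / len(times) if times else None
    return sent, lost, loss_percent, avg_ms

# The seven lines quoted from the 100% test.
sample = [
    "Reply from xxxxx bytes=32 time=290ms TTL=116",
    "Request timed out",
    "Reply from xxxxx bytes=32 time=320ms TTL=116",
    "Reply from xxxxx bytes=32 time=286ms TTL=116",
    "Request timed out",
    "Request timed out",
    "Reply from xxxxx bytes=32 time=284ms TTL=116",
]

sent, lost, loss_percent, avg_ms = summarize_ping(sample)
print(f"sent={sent} lost={lost} loss={loss_percent:.1f}% avg={avg_ms:.0f}ms")
```

Feeding it the full output of `ping -t` saved to a file would give the loss rate over the whole run rather than over this short excerpt.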
Attempts
- Increase MaxUserPort and TcpNumConnection
- Restart the server
- Increase IIS Max Instances and Instance MaxRequests
Server Resource
- Only 10%-15% of the network connection is used
- Only 10%-15% of the CPU is used
- Only 25% of the memory is used
Solution 1:
Well, let's start. This is longer.
It looks like you have totally misjudged the facts here. Windows - even the outdated 2008 R2, which you should update ASAP - is completely capable of handling a volume that my mobile phone would have no problem handling.
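For a sense of scale, here is a back-of-the-envelope sketch of the average load. It assumes that every ad on every page view costs exactly one request to the ads server, which the question does not actually state, and it ignores peaks, which will be several times above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_requests_per_second(page_views_per_day, ads_per_page):
    """Average request rate, assuming one request per ad per page view."""
    return page_views_per_day * ads_per_page / SECONDS_PER_DAY

# Figures from the question: 350,000 page views per day, 15 to 30 ads per page.
low = average_requests_per_second(350_000, 15)
high = average_requests_per_second(350_000, 30)
print(f"{low:.0f} to {high:.0f} ad requests per second on average")
```

Under those assumptions the average comes out at roughly 61 to 122 requests per second.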
So, that leaves 3 possible areas of issues:
Installation. Your drivers may be crappy. Given you run an outdated operating system - how good are your drivers? Update them - this CAN cause all kinds of issues.
Network. This seriously looks like "My car is too slow, please help me make it faster" when the problem is that you spend most of the time in a traffic jam and complain about the traffic not moving. That is not a car tuning problem. 10 TB of traffic per month says nothing about network congestion. Watch the network traffic statistics on your NIC and then react accordingly - if they are not topped out at the speed they should be... your provider has oversold. Simple as that.
Code. It could be that you need more RAM (the computer is busy swapping out to disk instead of processing), or crappy coding is using all your CPU to a degree that makes the kernel-level TCP stack not react properly (yes, ICMP replies have that low a priority). This would be brutal - but it is another avenue to check. It could also be that you overload the discs by accessing them too often instead of caching in RAM, but I somehow fail to see that leading to lost pings. Any issue here is not something an admin can handle, though - you have to throw hardware at it, or take a stick and hit the programmer with it until he fixes it (if it is a "stupid" level mistake that eats the performance - if it is not, then it is a lot harder to make serious gains and it may just be that you need beefier hardware).
It definitely requires no tuning of Windows - a well configured Windows can deliver a LOT more than that. My file servers regularly - over longer periods of time - deliver 4-6 gigabit from a relatively stock setup.
Now, all the numbers you give say nothing. Seriously.
- 10%-15% CPU usage COULD mean swapping.
- 25% memory usage is likely a good indicator that no swapping happens, but it could still mean the CPU is waiting for IO.
- 10%-15% network usage means absolutely nothing, because it is only YOUR side of the network. What about upstream? What if the provider is putting 20 servers with 1 gigabit each on a 1 gigabit uplink from the rack and that is overflowing like hell?
The last point is quite likely - dropped packets are a good indicator of that. And this will not be visible to you.
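To make the oversubscription example concrete, here is the arithmetic as a small sketch. The 20 servers and 1 gigabit figures are the hypothetical ones from the bullet above, not anything known about your provider, and the function name is made up for illustration.

```python
def oversubscription(servers, port_mbit, uplink_mbit):
    """Return (oversubscription ratio, fair share per server in Mbit/s)."""
    ratio = servers * port_mbit / uplink_mbit
    fair_share_mbit = uplink_mbit / servers
    return ratio, fair_share_mbit

# Hypothetical rack: 20 servers with 1 gigabit ports behind a 1 gigabit uplink.
ratio, share = oversubscription(servers=20, port_mbit=1000, uplink_mbit=1000)
print(f"{ratio:.0f}:1 oversubscribed, {share:.0f} Mbit/s per server if all are busy")
```

In that scenario your own NIC could show 10% usage while the shared uplink is already full and dropping packets.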
My advice... turn off everything on the machine for a moment and run a speed test from an external host against a large static file. I would bet you run into congestion higher up.
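Any download tool that reports throughput will do for that test. As one possible sketch, the following measures it with nothing but the Python standard library, run from a machine outside the provider's network. The URL in the usage comment is a placeholder, and the chunk size and timeout are arbitrary choices.

```python
import time
import urllib.request

def throughput_mbit(total_bytes, seconds):
    """Convert a byte count and a duration into megabits per second."""
    return total_bytes * 8 / seconds / 1_000_000

def download_speed(url, chunk_size=64 * 1024):
    """Download url, discard the data, and return the measured Mbit/s."""
    start = time.perf_counter()
    total = 0
    with urllib.request.urlopen(url, timeout=30) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return throughput_mbit(total, time.perf_counter() - start)

# Usage (placeholder URL, point it at a large static file on the server):
#   print(f"{download_speed('http://xxxxx/bigfile.bin'):.1f} Mbit/s")
```

Run it a few times at different hours. If the result sits far below the port speed while the server itself is idle, the bottleneck is upstream of the machine.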
Anything you did so far - MaxUserPort, TcpNumConnection, restarting the server, playing around with IIS settings - is totally off and does nothing in the best case. Banging a hammer on a slow car never fixes anything - especially if the car is slow because it is standing in a traffic jam. I would undo all the changes and start analyzing the problem as a whole, not only your server. I would bet on network congestion at the moment.