2008 R2 Terminal Server: "Insufficient system resources exist to complete the requested service"

I'm working with an unhealthy Windows 2008 R2 Terminal Server configured in a vSphere environment. It currently has 4 vCPUs and 32GB RAM. No overcommitment.

The concurrent user count on this server has risen sharply in recent months (to ~70) and may now exceed the recommended level. Because of the applications used on this system, splitting the load across multiple servers would be a challenge beyond the scope of this question.

However, at certain points during the week (and now almost daily), new user logons fail with Event ID 1500:

Windows cannot log you on because your profile cannot be loaded. Check that you are connected to the network, and that your network is functioning correctly.

DETAIL - Insufficient system resources exist to complete the requested service.

The error persists until some users log off, sessions are manually disconnected, or the system is rebooted entirely.

I'd like to know:

  • What resource(s) is this error message referring to? What's actually constrained?
  • Is there an OS-level tunable or configuration that can help with this?
  • Users are content with performance, except for the increased frequency of this error message. Is there something else at play here?
  • Is there an absolute limit to the number of users a terminal server can accommodate? I see 150+ users described in certain tuning guides for Terminal Servers.



Solution 1:

This has been solved.

I began to examine the registry because increasing CPU and RAM resources on the virtual machine did not resolve the issue.

I was pointed to Microsoft's dureg tool to estimate the registry's size. Browsing via regedit, I encountered issues opening the keys under HKEY_USERS\.Default\PRINTERS. Using dureg, I started probing under that hierarchy.


Printers were the problem. The cause and fix are detailed in:
The size of the "HKEY_USERS\.DEFAULT" registry hive continuously increases on a Windows Server 2008 R2 SP1-based server

Hotfix: http://support.microsoft.com/kb/2871131

This apparently stops the growth, but the keys and registry need to be compressed to reclaim space.
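To keep an eye on hive growth over time, a rough check of the hive file sizes on disk can help. The following is only a minimal sketch: the function name `hive_sizes` and the 500 MB threshold are my own assumptions, and it assumes the hive files live in c:\windows\system32\config (the path used in the rename step of this answer) and that the script runs with enough rights to list that directory.

```python
import os
import tempfile


def hive_sizes(config_dir, threshold_mb=500):
    """List regular files in config_dir as (name, size_mb, over_threshold),
    largest first. threshold_mb is an arbitrary illustrative cut-off."""
    rows = []
    for name in os.listdir(config_dir):
        path = os.path.join(config_dir, name)
        if os.path.isfile(path):
            size_mb = os.path.getsize(path) / (1024 * 1024)
            rows.append((name, round(size_mb, 1), size_mb > threshold_mb))
    # Sort by size so a bloated hive shows up at the top of the report.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Example call on the server itself (path is an assumption):
# for name, size_mb, flagged in hive_sizes(r"c:\windows\system32\config"):
#     print(name, size_mb, "<-- large" if flagged else "")
```

Running this on a schedule and logging the output would show whether a hive is still growing after the hotfix.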

Compressing bloated registry: http://support.microsoft.com/kb/2498915

1) Boot from a WinPE disk.
2) Open regedit while booted in WinPE and load the bloated hive under HKLM (e.g. HKLM\Bloated).
3) Once the bloated hive has been loaded, export it as a "Registry Hive" file with a unique name.
4) Unload the bloated hive from regedit.
5) Rename the hives so that you will boot with the compressed hive, e.g.:
c:\windows\system32\config> ren software software.old
c:\windows\system32\config> ren compressedhive software

Hmm, a few steps... kinda tricky to do remotely during production hours. I tried to reach out to my resident Microsoft expert to complete, but he was busy chasing down some SCCM or SCVMM issue somewhere. Reading through some Citrix-related forums, I took note of a tool that could perform the above with fewer steps...

So I took a virtual machine snapshot, then downloaded and ran a freeware registry compression tool (Tweaking.com), despite the overwhelming sound of the collective groans of Microsoft systems engineers everywhere...

Note the 1.4GB saved in the default Config...

PLEASE REBOOT!

Following a reboot, all was well. The user count reached 86 with no ill effects and no profile-related errors. I've monitored the printer registry hive and it's held stable.

Solution 2:

In Windows Server 2003 that error was a result of kernel memory exhaustion. Because you're dealing with Windows Server 2008 R2, I'm not sure how closely the cause matches the one in W2K3, but I would bet that it is a memory issue due to the number of users and processes. I would look at Nonpaged Pool memory exhaustion as the probable cause. In addition, the number of processes is almost 800, which is quite high. MS would probably tell you to reduce the number of processes, which can only be done by reducing the user load.

This article has some good information regarding memory usage in Windows and how you can view the Nonpaged Pool limit to see if that's the cause of the problem:

https://blogs.technet.com/b/markrussinovich/archive/2009/03/26/3211216.aspx

Solution 3:

Start up Windows Performance Monitor to monitor the various counters:

  • Context Switches
  • Page Table Entries
  • GDI elements
  • Handles
  • … (whatever you can find)

And see if one of these peaks when you get a failed login.
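If you export the counter log, that comparison can be done roughly in code. This is only a sketch under my own assumptions: the function name, the `window` and `ratio` parameters, and the input layout (a dict of counter name to `(timestamp_seconds, value)` samples) are hypothetical and not a Performance Monitor format. It flags counters that sit near their observed maximum around every failed logon.

```python
def counters_peaking_at_failures(samples, failure_times, window=60, ratio=0.9):
    """Return the names of counters whose value within `window` seconds of
    every failure is at least `ratio` of that counter's observed maximum.

    samples: {counter_name: [(timestamp_seconds, value), ...]}
    failure_times: timestamps (seconds) of the failed logons
    """
    suspects = []
    for name, series in samples.items():
        if not series:
            continue
        peak = max(value for _, value in series)
        if peak <= 0:
            continue
        near = []
        for t_fail in failure_times:
            values = [v for t, v in series if abs(t - t_fail) <= window]
            if values:
                near.append(max(values))
        # Flag the counter only if it was close to its peak at every failure
        # for which we have nearby samples.
        if near and min(near) >= ratio * peak:
            suspects.append(name)
    return suspects
```

This is a crude heuristic: a counter that only ever grows will always be "at its peak" at the latest failure, so treat the result as a list of things to look at, not a diagnosis.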

Also: something is causing high kernel CPU% on your system - you should investigate that to see if it leads you to a related problem.


The User Profile Hive Cleanup service may help out here as it "helps to ensure user sessions are completely terminated when a user logs off".

Solution 4:

Well, from what I've read about RDS capacity planning in Server 2008 R2, you might just be running your poor terminal server on insufficient resources for the number of users you have using it. In particular, I notice that you have 80 users on 4 vCPUs, and MS recommends 1 core per 15 users.

From the technet blog titled RDS Sizing and Capacity Planning Guidance:

We always felt the need of Hardware capacity guidance and sizing information for Terminal Services or Remote Desktop services for Server 2008 R2, Whenever I am engaged in any architectural guidance discussion for RDS deployment i always get a question what needs to be taken into consideration while deciding the hardware configuration and to do capacity planning.

Here are some bullet points which I recommend to my partners and customers to consider:

  • 2GB Memory (RAM) is the optimum limit for each core of a CPU. E.g. If you have 4 GB RAM then for optimum performance there should be Dual core CPU.
  • 2 Dual Core CPU perform better then single Quad core processor.
  • Recommended bandwidth for LAN of 30 users and WAN of 20 users. Bandwidth (b) = 100 megabits per second (Mbps) with Latency (l) Less than 5 milliseconds.
  • On a Terminal Server 64 MB per user is the Ideal Memory (RAM) requirement for GP Only use + 2 GB for OS E.g. (100 users * 64) + 2000 = 8.4 GB i.e. 8GB RAM.
  • More applications used (i.e. Office, CAD Apps and etc.) will require more memory per user to be added to this calculation over the 64 MB base memory per user.
  • 15 TS session per CPU core is the optimum performance limit of a Terminal Server.
  • Network should not have more than 5 hops, and latency should be under 100ms.
  • 64 kbps is the Ideal Bandwidth per user session. (256 color, switched network, bitmap caching only)
  • CPU performance degrades if %processor time per core is constantly above 65%.
  • Terminal servers performance doubles when it is running on a X64 HW and OS.
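
To make the arithmetic in those bullets concrete, here is a small sketch that applies two of the quoted rules of thumb (64 MB per user plus 2 GB for the OS, and 15 sessions per core). The function name and parameter names are my own, and the figures are the blog's baseline guidance, not measurements from any particular server.

```python
import math


def rds_sizing(users, mb_per_user=64, os_mb=2000, sessions_per_core=15):
    """Apply the quoted rules of thumb and return (ram_mb, cores).

    ram_mb: users * mb_per_user + os_mb (baseline, before per-application memory)
    cores:  users / sessions_per_core, rounded up to a whole core
    """
    ram_mb = users * mb_per_user + os_mb
    cores = math.ceil(users / sessions_per_core)
    return ram_mb, cores


# The blog's own example: 100 users -> 8400 MB, i.e. about 8.4 GB.
print(rds_sizing(100))  # (8400, 7)
# The load discussed in this answer: 80 users -> 6 cores by the 15-per-core rule.
print(rds_sizing(80))   # (7120, 6)
```

By that rule, 80 users would call for 6 cores, compared with the 4 vCPUs on the server in question. The memory figure is only the baseline, since heavier applications add to the 64 MB per user.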

In addition to that, Microsoft has just released a whitepaper on Capacity Planning in Windows Server 2008 R2.

Download it here