When do Dell PowerEdge servers (R210II and R620) automatically shut down due to overheating?

I've had a hell of a time trying to find out when and how a Dell PowerEdge server (in my case we have a bunch of R210IIs and R620s with iDRACs) deals with overheating. I don't want to wait for the CPU to protect itself. Ideally the server itself would respond to sustained high temperatures by issuing an IPMI command to its own OS to power down before a critical threshold is reached, e.g. at 55C issue the IPMI command to the OS, and if the server reaches 80C, pull the plug.

The problem is that all of Dell's documentation is unclear on when or how a server shutdown from overheating occurs.

My question is whether Dell supports a graceful thermal shutdown like this, or whether there is only some fine print or unclear documentation about the critical temperature at which the server simply pulls its own plug. Is Dell OpenManage necessary to support this?

I really would like to avoid having to run a dedicated management server plugged into the various networks (trying to avoid bridging between networks through a single management point) to remotely manage shutdown like this. It would be a single point of failure which is also subject to the same hardcoded or inflexible thermal conditions as my servers themselves.

My R620s have iDRACs in them. I included them for the iDRAC's remote management features, but at this point I'm disappointed that the iDRAC seems incapable of handling this. Its thermal settings are limited to controlling fan speeds, and the horrible documentation and in-system help don't actually say when a shutdown could occur.

Any real world advice is greatly appreciated! Thank you.


The best I could find was from a thread on Spiceworks forums. The response is from a Dell representative:

There are a lot of ways to do this. You are correct that by default none of the options for a graceful shutdown are enabled, but the server will shut down if a critical threshold is met.

You can set alert actions within the iDRAC/CMC. You can set it to power off when a temperature warning or critical threshold is met. You can also set platform events or alert actions within OMSA. There is also a section in OMSA under shutdown for thermal. You can set it to perform an action there as well. Also, you can configure OMSA to execute a program if an event is triggered. You can use that feature to execute the shutdown program within Windows.

The Power Off option in the alert actions is a graceful shutdown. I recommend that you set it to shutdown on the warning threshold. If you configure it for the critical threshold it may attempt a graceful shutdown and then hit the critical limit and perform a hard shutdown before a graceful shutdown can be completed.

I also read an official Dell PDF on OpenManage that mentions thermal shutdown:

Dell OpenManage Server Administrator (OMSA) enables administrators to set temperature thresholds at which servers should perform an emergency thermal shutdown.

So the answer appears to be Yes, Dell servers do support graceful thermal shutdown and that temperature is configurable. You can use the OpenManage Server Administrator on each server to make these changes (I believe you can make these changes while the server is running). You should not need to install a centralized OpenManage management server, though it can simplify a lot of other management tasks.

:EDIT:
I should add that these answers are generic for Dell servers. I did not find anything specific to the server models you listed.


Thanks to Thomas for digging up the OpenManage doc reference. OMSA needs to be installed somewhere and then used, locally or remotely, to connect to the BMC, where it ultimately sets IPMI PEFs (platform event filters). I discovered that Dell makes a deployment toolkit that basically contains all the tools OMSA uses to accomplish this.

The Dell OpenManage Deployment Toolkit can be had here:

http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-r720?driverId=65JXF&osCode=RH60&fileId=3196318431&languageCode=EN&categoryId=SM

The Linux version (it seems to be 64-bit only; there used to be a 32-bit version but I can't find it) includes a bootable image for installing firmware, etc., and also gives you a console prompt with all the deployment tools accessible. Download it, burn it, insert it into a server and boot it. At the prompt you have access to the 'syscfg' command.

The documentation can be found here, but what you want is the reference guide!

http://www.dell.com/support/Manuals/us/en/19/Product/dell-opnmang-dplymnt-toolkit-v4.2

Using the syscfg command, you can set a PEF so that the BMC triggers an action whenever a regular IPMI alert would be issued. The current usage looks like this:

syscfg pcp --filter=tempfail --filteraction=powerdown

Now when IPMI would normally report a tempfail alert, the BMC will issue a power down event. The OS should be informed of the event via ACPI and try to gracefully power down. Barring that, the built-in thermal thresholds will do their thing.

If you're familiar with ipmitool, you can also check the new PEF you set with something like this (you can possibly set PEFs with it too, but I haven't tried):

ipmitool <options> pef list

If you grep for "Temperature" you'll see something like this (I can't copy and paste from the console):

11 | active | 0x11 | Temperature | Any | Critical | Threshold | (0x01/0x0204),<LC,<UC | Alert,Power-off | 1

Power-off is the newly added PEF action.
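
If you want to script that check instead of eyeballing the grep output, here is a minimal Python sketch. It assumes the rows look like the sample above, with the actions in the second-to-last pipe-separated column; that layout is taken from this one row only, not from any ipmitool documentation, and the function names are made up for illustration.

```python
from typing import List


def pef_row_actions(row: str) -> List[str]:
    """Return the action names from one `pef list` row.

    Assumption: actions sit in the second-to-last '|' column,
    as in the sample row quoted in this post.
    """
    fields = [f.strip() for f in row.split("|")]
    if len(fields) < 2:
        return []
    return [a.strip() for a in fields[-2].split(",") if a.strip()]


def temperature_power_off_rows(pef_list_output: str) -> List[str]:
    """Return the Temperature rows whose actions include Power-off."""
    rows = []
    for row in pef_list_output.splitlines():
        if "Temperature" in row and "Power-off" in pef_row_actions(row):
            rows.append(row.strip())
    return rows


# The sample row from this post, typed in by hand from the console.
PEF_SAMPLE = (
    "11 | active | 0x11 | Temperature | Any | Critical | Threshold | "
    "(0x01/0x0204),<LC,<UC | Alert,Power-off | 1"
)

if __name__ == "__main__":
    print(temperature_power_off_rows(PEF_SAMPLE))
```

An empty list would mean no temperature filter carries the Power-off action, so the syscfg step probably did not take effect.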

I haven't figured out the correct usage to set the temperature threshold using Dell's tools, but I have with ipmitool!

ipmitool <options> sensor list | grep Ambient

Ambient Temp | 24.000 | degrees C | ok | na | na | 3.000 | 8.000 | 42.000 | 47.000 | na
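
For watching the reading from a script (handy during a test), a rough Python sketch like the following pulls the current value out of that output. It only relies on the first columns (name, reading, unit, status) and deliberately does not try to label the threshold columns. The function name is hypothetical, and the sample string is just the row above.

```python
from typing import Optional


def sensor_reading(sensor_list_output: str, sensor_name: str) -> Optional[float]:
    """Return the current reading for sensor_name, or None if unavailable."""
    for row in sensor_list_output.splitlines():
        fields = [f.strip() for f in row.split("|")]
        # Need at least name, reading, unit and status columns.
        if len(fields) < 4 or fields[0] != sensor_name:
            continue
        try:
            return float(fields[1])
        except ValueError:
            return None  # e.g. "na" when the sensor has no reading
    return None


# The row shown above, typed in by hand from the console.
SENSOR_SAMPLE = (
    "Ambient Temp | 24.000 | degrees C | ok | na | na | "
    "3.000 | 8.000 | 42.000 | 47.000 | na"
)

if __name__ == "__main__":
    print(sensor_reading(SENSOR_SAMPLE, "Ambient Temp"))
```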

You can then set a new threshold using ipmitool's sensor thresh command. Here's an example where I change the upper critical threshold to 48C:

ipmitool <options> sensor thresh "Ambient Temp" ucr 48.000
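
If you have to apply the same threshold to a lot of servers, building the command in a script avoids typos. This is only a sketch: the function name is made up, the connection options are left as a parameter because they depend on your setup (they stand in for the <options> placeholder above), and nothing is executed here.

```python
from typing import List, Sequence

# Threshold names accepted by `ipmitool sensor thresh`.
VALID_THRESHOLDS = ("unr", "ucr", "unc", "lnc", "lcr", "lnr")


def thresh_command(
    sensor: str,
    threshold: str,
    value: float,
    conn_options: Sequence[str] = (),
) -> List[str]:
    """Build the argv list for an `ipmitool sensor thresh` call.

    conn_options is whatever you normally put in place of <options>;
    it is passed through untouched.
    """
    if threshold not in VALID_THRESHOLDS:
        raise ValueError("unknown threshold name: %r" % threshold)
    return [
        "ipmitool",
        *conn_options,
        "sensor",
        "thresh",
        sensor,
        threshold,
        "%.3f" % value,
    ]


if __name__ == "__main__":
    # Same change as the command above: upper critical to 48C.
    print(thresh_command("Ambient Temp", "ucr", 48))
```

The returned list can be handed to subprocess.run as-is once you are happy with it. Passing a list rather than a single string means the space in "Ambient Temp" needs no extra quoting.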

You can try issuing an upper critical temperature event manually, but it seems to only log the event and is not subject to the PEF filter action settings. (Issuing event 1 is easier than manually identifying the sensor, etc.)

ipmitool <options> event 1

What I did was set the shutdown temp to 25C and turn off my server room AC for 5 minutes with a colleague while we monitored everything. The target server shut down right at 25C.