HP ProLiant DL380 G3 running Windows 2000 Server has crashed between 6:00 and 7:30 AM for the past 5 days
I have an HP ProLiant DL380 G3 running Windows 2000 Server that has been crashing every day between 6:00 and 7:30 AM. This started when I replaced a failing hard drive 6 days ago. I have checked the scheduled tasks, and none of them seem related to this issue. Below are the only relevant entries I see in the system log, along with excerpts from some of the dump files. Can this be a hardware issue if it happens in the same time frame every day? Any help is greatly appreciated. Thanks
The previous system shutdown at 6:07:55 AM on 2/7/2012 was unexpected.
System Information Agent: Health: The server is operational again. The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again. [SNMP TRAP: 6025 in CPQHLTH.MIB]
BugCheck 7A, {3, c0000005, 3400028, 0}
Probably caused by : memory_corruption ( nt!MiMakeSystemAddressValidPfn+42 )
Followup: MachineOwner
0: kd> !analyze -v
Bugcheck Analysis
KERNEL_DATA_INPAGE_ERROR (7a)
The requested page of kernel data could not be read in. Typically caused by a bad block in the paging file or disk controller error. Also see KERNEL_STACK_INPAGE_ERROR. If the error status is 0xC000000E, 0xC000009C, 0xC000009D or 0xC0000185, it means the disk subsystem has experienced a failure. If the error status is 0xC000009A, then it means the request failed because a filesystem failed to make forward progress.
Arguments:
Arg1: 00000003, lock type that was held (value 1,2,3, or PTE address)
Arg2: c0000005, error status (normally i/o status code)
Arg3: 03400028, current process (virtual address for lock type 3, or PTE)
Arg4: 00000000, virtual address that could not be in-paged (or PTE contents if arg1 is a PTE address)
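To make the help text's rule of thumb concrete, here is a minimal Python sketch (not a WinDbg feature) that parses a `BugCheck` summary line like the one above and checks Arg2 against the status codes the text lists. The status names come from the standard NTSTATUS definitions; the function names are hypothetical, and the check only restates the help text rather than diagnosing anything.

```python
import re

# Error statuses that the KERNEL_DATA_INPAGE_ERROR (0x7A) help text ties to a
# disk subsystem failure, with their standard NTSTATUS names.
DISK_FAILURE_STATUSES = {
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
    0xC000009C: "STATUS_DEVICE_DATA_ERROR",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
    0xC0000185: "STATUS_IO_DEVICE_ERROR",
}
# Status the help text ties to a filesystem failing to make forward progress.
FS_NO_PROGRESS_STATUS = 0xC000009A  # STATUS_INSUFFICIENT_RESOURCES
# Other statuses we can name but that the help text does not single out.
OTHER_KNOWN_STATUSES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Split 'BugCheck 7A, {3, c0000005, 3400028, 0}' into (code, [args])."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args


def classify_inpage_status(status):
    """Apply the 0x7A help text's rule of thumb to Arg2 (the I/O status)."""
    if status in DISK_FAILURE_STATUSES:
        return "disk subsystem failure (%s)" % DISK_FAILURE_STATUSES[status]
    if status == FS_NO_PROGRESS_STATUS:
        return "filesystem failed to make forward progress (STATUS_INSUFFICIENT_RESOURCES)"
    name = OTHER_KNOWN_STATUSES.get(status, "0x%08X" % status)
    return "not listed by the 0x7A help text (%s)" % name


if __name__ == "__main__":
    code, args = parse_bugcheck("BugCheck 7A, {3, c0000005, 3400028, 0}")
    print(hex(code), classify_inpage_status(args[1]))
```

Run against the 0x7A summary line from the question, this reports that c0000005 (STATUS_ACCESS_VIOLATION) is not one of the codes the help text associates with a disk subsystem failure.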
MODULE_NAME: nt
IMAGE_NAME: memory_corruption
BugCheck A, {0, 2, 1, 804137d6}
Probably caused by : ntkrnlmp.exe ( nt!CcGetVirtualAddress+ba )
Bugcheck Analysis
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses. If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
    bit 0 : value 0 = read operation, 1 = write operation
    bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 804137d6, address which referenced memory
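Arg3 is a bitfield, so it may help to decode it explicitly. The sketch below, again in Python with hypothetical function names, applies the bit layout from the help text above and names the IRQL using the x86 convention (0 = PASSIVE_LEVEL, 1 = APC_LEVEL, 2 = DISPATCH_LEVEL).

```python
# x86 software IRQL names for the low levels seen in bugcheck arguments.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_access_flags(arg3):
    """Decode Arg3 of IRQL_NOT_LESS_OR_EQUAL (0xA) per the help text."""
    return {
        "write": bool(arg3 & 0x1),    # bit 0: 0 = read, 1 = write
        "execute": bool(arg3 & 0x8),  # bit 3: 0 = not execute, 1 = execute
    }


def describe_irql_fault(arg1, arg2, arg3, arg4):
    """Summarize the four 0xA arguments as a one-line description."""
    flags = decode_access_flags(arg3)
    if flags["execute"]:
        kind = "execute"
    elif flags["write"]:
        kind = "write"
    else:
        kind = "read"
    irql = IRQL_NAMES.get(arg2, "unknown")
    return "%s of address 0x%08X at IRQL %d (%s) from code at 0x%08X" % (
        kind, arg1, arg2, irql, arg4)


if __name__ == "__main__":
    # Arguments from the 0xA dump above.
    print(describe_irql_fault(0x00000000, 0x2, 0x1, 0x804137D6))
```

For the dump above (Arg1 = 0, Arg2 = 2, Arg3 = 1), this describes a write to address 0x00000000 at DISPATCH_LEVEL, that is, a write through a NULL pointer.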
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
The first thing to understand here is that Windows 2000 is no longer supported by Microsoft; extended support ended in July 2010. New security vulnerabilities are no longer patched, so Windows Update on this server is effectively meaningless. That alone means it's well past time to move away from this server anyway.
The second thing to consider is that, given the consistent timing, you should go over any scheduled tasks on the system that are active during this period. Also look for other environmental factors: really, anything that increases load on the server at a certain time of day, such as a backup job or an antivirus scan, could push hardware that is only beginning to fail over the edge.
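One way to confirm the timing pattern before hunting for the trigger is to collect the "previous system shutdown ... was unexpected" messages from the System log and check whether they all fall in the same window. This is a minimal Python sketch, assuming the messages have been copied out as plain text in the US date format shown in the question; the function names and the 6:00 to 7:30 AM default window are illustrative.

```python
import re
from datetime import datetime, time

# Matches the unexpected-shutdown message quoted in the question, e.g.
# "The previous system shutdown at 6:07:55 AM on 2/7/2012 was unexpected."
SHUTDOWN_RE = re.compile(
    r"previous system shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"on (\d{1,2}/\d{1,2}/\d{4})"
)


def crash_times(lines):
    """Yield a datetime for each unexpected-shutdown message in lines."""
    for line in lines:
        m = SHUTDOWN_RE.search(line)
        if m:
            # Assumes month/day/year order and an English AM/PM marker.
            yield datetime.strptime(
                m.group(2) + " " + m.group(1), "%m/%d/%Y %I:%M:%S %p")


def all_in_window(times, start=time(6, 0), end=time(7, 30)):
    """True if there is at least one crash and every one is within the window."""
    times = list(times)
    return bool(times) and all(start <= t.time() <= end for t in times)


if __name__ == "__main__":
    log = ["The previous system shutdown at 6:07:55 AM on 2/7/2012 was unexpected."]
    found = list(crash_times(log))
    print(found, all_in_window(found))
```

Once the crash times are confirmed to cluster, they can be lined up against whatever jobs are scheduled to start in that window.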
Which brings up the next thing: given the age of the equipment, it's very likely just something like an old RAM stick that's gone bad, especially since the debugger attributes one of your crashes to memory corruption. But again, that really indicates to me that this server has passed its end of life and should be replaced, because even if you find and correct this issue, you're likely to run into another one soon.
It's time to upgrade. See my comments in the following questions: "HP DL380 G3 2U For Basic Web Server in 2012" and "Best sysadmin WTF?"
Since you're working with 9-year-old hardware, there's the usual risk of failing components. This particular issue may be firmware-related, though. Please make sure you're running the most recent DL380 G3 firmware available for Windows 2000. Pay particular attention to the system BIOS version and the Smart Array controller (5i? 5300? 641? 6400?). See if that makes a difference for now.