How to diagnose computer lockups and freezes?

I built a desktop computer a couple years back with the following specs:

  • CPU: Intel Core 2 Quad Q9300 Yorkfield 2.5GHz 6 MB L2 Cache LGA 775 95W Quad-Core Processor BX80580Q9300
  • Motherboard: EVGA 122-CK-NF68-T1 LGA 775 NVIDIA nForce 680i SLI ATX Intel Motherboard
  • Video Card: Two EVGA 256-P2-N758-TR GeForce 8600GT SCC 256 MB 128-bit GDDR3 PCI Express x16 SLI Supported Video Card
  • PSU: SeaSonic S12 Energy Plus SS-550HT 550W ATX12V V2.3 / EPS12V V2.91 SLI Certified CrossFire Ready 80 PLUS Certified Active PFC Power Supply
  • Memory: Two G.SKILL 4 GB (2 x 2 GB) 240-Pin DDR2 SDRAM DDR2 800 (PC2 6400) Dual Channel Kit Desktop Memory Model F2-6400CL5D-4GBPQ

Since I built it, the machine has periodically locked up, with the frequency varying over the years from once a day to once a month. Typically, lockups happen once every few days.

By "lockup" I mean my computer just freezes. The screen stops updating and I can't move the mouse. Keys that normally toggle LEDs on the keyboard (such as Caps Lock) no longer turn those LEDs on or off. If music was playing at the time of the lockup, sound keeps coming out of the speakers, but it's just the current note repeating indefinitely. There is no BSOD.

When such a lockup occurs I have to do a hard reboot by either turning off the computer or hitting the reset button.

I have the most recent version of the NVIDIA hardware drivers, and update them semi-regularly, but that hasn't seemed to help. I am currently using Windows 7 x64, but was previously using Windows Server 2003 x64 and having the same lockup issues.

My guess is that it's somehow video driver or motherboard related, but I don't know how to go about diagnosing this problem to narrow down which of the two is the culprit.


Additional information re: cooling

I haven't installed any aftermarket cooling systems aside from two regular fans I scavenged from an older computer. The fan on top of the CPU is the stock one that shipped with it. One of the two scavenged fans is located at the bottom corner of the tower, in an attempt to create some airflow from front to back. The second fan is pointed directly at the two video cards.


SpeedFan installation and readings

Per studiohack's suggestion, I installed SpeedFan, which provided the following temperature readings:

  • GPU: 63C
  • GPU: 65C
  • System: 76C
  • CPU: 64C
  • AUX: 36C
  • Core 0: 78C
  • Core 1: 76C
  • Core 2: 79C
  • Core 3: 79C

Update #3: Another Lockup :-(

Well, I had another lockup last night. :-( SpeedFan reported the CPU temp at 38 C when it happened, and there was no spike in temperature leading up to the freeze.
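If the temperature readings are saved to a file while the machine runs, the "no spike before the freeze" check can be done over the whole run-up instead of by eye. The sketch below assumes the log has been converted to a CSV with hypothetical columns `time` (seconds) and `cpu` (degrees C); the real SpeedFan log layout may differ, so rename the columns to match. The 300-second window is an arbitrary choice, not a recommended value.

```python
import csv
import io
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since logging started, temperature in C)


def load_samples(text: str, column: str = "cpu") -> List[Sample]:
    """Parse CSV text with a 'time' column and a temperature column.

    The column names are hypothetical; rename them to match your own log.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["time"]), float(row[column])) for row in reader]


def max_rise_before_end(samples: List[Sample], window_s: float = 300.0) -> float:
    """Largest temperature rise in the last window_s seconds of a time-ordered log.

    Measured from the lowest reading to any later reading inside the window,
    so a flat or falling trace returns 0.0.
    """
    if not samples:
        return 0.0
    end_time = samples[-1][0]
    window = [temp for t, temp in samples if t >= end_time - window_s]
    lowest = window[0]
    rise = 0.0
    for temp in window:
        lowest = min(lowest, temp)
        rise = max(rise, temp - lowest)
    return rise


if __name__ == "__main__":
    # Made-up log for illustration; these are not readings from the question.
    log = "time,cpu\n0,37\n60,38\n120,38\n180,37\n240,38\n"
    print(max_rise_before_end(load_samples(log)))  # prints 1.0
```

A rise of a degree or so in the final minutes, as in the made-up log, would be consistent with the observation above that temperature did not climb before the freeze.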

One thing I notice is that the freeze seems more likely to happen if I am watching a video. In fact, of the last 5 freezes over the past month, 4 of them have been while watching a video on Flickr. Not necessarily the same video, but a video nevertheless. I don't know if this is just coincidence or if it means anything. (As an aside, each night before bedtime my 2-year-old daughter sits on my lap and watches some home videos on Flickr and, in the last month, has learned the phrase, "Uh oh, computer broke.")


Update #4: MemTest86 and 3DMark06 Test Results:

Per suggestions in the comments, I ran MemTest86 overnight, and it cycled through all 8 GB of memory 5 times without error. I also ran 3DMark06 without a problem (see my scores at http://3dmark.com/3dm06/15163549).

So... what now? :-)

Any further suggestions on what to check? Is there some way to get a stack trace or something when the computer locks like that?

Resolution

I never did figure out the exact problem, but based on the suggestions here and elsewhere, I'm presuming it was a motherboard issue. In any event, I recently upgraded my system, buying a new motherboard, PSU, CPU, and RAM, and the new rig has been working splendidly for the past several weeks. I am using the same graphics cards as in the old setup, so I think it's safe to conclude that they weren't the cause of the problem.


Solution 1:

Judging by the temperature and cooling details you posted, your computer is overheating, and that's the first thing to rectify. 64 C at idle is not acceptable, and isn't really what you want even under full load. I'm a little paranoid and freak out whenever my CPU gets over 35 C, but really 50 C should be your max under load.

Invest in a good cooling solution for your system. A pretty decent cooler will only set you back 20 to 30 dollars. If you want some help on what to look for, take a look at this Tom's Hardware review of sub-$40 cooling solutions.

Also you might want to enable your Blue Screen of Death (as terrible as that sounds) so that you can debug the problematic lockups. This is done by:

--> Right-click "Computer" in the Start menu

--> Select "Properties"

--> Select "Advanced System Settings"

--> Select the "Advanced" Tab

--> Under "Startup and Recovery", click "Settings"

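--> Under "System failure", make sure that "Write an event to the system log" is enabled and that "Write debugging information" is set to a dump type such as "Small memory dump". Unchecking "Automatically restart" keeps the blue screen visible instead of rebooting straight away.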

Some cleanup utilities (Advanced System Care, for example) automatically stop BSODs from being recorded, so you might want to look into preventing that. Once you've checked this, I suggest using NirSoft's BlueScreenView to view the crash details and debug related issues.
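Small memory dumps are written to %SystemRoot%\Minidump by default, which is also the folder BlueScreenView reads. As a quick way to confirm that dumps are actually being kept after a crash (and not cleaned away), a minimal Python sketch like the one below lists the .dmp files newest first. It assumes Python is installed on the machine; the folder is a parameter in case your dump directory was changed.

```python
import os
from pathlib import Path
from typing import List, Optional


def list_minidumps(directory: Optional[str] = None) -> List[Path]:
    """Return .dmp files in `directory`, newest first.

    Defaults to %SystemRoot%\\Minidump, the usual small-dump location on
    Windows. Returns an empty list if the folder does not exist.
    """
    if directory is None:
        root = os.environ.get("SystemRoot", r"C:\Windows")
        directory = os.path.join(root, "Minidump")
    folder = Path(directory)
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".dmp"]
    # Sort by modification time so the most recent crash comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = list_minidumps()
    if not found:
        print("No minidumps found; check the Startup and Recovery settings.")
    for dump in found:
        print(dump.name, dump.stat().st_size, "bytes")
```

An empty result after a blue screen suggests dumps are disabled or being deleted, which is worth fixing before relying on BlueScreenView.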

Finally, I would check and recheck your PC and ALL of your connections. I actually had a similar situation and found out that one of the internal motherboard USB cables was incorrectly connected, thus causing issues.

Update

I have put together some questions for general troubleshooting and diagnosis of crashes or freezes. Please refer to them as well, as they may also help you in your search for the issue.

  • Forcing a crash to create a dump

  • Troubleshooting via Dump reports

  • Troubleshooting hardware related crashes
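For the "forcing a crash" approach, Windows supports a CrashOnCtrlScroll registry value: once it is set to 1 and the machine is rebooted, holding the right Ctrl key and pressing Scroll Lock twice triggers a bugcheck that writes a dump. The value goes under the i8042prt\Parameters key for PS/2 keyboards and under kbdhid\Parameters for USB keyboards. The minimal sketch below only builds a .reg file you can review and then import with Regedit; the output file name is hypothetical. Bear in mind that in a freeze as hard as the one in the question, where even the Caps Lock LED stops responding, the keyboard handler may never run, so this is not guaranteed to produce a dump.

```python
# Keys that hold CrashOnCtrlScroll: PS/2 keyboards use i8042prt,
# USB keyboards use kbdhid.
KEYS = {
    "ps2": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
    "usb": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
}


def crash_on_ctrl_scroll_reg(keyboards=("ps2", "usb")) -> str:
    """Build .reg text that sets CrashOnCtrlScroll=1 for the given keyboard types."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for kind in keyboards:
        lines.append(f"[{KEYS[kind]}]")
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    return "\n".join(lines)


def write_reg_file(path: str, text: str) -> None:
    # Regedit's own exports use UTF-16 with a BOM and CRLF line endings.
    with open(path, "w", encoding="utf-16", newline="\r\n") as f:
        f.write(text)


if __name__ == "__main__":
    print(crash_on_ctrl_scroll_reg())
```

To produce the file, call something like `write_reg_file("crash_on_ctrl_scroll.reg", crash_on_ctrl_scroll_reg())`, import it with Regedit, and reboot before trying the key sequence.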

Solution 2:

Hard system freezes (where you can't use hotkeys like CTRL+ALT+DEL) are often caused by hanging drivers, so you will have to either replace the device or update the driver. You can troubleshoot this as follows:

  1. Download the Windows Performance Analysis Tools setup for your Windows version.
  2. Install the software on your system.
  3. Open a command prompt as administrator, and paste the following command:

    xperf -start perf!GeneralProfiles.InBuffer && timeout -1 && xperf -stop perf!GeneralProfiles.InBuffer myTrace.etl
    
  4. Press ENTER once to run the command, then wait until your system hangs.
    You can keep using the computer normally in the meantime, but please avoid heavy activity such as gaming, and avoid anything private, since the trace may capture it.

  5. As soon as your system stops hanging, switch to the console and press ENTER again.
  6. After a while, a log file named myTrace.etl will be produced; compress it into a zip file.
  7. Upload the zip file somewhere online (2shared, for example).
  8. Share the link here, and I will try to find the cause of your problem and show it to you.
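Step 6 can be done with Windows Explorer's "Send to > Compressed (zipped) folder". If you'd rather script it, here is a minimal sketch using Python's standard zipfile module; it assumes Python is installed, which the steps above don't otherwise require, and defaults to the myTrace.etl name used in the command.

```python
import sys
import zipfile
from pathlib import Path
from typing import Optional


def zip_trace(trace_path: str, zip_path: Optional[str] = None) -> Path:
    """Compress an .etl trace into a .zip next to it (or at zip_path)."""
    trace = Path(trace_path)
    target = Path(zip_path) if zip_path else trace.with_suffix(".zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name, not the full directory path.
        zf.write(trace, arcname=trace.name)
    return target


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "myTrace.etl"
    if Path(name).is_file():
        print("Created", zip_trace(name))
    else:
        print("Trace not found:", name)
```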