How to debug a PC that sometimes fails to boot (& enters a boot loop)?

I've been battling with my machine for weeks now and am hoping for the community's ideas or experience.

My PC usually fails on boot and enters a restart loop. Typically, it passes the ASUS (mobo) screen and shuts down during the Windows load screen with the wheel spinning. The PC auto-restarts shortly after shutting down and repeats the failure. The shutdown isn't exclusive to the Windows load screen; it has also occurred before reaching the Windows screen and after entering the ASUS UEFI. Generally, it shuts down within 3-15 seconds of startup.

The caveat is this: about 10% of the time, it boots and works completely fine. The Windows load screen has come to serve as a "gate" for me: if it doesn't fail here, it's going to work. The PC has never shut down after reaching the Windows user log-in screen. Everything looks normal after successful boots (temps, hardware installed, etc.)

Here is what I've already tried:

  • Cleaned / removed dust from interior, components
  • Reseated components & wires (GPU, PSU, RAM x2, wireless card, hard drives x2)
  • Reapplied thermal compound to CPU, reseated
  • Updated mobo drivers & BIOS
  • Updated GPU drivers
  • Reset CMOS (removed battery, reseated)
  • Reset motherboard overclocks

Specs:

  • Windows 10 Home 64-bit
  • ASUS Z97-A
  • Crucial 128 GB SSD (boots from this drive)
  • GeForce GTX 970
  • i7 4790k

A motherboard beep speaker is installed; it gives one short beep at startup, which according to ASUS means the device is OK and booted normally.

Some ideas I'm still considering:

  • Reseating everything again
  • Replacing the CMOS battery with a new one
  • Updating more drivers (hard drive, etc.)
  • Reformatting / reinstalling Windows

Thank you for any help & consideration.


Solution 1:

I would first rule out HDD and memory issues by scanning both with a bootable tool (from USB, or from a CD if available).

You can use memtest86+ for memory (available at https://www.memtest.org/; a single full run should be sufficient and probably takes about half an hour), and the manufacturer of the HDD inside your PC should have a bootable diagnostic tool available for the drive. Both can be run non-destructively.

Do make sure your BIOS is set up to boot from that device, and in the case of USB, make sure the drive itself is bootable.

Solution 2:

Since I understand this problem appears only on cold boots (the PC was completely turned off, in the S5 state, and you press the power button for the first time), the most plausible explanation to me is either a problem with the power section of the motherboard (maybe faulty capacitors) or a problem with the SSD (corrupt or damaged files?).

I would test the RAM with memtest as some other answers described, but the intermittent nature of the problem makes me suspect RAM is not the likely cause.

Have you tried running a system file check (from an elevated Command Prompt)?

sfc /scannow

https://support.microsoft.com/en-us/topic/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system-files-79aa86cb-ca52-166a-92a3-966e85d4094e

Of all the solutions you are considering, reinstalling Windows is the only one that has some chance of fixing the problem. Since SSDs are quite cheap these days, I would probably buy a 240 GB SSD to replace your current one and do a clean install on it. That removes two possible failure points at once, and if the problem is fixed you can just move the relevant files over from the old drive. Otherwise... well, motherboard, CPU and RAM just gained a lot of tickets for the raffle.

I had some bad experiences in the past with this platform (socket 1150): two different PCs, with different chipsets and different CPUs, both crashed for no apparent reason (yet could withstand 24 hours of OCCT on the same day a BSOD had just happened). That might just be bad luck on my part. Both were upgraded and I did no more research on the topic.

In any case... does the problem appear (or have a chance of appearing) if you REBOOT the system after a successful boot?
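
If it helps to answer that, one rough way to keep score is to note each power-on attempt by hand and tally the failure rate for cold boots versus warm reboots. The snippet below is only a sketch: the `attempts` list is made-up example data (not measurements from your machine), and `failure_rates` is a hypothetical helper name.

```python
from collections import Counter

# Made-up example log, filled in by hand: (kind, booted_ok).
# "cold" = powered on from fully off (S5); "warm" = restart after a good boot.
attempts = [
    ("cold", False),
    ("cold", False),
    ("cold", True),
    ("warm", True),
    ("warm", True),
]


def failure_rates(log):
    """Return {kind: fraction of attempts that failed} for each boot kind."""
    totals = Counter()
    failures = Counter()
    for kind, ok in log:
        totals[kind] += 1
        if not ok:
            failures[kind] += 1
    return {kind: failures[kind] / totals[kind] for kind in totals}


rates = failure_rates(attempts)
for kind, rate in sorted(rates.items()):
    print(f"{kind}: {rate:.0%} of attempts failed")
```

If the warm-reboot failure rate stays near zero over a decent number of tries while cold boots keep failing, that points more towards the power side than towards Windows or the SSD.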