What's causing AppCrash and BSOD events, general instability?
SOLUTION: It was the RAM settings all along :-| It never occurred to me that the stock settings on a stock board with stock RAM would be so far off that it'd cause system instability. I've never done any overclocking, so I never looked very closely at those settings. Once I chose the DOCP profile that matched my RAM, everything cleared up, and it's even a little faster. Thanks to Twisty Impersonator for the process guide and to magicandre1981 for the suggestion that prompted me to check the settings. Hopefully, this will save someone else 2 years of frustration.
EDIT: Well, I think the cause has become clear. After replacing ALL the hardware, and STILL seeing a problem, I decided to go back to the hardware idea. In short: if I run with two sticks of RAM, everything is fine. It doesn't matter which two sticks. If I put in all four, I start having problems. This seems like a pretty clear indication of a bad motherboard.
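As an illustration of the two-stick test described above, here is a minimal Python sketch that lists every pair to try and summarizes the outcome. The stick labels A to D and the function names are hypothetical; the post does not name the sticks, and this is not a tool that was used at the time.

```python
from itertools import combinations

# Hypothetical labels for the four 4 GB sticks; the post does not name them.
STICKS = ["A", "B", "C", "D"]


def stick_pairs(sticks):
    """Return every distinct two-stick combination to test."""
    return list(combinations(sticks, 2))


def interpret(pair_stable, all_four_stable):
    """Summarize results; pair_stable maps each pair to True if it ran stable."""
    failing = [set(pair) for pair, stable in pair_stable.items() if not stable]
    if failing:
        # A stick present in every failing pair is the obvious suspect.
        common = set.intersection(*failing)
        if common:
            return "suspect stick(s): " + ", ".join(sorted(common))
        return "inconclusive"
    if not all_four_stable:
        return "no single bad stick; suspect the board, the slots, or the memory settings"
    return "no fault reproduced"


if __name__ == "__main__":
    for pair in stick_pairs(STICKS):
        print("test with sticks", " + ".join(pair))
```

With four sticks there are six pairs. If all six run stable and only the full set of four fails, no individual stick is implicated, which matches what was observed here.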
The Symptoms:
For the last several years my machine has been generally unstable, off and on. It typically manifests as BSODs with varying stop codes.
- Upgrading the RAM improved the stability for a while.
- Upgrading the motherboard improved the stability for a while.
- Replacing the C: drive improved the stability for a while.
- Refreshing or reinstalling the OS has occasionally been necessary, and usually improves stability for a while.
I have replaced literally every functional component in the system, except the CPU and Blu-ray drive. I have not ruled out the CPU, but there is still a vast swath of software-"things" that might also be at fault.
Each time, the problem has returned after a few months.
Most recently, the symptoms have changed slightly. I am open to the possibility that this is a completely unrelated problem, but it seems too similar to the problems I have been battling the whole time, to be mere coincidence.
A few weeks ago I rebooted my computer to update, and it would not POST. I fussed with it for a while (checking connections, MemOK! button, disconnect power, TPU on/off, EPU on/off, etc.) and got it to POST, but the OS would not load. I forget the exact presentation of symptoms, but IIRC it would just sit and spin.
Reinstalled the OS and things were quiet for a week or so, until apps began crashing. At first, it seemed like all the apps that were crashing were installed on the same SSD. Without room to move things around and test, I upgraded to the new Samsung drives. But apps are still crashing.
- Flashed latest BIOS update. No change.
- Turns out, you have to reset the CMOS when you flash the BIOS. Potential symptoms are much like mine. I reset the CMOS. No change.
- It was generally high-demand applications that would crash (Dishonored 2, Diablo III, ESO, etc.). But crashes are happening between 35°C and 45°C for CPU and GPU, so probably not temperature.
- It is not running out of RAM.
- MemTest has never shown any problems. I have run it dozens of times.
- No CPU test has ever shown any issues, except at high temperatures.
- No GPU test has ever shown any issues, except at high temperatures.
- I've reinstalled my video drivers a few dozen times.
- I had Task Manager crash while I was watching yesterday.
- Tried to install a Windows Store App. Some background process crashed. Had to try again. Worked fine.
- Event Viewer has just AppCrash events
AppCrash events are being produced by a wide range of applications. Varying sizes, locations, demands, etc. It is typically once a day, maybe less. But high-resource applications crash pretty reliably within 30 minutes or so.
I should clarify that these are not "Windows is looking for a solution" AppHang events. The application just vanishes, like I closed it, and Windows has nothing to say about it except the AppCrash event in the Event Viewer. Less often, there is a BSOD. Lately, I have seen IRQL_NOT_LESS_OR_EQUAL, and others that I cannot remember... (I don't have any memory dumps anymore? That's weird...).
System specs:
- OS: Windows 10 Pro (upgraded from Win7 during free upgrade period)
- CPU: AMD Phenom II 1090 (no overclocking)
- Cooling: CoolerMaster 150mm CPU fans, several case fans
- Mainboard: ASUS M4A99X EVO R2.0
- RAM: G.Skill 16GB(4x4) DDR3-1333
- GPU: MSI GTX 970 (no overclocking)
- PSU: Corsair CX750M
- System drive: Samsung 850 EVO 500GB
- Other drives: Samsung 850 EVO 500GB, other conventional drives, optical drive
- A/V: Windows Defender, no other AV
Crash dump:
Prompted by this post: https://superuser.com/questions/1281659/possible-to-determine-which-core-a-faulting-application-was-on-when-it-crashed
Hit a new BSOD while it was idling last night. Details from WhoCrashed below:
Crash dump directory: C:\WINDOWS\Minidump
Crash dumps are enabled on your computer.
On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\Minidump\010318-12546-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1640E0)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
file path: C:\WINDOWS\system32\ntoskrnl.exe
product: Microsoft® Windows®
Operating System company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem. The crash took place in the Windows
kernel. Possibly this problem is caused by another driver that cannot be identified at this time.
On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\memory.dmp
This was probably caused by the following module: ntdll.sys (ntdll!ZwFlushBuffersFile+0x14)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem. A third party driver was identified
as the probable root cause of this system error. It is suggested you look for an update for
the following driver: ntdll.sys.
Google query: ntdll.sys KMODE_EXCEPTION_NOT_HANDLED
Memory dumps (full and mini) will be here, as they are available: https://1drv.ms/f/s!AhSzRvnavkrXhPpNy8Qjhaj6LbbTwQ
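For reading the "Bugcheck code" line in the WhoCrashed report above, here is a minimal Python sketch. It relies on two documented facts: for bug check 0x1E the first parameter is the unhandled exception code and the second is the address where the exception occurred, and NTSTATUS 0xC0000005 is STATUS_ACCESS_VIOLATION. The function names and the deliberately tiny lookup tables are illustrative assumptions, not part of any Windows tool.

```python
import re

# Small, deliberately incomplete lookup tables for illustration.
BUGCHECK_NAMES = {0x1E: "KMODE_EXCEPTION_NOT_HANDLED"}
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_bugcheck(text):
    """Split '0x1E (0x..., 0x..., ...)' into (code, [parameters])."""
    values = [int(v, 16) for v in re.findall(r"0x[0-9A-Fa-f]+", text)]
    return values[0], values[1:]


def decode_0x1e(code, params):
    """Name the exception behind a 0x1E bug check."""
    if code != 0x1E:
        raise ValueError("this sketch only handles bug check 0x1E")
    # NTSTATUS is 32 bits wide; the report shows it sign-extended to 64 bits.
    exception_code = params[0] & 0xFFFFFFFF
    return {
        "bugcheck": BUGCHECK_NAMES[code],
        "exception": NTSTATUS_NAMES.get(exception_code, hex(exception_code)),
        "faulting_address": hex(params[1]),
    }


if __name__ == "__main__":
    line = ("0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, "
            "0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)")
    print(decode_0x1e(*parse_bugcheck(line)))
```

Applied to the report above, this says the kernel hit an access violation. That narrows down what happened but not why: a buggy driver and corrupted memory contents can both produce it.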
@magicandre1981 recommended chkdsk /f based on the results of my memory dump. C: is the only drive for which a pagefile is enabled (it's system managed), so that's the one I ran it on. Here are the results:
Checking file system on C: The type of the file system is NTFS.
A disk check has been scheduled.
Windows will now check the disk.
Stage 1: Examining basic file system structure ...
605184 file records processed. File verification completed.
Deleting orphan file record segment 699DD.
10717 large file records processed. 0 bad file records processed.
Stage 2: Examining file name linkage ...
14846 reparse records processed. 704776 index entries processed. Index verification completed.
0 unindexed files scanned. 0 unindexed files recovered to lost and found. 14846 reparse records processed.
Stage 3: Examining security descriptors ...
Cleaning up 1426 unused index entries from index $SII of file 0x9.
Cleaning up 1426 unused index entries from index $SDH of file 0x9.
Cleaning up 1426 unused security descriptors.
Security descriptor verification completed.
49797 data files processed. CHKDSK is verifying Usn Journal...
37651904 USN bytes processed. Usn Journal verification completed.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
CHKDSK discovered free space marked as allocated in the volume bitmap.
Windows has made corrections to the file system.
No further action is required.
487284001 KB total disk space.
209659436 KB in 259738 files.
162276 KB in 49798 indexes.
0 KB in bad sectors.
729085 KB in use by the system.
65536 KB occupied by the log file.
276733204 KB available on disk.
4096 bytes in each allocation unit.
121821000 total allocation units on disk.
69183301 allocation units available on disk.
Internal Info:
00 3c 09 00 f0 b8 04 00 7e 93 08 00 00 00 00 00 .<......~.......
98 05 00 00 66 34 00 00 00 00 00 00 00 00 00 00 ....f4..........
Windows has finished checking your disk.
Please wait while your computer restarts.
No luck. Even after chkdsk fixed these issues, I'm still having the same crashes, though no new BSODs yet.
Another BSOD as I was opening the browser to update this question. Memdumps available once they finish uploading.
But the original reason I came to update is that I found a whole crapton (51 to be precise) of events that look exactly the same. It looks like they happened about every half-hour, starting right after I left for work (7:30am) until about 8:30pm. They might still be happening. They all look like exactly this:
Fault bucket 0x1E_c0000005_fltmgr!FltpPreFsFilterOperation, type 0
Event Name: BlueScreen
Response: Not available
Cab Id: 0
Problem signature:
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
P7: 0_0
P8: 256_1
P9:
P10:
Attached files:
\\?\C:\WINDOWS\Minidump\010318-12546-01.dmp
\\?\C:\WINDOWS\TEMP\WER-18531-0.sysdata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER5795.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57A5.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57B6.tmp.txt
\\?\C:\Windows\Temp\WER8F12.tmp.WERDataCollectionStatus.txt
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Kernel_1e_b49232881f44bde28acca17f0ad8bac3b4fbb67_00000000_cab_031c57c4
Analysis symbol:
Rechecking for solution: 0
Report Id: 3c2abe43-d7d6-4561-9b0d-2adf1f40c745
Report Status: 388
Hashed bucket:
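In the event above, fields P1 to P5 line up with the bugcheck code and four parameters from the WhoCrashed report, and the attached minidump is the same 010318-12546-01.dmp file, so all 51 entries appear to be repeated reports of one crash rather than 51 separate crashes. A minimal Python sketch for checking that kind of match, assuming the event text has been copied out of Event Viewer as plain text (the function names are hypothetical):

```python
import re


def parse_problem_signature(event_text):
    """Extract the P1..P10 fields of the event text into a dict."""
    fields = {}
    for line in event_text.splitlines():
        match = re.match(r"\s*(P\d+):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def matches_bugcheck(fields, code, params):
    """True if P1..P5 equal the bugcheck code followed by its four parameters."""
    reported = [int(fields[f"P{i}"], 16) for i in range(1, 6)]
    return reported == [code, *params]


if __name__ == "__main__":
    event = """Problem signature:
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
"""
    fields = parse_problem_signature(event)
    print(matches_bugcheck(
        fields,
        0x1E,
        [0xFFFFFFFFC0000005, 0xFFFFF8019CED183E,
         0xFFFF968442FBEB68, 0xFFFF968442FBE3B0],
    ))
```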
I have a hard time believing that the CPU would have this issue for so long, and the computer still be functional. I haven't had much success exploring software/configuration issues.
Any ideas?
Almost 3 weeks later... After MUCH shenanigans, I finally acquired a new CPU (upgraded from Phenom II to FX-8350). Replacement was easy enough. I then probed the common problem areas, and apps are still crashing.
As soon as I posted "sad-face," Windows told me something about a "Device Health Report." It reports trouble with a driver. Unfortunately, but unsurprisingly, the Troubleshooter was unable to detect any kind of problem. I uninstalled the two "USB Root Hub" devices in error state from the Device Manager.
Does this provide any additional clues? I'm really at a loss, now...
Here is a list of driver information...? https://docs.google.com/spreadsheets/d/1xAliAOt1s8rQ_ePX5OwTRVFPB3kFYgc3-1HRUznMpR0/edit?usp=sharing
Divide & Conquer
First, you must try to determine if this is a hardware or software issue. Sometimes it involves both, but initially it's best to assume not.
In my experience, the most effective way to determine which camp is at fault is to boot to a second, completely different OS (without changing any hardware, mind you) and attempt to reproduce the problem. It's best to use an OS that doesn't use any of the same code as the suspect OS. For example, if your suspect system runs Windows, you could use Ubuntu for your test OS. Live CDs are good for this.
With intermittently occurring problems this can be challenging, but however you go about it, you need to know if:
- Both OSes are affected, meaning you have a hardware issue, or
- Only your suspect OS is affected, meaning you may have either:
  - A software issue, or
  - An incompatibility between a hardware component and specific software (which is almost always a 3rd party driver).
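The decision above can be written down as a tiny Python sketch. The function name and the wording of the verdicts are illustrative only; the logic is just the two cases listed, plus a catch-all for results that do not fit either.

```python
def classify(suspect_os_fails, test_os_fails):
    """Map the outcome of the two-OS test to the likely camp at fault."""
    if suspect_os_fails and test_os_fails:
        # The only thing the two OSes share is the hardware.
        return "hardware"
    if suspect_os_fails:
        # With intermittent faults, hardware is less likely but not ruled out.
        return "software, or a hardware/driver incompatibility"
    if test_os_fails:
        return "inconclusive: fault seen only in the test OS"
    return "inconclusive: fault not reproduced, keep testing"


if __name__ == "__main__":
    print(classify(suspect_os_fails=True, test_os_fails=False))
```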
If you think it's hardware
You've already tested and replaced a lot of components. If the unwanted behavior manifests itself in your test OS, you are armed with conclusive evidence something you've not yet replaced is at fault. For those components that don't lend themselves to comprehensive testing (e.g. the motherboard), you'll probably want to try replacing other, less costly components first, but eventually you may have no choice but to swap the more expensive components as well.
If you think it's software
If the test OS doesn't trigger the fault, you can be more confident there's a problem with the software in your target OS. However, if the failure has historically been impossible to reproduce on demand or otherwise occurs only intermittently, there remains a chance it's still a hardware issue that simply wasn't triggered in the test OS. Don't dwell on this; just keep it in mind when testing your tentative solutions.
When sorting out what code is at fault, you obviously want to follow up on specific error messages, such as Windows' bugcheck codes, errors logged in the event logs, or in application-specific logs. I'll skip over these steps based on the assumption you've exhausted those leads and need a more general approach.
When it's unclear what software is at fault, your weapon of choice is to remove the software from the equation and run the system long enough to give the problem a chance to occur, if it's going to. You can do this by:
- Uninstalling the software.
- Disabling it using a tool such as Microsoft AutoRuns.
- Disabling it by booting into Safe Mode.
- Creating a second Windows installation without the software in question (useful if you really need the software for day-to-day use and want to be able to easily switch between "testing" and "production" mode).
When doing this I like to categorize the system's software as follows and troubleshoot accordingly:
- Windows' own code and inbox drivers. Least likely to be at fault. Easily confirmed by testing the system using a pristine install (one without any 3rd party code).
- Third party drivers. Always causing trouble. Usually crash in non-random ways such that a pattern emerges. Test by using different driver versions, or by swapping out hardware components.
- Third party system-level software (e.g. security software). Troublesome. These are rarely required for proper system operation and can be completely uninstalled in order to test their influence.
- User applications. Highly variable crash behavior. On modern versions of Windows these rarely crash or lock up the entire system. Failures only occur when the application is running, so it's easy to track failures and correlate them with programs that were running at the time. Watch out for user applications that have an always-on component such as startup items or system services.
Keep a semi-detailed work log
Final thought here. Keep a log of all the problems you encounter and troubleshooting steps you take. With a difficult and drawn-out problem like this one it's easy to forget details. Being able to review this as you work may help you rule out causes or make connections between facts that otherwise might be lost in the struggle.
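A text file or a paper notebook is enough for this. If you prefer something you can sort and filter later, here is one possible sketch in Python that appends timestamped rows to a CSV file. The file name, column names, and function names are all arbitrary choices for illustration.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_entry(path, kind, note, when=None):
    """Append one timestamped row; kind is e.g. 'crash', 'change' or 'test'."""
    when = when or datetime.now()
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header only when the file is first created.
            writer.writerow(["timestamp", "kind", "note"])
        writer.writerow([when.isoformat(timespec="seconds"), kind, note])


def read_log(path):
    """Return all entries as a list of dicts, oldest first."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    log_entry("worklog.csv", "change", "Flashed latest BIOS, reset CMOS")
    log_entry("worklog.csv", "crash", "AppCrash about 30 minutes into a game")
    for row in read_log("worklog.csv"):
        print(row["timestamp"], row["kind"], row["note"])
```

Recording changes and crashes in the same file makes it easier to see which change, if any, preceded a quiet period.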
Anecdotal Story
I worked on a system that reminds me of your situation. It was a laptop (which limited my hardware swapping options) that would lock up randomly. It would do it 10 seconds after power-on, then not for days, and then after being on for hours. I updated everything, tested and replaced every hardware component I could, and reinstalled Windows (at least once, if not twice).
It ended up being the motherboard. After it was replaced, the laptop ran for many years without further trouble.