Windows Server crash with multiple RAID controllers

I have used LSI controllers.

First of all:

Controllers of any brand usually try to detect and communicate with each other to share their configurations. When their ROM firmware version changes, they often cannot recover the existing arrays or volumes.

Second:

Cards from the same period usually use the same LSI chip, so it is possible to force another brand's BIOS, or a different version, onto a card. This is called cross-flashing. You can flash from brand to brand, and from IR mode to IT mode (IT mode just forwards the drives as plain SATA devices and disables the card's RAM and RAID functions).

What I would do is simple. If you can't figure out the reason for the crash from Event Viewer, work through the following steps; if one step does not solve the issue, move on to the next:

  • check whether the PSU is sufficient

  • swap the cards between PCIe slots and inspect the slots (there may be metal debris or paper in them)

  • examine the cards visually for burn marks or broken SMD components

  • compare the ROM firmware versions and identify the problematic card

  • cross-flash the problematic card so that all cards run the same brand and the same ROM firmware version

  • cross-flash all cards into IT mode and build your arrays with mdadm software RAID.
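The firmware-matching step in the list above can be sketched in Python. This is a minimal sketch, assuming you have already collected each controller's firmware version (by hand or from the vendor's utility); the slot labels and version strings below are hypothetical. It treats the most common version as the reference and flags every card that differs, which gives you the list of cards to reflash.

```python
from collections import Counter


def find_mismatched_firmware(versions: dict[str, str]) -> tuple[str, list[str]]:
    """Return the most common firmware version and the controllers that differ.

    `versions` maps a controller label (hypothetical, e.g. "slot1") to the
    firmware version string it reports. If two versions are equally common,
    the one encountered first wins (Counter.most_common keeps first-seen order).
    """
    if not versions:
        raise ValueError("no controllers given")
    reference, _count = Counter(versions.values()).most_common(1)[0]
    # Every controller not on the reference version is a reflash candidate.
    outliers = sorted(name for name, ver in versions.items() if ver != reference)
    return reference, outliers


if __name__ == "__main__":
    # Hypothetical inventory: three cards agree, one does not.
    inventory = {
        "slot1": "20.00.07.00",
        "slot2": "20.00.07.00",
        "slot3": "19.00.00.00",
        "slot4": "20.00.07.00",
    }
    ref, odd = find_mismatched_firmware(inventory)
    print(f"reference version: {ref}")
    print(f"controllers to reflash: {odd}")
```

Picking the majority version is only a heuristic; if the newest version is known to be stable, you may prefer to flash everything to that instead.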

Usually in these situations we just buy new cards. But the SATA plus software RAID approach is very solid; I use it on every setup. You only need to run a few mdadm commands in bash correctly. That is a couple of simple commands, compared with the LSI manual's thousands of RAID controller commands, patrol reads, consistency-check scheduling, and so on.
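To illustrate how short the mdadm side is, here is a minimal Python sketch that only builds and prints an `mdadm --create` command; it executes nothing, so it is safe to run. The member disks `/dev/sdb` to `/dev/sde`, the array name `/dev/md0` and the 512 KiB chunk are hypothetical placeholders. `--create`, `--level`, `--raid-devices` and `--chunk` are standard mdadm options, but check `man mdadm` on your own system before running anything against real disks.

```python
import shlex


def mdadm_create_cmd(md_device: str, level: int, members: list[str], chunk_kib: int) -> list[str]:
    """Build (but do not run) an `mdadm --create` argument list."""
    if len(members) < 2:
        raise ValueError("an array needs at least two member devices")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        f"--chunk={chunk_kib}",  # a bare number is interpreted as KiB by mdadm
        *members,
    ]


if __name__ == "__main__":
    disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # hypothetical device names
    cmd = mdadm_create_cmd("/dev/md0", 10, disks, 512)
    # Print a shell-safe command line for review before running it by hand.
    print(shlex.join(cmd))
```

After creating an array for real, `cat /proc/mdstat` and `mdadm --detail /dev/md0` show the initial sync progress and the array state.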

My favorite setup is mdadm software RAID 10 with bcache on multiple SSDs. It works great with iSCSI and Samba. You just need to set the RAID chunk size and the filesystem cluster size correctly.
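Aligning the filesystem with the chunk size is mostly arithmetic. A minimal sketch, assuming ext4 on top of the array (the answer does not say which filesystem it uses): ext4's `stride` is the chunk size divided by the filesystem block size, and `stripe_width` is the stride multiplied by the number of disks holding distinct data. For md RAID 10 with two copies, that is half of the member disks. The chunk size, block size and disk count below are hypothetical.

```python
def ext4_stripe_options(chunk_kib: int, block_kib: int, members: int, copies: int = 2) -> tuple[int, int]:
    """Return (stride, stripe_width) in filesystem blocks for an md RAID 10 array."""
    if chunk_kib % block_kib:
        raise ValueError("chunk size must be a multiple of the block size")
    if members % copies:
        raise ValueError("this sketch assumes the member count is a multiple of the copy count")
    stride = chunk_kib // block_kib      # filesystem blocks per RAID chunk
    data_disks = members // copies       # disks carrying distinct data in one stripe
    stripe_width = stride * data_disks   # filesystem blocks per full data stripe
    return stride, stripe_width


if __name__ == "__main__":
    chunk_kib, block_kib, members = 512, 4, 4  # hypothetical layout
    stride, width = ext4_stripe_options(chunk_kib, block_kib, members)
    print(f"mkfs.ext4 -b {block_kib * 1024} -E stride={stride},stripe_width={width} /dev/md0")
```

With bcache in front of the array, the filesystem would be created on the bcache device (typically `/dev/bcache0`) rather than on `/dev/md0`. Recent mke2fs versions can often detect md geometry on their own, so treat these numbers as a cross-check.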

Do it carefully so you don't lose data. This is the method I use, but I take no responsibility for any damage to your equipment, data loss, or any other type of damage. Cross-flashing and RAID setups are always risky.


The answer above helped me identify the problem; it is a good algorithm. Sometimes we suspect whatever seems most obvious: in my case, using several RAID controllers from different vendors looked suspicious. I checked everything in the accepted answer, but the server kept crashing. From the very beginning I had noticed warnings in Event Viewer from WHEA-Logger: "A corrected hardware error has occurred." According to this article, this is only a warning. Eventually, though, I changed the setting in RBSU (HP's ROM-Based Setup Utility) from "C3 State" to "performance", and now the server is stable. I also added a more powerful PSU. I hope this helps owners of the HP DL580 G7.