SSD goes undetected/offline unless cold/hard booted

I've had an SSD in a server for a little over a year, and once in a while the server will bluescreen or become completely unresponsive. I've found out that it's the SSD going offline while the system is running, because after rebooting (soft reset) the BIOS won't detect the SSD--unless I completely power off the system and turn it back on. Then the SSD is detected again. I've swapped cables, etc. What could be the cause of this? Could it be a bad SSD? (It doesn't make sense that it would just "go offline".) I'm running Windows Server 2008, and the logs don't tell me anything either.

This is an OCZ Onyx and the firmware is the latest. My HDTune results show it has a lot of bad sectors, but I'm not sure I trust the results.

Edit

An HD Sentinel surface test shows about 19 bad sectors. Once I get my backup SSD (a Corsair), I'll reformat/reinitialize this drive to see if that fixes the issue.


Solution 1:

SSDs wear out all the time; there's a special region of spare blocks used for "remaps". If your drive can no longer remap and is showing bad blocks, it means you've run out of spare blocks and your SSD is dead. Trash it, or RMA it if it's still under warranty.
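
One way to sanity-check that (rather than trusting a surface scan) is to look at the drive's SMART attributes, for example with smartmontools. Below is a minimal sketch, assuming smartctl is installed and the SSD shows up as /dev/sda (adjust for your system); the attribute names are just common examples and vary by vendor:

    # Rough sketch: dump SMART attributes via smartctl and pick out the
    # counters related to reallocation / remaining spare area.
    # Assumptions: smartmontools is installed and the SSD is /dev/sda.
    import subprocess

    DEVICE = "/dev/sda"  # assumption: change to your drive's path

    out = subprocess.run(
        ["smartctl", "-A", DEVICE],
        capture_output=True, text=True, check=False,
    ).stdout

    # Attribute names differ between vendors; these are common examples.
    watch = ("Reallocated_Sector_Ct", "Reallocated_Event_Count",
             "Available_Reservd_Space", "Media_Wearout_Indicator")

    for line in out.splitlines():
        if any(name in line for name in watch):
            print(line)

A steadily climbing reallocation count, or a reserved-space attribute heading toward its threshold, points the same way as the bad-block report: the spare area is being used up.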

Solution 2:

Logical sectors on an SSD aren't mapped to fixed physical sectors. When you write to a "sector", the SSD's firmware actually writes that data to a not-yet-used part of the underlying flash, and it will always choose the part that has been written to the least, to accomplish "wear levelling". That said, no disk utility should see bad sectors unless something has gone wrong. I'd recommend replacing the drive and seeing if that helps.
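
To make that remapping idea concrete, here's a toy model of the logical-to-physical mapping, with writes always steered to the least-worn free block. This is purely illustrative and not how any real controller is implemented (real firmware deals with pages, erase blocks, garbage collection, and so on):

    # Toy flash translation layer: each logical sector maps to whichever
    # physical block currently has the fewest writes (wear levelling).
    class ToyFTL:
        def __init__(self, physical_blocks):
            self.wear = [0] * physical_blocks       # write count per block
            self.free = set(range(physical_blocks))
            self.mapping = {}                       # logical sector -> physical block
            self.data = {}                          # physical block -> payload

        def write(self, logical_sector, payload):
            # Pick the least-worn free block.
            target = min(self.free, key=lambda b: self.wear[b])
            self.free.remove(target)
            # Return the previously used block (if any) to the free pool.
            old = self.mapping.get(logical_sector)
            if old is not None:
                self.data.pop(old, None)
                self.free.add(old)
            self.wear[target] += 1
            self.mapping[logical_sector] = target
            self.data[target] = payload

        def read(self, logical_sector):
            return self.data[self.mapping[logical_sector]]

    ftl = ToyFTL(physical_blocks=8)
    for i in range(20):
        ftl.write(logical_sector=0, payload=f"rev {i}")
    print(ftl.read(0))   # "rev 19"
    print(ftl.wear)      # the 20 writes are spread across the 8 blocks

Note that even hammering the same logical sector spreads wear across every physical block, which is the whole point of the scheme.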

By the way: every SSD ships with some percentage more flash than it claims to have when polled by the OS. This extra buffer space is used when real sectors start dying from too many writes. This, combined with the wear levelling, is why SSD manufacturers claim their devices have the same, if not a longer, mean time between failures as a conventional hard drive. If you have an unusually high write load, though, this may not hold.
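
As a back-of-the-envelope illustration of that extra space (the figures here are made up, not OCZ's actual numbers):

    # Hypothetical over-provisioning arithmetic: a drive built from 64 GiB
    # of raw flash but sold as 60 GB keeps the difference as spare area.
    raw_flash_bytes = 64 * 2**30     # assumed raw NAND capacity (64 GiB)
    advertised_bytes = 60 * 10**9    # assumed capacity exposed to the OS (60 GB)

    spare_bytes = raw_flash_bytes - advertised_bytes
    spare_pct = 100 * spare_bytes / advertised_bytes
    print(f"Spare area: {spare_bytes / 2**30:.1f} GiB (~{spare_pct:.0f}% over-provisioning)")

Once reallocations start eating into that reserve faster than expected (as the bad-sector counts here suggest), the drive is on its way out.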