Issues with SSD: rising CRC errors, freezing, sometimes read-only
Replace your SSD
People have tried a lot of things in the comments, but this SSD seems to have some issues.
Judging by the S.M.A.R.T. readouts, your drive has not seen a lot of action (~250 power-on days, ~6 TB written), and you say it is about 2 years old. This should be well within the warranty period!
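To put "not a lot of action" into numbers, here is a back-of-envelope sketch using the rounded figures above. It assumes decimal units (1 TB = 1000 GB) and uses power-on days rather than calendar days as the time base, so treat the result as a ballpark only.

```shell
#!/bin/sh
# Back-of-envelope average write rate from the rounded S.M.A.R.T. figures.
# Assumptions: decimal units (1 TB = 1000 GB) and power-on days as the time base.
tb_written=6
power_on_days=250

# LC_ALL=C keeps "." as the decimal separator in awk's output
gb_per_day=$(LC_ALL=C awk -v tb="$tb_written" -v days="$power_on_days" \
    'BEGIN { printf "%.1f", tb * 1000 / days }')

echo "about ${gb_per_day} GB written per power-on day"
```

With these inputs it works out to about 24 GB per power-on day.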
My advice is to:
- back up all your data immediately (though you say you have that covered already)
- remove or replace the SSD (depending on your budget, of course)
- send the disk to the manufacturer for replacement
Your "Slim S70" disk should be covered by Silicon Power's 5-year warranty.
Just send them an RMA request here.
Some time before May 11, 2017 you updated your SSD firmware. However, a newer version was released in September 2017, and you should apply it using Windows.
Run fstrim to discard unused blocks in the file system:
$ sudo fstrim --verbose --all
/mnt/c: 16 EiB (18446744073709551615 bytes) trimmed
/mnt/e: 16 EiB (18446744073709551615 bytes) trimmed
/: 23.4 GiB (25132920832 bytes) trimmed
In my case the results for the Windows 10 partitions /mnt/c and /mnt/e were out of this world, so I checked the files: no harm was done to the data.
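Those two "16 EiB" figures are not real byte counts: 18446744073709551615 is exactly 2^64 - 1, the largest value an unsigned 64-bit integer can hold. A plausible explanation (my assumption, not something fstrim reports) is that the driver for those Windows partitions never filled in an actual count, so fstrim printed its initial maximum value. Either way, the number says nothing about your data. This sketch just confirms the arithmetic.

```shell
#!/bin/sh
# The byte count fstrim printed for /mnt/c and /mnt/e
reported=18446744073709551615

# printf's %u turns -1 into the largest unsigned integer,
# i.e. 2^64 - 1 (all 64 bits set) on 64-bit Linux
max_u64=$(printf '%u' -1)

if [ "$reported" = "$max_u64" ]; then
    echo "$reported is 2^64 - 1, not a plausible byte count"
fi

# 1 EiB = 2^60 bytes, and (2^64 - 1) / 2^60 rounds to 16, hence "16 EiB"
eib=$(LC_ALL=C awk -v b="$reported" 'BEGIN { printf "%.0f", b / 2^60 }')
echo "shown as $eib EiB"
```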
Run fsck -f on your SSD's partition after booting from a live USB, while the partition is not mounted. Another option is running fsck -f from GRUB; see How to fsck hard drive while hard drive is unmounted, using bootable USB stick?
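The "not mounted" requirement is easy to get wrong, so here is a minimal guard sketched as a shell function. The name safe_fsck is hypothetical; it assumes findmnt (from util-linux) and fsck are available, as they are on a standard Ubuntu live USB, and that you pass a partition such as /dev/sda2 rather than the whole disk.

```shell
#!/bin/sh
# safe_fsck DEVICE (hypothetical helper): force a check only if DEVICE
# is an existing block device that is not currently mounted.
safe_fsck() {
    dev=$1
    if [ -z "$dev" ]; then
        echo "usage: safe_fsck /dev/sdXN" >&2
        return 2
    fi
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    # findmnt exits 0 if DEVICE is the source of any current mount
    if findmnt --source "$dev" >/dev/null 2>&1; then
        echo "$dev is mounted; unmount it or boot the live USB first" >&2
        return 1
    fi
    # -f forces a check even if the file system is marked clean
    fsck -f "$dev"
}
```

From a root shell on the live USB, load the function (for example with . ./safe_fsck.sh, a hypothetical file name) and run safe_fsck /dev/sdXN.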
As mentioned in the comments, a bad SATA cable can cause errors, but as this answer points out, a loose connection can too. To rule out a bad or loose connection, unplug the cables from your SSD, blow compressed air over the plugs and the male pins on the drive, and firmly reseat the cables.
How much is your time worth?
The last question is how much your time is worth. Assuming you've spent 10 hours on this problem, a replacement works out to $5/hour, because many brand-new 120 GB SATA III SSDs can be purchased from ebay.com for about $50.
Feb 23/2018 update
I read all the other answers tonight. One answer says to return it, but if you do and they find nothing wrong, they'll simply send it back and you'll be without a drive for 2 weeks to 2 months.
Another answer says smartctl reports there is nothing wrong with the drive.
In this answer I suggested running fsck -f and you responded that no errors were reported.
Run fsck every boot
As a compromise between the negative answer (return it) and the positive answer (nothing is wrong), my inclination would be to run fsck on every boot. If an error is discovered, the boot is paused and you can read the error message. To summarize the link, use:
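sudo tune2fs -c 1 /dev/sdXN
Note: replace X with your drive letter (a, b, etc.) and N with the partition number, e.g. /dev/sda1. tune2fs works on the partition holding the ext2/3/4 file system, not on the whole disk.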
If there are no errors after a month, change the value from 1 to 30, which I believe is typical for most systems. On a typical SSD the fsck will run quickly.
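To confirm the setting took effect, you can read the superblock with sudo tune2fs -l /dev/sdXN, which prints "Mount count" and "Maximum mount count" lines. The helper below (mount_check_status, a hypothetical name) is a sketch of pulling those two values out; per the tune2fs man page, a maximum of 0 or -1 means mount-count-based checks are disabled.

```shell
#!/bin/sh
# mount_check_status (hypothetical helper): read `tune2fs -l` output on stdin
# and report the current mount count against the maximum.
mount_check_status() {
    awk -F: '
        $1 == "Mount count"         { count = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0; seen = 1 }
        END {
            if (!seen)   { print "no Maximum mount count line found"; exit 1 }
            # 0 or -1 disables mount-count-based checking
            if (max < 1) { print "mount-count-based checks are disabled"; exit 0 }
            printf "mount count %d of %d\n", count, max
        }'
}
```

Usage: sudo tune2fs -l /dev/sdXN | mount_check_status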
Clean and re-seat SATA cables
Others mentioned replacing the SATA cable, which is problematic in a laptop. As a compromise, consider unplugging all cables on the drive side, blowing compressed air over the male and female ends, and then plugging the cables back in firmly.
There is nothing wrong with your drive. All tests pass. You are simply misinterpreting the SMART data.
First, the first screenshot contains only raw data, and you cannot draw any conclusions from it. I have no idea what use its creator thinks that data would be to anybody, but on its own it doesn't really mean anything, unless the meaningful columns can be reached by scrolling right in the window or something.
Let me explain the columns in the SMART report (the latter report you posted).
- Attribute name: name of the metric
- Value: current normalised value; higher is better. Values are often on a scale of 100, where 100 = best, but a drive can use any scale as long as higher is better. Even if the metric is something like "error rate", it's normalised so that higher values mean lower error rates.
- Worst: worst (lowest) value observed; higher is better.
- Thresh: if the value drops to or below this, it's a fail condition; above it = pass.
- Type: what a fail condition would mean for this metric.
  - Old_age: this metric is indicative of age/usage of the drive, not a specific problem.
  - Pre-fail: this metric is indicative of a potential problem with the drive, meaning an increased chance of drive failure.
- When_failed: when this metric entered a failure state, if ever.
- Raw_value: the drive's internal measurement that the value is derived from. It is not generally useful to the end user, and lower or higher raw values do not necessarily indicate better or worse.
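The Value/Worst/Thresh relationship can be sketched as a filter over smartctl -A output: smartctl treats an attribute as failing now when its normalised value is at or below the threshold, and as failed in the past when its worst value is. This assumes smartctl's usual column order (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and, as an assumption, skips attributes with a threshold of 0, which are conventionally treated as never failing. The function name is hypothetical, and smartctl's own WHEN_FAILED column already reports the same verdict.

```shell
#!/bin/sh
# smart_below_threshold (hypothetical helper): read `smartctl -A` output on
# stdin and list attributes whose normalised VALUE or WORST is <= THRESH.
# Assumed columns:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_below_threshold() {
    awk '
        # Attribute rows start with a numeric ID; the header row does not
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            # Assumption: a threshold of 0 means "never fails", so skip it
            if (thresh == 0) next
            if (value <= thresh)
                print $2 ": failing now (value " value " <= thresh " thresh ")"
            else if (worst <= thresh)
                print $2 ": failed in the past (worst " worst " <= thresh " thresh ")"
        }'
}
```

Usage: sudo smartctl -A /dev/sdX | smart_below_threshold (no output means no attribute has ever reached its threshold).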
To address some specific areas of the report:
SMART overall-health self-assessment test result: PASSED
This means everything passed: none of the metrics measured has ever entered a failure state.
The log of "errors" is relatively typical for a drive. These entries do not necessarily indicate unrecoverable errors, or even problems with the drive itself. Their reports are vague, so all you can tell is that they happened during a DMA transfer at the controller; if anything important had happened, it would be reflected in the overall health report. In particular, these could be something fairly innocent, like writes that were cancelled at the controller end, or the OS requesting a feature during load that the drive doesn't support, which may be entirely normal when probing device capabilities.
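smartctl also summarises these checks in its exit status, a bitmask documented in the RETURN VALUES section of the smartctl man page. The overall health verdict (bit 3, "DISK FAILING") is separate from "the device error log contains records of errors" (bit 6), so a drive like this one, which passes the health check but has logged errors, would be expected to set bit 6 without bit 3. A minimal decoder, with decode_smartctl_status as a hypothetical name:

```shell
#!/bin/sh
# decode_smartctl_status STATUS (hypothetical helper): print which bits of
# smartctl's exit status are set, per the RETURN VALUES section of smartctl(8).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problem bits set"
        return 0
    fi
    if [ $(( status & 1 )) -ne 0 ];   then echo "bit 0: command line did not parse"; fi
    if [ $(( status & 2 )) -ne 0 ];   then echo "bit 1: device open failed"; fi
    if [ $(( status & 4 )) -ne 0 ];   then echo "bit 2: a SMART/ATA command failed or a checksum error"; fi
    if [ $(( status & 8 )) -ne 0 ];   then echo "bit 3: SMART status check returned DISK FAILING"; fi
    if [ $(( status & 16 )) -ne 0 ];  then echo "bit 4: prefail attribute(s) at or below threshold"; fi
    if [ $(( status & 32 )) -ne 0 ];  then echo "bit 5: attribute(s) at or below threshold in the past"; fi
    if [ $(( status & 64 )) -ne 0 ];  then echo "bit 6: device error log contains errors"; fi
    if [ $(( status & 128 )) -ne 0 ]; then echo "bit 7: self-test log contains errors"; fi
}
```

Usage, in the same shell: sudo smartctl -H -l error /dev/sdX; decode_smartctl_status $?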
Finally, a note about CRC errors or error rates: all drives have an error rate. Drives store data at such high densities that a certain number of bit errors is expected and designed for, by using an error correction code (ECC). The ECC ensures that up to a certain number of bit errors per chunk of data can occur and be fully corrected. The drive applies it constantly, and it is designed so that the chance of an unrecoverable error occurring randomly in a well-functioning drive is very low (as in, significantly less likely than winning the lottery). If you see an error rate in any stats and it's treated as no big deal, that's because it isn't one: those are simply corrected errors.
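As a toy illustration of "a certain number of bit errors per chunk may occur and be fully corrected", here is a Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. This is only a sketch of the principle: real SSD controllers use much stronger codes (typically BCH or LDPC) over far larger chunks, and the function names are hypothetical.

```shell
#!/bin/sh
# Toy Hamming(7,4) code: 4 data bits + 3 parity bits, corrects any single bit flip.
# Bit k-1 of a codeword holds position k, laid out as: p1 p2 d1 p3 d2 d3 d4.

# hamming_encode DATA: DATA is 0..15; prints the 7-bit codeword as an integer
hamming_encode() {
    d1=$(( $1 & 1 ));        d2=$(( ($1 >> 1) & 1 ))
    d3=$(( ($1 >> 2) & 1 )); d4=$(( ($1 >> 3) & 1 ))
    # Each parity bit makes one group of positions XOR to zero
    p1=$(( d1 ^ d2 ^ d4 ))   # covers positions 1, 3, 5, 7
    p2=$(( d1 ^ d3 ^ d4 ))   # covers positions 2, 3, 6, 7
    p3=$(( d2 ^ d3 ^ d4 ))   # covers positions 4, 5, 6, 7
    echo $(( p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6) ))
}

# hamming_decode CODEWORD: fixes up to one flipped bit, prints the 4 data bits
hamming_decode() {
    c=$1
    c1=$(( c & 1 ));        c2=$(( (c >> 1) & 1 )); c3=$(( (c >> 2) & 1 ))
    c4=$(( (c >> 3) & 1 )); c5=$(( (c >> 4) & 1 )); c6=$(( (c >> 5) & 1 ))
    c7=$(( (c >> 6) & 1 ))
    # The syndrome is the position (1..7) of a single flipped bit, or 0 if none
    s=$(( (c1 ^ c3 ^ c5 ^ c7) | ((c2 ^ c3 ^ c6 ^ c7) << 1) | ((c4 ^ c5 ^ c6 ^ c7) << 2) ))
    if [ "$s" -ne 0 ]; then
        c=$(( c ^ (1 << (s - 1)) ))   # flip the bad bit back
    fi
    # Data bits live at positions 3, 5, 6 and 7
    echo $(( ((c >> 2) & 1) | (((c >> 4) & 1) << 1) | (((c >> 5) & 1) << 2) | (((c >> 6) & 1) << 3) ))
}
```

For example, hamming_encode 11 gives 85; flipping any one of its seven bits and passing the result to hamming_decode still yields 11.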