How do I check the health of an SSD?

Solution 1:

To check the health of an SSD, first install smartmontools.

On Ubuntu, Mint, or other Debian-based distributions:

# apt-get install smartmontools

The Media_Wearout_Indicator attribute is what you are looking for. A value of 100 means your SSD has 100% of its life left; the lower the number, the less life is left.

# smartctl -a /dev/sda | grep Media_Wearout_Indicator
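If you want just the number rather than the whole grep line, something like the following rough sketch may help. The function name smart_attr_value is made up for illustration, and it assumes the usual smartctl attribute table layout, where the second column is the attribute name and the fourth is the normalized VALUE.

```shell
# Hypothetical helper: print the normalized VALUE column (4th field)
# for the attribute named in $1, reading `smartctl -A` output on stdin.
# Assumes the standard attribute table layout.
smart_attr_value() {
    awk -v name="$1" '$2 == name { print $4 + 0 }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_attr_value Media_Wearout_Indicator
```

The "+ 0" only strips the leading zeros that smartctl prints (for example 097 becomes 97).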

To show all of your SSD's information:

# smartctl -a /dev/sda

You can read the complete article at Nam Huy Linux Blog - How to check SSD life left on linux

Solution 2:

Install GNOME Disk Utility and check "SMART Data and Tests" for wear-leveling-count or a similar attribute. The higher that number (a percentage, from 1 to 100), the more "used up" your SSD is, which means you are more likely to have problems. If you have a recent SSD, you need not worry about it.

Install it via

 sudo apt-get install gnome-disk-utility

Start it either via

menu -> Settings -> Disk utility

or via the command line:

sudo gnome-disks

Solution 3:

If you don't have an Intel-brand SSD: READ THIS.

Watch out! I was blithely misled by smartmontools. I have a Samsung SSD, and smartctl happily misreported attribute 233 (hex E9) as 'Media_Wearout_Indicator'. In fact, for Samsung (and other manufacturers) that attribute means something entirely different. This and other forum postings, Stack Exchange questions and answers, and power-user blogs I found seem to be Intel-focused, with only vague hints that 'it may vary', and no suggestion that you need to watch out for smartmontools labeling the attribute wrongly.

As I was preparing to copy my SSD to a new hard drive I'd bought (because of what smartmontools had told me), I booted into Windows (I have a dual-boot system) to see what the Windows-only Samsung tool 'Samsung_Magician_v43.exe' had to tell me about my drive. It was shockingly uninformative.

After hours of digging, I was finally able to run the Windows-only tools HDDGuardian and CrystalDiskInfo. Surprise! Both tools independently tell me my Samsung SSD is 'just fine' (HDDGuardian says '5 stars' and CrystalDiskInfo says "98% OK"). By contrast, smartctl explicitly labeled attribute 233 (hex E9) as "Media Wearout Indicator" and told me its value was "1", or 1%: an indicator of (the risk of) pending failure. To be as sure as I can, I dug and dug and was finally able to locate at least something official from Samsung: Samsung White Paper 07: Communicating With Your SSD [archive.org]

The document indeed implies that attribute 233 (hex E9) is not used by Samsung in the same way. (Samsung: I'm very disappointed. Please either fix your official software tool, or at least make it clear that you do not provide wear-out indication information!)

Further, if you have neither an Intel SSD nor a Samsung SSD, be warned: this info does seem to vary across manufacturers (e.g. see the attribute label chart on https://code.google.com/p/hddguardian/wiki/about_reliability for the only useful indication of the degree of variability that I found).

The so-what: if you don't have an Intel SSD, do not be misled by the attribute name labels provided by smartmontools. Perhaps it will improve in the future, but the version installed by default for Ubuntu 12.04 LTS (April, 2014) was a total fail. Instead of telling you it 'doesn't know', smartctl just mislabeled the attribute. I did not find another tool for Linux that made the 'correct' information transparent or clear.
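Since the label cannot be trusted across vendors, one cautious approach is to print the attribute ID next to whatever wear-related name smartctl shows, and then check that ID against the manufacturer's own documentation. The sketch below is only an illustration: the function name wear_candidates is made up, and the list of attribute names is an assumption based on labels commonly seen in smartctl output, not an exhaustive or authoritative list.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the ID,
# name, normalized VALUE and first token of RAW_VALUE for attributes whose
# names commonly relate to wear. The name list is an assumption, not a
# complete list, and the label may still be wrong for your drive.
wear_candidates() {
    awk '$2 ~ /^(Media_Wearout_Indicator|Wear_Leveling_Count|SSD_Life_Left|Percent_Lifetime_Remain)$/ {
        printf "%s %s value=%d raw=%s\n", $1, $2, $4, $10
    }'
}

# Usage (as root):
#   smartctl -A /dev/sda | wear_candidates
```

Treat the output as a list of things to look up, not as a verdict. If I remember the option correctly, smartctl -P show /dev/sda prints the drive database entry (if any) that smartctl matched for your drive, which helps you judge whether the labels were chosen for your model at all.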

Solution 4:

For (at least some) NVMe drives, you can do

smartctl -a /dev/nvme0

You can then look for a line like:

Percentage Used:                    5%

Here lower numbers are better and 100% means the drive is "worn out". Manufacturer documentation suggests that it is possible to get numbers above 100% if you keep using the drive beyond this point (example from Seagate, see page 12).

Note that if you use the namespace or partition devices, like /dev/nvme0n1 or /dev/nvme0n1p1, it won't work and you will instead get a message like Read NVMe SMART/Health Information failed: NVMe Status 0x4002.
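As a small sketch of the two points above, the helpers below strip the namespace and partition suffix from a device name and pull the number out of the "Percentage Used" line. Both function names are made up for illustration, and the parsing assumes the output line looks like the one shown above.

```shell
# Hypothetical helper: map a namespace or partition node such as
# /dev/nvme0n1 or /dev/nvme0n1p1 to its controller node /dev/nvme0.
nvme_controller() {
    printf '%s\n' "$1" | sed -E 's#^(/dev/nvme[0-9]+)(n[0-9]+(p[0-9]+)?)?$#\1#'
}

# Hypothetical helper: print the number from the "Percentage Used" line
# of `smartctl -a` output read on stdin (spaces and the % sign removed).
nvme_percentage_used() {
    awk -F: '/^Percentage Used:/ { gsub(/[ %]/, "", $2); print $2 }'
}

# Usage (as root):
#   smartctl -a "$(nvme_controller /dev/nvme0n1p1)" | nvme_percentage_used
```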

Solution 5:

For Kingston drives on Debian-based systems

Similar to the smartmontools answer above, execute:

# apt-get install smartmontools

However, when I executed the command to show the drive info, it looked like SMART was disabled:

# smartctl -a /dev/sda 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-45-generic] (local build)
[ ... ]
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

You need to enable that by executing the following as root:

# smartctl -s on -a /dev/sda
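If you want to script this check, a minimal sketch could look like the following. The function name smart_is_enabled is made up, and it assumes the "SMART support is:" lines are worded as in the output above.

```shell
# Hypothetical helper: succeed (exit status 0) if smartctl output read on
# stdin contains a "SMART support is: Enabled" line, fail otherwise.
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}

# Usage (as root), turning SMART on only when it is not reported as enabled:
#   smartctl -i /dev/sda | smart_is_enabled || smartctl -s on /dev/sda
```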

You can then execute a self-test by doing either a short test (which took me about 1 minute):

# smartctl -t short -a /dev/sda

or a more thorough test (which took me about 1.5 hours):

# smartctl -t long -a /dev/sda

Note, in most circumstances you do not need to unmount the drive to execute these tests. If you do, see man smartctl.

Now, when you execute smartctl -a /dev/sda you should then see a self-assessment test result. This is probably all you really need to concern yourself with:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
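For a cron job or a quick check, the result word can be extracted from that line. This is only a sketch: the function name smart_health_result is made up, and it assumes the line is worded exactly as shown above.

```shell
# Hypothetical helper: print the overall-health result (e.g. PASSED)
# from smartctl output read on stdin; prints nothing if the line is absent.
smart_health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Usage (as root); -H asks smartctl for the health status only:
#   smartctl -H /dev/sda | smart_health_result
```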

If you like details, you will also see a table like this:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/178007034
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   092   092   000    Old_age   Always       -       7626h+46m+45.580s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       4
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       1
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   030   035   000    Old_age   Offline      -       30 (Min/Max 24/35)
194 Temperature_Celsius     0x0022   030   035   000    Old_age   Always       -       30 (Min/Max 24/35)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/178007034
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/178007034
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/178007034
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       3498
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2885
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2885
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       868

If you are looking for what all of these values mean, see the Kingston documentation.
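If you would rather not read the whole table, the rough sketch below prints only the attributes whose normalized VALUE has dropped to or below a non-zero THRESH, which is the condition smartctl treats as a failed attribute. The function name smart_below_threshold is made up, and the column positions are assumed to match the table above.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# attribute whose normalized VALUE (4th field) is at or below a non-zero
# THRESH (6th field). Rows with THRESH 000 are skipped, since they have
# no meaningful threshold; non-numeric IDs (the header) are ignored.
smart_below_threshold() {
    awk '$1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 {
        printf "%s %s value=%d thresh=%d\n", $1, $2, $4, $6
    }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_below_threshold
```

With the table shown above this prints nothing, because every attribute with a non-zero threshold is still well above it.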