What does SMART testing do and how does it work?
The drive firmware runs the tests.
- The details of the tests can be read in, e.g., www.t13.org/Documents/UploadedDocuments/technical/e01137r0.pdf, which summarises the elements of the short and long tests thus:
(1) an electrical segment wherein the drive tests its own electronics. The particular tests in this segment are vendor specific, but as examples: this segment might include such tests as a buffer RAM test, a read/write circuitry test, and/or a test of the read/write head elements.
(2) a seek/servo segment wherein the drive tests its capability to find and servo on data tracks. The particular methodology used in this test is also vendor specific.
(3) a read/verify scan segment wherein the drive performs read scanning of some portion of the disk surface. The amount and location of the surface scanned are dependent on the completion time constraint and are vendor specific.
The criteria for the extended self-test are the same as the short self-test with two exceptions: segment (3) of the extended self-test shall be a read/verify scan of all of the user data area, and there is no maximum time limit for the drive to perform the test.
It is safe to perform non-destructive testing while the OS is running, though some performance impact is likely. As the `smartctl` man page says for both `-t short` and `-t long`:
"This command can be given in normal system operation (unless run in captive mode)"
If you invoke captive mode with `-C`, `smartctl` assumes the drive can be busied-out to unavailability. This should not be done on a drive the OS is using.
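As a rough illustration of the options just mentioned, here is a minimal Python sketch that assembles the `smartctl` command line for a short or long self-test. The helper names `build_selftest_command` and `start_selftest` are made up for this example; only the `-t short`, `-t long` and `-C` options come from the text above. Actually starting a test needs root privileges and smartmontools installed.

```python
import subprocess

# Test types discussed above; smartctl accepts others, but this sketch
# deliberately restricts itself to these two.
VALID_TESTS = ("short", "long")


def build_selftest_command(device, test="short", captive=False):
    """Return the argument list that would start a self-test on `device`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if test not in VALID_TESTS:
        raise ValueError(f"unsupported test type: {test!r}")
    cmd = ["smartctl", "-t", test]
    if captive:
        # -C requests captive mode: the drive may become unavailable until
        # the test finishes, so never use it on a disk the OS is using.
        cmd.append("-C")
    cmd.append(device)
    return cmd


def start_selftest(device, test="short", captive=False):
    """Run smartctl to start the test (requires root and smartmontools)."""
    # check=False because smartctl reports conditions through its exit
    # status, and a non-zero status is not necessarily a fatal failure.
    return subprocess.run(
        build_selftest_command(device, test, captive),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Only print the command here; running it is left to the reader.
    print(" ".join(build_selftest_command("/dev/sdb", "long")))
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to check what would be executed before touching a real drive.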
As the man page also suggests, offline testing (which simply means periodic background testing) is not reliable, and never officially became part of the ATA specifications. I run my tests from cron instead; that way I know when they should happen, and I can stop them if I need to.
- The results can be seen in the `smartctl` output. Here's one with a test running:
[root@risby images]# smartctl -a /dev/sdb
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.6-201.fc22.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
[...]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     20567         -
# 2  Extended offline    Completed without error       00%       486         -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Self_test_in_progress [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Note the two previously completed tests (at 486 and 20567 power-on hours) and the currently running one (10% complete).
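If you want to pull those numbers out of the output programmatically, something like the following Python sketch may help. It assumes the self-test log columns are separated by at least two spaces, as in typical `smartctl` output, which may not hold for every version. The function names and the `SAMPLE` text (an excerpt of the log shown above) are only for illustration.

```python
import re

# Excerpt of the self-test log above, used as sample input.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     20567         -
# 2  Extended offline    Completed without error       00%       486         -
    1        0        0  Self_test_in_progress [90% left] (0-65535)
    2        0        0  Not_testing
"""

# Assumption: description and status are separated by two or more spaces.
LOG_ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)
IN_PROGRESS = re.compile(r"Self_test_in_progress \[(\d+)% left\]")


def parse_selftest_log(text):
    """Return one dict per '# N ...' row of the self-test log."""
    rows = []
    for line in text.splitlines():
        match = LOG_ROW.match(line.strip())
        if match:
            rows.append({
                "num": int(match.group("num")),
                "description": match.group("desc"),
                "status": match.group("status"),
                "remaining_percent": int(match.group("remaining")),
                "lifetime_hours": int(match.group("hours")),
                "lba_of_first_error": match.group("lba"),
            })
    return rows


def running_test_progress(text):
    """Return percent complete of a running test, or None if none is running."""
    match = IN_PROGRESS.search(text)
    return None if match is None else 100 - int(match.group(1))


if __name__ == "__main__":
    for row in parse_selftest_log(SAMPLE):
        print(row["lifetime_hours"], row["status"])
    print("running test percent complete:", running_test_progress(SAMPLE))
```

The "percent complete" figure is simply 100 minus the "left" value that `smartctl` reports.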
SMART implementations are manufacturer-dependent; sometimes quite extensive logs are available via the `smartctl -a` command. Here's what I get on one of my self-encrypting drives from Hitachi:
SMART Error Log Version: 1
ATA Error Count: 3
Error 3 occurred at disk power-on lifetime: 2543 hours (105 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 00 08 00 00 Error: IDNF at LBA = 0x00000800 = 2048
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 68 00 08 00 40 00 00:00:06.139 READ FPDMA QUEUED
27 00 00 00 00 00 e0 00 00:00:06.126 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:06.125 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:00:06.125 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:06.125 READ NATIVE MAX ADDRESS EXT
...
This white paper sheds some light on the error codes appearing in the log. Common error abbreviations are:
- AMNF - Address mark not found
- TONF - Track 0 not found
- ABRT - Command aborted
- IDNF - Sector ID not found
- UNC - Uncorrectable data
- BBK - Bad block mark
In my case, the IDNF error (ID Not Found) can be traced back to an incident when the drive was plugged in via a USB-to-SATA adapter and happened to be underpowered, which prevented it from seeking properly.
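The abbreviations listed above correspond to individual bits of the error register (the `ER` column in the log). Here is a small Python sketch of the decoding, assuming the classic ATA error register layout as I understand it. Later revisions of the standard reuse some bits (bit 7, for instance, for an interface CRC error) and the exact meaning can depend on the command, so treat the table as an assumption rather than a reference. The function name is made up for this example.

```python
# Assumed bit layout of the ATA error register (classic layout; newer
# ATA revisions reassign some of these bits).
ERROR_BITS = {
    0x01: "AMNF",  # Address mark not found
    0x02: "TONF",  # Track 0 not found
    0x04: "ABRT",  # Command aborted
    0x10: "IDNF",  # Sector ID not found
    0x40: "UNC",   # Uncorrectable data
    0x80: "BBK",   # Bad block mark
}


def decode_error_register(value):
    """Return the abbreviations whose bits are set, lowest bit first."""
    return [name for bit, name in sorted(ERROR_BITS.items()) if value & bit]


if __name__ == "__main__":
    # The ER column of the log above reads "10" (hexadecimal).
    er = int("10", 16)
    print(decode_error_register(er))
```

With the `ER` value `10` from the log above, only bit 4 is set, which matches the `IDNF` that `smartctl` printed next to the registers.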