SMART - Seek Error Rate

I have read that the seek error rate is an incrementing count of track seeks, and that the count resets to zero after a fixed number of seek commands (some number of thousands). This is evident in some of the Backblaze hard drives (see Figure 1 below).

In Figure 1 the seek error rate for the hard drive increases up to and including day 234. The count is then reset on day 235.

Is this incremental count just the total time that the drive has taken to locate a specific piece of stored data?

Does anyone know why this count is reset and whether the reset means anything? That is, does it only reset the count, or does it mean that the disk's seek rate is restored to as good as new at day 235?

I am wondering if I can visualise the seek error rate as in Figure 2. Figure 2 (if my understanding is correct) is the total time that the drive has taken to locate a specific piece of stored data, without the count reset at day 235. If the count reset does not improve the health of the disk, or if it does not affect the seek rate after the count is reset, then I guess this is fine.

The counters are reset like an odometer rolling over after running out of integers. Different device controllers will have different thresholds, but a count of 0 does not mean that the drive is without errors, just as a vehicle whose odometer has rolled over after 1,000,010 km is not "fresh off the assembly line". The reset affects only the counter, not the health of the drive.
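To make the odometer idea concrete, here is a minimal sketch of how a resetting counter could be turned into a running total like the one in Figure 2. It assumes that any drop between two consecutive readings is a rollover and that the counter restarts at zero. The function name `unwrap_counter` and the sample readings are made up for illustration. Anything counted between the last reading before the reset and the reset itself is lost, so the result is a lower bound.

```python
from typing import Iterable, List


def unwrap_counter(readings: Iterable[int]) -> List[int]:
    """Turn a counter that occasionally resets into a running total.

    Assumption: a reading lower than the previous one means the counter
    rolled over and restarted from zero.
    """
    totals: List[int] = []
    offset = 0        # sum of the last values seen before each reset
    previous = None   # previous raw reading, None before the first one
    for value in readings:
        if previous is not None and value < previous:
            # The counter dropped: carry the pre-reset value forward.
            offset += previous
        totals.append(offset + value)
        previous = value
    return totals


# Made-up daily readings with one reset between the third and fourth day.
daily = [100, 250, 400, 30, 90]
print(unwrap_counter(daily))  # [100, 250, 400, 430, 490]
```

Plotting the returned list against the day number gives a curve that keeps rising through the reset instead of dropping back to zero.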

If you would like to build a graph as seen in Figure 2, you could write a little data collection utility that reads the SMART information off your storage device and records it in a database (or anywhere you see fit, really). The smartmontools package is the one I usually reach for to display storage device info.

You can install it like this:

Open Terminal (if it's not already open)
Install the smartmontools package:
```
sudo apt install smartmontools
```

Query a storage medium, for example, an NVMe device:

```
sudo smartctl --all /dev/nvme0n1
```

This will give you a lot of information:

```
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-17-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVLW512HMJP-000L7
Serial Number:                      S359NX0K103156
Firmware Version:                   7L7QCXY7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      2
NVMe Version:                       1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            81,254,830,080 [81.2 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b181b5c4a3
Local Time is:                      Thu May 27 21:57:29 2021 JST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Warning  Comp. Temp. Threshold:     69 Celsius
Critical Comp. Temp. Threshold:     72 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.60W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     5.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1500
 4 -   0.0050W       -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    20,937,566 [10.7 TB]
Data Units Written:                 26,780,407 [13.7 TB]
Host Read Commands:                 359,002,242
Host Write Commands:                683,010,154
Controller Busy Time:               5,130
Power Cycles:                       1,027
Power On Hours:                     3,812
Unsafe Shutdowns:                   85
Media and Data Integrity Errors:    0
Error Information Log Entries:      719
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               33 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        719     0  0x0008  0x4004      -            0     0     -
  1        718     0  0x0008  0x4004      -            0     0     -
  2        717     0  0x0008  0x4004      -            0     0     -
  3        716     0  0x0008  0x4004      -            0     0     -
  4        715     0  0x0008  0x4004      -            0     0     -
  5        714     0  0x0008  0x4004      -            0     0     -
  6        713     0  0x0008  0x4004      -            0     0     -
  7        712     0  0x0008  0x4004      -            0     0     -
  8        711     0  0x0008  0x4004      -            0     0     -
  9        710     0  0x0008  0x4004      -            0     0     -
 10        709     0  0x0008  0x4004      -            0     0     -
 11        708     0  0x0008  0x4004      -            0     0     -
 12        707     0  0x0008  0x4004      -            0     0     -
 13        706     0  0x0008  0x4004      -            0     0     -
 14        705     0  0x0008  0x4004      -            0     0     -
 15        704     0  0x0008  0x4004      -            0     0     -
... (48 entries not read)
```

This is probably a bit too much information, so you can get just the error counts like this:

```
sudo smartctl -l error /dev/nvme0n1
```

The above command returns the same output as the "Error Information" section of the previous command. Note that for NVMe devices, smartctl prints only the 16 most recent entries by default. If the error log on your device holds more, you can specify the number of entries to return like this:

```
sudo smartctl -l error,64 /dev/nvme0n1
```

For my device, the error log holds 64 entries in total (the section header above reads "16 of 64 entries"), so I would add ,64 to the command above. You can show information for up to 256 entries.
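If you want to go the "little data collection utility" route, the following is a rough sketch rather than a finished tool. It assumes smartctl 7.x, which can emit JSON with `--json`. The key names used here (`ata_smart_attributes`, `nvme_smart_health_information_log`, `num_err_log_entries`, `media_errors`) are what I understand that JSON output to contain, so please check them against your own `sudo smartctl --json --all <device>` output. The function names and the `smart_log.csv` file name are made up for illustration. On ATA hard drives the Seek Error Rate is attribute ID 7; NVMe devices have no such attribute, so the sketch records their error counters instead.

```python
import csv
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def read_smart_json(device: str) -> dict:
    """Run smartctl (needs root) and return its JSON report as a dict."""
    # smartctl's exit status is a bitmask and is often non-zero even when
    # the report is usable, so it is deliberately not checked here.
    result = subprocess.run(
        ["smartctl", "--json", "--all", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def extract_metrics(report: dict) -> dict:
    """Pick out the counters worth tracking (key names are assumptions)."""
    metrics = {}
    # ATA drives: look for attribute ID 7 (Seek Error Rate) in the table.
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        if attr.get("id") == 7:
            metrics["seek_error_rate_raw"] = attr["raw"]["value"]
    # NVMe drives: take the error counters from the health log.
    nvme = report.get("nvme_smart_health_information_log", {})
    if "num_err_log_entries" in nvme:
        metrics["error_log_entries"] = nvme["num_err_log_entries"]
    if "media_errors" in nvme:
        metrics["media_errors"] = nvme["media_errors"]
    return metrics


def append_rows(csv_path: Path, device: str, metrics: dict) -> None:
    """Append one timestamped row per metric, writing a header if new."""
    is_new = not csv_path.exists()
    stamp = datetime.now(timezone.utc).isoformat()
    with csv_path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "device", "metric", "value"])
        for name, value in sorted(metrics.items()):
            writer.writerow([stamp, device, name, value])


# Example use (run as root, e.g. once a day from cron):
#   report = read_smart_json("/dev/sda")
#   append_rows(Path("smart_log.csv"), "/dev/sda", extract_metrics(report))
```

The CSV uses one row per metric so that ATA and NVMe devices can share the same file even though they report different counters.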

Hope this gives you a wealth of information to play with and track.
