SMART - Seek Error Rate

I have read that the seek error rate is an incrementing count of track seeks and that the count resets to zero after a fixed number of thousands of seek commands. This is evident in some of the Backblaze hard drives (see Figure 1 below).

Figure 1

In Figure 1, the seek error rate for the hard drive increases up to and including day 234. The count is then reset on day 235.

Is this incremental count just the total time that the drive has taken to locate a specific piece of stored data?

Does anyone know why this count is reset and whether it means anything? That is, does the reset only affect the count, or does it perhaps mean that the disk's seek rate is restored to as good as new at day 235?

Figure 2

I am wondering if I can visualise the seek error rate as in Figure 2. Figure 2 (if my understanding is correct) shows the total time that the drive has taken to locate a specific piece of stored data, without the count reset at day 235. If the count reset does not improve the health of the disk, and does not affect the seek rate afterwards, then I guess this is fine.


The counters reset like an odometer rolling over after running out of digits. Different device controllers roll over at different thresholds, but a count of 0 does not mean that the drive is without errors, just as a vehicle with 1,000,010 km on the odometer is not "fresh off the assembly line".
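
If you want to turn a series like Figure 1 into the cumulative curve of Figure 2, one option is to treat every drop in the reported value as a rollover and carry the last value seen before the drop forward. Here is a minimal sketch in Python, assuming one reading per day in chronological order; the function name `unwrap_counter` is my own and not part of any SMART tool. It will undercount a little, because whatever the counter gained between the last reading before the reset and the rollover point itself is not visible in the data.

```python
def unwrap_counter(readings):
    """Return a running total from readings of a counter that resets to zero.

    Any reading lower than the one before it is treated as a rollover, and
    the last value seen before the drop is added to a running offset.
    """
    totals = []
    offset = 0
    previous = None
    for value in readings:
        if previous is not None and value < previous:
            # The counter went backwards, so assume it rolled over.
            offset += previous
        totals.append(offset + value)
        previous = value
    return totals
```

For example, `unwrap_counter([10, 20, 30, 5, 15])` gives `[10, 20, 30, 35, 45]`.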

If you would like to build a graph as seen in Figure 2, you could write a little data collection utility that reads the SMART information off your storage device and records it in a database (or anywhere you see fit, really). The smartmontools package is the one I usually reach for to display storage device info.

You can install it like this:

  1. Open Terminal (if it's not already open)

  2. Install the smartmontools package:

    sudo apt install smartmontools
    
  3. Query a storage medium, for example, an NVMe device:

    sudo smartctl --all /dev/nvme0n1
    

    This will give you a lot of information:

    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-17-generic] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Number:                       SAMSUNG MZVLW512HMJP-000L7
    Serial Number:                      S359NX0K103156
    Firmware Version:                   7L7QCXY7
    PCI Vendor/Subsystem ID:            0x144d
    IEEE OUI Identifier:                0x002538
    Total NVM Capacity:                 512,110,190,592 [512 GB]
    Unallocated NVM Capacity:           0
    Controller ID:                      2
    NVMe Version:                       1.2
    Number of Namespaces:               1
    Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
    Namespace 1 Utilization:            81,254,830,080 [81.2 GB]
    Namespace 1 Formatted LBA Size:     512
    Namespace 1 IEEE EUI-64:            002538 b181b5c4a3
    Local Time is:                      Thu May 27 21:57:29 2021 JST
    Firmware Updates (0x16):            3 Slots, no Reset required
    Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
    Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
    Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
    Warning  Comp. Temp. Threshold:     69 Celsius
    Critical Comp. Temp. Threshold:     72 Celsius
    
    Supported Power States
    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     7.60W       -        -    0  0  0  0        0       0
     1 +     6.00W       -        -    1  1  1  1        0       0
     2 +     5.10W       -        -    2  2  2  2        0       0
     3 -   0.0400W       -        -    3  3  3  3      210    1500
     4 -   0.0050W       -        -    4  4  4  4     2200    6000
    
    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0
    
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART/Health Information (NVMe Log 0x02)
    Critical Warning:                   0x00
    Temperature:                        33 Celsius
    Available Spare:                    100%
    Available Spare Threshold:          10%
    Percentage Used:                    1%
    Data Units Read:                    20,937,566 [10.7 TB]
    Data Units Written:                 26,780,407 [13.7 TB]
    Host Read Commands:                 359,002,242
    Host Write Commands:                683,010,154
    Controller Busy Time:               5,130
    Power Cycles:                       1,027
    Power On Hours:                     3,812
    Unsafe Shutdowns:                   85
    Media and Data Integrity Errors:    0
    Error Information Log Entries:      719
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    Temperature Sensor 1:               33 Celsius
    Temperature Sensor 2:               39 Celsius
    
    Error Information (NVMe Log 0x01, 16 of 64 entries)
    Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
      0        719     0  0x0008  0x4004      -            0     0     -
      1        718     0  0x0008  0x4004      -            0     0     -
      2        717     0  0x0008  0x4004      -            0     0     -
      3        716     0  0x0008  0x4004      -            0     0     -
      4        715     0  0x0008  0x4004      -            0     0     -
      5        714     0  0x0008  0x4004      -            0     0     -
      6        713     0  0x0008  0x4004      -            0     0     -
      7        712     0  0x0008  0x4004      -            0     0     -
      8        711     0  0x0008  0x4004      -            0     0     -
      9        710     0  0x0008  0x4004      -            0     0     -
     10        709     0  0x0008  0x4004      -            0     0     -
     11        708     0  0x0008  0x4004      -            0     0     -
     12        707     0  0x0008  0x4004      -            0     0     -
     13        706     0  0x0008  0x4004      -            0     0     -
     14        705     0  0x0008  0x4004      -            0     0     -
     15        704     0  0x0008  0x4004      -            0     0     -
    ... (48 entries not read)
    

    This is probably a bit too much information, so you can get just the error log like this:

    sudo smartctl -l error /dev/nvme0n1
    

    The above command returns the same output as seen in the "Error Information" section from the previous command. Note that NVMe devices will return at most 16 entries by default. If you are querying an NVMe device that has more, you can specify the number of entries to return like this:

    sudo smartctl -l error,64 /dev/nvme0n1
    

    My device stores 64 error log entries in total (the header in the output above reads "16 of 64 entries"), so I would add ,64 to the command above. You can show information for up to 256 entries.
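
To tie this back to the data collection utility idea, here is a rough sketch in Python. It asks smartctl for JSON output (the `--json` option, available in smartmontools 7.0 and later), picks out attribute 7 (Seek_Error_Rate on ATA hard drives) and stores the raw value in SQLite. It assumes an ATA drive such as `/dev/sda` rather than the NVMe device above, since NVMe devices do not report that attribute. The function names, the table name and the device path are placeholders of my own, and how to interpret the raw value is vendor specific.

```python
import json
import sqlite3
import subprocess
import time


def read_attributes_json(device):
    """Run smartctl (needs root) and return its JSON report as text."""
    # smartctl uses its exit status as a bitmask, so a non-zero status
    # does not necessarily mean the command failed; do not raise on it.
    result = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def extract_raw_value(report_text, attribute_id=7):
    """Return the raw value of one ATA SMART attribute, or None if absent."""
    report = json.loads(report_text)
    table = report.get("ata_smart_attributes", {}).get("table", [])
    for entry in table:
        if entry.get("id") == attribute_id:
            return entry["raw"]["value"]
    return None


def record_reading(conn, device, raw_value, timestamp=None):
    """Append one reading to a (hypothetical) seek_error_rate table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seek_error_rate "
        "(taken_at REAL, device TEXT, raw_value INTEGER)"
    )
    taken_at = time.time() if timestamp is None else timestamp
    conn.execute(
        "INSERT INTO seek_error_rate VALUES (?, ?, ?)",
        (taken_at, device, raw_value),
    )
    conn.commit()
```

Calling `read_attributes_json`, `extract_raw_value` and `record_reading` in that order once a day (from cron, for instance) would build up a series you can plot later.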

Hope this gives you a wealth of information to play with and track.