EXT4-fs error after Ubuntu 17.04 upgrade

I have a Dell XPS 15 9550. I've been running Ubuntu 16.10 on it for four months with no dramas.

Two days ago, I upgraded to Ubuntu 17.04. About an hour after upgrading, my root filesystem was remounted read-only. When I switched to a TTY, this appeared:

[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.474632] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524539: comm unity-settings-: reading directory iblock 0
[ 746.992814] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506108: comm BrowserBlocking: reading directory iblock 0
[ 746.304451] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506117: comm BrowserBlocking: reading directory iblock 0

Here's what fdisk -l shows:

Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3CD27380-DAC8-48DC-910A-D084CE857DA3

Device             Start        End   Sectors   Size Type
/dev/nvme0n1p1      2048    1026047   1024000   500M EFI System
/dev/nvme0n1p2   1026048    1288191    262144   128M Microsoft reserved
/dev/nvme0n1p3   1288192  487948287 486660096 232.1G Microsoft basic data
/dev/nvme0n1p4 972302336  973223935    921600   450M Windows recovery environmen
/dev/nvme0n1p5 973223936  998094847  24870912  11.9G Windows recovery environmen
/dev/nvme0n1p6 998094848 1000204287   2109440     1G Windows recovery environmen
/dev/nvme0n1p7 487948288  939046911 451098624 215.1G Linux filesystem
/dev/nvme0n1p8 939046912  972302335  33255424  15.9G Linux swap

Partition table entries are not in disk order.

I rebooted and continued to get the error about once an hour, so I reinstalled Ubuntu 17.04 from scratch. However, I am still getting the same issue.

I tried running fsck by creating a /forcefsck file (I created a wrapper shell script that adds the -v flag and outputs stdout to a file). Here's the result:

fsck.fat 4.0 (2016-05-06)                               
Checking we can access the last sector of the filesystem
Boot sector contents:                                   
System ID "MSDOS5.0"                                    
Media byte 0xf8 (hard disk)                             
       512 bytes per logical sector                     
      4096 bytes per cluster                            
      6206 reserved sectors                             
First FAT starts at byte 3177472 (sector 6206)          
         2 FATs, 32 bit entries                         
    508416 bytes per FAT (= 993 sectors)                
Root directory start at cluster 2 (arbitrary size)      
Data area starts at byte 4194304 (sector 8192)          
    126976 data clusters (520093696 bytes)              
63 sectors/track, 255 heads                             
      2048 hidden sectors                               
   1024000 sectors total                                
Reclaiming unconnected clusters.                        
Checking free cluster summary.                          
/dev/nvme0n1p1: 212 files, 15526/126976 clusters    

I tried booting from a live USB and running e2fsck -p /dev/nvme0n1p7 as suggested here (https://askubuntu.com/a/768813/679041). It didn't give any errors.

I also tried to run smartctl -t long /dev/nvme0n1p7; however, the results seem to indicate that the tool doesn't work with my particular SSD:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-19-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PM951 NVMe SAMSUNG 512GB
Serial Number:                      S29PNX0H611013
Firmware Version:                   BXV77D0Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            254,982,533,120 [254 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Apr 17 17:45:48 2017 AEST
Firmware Updates (0x06):            3 Slots
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        5       5
 1 +     4.20W       -        -    1  1  1  1       30      30
 2 +     3.10W       -        -    2  2  2  2      100     100
 3 -   0.0700W       -        -    3  3  3  3      500    5000
 4 -   0.0050W       -        -    4  4  4  4     2000   22000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

Any idea why this issue might be occurring and how I might solve it? Thanks! :)


Solution 1:

As pointed out in a comment by Elder Geek, this is due to a known bug.

From the bug report:

APST support just landed in the latest Zesty kernel (4.10.0-14.16) as part of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664602. That patch has a quirk for certain 256GB Samsung drives found in Dell laptops that do not behave well when APST is enabled. I am experiencing the same symptoms with the same model laptop except with a 512GB Samsung. Prior to manually disabling APST the drive would die and system would go down in flames with I/O errors within 20 to 40 minutes of boot.

Until a proper fix is implemented, a workaround is suggested, which involves adding a kernel parameter:

Please try nvme_core.default_ps_max_latency_us=5500, if the issue persists, please try nvme_core.default_ps_max_latency_us=200.
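To see why those two values were suggested, it helps to compare them with the power-state latencies in the smartctl output from the question. The sketch below assumes the kernel only lets the drive idle into a non-operational power state when that state's entry latency plus exit latency is no larger than default_ps_max_latency_us. That rule is my understanding of the APST code, not something stated in the bug report, and the function name is made up for illustration.

```shell
#!/bin/sh
# Power-state table copied from the smartctl output in the question.
# Columns: state, operational flag (+ operational, - non-operational),
# entry latency and exit latency in microseconds.
POWER_STATES='0 + 5 5
1 + 30 30
2 + 100 100
3 - 500 5000
4 - 2000 22000'

# Print the non-operational states whose entry + exit latency fits under
# the cap given in $1. Assumption: this mirrors how the kernel chooses
# which states APST may use.
apst_states_allowed() {
    printf '%s\n' "$POWER_STATES" |
        awk -v cap="$1" '
            $2 == "-" && $3 + $4 <= cap { printf "%s%s", sep, $1; sep = " " }
            END { print "" }'
}

for cap in 5500 200; do
    echo "cap=${cap}us -> allowed idle states: $(apst_states_allowed "$cap")"
done
```

Under that assumption, 5500 still permits state 3 (500 + 5000) but rules out the deepest state 4 (2000 + 22000), while 200 rules out both. If the nvme-cli package is installed, sudo nvme get-feature /dev/nvme0 -f 0x0c -H should show whether APST is currently enabled on the drive.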

To add a kernel boot parameter, edit the configuration file for GRUB:

sudo nano /etc/default/grub

Find the line beginning GRUB_CMDLINE_LINUX_DEFAULT and add the boot parameter to the others already between the quotes. For example, in this case you will probably end up with

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"

Save the file and exit. Then, to make the change take effect, run the following command and reboot:

sudo update-grub 
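After rebooting, it is worth confirming that the kernel actually picked up the parameter. A minimal sketch, assuming the parameter name from the workaround above; the helper function name is hypothetical, and the sysfs path is the usual location for nvme_core module parameters but may be absent on kernels without APST support:

```shell
#!/bin/sh
# Pull the nvme_core.default_ps_max_latency_us value out of a kernel
# command line string passed as $1. Prints nothing if it is not present.
apst_latency_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^nvme_core\.default_ps_max_latency_us=//p' | head -n 1
}

# What the running kernel was booted with.
value=$(apst_latency_from_cmdline "$(cat /proc/cmdline 2>/dev/null)")
echo "boot parameter: ${value:-not set}"

# What the nvme_core module reports it is using.
param=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$param" ]; then
    echo "nvme_core reports: $(cat "$param")"
fi
```

If the first line prints "not set", the GRUB change did not take effect, so check /etc/default/grub for typos and run update-grub again.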

Solution 2:

First, I'd visit the Samsung support website and ensure that you have the latest firmware installed for your SSD model.
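To know which firmware you are running before you go looking for an update, you can read it from sysfs without any extra tools. This is a sketch that assumes the kernel exposes model and firmware_rev files under /sys/class/nvme; the function name is made up for illustration.

```shell
#!/bin/sh
# Print model and firmware revision for each NVMe controller found under
# a sysfs directory. $1 defaults to /sys/class/nvme.
nvme_firmware_report() {
    root=${1:-/sys/class/nvme}
    for ctrl in "$root"/nvme*; do
        # Skip the unexpanded glob and controllers without the attribute.
        [ -r "$ctrl/firmware_rev" ] || continue
        # sysfs pads these fields with spaces, so trim the right-hand side.
        model=$(sed 's/ *$//' "$ctrl/model" 2>/dev/null)
        fw=$(sed 's/ *$//' "$ctrl/firmware_rev")
        echo "$(basename "$ctrl"): ${model:-unknown model}, firmware $fw"
    done
}

nvme_firmware_report
```

The smartctl output in the question already shows firmware BXV77D0Q for this drive, so that is the version to compare against.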

Then, the fsck output you posted is from fsck.fat checking the EFI partition (/dev/nvme0n1p1), not your ext4 root partition, so do it this way...

To check the file system on your Ubuntu partition...

  • boot to the GRUB menu
  • choose Advanced Options
  • choose Recovery mode
  • choose Root access
  • at the # prompt, type fsck -f / (you are already root, so sudo is not needed)
  • repeat the fsck command if there were errors
  • type reboot
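If you would rather check the partition from a live USB, the key point is that the filesystem must not be mounted while e2fsck runs. The sketch below is a dry run that only prints the command it would use. The device name is taken from the fdisk output in the question, and the helper function is hypothetical.

```shell
#!/bin/sh
# Return success if device $1 appears as a mount source in a mounts table.
# $2 defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

DEVICE=/dev/nvme0n1p7   # root partition from the question; adjust for your system

if is_mounted "$DEVICE"; then
    echo "$DEVICE is mounted; boot a live USB or unmount it first" >&2
else
    # Dry run only: print the command instead of executing it.
    echo "would run: sudo e2fsck -f -v $DEVICE"
fi
```

Here -f forces a full check even if the filesystem is marked clean, and -v gives verbose output.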