Cannot shrink btrfs filesystem although there is still data and metadata space left: ERROR: unable to resize '/home': No space left on device

I cannot shrink a btrfs filesystem, although there is still data and metadata space left:

$ sudo btrfs filesystem resize -11G /home;echo $?
Resize '/home' of '-11G'
ERROR: unable to resize '/home': No space left on device
1

Here is some btrfs filesystem information about /home:

$ sudo btrfs filesystem df /home | column -t
Data,           single:  total=92.01GiB,   used=80.68GiB
System,         DUP:     total=8.00MiB,    used=16.00KiB
System,         single:  total=4.00MiB,    used=0.00B
Metadata,       DUP:     total=1.00GiB,    used=631.41MiB
Metadata,       single:  total=8.00MiB,    used=0.00B
GlobalReserve,  single:  total=224.00MiB,  used=0.00B
$ sudo btrfs filesystem show /home
Label: none  uuid: c7ee56a8-ef45-46c8-86d1-13879201a1e7
    Total devices 1 FS bytes used 81.30GiB
    devid    1 size 100.00GiB used 94.04GiB path /dev/mapper/home_VG-home

$ sudo btrfs filesystem usage -T /home
Overall:
    Device size:         100.00GiB
    Device allocated:         94.04GiB
    Device unallocated:        5.96GiB
    Device missing:          0.00B
    Used:             81.91GiB
    Free (estimated):         17.29GiB  (min: 14.31GiB)
    Data ratio:               1.00
    Metadata ratio:           1.99
    Global reserve:      224.00MiB  (used: 0.00B)

             Data     Metadata Metadata  System  System              
Id Path      single   single   DUP       single  DUP      Unallocated
-- --------- -------- -------- --------- ------- -------- -----------
 1 /dev/dm-0 92.01GiB  8.00MiB   2.00GiB 4.00MiB 16.00MiB     5.96GiB
-- --------- -------- -------- --------- ------- -------- -----------
   Total     92.01GiB  8.00MiB   1.00GiB 4.00MiB  8.00MiB     5.96GiB
   Used      80.68GiB    0.00B 631.41MiB   0.00B 16.00KiB            

And here is the output of dmesg:

$ dmesg | tail -11
[44202.411949] BTRFS info (device dm-0): new size for /dev/dm-0 is 97706311680
[44202.412156] BTRFS info (device dm-0): relocating block group 120288444416 flags 1
[44208.119721] BTRFS info (device dm-0): relocating block group 119214702592 flags 1
[44211.611669] BTRFS info (device dm-0): relocating block group 118140960768 flags 1
[44212.495603] BTRFS info (device dm-0): relocating block group 117067218944 flags 1
[44213.006830] BTRFS info (device dm-0): relocating block group 95592382464 flags 1
[44216.613870] BTRFS info (device dm-0): relocating block group 120288444416 flags 1
[44222.780073] BTRFS info (device dm-0): relocating block group 119214702592 flags 1
[44225.843279] BTRFS info (device dm-0): relocating block group 118140960768 flags 1
[44226.575236] BTRFS info (device dm-0): relocating block group 117067218944 flags 1
[44226.930918] BTRFS info (device dm-0): relocating block group 95592382464 flags 1

EDIT1: The btrfs balance failed:

$ sudo btrfs balance start /home
ERROR: error during balancing '/home': No space left on device
There may be more info in syslog - try dmesg | tail

There is nothing in dmesg | tail about it.

EDIT2: I had to do the following to be able to start the btrfs balance:

$ sudo btrfs balance start -musage=0 -dusage=0 -v /home
Dumping filters: flags 0x7, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
  DATA (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 95 chunks

EDIT3: The btrfs balance ran for 68 minutes and then failed:

$ time sudo btrfs balance start -v /home 
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
ERROR: error during balancing '/home': Input/output error
There may be more info in syslog - try dmesg | tail

real    68m10.221s
user    0m0.008s
sys     4m20.236s

Here is what dmesg shows:

[74421.794756] ata2.00: exception Emask 0x0 SAct 0xc00 SErr 0x0 action 0x0
[74421.794766] ata2.00: irq_stat 0x40000001
[74421.794773] ata2.00: failed command: READ FPDMA QUEUED
[74421.794783] ata2.00: cmd 60/08:50:48:96:f8/00:00:25:00:00/40 tag 10 ncq 4096 in
[74421.794783]          res 41/40:08:48:96:f8/00:00:25:00:00/40 Emask 0x409 (media error) <F>
[74421.794788] ata2.00: status: { DRDY ERR }
[74421.794791] ata2.00: error: { UNC }
[74421.794794] ata2.00: failed command: READ FPDMA QUEUED
[74421.794802] ata2.00: cmd 60/10:58:40:af:ed/00:00:20:00:00/40 tag 11 ncq 8192 in
[74421.794802]          res 41/40:58:48:96:f8/00:00:25:00:00/40 Emask 0x9 (media error)
[74421.794806] ata2.00: status: { DRDY ERR }
[74421.794809] ata2.00: error: { UNC }
[74421.798253] ata2.00: configured for UDMA/100
[74421.798303] sd 1:0:0:0: [sdb] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[74421.798315] sd 1:0:0:0: [sdb] tag#10 Sense Key : Medium Error [current] [descriptor] 
[74421.798326] sd 1:0:0:0: [sdb] tag#10 Add. Sense: Unrecovered read error - auto reallocate failed
[74421.798337] sd 1:0:0:0: [sdb] tag#10 CDB: Read(10) 28 00 25 f8 96 48 00 00 08 00
[74421.798344] blk_update_request: I/O error, dev sdb, sector 637048392
[74421.798366] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 38, rd 451, flush 0, corrupt 0, gen 0
[74421.798425] sd 1:0:0:0: [sdb] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[74421.798435] sd 1:0:0:0: [sdb] tag#11 Sense Key : Medium Error [current] [descriptor] 
[74421.798444] sd 1:0:0:0: [sdb] tag#11 Add. Sense: Unrecovered read error - auto reallocate failed
[74421.798453] sd 1:0:0:0: [sdb] tag#11 CDB: Read(10) 28 00 20 ed af 40 00 00 10 00
[74421.798459] blk_update_request: I/O error, dev sdb, sector 552447808
[74421.798523] ata2: EH complete

EDIT4: I'm actually using /dev/sdb:

$ sudo smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-143-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MQ01ABD...
Device Model:     TOSHIBA MQ01ABD100
Serial Number:    84EWT2U5T
LU WWN Device Id: 5 000039 5b1f852cb
Firmware Version: AX1P4M
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Apr  1 23:34:41 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 243) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1735
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5639
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       8259
 10 Spin_Retry_Count        0x0033   212   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5623
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       563
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       203
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       17892
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       23 (Min/Max 10/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       9200
198 Offline_Uncorrectable   0x0030   001   001   000    Old_age   Offline      -       255
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   080   080   000    Old_age   Always       -       8117
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       177
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 1029 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1029 occurred at disk power-on lifetime: 8257 hours (344 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 50 48 96 f8 40  Error: UNC at LBA = 0x00f89648 = 16291400

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 10 58 40 af ed 40 00      03:13:20.172  READ FPDMA QUEUED
  60 08 50 48 96 f8 40 00      03:13:16.469  READ FPDMA QUEUED
  60 08 48 40 96 f8 40 00      03:13:16.469  READ FPDMA QUEUED
  60 08 40 38 96 f8 40 00      03:13:16.469  READ FPDMA QUEUED
  60 08 38 30 96 f8 40 00      03:13:16.469  READ FPDMA QUEUED

Error 1028 occurred at disk power-on lifetime: 8257 hours (344 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 70 48 96 f8 40  Error: UNC at LBA = 0x00f89648 = 16291400

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 10 70 78 90 f8 40 00      03:13:11.731  READ FPDMA QUEUED
  60 d0 68 a8 89 f8 40 00      03:13:11.731  READ FPDMA QUEUED
  61 e0 60 60 aa 0b 40 00      03:13:11.727  WRITE FPDMA QUEUED
  61 00 58 60 a2 0b 40 00      03:13:11.723  WRITE FPDMA QUEUED
  61 00 50 60 9a 0b 40 00      03:13:11.625  WRITE FPDMA QUEUED

Error 1027 occurred at disk power-on lifetime: 8133 hours (338 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 c0 f8 bd 51 40  Error: UNC at LBA = 0x0051bdf8 = 5357048

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 58 e0 70 fa 40 40 00      00:18:59.971  READ FPDMA QUEUED
  61 08 d8 d8 45 2b 40 00      00:18:59.971  WRITE FPDMA QUEUED
  61 08 d0 d0 78 6b 40 00      00:18:59.971  WRITE FPDMA QUEUED
  61 08 c8 18 42 2b 40 00      00:18:59.971  WRITE FPDMA QUEUED
  60 08 c0 f8 bd 51 40 00      00:18:59.971  READ FPDMA QUEUED

Error 1026 occurred at disk power-on lifetime: 8133 hours (338 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 f8 bd 51 40  Error: WP at LBA = 0x0051bdf8 = 5357048

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 38 10 28 5f 6b 40 00      00:18:55.963  WRITE FPDMA QUEUED
  61 08 08 68 85 6f 40 00      00:18:55.963  WRITE FPDMA QUEUED
  60 00 00 f0 bd 51 40 00      00:18:55.946  READ FPDMA QUEUED
  60 00 f0 80 75 56 40 00      00:18:55.944  READ FPDMA QUEUED
  60 00 e8 80 73 56 40 00      00:18:55.930  READ FPDMA QUEUED

Error 1025 occurred at disk power-on lifetime: 8119 hours (338 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 b8 f8 7f 48 40  Error: UNC at LBA = 0x00487ff8 = 4751352

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 b8 f8 7f 48 40 00      01:10:35.049  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      01:10:35.017  FLUSH CACHE EXT
  61 08 98 88 4b cb 40 00      01:10:35.017  WRITE FPDMA QUEUED
  61 08 70 98 c1 0c 40 00      01:10:35.017  WRITE FPDMA QUEUED
  61 08 60 a0 45 cb 40 00      01:10:35.017  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6780         -
# 2  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

$ echo $?
64

And dmesg reports two bad sectors during the last btrfs balance operation:

$ dmesg | grep I/O.error.*sector
[74421.798344] blk_update_request: I/O error, dev sdb, sector 637048392
[74421.798459] blk_update_request: I/O error, dev sdb, sector 552447808

I remapped those bad sectors:

$ dmesg | grep I/O.error.*sector | awk '/sector/{print "sudo hdparm --yes-i-know-what-i-am-doing --repair-sector "$NF" /dev/sdb"}' | sh -x
+ sudo hdparm --yes-i-know-what-i-am-doing --repair-sector 637048392 /dev/sdb

/dev/sdb:
re-writing sector 637048392: succeeded
+ sudo hdparm --yes-i-know-what-i-am-doing --repair-sector 552447808 /dev/sdb

/dev/sdb:
re-writing sector 552447808: succeeded

EDIT5: It seems this command was enough to get more than 11G unallocated:

$ sudo btrfs balance start -musage=0 -dusage=0 -v /home
Dumping filters: flags 0x7, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
  DATA (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 95 chunks

The btrfs filesystem resize then succeeded. (Sorry, I have lost its output.)


Solution 1:

You're asking to shrink the volume by 11GiB, yet only about 6GiB (5.96GiB according to btrfs filesystem usage) is unallocated.
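
This comparison can be checked before attempting the resize. Below is a minimal sketch, assuming `btrfs filesystem usage -b` prints a `Device unallocated:` line with a raw byte count. The helper names `unallocated_bytes` and `shrink_fits` are made up for illustration, and the sample byte count is only an approximation of the 5.96GiB shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs filesystem usage -b <path>` output on stdin
# and print the raw byte count from the "Device unallocated:" line.
unallocated_bytes() {
    awk '/Device unallocated:/ { print $3; exit }'
}

# Hypothetical helper: print "yes" if at least $2 bytes are unallocated ($1),
# otherwise print "no".
shrink_fits() {
    if [ "$1" -ge "$2" ]; then echo yes; else echo no; fi
}

# Sample line standing in for the live command output (roughly 5.96GiB).
sample='    Device unallocated:        6399501271'
free=$(printf '%s\n' "$sample" | unallocated_bytes)
want=$((11 * 1024 * 1024 * 1024))   # the requested -11G, in bytes
echo "unallocated=$free requested=$want fits=$(shrink_fits "$free" "$want")"
```

On the live system, the sample line would be replaced by the output of `sudo btrfs filesystem usage -b /home`. This is only a coarse check: it says nothing about where the free space sits on the device.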

You can make more efficient use of the allocated chunks by rebalancing the volume. Running a command such as btrfs balance start /home starts that process, which may take some time to complete.
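
A full balance rewrites every chunk, so a commonly used alternative is to balance in steps with the `usage` filter, compacting the emptiest chunks first. Below is a minimal sketch, assuming thresholds of 0, 10, 25 and 50 percent (arbitrary choices) and a hypothetical `balance_steps` helper that only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical helper: balance <mountpoint> in steps, one pass per usage
# threshold given after the mount point. Prints the commands by default;
# set RUN=1 in the environment to actually execute them.
balance_steps() {
    mnt=$1
    shift
    for pct in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Stop at the first failing step instead of pressing on.
            sudo btrfs balance start "-dusage=$pct" "-musage=$pct" "$mnt" || return 1
        else
            echo "btrfs balance start -dusage=$pct -musage=$pct $mnt"
        fi
    done
}

# Dry run for /home with illustrative thresholds (percent of chunk usage).
balance_steps /home 0 10 25 50
```

Each pass only relocates chunks whose usage is at or below the threshold, so the early passes are cheap and need little free space to work with.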

However, I don't know whether that will free up enough space for such a large shrink.
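
The figures in the question do allow a rough upper bound. The data chunks hold 92.01GiB but only 80.68GiB of that is used, so a balance could in the best case return about 11.33GiB to the unallocated pool, on top of the 5.96GiB already there. Below is a small sketch of that arithmetic. The `best_case_unallocated` name is hypothetical, and the estimate ignores metadata and the fact that a balance rarely packs chunks perfectly.

```shell
#!/bin/sh
# Hypothetical helper: best-case unallocated space (GiB) after a perfect
# data balance = current unallocated + (data allocated - data used).
# Arguments: <data total GiB> <data used GiB> <unallocated GiB>
best_case_unallocated() {
    LC_ALL=C awk -v total="$1" -v used="$2" -v unalloc="$3" \
        'BEGIN { printf "%.2f\n", unalloc + (total - used) }'
}

# Values taken from `btrfs filesystem usage -T /home` in the question, in GiB.
best_case_unallocated 92.01 80.68 5.96
```

This prints 17.29, which matches the `Free (estimated)` line in the question and is more than the 11GiB being removed, so the shrink looks possible in principle once the chunks are compacted.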