LVM: "Couldn't find device with uuid" yet blkid finds the UUID

I have a SLES 11.2 PPC (3.0.58-0.6.6-ppc64) system that lost track of its volume group (containing LVs with data that is not critical but would be rather nice to get back). The disks are connected via two fibre paths from a SAN.

The problem started when I rebooted it before a planned power outage last Friday. I didn't have time to troubleshoot before bringing it down again. The volume group had previously been used successfully for about two years.

vgscan and pvscan return nothing:

# pvscan -vP
  Partial mode. Incomplete logical volumes will be processed.
    Wiping cache of LVM-capable devices
    Wiping internal VG cache
    Walking through all physical volumes
  No matching physical volumes found
# vgscan -vP
  Partial mode. Incomplete logical volumes will be processed.
    Wiping cache of LVM-capable devices
    Wiping internal VG cache
  Reading all physical volumes.  This may take a while...
    Finding all volume groups
  No volume groups found

vgcfgrestore reports it can't find the PVs:

# vgcfgrestore vgclients
  Couldn't find device with uuid PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU.
  Couldn't find device with uuid FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2.
  Cannot restore Volume Group vgclients with 2 PVs marked as missing.
  Restore failed.

Yet blkid can find those UUIDs:

# blkid -t UUID=PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU
/dev/mapper/3600a0b800029df24000011084db97741: UUID="PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU" TYPE="LVM2_member" 
/dev/sdl: UUID="PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU" TYPE="LVM2_member" 
/dev/sdw: UUID="PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU" TYPE="LVM2_member" 
# blkid -t UUID=FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2
/dev/mapper/3600a0b800029df24000017ae4f45f30b: UUID="FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2" TYPE="LVM2_member" 
/dev/sdg: UUID="FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2" TYPE="LVM2_member" 
/dev/sdr: UUID="FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2" TYPE="LVM2_member" 

/etc/lvm/backup/vgclients has all the right info and does not say the PVs are missing:

# egrep "(N1YELU|kSqUA2|dm-|ALLOC)" /etc/lvm/backup/vgclients
                    id = "PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU"
                    device = "/dev/dm-7"    # Hint only
                    status = ["ALLOCATABLE"]
                    id = "FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2"
                    device = "/dev/dm-12"   # Hint only
                    status = ["ALLOCATABLE"]

I confirmed on the SAN that the volumes dedicated (and named) for LVM on this server are presented to it, and that the identifiers (ending in f30b and 7741) match on the SAN and on the server:

# multipath -ll | egrep -A5 "(f30b|7741)"
3600a0b800029df24000017ae4f45f30b dm-7 IBM,1814      FAStT
size=575G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 6:0:0:1   sdr  65:16  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 5:0:0:1   sdg  8:96   active ghost running
--
3600a0b800029df24000011084db97741 dm-12 IBM,1814      FAStT
size=834G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 5:0:0:7   sdl  8:176  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 6:0:0:7   sdw  65:96  active ghost running

Neither device has a partition table (by design):

# fdisk -l /dev/dm-7 /dev/dm-12 | grep table
Disk /dev/dm-7 doesn't contain a valid partition table
Disk /dev/dm-12 doesn't contain a valid partition table

I am able to read from the devices directly:

# dd if=/dev/dm-7 of=/tmp/a bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.00121051 s, 846 kB/s
# strings /tmp/a
LABELONE
LVM2 001FXfSAOP9hODgtl0Ihfx2jXTnHUkSqUA2

I have tried rebooting and also deleting sd(r|g|l|w) and dm-(7|12) and rescanning, to no effect.
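
(For the record, the delete-and-rescan attempt looked roughly like this sketch; the host numbers 5 and 6 are taken from the multipath -ll output above, and the exact device names may differ on another system.)

# echo 1 > /sys/block/sdr/device/delete   # likewise for sdg, sdl, sdw
# multipath -f 3600a0b800029df24000017ae4f45f30b
# multipath -f 3600a0b800029df24000011084db97741
# echo "- - -" > /sys/class/scsi_host/host5/scan
# echo "- - -" > /sys/class/scsi_host/host6/scan
# multipath -v2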

I've tried recreating the PV using the backup values, but it still says it can't find them.

# pvcreate --uuid "PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU" --restorefile /etc/lvm/backup/vgclients /dev/mapper/3600a0b800029df24000011084db97741 -t
  Test mode: Metadata will NOT be updated and volumes will not be (de)activated.
  Couldn't find device with uuid PyKfIa-cCs9-gBoh-Qb50-yOw4-dHQw-N1YELU.
  Couldn't find device with uuid FXfSAO-P9hO-Dgtl-0Ihf-x2jX-TnHU-kSqUA2.
  Device /dev/mapper/3600a0b800029df24000011084db97741 not found (or ignored by filtering).

Here is my lvm.conf, though as far as I know the only change I've made is to increase the log level:

# egrep -v "^( *#|$)" /etc/lvm/lvm.conf
devices {
    dir = "/dev"
    scan = [ "/dev" ]
    preferred_names = [ ]
    filter = [ "a|^/dev/sda$|", "r/.*/" ]
    cache = "/etc/lvm/.cache"
    write_cache_state = 1
    sysfs_scan = 1      
    md_component_detection = 1
    ignore_suspended_devices = 0
}
log {
    verbose = 0
    syslog = 1
    overwrite = 0
    level = 2

    indent = 1
    command_names = 0
    prefix = "  "
}
backup {
    backup = 1
    backup_dir = "/etc/lvm/backup"
    archive = 1
    archive_dir = "/etc/lvm/archive"

    retain_min = 10
    retain_days = 30
}
shell {
    history_size = 100
}
global {

    umask = 077
    test = 0
    units = "h"
    activation = 1
    proc = "/proc"
    locking_type = 3
    fallback_to_clustered_locking = 1
    fallback_to_local_locking = 1
    locking_dir = "/var/run/lvm/lock"
}
activation {
    missing_stripe_filler = "/dev/ioerror"
    reserved_stack = 256
    reserved_memory = 8192
    process_priority = -18
    mirror_region_size = 512
    readahead = "auto"
    mirror_log_fault_policy = "allocate"
    mirror_device_fault_policy = "remove"

    udev_rules = 1
    udev_sync = 1
}
dmeventd {
    mirror_library = "libdevmapper-event-lvm2mirror.so"
    snapshot_library = "libdevmapper-event-lvm2snapshot.so"
}

So what gives? Where'd my VG go and how do I get it back?


Solution 1:

A document in the Novell knowledge base seems to apply here: it explains that on SLES, LVM by default does not scan multipath devices, and so will never see them in this situation.

To resolve the issue, you can implement the workaround Novell gives:

In /etc/lvm/lvm.conf, in the devices section, change the filter to read:

filter = [ "a|/dev/sda.*|", "a|/dev/disk/by-id/dm-uuid-.*mpath-.*|", "r|.*|"]

(This is for SLES 11. For other versions, see the KB article linked.)
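
After changing the filter, it is probably worth clearing LVM's persistent device cache before rescanning, since this configuration has write_cache_state = 1 and a stale /etc/lvm/.cache could keep the old (empty) scan results around. A minimal sketch:

# rm /etc/lvm/.cache
# pvscan -v
# vgscan -v
# vgchange -ay vgclients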

Solution 2:

vgreduce --removemissing helped in my case when I was having the same issue.

Here is how I created the issue in the first place.

I was going to extend an existing logical volume, but midway through that, from another terminal, I ran fdisk on the device that was to be used for the extension. It had an old partition table that I decided to delete. Since I had already added this disk to the volume group (though I had not yet extended the logical volume itself), the physical volume no longer existed, and so the missing-UUID problem appeared.

To solve it, I ran vgreduce --removemissing and then everything was OK.
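
For anyone else in the same spot, the sequence is roughly this (using the questioner's VG name as a stand-in; the first run uses LVM's --test option to preview the change without updating any metadata):

# vgreduce --test --removemissing vgclients
# vgreduce --removemissing vgclients
# vgs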