Bootable but kinda broken SSD -> Cloned SSD is not bootable
I have a 120GB SSD with Ubuntu installed that seems to be slowly failing. Every once in a while the root filesystem is mounted read-only and I need to run fsck in some sort of recovery console.
So I used CloneZilla 2.7.2-39 to clone the entire SSD to a bigger one (500GB). I used the default "Expert" options, except that I unchecked -r ("Resize the filesystem to fit partition size of the target system") and checked -rescue ("Continue reading next one when disk blocks read errors") because there are blocks that cannot be read.
The cloning always works without errors at first, but at some point no further blocks can be read until the end of the cloning process. The point where it stops working seems to differ from reboot to reboot, but it's always somewhere in the middle of the second partition.
After cloning I inspected the cloned SSD on Windows:
- Disk management:
Detects the drive and reports the expected partitions (a 512MB EFI partition, a ~120GB partition, and the rest is unpartitioned space). It can't do anything with the partitions, of course, because it doesn't understand ext4, but that's to be expected.
- wmic:
wmic:root\cli>partition
BlockSize | Bootable | Description | DeviceID | DiskIndex | Index | NumberOfBlocks | PrimaryPartition | Size | StartingOffset | Type |
---|---|---|---|---|---|---|---|---|---|---|
512 | TRUE | GPT: System | Disk #2, Partition #0 | 2 | 0 | 1048576 | TRUE | 536870912 | 1048576 | GPT: System |
512 | FALSE | GPT: Unbekannt | Disk #2, Partition #1 | 2 | 1 | 233402368 | FALSE | 119502012416 | 537919488 | GPT: Unknown |
Basically confirms what the disk management shows, but it also lists the exact sizes and they seem to make sense.
... and with the CloneZilla shell:
- lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 512M 0 part
└─sda2 8:2 0 111.3G 0 part
Seems legitimate so far.
But the cloned SSD is not bootable. It's not recognized as a bootable medium. (See update 1)
- fsck /dev/sda1 shows no errors.
- fsck /dev/sda2 shows a ton of errors (as expected, since a lot of the blocks could not be read). I tried both leaving the errors and fixing them (which seems to work, as a second run then shows no errors).
- fsck /dev/sda shows "ext2fs_open2: Bad magic number in super-block [etc...]" with no option to fix it, and the suggested e2fsck commands also can't fix it.
My questions:
How is it possible that the original SSD boots fine but the clone doesn't? If the data can be read when booting, why can it not be read when cloning?
And how can I clone the SSD in such a way that the clone is also bootable? (If corrupted files end up staying corrupted or being removed by e.g. fsck, that's fine.)
Before you ask "Why?":
It's Ubuntu 18.04.1 LTS with a bunch of configuration already done. At the time I did not write down the exact steps of what was configured, because it was a very messy process to get everything to work. Rather than setting up a completely new OS (I tried that; the short story is "nothing works"), I would rather keep a backup of this slightly corrupted but otherwise working one.
Update 1:
The EFI partition on the cloned SSD has the same UUID as on the original SSD. Here's a screenshot of some data of the cloned SSD:
The cloned SSD actually is recognized as a boot option. But when I boot from it, I end up in a grub shell.
Update 2:
In the grub shell of the original SSD:
ls
: (hd0) (hd0,gpt2) (hd0,gpt1)
echo $prefix
: (hd0,gpt2)/boot/grub
ls $prefix
: gfxblacklist.txt unicode.pf2 x86_64-efi/ locale/ fonts/ grubenv grub.cfg
configfile $prefix/grub.cfg
: The screen turns black and, as far as I can tell, nothing else happens for at least 2 minutes. Pressing Ctrl+C or Esc has no effect. I cut power before waiting longer.
set
:
From CloneZilla I can mount the first partition (to "/foo") and access the files. I found a grub.cfg, but the path seems different: "/foo/EFI/ubuntu/grub.cfg"
Contents:
search.fs_uuid 64702138-591a-4535-8e60-2e2348477870 root hd2,gpt2
set prefix=($root)'/boot/grub'
configfile $prefix/grub.cfg
Not sure if that helps. The UUID it refers to is not the UUID of the second partition, btw! That would be bf07a56c-4d8d-9952-2bd16756d2b7.
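One way to cross-check that UUID against the actual partitions, as a rough sketch: pull the argument of search.fs_uuid out of the stub grub.cfg and compare it with what blkid reports for the root partition. The function name stub_fs_uuid is made up for illustration; /foo and /dev/sda2 are the mount point and device mentioned above.

```shell
#!/bin/sh
# Sketch: print the filesystem UUID that a stub grub.cfg searches for.
# Reads the file contents on stdin and prints the first search.fs_uuid argument.
stub_fs_uuid() {
    awk '$1 == "search.fs_uuid" { print $2; exit }'
}

# Example with the first line of the contents quoted above:
echo 'search.fs_uuid 64702138-591a-4535-8e60-2e2348477870 root hd2,gpt2' | stub_fs_uuid

# On the real system (needs root; paths and devices as described above):
#   stub_fs_uuid < /foo/EFI/ubuntu/grub.cfg
#   blkid -s UUID -o value /dev/sda2
```

If the two values differ, the stub is searching for a filesystem that is not the one on the second partition.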
In the grub shell of the clone:
ls
: (hd0) (hd0,gpt2) (hd0,gpt1)
echo $prefix
: (hd0,gpt2)/boot/grub
ls $prefix
: error: directory is encrypted.
set
:
There are also a few lines at the top that don't fit on the screen. I can't find a way to scroll up, though. "Page Up" and "Page Down" don't work.
Solution 1:
UEFI booting relies on an EFI executable being registered as a boot entry in the firmware's non-volatile memory to determine what to boot (unless the file is placed under the "fallback" path, which is not the default for grub-install). The boot entry in turn relies on the partition (not filesystem) UUID to determine which partition it should look at to find the executable at the specified path.
Assuming CloneZilla is "smart" enough to change the partition UUIDs in the partition table to avoid UUID collisions (or did so unintentionally because it cloned partition by partition instead of the whole disk), the UEFI will no longer be able to find the EFI executable (i.e. the bootloader; probably grub, or shim) that was registered with it.
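To check whether that is what happened, one could compare the partition GUID stored in the boot entry with the one the clone's EFI partition actually has. A minimal sketch, assuming efibootmgr -v prints the device path in its usual HD(<part>,GPT,<guid>,...) form; the function names are made up for illustration, and the entry label "ubuntu" is an assumption:

```shell
#!/bin/sh
# Sketch: pull the partition GUID out of one `efibootmgr -v` entry line.
# Assumes the usual "HD(<part>,GPT,<guid>,<start>,<size>)" device-path text.
entry_partuuid() {
    printf '%s\n' "$1" |
        sed -n 's/.*HD([0-9]*,GPT,\([0-9A-Fa-f-]*\),.*/\1/p' |
        tr 'A-F' 'a-f'
}

# Succeeds if the entry line ($1) points at the partition GUID given in $2.
same_partition() {
    [ "$(entry_partuuid "$1")" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# Example use on the live system (needs root; /dev/sda1 is the clone's EFI
# partition as shown by lsblk in the question, "ubuntu" is an assumed label):
#   line=$(efibootmgr -v | grep -i 'ubuntu' | head -n 1)
#   same_partition "$line" "$(blkid -s PARTUUID -o value /dev/sda1)" \
#       && echo "entry matches clone" || echo "entry points elsewhere"
```

If the entry points elsewhere, the firmware is looking for a partition that does not exist on the clone.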
Therefore, either look up how to re-register the boot entry with the UEFI using efibootmgr, or run grub-install appropriately again (which can be tricky, as there are quite a few variables). Perhaps the easiest way: remove the source drive (and avoid having it plugged in together with the clone again), check the partition UUID in the boot entry with efibootmgr -v, then change the one in the partition table of the clone to match, e.g. with gdisk.
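As a rough sketch of the first and last option, the script below only prints the commands instead of running them, so they can be reviewed first. It uses sgdisk, the scriptable counterpart of gdisk, rather than interactive gdisk. The loader path \EFI\ubuntu\shimx64.efi, the label "ubuntu-clone" and the function name are assumptions; /dev/sda and partition 1 come from the lsblk output in the question, and the GUID is a placeholder for the value shown by efibootmgr -v.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for the two simpler fixes, do not run them.
# $1 = clone disk, $2 = EFI partition number, $3 = GUID from the existing entry
print_fix_commands() {
    disk=$1; part=$2; guid=$3
    loader='\EFI\ubuntu\shimx64.efi'   # assumed loader path, verify on the ESP
    # Option A: register a fresh boot entry pointing at the clone's EFI partition.
    printf "efibootmgr -c -d %s -p %s -L 'ubuntu-clone' -l '%s'\n" \
        "$disk" "$part" "$loader"
    # Option B: give the clone's EFI partition the GUID the existing entry expects.
    printf 'sgdisk --partition-guid=%s:%s %s\n' "$part" "$guid" "$disk"
}

# Placeholder GUID, not a value from the post.
print_fix_commands /dev/sda 1 00000000-0000-0000-0000-000000000000
```

Only one of the two is needed. Either way, run the real command as root and with the source drive unplugged, so that no two partitions share a GUID.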