grub2: update-grub failed to get canonical path of `none'

$ sudo update-grub
/usr/sbin/grub-probe: error: failed to get canonical path of `none'.

This is the situation I'm in after an interrupted upgrade from vivid (Ubuntu 15.04) to wily (15.10).

[edit]

Delving further into the grub source code, the second of these commands appears to be the one that fails:

$ grub-probe --target=device /
/dev/md2
$ grub-probe --target=device /boot
grub-probe: error: failed to get canonical path of `none'.

The following commands also give the error:

$ sudo grub-probe -t device /boot/grub
grub-probe: error: failed to get canonical path of `none'.
$ sudo grub-probe -t fs_uuid /boot/grub
grub-probe: error: failed to get canonical path of `none'.
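
Since grub-probe works from whatever device is recorded as backing the mount point, the literal `none' presumably comes from the source recorded for /boot. A quick way to check that (a diagnostic sketch, not something I captured at the time) is:

$ grep ' /boot ' /proc/mounts
$ df /boot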

[/edit]

I don't have /boot/grub/grub.cfg present (or the older /boot/grub/menu.lst).

It was impossible to install a boot loader during grub configuration:

http://imgur.com/a/LqPa8

Grub failed to install on any of the offered options (/dev/sda, /dev/sdb or /dev/md2).

md1 wasn't offered as an option, even though it is currently mounted at /boot:

$ cat /etc/fstab
proc /proc proc defaults 0 0
/dev/md/0 none swap sw 0 0
/dev/md/1 /boot ext3 defaults 0 0
/dev/md/2 / ext4 defaults 0 0
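
The fstab entries use the /dev/md/N names rather than /dev/mdN. On the off-chance the canonicalisation problem came from those names, it's worth checking what they resolve to (sketch only; I'd expect them to be plain symlinks to the mdN nodes):

$ ls -l /dev/md/
$ readlink -f /dev/md/1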

In any case, I've got a RAID setup across /dev/sda and /dev/sdb:

$ sudo fdisk -l
Disk /dev/sda: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00032e61

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1          2048   8390656   8388609     4G fd Linux raid autodetect
/dev/sda2       8392704   9441280   1048577   512M fd Linux raid autodetect
/dev/sda3       9443328 937701040 928257713 442.6G fd Linux raid autodetect


Disk /dev/sdb: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00074c3d

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sdb1          2048   8390656   8388609     4G fd Linux raid autodetect
/dev/sdb2       8392704   9441280   1048577   512M fd Linux raid autodetect
/dev/sdb3       9443328 937701040 928257713 442.6G fd Linux raid autodetect


Disk /dev/md2: 442.5 GiB, 475133575168 bytes, 927995264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md0: 4 GiB, 4292804608 bytes, 8384384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md1: 511.7 MiB, 536543232 bytes, 1047936 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
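
fdisk doesn't show which partitions make up which array, so to confirm the RAID layout the usual checks are something like this (a sketch, no output captured):

$ cat /proc/mdstat
$ sudo mdadm --detail /dev/md1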

GRUB appears to be installed in the MBR (check taken from another answer on Server Fault):

$ sudo dd bs=512 count=1 if=/dev/sda 2>/dev/null | strings
ZRr=
`|f 
\|f1
GRUB 
Geom
Hard Disk
Read
 Error
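
For completeness, the same check against the second disk should show whether the boot code is on /dev/sdb as well (no output captured, but I'd expect similar GRUB strings if it is):

$ sudo dd bs=512 count=1 if=/dev/sdb 2>/dev/null | strings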

When I run grub-emu, I just get a blank prompt.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

This is on a server with only SSH access, so I don't have the option of a live CD if grub fails!

[edit] output of df -h:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G     0   63G   0% /dev
tmpfs            13G  714M   12G   6% /run
/dev/md2        436G  178G  236G  44% /
tmpfs            63G  8.0K   63G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            63G     0   63G   0% /sys/fs/cgroup
none            436G  178G  236G  44% /boot
tmpfs            13G     0   13G   0% /run/user/0
tmpfs            13G     0   13G   0% /run/user/1002
/dev/md2        436G  178G  236G  44% /var/cache/apt/archives
none            436G  178G  236G  44% /bin
none            436G  178G  236G  44% /etc
none            436G  178G  236G  44% /initrd
none            436G  178G  236G  44% /lib
none            436G  178G  236G  44% /lib32
none            436G  178G  236G  44% /lib64
none            436G  178G  236G  44% /sbin
none            436G  178G  236G  44% /usr
none            436G  178G  236G  44% /var

[further edit] The output above reports /boot as mounted from none, which I think is the none that grub-probe is complaining about. Here's the output of mount -l, which shows two separate mount entries for /boot; I'm now investigating how to remove the second one.

$ mount -l |grep boot
/dev/md1 on /boot type ext3 (rw,relatime,data=ordered)
none on /boot type aufs (rw,relatime,si=6ea5aad590be877d)
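
findmnt makes the stacking clearer than mount -l, since it lists every mount recorded for the target rather than just the most recent one (a sketch, not captured at the time):

$ findmnt /boot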

Solution 1:

OK, I seem to have fixed it with the following (everything is simple in retrospect):

$ umount /boot

I tried this because I noticed that there were two mounts stacked on /boot:

$ mount -l |grep boot
/dev/md1 on /boot type ext3 (rw,relatime,data=ordered)
none on /boot type aufs (rw,relatime,si=6ea5aad590be877d)

And that the latter (the aufs mount) was shadowing the former:

$ df -h |grep boot
none            436G  178G  236G  44% /boot

After the umount, the same commands look like this:

$ mount -l |grep boot
/dev/md1 on /boot type ext3 (rw,relatime,data=ordered)
$ df -h |grep boot
/dev/md1        488M   75M  388M  17% /boot

(no idea how the second mount happened)

I was then able to reinstall grub as follows (I'm on RAID 1, which is why there are two commands, one for sda and one for sdb):

$ grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
$ grub-install /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.
$ update-grub
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-3.19.0-30-generic
Found initrd image: /boot/initrd.img-3.19.0-30-generic
Found linux image: /boot/vmlinuz-3.19.0-25-generic
Found initrd image: /boot/initrd.img-3.19.0-25-generic
done
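
As a sanity check, the probes that were failing at the start should now resolve /boot to the real device instead of erroring out (not re-run here, but I'd expect /dev/md1):

$ sudo grub-probe -t device /boot
$ sudo grub-probe -t fs_uuid /boot/grub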

Postscript

After the reboot the server came back up (I could ping it), but I found I couldn't SSH in. This turned out to be a separate problem with /dev/null (which might have gotten broken at the same time). I was able to SSH in via a separate rescue system and apply this fix: http://thesystemadministrator.net/linux-administration/sshd-deamon-failing-to-start
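
For anyone hitting the same thing: the generic repair when /dev/null has been replaced by a regular file (not necessarily identical to the linked fix) is to recreate the character device node:

$ sudo rm /dev/null
$ sudo mknod -m 666 /dev/null c 1 3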