XenServer 6.2 VM 100% CPU and 100% memory usage and won't boot

I had this problem today after a power failure.

I had two VMs, and one of them wouldn't boot and was using 100% of CPU and 100% of memory.

Since it was very difficult to solve (at least for me), I want to detail here the steps I took to fix it, mixing several tutorials.


I had this problem too. This helped me:

It's possible that the problem was GRUB not sending a video signal.

I've seen many threads about this, and it is highly probable that my VMs were stuck at the GRUB screen where you MUST select the OS to boot (given that they booted after I pressed Enter).

https://askubuntu.com/questions/372164/how-to-load-ubuntu-server-automatically-in-grub

This has also happened to me after power failures on a non-virtualized PC.

I just clicked in the console area, hit Enter, and it started booting.
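
If the root cause really is GRUB sitting at its menu after an unclean shutdown, it can be prevented. On Ubuntu, GRUB's "recordfail" logic makes the menu wait forever after a failed/unclean boot unless a timeout is configured. A sketch of `/etc/default/grub` (this assumes the recordfail behavior is the cause, which I haven't confirmed for this VM):

```shell
# /etc/default/grub -- sketch, assuming the hang is Ubuntu's "recordfail"
# behavior (after an unclean shutdown, GRUB waits at the menu indefinitely).
GRUB_DEFAULT=0
GRUB_TIMEOUT=2
# Without this, a recorded boot failure sets the menu timeout to -1 (forever):
GRUB_RECORDFAIL_TIMEOUT=2
```

After editing, run `sudo update-grub` so the change lands in `/boot/grub/grub.cfg`.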


First, I tried using the Ubuntu Server installation CD: I went to "Rescue a broken system" and tried to reinstall GRUB, but it failed.

Then I shut down the failing VM with Force Shutdown.

Second, I listed the VMs from the XenServer host console:

[root@xen01 ~]# xe vm-list
uuid ( RO)           : d56d5ae8-62de-5e7e-41f9-1bd707d727d9
     name-label ( RW): fdev-appgw
    power-state ( RO): halted


uuid ( RO)           : 87aba275-0e05-4160-bebf-efc85fe93386
     name-label ( RW): fdev-tracker
    power-state ( RO): halted


uuid ( RO)           : c81439c2-a345-4f04-947e-34554718ce7e
     name-label ( RW): Control domain on host: fdev-xen01
    power-state ( RO): running

fdev-tracker was the one failing.

I listed its disks. I must admit I don't know why there are two disks here, since I'm relatively new to Linux, but I used the first one, the one with device: hdb.

[root@xen01 ~]# xe vbd-list vm-name-label=fdev-tracker
uuid ( RO)             : d461e06d-9cc3-7762-f141-0b3d2abe7b3c
          vm-uuid ( RO): 87aba275-0e05-4160-bebf-efc85fe93386
    vm-name-label ( RO): fdev-tracker
         vdi-uuid ( RO): 92dd9489-b450-4766-8853-b8b2fc9597ad
            empty ( RO): false
           device ( RO): hdb


uuid ( RO)             : 969fc0c8-1fcf-ed2c-ed6e-a71dc3c359d9
          vm-uuid ( RO): 87aba275-0e05-4160-bebf-efc85fe93386
    vm-name-label ( RO): fdev-tracker
         vdi-uuid ( RO): ba9e2ed8-c9db-4f95-8f14-2d51c99ea992
            empty ( RO): false
           device ( RO): hdd

Next, I ran these commands to make the disk mountable from my other Linux VM. As I understand it, vbd-create creates a new virtual block device that links the VDI to a VM, and vbd-plug hot-plugs it into that running VM. Please note that d56d5ae8-62de-5e7e-41f9-1bd707d727d9 is the UUID of the working VM (I ran into problems earlier because the tutorial wasn't clear about this) and 92dd9489-b450-4766-8853-b8b2fc9597ad is the UUID of the failing machine's VDI.

[root@xen01 ~]# xe vbd-create vm-uuid=d56d5ae8-62de-5e7e-41f9-1bd707d727d9 vdi-uuid=92dd9489-b450-4766-8853-b8b2fc9597ad device=autodetect
91022555-2b86-4faf-cce1-eb62efc8aab7

It outputs a UUID, which I then used to plug the disk into the working machine:

[root@xen01 ~]# xe vbd-plug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
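
The two attach steps can be wrapped in a small helper. This is only a sketch (the function name and echo-only style are mine, not XenServer's): it prints the xe commands to run in dom0 rather than running them, since vbd-create prints the new VBD's UUID, which vbd-plug then needs.

```shell
# Hypothetical helper: print the xe commands that attach a broken VM's VDI
# to a healthy "rescue" VM. It only echoes; paste the output into dom0.
attach_vdi_to_rescue_vm() {
    rescue_vm_uuid="$1"   # UUID of the working VM (from: xe vm-list)
    vdi_uuid="$2"         # VDI UUID of the broken VM's disk (from: xe vbd-list)
    # vbd-create prints the new VBD's UUID on stdout; feed that to vbd-plug.
    echo "xe vbd-create vm-uuid=$rescue_vm_uuid vdi-uuid=$vdi_uuid device=autodetect"
    echo "xe vbd-plug uuid=<UUID printed by vbd-create>"
}
```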

Then I SSHed into the working VM and opened parted:

jsivil@appgw:/proc$ sudo parted
GNU Parted 2.3
Using /dev/xvda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print devices                                                    
/dev/xvda (10,7GB)
/dev/xvdb (21,5GB)
(parted) quit                  

/dev/xvdb, the 21 GB one, is the failing VM's disk.

I tried running fsck on it:

jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/xvdb
fsck from util-linux 2.20.1
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdb
/dev/xvdb: 
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

But I remembered that when I installed that VM, the disk had two partitions: one with the whole filesystem (ext4) and one for swap. So fsck was failing because I was pointing it at the whole disk, where there is no filesystem superblock, instead of at a partition.

Then I found another tutorial that used a program called kpartx. I didn't have it, so I ran:

sudo apt-get install kpartx

I then did:

jsivil@appgw:/proc$ sudo kpartx -a /dev/xvdb

kpartx reads the disk's partition table and creates a device-mapper entry for each partition; they show up under /dev/mapper/:

jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/   # pressed Tab here
control  xvdb1    xvdb2    xvdb5

So I ran fsck on each of the xvdb* entries:

jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb1
fsck from util-linux 2.20.1
/dev/mapper/xvdb1: Updating bad block inode.

      126881 inodes used (10.05%, out of 1262320)
          65 non-contiguous files (0.1%)
         120 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 117890/29
      778957 blocks used (15.43%, out of 5047040)
           0 bad blocks
           1 large file

       99695 regular files
       17528 directories
          55 character device files
          25 block device files
           0 fifos
          28 links
        9564 symbolic links (8869 fast symbolic links)
           5 sockets
------------
      126900 files
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb
xvdb1  xvdb2  xvdb5  
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb2
fsck from util-linux 2.20.1
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/mapper/xvdb2
Could this be a zero-length partition?
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb5
fsck from util-linux 2.20.1
fsck: fsck.swap: not found
fsck: error 2 while executing fsck.swap for /dev/mapper/xvdb5
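
The per-partition checks can also be looped. A sketch (the function and DRY_RUN flag are my own naming, not from any tutorial) that runs fsck on whichever mapped partitions you pass it, or just prints the commands when DRY_RUN=1:

```shell
# Sketch: fsck each mapped partition passed as an argument.
# Pass only real filesystems (here that means skipping xvdb2 and xvdb5);
# with DRY_RUN=1 it only prints the commands instead of running them.
fsck_mapped_partitions() {
    for part in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "fsck -p -c -v -f $part"
        else
            fsck -p -c -v -f "$part"
        fi
    done
}
```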

It seems fsck failed on xvdb2 because that is just the extended-partition container holding the logical swap partition (which explains why I saw a "third" partition when I expected only two), so there is no filesystem in it to check; and xvdb5 is the swap itself, which fsck can't check either. So neither error mattered. Next, I tried to mount the filesystem to see if my files were still there (I had already seen them from the Ubuntu Server CD, but I was curious).

I cd'ed to /run/shm and mounted the partition there:
jsivil@appgw:/run/shm$ mkdir /run/shm/a
jsivil@appgw:/run/shm$ sudo mount -t ext4 /dev/mapper/xvdb1 a

I cd'ed into "a" and everything was there. Then I unmounted it.

Next, I returned to the primary XenServer guide and tried to unplug the disk from the working VM:

[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
The VM rejected the attempt to detach the device.
type: VBD
ref: 91022555-2b86-4faf-cce1-eb62efc8aab7
msg:

It seems some of the previous steps left the disk in an "in use" state inside the working VM, most likely the kpartx mappings; running kpartx -d /dev/xvdb after unmounting would probably have released them.

So I rebooted the working VM and tried again, but it was still rebooting, so I got another error. I waited until it finished booting and was then able to unplug and destroy the VBD:

[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7 
You attempted an operation on a VM which requires PV drivers to be installed but the drivers were not detected.
vm: d56d5ae8-62de-5e7e-41f9-1bd707d727d9 (fdev-appgw)
[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7 
[root@xen01 ~]# xe vbd-destroy  uuid=91022555-2b86-4faf-cce1-eb62efc8aab7 
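
For reference, the cleanup on the dom0 side boils down to unplug-then-destroy, and it only works once nothing inside the rescue VM holds the disk (filesystem unmounted, kpartx mappings removed). Another echo-only sketch, with my own hypothetical naming:

```shell
# Sketch: print the xe commands that detach and delete the temporary VBD.
# Unmount and run `kpartx -d` inside the rescue VM first, or vbd-unplug
# fails with "The VM rejected the attempt to detach the device."
detach_rescue_vbd() {
    vbd_uuid="$1"   # VBD UUID printed earlier by vbd-create
    echo "xe vbd-unplug uuid=$vbd_uuid"
    # vbd-destroy removes only the VBD object, not the VDI; the data stays.
    echo "xe vbd-destroy uuid=$vbd_uuid"
}
```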

After all this, I tried to boot the failing machine, but no luck :( The same problem was there: 100% CPU and 100% memory.

I attached the Ubuntu Server installation CD (I have it in my ISO storage) and forced a reboot. I entered "Rescue a broken system" and mounted xvdb1 as my root filesystem. Then I went to "Reinstall GRUB". Remember that this had failed before, but this time it succeeded.

I ejected the CD and rebooted.

My VM is working again!!!

This seems to be a rare problem, because I couldn't find much information about it: only one other question on Server Fault, with no working answer (something about destroying the VM domain).

I hope this helps somebody. Feel free to edit this and clarify the concepts I'm unsure about.