XenServer 6.2 VM 100% CPU and 100% memory usage and won't boot
I ran into this problem today after a power failure.
I had two VMs, and one of them wouldn't boot and was using 100% of its CPU and 100% of its memory.
Since it was quite difficult to solve (at least for me), I want to detail here the steps I took to fix it, pieced together from several tutorials.
I had this problem too. This helped me:
It's possible that the problem was GRUB not sending a video signal.
I've seen many threads about this, and it is highly probable that the VMs were stuck at the GRUB screen where you must select which OS to boot (given that they booted after pressing Enter):
https://askubuntu.com/questions/372164/how-to-load-ubuntu-server-automatically-in-grub
This happened to me after power failures too, on a non-virtualized PC.
I just clicked in the blank console area, hit Enter, and it started booting.
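To keep this from happening again, the fix from that link is to tell GRUB not to wait at the menu after a failed boot. A minimal sketch of /etc/default/grub on Ubuntu (the timeout values here are just examples):
GRUB_DEFAULT=0
GRUB_TIMEOUT=2
# Ubuntu-specific: after an unclean shutdown GRUB normally waits forever at the
# menu, which is invisible on a headless VM; this caps that wait as well.
GRUB_RECORDFAIL_TIMEOUT=2
Then run sudo update-grub to apply it.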
First, I tried to use the Ubuntu Server installation CD, go to "Rescue a broken system", and reinstall GRUB from there, but it failed.
Then I shut down the failing VM with Force Shutdown.
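For reference, the same force shutdown can be done from the XenServer host CLI; a sketch, using my VM's name-label:
xe vm-shutdown force=true vm=fdev-tracker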
Next, I listed the VMs from the XenServer host console (via XenCenter):
[root@xen01 ~]# xe vm-list
uuid ( RO)           : d56d5ae8-62de-5e7e-41f9-1bd707d727d9
     name-label ( RW): fdev-appgw
    power-state ( RO): halted

uuid ( RO)           : 87aba275-0e05-4160-bebf-efc85fe93386
     name-label ( RW): fdev-tracker
    power-state ( RO): halted

uuid ( RO)           : c81439c2-a345-4f04-947e-34554718ce7e
     name-label ( RW): Control domain on host: fdev-xen01
    power-state ( RO): running
fdev-tracker was the one failing.
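As a side note, xe can print just that VM's UUID directly, which saves copy-pasting from the full listing (a sketch using the same name-label):
[root@xen01 ~]# xe vm-list name-label=fdev-tracker params=uuid --minimal
87aba275-0e05-4160-bebf-efc85fe93386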
I listed its disks. I must admit I don't know why there are two disks here, since I'm a relative newbie to Linux, but I used the first one, the one that says device: hdb.
[root@xen01 ~]# xe vbd-list vm-name-label=fdev-tracker
uuid ( RO)             : d461e06d-9cc3-7762-f141-0b3d2abe7b3c
          vm-uuid ( RO): 87aba275-0e05-4160-bebf-efc85fe93386
    vm-name-label ( RO): fdev-tracker
         vdi-uuid ( RO): 92dd9489-b450-4766-8853-b8b2fc9597ad
            empty ( RO): false
           device ( RO): hdb

uuid ( RO)             : 969fc0c8-1fcf-ed2c-ed6e-a71dc3c359d9
          vm-uuid ( RO): 87aba275-0e05-4160-bebf-efc85fe93386
    vm-name-label ( RO): fdev-tracker
         vdi-uuid ( RO): ba9e2ed8-c9db-4f95-8f14-2d51c99ea992
            empty ( RO): false
           device ( RO): hdd
After that I ran the following commands to attach the disk to my other (working) Linux VM. As far as I understand, xe vbd-create creates a new virtual block device (VBD) that links the failing VM's disk image (VDI) to the working VM, and xe vbd-plug then hot-plugs it into that VM. Please note that d56d5ae8-62de-5e7e-41f9-1bd707d727d9 is the UUID of the working VM; I had problems before because the tutorial wasn't clear about this. 92dd9489-b450-4766-8853-b8b2fc9597ad is the UUID of the failing machine's VDI.
[root@xen01 ~]# xe vbd-create vm-uuid=d56d5ae8-62de-5e7e-41f9-1bd707d727d9 vdi-uuid=92dd9489-b450-4766-8853-b8b2fc9597ad device=autodetect
91022555-2b86-4faf-cce1-eb62efc8aab7
It outputs a UUID, which I then used to plug the disk into the working machine:
[root@xen01 ~]# xe vbd-plug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
After that I SSHed into the working VM and ran parted:
jsivil@appgw:/proc$ sudo parted
GNU Parted 2.3
Using /dev/xvda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print devices
/dev/xvda (10,7GB)
/dev/xvdb (21,5GB)
(parted) quit
/dev/xvdb, the 21 GB one, is the disk of the failing VM.
I tried to run fsck on it:
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/xvdb
fsck from util-linux 2.20.1
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdb
/dev/xvdb:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
But I remembered that when I installed the VM, the disk had two partitions: one for the root filesystem (ext4) and one for swap. That explains the error: /dev/xvdb is the whole disk, containing a partition table rather than a filesystem, so fsck can't find a superblock on it.
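In hindsight, printing the partition table first would have confirmed this; a sketch, run from the working VM:
# Show the partitions inside the attached disk (a partitioned disk has no
# filesystem superblock of its own, hence the fsck error above):
sudo parted /dev/xvdb print
# or, equivalently:
sudo fdisk -l /dev/xvdb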
Then I found another tutorial that used a program called kpartx. I didn't have it installed, so I ran:
sudo apt-get install kpartx
I then did:
jsivil@appgw:/proc$ sudo kpartx -a /dev/xvdb
As far as I understand, kpartx reads the disk's partition table and creates a device-mapper node for each partition; they show up under /dev/mapper/:
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/   # pressed Tab here
control  xvdb1  xvdb2  xvdb5
So I ran fsck on each of the xvdb* devices:
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb1
fsck from util-linux 2.20.1
/dev/mapper/xvdb1: Updating bad block inode.
126881 inodes used (10.05%, out of 1262320)
65 non-contiguous files (0.1%)
120 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 117890/29
778957 blocks used (15.43%, out of 5047040)
0 bad blocks
1 large file
99695 regular files
17528 directories
55 character device files
25 block device files
0 fifos
28 links
9564 symbolic links (8869 fast symbolic links)
5 sockets
------------
126900 files
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb2
fsck from util-linux 2.20.1
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/mapper/xvdb2
Could this be a zero-length partition?
jsivil@appgw:/proc$ sudo fsck -p -c -v -f /dev/mapper/xvdb5
fsck from util-linux 2.20.1
fsck: fsck.swap: not found
fsck: error 2 while executing fsck.swap for /dev/mapper/xvdb5
I didn't know at the time why it failed on xvdb2, nor what it was (to me there should have been only two partitions), but the numbering gives it away: xvdb2 is presumably the extended partition container holding the logical swap partition xvdb5, not a real filesystem, which would explain the short-read error. xvdb5 was swap, and there is no fsck for swap, so that error wasn't important either. Next, I tried to mount the partition to see whether my files were there (I had already been able to see them using the Ubuntu Server CD, but I was curious).
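For completeness: since there is no fsck for swap, a damaged swap signature would simply be recreated. A sketch, only needed if the swap area were actually broken:
# Rewrite the swap signature on the logical partition.
sudo mkswap /dev/mapper/xvdb5
# Note: mkswap assigns a new UUID, so /etc/fstab on the failing VM would
# need updating if it references swap by UUID.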
Then I cd'ed to /run/shm, created a mountpoint, and mounted the partition:
jsivil@appgw:/run/shm$ mkdir /run/shm/a
jsivil@appgw:/run/shm$ sudo mount -t ext4 /dev/mapper/xvdb1 a
I cd'ed into "a" and everything was there. Then I unmounted it.
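Looking back, I suspect this is also the point where I should have cleaned up, because the mappings created by kpartx keep the disk busy. A sketch of the cleanup:
sudo umount /run/shm/a      # unmount the test mountpoint
sudo kpartx -d /dev/xvdb    # remove the /dev/mapper/xvdb* mappings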
Next, I returned to the primary XenServer guide and tried to unplug the disk from the VM:
[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
The VM rejected the attempt to detach the device.
type: VBD
ref: 91022555-2b86-4faf-cce1-eb62efc8aab7
msg:
It seems some of the previous steps left the disk "in use" on the working VM; my guess is the leftover kpartx mappings (or the mount) were still holding it, which is what the cleanup sketch above is for.
So I rebooted the working VM and tried again, but the VM was still rebooting, so I got another error. I waited until the reboot finished and was then able to do it:
[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
You attempted an operation on a VM which requires PV drivers to be installed but the drivers were not detected.
vm: d56d5ae8-62de-5e7e-41f9-1bd707d727d9 (fdev-appgw)
[root@xen01 ~]# xe vbd-unplug uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
[root@xen01 ~]# xe vbd-destroy uuid=91022555-2b86-4faf-cce1-eb62efc8aab7
After all this, I tried to boot the failing machine, but had no luck :( The same problem was there: 100% CPU and 100% memory.
I inserted the Ubuntu Server installation CD (I have it in my ISO storage) and forced a reboot. I entered "Rescue a broken system" and mounted xvdb1 as my root filesystem. After that, I went to "Reinstall GRUB". Remember that it failed previously, but this time it succeeded.
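For reference, "Reinstall GRUB" in rescue mode does roughly the following, which you could also run by hand from a rescue shell (a sketch, assuming the root partition is xvdb1 as above):
# Mount the root filesystem and the pseudo-filesystems GRUB needs:
mount /dev/xvdb1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/xvdb   # reinstall GRUB to the disk's MBR
update-grub              # regenerate grub.cfg
exit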
I ejected the CD and rebooted.
My VM is working again!!!
This seems to be a rare problem, because I couldn't find much information about it, only one other question on Server Fault, with no working answer (suggestions like destroying the VM domain and such).
I hope this helps somebody. Feel free to edit this and clarify the concepts I'm unsure about.