Ubuntu VM "read-only file system" fix?

I was going to install VMware Tools on an Ubuntu Server virtual machine, but I ran into the issue of not being able to create a cdrom directory under /mnt. I then tested whether it was just a permissions issue, but I couldn't create a folder even in my home directory. Every attempt reports that it is a read-only file system. I know a little about Linux, but I'm not comfortable with it yet. Any advice would be much appreciated.

Requested Information from a comment:

username@servername:~$ mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)

To be sure, here is the same output as root:

root@server01:~# mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)



Solution 1:

Although this is a relatively old question, the answer is still the same. You have a virtual machine (running on a physical host) and some sort of storage (either shared storage – a FC SAN, iSCSI storage, an NFS share – or local storage).

With virtualisation, many virtual machines try to access the same physical resources at the same time. Due to physical limitations (number of read/write operations per second – IOPS – as well as throughput and latency), the storage may be unable to satisfy all requests from all virtual machines at once. What usually happens is that you see "SCSI retries" and failed SCSI operations in the guest operating systems of your virtual machines. If too many errors/retries accumulate in a certain period of time, the kernel sets the mounted filesystems read-only in order to prevent damage to the filesystem.

To cut the long story short: Your physical storage is not "powerful" enough. There are too many processes (virtual machines) accessing the storage system at the same time, your virtual machines do not get the response from the storage fast enough, and the filesystem goes read-only.

There are not terribly many things you can do. The obvious solution is better/additional storage. You can also modify the parameters for SCSI timeouts in the Linux kernel. Details are described, e.g., in:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465

http://www.cyberciti.biz/tips/vmware-esx-server-scsi-timeout-for-linux-guest.html

However, this will only "postpone" your problems, because the kernel only gets more time before the filesystem will be set read-only. (I.e., you do not solve the cause of the problem.)
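To illustrate the timeout tuning mentioned above, here is a minimal sketch. It assumes the guest's disk is /dev/sda; the value 180 seconds is the figure commonly suggested for VMware guests, but check the linked KB article for your environment:

```shell
# Check the current SCSI command timeout (in seconds) for the disk.
cat /sys/block/sda/device/timeout

# Raise it to 180 seconds so transient storage stalls do not trigger
# I/O errors. This setting does not survive a reboot; persist it with
# a udev rule or a boot script if it helps.
echo 180 > /sys/block/sda/device/timeout
```

Note that writing to /sys requires root, and as the answer says, this only buys the kernel more patience – it does not fix slow storage.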

My experience (several years with VMware) is that this problem only shows up with Linux guests (we're using RHEL and SLES), not with Windows servers. Also, it occurs on all sorts of storage – FC, iSCSI, local storage. For us, the most critical (and expensive) component in our virtual infrastructure is storage. (We're now using HP LeftHand with 1 Gbps iSCSI connections and have not had any storage issues since. We chose LeftHand over traditional FC solutions for its scalability.)

Solution 2:

A likely explanation is that there is a hardware problem (partial disk failure), and that the kernel remounted the root filesystem as read-only as soon as it detected the problem, in order to limit the damage. A more reliable¹ way to check the current mount options is cat /proc/mounts (grep ' / ' /proc/mounts for the root filesystem; ignore a rootfs / … line, which is an artefact of the boot process). You will presumably find that rw,errors=remount-ro has changed to ro (other options may be displayed as well).
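Concretely, the check described above looks like this (/proc/mounts is the kernel's own view of mounted filesystems, so it stays accurate even when /etc/mtab cannot be updated):

```shell
# Show the mount entry for the root filesystem; the options field
# will read "ro,..." if the kernel has remounted it read-only.
grep ' / ' /proc/mounts
```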

The kernel logs probably contain the message Remounting filesystem read-only, preceded by disk access errors. The logs normally live in /var/log/kern.log; however, if that file sits on the now read-only filesystem, the remount message will not show up there, though the preceding errors should. You can also see the latest kernel errors with the dmesg command.
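For example, you could search the kernel ring buffer directly; this works even when /var/log is unwritable (the exact error strings vary by driver, so the pattern below is only a starting point):

```shell
# Show recent kernel messages mentioning the remount or disk I/O errors.
dmesg | grep -iE 'remount|i/o error' | tail -n 20
```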

As an aside, under Ubuntu, the usual place for mount points (used by the desktop interface) is under /media (e.g. /media/cdrom0), though you can use /mnt or /mnt/cdrom if you like.

¹ mount reports from /etc/mtab. If the root filesystem is read-only, /etc/mtab can't be kept up-to-date.

Solution 3:

What happened was that there was a power failure in the data center recently, and I hadn't touched my server since. When our data center loses power, vSphere leaves Ubuntu's file system read-only until the guest is restarted. I would have tried restarting earlier, but I didn't want all of the monitoring to go crazy. I have since silenced Nagios (our monitoring service), and everything is working fine now that I have restarted the system. Thanks for all of the input. It is much appreciated.

Solution 4:

Might be obvious, but are you the "root" user when trying to do this? /mnt is owned by root and only writable by root. You should also check whether you had errors on boot: the mount output above shows that / (and thus /mnt) will be remounted read-only if the boot process sees errors (errors=remount-ro). You can remount it read/write with the mount command, but I wouldn't do that unless you're sure that whatever caused the error isn't serious.
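For reference, the remount mentioned above is done like this – but again, check dmesg for the underlying error first, since forcing a failing disk back to read-write risks filesystem corruption:

```shell
# Remount the root filesystem read-write in place (requires root).
mount -o remount,rw /
```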