GlusterFS failing to mount at boot with Ubuntu 14.04

I managed to make this work through a combination of answers in this thread and this one: GlusterFS is failing to mount on boot

As per @Dan Pisarski, edit /etc/init/mounting-glusterfs.conf so that the exec line reads:

exec start wait-for-state WAIT_FOR=networking WAITER=mounting-glusterfs-$MOUNTPOINT

As per @dialt0ne, change the entry in /etc/fstab to read:

[serverip]:[vol]  [mountpoint]  glusterfs  defaults,nobootwait,_netdev,backupvolfile-server=[backupserverip],direct-io-mode=disable  0       0

Works For Me(tm) on Ubuntu 14.04.2 LTS


I have run into the same problem on AWS on Ubuntu 12.04. Here are some things that worked for me:

  • add more fetch-attempts in your fstab

This will allow you to retry the volfile server while the network is unavailable.

  • add a backup volfile server in your fstab

This will allow you to mount the filesystem from another gluster server member if the primary is down for some reason.

  • add nobootwait in your fstab

This allows the instance to continue booting while this filesystem isn't mounted.

A sample entry from my current fstab is:

10.20.30.40:/fs1 /example glusterfs defaults,nobootwait,_netdev,backupvolfile-server=10.20.30.41,fetch-attempts=10 0 2

I have not tested this on 14.04, but it works ok for my 12.04 instances.
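
To check an existing fstab against the options listed above, something like the following may help. It is only a sketch: missing_opt is a hypothetical helper name, and it assumes the usual whitespace-separated fstab layout with the filesystem type in the third field and the options in the fourth.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every glusterfs entry
# in an fstab file that lacks a given mount option.
# Usage: missing_opt OPTION [FSTAB_PATH]   (FSTAB_PATH defaults to /etc/fstab)
missing_opt() {
    opt="$1"
    fstab="${2:-/etc/fstab}"
    awk -v opt="$opt" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $3 == "glusterfs" {
            n = split($4, opts, ",")           # options are comma-separated
            found = 0
            for (i = 1; i <= n; i++) {
                split(opts[i], kv, "=")        # match "opt" as well as "opt=value"
                if (kv[1] == opt) found = 1
            }
            if (!found) print $2               # second field is the mount point
        }
    ' "$fstab"
}
```

Running missing_opt nobootwait (or _netdev, backupvolfile-server, fetch-attempts) with no second argument reads /etc/fstab and prints nothing when every glusterfs entry already has the option.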


It's a bug

This is really a bug (static-network-up is not a job; it is an event signal).

Moreover, using the networking job, as suggested in other answers, is not the most correct solution.

So, I created this bug report and submitted a patch for this problem.

As a workaround, you can apply my proposed solution (at the end of this answer) and use the _netdev option in your fstab.

A fuller explanation follows, but you can skip it if you want.

Explanation

This is a bug in mounting-glusterfs.conf. It can add an unnecessary 30 seconds to the boot of an Ubuntu Server, or even hang the boot process.

Because of this bug, the mountall process thinks that the mount failed (you'll see "Mount failed" errors in /var/log/boot.log). So, when the nobootwait/nofail flags are not used in /etc/fstab, the bug can hang the mount process (and the boot process with it). When the nobootwait/nofail flags are used, the bug increases the boot time by about 30 seconds.
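
To confirm that a machine is hitting this, you can search the boot log for the messages in question. A minimal sketch follows: boot_mount_errors is a hypothetical helper name, and it is an assumption that the "Unknown job" message mentioned elsewhere in this thread ends up in the same file as the "Mount failed" errors.

```shell
#!/bin/sh
# Hypothetical helper: show (with line numbers) the lines of a boot log
# that contain "Mount failed" or "Unknown job".
# Usage: boot_mount_errors [LOG_PATH]   (LOG_PATH defaults to /var/log/boot.log)
boot_mount_errors() {
    log="${1:-/var/log/boot.log}"
    # -n prefixes each match with its line number; -E enables the | alternation
    grep -n -E 'Mount failed|Unknown job' "$log"
}
```

grep exits non-zero when nothing matches, so the function can also be used in an if test.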

The bug is caused by the following errors:

  • There is no need to wait for the network to be up. Ubuntu itself has the _netdev mount flag, which retries the mount each time an interface comes up;
  • However, it's necessary to wait for the GlusterFS Server daemon (for mounts using localhost);
    • This was implemented in an old commit in the GlusterFS upstream project. However, this commit was overwritten;
  • It's wrong to use the wait-for-state upstart task to wait for a signal. It's meant to wait for a job. static-network-up is an event signal, not a job;
    • This is why "Unknown job: static-network-up" is logged;
  • When waiting for a job to be started, it's wrong to omit the WAIT_STATE=running env var, because running is not the default in wait-for-state.

Solution

/etc/init/mounting-glusterfs.conf:

author "Louis Zuckerman <[email protected]>"
description "Block the mounting event for glusterfs filesystems until the glusterfs-server is running"

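# One instance of this job per mount point
instance $MOUNTPOINT

# Run whenever mountall is about to mount a glusterfs filesystem
start on mounting TYPE=glusterfs
task
script
  # status fails for an unknown job, so only wait when glusterfs-server exists on this host
  if status glusterfs-server; then
    # Block this mount until the glusterfs-server job reaches the running state
    start wait-for-state WAIT_FOR=glusterfs-server WAIT_STATE=running \
        WAITER=mounting-glusterfs-$MOUNTPOINT
  fi
end script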

PS: Also use the _netdev option in your fstab.


I ran into this as well, and want to preface this answer with the statement that I am not an expert in this area, so it's possible there is a better solution to this!

But the issue seems to be that static-network-up is an event, not the name of an upstart job. However, the wait-for-state script expects a job name to be passed in as the WAIT_FOR value. Hence the "Unknown job" error you discovered above.

To resolve the issue I changed /etc/init/mounting-glusterfs.conf, changing:

exec start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT

into:

exec start wait-for-state WAIT_FOR=networking WAITER=mounting-glusterfs-$MOUNTPOINT

networking is the name of an actual job (/etc/init/networking.conf) and, I believe, the job that typically emits static-network-up.
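
A quick way to tell a job from an event before putting a name into WAIT_FOR is to look for its definition file. This is only a sketch: is_upstart_job is a hypothetical helper, and it assumes system jobs are defined as NAME.conf files under /etc/init, as networking is above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME is defined as an upstart job,
# i.e. a file NAME.conf exists in the init directory.
# Usage: is_upstart_job NAME [INIT_DIR]   (INIT_DIR defaults to /etc/init)
is_upstart_job() {
    name="$1"
    init_dir="${2:-/etc/init}"
    [ -f "$init_dir/$name.conf" ]
}
```

For example, is_upstart_job networking && echo job should print "job" on a system that has /etc/init/networking.conf, while static-network-up has no such file because it is only an event.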

This change worked for me on Ubuntu 14.04.


Thanks for the detailed explanation, I think I understand a lot more than before. The latest solution almost works. The remaining problems (actually one, since the first implies the second):

  • local shares (127.0.0.1:/share) are still not mounted
  • mounted TYPE=glusterfs is never satisfied, so the services that depend on the mounted TYPE=glusterfs event never start

/etc/fstab:

127.0.0.1:/control-share /mnt/glu-control-share glusterfs defaults,_netdev 0 0

/etc/init/mounting-glusterfs.conf: copied from above

/etc/init/salt-master.conf:

description "Salt Master"

start on (mounted TYPE=glusterfs
          and runlevel [2345])
stop on runlevel [!2345]
limit nofile 100000 100000
...

The local share must be mounted by hand or by some automated mechanism, and salt-master must be started by hand after every reboot.

Noticed later: the above WAIT script in mounting-glusterfs... blocks the whole boot procedure; it seems the glusterfs-server state never reaches running.
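
To see which state the job is actually stuck in, the output of status glusterfs-server can be reduced to just the state. A rough sketch: job_state is a hypothetical helper, and it assumes the usual one-line "name goal/state" output of status for a non-instance job, optionally followed by ", process N".

```shell
#!/bin/sh
# Hypothetical helper: read one line of `status JOB` output on stdin and
# print only the state part of the goal/state pair.
# Assumes a non-instance job, so goal/state is the second field.
job_state() {
    awk '{
        split($2, gs, "/")        # gs[1] = goal, gs[2] = state
        sub(/,$/, "", gs[2])      # drop the comma that precedes "process N"
        print gs[2]
    }'
}
```

Usage would be status glusterfs-server | job_state; anything other than running at the time of the mount would explain why wait-for-state keeps blocking.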