Systemd: start a unit after another unit REALLY starts

In my particular case, I want to start the remote-fs unit only after GlusterFS has completely started.

My systemd files:

glusterfsd service:

node04:/usr/lib/systemd/system # cat glusterfsd.service 
[Unit]
Description=GlusterFS brick processes (stopping only)
After=network.target glusterd.service

[Service]
Type=oneshot
ExecStart=/bin/true
RemainAfterExit=yes
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"

[Install]
WantedBy=multi-user.target

remote-fs target:

node04:/usr/lib/systemd/system # cat remote-fs.target 
[Unit]
Description=Remote File Systems
Documentation=man:systemd.special(7)
Requires=glusterfsd.service
After=glusterfsd.service remote-fs-pre.target
DefaultDependencies=no
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target

OK, all the Gluster daemons start successfully, and I want to mount the Gluster filesystem via NFS. However, Gluster's NFS share does not become ready immediately when the Gluster services start, but a few seconds later, so remote-fs is usually unable to mount it despite the Requires and After directives.
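
For context, the stor.mount unit that appears in the log below is presumably generated by systemd from an fstab entry along these lines (the exact options shown here are an assumption for illustration):

node04:/stor   /stor   nfs   defaults,_netdev   0 0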

Let's see the log:

Apr 14 16:16:22 node04 systemd[1]: Started GlusterFS, a clustered file-system server.
Apr 14 16:16:22 node04 systemd[1]: Starting GlusterFS brick processes (stopping only)...
Apr 14 16:16:22 node04 systemd[1]: Starting Network is Online.
Apr 14 16:16:22 node04 systemd[1]: Reached target Network is Online.
Apr 14 16:16:22 node04 systemd[1]: Mounting /stor...

So far everything looks OK: the remote filesystem (/stor) seems to be mounted after GlusterFS has started, just as the unit files intend. But the next lines are:

//...skipped.....
Apr 14 16:16:22 node04 systemd[1]: Started GlusterFS brick processes (stopping only).

What? GlusterFS only became ready at this point! And then we see:

//...skipped.....
Apr 14 16:16:23 node04 mount[2960]: mount.nfs: mounting node04:/stor failed, reason given by server: No such file or directory
Apr 14 16:16:23 node04 systemd[1]: stor.mount mount process exited, code=exited status=32
Apr 14 16:16:23 node04 systemd[1]: Failed to mount /stor.
Apr 14 16:16:23 node04 systemd[1]: Dependency failed for Remote File Systems.
Apr 14 16:16:23 node04 systemd[1]: Unit stor.mount entered failed state.

The mount failed because the NFS server was not yet ready when systemd attempted to mount the storage.

Due to the non-deterministic nature of the systemd boot process, mounting this filesystem at boot sometimes succeeds (roughly 1 boot in 10).

If the on-boot mount fails, I can log in to the server and mount the /stor directory manually, so Gluster's NFS service itself seems to work fine.
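
The manual mount is presumably just the equivalent of the following (the source and target paths are taken from the log above):

mount -t nfs node04:/stor /stor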

So how do I start remote-fs after glusterfsd, i.e. after the "Started GlusterFS brick processes" line appears in the log?

remote-fs seems to be one of the very last targets, so I can't make it start after some other "workaround" target that remote-fs does not actually require.


Solution 1:

You can analyze the systemd boot sequence with the following command, then view the output file in a web browser that supports SVG:

systemd-analyze plot > test.svg

The plot gives you the timing statistics of the last boot, which provides a much clearer view of the problem.

I solved my NFS mounting problem by adding the mount commands to /etc/rc.local. I'm not sure whether this will play well with the glusterd integration, but it is worth a try as a quick fix. To make systemd run rc.local, the following condition must be satisfied:

# grep Condition /usr/lib/systemd/system/rc-local.service
ConditionFileIsExecutable=/etc/rc.d/rc.local
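
A minimal sketch of what that /etc/rc.d/rc.local could look like, assuming the /stor entry in fstab is marked noauto so that systemd itself no longer races the mount at boot:

#!/bin/sh
# Retry the NFS mount until the Gluster NFS export becomes available.
for i in 1 2 3 4 5 6 7 8 9 10; do
    mountpoint -q /stor && exit 0   # already mounted, nothing to do
    mount /stor && exit 0           # mount options come from the fstab entry
    sleep 3
done
exit 1

Remember to make the file executable (chmod +x /etc/rc.d/rc.local), otherwise the condition above is not met and the unit is skipped.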

Solution 2:

As already suggested by others, I'm not sure whether this is actually a dependency on 'glusterfsd', rather than a general delay in something else, for example a DNS lookup that needs to succeed in order to resolve 'node4' and successfully mount the NFS share.

We've run into this delay because most of our setups use a local validating resolver, which needs to be available before other services that depend on DNS can start successfully.

The solution for us was an 'ExecStartPre' script that simply tests for the availability of the specific dependencies over and over, until it either succeeds (exit 0) or times out trying (exit 1).
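
Applied to the Gluster case above, a minimal sketch could be a small polling script plus a oneshot wrapper unit that the mount is ordered after. Everything here (the script path, the unit name, the use of showmount from nfs-utils, the export name) is an assumption to illustrate the idea, not a drop-in fix:

#!/bin/sh
# /usr/local/sbin/wait-for-gluster-nfs.sh (hypothetical path)
# Poll until the Gluster NFS export is visible, exit 0 on success,
# or give up with exit 1 after roughly 60 seconds.
tries=30
while [ "$tries" -gt 0 ]; do
    if showmount -e node04 2>/dev/null | grep -q '^/stor'; then
        exit 0
    fi
    tries=$((tries - 1))
    sleep 2
done
exit 1

# /etc/systemd/system/gluster-nfs-wait.service (hypothetical name)
[Unit]
Description=Wait for the Gluster NFS export to become available
After=glusterfsd.service
Before=stor.mount

[Service]
Type=oneshot
ExecStartPre=/usr/local/sbin/wait-for-gluster-nfs.sh
ExecStart=/bin/true
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

With Before=stor.mount, systemd will not attempt the mount until the polling script has returned success or the unit has failed.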

Make sure you customise outside of the main systemd lib directory if you can; changes to the packaged files will likely be overwritten by the next update that comes along.
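
For example, instead of editing the packaged remote-fs.target under /usr/lib/systemd/system, a drop-in override in /etc/systemd/system can carry the extra dependencies (a sketch; adjust the unit names to your setup):

# interactively:
systemctl edit remote-fs.target

# or by hand:
mkdir -p /etc/systemd/system/remote-fs.target.d
cat > /etc/systemd/system/remote-fs.target.d/override.conf <<'EOF'
[Unit]
Requires=glusterfsd.service
After=glusterfsd.service remote-fs-pre.target
EOF
systemctl daemon-reload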