systemd service automatic restart after StartLimitInterval

I want my systemd service to be restarted automatically on failure. I also want to rate-limit the restarts, allowing a maximum of 3 restarts within 90 seconds, so I have used the following configuration:

[Service]  
Restart=always  
StartLimitInterval=90  
StartLimitBurst=3

The service is now restarted on failure, and after 3 quick failures/restarts it stops restarting, as expected. I then expected systemd to start the service again once the timeout (StartLimitInterval, 90 seconds) had passed, but it does not. If I restart the service manually after the timeout, it works. How can I make systemd start the service automatically after the StartLimitInterval has elapsed?


Solution 1:

To have a service restart 3 times at 90-second intervals, include the following lines in your systemd service file:

[Unit]
StartLimitIntervalSec=400
StartLimitBurst=3
[Service]
Restart=always
RestartSec=90

Before systemd 230, the option was called just StartLimitInterval:

[Unit]
StartLimitInterval=400
StartLimitBurst=3
[Service]
Restart=always
RestartSec=90

This worked for me for a service that runs a script using Type=idle. Note that StartLimitIntervalSec must be greater than RestartSec * StartLimitBurst; otherwise the service will be restarted indefinitely.
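A worked check of that rule, under a simplified model that is my assumption rather than something stated above: the service has been running for longer than StartLimitIntervalSec when it first fails at t = 0, it fails again immediately after every restart, and the limiter counts starts in a window opened by the first counted start. With B = StartLimitBurst, restart k happens at t_k, and restart B + 1 is refused only if it still falls inside the window opened at t_1:

```latex
t_k = k \cdot \mathrm{RestartSec}, \qquad
t_{B+1} - t_1 = B \cdot \mathrm{RestartSec}
  = 3 \cdot 90\,\mathrm{s} = 270\,\mathrm{s}
  < 400\,\mathrm{s} = \mathrm{StartLimitIntervalSec}
```

So the fourth restart is refused and the service stops after three. With StartLimitIntervalSec=250 instead, the window would already have closed by t_4, the counter would start over, and the cycle would repeat indefinitely. If the service runs for a non-negligible time T before failing, replace RestartSec with RestartSec + T.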

It took me a lot of trial and error to work out how systemd uses these options, which suggests they are not as well documented as one would hope. Together they provide the retry cycle time (RestartSec) and the maximum number of retries (StartLimitBurst) that I was looking for.

References:
https://manpages.debian.org/testing/systemd/systemd.unit.5.en.html (Unit section)
https://manpages.debian.org/testing/systemd/systemd.exec.5.en.html (Service section)

Solution 2:

Some years later, with systemd 232, it no longer works as described in the question and in the answers from 2016. The option name (now StartLimitIntervalSec) and the section (now [Unit]) have changed. It now has to look like this example:

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=33

[Service]
Restart=always
RestartSec=5
ExecStart=/bin/sleep 6

This gives 5 restarts in 30 seconds (5*6) plus one more restart within 33 seconds, i.e. 6 restarts in 33 seconds. That exceeds the limit of 5 restarts in 33 seconds, so the restarts stop after 5, about 31 seconds in.

Solution 3:

The behavior you describe is consistent with the documentation:

StartLimitInterval=, StartLimitBurst= Configure service start rate limiting. By default, services which are started more than 5 times within 10 seconds are not permitted to start any more times until the 10 second interval ends. With these two options, this rate limiting may be modified. Use StartLimitInterval= to configure the checking interval (defaults to DefaultStartLimitInterval= in manager configuration file, set to 0 to disable any kind of rate limiting). Use StartLimitBurst= to configure how many starts per interval are allowed (defaults to DefaultStartLimitBurst= in manager configuration file). These configuration options are particularly useful in conjunction with Restart=; however, they apply to all kinds of starts (including manual), not just those triggered by the Restart= logic. Note that units which are configured for Restart= and which reach the start limit are not attempted to be restarted anymore; however, they may still be restarted manually at a later point, from which point on, the restart logic is again activated. Note that systemctl reset-failed will cause the restart rate counter for a service to be flushed, which is useful if the administrator wants to manually start a service and the start limit interferes with that.

I am still trying to figure out a way to get the behavior you want.
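One workaround suggested by the quoted text ("set to 0 to disable any kind of rate limiting") is to switch the rate limiter off and let RestartSec= space out the attempts instead. This is a minimal sketch, assuming systemd 230 or later, where the option is spelled StartLimitIntervalSec= and lives in [Unit] (older versions use StartLimitInterval=0 in [Service]). It trades the three quick retries for a fixed 90-second delay before every automatic restart, with no upper bound on attempts.

```ini
[Unit]
# 0 disables start rate limiting for this unit entirely
StartLimitIntervalSec=0

[Service]
Restart=always
# wait 90 seconds before each automatic restart
RestartSec=90
```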

Solution 4:

You can use StartLimitAction=reboot. This reboots the system as soon as the rate limit configured with StartLimitInterval= and StartLimitBurst= is hit.

StartLimitAction= Configure the action to take if the rate limit configured with StartLimitInterval= and StartLimitBurst= is hit. Takes one of none, reboot, reboot-force, or reboot-immediate. If none is set, hitting the rate limit will trigger no action besides that the start will not be permitted. reboot causes a reboot following the normal shutdown procedure (i.e. equivalent to systemctl reboot). reboot-force causes a forced reboot which will terminate all processes forcibly but should cause no dirty file systems on reboot (i.e. equivalent to systemctl reboot -f) and reboot-immediate causes immediate execution of the reboot(2) system call, which might result in data loss. Defaults to none.
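A minimal sketch combining this with the limits from the question (3 starts within 90 seconds), assuming systemd 230 or later, where StartLimitIntervalSec=, StartLimitBurst= and StartLimitAction= are set in [Unit]; on older versions they go in [Service] and the interval is spelled StartLimitInterval=. Note that this reboots the whole machine rather than retrying only the service.

```ini
[Unit]
# allow at most 3 starts within 90 seconds
StartLimitIntervalSec=90
StartLimitBurst=3
# once that limit is hit, reboot via the normal shutdown procedure
StartLimitAction=reboot

[Service]
Restart=always
```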

Solution 5:

You can set OnFailure= to start another service when this one fails. In that on-failure service you can run a script that waits and then restarts your service.

For an example of how to set this up, see "Systemd status mail on unit failure" and modify it to restart the service instead of sending mail.
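A minimal sketch of that idea, assuming systemd 230 or later. The unit names myapp.service and myapp-restarter.service and the binary /usr/local/bin/myapp are hypothetical placeholders. The helper waits 90 seconds, flushes the start-rate counter with systemctl reset-failed (as described in the documentation quoted in Solution 3), and starts the service again. Adjust the paths to sleep and systemctl if your distribution installs them elsewhere.

```ini
# /etc/systemd/system/myapp.service (hypothetical name)
[Unit]
StartLimitIntervalSec=90
StartLimitBurst=3
# start the hypothetical helper unit when this unit enters the failed state
OnFailure=myapp-restarter.service

[Service]
Restart=always
ExecStart=/usr/local/bin/myapp
```

```ini
# /etc/systemd/system/myapp-restarter.service (hypothetical name)
[Unit]
Description=Restart myapp after its start limit has been hit

[Service]
Type=oneshot
# 0 disables the start timeout so the sleep below is not cut short
TimeoutStartSec=0
# wait out the rate-limit window
ExecStart=/bin/sleep 90
# flush the start-rate counter, then queue a new start without waiting for it
ExecStart=/bin/systemctl reset-failed myapp.service
ExecStart=/bin/systemctl --no-block start myapp.service
```

Because a service using Restart= only enters the failed state after its start limit is reached, the helper should run once per exhausted burst rather than after every individual crash.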