How to Automatically Restart a Linux Service After a Crash

This article was last updated on: May 17, 2026

Overview

Recently, a systemd service on one of my Linux machines crashed and had to be restarted by hand. So, is there a way to automatically restart a Linux service after a crash?

Systemd

Systemd Restart

Systemd allows you to configure services to automatically restart when they crash.

A typical unit file looks like this:

[Unit]
Description=Tailscale node agent
After=network-online.target
Wants=tailscale-weekly-update.timer

[Service]
Type=oneshot
ExecStart=/usr/bin/tailscale update -yes

[Install]
WantedBy=multi-user.target

In the example above, nothing tells systemd to restart the service: Restart is not set, so if the process crashes or gets killed, systemd leaves it stopped.

However, you can have systemd automatically restart the daemon in case it crashes or gets killed unexpectedly. To do this, add the Restart option to the [Service] section. A typical example looks like this:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=600
StartLimitBurst=5

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \

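With Restart=always, systemd restarts the service whenever it stops, whether it exited cleanly, returned an error, or was killed by a signal. The one exception is an explicit stop, such as systemctl stop. RestartSec=5s makes systemd wait 5 seconds before each restart attempt.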

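Restart has 2 commonly used values:

always: Restart no matter how the service stopped, including a clean exit.
on-failure: Restart only on failure. This covers a broad range of failure scenarios: a non-zero exit code, termination by a signal (including a core dump), an operation timeout, or a watchdog timeout. A clean exit does not trigger a restart.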
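
If the unit file belongs to a package, the same Restart settings can be added through a drop-in file instead of editing the unit itself. The following is a minimal sketch, not part of the original setup: the function name write_restart_dropin and the file name restart.conf are hypothetical, and the second argument exists only so the function can be tried in a scratch directory.

```shell
#!/bin/sh
# Write a drop-in that turns on automatic restarts for a unit.
# $1 = unit name (e.g. k3s.service)
# $2 = root directory for drop-ins (defaults to /etc/systemd/system)
# Prints the path of the file it wrote.
write_restart_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Drop-in settings are merged on top of the main unit file.
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
    echo "$dir/restart.conf"
}
```

After writing the drop-in under /etc/systemd/system, run systemctl daemon-reload so that systemd picks up the change.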

In this example, the [Unit] section also includes the StartLimitIntervalSec and StartLimitBurst directives. This prevents a failing service from being restarted every 5 seconds indefinitely. If it keeps failing, systemd will stop trying to start the service.

With these settings, systemd allows at most 5 starts within 600 seconds. If the service keeps crashing and another start is needed inside that window, systemd refuses it, the unit enters the failed state, and no further restart attempts are made. This ensures that if the service is truly broken, systemd won’t keep trying to restart it. At that point, manual intervention is needed.
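
The arithmetic behind the start limit can be sketched with a small shell function. This is a simplified model for illustration, not systemd's actual implementation: it assumes the window opens at the first start and resets once a start arrives more than the interval later. The function name start_limit_verdict is hypothetical.

```shell
#!/bin/sh
# Simplified model of StartLimitIntervalSec / StartLimitBurst.
# $1 = interval in seconds, $2 = burst, remaining args = start times
# in seconds, in ascending order.
# Prints "allowed" or "refused" for the LAST start in the list.
start_limit_verdict() {
    interval=$1
    burst=$2
    shift 2
    begin=""
    count=0
    verdict="allowed"
    for t in "$@"; do
        if [ -z "$begin" ] || [ $((t - begin)) -gt "$interval" ]; then
            # First start, or the previous window has expired: open a new window.
            begin=$t
            count=1
            verdict="allowed"
        elif [ "$count" -lt "$burst" ]; then
            count=$((count + 1))
            verdict="allowed"
        else
            # Burst already used up inside the current window.
            verdict="refused"
        fi
    done
    echo "$verdict"
}

# A service crashing immediately with RestartSec=5s starts at 0, 5, 10, ...
start_limit_verdict 600 5 0 5 10 15 20      # prints "allowed" (5th start)
start_limit_verdict 600 5 0 5 10 15 20 25   # prints "refused" (6th start)
```

In this model the sixth start within the window is the one that gets refused, which is the point where the unit would enter the failed state.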

If you query the status of the daemon right after it has been killed, while systemd is still waiting out the RestartSec delay, systemd will show it as activating (auto-restart).
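
To check for this state from a script rather than by eye, one option is to read the properties that systemctl show prints. The sketch below assumes the waiting state is reported as SubState=auto-restart; the function name is_auto_restarting is hypothetical.

```shell
#!/bin/sh
# Reads KEY=VALUE lines (as printed by `systemctl show`) on stdin and
# succeeds if the unit is currently waiting for an automatic restart.
is_auto_restarting() {
    grep -qx 'SubState=auto-restart'
}

# Intended use on a systemd host:
#   systemctl show k3s.service -p ActiveState,SubState | is_auto_restarting \
#       && echo "k3s is waiting to be restarted"
```

The NRestarts property, also available through systemctl show, reports how many automatic restarts systemd has performed for the unit.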

Systemd OnFailure

Restarting a service is great, but taking specific actions when a unit fails is even better. Perhaps the software you’re using has a known bug that requires deleting cache files on crash, or maybe you want to launch a script to collect logs and system information for diagnosing the issue. Systemd allows you to specify a unit to run when a service fails.

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=600
StartLimitBurst=5
OnFailure=k3s-recovery.service

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=on-failure
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \

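This example specifies OnFailure=k3s-recovery.service to tell systemd that when the service enters the failed state, it should start the k3s-recovery unit. Because Restart=on-failure is also set, the service only enters the failed state once the start limit has been reached.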

The k3s-recovery unit is simply a oneshot service unit that runs a script:

[Unit]
Description=K3s recovery

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/k3s-recovery.sh

This script can do anything: perform some manual workarounds to get the service running again, send alerts to a monitoring system, or compress temporary logs and application state for troubleshooting. Here’s an example:

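#!/bin/bash
# Runs via OnFailure= after k3s.service has entered the failed state.

echo 'Attempting to recover!' > /tmp/recovery_info
systemctl stop k3s.service
/usr/local/sbin/k3s-killall.sh
# Clear the failed state and the start-limit counter so the start below is not refused.
systemctl reset-failed k3s.service
systemctl start k3s.service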
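
For the log-collection use case mentioned above, a recovery script could save a short report before attempting any repair. This is only a sketch under assumptions: the function name collect_diagnostics, the report layout, and the choice of the last 200 journal lines are all hypothetical.

```shell
#!/bin/sh
# Save a small failure report for a unit.
# $1 = unit name, $2 = output directory.
# Prints the path of the report it wrote.
collect_diagnostics() {
    unit=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    report="$outdir/$unit-$(date +%Y%m%dT%H%M%S).txt"
    {
        echo "== failure report for $unit =="
        date
        uname -a
        if command -v journalctl >/dev/null 2>&1; then
            # Last 200 journal lines for the unit; tolerate errors.
            journalctl -u "$unit" -n 200 --no-pager 2>&1 || echo "journalctl failed"
        else
            echo "journalctl not available"
        fi
    } > "$report"
    echo "$report"
}

# Example call from a recovery script (the directory is an assumption):
#   collect_diagnostics k3s.service /var/tmp/k3s-failures
```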

Systemd FailureAction Reboot

There’s another possibility: a reboot cures everything! Systemd has a built-in feature to trigger a system reboot when a unit fails. In this example, the system will gracefully reboot when the unit enters the failed state, which, with Restart=on-failure, happens once the start limit has been reached:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=600
StartLimitBurst=5
FailureAction=reboot

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=on-failure
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \

FailureAction accepts several valid values: none, reboot, reboot-force, reboot-immediate, poweroff, poweroff-force, poweroff-immediate, exit, exit-force, soft-reboot, soft-reboot-force, kexec, kexec-force, halt, halt-force, and halt-immediate.
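
Since a mistyped value will not give the intended behavior, a deployment script might check the setting before installing a unit. The sketch below simply encodes the list above; the function name is_valid_failure_action is hypothetical.

```shell
#!/bin/sh
# Succeeds if $1 is one of the FailureAction values listed above.
is_valid_failure_action() {
    case "$1" in
        none) return 0 ;;
        reboot|reboot-force|reboot-immediate) return 0 ;;
        poweroff|poweroff-force|poweroff-immediate) return 0 ;;
        exit|exit-force) return 0 ;;
        soft-reboot|soft-reboot-force) return 0 ;;
        kexec|kexec-force) return 0 ;;
        halt|halt-force|halt-immediate) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   is_valid_failure_action reboot  || echo "invalid"   # prints nothing
#   is_valid_failure_action restart || echo "invalid"   # prints "invalid"
```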

Summary

This article covered several approaches for automatically handling failures when a service crashes. Systemd includes powerful features that can automatically respond to keep services running.

https://e-whisper.com/posts/25576/

Author

east4ming

Posted on

July 28, 2023