systemd Boot Init Units Timers Journald and Services

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/linux-systems-engineering/07 systemd Boot Init Units Timers Journald and Services.md.

Purpose: Build an operator-grade mental model for systemd as PID 1, service supervisor, boot transaction engine, resource controller, and logging substrate, with clear differences between a local learning machine and production hosts or clusters.

Related notes: 06 System Calls ABI libc and User Kernel Boundaries, 08 Permissions Users Groups Capabilities and LSMs, 09 cgroups Namespaces Containers and Runtime Isolation, 17 Production Operations Troubleshooting and Runbooks

systemd is not only an init program. On a modern Linux host it is the first long-lived userspace process, the coordinator for boot and shutdown, a dependency resolver, a cgroup manager, a service monitor, an activation broker, and often the front door for host logs. Treat it as an operating system control plane. On a local learning machine, it is acceptable to experiment with unit files, timers, socket activation, and journald retention. On production hosts and clusters, every unit is part of the host reliability model: it affects boot ordering, shutdown behavior, blast radius, logging, resource contention, and incident response.

PID 1 responsibilities

PID 1 has special kernel semantics. If it exits, the kernel panics and the system goes down. Orphaned processes are reparented to PID 1 (unless a closer ancestor has registered as a child subreaper), and PID 1 must reap them when they terminate so they do not linger as zombies. systemd adds higher-level policy on top of those kernel duties:

  • start configured boot targets and their dependencies
  • track services and scopes with cgroups rather than only parent-child process trees
  • supervise process lifecycle and restart policy
  • translate fstab, crypttab, device, and generator output into units
  • coordinate shutdown, reboot, halt, rescue, and emergency modes
  • expose state through D-Bus, systemctl, and journal metadata
  • enforce resource controls and many sandboxing controls through unit configuration

The key production lesson is that PID 1 should own process lifetime. A daemon that double-forks, writes a stale pidfile, or escapes its service cgroup makes supervision less reliable. Prefer foreground services with Type=simple, Type=notify, or Type=exec, and let systemd manage the process tree.
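The reaping duty described above can be illustrated in a few lines. This is a minimal Python sketch of non-blocking child reaping, not a description of how systemd itself is implemented; `reap_children` is a hypothetical helper name.

```python
import os


def reap_children() -> list[tuple[int, int]]:
    """Collect every child that has already exited, without blocking.

    Returns (pid, exit_code) pairs; a negative exit_code means the child
    was terminated by that signal number.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped
```

A supervisor would typically call something like this whenever it is told a child changed state. A daemon that double-forks defeats the parent-child view, which is one reason systemd tracks services by cgroup instead.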

Unit model

A unit is a named object that systemd knows how to load, order, start, stop, reload, monitor, or bind into another unit. Unit files are INI-style files with common sections such as [Unit] and [Install], plus type-specific sections such as [Service], [Timer], [Socket], or [Mount].

Unit type | What it represents | Production use | Common trap
.service | supervised process or process set | API daemons, workers, agents | writing for SysV daemon behavior instead of foreground execution
.target | synchronization point and grouping node | boot modes, dependency anchors | assuming targets run code themselves
.timer | time-based activation | backups, cleanup, renewal jobs | forgetting the matching .service name
.socket | socket-based activation | lazy start, service dependency reduction | ignoring backlog, permissions, and protocol behavior
.mount | mounted filesystem | explicit mount dependencies | using the wrong escaped unit name for a path
.automount | on-demand mount trigger | avoid boot blocking on slow storage | hiding intermittent storage latency until first access
.path | filesystem path activation | simple local automation | treating it as a full file watcher pipeline
.slice | cgroup resource partition | host resource budgets | leaving everything in default slices
.scope | externally created process group | transient commands, containers, sessions | expecting full service restart semantics

Load path precedence matters: a unit in /etc/systemd/system overrides one of the same name in /run/systemd/system, which in turn overrides /usr/lib/systemd/system or the distro equivalent. Administrator-owned system units and drop-ins belong under /etc/systemd/system. Runtime units belong under /run/systemd/system. Package-owned units belong under /usr/lib/systemd/system. Do not edit vendor unit files in place on production hosts; use drop-ins so package upgrades remain manageable.

Boot transaction

systemd computes a boot transaction from the default target and the units pulled in through dependencies, generated units, device discovery, and preset state. It is not a linear script. Multiple jobs run in parallel when their ordering constraints allow it.

On a laptop, boot analysis is often about shaving seconds or understanding what started. On production hosts, boot analysis is about determinism: remote filesystems, secrets, network readiness, cloud-init, storage activation, container runtimes, and monitoring agents must come up in a predictable order without deadlocking the machine.

Requirements vs ordering

The most common systemd mistake is confusing requirement dependencies with ordering dependencies.

Directive | Meaning | What it does not mean
Requires= | If this unit is activated, also activate the listed unit; failure can propagate | It does not order start by itself
Wants= | Weaker pull-in; listed unit failure does not fail the requiring unit by itself | It does not prove readiness
BindsTo= | Strong lifecycle binding to another unit | It still often needs ordering
PartOf= | Stop or restart propagation from another unit | It does not pull the unit in at boot
After= | Start this unit after listed unit start job completes | It does not pull the listed unit in
Before= | Start this unit before listed unit | It does not pull the other unit in
Conflicts= | Cannot be active together | It does not define which one wins without transaction context

Use both a requirement and an ordering directive when both are needed:

[Unit]
Description=Example API
Wants=network-online.target
After=network-online.target

Even that does not guarantee the remote dependency is healthy. network-online.target means the local network stack reached the distro's configured definition of online. It does not mean DNS, a database, a load balancer, or a Kubernetes service endpoint is reachable. In production, add application-level retries and health checks instead of encoding every remote fact into boot ordering.
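The requirement-versus-ordering distinction lends itself to a small review aid. The sketch below is a hypothetical Python lint, not a systemd tool: it reads the [Unit] section of a unit file and lists units that are pulled in but never ordered against. It assumes simple one-line directives and ignores drop-ins, empty-value resets, and implicit dependencies.

```python
def parse_section(text: str, section: str = "Unit") -> dict[str, list[str]]:
    """Collect the space-separated values of each directive in one section."""
    values: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current != section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values.setdefault(key.strip(), []).extend(value.split())
    return values


def unordered_requirements(text: str) -> list[str]:
    """Units pulled in via Requires=/Wants=/BindsTo= with no After=/Before=."""
    unit = parse_section(text)
    pulled = set()
    for key in ("Requires", "Wants", "BindsTo"):
        pulled.update(unit.get(key, []))
    ordered = set(unit.get("After", [])) | set(unit.get("Before", []))
    return sorted(pulled - ordered)
```

A non-empty result is a prompt for review rather than proof of a bug: starting two units in parallel is sometimes exactly what is intended.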

Services

A service unit tells systemd how to start, stop, reload, and supervise a process.

[Unit]
Description=Guards an example HTTP service
Documentation=https://example.internal/runbooks/example-http
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
User=example
Group=example
EnvironmentFile=-/etc/example/example.env
ExecStart=/usr/local/bin/example-http --config /etc/example/config.toml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s
TimeoutStartSec=45s
TimeoutStopSec=30s
WatchdogSec=30s
NotifyAccess=main
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/example /var/log/example
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
SystemCallFilter=@system-service
MemoryMax=512M
CPUQuota=200%

[Install]
WantedBy=multi-user.target

Service type is a contract:

Type | Contract | Use when
simple | process started by ExecStart is the main service immediately | most foreground daemons without readiness signaling
exec | like simple, but systemd waits until execve succeeds | catching missing binary or permission failures earlier
forking | process forks and parent exits | legacy daemons only
oneshot | command runs to completion | migrations, setup jobs, small host tasks
notify | service calls sd_notify for readiness and watchdog | production daemons that can signal real readiness
dbus | readiness is tied to acquiring a bus name | D-Bus services

For production, prefer Type=notify when the daemon supports it. It distinguishes "process exists" from "service is ready." For a local learning box, Type=simple is fine for most experiments.

Targets

A target is a named synchronization point, not a script. multi-user.target is the normal non-graphical multi-user boot target. graphical.target adds the display stack. rescue.target provides a single-user repair environment with more of the system mounted. emergency.target is smaller and useful when normal boot is broken.

Production guidance:

  • enable long-running services with WantedBy=multi-user.target unless they are explicitly graphical, early boot, or tied to another target
  • avoid custom boot targets unless you own the whole image or appliance behavior
  • know how to boot into rescue or emergency mode from the console before a bad unit breaks remote access
  • for cluster nodes, document what should happen when kubelet, container runtime, storage agents, or node exporters are disabled

Timers

Timers replace many cron use cases and integrate with unit state, logs, missed-run behavior, and dependency management.

# /etc/systemd/system/example-backup.timer
[Unit]
Description=Run example backup

[Timer]
OnCalendar=*-*-* 02:15:00
Persistent=true
RandomizedDelaySec=20m
AccuracySec=1m
Unit=example-backup.service

[Install]
WantedBy=timers.target

# /etc/systemd/system/example-backup.service
[Unit]
Description=Example backup job
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=backup
ExecStart=/usr/local/sbin/example-backup

Timer choice | Field effect | Production guidance
OnCalendar= | wall-clock scheduling | use for daily, weekly, monthly operations
OnBootSec= | relative to boot | use for delayed host tasks
OnUnitActiveSec= | relative to last activation | use for periodic maintenance loops
Persistent=true | catch up missed wall-clock timers after downtime | useful for backups and renewals, risky for expensive jobs after fleet outage
RandomizedDelaySec= | spread execution time | use on fleets to avoid synchronized load
AccuracySec= | coalesce wakeups | keep loose unless exact timing matters

On clusters, prefer native controllers for cluster-level reconciliation. Use systemd timers for node-local tasks such as log cleanup, certificate renewal, agent housekeeping, or backup hooks that truly belong to the host.

Socket activation

Socket units let systemd bind sockets before the service runs and start the service when traffic arrives.

[Socket]
ListenStream=127.0.0.1:9000
# SocketUser=/SocketGroup= only apply to AF_UNIX sockets and FIFOs, so they are omitted for this TCP listener.
Accept=no

[Install]
WantedBy=sockets.target

Socket activation reduces explicit dependencies because clients can connect to a stable socket while systemd starts the provider. It is powerful for local IPC, D-Bus-adjacent services, and infrequently used daemons. It is not a magic performance feature. For high-volume production APIs, confirm backlog sizing, readiness behavior, connection handoff, protocol expectations, and observability before relying on it.
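On the service side, the handoff works through inherited file descriptors announced in environment variables. The following Python sketch shows that convention as implemented by sd_listen_fds(3): LISTEN_PID must match the receiving process, LISTEN_FDS gives the count, and descriptors start at 3. The function name and the injectable `environ` and `pid` parameters are illustrative choices, not part of any systemd API.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor, after stdin/stdout/stderr


def inherited_fds(environ=os.environ, pid=None) -> list[int]:
    """Return descriptors passed by socket activation, or [] if started directly."""
    pid = os.getpid() if pid is None else pid
    if environ.get("LISTEN_PID") != str(pid):
        return []  # variables are absent or were meant for another process
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A daemon can wrap each descriptor with socket.socket(fileno=fd) and go straight to accept(), falling back to binding its own listener when the list is empty. That fallback keeps the same binary usable on a learning machine without a socket unit.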

Mounts and automounts

systemd derives many mount units from /etc/fstab. A path becomes an escaped unit name; for example /var/lib/example maps to var-lib-example.mount. Mount units can block boot if storage is slow or unavailable. Automount units can defer the mount until first access.
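The path-to-unit-name mapping is mechanical, which is why a wrong name is an easy mistake. Below is a simplified Python sketch of the path escaping rules: slashes become dashes, and bytes outside ASCII alphanumerics, ":", "_", and non-leading "." become \xNN escapes. It does not handle "." or ".." path components; on a real host, systemd-escape --path is the authoritative tool.

```python
def escape_path(path: str) -> str:
    """Approximate the path escaping used for mount unit names."""
    parts = [part for part in path.split("/") if part]
    if not parts:
        return "-"  # the root directory is the single-dash unit name
    trimmed = "/".join(parts)
    out = []
    for index, byte in enumerate(trimmed.encode("utf-8")):
        char = chr(byte)
        if char == "/":
            out.append("-")
        elif (char.isascii() and char.isalnum()) or char in ":_":
            out.append(char)
        elif char == "." and index > 0:
            out.append(char)
        else:
            out.append(f"\\x{byte:02x}")  # e.g. a literal dash becomes \x2d
    return "".join(out)


def mount_unit_name(path: str) -> str:
    """Unit name systemd would expect for a mount point path."""
    return escape_path(path) + ".mount"
```

The case that usually surprises people is a dash in the path itself: /mnt/my-data needs the \x2d escape, because a plain dash already stands for a slash.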

Production rules:

  • for local disks required by services, make dependencies explicit through mount units or paths
  • for remote storage, choose between fail-fast, nofail, automount, and service-level retry deliberately
  • do not let an optional network mount block SSH or emergency access
  • monitor mount state and I/O errors, not only service state

Example service dependency on a required local mount:

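[Unit]
# RequiresMountsFor= adds both Requires= and After= on the mount units for this path,
# so a separate After=var-lib-example.mount line is not needed.
RequiresMountsFor=/var/lib/example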

Environment files and secrets

Environment= and EnvironmentFile= are configuration mechanisms, not secret stores. Environment variables are often visible through process inspection, crash dumps, service metadata, or accidental logging.

Use environment files for non-secret runtime configuration:

[Service]
EnvironmentFile=-/etc/example/example.env

The leading - means a missing file is tolerated. That is convenient on a learning machine. In production, use it only for truly optional files. For required config, omit the prefix so the unit fails loudly.

For secrets, prefer a host secret manager, systemd credentials where available, kernel keyrings for narrow cases, or application-native secret retrieval with clear rotation semantics. See 08 Permissions Users Groups Capabilities and LSMs for Linux secret handling and incident response.

Drop-ins and overrides

Use drop-ins to change package units without editing vendor files:

sudo systemctl edit example.service
sudo systemctl cat example.service
sudo systemctl daemon-reload
sudo systemctl restart example.service

Drop-ins are merged in lexicographic order after the main unit. To reset a list directive, assign it to an empty value first:

[Service]
ExecStart=
ExecStart=/usr/local/bin/example --new-mode
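The reset rule is easier to see as a merge over assignments. This Python sketch models only the behavior described above for list-type directives: assignments are applied in order and an empty value clears everything accumulated so far. `merge_list_directive` is a hypothetical name, and the model ignores everything else about unit parsing.

```python
def merge_list_directive(fragments: list[list[str]]) -> list[str]:
    """Apply assignments in load order; an empty value resets the list.

    Each fragment is the list of values one file assigns to the directive,
    with the main unit first and drop-ins after it in lexicographic order.
    """
    result: list[str] = []
    for fragment in fragments:
        for value in fragment:
            if value == "":
                result.clear()  # "ExecStart=" with no value wipes earlier entries
            else:
                result.append(value)
    return result
```

Without the empty assignment, the drop-in appends a second ExecStart= to the vendor one, which systemd refuses for any service type other than oneshot.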

Production guidance:

  • keep drop-ins small and named for intent, such as 10-hardening.conf or 20-resource-limits.conf
  • record why the override exists in config management or an image build recipe
  • after package upgrades, inspect systemctl cat and systemd-analyze verify
  • avoid manual one-off overrides on cluster nodes unless the incident record captures them

Journald

systemd-journald collects structured logs from the kernel, stdout and stderr of services, syslog clients, audit messages when forwarded, and native journal clients. It stores metadata such as unit, PID, UID, boot ID, cgroup, executable path, priority, and monotonic timestamp.

Important commands:

journalctl -b
journalctl -b -1
journalctl -u example.service
journalctl -u example.service --since "1 hour ago"
journalctl -p warning..alert
journalctl -k
journalctl -o short-iso
journalctl -o json-pretty -u example.service
journalctl --list-boots
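Because the journal is structured, its JSON output is straightforward to post-process. The Python sketch below counts entries at err severity or worse per unit from line-delimited JSON such as journalctl -o json produces. It relies on the PRIORITY and _SYSTEMD_UNIT journal fields; the function name and the "(no unit)" label are illustrative.

```python
import json
from collections import Counter


def errors_per_unit(lines, max_priority: int = 3) -> Counter:
    """Count entries whose syslog priority is err (3) or more severe, per unit."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        priority = entry.get("PRIORITY")
        # Journal JSON carries field values as strings; PRIORITY may be absent.
        if not isinstance(priority, str) or not priority.isdigit():
            continue
        if int(priority) <= max_priority:
            counts[entry.get("_SYSTEMD_UNIT", "(no unit)")] += 1
    return counts
```

Feeding it sys.stdin from a pipeline such as journalctl -b -o json gives a quick per-unit error tally for the current boot. Kernel messages and other entries without a unit field land under the placeholder label.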

Log persistence depends on storage configuration. Many distros keep journals in memory unless /var/log/journal exists or Storage=persistent is set in journald.conf. A local learning machine can use volatile logs to reduce disk writes. Production hosts should generally persist local boot logs even if a central log pipeline exists, because early boot, network loss, and collector failure are exactly when local evidence matters.

Operational cautions:

  • set retention and size limits so journald cannot consume the filesystem
  • forward to central logging, but do not assume forwarding captures early boot or late shutdown
  • prefer structured fields for custom services when possible
  • use journalctl --vacuum-time or --vacuum-size as a controlled operation, not a blind cron habit
  • preserve relevant logs before rebooting during incident response

Service hardening

systemd hardening is defense in depth. It does not replace application security, Unix permissions, LSM policy, patching, or network controls. It is still valuable because it constrains what a compromised daemon can do.

Directive | Effect | Production caution
User= and Group= | run as non-root identity | create dedicated service users
DynamicUser=yes | allocate ephemeral service identity | good for stateless services, awkward for preexisting file ownership
NoNewPrivileges=yes | block privilege gain through exec | may break setuid helper workflows
CapabilityBoundingSet= | limit retained capabilities | avoid broad CAP_SYS_ADMIN
AmbientCapabilities= | pass selected capabilities to non-root process | use only with tight bounding set
PrivateTmp=yes | private /tmp namespace | can break services sharing temp paths
ProtectSystem=strict | make most system paths read-only | pair with explicit ReadWritePaths=
ProtectHome=yes | block home directory access | validate apps that read user files
PrivateDevices=yes | restrict device access | may break hardware, GPU, FUSE, or loop use
RestrictAddressFamilies= | restrict socket families | test DNS, Unix sockets, and IPv6 needs
SystemCallFilter= | seccomp syscall filtering | validate under real workload
LockPersonality=yes | block personality changes | usually safe for normal services
MemoryDenyWriteExecute=yes | block writable executable memory mappings | can break JIT runtimes
ProtectKernelTunables=yes | block writes to kernel tunables | usually appropriate for apps
ProtectControlGroups=yes | block direct cgroup modification | avoid for container managers

Use systemd-analyze security example.service as a review aid, not as an absolute score. Some low scores are correct for kubelet, container runtimes, storage agents, or observability tools because their job requires host access. Ordinary application services should be much tighter.

Watchdogs and readiness

A restart policy handles process exit. A watchdog handles process wedging after startup. With Type=notify, a service can call sd_notify("READY=1") when it is actually ready and then send sd_notify("WATCHDOG=1") periodically, typically at about half the WatchdogSec interval, so a ping always arrives before the timeout expires.

Production guidance:

  • watchdog pings must come from the main event loop or a meaningful health path, not a side thread that can keep pinging while the service is deadlocked
  • set RestartSec to avoid hot loops
  • combine watchdogs with external health checks; systemd sees the host-local process, not full user-visible service health
  • for cluster-managed workloads, avoid fighting the orchestrator with aggressive host-level restarts unless the service is node infrastructure
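The notification protocol itself is small: a datagram sent to the Unix socket named in NOTIFY_SOCKET. The Python sketch below is a minimal stand-in for libsystemd's sd_notify(), assuming the service runs with Type=notify. The helper names and the injectable `environ` parameter are illustrative, and a real daemon would more likely use an existing binding.

```python
import os
import socket


def sd_notify(state: str, environ=os.environ) -> bool:
    """Send one notification datagram; return False when not under systemd."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = path.encode()
    if address.startswith(b"@"):
        address = b"\0" + address[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True


def watchdog_ping_interval(environ=os.environ):
    """Seconds between pings (half of WATCHDOG_USEC), or None if disabled."""
    usec = environ.get("WATCHDOG_USEC", "")
    if not usec.isdigit() or int(usec) == 0:
        return None
    return int(usec) / 1_000_000 / 2
```

The service would call sd_notify("READY=1") once its listeners and dependencies are actually usable, then call sd_notify("WATCHDOG=1") from the main loop at the computed interval. Pinging from the main loop is what makes a deadlock visible to systemd.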

Restart policies

Policy | Behavior | Use
Restart=no | do not restart | oneshot jobs, failure should stay visible
Restart=on-failure | restart on non-zero exit, signal, timeout, watchdog | default for many daemons
Restart=always | restart even on clean exit | persistent agents that should never stop
Restart=on-abnormal | restart on signal, timeout, watchdog | avoid hiding clean application exits
Restart=on-watchdog | restart only on watchdog expiry | specialized supervision

Use rate limits:

[Unit]
StartLimitIntervalSec=5m
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=10s

When start limits are hit, systemd refuses further start attempts and the unit stays failed until the interval elapses or you run systemctl reset-failed. During incidents, check both the original failure and the rate-limit state.
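To reason about how RestartSec, StartLimitIntervalSec, and StartLimitBurst interact, a small model helps. The Python sketch below assumes a fixed-window counter: the window opens at the first start, up to burst starts are allowed inside it, and a start after the window has elapsed opens a new one. This is a simplified model for intuition, and the function name is hypothetical.

```python
def first_refused_start(start_times, interval_sec, burst):
    """Return the index of the first start attempt that would be refused.

    start_times is an ascending list of attempt times in seconds.
    Returns None when every attempt stays within the limit.
    """
    window_begin = None
    used = 0
    for index, now in enumerate(start_times):
        if window_begin is None or now - window_begin > interval_sec:
            window_begin, used = now, 1  # open a fresh window with this start
        elif used < burst:
            used += 1
        else:
            return index
    return None
```

With the settings above and a service that crashes immediately, attempts land roughly every 10 seconds, so the sixth attempt inside the 5 minute window is the one refused. A service that survives for a couple of minutes between crashes never trips the limit, which is a reminder that rate limits do not replace alerting on restart counts.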

Resource controls

systemd maps units to cgroups. Resource controls let the host protect itself from a noisy service:

[Service]
MemoryMax=1G
MemoryHigh=768M
CPUQuota=150%
TasksMax=512
IOWeight=100

Local learning machines can use these controls to observe behavior under pressure. Production hosts should set budgets for untrusted or bursty services, but must understand failure semantics. MemoryMax can cause OOM kills inside the service cgroup. CPUQuota can increase latency. TasksMax can break thread-heavy runtimes. Resource limits need metrics and load testing.
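The units in these directives are worth making concrete before load testing. The Python sketch below converts the size and percentage notations used above into plain numbers, relying on the documented conventions that K, M, G, and T suffixes are base 1024 and that CPUQuota= is a percentage of one CPU. The helper names are illustrative, and special values such as infinity or percentage-based memory limits are not handled.

```python
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert a value such as '768M' or '1G' to bytes (base 1024)."""
    value = value.strip()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)  # a bare number is already bytes


def cpu_quota_cores(value: str) -> float:
    """Convert a CPUQuota= percentage such as '150%' to CPUs' worth of time."""
    return float(value.strip().rstrip("%")) / 100
```

For the example unit, MemoryHigh=768M leaves 256 MiB of headroom before MemoryMax=1G, the range in which the kernel throttles and reclaims before the hard limit is reached. CPUQuota=150% allows one and a half CPUs of runtime per scheduling period regardless of how many cores the host has.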

On cluster nodes, know which layer owns the budget. Kubernetes usually owns pod cgroups, while systemd owns node services such as kubelet, containerd, journald, and monitoring agents. Do not set host unit limits that starve the orchestrator.

Transient units

Transient units are runtime units created through the systemd API, commonly with systemd-run.

systemd-run --unit=debug-shell --pty /bin/bash
systemd-run --scope -p MemoryMax=2G make -j8
systemd-run --on-calendar='*:0/15' --unit=example-poll /usr/local/bin/poll

Use transient units for controlled debugging, one-off commands with cgroup boundaries, and runtime experiments. For production, durable behavior belongs in versioned unit files, image builds, or configuration management. A transient unit that fixes an incident should be converted into a tracked change or explicitly removed.

systemctl field commands

systemctl status example.service
systemctl start example.service
systemctl stop example.service
systemctl restart example.service
systemctl reload example.service
systemctl enable --now example.service
systemctl disable --now example.service
systemctl mask example.service
systemctl unmask example.service
systemctl list-units --failed
systemctl list-dependencies example.service
systemctl show example.service
systemctl cat example.service
systemctl edit example.service
systemctl daemon-reload
systemctl reset-failed example.service

Production habits:

  • run systemctl cat before assuming which file is active
  • run systemctl show -p FragmentPath -p DropInPaths -p ActiveState -p SubState -p Result unit
  • use enable --now only when you intend both boot activation and immediate start
  • use mask sparingly; it prevents manual starts too
  • after unit file changes, run daemon-reload, then restart or reload the unit as appropriate

Boot analysis

systemd-analyze
systemd-analyze blame
systemd-analyze critical-chain
systemd-analyze plot > boot.svg
systemd-analyze verify /etc/systemd/system/example.service
journalctl -b -p warning..alert
systemctl list-jobs

blame shows elapsed activation time, not necessarily blocking time. critical-chain is better for finding what delayed the boot path. For production boot regressions, compare boot IDs, kernel versions, initramfs changes, storage discovery, network wait units, cloud-init, and failed dependencies.

Failed unit troubleshooting

Start with the unit and the current boot:

systemctl status example.service
journalctl -b -u example.service
systemctl show example.service -p Result -p ExecMainStatus -p ExecMainCode -p NRestarts
systemctl cat example.service
systemd-analyze verify /etc/systemd/system/example.service

Then classify the failure:

Symptom | Likely class | Next checks
status=203/EXEC | binary missing, not executable, bad path, wrong interpreter | ls -l, shebang, mount availability, SELinux/AppArmor denial
status=217/USER | configured user or group missing | getent passwd, getent group, image provisioning
timeout on start | readiness mismatch or dependency hang | Type=, TimeoutStartSec=, app logs, network and storage
rapid restart then failed | restart rate limit | StartLimit*, original exit, reset-failed after fix
permission denied | DAC, capability, LSM, read-only paths | 08 Permissions Users Groups Capabilities and LSMs, journalctl -k, audit logs
works manually but not as service | environment, working directory, privileges, namespaces | systemctl show -p Environment, WorkingDirectory=, hardening directives
no logs | stdout handling, early crash, journal storage | StandardOutput=, journalctl _PID=, core dumps
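The first rows of this classification can be automated from systemctl show output, which is plain KEY=VALUE text. The Python sketch below is a hypothetical triage helper, not a systemd tool. It covers only the Result= and ExecMainStatus= cases named in the table and returns "unclassified" for everything else.

```python
def parse_show(output: str) -> dict[str, str]:
    """Parse KEY=VALUE lines as printed by systemctl show."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def classify(props: dict[str, str]) -> str:
    """Map a few well-known property combinations to a failure class."""
    status = props.get("ExecMainStatus")
    result = props.get("Result")
    if status == "203":
        return "exec: binary missing, not executable, or wrong interpreter"
    if status == "217":
        return "user: configured user or group missing"
    if result == "start-limit-hit":
        return "restart rate limit: find the original failure first"
    if result == "timeout":
        return "timeout: readiness mismatch or dependency hang"
    if result == "watchdog":
        return "watchdog: service stopped pinging after startup"
    return "unclassified"
```

The input would come from a command like systemctl show example.service -p Result -p ExecMainStatus. The output is a starting hypothesis for the next checks column, not a diagnosis.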

Do not fix production units by blindly adding After=network-online.target, running as root, disabling hardening, or increasing timeouts. Those changes often hide the failure mode. Reproduce the service environment:

sudo systemd-run --pty --wait --collect \
  -p User=example \
  -p WorkingDirectory=/var/lib/example \
  /usr/local/bin/example-http --check

Local vs production operating stance

Area | Local learning machine | Production host or cluster node
Unit edits | direct experimentation is fine | version controlled, reviewed, and rolled out through config management
Logs | volatile logs may be acceptable | persistent local journal plus central aggregation
Hardening | learn one directive at a time | baseline hardening with documented exceptions
Timers | convenient replacement for cron | jittered, monitored, and owned by a runbook
Restarts | aggressive restart can aid iteration | rate-limited, observable, and coordinated with orchestrators
Dependencies | acceptable to over-specify while learning | keep explicit dependencies minimal and prove readiness in the app
Transient units | excellent for exploration | incident tool only unless converted to durable config

The field rule: systemd configuration is executable operations policy. Every directive should answer one of four questions: what starts, when it starts, what it can touch, and what happens when it fails.

Reference URLs