source: docs/guide/services/Maintainer.md
Services

Maintainer

Role

The maintainer is the per-host agent that makes a machine a FortrOS node. It runs as a tier 1 node service (baked into the generation image, managed by s6-rc) and is the single owner of all replicated state on the node.

If the maintainer is healthy, the node is healthy. The boot watchdog ties generation health to maintainer readiness.

What It Owns

  • Gossip mesh participation: SWIM protocol via foca. Failure detection, hash digest broadcasting, membership tracking.
  • CRDT state trees: All state trees (org operational, org config, workload desired, workload observed) are owned by the maintainer. Other services access state via IPC, never directly.
  • TreeSync: Serves and pulls state tree data over TCP/WireGuard. Handles Merkle tree exchange, leaf-level merge, push-notify.
  • WireGuard mesh management: Adds/removes WireGuard peers based on CRDT membership changes. Manages the overlay topology.
  • Certificate lifecycle: Short-lived cert renewal via the gossip mesh. Revocation enforcement at the conn_auth layer.
  • Workload IPC server: Localhost port 7208. The reconciler reads desired state and reports observed state through this interface.
  • Health reporting: Collects system metrics and reports via gossip for the monitoring system.
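The digest exchange behind TreeSync can be illustrated with a small sketch. It assumes a flat key-to-value tree and SHA-256 digests, neither of which the guide specifies; the function names are hypothetical and this is not the maintainer's wire format. Roots are compared first, and leaf digests are compared only when the roots differ.

```python
import hashlib


def leaf_digest(value: bytes) -> str:
    """Digest of a single leaf value (SHA-256 is an assumption)."""
    return hashlib.sha256(value).hexdigest()


def root_digest(tree: dict) -> str:
    """Digest over all (key, leaf digest) pairs in sorted key order."""
    h = hashlib.sha256()
    for key in sorted(tree):
        h.update(key.encode())
        h.update(b"\0")
        h.update(leaf_digest(tree[key]).encode())
    return h.hexdigest()


def leaves_to_pull(local: dict, remote_digests: dict) -> list:
    """Keys whose remote leaf digest is missing locally or differs."""
    return sorted(
        key
        for key, digest in remote_digests.items()
        if key not in local or leaf_digest(local[key]) != digest
    )
```

A node would broadcast only `root_digest(...)` over gossip and fall back to the leaf comparison when a peer's root differs from its own. In a real CRDT tree the pulled leaves would be merged rather than overwritten.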

Why It's One Service

All replicated state goes through one process because CRDT operations must be serialized locally. If multiple processes modified the same state tree concurrently, the local state could become inconsistent before it ever reaches gossip. The maintainer is the serialization point.

Other services (reconciler, key service, provisioner) interact with the maintainer via IPC. They never touch CRDTs, gossip, or WireGuard directly. This is the siloed design principle applied to the most critical service.
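The single-writer idea can be sketched in a few lines. In this illustration, threads inside one process stand in for IPC clients, and a one-worker executor plays the role of the serialization point. `StateOwner`, `increment`, and `demo` are hypothetical names, not part of the maintainer's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StateOwner:
    """Owns one state tree; every operation runs on a single worker thread."""

    def __init__(self):
        self._tree = {}
        # max_workers=1 means submitted operations run one at a time, in order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def apply(self, op):
        """Run op(tree) on the writer thread and return its result."""
        return self._pool.submit(op, self._tree).result()

    def close(self):
        self._pool.shutdown(wait=True)


def increment(key):
    """Build an operation that bumps a counter in the tree."""
    def op(tree):
        tree[key] = tree.get(key, 0) + 1
        return tree[key]
    return op


def demo(clients=8, ops_per_client=100):
    """Many concurrent callers, one writer: no update is lost."""
    owner = StateOwner()

    def client():
        for _ in range(ops_per_client):
            owner.apply(increment("restarts"))

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = owner.apply(lambda tree: tree["restarts"])
    owner.close()
    return total
```

Callers never hold a reference to the tree itself; they only hand operations to the owner, which is the same shape as services talking to the maintainer over IPC.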

Readiness

The maintainer signals readiness via the s6 notification-fd mechanism once all of the following hold:

  1. WireGuard interface is up
  2. Gossip mesh joined (at least one peer contacted)
  3. Initial TreeSync pull from a peer or lighthouse complete
  4. Workload IPC server listening

Only after readiness does s6-rc start dependent services (reconciler, key service). At that point the boot watchdog marks the generation as healthy.
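The readiness gate can be sketched as follows. Under the s6 readiness protocol, a daemon writes a newline to the descriptor named in its `notification-fd` file and then closes it. The `Readiness` class, its field names, and `notify_if_ready` are assumptions made for this example; how the maintainer actually tracks the four conditions is not described here.

```python
import os
from dataclasses import dataclass


@dataclass
class Readiness:
    """One flag per readiness condition listed above (names are illustrative)."""
    wireguard_up: bool = False
    gossip_joined: bool = False
    initial_sync_done: bool = False
    ipc_listening: bool = False

    def all_met(self) -> bool:
        return (
            self.wireguard_up
            and self.gossip_joined
            and self.initial_sync_done
            and self.ipc_listening
        )


def notify_if_ready(state: Readiness, notify_fd: int) -> bool:
    """Write the s6 readiness newline once every condition holds."""
    if not state.all_met():
        return False
    os.write(notify_fd, b"\n")
    os.close(notify_fd)  # s6 expects the descriptor to be closed afterwards
    return True
```

Because the notification is written at most once and only when every flag is set, s6-rc cannot start the reconciler or key service against a maintainer that has not finished its initial sync.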
