Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs


Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/kubernetes/03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs.md.

Purpose: Compare Kubernetes workload controllers and show how to operate stateless services, stateful services, node agents, batch jobs, scheduled jobs, rolling updates, and rollbacks safely.

Deployments, ReplicaSets, StatefulSets, DaemonSets, Jobs, and CronJobs

Workload controllers reconcile desired state into Pods described in 02 Containers Pods and Workload Primitives. The controller choice defines identity, replacement behavior, update semantics, ordering, and failure handling. Scheduling and capacity behavior for the Pods they create is covered in 08 Scheduling Resources Requests Limits QoS and Autoscaling.


Controller selection

Controller | Best for | Identity | Replacement behavior | Avoid when
Deployment | Stateless APIs, web apps, workers that can be duplicated | Interchangeable Pods behind labels | Creates a new ReplicaSet for each Pod template change | Each replica needs stable storage or ordered identity.
ReplicaSet | Low-level replica maintenance | Interchangeable Pods | Maintains the replica count only | You need rollouts, rollback history, or normal app operations.
StatefulSet | Quorum systems, ordered replicas, stable network IDs, persistent volumes | Stable ordinal names and per-replica PVCs | Ordered by default; identity and PVCs persist across replacement | The app can run stateless, or a managed service is available.
DaemonSet | Node agents, log collectors, CNI, CSI, monitoring exporters | One Pod per matching node | Adds or removes Pods as nodes start or stop matching | The workload is request-driven and should scale with traffic.
Job | Finite batch, migrations, backfills, data repair | Completion-oriented Pods | Retries failed Pods until the completion or failure policy is met | The process is a long-running service.
CronJob | Scheduled Job creation | Time-based Job objects | Creates Jobs according to the schedule and concurrency policy | The work must run exactly once per schedule and is not idempotent.

Deployments

A Deployment manages ReplicaSets and provides declarative rollouts for stateless Pod templates.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: prod
  labels:
    app.kubernetes.io/name: payments-api
spec:
  replicas: 4
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 600
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: payments-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: payments-api
        app.kubernetes.io/component: api
    spec:
      serviceAccountName: payments-api
      containers:
        - name: api
          image: ghcr.io/example/payments-api:2.8.1
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: http
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi

Commands:

kubectl apply -f payments-api-deployment.yaml
kubectl get deploy,rs,pod -n prod -l app.kubernetes.io/name=payments-api
kubectl rollout status -n prod deployment/payments-api
kubectl describe deployment -n prod payments-api
kubectl scale -n prod deployment/payments-api --replicas=6

ReplicaSets

ReplicaSets maintain a replica count for a Pod template. Deployments create and manage ReplicaSets, so direct ReplicaSet authoring is rare. Direct use is mainly for learning, specialized controllers, or repair of orphaned resources.

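Risk: overlapping selectors between controllers, or manually relabeling Pods, can orphan Pods or make multiple controllers fight over them. In apps/v1 a Deployment's spec.selector is immutable after creation, so choose it carefully; correcting it later means recreating the Deployment.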
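For the learning case, a minimal standalone ReplicaSet looks like the sketch below. The name, labels, and image are hypothetical. The API rejects a template whose labels do not satisfy the selector, and editing the template later affects only newly created Pods; existing Pods keep their old spec, which is exactly the gap Deployments fill with rollouts.

```yaml
# Hypothetical standalone ReplicaSet for illustration; prefer a Deployment in practice.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: echo-rs
  namespace: prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: echo   # must match the template labels below
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echo
    spec:
      containers:
        - name: echo
          image: ghcr.io/example/echo:1.0.0   # hypothetical image
```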

Rolling updates

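During a Deployment rolling update, Kubernetes creates a new ReplicaSet for the changed Pod template, scales it up, and scales the old ReplicaSet down within the bounds set by maxSurge and maxUnavailable, counting only Pods that pass readiness as available. PDBs are not consulted here: they limit evictions such as node drains, not the Deployment controller's own Pod replacements.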


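maxSurge and maxUnavailable are the capacity and availability levers: maxSurge caps how many Pods may exist above the desired replica count, and maxUnavailable caps how many of the desired Pods may be unavailable during the update.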

Setting | Meaning | Best fit | Tradeoff
maxSurge: 1, maxUnavailable: 0 | Add one extra Pod before removing old Pods | User-facing APIs with strict availability | Needs spare cluster capacity.
maxSurge: 25%, maxUnavailable: 25% | Default proportional rollout | Medium-sized stateless services | Can reduce serving capacity during rollout.
maxSurge: 0, maxUnavailable: 1 | Replace in place | Tight clusters where extra capacity is unavailable | Lower availability and slower recovery from bad releases.
Recreate strategy | Stop all old Pods, then start new Pods | Single-writer apps that cannot run mixed versions | Full downtime.
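Percentages are converted to Pod counts before the rollout proceeds: per the Deployment documentation, maxSurge rounds up and maxUnavailable rounds down. A worked example for a hypothetical 10-replica Deployment using the 25% defaults:

```latex
\begin{aligned}
\text{surge} &= \lceil 0.25 \times 10 \rceil = \lceil 2.5 \rceil = 3,
  & \text{max Pods} &= 10 + 3 = 13,\\
\text{unavailable} &= \lfloor 0.25 \times 10 \rfloor = \lfloor 2.5 \rfloor = 2,
  & \text{min available} &= 10 - 2 = 8.
\end{aligned}
```

For payments-api (4 replicas, maxSurge: 1, maxUnavailable: 0) the same bounds give at most 5 Pods and never fewer than 4 available, which is why that setting needs room for one extra Pod.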

Rollout commands:

kubectl set image -n prod deployment/payments-api api=ghcr.io/example/payments-api:2.8.2
kubectl rollout status -n prod deployment/payments-api --timeout=10m
kubectl rollout history -n prod deployment/payments-api
kubectl rollout history -n prod deployment/payments-api --revision=12
kubectl rollout undo -n prod deployment/payments-api
kubectl rollout undo -n prod deployment/payments-api --to-revision=11
kubectl rollout pause -n prod deployment/payments-api
kubectl rollout resume -n prod deployment/payments-api
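rollout history is easier to read when each revision records why it exists. A sketch, assuming the reason is set in the same apply that changes the Pod template: the Deployment controller copies the kubernetes.io/change-cause annotation onto the new ReplicaSet, and kubectl rollout history shows it in the CHANGE-CAUSE column. The message text is illustrative.

```yaml
# Fragment of the payments-api Deployment metadata; the value is free text.
metadata:
  name: payments-api
  namespace: prod
  annotations:
    kubernetes.io/change-cause: "Upgrade payments-api to 2.8.2"
```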

Production guidance:

  • Use immutable image tags or digests for reproducible rollbacks.
  • Keep revisionHistoryLimit large enough for operational rollback but small enough to avoid clutter.
  • Make readiness strict enough that new Pods enter Service endpoints only when usable.
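  • Use PDBs from 02 Containers Pods and Workload Primitives to limit voluntary disruptions, such as node drains, that overlap a rollout. The rollout itself is bounded by maxUnavailable, not by the PDB.
  • Set progressDeadlineSeconds and alert when the Deployment's Progressing condition turns False with reason ProgressDeadlineExceeded. Kubernetes only reports a stalled rollout; it does not roll back automatically.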
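A minimal PDB for payments-api might look like this. The budget value is an assumption sized for 4 replicas; it lets drains evict one Pod at a time through the Eviction API while leaving the Deployment's own rollout untouched.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api
  namespace: prod
spec:
  # With 4 replicas, drains may evict at most one payments-api Pod at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: payments-api
```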

StatefulSets

StatefulSets manage Pods with stable names, stable ordinals, stable network identity, and per-replica PVCs. A Pod named ledger-db-0 is replaced as ledger-db-0 and reattaches the PVC data-ledger-db-0; it does not get a random new identity.

apiVersion: apps/v1
kind: StatefulSet
metadata:
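  name: ledger-db
  namespace: prod
spec:
  serviceName: ledger-db
  # The stock postgres image does not configure replication: these are three
  # independent servers unless an operator or replication setup manages them.
  replicas: 3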
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/name: ledger-db
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ledger-db
    spec:
      terminationGracePeriodSeconds: 60
      containers:
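        - name: postgres
          image: postgres:16
          env:
            # The official image refuses to initialize without a superuser password.
            # The Secret name and key are hypothetical; create them separately.
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ledger-db-credentials
                  key: postgres-password
            # Use a subdirectory so initdb does not fail on lost+found
            # at the root of a freshly formatted volume.
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata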
          ports:
            - name: postgres
              containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              memory: 4Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
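serviceName must point at a headless Service that already exists; the StatefulSet does not create it. A sketch for ledger-db, reusing the labels and named port from the manifest above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ledger-db          # must equal spec.serviceName in the StatefulSet
  namespace: prod
spec:
  clusterIP: None          # headless: DNS returns per-Pod records, not a virtual IP
  selector:
    app.kubernetes.io/name: ledger-db
  ports:
    - name: postgres
      port: 5432
      targetPort: postgres # named container port from the StatefulSet template
```

With this Service in place, each replica is addressable as ledger-db-0.ledger-db.prod.svc.cluster.local, ledger-db-1..., assuming the default cluster.local cluster domain.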

Stateful workload tradeoffs:

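Need | Kubernetes support | Operational cost
Stable network names | <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local through a headless Service (cluster.local is the default cluster domain) | Clients or peers must understand membership.
Stable disk per replica | volumeClaimTemplates | Storage lifecycle, backup, restore, expansion, and zone affinity matter.
Ordered startup and scaling | podManagementPolicy: OrderedReady (default) | A slow or stuck replica blocks higher ordinals.
Parallel scaling | podManagementPolicy: Parallel | Faster scale operations, but the app must tolerate concurrent starts. Rolling updates still proceed one Pod at a time in reverse ordinal order.
Scale down | Highest ordinal removed first | Data movement and quorum rules must be planned.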
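Because StatefulSet updates are ordered, a staged rollout can be controlled with updateStrategy.rollingUpdate.partition. A sketch for ledger-db: with partition: 2 and 3 replicas, a template change updates only ledger-db-2; lowering the partition to 0 then rolls ledger-db-1 and ledger-db-0. Pods below the partition that are deleted come back at the old revision.

```yaml
# Fragment of the ledger-db StatefulSet spec.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Only Pods with ordinal >= 2 receive the new template.
      partition: 2
```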

When not to run databases on Kubernetes

Do not run a database on Kubernetes just because the app is already there. Prefer a managed database or dedicated database platform when:

  • The team cannot operate backup, restore, PITR, failover, replication, and upgrade procedures.
  • Storage classes do not provide predictable latency, zone behavior, expansion, and snapshot integration.
  • The database is business-critical and there is no regular restore drill.
  • The cluster is frequently rebuilt, aggressively autoscaled, or managed by teams without database operational ownership.
  • Licensing, support, or compliance requires a vendor-managed control plane.
  • The workload requires specialized hardware, kernel tuning, or IO isolation the cluster cannot guarantee.

Running databases on Kubernetes can be reasonable when the team owns the database SLO, uses an operator with clear failure semantics, validates restore procedures, reserves capacity, and understands storage topology.

DaemonSets

DaemonSets run one Pod per matching node. They are the normal shape for CNI agents, CSI node plugins, log collectors, metrics agents, security sensors, and node-local proxies.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-log-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-log-agent
    spec:
      serviceAccountName: node-log-agent
      tolerations:
        - operator: Exists
      containers:
        - name: agent
          image: ghcr.io/example/log-agent:3.2.0
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi
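The requests above look small per node, but they are paid on every matching node. For a hypothetical 200-node cluster, the agent alone reserves the following, and maxUnavailable: 10% allows 20 nodes to be without a running agent during an update:

```latex
\begin{aligned}
\text{CPU requests} &= 200 \times 100\,\text{m} = 20\,000\,\text{m} = 20\ \text{cores}\\
\text{memory requests} &= 200 \times 128\,\text{Mi} = 25\,600\,\text{Mi} = 25\,\text{Gi}\\
\text{memory limits (worst case)} &= 200 \times 256\,\text{Mi} = 51\,200\,\text{Mi} = 50\,\text{Gi}\\
\text{nodes updating at once} &= 10\% \times 200 = 20
\end{aligned}
```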

Production guidance:

  • Add tolerations intentionally. operator: Exists puts the agent on every tainted node, including control-plane and specialized nodes.
  • Budget DaemonSet requests in every node pool. One small per-node agent becomes large at cluster scale.
  • Use rolling updates with conservative maxUnavailable for critical agents.
  • Avoid hostPath mounts and privileged mode unless the node integration requires them.
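A narrower alternative to operator: Exists is to tolerate only the taints the agent should run on. In this sketch the control-plane taint key is the standard one, while example.com/gpu is a hypothetical dedicated-pool taint to replace with your own. The DaemonSet controller already adds several node-condition tolerations itself (for example node.kubernetes.io/not-ready and node.kubernetes.io/unreachable), so those do not need to be listed.

```yaml
# Fragment of the node-log-agent DaemonSet spec.
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        # Hypothetical taint on a dedicated pool; replace with your own keys.
        - key: example.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
```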

Jobs

Jobs run Pods until a completion condition is met. They are for finite work such as migrations, reports, imports, backfills, and repair tasks.

apiVersion: batch/v1
kind: Job
metadata:
  name: ledger-backfill-2026-06-15
  namespace: prod
spec:
  completions: 20
  parallelism: 4
  backoffLimit: 3
  activeDeadlineSeconds: 7200
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: backfill
          image: ghcr.io/example/ledger-tools:1.9.0
          args: ["backfill", "--shards=20"]
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              memory: 1Gi

Job rules:

  • Make work idempotent. Retries and duplicate starts can happen.
  • Use activeDeadlineSeconds to cap runaway work.
  • Use ttlSecondsAfterFinished to clean completed Jobs while preserving enough history for diagnosis.
  • Separate schema migrations from application rollout if rollback semantics differ.
  • Prefer indexed Jobs for shard-specific work when each completion needs a stable index.
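In the default NonIndexed mode every Pod receives identical args, so a tool sharding 20 ways must coordinate shards itself, for example through a queue. An Indexed Job gives each completion a stable index from 0 to completions - 1. A sketch, where the Job name, the SHARD_INDEX variable, and the --shard flag are hypothetical; the index comes from the documented batch.kubernetes.io/job-completion-index annotation via the downward API (the controller also sets JOB_COMPLETION_INDEX in the container environment).

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ledger-backfill-indexed-2026-06-15   # hypothetical name
  namespace: prod
spec:
  completionMode: Indexed   # each completion gets a stable index 0..19
  completions: 20
  parallelism: 4
  backoffLimit: 3
  activeDeadlineSeconds: 7200
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: backfill
          image: ghcr.io/example/ledger-tools:1.9.0
          env:
            # Defined in env so $(SHARD_INDEX) below is expanded by Kubernetes.
            - name: SHARD_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
          # --shard is a hypothetical flag of the example tool.
          args: ["backfill", "--shards=20", "--shard=$(SHARD_INDEX)"]
```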

Commands:

kubectl apply -f ledger-backfill-job.yaml
kubectl get job,pod -n prod -l job-name=ledger-backfill-2026-06-15
kubectl logs -n prod job/ledger-backfill-2026-06-15
kubectl describe job -n prod ledger-backfill-2026-06-15
kubectl delete job -n prod ledger-backfill-2026-06-15

CronJobs

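CronJobs create a Job from jobTemplate each time the schedule fires. The schedule uses five-field cron syntax; "15 2 * * *" in the manifest below means 02:15 every day in the configured time zone.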

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-ledger-close
  namespace: prod
spec:
  schedule: "15 2 * * *"
  timeZone: "Etc/UTC"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 900
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: close-ledger
              image: ghcr.io/example/ledger-tools:1.9.0
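              # $(RUN_DATE) is expanded only if RUN_DATE is defined in this container's env;
              # otherwise the literal string "$(RUN_DATE)" is passed. Kubernetes does not
              # put the scheduled time in the container environment, so define RUN_DATE
              # or have the tool derive the date itself.
              args: ["close-ledger", "--date=$(RUN_DATE)"]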

CronJob tradeoffs:

Setting | Effect | Guidance
concurrencyPolicy: Allow | New Job can overlap old Job | Use only for independent runs.
concurrencyPolicy: Forbid | Skip new run if old run is active | Best default for backups, billing, reports, and maintenance.
concurrencyPolicy: Replace | Stop old run and start new one | Use only when the latest run fully supersedes old work.
startingDeadlineSeconds | Limits late starts after controller downtime | Set according to the business usefulness of late execution.
timeZone | Interprets schedule in a named zone | Prefer UTC unless business rules require local time.

Manual run from CronJob:

kubectl create job -n prod manual-ledger-close-20260615 --from=cronjob/nightly-ledger-close
kubectl get cronjob,job -n prod
kubectl describe cronjob -n prod nightly-ledger-close
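To pause scheduling during an incident or migration, set spec.suspend on the CronJob. Jobs that are already running continue. Runs that fall inside the suspension count as missed; on resume, a missed run starts only if it is still within startingDeadlineSeconds (15 minutes here).

```yaml
# Fragment of the nightly-ledger-close CronJob spec; set back to false to resume.
spec:
  suspend: true   # stop creating new Jobs; running Jobs are not affected
```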

Common mistakes

Mistake | Symptom | Correction
Using Deployment for a database that needs stable identity | Data attaches to the wrong replacement Pod or peer identity changes | Use StatefulSet or managed database.
Directly editing ReplicaSets | Deployment later reverts or replaces the change | Change the Deployment template.
Rolling update with weak readiness | Bad Pods receive traffic and rollout looks healthy | Make readiness represent real serving capability.
maxUnavailable too high | Rollout causes user-visible capacity drop | Use maxSurge and capacity planning.
No PDB | Node drain removes too many replicas | Add PDB and verify it does not block normal maintenance.
CronJob work is not idempotent | Duplicate billing, duplicate emails, corrupt imports | Use run keys, locks, or transactional guards.
DaemonSet requests ignored in capacity math | New nodes are immediately overcommitted | Include DaemonSet overhead in node pool sizing.

Troubleshooting

Deployment rollout:

kubectl rollout status -n prod deployment/payments-api
kubectl describe deployment -n prod payments-api
kubectl get rs -n prod -l app.kubernetes.io/name=payments-api
kubectl get pod -n prod -l app.kubernetes.io/name=payments-api -o wide
kubectl get events -n prod --sort-by=.lastTimestamp

StatefulSet:

kubectl get statefulset,pod,pvc -n prod -l app.kubernetes.io/name=ledger-db
kubectl describe pod -n prod ledger-db-0
kubectl get endpointslice -n prod -l kubernetes.io/service-name=ledger-db
kubectl describe pvc -n prod data-ledger-db-0

DaemonSet:

kubectl get daemonset -n observability node-log-agent
kubectl get pod -n observability -l app.kubernetes.io/name=node-log-agent -o wide
kubectl describe daemonset -n observability node-log-agent

Job and CronJob:

kubectl get job,pod -n prod
kubectl describe job -n prod ledger-backfill-2026-06-15
kubectl logs -n prod job/ledger-backfill-2026-06-15
kubectl get cronjob -n prod nightly-ledger-close -o yaml

Review checklist

  • Controller type matches identity, lifecycle, and update semantics.
  • Selectors are stable and match only intended Pods.
  • Rollout strategy reflects availability, spare capacity, and dependency compatibility.
  • Rollback uses immutable image references or known-good versions.
  • StatefulSet storage, backup, restore, and zone placement are documented and tested.
  • DaemonSet tolerations, host access, and per-node resource cost are intentional.
  • Jobs and CronJobs are idempotent, deadline-bound, and cleaned after completion.
  • PDB, readiness probes, and autoscaling settings are reviewed together.