Containers Pods and Workload Primitives

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/kubernetes/02 Containers Pods and Workload Primitives.md.

Purpose: Explain the Pod-level primitives that every Kubernetes workload controller builds on, including container lifecycle, sidecar patterns, readiness, disruption behavior, identity, and node-local special cases.

Containers, Pods, and workload primitives

This note is the base layer for 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs and 08 Scheduling Resources Requests Limits QoS and Autoscaling. A Kubernetes workload controller does not run a container directly. It creates or adopts Pods, and the kubelet on the selected node turns each PodSpec into running containers, mounted volumes, probes, cgroups, a network namespace, and reported status.

Pod model

A Pod is the smallest schedulable Kubernetes unit. Containers in one Pod share:

  • One network namespace: same Pod IP, same localhost, shared port space.
  • Optional volumes: containers can mount the same volume at different paths.
  • Fate and placement: all containers in a Pod are scheduled to the same node and are normally recreated together.
  • Security context inheritance: Pod-level defaults can be refined at container level.
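
As a minimal sketch of the shared network namespace and shared volume listed above, the following Pod runs two containers that exchange a file through one volume and talk over localhost. The Pod name, container names, port, and paths are hypothetical, and the sketch assumes the busybox image provides the httpd and wget applets.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-context-example        # hypothetical name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: writer
      image: busybox:1.36
      # Writes a file into the shared volume, then serves it on port 8080
      # inside the Pod's single network namespace.
      command: ["sh", "-c", "echo hello > /out/index.html && httpd -f -p 8080 -h /out"]
      volumeMounts:
        - name: shared-data
          mountPath: /out
    - name: reader
      image: busybox:1.36
      # Reaches the writer over localhost because both containers share one Pod IP.
      # The same volume is mounted here at a different path, read-only.
      command: ["sh", "-c", "until wget -qO- http://localhost:8080/index.html; do sleep 1; done; sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /in
          readOnly: true
```

Because the port space is shared, a second container trying to listen on 8080 in this Pod would fail to bind.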

Pods are deliberately disposable. A bare Pod is useful for debugging, static control-plane components, or one-off experiments, but production services should be owned by a controller such as Deployment, StatefulSet, DaemonSet, or Job.

Pod anatomy

apiVersion: v1
kind: Pod
metadata:
  name: payments-api-debug
  labels:
    app.kubernetes.io/name: payments-api
    app.kubernetes.io/component: api
spec:
  serviceAccountName: payments-api
  restartPolicy: Always
  terminationGracePeriodSeconds: 45
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  initContainers:
    - name: wait-for-schema
      image: ghcr.io/example/db-tools:1.4.2
      command: ["sh", "-c", "until db-migrate status --ready; do sleep 2; done"]
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          memory: 128Mi
  containers:
    - name: api
      image: ghcr.io/example/payments-api:2.8.1
      ports:
        - name: http
          containerPort: 8080
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      readinessProbe:
        httpGet:
          path: /ready
          port: http
        periodSeconds: 5
        failureThreshold: 2
      livenessProbe:
        httpGet:
          path: /healthz
          port: http
        initialDelaySeconds: 30
        periodSeconds: 10
      startupProbe:
        httpGet:
          path: /startup
          port: http
        failureThreshold: 30
        periodSeconds: 2
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi

Containers inside a Pod

| Container type | Runs when | Primary use | Production guidance |
| --- | --- | --- | --- |
| App container | After init containers complete | Main process, API, worker, web server | Keep one primary responsibility per container. Expose clear health checks. |
| Init container | Before app containers start, sequentially | Blocking setup such as migrations, permissions, dependency checks | Make idempotent. Avoid long unbounded waits that hide dependency failures. |
| Native sidecar as restartable init container | Starts in init phase and keeps running with restartPolicy: Always | Log shippers, mesh proxies, local agents that must start before app containers | Prefer when sidecar startup ordering matters and the cluster version supports restartable init containers. |
| Ephemeral container | Added to a running Pod for debugging | Inspect namespaces, process state, network tools | Never rely on it for steady-state behavior. It cannot define ports or probes. |

Init containers

Init containers run sequentially and must succeed before ordinary app containers start. They are good for deterministic setup, but they are not a substitute for application-level resilience. An app should still handle dependency restarts after it is already running.

apiVersion: v1
kind: Pod
metadata:
  name: init-example
spec:
  initContainers:
    - name: prepare-config
      image: busybox:1.36
      command: ["sh", "-c", "cp /defaults/* /workdir/"]
      volumeMounts:
        - name: workdir
          mountPath: /workdir
  containers:
    - name: app
      image: ghcr.io/example/app:1.0.0
      volumeMounts:
        - name: workdir
          mountPath: /app/config
  volumes:
    - name: workdir
      emptyDir: {}
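
One way to keep an init container's dependency wait bounded, so that a missing dependency shows up as an init container failure instead of a silent hang, is to cap the number of attempts. This is only a sketch: the image and the db-migrate command are the hypothetical tooling used elsewhere in this note, and the attempt count and Pod name are arbitrary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bounded-init-example          # hypothetical name
spec:
  initContainers:
    - name: wait-for-schema
      image: ghcr.io/example/db-tools:1.4.2   # hypothetical image
      command:
        - sh
        - -c
        - |
          # Try 60 times, 2 seconds apart, then fail loudly
          # instead of waiting forever.
          i=0
          while [ "$i" -lt 60 ]; do
            db-migrate status --ready && exit 0
            i=$((i + 1))
            sleep 2
          done
          echo "schema not ready after 60 attempts" >&2
          exit 1
  containers:
    - name: app
      image: ghcr.io/example/app:1.0.0
```

A non-zero exit is recorded in the Pod's init container status and events, which makes the blocked dependency visible to whoever is watching the rollout.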

Native sidecars

Native sidecars are expressed as init containers with restartPolicy: Always. They start before app containers, remain running, and are restarted independently by the kubelet. This is different from a classic sidecar in spec.containers, where all regular containers start without explicit ordering.

apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-example
spec:
  initContainers:
    - name: log-agent
      image: ghcr.io/example/log-agent:3.2.0
      restartPolicy: Always
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  containers:
    - name: app
      image: ghcr.io/example/app:1.0.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}

Use native sidecars when the helper must be running before the main app starts, such as a local proxy, log collector, or node integration. Keep resource requests explicit, because every sidecar adds to the Pod's total request and therefore changes the scheduling behavior described in 08 Scheduling Resources Requests Limits QoS and Autoscaling.
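
The sketch below shows a native sidecar with explicit requests. The Pod name and all resource values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-with-requests  # hypothetical name
spec:
  initContainers:
    - name: log-agent
      image: ghcr.io/example/log-agent:3.2.0
      restartPolicy: Always           # marks this init container as a native sidecar
      resources:
        requests:
          cpu: 50m                    # illustrative values
          memory: 64Mi
        limits:
          memory: 128Mi
  containers:
    - name: app
      image: ghcr.io/example/app:1.0.0
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi
# The sidecar keeps running next to the app, so its request is added to the
# app's request: the scheduler needs a node with roughly 300m CPU and
# 320Mi memory available for this Pod.
```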

Ephemeral containers

Ephemeral containers are injected through the ephemeralcontainers subresource for live debugging. They always share the Pod's network namespace, and with --target they can also join the target container's process namespace if the container runtime supports it. That makes them useful when the app image is minimal and ships no shell or tools.

kubectl debug -n prod pod/payments-api-6fbc9d8d8f-q2n8s -it \
  --image=busybox:1.36 --target=api -- sh

kubectl get pod -n prod payments-api-6fbc9d8d8f-q2n8s -o yaml
kubectl describe pod -n prod payments-api-6fbc9d8d8f-q2n8s

Operational constraints:

  • Ephemeral containers are for investigation, not repair by mutation.
  • They do not restart automatically.
  • They should be controlled by RBAC because they can expose process, filesystem, and network context.
  • Admission policy should restrict privileged debugging images in production namespaces.
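
A minimal RBAC sketch for limiting who can attach ephemeral containers in one namespace might look like the following. The Role name and the oncall-debuggers group are hypothetical, and the exact verbs kubectl debug needs can differ between versions, so treat the rule list as an assumption to verify with kubectl auth can-i.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger                  # hypothetical name
  namespace: prod
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]            # find and inspect the target Pod
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["patch"]                  # add the ephemeral container
  - apiGroups: [""]
    resources: ["pods/attach"]
    verbs: ["create"]                 # interactive session (-it)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-debugger
  namespace: prod
subjects:
  - kind: Group
    name: oncall-debuggers            # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-debugger
  apiGroup: rbac.authorization.k8s.io
```

Keeping this in a namespaced Role rather than a ClusterRole means debug access has to be granted per namespace.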

Restart policies and container restarts

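spec.restartPolicy applies to the app containers and ordinary init containers in a Pod and takes the value Always, OnFailure, or Never. Native sidecars ignore it and use their own container-level restartPolicy: Always.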

| Policy | Typical owners | Behavior | Common mistake |
| --- | --- | --- | --- |
| Always | Deployment, StatefulSet, DaemonSet | Restart containers whenever they exit | Using it for finite batch work and creating endless retries. |
| OnFailure | Job | Restart failed containers until Job rules are satisfied | Assuming a completed container will rerun after success. |
| Never | Job, debug Pod | Leave failed or completed containers stopped | Forgetting that the controller may create a replacement Pod instead. |
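
For finite batch work, a Job with restartPolicy: OnFailure and an explicit retry budget avoids the endless-retry trap. The Job name, image, and limits in this sketch are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ledger-export                 # hypothetical name
spec:
  backoffLimit: 4                     # give up after 4 failed retries
  activeDeadlineSeconds: 900          # hard stop after 15 minutes overall
  template:
    spec:
      restartPolicy: OnFailure        # restart only when the container fails
      containers:
        - name: export
          image: ghcr.io/example/ledger-export:1.0.0   # hypothetical image
```

A Job template accepts only OnFailure or Never, so Always cannot be used here.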

Container restart backoff is local kubelet behavior. A Pod in CrashLoopBackOff is usually scheduled and running at the infrastructure layer, but at least one container repeatedly exits. Debug the process, configuration, secrets, probes, and dependencies before blaming scheduling.

kubectl get pod -n prod payments-api-6fbc9d8d8f-q2n8s
kubectl describe pod -n prod payments-api-6fbc9d8d8f-q2n8s
kubectl logs -n prod payments-api-6fbc9d8d8f-q2n8s -c api --previous
kubectl get events -n prod --field-selector involvedObject.name=payments-api-6fbc9d8d8f-q2n8s --sort-by=.lastTimestamp

Probes, readiness, and readiness gates

| Mechanism | Decides | Effect |
| --- | --- | --- |
| Startup probe | Whether the app has finished slow startup | Disables liveness and readiness failure handling until it succeeds. |
| Readiness probe | Whether the Pod should receive Service traffic | Adds the Pod IP to, or removes it from, Service endpoints. |
| Liveness probe | Whether the container should be restarted | Kubelet restarts the container after repeated failures. |
| Readiness gate | Whether extra Pod conditions are satisfied | Pod is not Ready until custom conditions are true. |

Readiness gates let external controllers participate in readiness. They are useful for load balancer registration, service mesh warmup, data-plane programming, or custom admission flows where the application process is healthy but traffic should not arrive yet.

apiVersion: v1
kind: Pod
metadata:
  name: api-with-readiness-gate
spec:
  readinessGates:
    - conditionType: "network.example.com/LoadBalancerReady"
  containers:
    - name: api
      image: ghcr.io/example/api:1.0.0
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
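
For a Pod like this one, the external controller has to set the gate's condition in the Pod status. The fragment below sketches what the relevant part of kubectl get pod -o yaml might show once that has happened. It is an illustrative status excerpt, not a manifest to apply, and other fields such as timestamps are omitted.

```yaml
# Excerpt of Pod status (illustrative, fields trimmed)
status:
  conditions:
    - type: network.example.com/LoadBalancerReady   # patched by the external controller
      status: "True"
    - type: ContainersReady                         # set by the kubelet from probe results
      status: "True"
    - type: Ready                                   # True only when containers are ready
      status: "True"                                # and every readiness gate is True
```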

Production rules:

  • Readiness should answer "can I serve this request class now", not "is the process alive".
  • Liveness should detect unrecoverable local wedged states, not downstream outages.
  • Startup probes prevent slow boot from being killed by liveness.
  • Readiness gates require a controller that patches Pod status conditions. Without it, Pods remain not ready.
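
The timing each probe allows can be worked out from periodSeconds and failureThreshold. The container-level fragment below reuses the values from the Pod anatomy example and annotates the resulting budgets. It belongs under a container entry in spec.containers, and the liveness failureThreshold shown is the Kubernetes default, written out here for clarity.

```yaml
# Fragment: goes under one entry of spec.containers
startupProbe:
  httpGet:
    path: /startup
    port: http
  periodSeconds: 2
  failureThreshold: 30    # up to 30 x 2s = 60s allowed for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  periodSeconds: 10
  failureThreshold: 3     # default; restart after about 3 x 10s = 30s of failures
readinessProbe:
  httpGet:
    path: /ready
    port: http
  periodSeconds: 5
  failureThreshold: 2     # removed from endpoints after about 2 x 5s = 10s of failures
```

These figures are approximate because each probe attempt also has its own timeout.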

PodDisruptionBudget

A PodDisruptionBudget constrains voluntary disruptions such as node drains and some cluster maintenance, which go through the eviction API. It does not prevent involuntary loss from crashes, node power loss, kernel panics, node-pressure eviction by the kubelet, or bad app releases.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: payments-api

| PDB shape | Good fit | Risk |
| --- | --- | --- |
| minAvailable: N | Services where absolute available replica count matters | Can block drains if replicas are already degraded. |
| maxUnavailable: N | Homogeneous horizontally scaled services | Bad if N is too large for real traffic headroom. |
| No PDB | Disposable dev workloads and single-instance noncritical tasks | Voluntary disruption can remove all replicas at once. |
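
As a sketch of the maxUnavailable shape, the budget below allows one Pod to be voluntarily disrupted at a time. The unhealthyPodEvictionPolicy setting is an optional assumption here: it is only available on clusters whose policy/v1 API supports the field, and it lets drains evict Pods that are already not Ready.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api
spec:
  maxUnavailable: 1                        # set either this or minAvailable, not both
  unhealthyPodEvictionPolicy: AlwaysAllow  # optional; check cluster support first
  selector:
    matchLabels:
      app.kubernetes.io/name: payments-api
```

With maxUnavailable the budget scales with the replica count, so it does not need editing when the Deployment is resized.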

PDBs should be paired with sane rolling update settings in 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs and real capacity planning in 08 Scheduling Resources Requests Limits QoS and Autoscaling.

Static Pods

Static Pods are manifests that the kubelet reads directly from the node's filesystem, usually from /etc/kubernetes/manifests. The kubelet creates a mirror Pod for each one so that it is visible through the API server, but no workload controller owns it. kubeadm commonly uses static Pods for control-plane components.

sudo ls /etc/kubernetes/manifests
kubectl get pods -n kube-system -o wide
kubectl describe pod -n kube-system kube-apiserver-control-plane-1

Tradeoffs:

| Benefit | Cost |
| --- | --- |
| Kubelet can run critical components before higher-level controllers exist | Updates are node-file operations, not Deployment rollouts. |
| Works even when the scheduler is unavailable | Static Pods cannot use ServiceAccounts, ConfigMaps, Secrets, or dynamic admission like normal Pods. |
| Simple bootstrap mechanism | Operational drift can happen if node files are managed inconsistently. |

Avoid static Pods for application workloads. Use them only when node-local bootstrap or control-plane self-hosting requires kubelet-level ownership.
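
The directory the kubelet watches for static Pod manifests is set in the kubelet configuration. The fragment below is a sketch of the relevant field only. A real KubeletConfiguration on a node carries many more settings, and where that file lives depends on how the node was provisioned.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Directory scanned for static Pod manifests on this node.
staticPodPath: /etc/kubernetes/manifests
```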

Workload identity considerations

A Pod's identity is primarily its ServiceAccount plus projected tokens, labels, namespace, and network identity. Treat identity as an explicit production design, not a default.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payments-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: payments-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: payments-api
    spec:
      serviceAccountName: payments-api
      automountServiceAccountToken: true
      containers:
        - name: api
          image: ghcr.io/example/payments-api:2.8.1

Guidance:

  • Create one ServiceAccount per workload or trust boundary.
  • Disable token automount for workloads that never call the Kubernetes API.
  • Prefer projected, short-lived tokens over long-lived static credentials.
  • Scope RBAC to verbs and resources actually needed.
  • For cloud access, prefer provider workload identity integration over storing cloud keys in Secrets.
  • Treat sidecars and ephemeral containers as part of the same Pod trust boundary unless isolation is explicitly enforced.
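
The token guidance above can be sketched as a Pod that disables the default token mount and instead requests a short-lived, audience-scoped projected token. The Pod name, volume name, mount path, and the vault audience are hypothetical, and the expiry is an arbitrary example value.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api-token-example    # hypothetical name
spec:
  serviceAccountName: payments-api
  automountServiceAccountToken: false # no default Kubernetes API token is mounted
  containers:
    - name: api
      image: ghcr.io/example/payments-api:2.8.1
      volumeMounts:
        - name: scoped-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: scoped-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              audience: vault         # hypothetical audience for the consuming system
              expirationSeconds: 3600 # kubelet rotates the token before it expires
```

The application needs to re-read the token file periodically, because the kubelet replaces its contents on rotation.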

Common mistakes

| Mistake | Symptom | Correction |
| --- | --- | --- |
| Running production apps as bare Pods | App disappears after delete or node failure | Use a controller from 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs. |
| Liveness checks depend on databases | Transient database outage restarts every app replica | Move dependency checks to readiness or app logic. |
| No resource requests | Pods overpack nodes and autoscalers lack signal | Set CPU and memory requests as in 08 Scheduling Resources Requests Limits QoS and Autoscaling. |
| Sidecar has no request | Scheduler underestimates Pod footprint | Budget for every container, including native sidecars and agents. |
| PDB on a single replica with minAvailable: 1 | Node drains block | Accept downtime, add replicas, or use maintenance-specific handling. |
| Readiness gate without controller | Pod never becomes Ready | Deploy and monitor the status-patching controller. |
| Overprivileged debug access | Debug shell becomes production escape path | Restrict ephemeral container RBAC and image policy. |

Troubleshooting flow

Commands:

kubectl get pod -n prod -o wide
kubectl describe pod -n prod <pod>
kubectl logs -n prod <pod> -c <container> --previous
kubectl get pod -n prod <pod> -o jsonpath='{.status.containerStatuses[*].state}'
kubectl get pod -n prod <pod> -o jsonpath='{.status.conditions}'
kubectl auth can-i get pods --as=system:serviceaccount:prod:payments-api -n prod

Review checklist

  • The Pod is owned by the right controller unless it is intentionally static or debug-only.
  • Every app, init, and sidecar container has resource requests and memory limits.
  • Liveness, readiness, and startup probes answer different questions.
  • Native sidecars use initContainers[*].restartPolicy: Always only when startup ordering matters.
  • The workload ServiceAccount is scoped, named, and reviewed.
  • PDB settings match replica count, availability target, and drain behavior.
  • Termination grace and preStop give the app time to stop receiving traffic and finish in-flight work.
  • Debug access through ephemeral containers is RBAC-controlled and audited.