Production Patterns Anti Patterns and Reference Architectures

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/kubernetes/16 Production Patterns Anti Patterns and Reference Architectures.md.

Purpose: explain practical Kubernetes production architectures, repeatable patterns, and dangerous anti-patterns for application and platform teams.

Production Kubernetes is a set of disciplined defaults. The platform should make ordinary services easy to deploy safely, make dangerous operations visible, and keep failure domains understandable. Architecture quality shows up during deploys, node drains, bad releases, capacity pressure, network incidents, and restores.

Core links: Kubernetes, 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs, 04 Services DNS Ingress Gateway API and Traffic Routing, 07 Storage Volumes PVCs StorageClasses CSI and Stateful Data, 10 Observability Logging Metrics Tracing Events and Probes, 12 Helm Kustomize Manifests and Release Engineering, 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery, Little's law and efficient queue strategy.

Reference Architecture

[Diagram: reference architecture]

This shape keeps stateful dependencies explicit, routes external traffic through controlled ingress, and drives changes through GitOps.

Golden Path Application Pattern

Recommended minimum for a stateless service:

  • Deployment with multiple replicas.
  • Readiness and liveness probes with different purposes.
  • CPU and memory requests.
  • Memory limit, if the runtime can be configured to stay under it (for example, a managed heap sized below the limit).
  • Service with stable DNS.
  • Ingress or Gateway when external traffic is needed.
  • ConfigMap for non-secret config.
  • Secret or external secret reference for sensitive config.
  • PodDisruptionBudget.
  • HorizontalPodAutoscaler when a reliable scaling signal is known.
  • NetworkPolicy.
  • ServiceMonitor or equivalent metrics discovery.
  • Owner, app, version, environment, and cost labels.
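The minimum above can be checked mechanically before a release. The sketch below is a hypothetical pre-merge check, assuming the Deployment has already been parsed into a dict; the label keys in REQUIRED_LABELS are illustrative team conventions, not a Kubernetes standard. An omitted replicas field is treated as Kubernetes' default of 1, so a Deployment that leaves the replica count to an HPA would be flagged and the check would need adjusting.

```python
# Hypothetical golden-path check for one Deployment manifest parsed into a dict.
# The label keys are illustrative conventions, not a Kubernetes standard.
REQUIRED_LABELS = ("owner", "app", "version", "environment", "cost")


def golden_path_findings(deployment):
    """Return a list of findings; an empty list means every check passed."""
    findings = []
    spec = deployment.get("spec", {})
    # Kubernetes defaults spec.replicas to 1 when it is omitted.
    if spec.get("replicas", 1) < 2:
        findings.append("fewer than 2 replicas")
    template = spec.get("template", {})
    labels = template.get("metadata", {}).get("labels", {})
    for key in REQUIRED_LABELS:
        if key not in labels:
            findings.append(f"missing label: {key}")
    for container in template.get("spec", {}).get("containers", []):
        name = container.get("name", "?")
        readiness = container.get("readinessProbe")
        liveness = container.get("livenessProbe")
        if readiness is None:
            findings.append(f"{name}: no readinessProbe")
        if liveness is None:
            findings.append(f"{name}: no livenessProbe")
        elif liveness == readiness:
            # Same check for both probes is the anti-pattern described under Probe Patterns.
            findings.append(f"{name}: liveness and readiness probes are identical")
        requests = container.get("resources", {}).get("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in requests:
                findings.append(f"{name}: no {resource} request")
    return findings
```

Run against the checkout-api excerpt, this would report only the missing version, environment, and cost labels.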

Deployment excerpt:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: checkout
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        owner: team-checkout
    spec:
      containers:
        - name: api
          image: registry.example.com/checkout-api:2.7.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /live
              port: 8080
            initialDelaySeconds: 30
          resources:
            requests:
              cpu: 300m
              memory: 384Mi
            limits:
              memory: 768Mi
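With replicas: 4, maxSurge: 1 and maxUnavailable: 1, the rollout keeps at least 3 pods available and runs at most 5 pods at once. A minimal sketch of that arithmetic, assuming the Deployment controller's rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the function names are hypothetical.

```python
def resolve(value, replicas, round_up):
    """Resolve an absolute int or an 'NN%' string against the desired replica count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor division avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (minimum available pods, maximum total pods) during a RollingUpdate."""
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)   # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # If both resolve to zero, allow one unavailable pod so the rollout can still progress.
        unavailable = 1
    return replicas - unavailable, replicas + surge
```

rollout_bounds(4, 1, 1) returns (3, 5), which lines up with the PDB's minAvailable: 3.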

PDB:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-api
  namespace: checkout
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: checkout-api
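A PDB only limits voluntary disruptions, such as node drains that go through the Eviction API. As a rough sketch (integer values only, hypothetical function name), the number of evictions allowed right now is current healthy pods minus the desired healthy count, floored at zero.

```python
from typing import Optional


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """How many more pods a PDB lets the Eviction API remove right now.

    Integer values only; percentage handling is deliberately left out of this sketch.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)
```

With 4 healthy pods and minAvailable: 3, one eviction is allowed; if one pod is already unready, drains block until it recovers. Because minAvailable is absolute, an HPA that scales this Deployment to 20 pods would allow 17 evictions at once; maxUnavailable: 1 is one way to keep the drain pace constant.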

HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api
  namespace: checkout
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 4
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
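The HPA's core rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default) and clamped to minReplicas and maxReplicas. A simplified sketch that ignores stabilization windows, scaling policies, and the special handling of unready pods and missing metrics:

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Core HPA ratio rule only; real controllers add stabilization and policies."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # inside the tolerance band: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

At the excerpt's settings, 4 pods averaging 130% CPU utilization against the 65% target yield 8 replicas, while 4 pods at 70% stay at 4 because the ratio falls inside the tolerance band.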

Probe Patterns

| Probe | Purpose | Should fail when |
| --- | --- | --- |
| Readiness | Remove pod from Service endpoints | The pod cannot serve traffic correctly |
| Liveness | Restart a stuck container | The process is unrecoverably wedged |
| Startup | Give slow apps time to boot | Startup exceeded expected bounds |

Anti-pattern: using the same deep dependency check for readiness and liveness. If the database is down, readiness should fail so traffic moves away from the pod. Liveness should usually keep passing: restarting every pod does not bring the database back and amplifies the incident.
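A minimal sketch of separate probe endpoints using Python's standard library http.server, matching the /ready and /live paths on port 8080 from the Deployment excerpt. The AppState flags are hypothetical; in a real service a background task would update them.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    """Hypothetical flags maintained elsewhere in the service."""

    def __init__(self):
        self.database_ok = True    # updated by a background dependency check
        self.event_loop_ok = True  # cleared only if the main loop stops making progress


STATE = AppState()


def probe_status(path, state):
    """Map a probe path to an HTTP status; the kubelet treats 200-399 as success."""
    if path == "/ready":
        # Readiness may depend on the database: failing it only removes the pod from endpoints.
        return 200 if state.database_ok else 503
    if path == "/live":
        # Liveness checks only this process, so a database outage does not trigger restarts.
        return 200 if state.event_loop_ok else 500
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, STATE))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the application log


if __name__ == "__main__":
    # Port and paths match the Deployment excerpt (8080, /ready, /live).
    HTTPServer(("", 8080), ProbeHandler).serve_forever()
```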

Traffic Patterns

| Pattern | Use when | Watch out for |
| --- | --- | --- |
| Rolling update | Ordinary stateless changes | Readiness must be accurate |
| Blue-green | Need full pre-cutover validation | Double capacity and data compatibility |
| Canary | Need gradual exposure | Requires metrics and traffic control |
| Shadow traffic | Need observe-only validation | Privacy and duplicate side effects |
| Feature flag | Need runtime control | Flag debt and hidden combinations |
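Canaries and percentage feature flags often need sticky assignment so one user does not bounce between versions. A common technique is hashing a stable key into 100 buckets; the sketch below is generic Python, not the API of any particular ingress controller or service mesh, and the salt value is a hypothetical name.

```python
import hashlib


def bucket(key: str, salt: str = "checkout-api-canary") -> int:
    """Map a stable key (user or session id) to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def routes_to_canary(key: str, canary_percent: int) -> bool:
    """Keys below the threshold go to the canary; raising the percent only adds keys."""
    return bucket(key) < canary_percent
```

Because assignment depends only on the key and the salt, raising canary_percent from 5 to 10 keeps every user who was already in the canary there.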

Commands:

kubectl rollout status deployment/checkout-api -n checkout
kubectl rollout history deployment/checkout-api -n checkout
kubectl rollout undo deployment/checkout-api -n checkout
kubectl get endpointslices -n checkout
kubectl describe ingress checkout -n checkout

Resilience Patterns

Use these together:

  • Multiple replicas across zones.
  • Topology spread constraints.
  • Pod anti-affinity for critical replicas.
  • PodDisruptionBudgets.
  • Graceful shutdown and terminationGracePeriodSeconds.
  • Connection draining at ingress and application layer.
  • Idempotent consumers for queue workers.
  • Backpressure instead of unbounded concurrency.
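The last two items fit in a few lines. The sketch below is a hypothetical queue worker: a bounded queue.Queue gives backpressure (offer() returns False instead of growing without limit), and a set of processed message ids makes redelivered messages harmless. In production the id set would need durable storage; keeping it in memory is an assumption of the sketch.

```python
import queue


class IdempotentWorker:
    """Bounded intake for backpressure plus de-duplication by message id."""

    def __init__(self, capacity):
        self.inbox = queue.Queue(maxsize=capacity)
        self.processed_ids = set()  # assumption: a durable store in production
        self.applied = []           # stands in for the real side effect

    def offer(self, message_id, payload):
        """Enqueue without blocking; False tells the producer to slow down or retry."""
        try:
            self.inbox.put_nowait((message_id, payload))
            return True
        except queue.Full:
            return False

    def drain(self):
        """Process everything currently queued, skipping ids already handled."""
        while True:
            try:
                message_id, payload = self.inbox.get_nowait()
            except queue.Empty:
                return
            if message_id in self.processed_ids:
                continue  # redelivery: do not repeat the side effect
            self.applied.append(payload)
            self.processed_ids.add(message_id)
```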

Topology spread:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: checkout-api
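For each candidate zone, the scheduler computes skew as the matching pods already in that zone, plus one for the incoming pod, minus the smallest count across eligible zones. A sketch of that calculation (hypothetical function name; zones with no matching pods must be listed with 0):

```python
def allowed_zones(matching_pods_per_zone, max_skew):
    """Zones where one more matching pod keeps skew <= max_skew.

    Skew for a candidate zone = matching pods there + 1 (the new pod) - global minimum.
    """
    global_min = min(matching_pods_per_zone.values())
    return [zone for zone, count in matching_pods_per_zone.items()
            if count + 1 - global_min <= max_skew]
```

With whenUnsatisfiable: DoNotSchedule these zones act as a hard filter; with ScheduleAnyway, as in the snippet above, the scheduler only prefers them when scoring nodes.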

Stateful Workloads

Kubernetes can run stateful workloads, but production state requires stronger discipline than stateless services.

StatefulSet is useful for:

  • Stable pod identities.
  • Stable persistent volume claims.
  • Ordered rollout or startup when needed.
  • Systems designed for cluster membership.

Use external managed databases when the team cannot operate storage, backups, failover, upgrades, and data integrity inside Kubernetes.

Stateful checklist:

  • Backup and restore tested.
  • StorageClass supports required durability and expansion.
  • Pod anti-affinity prevents all replicas on one node.
  • Quorum behavior during node drains is understood.
  • Upgrade and downgrade path is documented.
  • Data corruption and split-brain risks are addressed.
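For consensus systems that need a majority (Raft-based stores such as etcd, for example), quorum math decides how many members a drain may take down. A small sketch, assuming majority quorum; primary/replica databases follow different rules, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Majority needed by Raft/Paxos-style systems."""
    return members // 2 + 1


def tolerable_failures(members: int) -> int:
    """Members that can be lost while a majority remains."""
    return members - quorum_size(members)


def keeps_quorum(members: int, already_down: int, draining: int) -> bool:
    """True if taking `draining` more members down still leaves a majority running."""
    return members - already_down - draining >= quorum_size(members)
```

A 3-member set tolerates one failure, so a PDB with maxUnavailable: 1 fits it; a 4-member set still tolerates only one.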

Platform Reference Architecture

[Diagram: platform reference architecture]

Platform responsibilities:

  • Cluster lifecycle.
  • Identity and RBAC model.
  • Admission policies.
  • Ingress and DNS conventions.
  • Secret delivery model.
  • Observability stack.
  • Backup and DR standards.
  • Golden templates.
  • Documentation and support boundaries.

Application team responsibilities:

  • Service code and image.
  • Resource requests based on behavior.
  • Probes and graceful shutdown.
  • Dependency contracts.
  • SLOs and alerts.
  • Rollout verification.

Anti-patterns

| Anti-pattern | Why it hurts | Replacement |
| --- | --- | --- |
| One giant cluster for every trust boundary | Blast radius and governance conflict | Separate clusters for strong boundaries |
| Direct kubectl changes in production | Drift and weak audit | GitOps with break-glass logging |
| No resource requests | Scheduler cannot plan capacity | Define requests from measurements |
| Every service gets LoadBalancer | Cost and network sprawl | Ingress or Gateway routing |
| Privileged pods by default | Host compromise risk | Restricted pod security and exemptions |
| Local persistent data with no backup | Node loss becomes data loss | Durable storage and restore drills |
| CRDs installed casually | API surface and upgrade burden grow | Ownership and lifecycle review |
| Liveness probes checking dependencies | Cascading restarts | Separate readiness from liveness |
| Huge Helm values files | Releases become hard to reason about | Smaller charts and platform APIs |
| Ignoring events | Root cause evidence disappears | Event collection and review |
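Some rows of this table can be caught in CI before admission policies ever see them. The sketch below is a hypothetical linter over rendered manifests (for example Helm or Kustomize output parsed into dicts) that flags LoadBalancer Services and privileged containers; it complements, rather than replaces, Pod Security Admission or a policy engine.

```python
def _pod_spec(manifest):
    """Return the pod spec embedded in a workload manifest, or None for other kinds."""
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})
    if kind == "Pod":
        return spec
    if kind in ("Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"):
        return spec.get("template", {}).get("spec", {})
    if kind == "CronJob":
        return spec.get("jobTemplate", {}).get("spec", {}).get("template", {}).get("spec", {})
    return None


def anti_pattern_findings(manifests):
    """Flag two anti-patterns from the table: LoadBalancer per Service and privileged pods."""
    findings = []
    for m in manifests:
        kind = m.get("kind")
        name = m.get("metadata", {}).get("name", "?")
        if kind == "Service" and m.get("spec", {}).get("type") == "LoadBalancer":
            findings.append(f"Service/{name}: type LoadBalancer (prefer shared Ingress or Gateway)")
        pod_spec = _pod_spec(m)
        if pod_spec is None:
            continue
        for c in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
            if c.get("securityContext", {}).get("privileged"):
                findings.append(f"{kind}/{name}: container {c.get('name')} is privileged")
    return findings
```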

Production Troubleshooting Flow

[Diagram: production troubleshooting flow]

Commands:

kubectl get pods -n checkout -o wide
kubectl describe pod checkout-api-abc -n checkout
kubectl logs deployment/checkout-api -n checkout --previous
kubectl get events -n checkout --sort-by=.lastTimestamp
kubectl get endpointslices,svc,ingress -n checkout
kubectl top pods -n checkout
kubectl describe node worker-3
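When many pods are involved, the output of kubectl get pods -n checkout -o json can be filtered first. A minimal sketch (hypothetical script name pod_triage.py) that lists pods which are not Ready, restart often, or have a container waiting with a reason such as CrashLoopBackOff; the restart threshold of 3 is an arbitrary assumption.

```python
import json
import sys


def unhealthy_pods(pods_json, restart_threshold=3):
    """Rows of (pod, container, ready, restarts, reason) worth a closer look."""
    rows = []
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
        ready = conditions.get("Ready") == "True"
        statuses = status.get("containerStatuses", [])
        if not statuses and not ready:
            # Pending pods often have no container statuses yet; report the phase instead.
            rows.append((name, None, ready, 0, status.get("phase")))
        for cs in statuses:
            restarts = cs.get("restartCount", 0)
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if not ready or restarts >= restart_threshold or reason:
                rows.append((name, cs["name"], ready, restarts, reason))
    return rows


if __name__ == "__main__":
    # Usage: kubectl get pods -n checkout -o json | python3 pod_triage.py
    for row in unhealthy_pods(sys.stdin.read()):
        print(row)
```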

Review Checklist

  • Each service has probes, requests, ownership labels, and rollout strategy.
  • Critical services have multiple replicas, PDBs, and topology spread.
  • External traffic uses a standard ingress or gateway path.
  • Secrets are not embedded in plain manifests.
  • Stateful systems have backup, restore, and upgrade plans.
  • GitOps or release pipeline is the production write path.
  • Observability covers metrics, logs, traces, events, and alerts.
  • Runbooks explain rollback, drain behavior, dependency failure, and scaling.