Kubernetes

Kubernetes learning notes on API machinery, workloads, networking, storage, security, operations, GitOps, policy, and production platform patterns.

Study map

Purpose: This note is the entry point for a dense Kubernetes compendium, connecting the learning path, the architecture model, and the operational vocabulary needed to reason about clusters.

Kubernetes

Kubernetes is a declarative control plane for running containerized workloads across a cluster of machines. It is not a container runtime, not a platform-as-a-service by itself, not a CI system, and not a replacement for application architecture. Its core job is to store desired state in an API, continuously compare that desired state with observed cluster state, and drive the system toward convergence.

Compact Definition

Kubernetes is a distributed API-driven automation system. Users and controllers submit API objects such as Pods, Deployments, Services, ConfigMaps, Secrets, Ingresses, Jobs, and custom resources. The control plane stores those objects, validates them, watches for changes, and coordinates controllers and node agents that act on them.

The main mental model is:

  1. You declare desired state.
  2. The API server validates and persists that state.
  3. Controllers watch state changes.
  4. Controllers create or update dependent objects.
  5. Node agents and infrastructure integrations make runtime changes.
  6. Status fields, Events, metrics, and logs report what actually happened.

What Kubernetes Is

Kubernetes is | Practical meaning
A declarative API | You submit objects and let reconciliation loops work toward the requested state.
A cluster scheduler | It places Pods on nodes based on resource requests, constraints, taints, tolerations, affinity, topology, and policies.
A workload orchestrator | It keeps replicas running, restarts failed containers, rolls out changes, and manages batch work.
A service discovery system | Services and CoreDNS give stable names and virtual IPs for changing Pod backends.
An extensible platform kernel | CustomResourceDefinitions and controllers let teams add new API types and automation.
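
To make the "cluster scheduler" row concrete, here is a minimal sketch of the filtering idea only: a node is feasible when the Pod's requests fit its free capacity and every taint on the node is tolerated. The `Node` and `Pod` classes, their field names, and the key-only taint model are assumptions for illustration. The real kube-scheduler also scores feasible nodes and evaluates affinity, topology spread, and many other plugins.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    free_memory_mib: int
    taints: set = field(default_factory=set)  # simplified: taint keys only


@dataclass
class Pod:
    name: str
    cpu_request_millis: int
    memory_request_mib: int
    tolerations: set = field(default_factory=set)  # taint keys this Pod tolerates


def feasible_nodes(pod, nodes):
    """Return the names of nodes that pass the simplified filter step."""
    result = []
    for node in nodes:
        fits = (pod.cpu_request_millis <= node.free_cpu_millis
                and pod.memory_request_mib <= node.free_memory_mib)
        tolerated = node.taints <= pod.tolerations  # every taint must be tolerated
        if fits and tolerated:
            result.append(node.name)
    return result


nodes = [
    Node("node-a", free_cpu_millis=500, free_memory_mib=1024),
    Node("node-b", free_cpu_millis=2000, free_memory_mib=4096, taints={"dedicated"}),
    Node("node-c", free_cpu_millis=2000, free_memory_mib=4096),
]
web = Pod("web", cpu_request_millis=1000, memory_request_mib=512)
batch = Pod("batch", cpu_request_millis=1000, memory_request_mib=512,
            tolerations={"dedicated"})

print(feasible_nodes(web, nodes))    # ['node-c']
print(feasible_nodes(batch, nodes))  # ['node-b', 'node-c']
```

The sketch filters on requests, not on actual usage, which is why inaccurate requests lead to poor placement.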

What Kubernetes Is Not

Kubernetes is not | Why it matters
Docker | Docker is a developer toolchain and engine. Kubernetes talks to CRI runtimes such as containerd and CRI-O. Dockershim was removed in Kubernetes v1.24.
A complete application platform | You still choose CI, registry, image policy, secrets management, observability, ingress or Gateway implementation, backups, and developer workflows.
Magic autoscaling | Scheduling and autoscaling depend on accurate requests, limits, metrics, policies, and capacity.
A security boundary by default | Multi-tenant clusters need RBAC, Pod Security Admission, network policy, admission policy, image controls, node isolation, and runtime hardening.
A database backup system | Stateful workloads need storage class design, volume snapshots, application-consistent backups, restore drills, and disaster recovery runbooks.
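
The autoscaling row can be illustrated with the replica calculation documented for the HorizontalPodAutoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified model under those assumptions. The function names are hypothetical, and it ignores readiness, missing metrics, stabilization windows, and min/max replica bounds.

```python
import math


def cpu_utilization_percent(usage_millis, request_millis):
    """Utilization is measured relative to the CPU request, not to node capacity."""
    return 100.0 * usage_millis / request_millis


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA calculation: scale by the ratio of current to target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


# Same real usage (300m per Pod), two different requests, target utilization 60%.
accurate = cpu_utilization_percent(usage_millis=300, request_millis=500)   # 60.0
too_small = cpu_utilization_percent(usage_millis=300, request_millis=200)  # 150.0

print(desired_replicas(4, accurate, 60))   # 4
print(desired_replicas(4, too_small, 60))  # 10
print(desired_replicas(10, 25, 50))        # 5
```

With identical real usage, an undersized request makes utilization look high and more than doubles the replica count, which is why the table says autoscaling depends on accurate requests.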

Core Vocabulary

Term | Meaning | Where to study
Object | A persistent API resource with metadata, spec, and often status. | 01 Kubernetes Mental Model and Architecture
Spec | Desired state written by a user, controller, or automation. | 01 Kubernetes Mental Model and Architecture
Status | Observed state written by controllers or agents. | 01 Kubernetes Mental Model and Architecture
Reconciliation | A loop that watches desired and observed state, then takes action to reduce drift. | 01 Kubernetes Mental Model and Architecture
Namespace | A scope for names, RBAC, quotas, and many policies. | 01 Kubernetes Mental Model and Architecture
Label | Indexed key-value metadata used for selection and grouping. | 01 Kubernetes Mental Model and Architecture
Annotation | Non-identifying metadata for tools, rollout notes, checksums, ownership hints, and integrations. | 01 Kubernetes Mental Model and Architecture
OwnerReference | Metadata that expresses ownership so garbage collection can remove dependents. | 01 Kubernetes Mental Model and Architecture
Finalizer | A deletion gate that lets a controller clean external resources before an object disappears. | 01 Kubernetes Mental Model and Architecture
Event | A short-lived diagnostic record about scheduling, pulling images, admission, readiness, or controller decisions. | 01 Kubernetes Mental Model and Architecture
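
The vocabulary above maps onto the shape of an API object. The sketch below models three Pods as plain Python dictionaries with `metadata`, `spec`, and `status`, then applies an equality-based label selector the way a Service selector conceptually does. The object contents, the `example.com/...` annotation keys, and the `selects` helper are illustrative assumptions. Real selectors also support set-based expressions.

```python
def selects(selector, labels):
    """Equality-based matching: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


pods = [
    {
        "metadata": {"name": "web-1", "namespace": "shop",
                     "labels": {"app": "web", "tier": "frontend"},
                     "annotations": {"example.com/git-sha": "abc123"}},
        "spec": {"containers": [{"name": "web", "image": "registry.example.com/web:1.0"}]},
        "status": {"phase": "Running"},
    },
    {
        "metadata": {"name": "web-2", "namespace": "shop",
                     "labels": {"app": "web", "tier": "frontend"},
                     "annotations": {}},
        "spec": {"containers": [{"name": "web", "image": "registry.example.com/web:1.0"}]},
        "status": {"phase": "Pending"},
    },
    {
        "metadata": {"name": "api-1", "namespace": "shop",
                     "labels": {"app": "api"},
                     # An annotation that looks like a label is still ignored by selectors.
                     "annotations": {"example.com/app": "web"}},
        "spec": {"containers": [{"name": "api", "image": "registry.example.com/api:2.3"}]},
        "status": {"phase": "Running"},
    },
]

service_selector = {"app": "web"}
matched = [p["metadata"]["name"] for p in pods
           if selects(service_selector, p["metadata"]["labels"])]

print(matched)  # ['web-1', 'web-2']
```

Note that `web-2` is selected even though its status is Pending: selection works on labels, while readiness decides whether a selected Pod actually receives traffic.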

Production vs Local Clusters

Local clusters such as kind, minikube, k3d, Docker Desktop Kubernetes, and single-node k3s are excellent for learning API behavior, YAML shape, kubectl workflows, and controller development. They do not prove production readiness by themselves.

Concern | Local cluster | Production cluster
Failure model | Often one machine, limited failure domains. | Multiple nodes, zones, capacity pools, upgrades, and real outages.
Networking | Simplified load balancers, ports, and DNS. | CNI choice, policy enforcement, cloud LBs, Gateway or Ingress controllers, DNS scale.
Storage | HostPath or simple local volumes are common. | CSI drivers, snapshots, backup, restore, reclaim policy, encryption, topology.
Security | Often permissive for speed. | RBAC least privilege, Pod Security Admission, audit logs, network policy, image policy.
Operations | Manual kubectl experiments are acceptable. | GitOps, policy checks, SLOs, incident response, upgrade plans, capacity management.

Current Official Facts To Remember

  • The Kubernetes API reference currently lists Kubernetes v1.36.
  • Dockershim was removed in Kubernetes v1.24. Modern clusters should use CRI runtimes such as containerd or CRI-O.
  • PodSecurityPolicy was removed in Kubernetes v1.25.
  • Pod Security Admission is stable as of Kubernetes v1.25.
  • Gateway API is an official Kubernetes project and an add-on API family for role-oriented, extensible traffic management. Installing its CRDs and a compatible controller is separate from installing core Kubernetes.

First Principles

Desired state is data

Kubernetes turns operational intent into API data. A Deployment says "keep this many Pods matching this template available." A Service says "give a stable virtual endpoint for Pods matching this selector." A Job says "run this task to completion." A Namespace says "scope names and many controls here."

That data is valuable because many actors can read it consistently: kubectl, controllers, admission webhooks, policy engines, GitOps tools, dashboards, audit systems, and custom operators.

Controllers are the engine

A controller is a reconciliation loop:

  1. Watch relevant objects.
  2. Compare desired state with observed state.
  3. Create, update, delete, or report state.
  4. Requeue when the world changes or when an action fails.
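
The four steps above can be sketched as a single reconcile pass for a ReplicaSet-like object. This is a toy model under stated assumptions: `ReplicaSetLike`, the in-memory `pods` list, and the naming helper are hypothetical, and real controllers use watches, informers, and rate-limited work queues instead of direct calls.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ReplicaSetLike:
    name: str
    replicas: int            # spec: desired state
    ready_replicas: int = 0  # status: observed state reported by the controller


def reconcile(rs, pods, new_name):
    """One pass: compare desired with observed state, act, then report status."""
    actions = []
    while len(pods) < rs.replicas:   # too few Pods: create
        name = new_name()
        pods.append(name)
        actions.append(f"create {name}")
    while len(pods) > rs.replicas:   # too many Pods: delete
        actions.append(f"delete {pods.pop()}")
    rs.ready_replicas = len(pods)    # write observed state back to status
    return actions


counter = itertools.count()
new_name = lambda: f"web-{next(counter)}"

rs = ReplicaSetLike("web", replicas=3)
pods = []

print(reconcile(rs, pods, new_name))  # ['create web-0', 'create web-1', 'create web-2']
print(reconcile(rs, pods, new_name))  # [] (already converged, so the pass is a no-op)

pods.remove("web-1")                  # simulate a Pod disappearing
print(reconcile(rs, pods, new_name))  # ['create web-3']

rs.replicas = 2                       # user edits desired state
print(reconcile(rs, pods, new_name))  # ['delete web-3']
```

The second call returning no actions is the important property: a reconcile pass is idempotent, so it is safe to rerun after any change or failure.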

Operating Kubernetes well means reading both controller intent and controller feedback. If a Deployment is not progressing, inspect the Deployment, ReplicaSet, Pods, Events, scheduler decisions, image pulls, probes, quotas, policies, and node state. The YAML alone is not the whole system.

The API server is the front door

All durable cluster state changes go through kube-apiserver. Other control plane components and node agents read and write state through the API server rather than touching etcd directly. This gives Kubernetes a uniform security and validation path: authentication, authorization, admission, defaulting, validation, persistence, watch delivery, and audit.
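
As a rough illustration of that request path, the sketch below runs a create request through simplified stages in order. Everything here is an assumption for teaching purposes: the `TOKENS` and `ALLOWED` tables, the `create` function, the HTTP-style codes in the messages, and the in-memory `store` stand in for real authenticators, RBAC, admission plugins, and etcd. Admission is folded into the defaulting step.

```python
class Rejected(Exception):
    """Raised when a stage refuses the request; the message starts with an HTTP-style code."""


TOKENS = {"token-alice": "alice"}              # hypothetical bearer tokens -> users
ALLOWED = {("alice", "create", "configmaps")}  # hypothetical allow rules


def create(token, resource, obj, store, watchers, audit_log):
    user = TOKENS.get(token, "anonymous")
    try:
        if user == "anonymous":                        # authentication
            raise Rejected("401 Unauthorized")
        if (user, "create", resource) not in ALLOWED:  # authorization
            raise Rejected("403 Forbidden")
        obj = {"namespace": "default", **obj}          # admission + defaulting
        if not obj.get("name"):                        # validation
            raise Rejected("422 name is required")
        key = (resource, obj["namespace"], obj["name"])
        if key in store:
            raise Rejected("409 AlreadyExists")
        obj["resourceVersion"] = str(len(store) + 1)
        store[key] = obj                               # persistence
        for notify in watchers:                        # watch delivery
            notify("ADDED", obj)
    except Rejected as err:
        audit_log.append((user, "create", resource, str(err)[:3]))  # audit failures too
        raise
    audit_log.append((user, "create", resource, "201"))             # audit
    return obj


store, events, audit = {}, [], []
watchers = [lambda kind, o: events.append((kind, o["name"]))]

created = create("token-alice", "configmaps", {"name": "app-config"},
                 store, watchers, audit)
print(created["namespace"], created["resourceVersion"])  # default 1
print(events)                                            # [('ADDED', 'app-config')]

try:
    create("token-alice", "secrets", {"name": "db"}, store, watchers, audit)
except Rejected as err:
    print(err)                                           # 403 Forbidden
print(audit)
```

Because every write takes the same path, a rejected request never reaches storage and never triggers a watch event, yet it still leaves an audit record.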

Nodes execute, the control plane decides

Worker nodes run kubelet, kube-proxy or an equivalent dataplane, a container runtime, CNI networking, CSI storage plugins, and workloads. The control plane stores state and makes decisions. Kubelet then turns assigned Pods into running containers and reports status back.

Common Mistakes

Mistake | Consequence | Better practice
Treating YAML as deployment truth without reading status | Teams miss admission, scheduling, image pull, probe, and runtime failures. | Always inspect status, conditions, and Events.
Using labels casually | Services, Deployments, policies, monitoring, and cost allocation select the wrong objects. | Design stable label taxonomy early.
Putting operational identity in annotations only | Selectors and policies cannot use annotations. | Use labels for identity and selection, annotations for tool metadata.
Assuming namespace equals hard tenancy | Nodes, CRDs, cluster-scoped resources, shared controllers, and network paths can cross namespaces. | Combine namespaces with RBAC, quotas, Pod Security Admission, network policy, and node isolation where needed.
Forgetting owner references and finalizers | Resources leak or deletions hang. | Understand garbage collection and finalizer cleanup paths.
Running production like minikube | Hidden gaps in HA, storage, upgrades, security, DNS, and load balancing. | Treat local clusters as learning and integration tools, not as proof of production behavior.
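
The owner reference and finalizer row is easier to reason about with a small model. The sketch below is a simplified assumption, not the real garbage collector: objects live in a dictionary, ownership is tracked by name instead of UID, and deletion cascades synchronously. It shows both failure modes from the table: an object with a pending finalizer stays visible ("deletions hang"), and dependents are removed only because they carry an owner reference (without one, "resources leak").

```python
def delete(store, name):
    """Request deletion. Returns 'terminating' if finalizers block it, else 'deleted'."""
    obj = store[name]
    if obj["finalizers"]:
        obj["deleting"] = True  # marked for deletion but still visible
        return "terminating"
    del store[name]
    # Cascade to dependents that list this object as an owner.
    dependents = [n for n, o in store.items() if name in o["owners"]]
    for dep in dependents:
        if dep in store:
            delete(store, dep)
    return "deleted"


def remove_finalizer(store, name, finalizer):
    """Called by the owning controller once its external cleanup has finished."""
    obj = store[name]
    obj["finalizers"].remove(finalizer)
    if obj.get("deleting") and not obj["finalizers"]:
        return delete(store, name)
    return "present"


store = {
    "deploy/web": {"owners": [], "finalizers": ["example.com/cleanup"]},
    "rs/web-abc": {"owners": ["deploy/web"], "finalizers": []},
    "pod/web-abc-1": {"owners": ["rs/web-abc"], "finalizers": []},
}

print(delete(store, "deploy/web"))  # terminating
print(sorted(store))                # ['deploy/web', 'pod/web-abc-1', 'rs/web-abc']

print(remove_finalizer(store, "deploy/web", "example.com/cleanup"))  # deleted
print(sorted(store))                # []
```

If the controller responsible for `example.com/cleanup` (a hypothetical finalizer name) is gone, nothing ever calls `remove_finalizer`, and the object stays in the terminating state indefinitely.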

Review Checklist

  • Can you explain the difference between desired state, observed state, spec, status, and Events?
  • Can you name the control plane components and the worker node components?
  • Can you trace what happens when a Deployment is applied?
  • Can you explain why labels power selectors and annotations do not?
  • Can you explain how owner references, garbage collection, and finalizers interact?
  • Can you distinguish Kubernetes core APIs from add-on APIs such as Gateway API?
  • Can you describe why dockershim removal did not mean container images stopped working?
  • Can you state why PodSecurityPolicy should not be used on modern clusters?

Exact Coverage Routing

Use this section when you need to search by the exact operational phrase rather than by the broader concept.

Phrase | Primary note | What to study there
Declarative desired state | 01 Kubernetes Mental Model and Architecture | How spec, status, server-side apply, and controller loops turn API objects into runtime behavior.
Controller manager | 01 Kubernetes Mental Model and Architecture | Built-in controllers, ownership, garbage collection, and reconciliation boundaries.
Cloud controller manager | 01 Kubernetes Mental Model and Architecture | Provider-owned load balancers, node metadata, routes, and cloud integration boundaries.
When not to run a database on Kubernetes | 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs | The operational conditions that make managed databases safer than cluster-hosted databases.
Stateful workload design tradeoffs | 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs | Identity, storage, failover, repair, and backup implications for StatefulSets.
Flannel overview | 05 Kubernetes Networking CNI NetworkPolicy and Service Mesh | Simple overlay networking, limits, and when richer policy-aware CNIs are required.
Environment variables vs mounted files | 06 Configuration Secrets ServiceAccounts and Runtime Identity | Reload semantics, leakage risk, config size, and rollout behavior.
Workload identity on cloud providers | 06 Configuration Secrets ServiceAccounts and Runtime Identity | Mapping Kubernetes ServiceAccounts to cloud IAM without static long-lived credentials.
StatefulSets with PVC templates | 07 Storage Volumes PVCs StorageClasses CSI and Stateful Data | Per-replica storage identity, retention, expansion, restore, and migration risks.
SELinux overview | 09 Security RBAC Pod Security Admission and Supply Chain | Host-level mandatory access controls and how they relate to SecurityContext and runtimes.
kubectl events | 10 Observability Logging Metrics Tracing Events and Probes | Event querying, event retention limits, and how to combine Events with status and logs.
Structured logging | 10 Observability Logging Metrics Tracing Events and Probes | Log fields, correlation IDs, severity, sampling, and incident searchability.
SLO-oriented observability | 10 Observability Logging Metrics Tracing Events and Probes | Telemetry based on user symptoms, error budgets, burn rates, and runbook triggers.
Debugging with ephemeral containers | 11 Troubleshooting Debugging and Incident Response | How to inspect minimal images and live Pods without modifying the workload spec.
Port forwarding | 10 Observability Logging Metrics Tracing Events and Probes | Temporary local access for inspection, with security and audit limits.
Capturing diagnostics | 11 Troubleshooting Debugging and Incident Response | Evidence bundles for Pods, Services, Nodes, PVCs, policies, and rollouts.
Ingress returns 404 or 502 | 11 Troubleshooting Debugging and Incident Response | Controller class, route match, backend Service, endpoint readiness, and upstream protocol checks.
TLS certificate failures | 11 Troubleshooting Debugging and Incident Response | Secret shape, certificate chain, SNI, issuer, renewal, and controller reload checks.
Rollout stuck | 11 Troubleshooting Debugging and Incident Response | Deployment progress, ReplicaSet ownership, PDBs, readiness, quota, image, and admission causes.
NetworkPolicy blocks traffic | 11 Troubleshooting Debugging and Incident Response | Namespace selectors, pod selectors, ingress, egress, DNS, and CNI enforcement checks.
Debugging order of operations | 11 Troubleshooting Debugging and Incident Response | A deterministic symptom to owner to controller to event to dataplane investigation sequence.
Incident response checklists | 11 Troubleshooting Debugging and Incident Response | Stabilization, evidence capture, mitigation, communication, recovery, and follow-up.
Helm charts | 12 Helm Kustomize Manifests and Release Engineering | Chart structure, values, templates, hooks, rollback, and release ownership.
Argo Rollouts overview | 12 Helm Kustomize Manifests and Release Engineering | Progressive delivery using canary, blue green, analysis, and automated promotion controls.
Blue green deployments | 12 Helm Kustomize Manifests and Release Engineering | Safe traffic switching, rollback, schema compatibility, and capacity cost.
Canary deployments | 12 Helm Kustomize Manifests and Release Engineering | Gradual exposure, metrics gates, blast radius, and abort behavior.
Reconciliation design | 13 GitOps Controllers Operators CRDs and Platform APIs | Controller inputs, idempotency, status conditions, finalizers, rate limits, and ownership.
Crossplane overview | 13 GitOps Controllers Operators CRDs and Platform APIs | Using Kubernetes APIs to provision external infrastructure through provider controllers.
Cluster bootstrap overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | How control plane, CNI, CSI, DNS, ingress, policy, and GitOps foundations come online.
kubeadm overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Self-managed cluster bootstrapping, certificates, upgrades, and operational ownership.
Bare metal clusters | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Load balancing, storage, power, networking, hardware failure, and upgrade responsibilities.
Talos overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Immutable Kubernetes-focused OS operations and API-driven node management.
k3s and lightweight clusters | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Edge, lab, and small-footprint clusters without confusing them with full HA production designs.
API deprecations | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Release planning, manifest scanning, conversion, and upgrade gates.
CNI upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Dataplane compatibility, policy behavior, rollback, and node disruption risks.
CSI upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Driver compatibility, snapshots, expansion, attach behavior, and restore testing.
Ingress controller upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Routing compatibility, annotations, Gateway API migration, TLS reload, and rollback plans.
Backup testing | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Proving backups by restoring real objects, etcd state, PVC data, and control plane dependencies.

Study Order

  1. Read 00 Kubernetes Mastery Roadmap for the sequence.
  2. Read 01 Kubernetes Mental Model and Architecture until the reconciliation and component model is fluent.
  3. Use 00 Kubernetes Mastery Roadmap to practice basic commands.
  4. Revisit each concept with a local cluster, then compare the local behavior with production constraints.

Ordered notes

Kubernetes Mastery Roadmap

Purpose: This note gives a rigorous learning path for mastering Kubernetes from core mental models to production operations without confusing local practice with production readiness.

Kubernetes Mental Model and Architecture

Purpose: This note explains Kubernetes as a distributed reconciliation system, with enough architectural detail to debug real clusters and evaluate production designs.

Containers Pods and Workload Primitives

Purpose: Explain the Pod-level primitives that every Kubernetes workload controller builds on, including container lifecycle, sidecar patterns, readiness, disruption behavior, identity, and node-local special cases.

Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs

Purpose: Compare Kubernetes workload controllers and show how to operate stateless services, stateful services, node agents, batch jobs, scheduled jobs, rolling updates, and rollbacks safely.

Services DNS Ingress Gateway API and Traffic Routing

Purpose: Explain how Kubernetes exposes Pods through Services, DNS, Ingress, and Gateway API, and how traffic actually moves from clients to endpoints in production clusters.

Kubernetes Networking CNI NetworkPolicy and Service Mesh

Purpose: Explain the Kubernetes data network below Services, including CNI plugins, NetworkPolicy, egress control, service mesh, mTLS, and practical troubleshooting.

Configuration Secrets ServiceAccounts and Runtime Identity

Purpose: Explain how Kubernetes carries application configuration, protects secret material, and assigns runtime identity to workloads with production-grade controls.

Storage Volumes PVCs StorageClasses CSI and Stateful Data

Purpose: Explain Kubernetes storage primitives, dynamic provisioning, CSI behavior, StatefulSet data patterns, and operational recovery for stateful workloads.

Scheduling Resources Requests Limits QoS and Autoscaling

Purpose: Explain how Kubernetes places Pods, reserves resources, enforces limits, prioritizes workloads, spreads risk, and scales Pods or nodes under real production constraints.

Security RBAC Pod Security Admission and Supply Chain

Purpose: Explain how Kubernetes security controls compose across identity, admission, pod hardening, secrets, and software supply chain enforcement.

Observability Logging Metrics Tracing Events and Probes

Purpose: Explain how to observe Kubernetes workloads through logs, metrics, traces, events, probes, audit records, SLOs, alerts, and runbooks.

Troubleshooting Debugging and Incident Response

Purpose: Provide a practical Kubernetes troubleshooting and incident response playbook for workload, networking, storage, node, policy, and rollout failures.

Helm Kustomize Manifests and Release Engineering

Purpose: Explain how Kubernetes manifests become reliable releases through apply semantics, Helm, Kustomize, validation, policy gates, and repeatable delivery workflows.

GitOps Controllers Operators CRDs and Platform APIs

Purpose: Explain how GitOps, controllers, operators, CRDs, and platform APIs turn Kubernetes into a reconciled platform rather than a collection of manual commands.

Cluster Operations Upgrades Backup Restore and Disaster Recovery

Purpose: Explain the operational practices required to run, upgrade, back up, restore, and recover Kubernetes clusters safely.

Multi Tenancy Policy Governance and Cost Management

Purpose: Explain how Kubernetes clusters can safely host multiple teams, environments, or tenants through isolation, policy, governance, and cost controls.

Production Patterns Anti Patterns and Reference Architectures

Purpose: Explain practical Kubernetes production architectures, repeatable patterns, and dangerous anti-patterns for application and platform teams.

Kubernetes Ecosystem Tools and Learning Projects

Purpose: Map the Kubernetes ecosystem into practical tool categories and give hands-on projects that build production judgment.
