Kubernetes
Kubernetes learning notes on API machinery, workloads, networking, storage, security, operations, GitOps, policy, and production platform patterns.
Study map
Purpose: This note is the entry point for a dense Kubernetes compendium, connecting the learning path, the architecture model, and the operational vocabulary needed to reason about clusters.
Kubernetes is a declarative control plane for running containerized workloads across a cluster of machines. It is not a container runtime, not a platform-as-a-service by itself, not a CI system, and not a replacement for application architecture. Its core job is to store desired state in an API, continuously compare that desired state with observed cluster state, and drive the system toward convergence.
Start here:
- 00 Kubernetes Mastery Roadmap
- 01 Kubernetes Mental Model and Architecture
Existing crash course sections:
- 01 Kubernetes Mental Model and Architecture
- 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs
- 04 Services DNS Ingress Gateway API and Traffic Routing
- 06 Configuration Secrets ServiceAccounts and Runtime Identity
- 07 Storage Volumes PVCs StorageClasses CSI and Stateful Data
- 09 Security RBAC Pod Security Admission and Supply Chain
- 10 Observability Logging Metrics Tracing Events and Probes
- 17 Kubernetes Ecosystem Tools and Learning Projects
Compact Definition
Kubernetes is a distributed API-driven automation system. Users and controllers submit API objects such as Pods, Deployments, Services, ConfigMaps, Secrets, Ingresses, Jobs, and custom resources. The control plane stores those objects, validates them, watches for changes, and coordinates controllers and node agents that act on them.
The main mental model is:
- You declare desired state.
- The API server validates and persists that state.
- Controllers watch state changes.
- Controllers create or update dependent objects.
- Node agents and infrastructure integrations make runtime changes.
- Status fields, Events, metrics, and logs report what actually happened.
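The six steps above can be traced in a small Python sketch. Everything here is hypothetical and heavily simplified: `ToyApiServer` stands in for kube-apiserver plus etcd, a queue stands in for watch streams, and `deployment_controller` only creates missing Pods. It shows the division of labor: validation happens before persistence, every write bumps a resource version, and the controller records what it observed in status instead of editing spec.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    kind: str
    name: str
    spec: dict
    status: dict = field(default_factory=dict)
    owner: Optional[str] = None   # toy stand-in for an ownerReference
    resource_version: int = 0


class ToyApiServer:
    """Hypothetical in-memory stand-in for kube-apiserver backed by etcd."""

    def __init__(self):
        self.objects = {}        # (kind, name) -> Obj
        self.events = deque()    # stand-in for watch streams
        self._rv = 0

    def write(self, obj):
        # Validate before persisting: invalid desired state never reaches the store.
        if obj.kind == "Deployment" and obj.spec.get("replicas", 0) < 0:
            raise ValueError("spec.replicas must be >= 0")
        self._rv += 1
        obj.resource_version = self._rv
        self.objects[(obj.kind, obj.name)] = obj
        self.events.append((obj.kind, obj.name))   # notify watchers

    def list(self, kind):
        return [o for (k, _), o in self.objects.items() if k == kind]


def deployment_controller(api, name):
    """Create missing Pods for a Deployment, then report what it observed in status."""
    dep = api.objects[("Deployment", name)]
    owned = [p for p in api.list("Pod") if p.owner == name]
    for i in range(len(owned), dep.spec["replicas"]):
        api.write(Obj("Pod", f"{name}-{i}", spec=dict(dep.spec["template"]), owner=name))
    observed = {"replicas": len([p for p in api.list("Pod") if p.owner == name])}
    if dep.status != observed:   # only write when something changed, so the loop settles
        dep.status = observed
        api.write(dep)


def run_until_quiet(api):
    # Drain watch events; this controller only reacts to Deployments.
    while api.events:
        kind, name = api.events.popleft()
        if kind == "Deployment":
            deployment_controller(api, name)


api = ToyApiServer()
api.write(Obj("Deployment", "web", spec={"replicas": 2, "template": {"image": "nginx:1.27"}}))
run_until_quiet(api)
print(api.objects[("Deployment", "web")].status)   # {'replicas': 2}
```

The status check matters: without it, each status write would emit another watch event and the loop would never settle.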
What Kubernetes Is
| Kubernetes is | Practical meaning |
|---|---|
| A declarative API | You submit objects and let reconciliation loops work toward the requested state. |
| A cluster scheduler | It places Pods on nodes based on resource requests, constraints, taints, tolerations, affinity, topology, and policies. |
| A workload orchestrator | It keeps replicas running, restarts failed containers, rolls out changes, and manages batch work. |
| A service discovery system | Services and CoreDNS give stable names and virtual IPs for changing Pod backends. |
| An extensible platform kernel | CustomResourceDefinitions and controllers let teams add new API types and automation. |
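The scheduler row can be made concrete with a filter-then-score sketch. This is a hypothetical model, not kube-scheduler's plugin API: it filters nodes on resource requests and on untolerated NoSchedule or NoExecute taints (tolerations are simplified to exact `(key, effect)` pairs), then scores the survivors by remaining free capacity, loosely in the spirit of a least-allocated strategy. Affinity, topology spread, priority, and preemption are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}   # PreferNoSchedule is only a soft preference


@dataclass
class Node:
    name: str
    cpu_m: int                     # allocatable CPU in millicores
    mem_mi: int                    # allocatable memory in MiB
    used_cpu_m: int = 0            # sum of requests already placed here
    used_mem_mi: int = 0
    taints: Set[Tuple[str, str]] = field(default_factory=set)   # (key, effect)


@dataclass
class Pod:
    name: str
    cpu_m: int                     # CPU request, not actual usage
    mem_mi: int                    # memory request
    tolerations: Set[Tuple[str, str]] = field(default_factory=set)


def feasible(pod, node):
    # Filter phase: requests must fit, and every blocking taint must be tolerated.
    fits = (node.used_cpu_m + pod.cpu_m <= node.cpu_m
            and node.used_mem_mi + pod.mem_mi <= node.mem_mi)
    tolerated = all(t in pod.tolerations for t in node.taints if t[1] in BLOCKING_EFFECTS)
    return fits and tolerated


def score(pod, node):
    # Score phase: prefer the node with the most free capacity left after placement (0-100).
    cpu_free = (node.cpu_m - node.used_cpu_m - pod.cpu_m) / node.cpu_m
    mem_free = (node.mem_mi - node.used_mem_mi - pod.mem_mi) / node.mem_mi
    return 100 * (cpu_free + mem_free) / 2


def schedule(pod, nodes):
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None                # the Pod stays Pending
    best = max(candidates, key=lambda n: score(pod, n))
    best.used_cpu_m += pod.cpu_m   # reserve requests so later Pods see less room
    best.used_mem_mi += pod.mem_mi
    return best.name               # a real scheduler posts a Binding that sets spec.nodeName


nodes = [
    Node("small", cpu_m=2000, mem_mi=4096, used_cpu_m=1500),
    Node("large", cpu_m=4000, mem_mi=8192),
    Node("gpu", cpu_m=16000, mem_mi=65536, taints={("gpu", "NoSchedule")}),
]
print(schedule(Pod("web", cpu_m=500, mem_mi=512), nodes))                               # large
print(schedule(Pod("train", 6000, 16384, tolerations={("gpu", "NoSchedule")}), nodes))  # gpu
print(schedule(Pod("huge", 32000, 1024), nodes))                                        # None
```

Placement is computed from requests, not live usage, which is why inaccurate requests can leave nodes overpacked or underused.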
What Kubernetes Is Not
| Kubernetes is not | Why it matters |
|---|---|
| Not Docker | Docker is a developer toolchain and engine. Kubernetes talks to CRI runtimes such as containerd and CRI-O. Dockershim was removed in Kubernetes v1.24. |
| Not a complete application platform | You still choose CI, registry, image policy, secrets management, observability, ingress or Gateway implementation, backups, and developer workflows. |
| Not magic autoscaling | Scheduling and autoscaling depend on accurate requests, limits, metrics, policies, and capacity. |
| Not a security boundary by default | Multi-tenant clusters need RBAC, Pod Security Admission, network policy, admission policy, image controls, node isolation, and runtime hardening. |
| Not a database backup system | Stateful workloads need storage class design, volume snapshots, application-consistent backups, restore drills, and disaster recovery runbooks. |
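The autoscaling row is easier to see with the Horizontal Pod Autoscaler's documented scaling rule, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, which skips scaling when the ratio is within a default tolerance of 0.1 of 1.0. The sketch below assumes a CPU utilization target and omits stabilization windows, Pod readiness handling, and missing-metric rules; the function names are illustrative. It shows how the same real load produces opposite decisions when CPU requests are wrong, because utilization is computed relative to the request.

```python
import math


def utilization_percent(avg_usage_m, request_m):
    # HPA resource utilization is measured against the container request, not the limit.
    if request_m <= 0:
        raise ValueError("a CPU utilization target needs a CPU request")
    return 100.0 * avg_usage_m / request_m


def desired_replicas(current, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current                      # close enough: skip scaling
    else:
        desired = math.ceil(current * ratio)   # ceil(currentReplicas * current / target)
    return max(min_replicas, min(max_replicas, desired))


# Same real load (300m per Pod on 4 Pods), target 75% utilization, two different requests:
accurate = utilization_percent(300, request_m=200)    # 150.0
inflated = utilization_percent(300, request_m=1000)   # 30.0
print(desired_replicas(4, accurate, 75))   # 8: scale out
print(desired_replicas(4, inflated, 75))   # 2: scale in, despite identical load
```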
Core Vocabulary
| Term | Meaning | Where to study |
|---|---|---|
| Object | A persistent API resource with metadata, spec, and often status. | 01 Kubernetes Mental Model and Architecture |
| Spec | Desired state written by a user, controller, or automation. | 01 Kubernetes Mental Model and Architecture |
| Status | Observed state written by controllers or agents. | 01 Kubernetes Mental Model and Architecture |
| Reconciliation | A loop that watches desired and observed state, then takes action to reduce drift. | 01 Kubernetes Mental Model and Architecture |
| Namespace | A scope for names, RBAC, quotas, and many policies. | 01 Kubernetes Mental Model and Architecture |
| Label | Indexed key-value metadata used for selection and grouping. | 01 Kubernetes Mental Model and Architecture |
| Annotation | Non-identifying metadata for tools, rollout notes, checksums, ownership hints, and integrations. | 01 Kubernetes Mental Model and Architecture |
| OwnerReference | Metadata that expresses ownership so garbage collection can remove dependents. | 01 Kubernetes Mental Model and Architecture |
| Finalizer | A deletion gate that lets a controller clean external resources before an object disappears. | 01 Kubernetes Mental Model and Architecture |
| Event | A short-lived diagnostic record about scheduling, pulling images, admission, readiness, or controller decisions. | 01 Kubernetes Mental Model and Architecture |
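Because labels drive selection, it helps to see how a selector is evaluated. This sketch mirrors the shape of a LabelSelector (`matchLabels` plus `matchExpressions` using the In, NotIn, Exists, and DoesNotExist operators, all ANDed together); `selector_matches` and the sample Pods are hypothetical. As in Kubernetes, NotIn also matches objects that do not carry the key, and annotations are never consulted.

```python
VALID_OPERATORS = {"In", "NotIn", "Exists", "DoesNotExist"}


def selector_matches(selector, labels):
    """Evaluate a LabelSelector-shaped dict against an object's labels (all terms are ANDed)."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op not in VALID_OPERATORS:
            raise ValueError(f"unsupported operator {op!r}")
        present = key in labels
        if op == "In" and not (present and labels[key] in values):
            return False
        if op == "NotIn" and present and labels[key] in values:
            return False   # NotIn also matches objects that lack the key
        if op == "Exists" and not present:
            return False
        if op == "DoesNotExist" and present:
            return False
    return True


selector = {
    "matchLabels": {"app": "web"},
    "matchExpressions": [{"key": "track", "operator": "NotIn", "values": ["canary"]}],
}
pods = {
    "web-stable": {"labels": {"app": "web", "track": "stable"}, "annotations": {}},
    "web-canary": {"labels": {"app": "web", "track": "canary"}, "annotations": {}},
    "web-untracked": {"labels": {"app": "web"}, "annotations": {}},
    "annotated-only": {"labels": {}, "annotations": {"app": "web"}},
}
# Only labels are consulted; an annotation with the same key and value is invisible to selectors.
print(sorted(n for n, p in pods.items() if selector_matches(selector, p["labels"])))
# ['web-stable', 'web-untracked']
```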
Production vs Local Clusters
Local clusters such as kind, minikube, k3d, Docker Desktop Kubernetes, and single-node k3s are excellent for learning API behavior, YAML shape, kubectl workflows, and controller development. They do not prove production readiness by themselves.
| Concern | Local cluster | Production cluster |
|---|---|---|
| Failure model | Often one machine, limited failure domains. | Multiple nodes, zones, capacity pools, upgrades, and real outages. |
| Networking | Simplified load balancers, ports, and DNS. | CNI choice, policy enforcement, cloud LBs, Gateway or Ingress controllers, DNS scale. |
| Storage | HostPath or simple local volumes are common. | CSI drivers, snapshots, backup, restore, reclaim policy, encryption, topology. |
| Security | Often permissive for speed. | RBAC least privilege, Pod Security Admission, audit logs, network policy, image policy. |
| Operations | Manual kubectl experiments are acceptable. | GitOps, policy checks, SLOs, incident response, upgrade plans, capacity management. |
Current Official Facts To Remember
- The Kubernetes API reference currently lists Kubernetes v1.36.
- Dockershim was removed in Kubernetes v1.24. Modern clusters should use CRI runtimes such as containerd or CRI-O.
- PodSecurityPolicy was removed in Kubernetes v1.25.
- Pod Security Admission is stable as of Kubernetes v1.25.
- Gateway API is an official Kubernetes project and an add-on API family for role-oriented, extensible traffic management. Installing its CRDs and a compatible controller is separate from installing core Kubernetes.
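Removals like these are what manifest scanning checks for before an upgrade. A minimal Python sketch, assuming manifests are already parsed into dicts; `REMOVED_APIS` is a small hand-picked table of well-known removals, not a complete list, so confirm against the official deprecated API migration guide before relying on it.

```python
# Hand-picked subset of API removals; not a complete deprecation list.
# (apiVersion, kind) -> Kubernetes minor version in which that API version was removed
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): 22,
    ("networking.k8s.io/v1beta1", "Ingress"): 22,
    ("batch/v1beta1", "CronJob"): 25,
    ("policy/v1beta1", "PodSecurityPolicy"): 25,   # built-in successor: Pod Security Admission
}


def removed_apis(manifests, target_minor):
    """Report manifests whose apiVersion/kind no longer exists in v1.<target_minor>."""
    findings = []
    for m in manifests:
        key = (m.get("apiVersion"), m.get("kind"))
        removed_in = REMOVED_APIS.get(key)
        if removed_in is not None and target_minor >= removed_in:
            name = m.get("metadata", {}).get("name", "<unnamed>")
            findings.append(f"{key[1]}/{name}: {key[0]} was removed in v1.{removed_in}")
    return findings


manifests = [
    {"apiVersion": "policy/v1beta1", "kind": "PodSecurityPolicy", "metadata": {"name": "restricted"}},
    {"apiVersion": "batch/v1", "kind": "CronJob", "metadata": {"name": "nightly"}},
]
for finding in removed_apis(manifests, target_minor=25):
    print(finding)   # PodSecurityPolicy/restricted: policy/v1beta1 was removed in v1.25
```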
First Principles
Desired state is data
Kubernetes turns operational intent into API data. A Deployment says "keep this many Pods matching this template available." A Service says "give a stable virtual endpoint for Pods matching this selector." A Job says "run this task to completion." A Namespace says "scope names and many controls here."
That data is valuable because many actors can read it consistently: kubectl, controllers, admission webhooks, policy engines, GitOps tools, dashboards, audit systems, and custom operators.
Controllers are the engine
A controller is a reconciliation loop:
- Watch relevant objects.
- Compare desired state with observed state.
- Create, update, delete, or report state.
- Requeue when the world changes or when an action fails.
Operating Kubernetes well means reading both controller intent and controller feedback. If a Deployment is not progressing, inspect the Deployment, its ReplicaSets, Pods, Events, scheduler decisions, image pulls, probes, quotas, policies, and node state. The YAML alone is not the whole system.
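The watch, compare, act, requeue loop can be sketched as a single idempotent reconcile function. The names (`reconcile`, `FakeCluster`, `run`) are hypothetical; a real controller reads observed state from an informer cache and requeues through a rate-limited work queue rather than a plain `for` loop. The point is that each pass recomputes the difference from current state, so a failed pass is retried safely and a converged state produces no actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Result:
    requeue: bool
    actions: List[str] = field(default_factory=list)


def reconcile(desired: int, observed: Set[str],
              create: Callable[[], str], delete: Callable[[str], None]) -> Result:
    """One level-triggered pass: compare desired with observed and act on the difference."""
    result = Result(requeue=False)
    try:
        if len(observed) < desired:
            for _ in range(desired - len(observed)):
                result.actions.append("create " + create())
        else:
            for name in sorted(observed)[desired:]:   # no-op when counts already match
                delete(name)
                result.actions.append("delete " + name)
    except RuntimeError:
        result.requeue = True   # transient failure: try again on a later pass
    return result


class FakeCluster:
    """Hypothetical backend whose first few create calls fail."""

    def __init__(self, failing_creates=0):
        self.pods: Set[str] = set()
        self._next = 0
        self._failing = failing_creates

    def create(self) -> str:
        if self._failing > 0:
            self._failing -= 1
            raise RuntimeError("transient API error")
        name = f"pod-{self._next}"
        self._next += 1
        self.pods.add(name)
        return name

    def delete(self, name: str) -> None:
        self.pods.discard(name)


def run(cluster: FakeCluster, desired: int, max_passes: int = 5) -> int:
    # Stand-in for a work queue: keep reconciling until a pass asks for no requeue.
    for attempt in range(1, max_passes + 1):
        if not reconcile(desired, set(cluster.pods), cluster.create, cluster.delete).requeue:
            return attempt
    raise RuntimeError("did not converge")


cluster = FakeCluster(failing_creates=1)
print(run(cluster, desired=3), sorted(cluster.pods))   # 2 ['pod-0', 'pod-1', 'pod-2']
print(run(cluster, desired=1), sorted(cluster.pods))   # 1 ['pod-0']
```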
The API server is the front door
All durable cluster state changes go through kube-apiserver. Other cluster components do not read or write etcd directly; kube-apiserver is the only component that talks to it. This gives Kubernetes a uniform security and validation path: authentication, authorization, defaulting, mutating admission, schema validation, validating admission, persistence to etcd, watch delivery, and audit logging.
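The request path can be sketched as an ordered chain of stages where any stage may reject the request. This is a simplified Python model under stated assumptions: every function name, the `admitted-by` label, and the replicas policy are hypothetical, and real authentication, RBAC, and admission webhooks are far richer. It shows why the path is uniform: an object is persisted and delivered to watchers only after every earlier stage passes, and the sketch audits every outcome either way.

```python
class Rejected(Exception):
    """Raised by any stage to stop the request with a status message."""


def authenticate(req):
    user = req.get("user")   # toy: real clusters use client certs, tokens, OIDC, or webhooks
    if not user:
        raise Rejected("401 Unauthorized")
    return user


def authorize(user, verb, kind, grants):
    if (verb, kind) not in grants.get(user, set()):
        raise Rejected("403 Forbidden")


def apply_defaults(obj):
    obj["spec"].setdefault("replicas", 1)


def mutating_admission(obj):
    obj["metadata"].setdefault("labels", {}).setdefault("admitted-by", "toy-webhook")


def validate_schema(obj):
    replicas = obj["spec"]["replicas"]
    if not isinstance(replicas, int) or replicas < 0:
        raise Rejected("422 Unprocessable Entity: spec.replicas must be a non-negative integer")


def validating_admission(obj):
    if obj["spec"]["replicas"] > 20:   # hypothetical policy
        raise Rejected("admission denied: replicas > 20")


def handle_create(req, grants, store, watchers, audit_log):
    obj = req["object"]
    try:
        user = authenticate(req)
        authorize(user, "create", obj["kind"], grants)
        apply_defaults(obj)
        mutating_admission(obj)
        validate_schema(obj)
        validating_admission(obj)
        store[(obj["kind"], obj["metadata"]["name"])] = obj   # persistence (etcd in a real cluster)
        for notify in watchers:
            notify(obj)                                       # watch delivery
        outcome = "201 Created"
    except Rejected as err:
        outcome = str(err)
    audit_log.append((req.get("user"), "create", obj["kind"], outcome))
    return outcome


grants = {"alice": {("create", "Deployment")}}
store, seen, audit = {}, [], []
req = {"user": "alice", "object": {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {}}}
print(handle_create(req, grants, store, [seen.append], audit))   # 201 Created
print(store[("Deployment", "web")]["spec"])                      # {'replicas': 1}
```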
Nodes execute, the control plane decides
Worker nodes run kubelet, kube-proxy or an equivalent dataplane, a container runtime, CNI networking, CSI storage plugins, and workloads. The control plane stores state and makes decisions. Kubelet then turns assigned Pods into running containers and reports status back.
Common Mistakes
| Mistake | Consequence | Better practice |
|---|---|---|
| Treating YAML as deployment truth without reading status | Teams miss admission, scheduling, image pull, probe, and runtime failures. | Always inspect status, conditions, and Events. |
| Using labels casually | Services, Deployments, policies, monitoring, and cost allocation select the wrong objects. | Design stable label taxonomy early. |
| Putting operational identity in annotations only | Label selectors used by Services, workload controllers, and NetworkPolicy cannot match annotations. | Use labels for identity and selection, annotations for tool metadata. |
| Assuming namespace equals hard tenancy | Nodes, CRDs, cluster-scoped resources, shared controllers, and network paths can cross namespaces. | Combine namespaces with RBAC, quotas, Pod Security Admission, network policy, and node isolation where needed. |
| Forgetting owner references and finalizers | Resources leak or deletions hang. | Understand garbage collection and finalizer cleanup paths. |
| Running production the way you run minikube | Hidden gaps in HA, storage, upgrades, security, DNS, and load balancing. | Treat local clusters as learning and integration tools, not as proof of production behavior. |
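The owner reference and finalizer row is easier to reason about with a toy store. In this hypothetical sketch, owners are referenced by name (real ownerReferences use UIDs), `deleting` stands in for `metadata.deletionTimestamp`, and only background cascading deletion is modeled. Deleting an owner removes its dependents, while an object with a finalizer such as `kubernetes.io/pvc-protection` stays until a controller removes the finalizer, which is exactly what a hanging deletion looks like from the outside.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Obj:
    name: str
    owner: Optional[str] = None                         # toy ownerReference (by name)
    finalizers: List[str] = field(default_factory=list)
    deleting: bool = False                              # toy metadata.deletionTimestamp


class Store:
    def __init__(self):
        self.objects: Dict[str, Obj] = {}

    def add(self, obj: Obj) -> None:
        self.objects[obj.name] = obj

    def delete(self, name: str) -> None:
        obj = self.objects.get(name)
        if obj is None:
            return
        obj.deleting = True
        if obj.finalizers:
            return   # deletion is gated: the object stays until its finalizers are removed
        del self.objects[name]
        # Background cascading deletion: collect dependents whose owner is now gone.
        for child in [o for o in self.objects.values() if o.owner == name]:
            self.delete(child.name)

    def remove_finalizer(self, name: str, finalizer: str) -> None:
        # A controller calls this after cleaning up whatever the finalizer protects.
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        if obj.deleting and not obj.finalizers:
            self.delete(name)


store = Store()
store.add(Obj("deploy/web"))
store.add(Obj("rs/web-abc", owner="deploy/web"))
store.add(Obj("pod/web-abc-1", owner="rs/web-abc"))
store.add(Obj("pvc/data", finalizers=["kubernetes.io/pvc-protection"]))

store.delete("deploy/web")
store.delete("pvc/data")
print(sorted(store.objects))   # ['pvc/data']  (the PVC waits on its finalizer)
store.remove_finalizer("pvc/data", "kubernetes.io/pvc-protection")
print(sorted(store.objects))   # []
```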
Review Checklist
- Can you explain the difference between desired state, observed state, spec, status, and Events?
- Can you name the control plane components and the worker node components?
- Can you trace what happens when a Deployment is applied?
- Can you explain why labels power selectors and annotations do not?
- Can you explain how owner references, garbage collection, and finalizers interact?
- Can you distinguish Kubernetes core APIs from add-on APIs such as Gateway API?
- Can you describe why dockershim removal did not mean container images stopped working?
- Can you state why PodSecurityPolicy should not be used on modern clusters?
Exact Coverage Routing
Use this section when you need to search by the exact operational phrase rather than by the broader concept.
| Phrase | Primary note | What to study there |
|---|---|---|
| Declarative desired state | 01 Kubernetes Mental Model and Architecture | How spec, status, server-side apply, and controller loops turn API objects into runtime behavior. |
| Controller manager | 01 Kubernetes Mental Model and Architecture | Built-in controllers, ownership, garbage collection, and reconciliation boundaries. |
| Cloud controller manager | 01 Kubernetes Mental Model and Architecture | Provider-owned load balancers, node metadata, routes, and cloud integration boundaries. |
| When not to run a database on Kubernetes | 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs | The operational conditions that make managed databases safer than cluster-hosted databases. |
| Stateful workload design tradeoffs | 03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs | Identity, storage, failover, repair, and backup implications for StatefulSets. |
| Flannel overview | 05 Kubernetes Networking CNI NetworkPolicy and Service Mesh | Simple overlay networking, limits, and when richer policy-aware CNIs are required. |
| Environment variables vs mounted files | 06 Configuration Secrets ServiceAccounts and Runtime Identity | Reload semantics, leakage risk, config size, and rollout behavior. |
| Workload identity on cloud providers | 06 Configuration Secrets ServiceAccounts and Runtime Identity | Mapping Kubernetes ServiceAccounts to cloud IAM without static long-lived credentials. |
| StatefulSets with PVC templates | 07 Storage Volumes PVCs StorageClasses CSI and Stateful Data | Per-replica storage identity, retention, expansion, restore, and migration risks. |
| SELinux overview | 09 Security RBAC Pod Security Admission and Supply Chain | Host-level mandatory access controls and how they relate to SecurityContext and runtimes. |
| kubectl events | 10 Observability Logging Metrics Tracing Events and Probes | Event querying, event retention limits, and how to combine Events with status and logs. |
| Structured logging | 10 Observability Logging Metrics Tracing Events and Probes | Log fields, correlation IDs, severity, sampling, and incident searchability. |
| SLO-oriented observability | 10 Observability Logging Metrics Tracing Events and Probes | Telemetry based on user symptoms, error budgets, burn rates, and runbook triggers. |
| Debugging with ephemeral containers | 11 Troubleshooting Debugging and Incident Response | How to inspect minimal images and live Pods without modifying the workload spec. |
| Port forwarding | 10 Observability Logging Metrics Tracing Events and Probes | Temporary local access for inspection, with security and audit limits. |
| Capturing diagnostics | 11 Troubleshooting Debugging and Incident Response | Evidence bundles for Pods, Services, Nodes, PVCs, policies, and rollouts. |
| Ingress returns 404 or 502 | 11 Troubleshooting Debugging and Incident Response | Controller class, route match, backend Service, endpoint readiness, and upstream protocol checks. |
| TLS certificate failures | 11 Troubleshooting Debugging and Incident Response | Secret shape, certificate chain, SNI, issuer, renewal, and controller reload checks. |
| Rollout stuck | 11 Troubleshooting Debugging and Incident Response | Deployment progress, ReplicaSet ownership, PDBs, readiness, quota, image, and admission causes. |
| NetworkPolicy blocks traffic | 11 Troubleshooting Debugging and Incident Response | Namespace selectors, pod selectors, ingress, egress, DNS, and CNI enforcement checks. |
| Debugging order of operations | 11 Troubleshooting Debugging and Incident Response | A deterministic investigation sequence: symptom, owning object, controller, Events, then dataplane. |
| Incident response checklists | 11 Troubleshooting Debugging and Incident Response | Stabilization, evidence capture, mitigation, communication, recovery, and follow-up. |
| Helm charts | 12 Helm Kustomize Manifests and Release Engineering | Chart structure, values, templates, hooks, rollback, and release ownership. |
| Argo Rollouts overview | 12 Helm Kustomize Manifests and Release Engineering | Progressive delivery using canary, blue-green, analysis, and automated promotion controls. |
| Blue-green deployments | 12 Helm Kustomize Manifests and Release Engineering | Safe traffic switching, rollback, schema compatibility, and capacity cost. |
| Canary deployments | 12 Helm Kustomize Manifests and Release Engineering | Gradual exposure, metrics gates, blast radius, and abort behavior. |
| Reconciliation design | 13 GitOps Controllers Operators CRDs and Platform APIs | Controller inputs, idempotency, status conditions, finalizers, rate limits, and ownership. |
| Crossplane overview | 13 GitOps Controllers Operators CRDs and Platform APIs | Using Kubernetes APIs to provision external infrastructure through provider controllers. |
| Cluster bootstrap overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | How control plane, CNI, CSI, DNS, ingress, policy, and GitOps foundations come online. |
| kubeadm overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Self-managed cluster bootstrapping, certificates, upgrades, and operational ownership. |
| Bare metal clusters | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Load balancing, storage, power, networking, hardware failure, and upgrade responsibilities. |
| Talos overview | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Immutable Kubernetes-focused OS operations and API-driven node management. |
| k3s and lightweight clusters | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Edge, lab, and small-footprint clusters without confusing them with full HA production designs. |
| API deprecations | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Release planning, manifest scanning, conversion, and upgrade gates. |
| CNI upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Dataplane compatibility, policy behavior, rollback, and node disruption risks. |
| CSI upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Driver compatibility, snapshots, expansion, attach behavior, and restore testing. |
| Ingress controller upgrades | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Routing compatibility, annotations, Gateway API migration, TLS reload, and rollback plans. |
| Backup testing | 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery | Proving backups by restoring real objects, etcd state, PVC data, and control plane dependencies. |
Study Order
- Read 00 Kubernetes Mastery Roadmap for the sequence.
- Read 01 Kubernetes Mental Model and Architecture until the reconciliation and component model is fluent.
- Use 00 Kubernetes Mastery Roadmap to practice basic commands.
- Revisit each concept with a local cluster, then compare the local behavior with production constraints.
Ordered notes
Kubernetes Mastery Roadmap
Purpose: This note gives a rigorous learning path for mastering Kubernetes from core mental models to production operations without confusing local practice with production readiness.
Kubernetes Mental Model and Architecture
Purpose: This note explains Kubernetes as a distributed reconciliation system, with enough architectural detail to debug real clusters and evaluate production designs.
Containers Pods and Workload Primitives
Purpose: Explain the Pod-level primitives that every Kubernetes workload controller builds on, including container lifecycle, sidecar patterns, readiness, disruption behavior, identity, and node-local special cases.
Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs
Purpose: Compare Kubernetes workload controllers and show how to operate stateless services, stateful services, node agents, batch jobs, scheduled jobs, rolling updates, and rollbacks safely.
Services DNS Ingress Gateway API and Traffic Routing
Purpose: Explain how Kubernetes exposes Pods through Services, DNS, Ingress, and Gateway API, and how traffic actually moves from clients to endpoints in production clusters.
Kubernetes Networking CNI NetworkPolicy and Service Mesh
Purpose: Explain the Kubernetes data network below Services, including CNI plugins, NetworkPolicy, egress control, service mesh, mTLS, and practical troubleshooting.
Configuration Secrets ServiceAccounts and Runtime Identity
Purpose: Explain how Kubernetes carries application configuration, protects secret material, and assigns runtime identity to workloads with production-grade controls.
Storage Volumes PVCs StorageClasses CSI and Stateful Data
Purpose: Explain Kubernetes storage primitives, dynamic provisioning, CSI behavior, StatefulSet data patterns, and operational recovery for stateful workloads.
Scheduling Resources Requests Limits QoS and Autoscaling
Purpose: Explain how Kubernetes places Pods, reserves resources, enforces limits, prioritizes workloads, spreads risk, and scales Pods or nodes under real production constraints.
Security RBAC Pod Security Admission and Supply Chain
Purpose: Explain how Kubernetes security controls compose across identity, admission, pod hardening, secrets, and software supply chain enforcement.
Observability Logging Metrics Tracing Events and Probes
Purpose: Explain how to observe Kubernetes workloads through logs, metrics, traces, events, probes, audit records, SLOs, alerts, and runbooks.
Troubleshooting Debugging and Incident Response
Purpose: Provide a practical Kubernetes troubleshooting and incident response playbook for workload, networking, storage, node, policy, and rollout failures.
Helm Kustomize Manifests and Release Engineering
Purpose: Explain how Kubernetes manifests become reliable releases through apply semantics, Helm, Kustomize, validation, policy gates, and repeatable delivery workflows.
GitOps Controllers Operators CRDs and Platform APIs
Purpose: Explain how GitOps, controllers, operators, CRDs, and platform APIs turn Kubernetes into a reconciled platform rather than a collection of manual commands.
Cluster Operations Upgrades Backup Restore and Disaster Recovery
Purpose: Explain the operational practices required to run, upgrade, back up, restore, and recover Kubernetes clusters safely.
Multi Tenancy Policy Governance and Cost Management
Purpose: Explain how Kubernetes clusters can safely host multiple teams, environments, or tenants through isolation, policy, governance, and cost controls.
Production Patterns Anti Patterns and Reference Architectures
Purpose: Explain practical Kubernetes production architectures, repeatable patterns, and dangerous anti-patterns for application and platform teams.
Kubernetes Ecosystem Tools and Learning Projects
Purpose: Map the Kubernetes ecosystem into practical tool categories and give hands-on projects that build production judgment.