Compendium

Kubernetes

Kubernetes learning notes on API machinery, workloads, networking, storage, security, operations, GitOps, policy, and production platform patterns.

Notes: 19
Reading: 161 min
References: 99
Diagrams: 40

[INDEX]

Study map

Purpose: This note is the entry point for a dense Kubernetes compendium, connecting the learning path, the architecture model, and the operational vocabulary needed to reason about clusters.

Kubernetes

Kubernetes is a declarative control plane for running containerized workloads across a cluster of machines. It is not a container runtime, not a platform-as-a-service by itself, not a CI system, and not a replacement for application architecture. Its core job is to store desired state in an API, continuously compare that desired state with observed cluster state, and drive the system toward convergence.

Compact Definition

Kubernetes is a distributed API-driven automation system. Users and controllers submit API objects such as Pods, Deployments, Services, ConfigMaps, Secrets, Ingresses, Jobs, and custom resources. The control plane stores those objects, validates them, watches for changes, and coordinates controllers and node agents that act on them.

The main mental model is:

You declare desired state.
The API server validates and persists that state.
Controllers watch state changes.
Controllers create or update dependent objects.
Node agents and infrastructure integrations make runtime changes.
Status fields, Events, metrics, and logs report what actually happened.

What Kubernetes Is

Kubernetes is	Practical meaning
A declarative API	You submit objects and let reconciliation loops work toward the requested state.
A cluster scheduler	It places Pods on nodes based on resource requests, constraints, taints, tolerations, affinity, topology, and policies.
A workload orchestrator	It keeps replicas running, restarts failed containers, rolls out changes, and manages batch work.
A service discovery system	Services and CoreDNS give stable names and virtual IPs for changing Pod backends.
An extensible platform kernel	CustomResourceDefinitions and controllers let teams add new API types and automation.
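The "cluster scheduler" row can be illustrated with a toy filter step. This is a minimal sketch, not the kube-scheduler algorithm: the dictionary fields (`allocatable_cpu_m`, `requested_cpu_m`, `cpu_request_m`) and both function names are hypothetical, only CPU is considered, every taint is treated as a hard constraint, and scoring, affinity, and topology are left out.

```python
def tolerates(tolerations, taint):
    """Simplified match: same key, and same effect when the toleration names one."""
    return any(
        t["key"] == taint["key"] and t.get("effect", taint["effect"]) == taint["effect"]
        for t in tolerations
    )


def feasible_nodes(pod, nodes):
    """Filter step only: return the names of nodes that could host the Pod."""
    names = []
    for node in nodes:
        # Scheduling compares requests against allocatable capacity, not live usage.
        free_cpu = node["allocatable_cpu_m"] - node["requested_cpu_m"]
        fits = pod["cpu_request_m"] <= free_cpu
        # Assumption: every taint is a hard constraint in this sketch.
        tolerated = all(
            tolerates(pod.get("tolerations", []), taint)
            for taint in node.get("taints", [])
        )
        if fits and tolerated:
            names.append(node["name"])
    return names


# Hypothetical cluster: one nearly full node, one with room, one dedicated node.
nodes = [
    {"name": "node-a", "allocatable_cpu_m": 4000, "requested_cpu_m": 3800},
    {"name": "node-b", "allocatable_cpu_m": 4000, "requested_cpu_m": 1000},
    {"name": "node-c", "allocatable_cpu_m": 8000, "requested_cpu_m": 0,
     "taints": [{"key": "dedicated", "effect": "NoSchedule"}]},
]
pod = {"name": "web-1", "cpu_request_m": 500}

print(feasible_nodes(pod, nodes))
```

The sketch also shows why inaccurate requests distort placement: the filter never looks at actual usage, only at what Pods have declared.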

What Kubernetes Is Not

Kubernetes is not	Why it matters
Not Docker	Docker is a developer toolchain and engine. Kubernetes talks to CRI runtimes such as containerd and CRI-O. Dockershim was removed in Kubernetes v1.24.
Not a complete application platform	You still choose CI, registry, image policy, secrets management, observability, ingress or Gateway implementation, backups, and developer workflows.
Not magic autoscaling	Scheduling and autoscaling depend on accurate requests, limits, metrics, policies, and capacity.
Not a security boundary by default	Multi-tenant clusters need RBAC, Pod Security Admission, network policy, admission policy, image controls, node isolation, and runtime hardening.
Not a database backup system	Stateful workloads need storage class design, volume snapshots, application-consistent backups, restore drills, and disaster recovery runbooks.

Core Vocabulary

Term	Meaning	Where to study
Object	A persistent API resource with metadata, spec, and often status.	01 Kubernetes Mental Model and Architecture
Spec	Desired state written by a user, controller, or automation.	01 Kubernetes Mental Model and Architecture
Status	Observed state written by controllers or agents.	01 Kubernetes Mental Model and Architecture
Reconciliation	A loop that watches desired and observed state, then takes action to reduce drift.	01 Kubernetes Mental Model and Architecture
Namespace	A scope for names, RBAC, quotas, and many policies.	01 Kubernetes Mental Model and Architecture
Label	Indexed key-value metadata used for selection and grouping.	01 Kubernetes Mental Model and Architecture
Annotation	Non-identifying metadata for tools, rollout notes, checksums, ownership hints, and integrations.	01 Kubernetes Mental Model and Architecture
OwnerReference	Metadata that expresses ownership so garbage collection can remove dependents.	01 Kubernetes Mental Model and Architecture
Finalizer	A deletion gate that lets a controller clean external resources before an object disappears.	01 Kubernetes Mental Model and Architecture
Event	A short-lived diagnostic record about scheduling, pulling images, admission, readiness, or controller decisions.	01 Kubernetes Mental Model and Architecture
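The OwnerReference and Finalizer rows interact during deletion, which a toy in-memory model can make concrete. This is a sketch under stated assumptions, not the real garbage collector: `Obj`, `Store`, and their methods are hypothetical names, owners are referenced by name instead of UID, and only a background-style cascade is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy stand-in for object metadata (hypothetical, not a client library type)."""
    name: str
    owners: list = field(default_factory=list)      # names of owning objects
    finalizers: list = field(default_factory=list)  # deletion gates
    terminating: bool = False                       # models a set deletionTimestamp


class Store:
    """In-memory store with simplified delete and garbage-collection rules."""

    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def _remove_or_mark(self, name):
        obj = self.objects[name]
        if obj.finalizers:
            # Finalizers block removal: the object is only marked as terminating.
            obj.terminating = True
            return False
        del self.objects[name]
        return True

    def delete(self, name):
        self._remove_or_mark(name)
        self._collect_garbage()

    def remove_finalizer(self, name, finalizer):
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        if obj.terminating and not obj.finalizers:
            del self.objects[name]
            self._collect_garbage()

    def _collect_garbage(self):
        # Keep removing dependents whose owners have all disappeared.
        changed = True
        while changed:
            changed = False
            for obj in list(self.objects.values()):
                orphaned = bool(obj.owners) and all(
                    owner not in self.objects for owner in obj.owners
                )
                if orphaned and self._remove_or_mark(obj.name):
                    changed = True


store = Store()
store.add(Obj("web", finalizers=["example.com/cleanup"]))  # hypothetical finalizer name
store.add(Obj("web-rs", owners=["web"]))
store.add(Obj("web-rs-pod", owners=["web-rs"]))

store.delete("web")                                  # gated: nothing is removed yet
store.remove_finalizer("web", "example.com/cleanup") # cleanup done, cascade proceeds
print(sorted(store.objects))
```

In this model a deletion "hangs" exactly when nobody removes the finalizer, which mirrors the failure mode a controller outage can cause on a real cluster.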

Production vs Local Clusters

Local clusters such as kind, minikube, k3d, Docker Desktop Kubernetes, and single-node k3s are excellent for learning API behavior, YAML shape, kubectl workflows, and controller development. They do not prove production readiness by themselves.

Concern	Local cluster	Production cluster
Failure model	Often one machine, limited failure domains.	Multiple nodes, zones, capacity pools, upgrades, and real outages.
Networking	Simplified load balancers, ports, and DNS.	CNI choice, policy enforcement, cloud LBs, Gateway or Ingress controllers, DNS scale.
Storage	HostPath or simple local volumes are common.	CSI drivers, snapshots, backup, restore, reclaim policy, encryption, topology.
Security	Often permissive for speed.	RBAC least privilege, Pod Security Admission, audit logs, network policy, image policy.
Operations	Manual kubectl experiments are acceptable.	GitOps, policy checks, SLOs, incident response, upgrade plans, capacity management.

Current Official Facts To Remember

The Kubernetes API reference currently lists Kubernetes v1.36.
Dockershim was removed in Kubernetes v1.24. Modern clusters should use CRI runtimes such as containerd or CRI-O.
PodSecurityPolicy was removed in Kubernetes v1.25.
Pod Security Admission is stable as of Kubernetes v1.25.
Gateway API is an official Kubernetes project and an add-on API family for role-oriented, extensible traffic management. Installing its CRDs and a compatible controller is separate from installing core Kubernetes.

First Principles

Desired state is data

Kubernetes turns operational intent into API data. A Deployment says "keep this many Pods matching this template available." A Service says "give a stable virtual endpoint for Pods matching this selector." A Job says "run this task to completion." A Namespace says "scope names and many controls here."

That data is valuable because many actors can read it consistently: kubectl, controllers, admission webhooks, policy engines, GitOps tools, dashboards, audit systems, and custom operators.
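Because desired state is plain data, any actor can evaluate it the same way. The sketch below shows equality-based label selection as a Service would use it to find backends. The helper names `selects` and `service_backends` are hypothetical, the objects are bare dictionaries, and readiness is ignored for brevity.

```python
def selects(selector, labels):
    """Equality-based selection: every selector key must match the label value."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_backends(service, pods):
    """Return the names of Pods whose labels satisfy the Service selector."""
    selector = service["spec"].get("selector") or {}
    if not selector:
        # A Service without a selector gets no automatically managed backends.
        return []
    return sorted(
        pod["metadata"]["name"]
        for pod in pods
        if selects(selector, pod["metadata"].get("labels", {}))
    )


service = {"kind": "Service", "metadata": {"name": "web"},
           "spec": {"selector": {"app": "web"}}}

pods = [
    {"metadata": {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}}},
    {"metadata": {"name": "web-2", "labels": {"app": "web"}}},
    # Annotations are never consulted by selectors, so this Pod is not a backend.
    {"metadata": {"name": "db-1", "labels": {"app": "db"},
                  "annotations": {"app": "web"}}},
]

print(service_backends(service, pods))
```

Extra labels on a Pod do not prevent a match, which is why a loosely chosen selector can capture Pods it was never meant to reach.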

Controllers are the engine

A controller is a reconciliation loop:

Watch relevant objects.
Compare desired state with observed state.
Create, update, delete, or report state.
Requeue when the world changes or when an action fails.
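The four steps above can be sketched as a level-triggered loop. This is an illustrative toy, not the ReplicaSet controller: `ToyReplicaController` and its fields are hypothetical, Pods are plain strings, and each pass makes at most one change so that the convergence is visible.

```python
import itertools


class ToyReplicaController:
    """Drives a list of Pod names toward a desired replica count."""

    def __init__(self, desired):
        self.desired = desired            # spec: what the user asked for
        self.pods = []                    # observed state
        self._ids = itertools.count(1)

    def reconcile(self):
        """One pass: compare desired with observed, act once, report convergence."""
        observed = len(self.pods)
        if observed < self.desired:
            self.pods.append(f"pod-{next(self._ids)}")
        elif observed > self.desired:
            self.pods.pop()
        return len(self.pods) == self.desired

    def run(self, max_passes=20):
        """Requeue until converged; return how many passes were needed."""
        for passes in range(1, max_passes + 1):
            if self.reconcile():
                return passes
        raise RuntimeError("did not converge")


controller = ToyReplicaController(desired=3)
controller.run()                  # scale up from zero
controller.pods.remove("pod-1")   # simulate a Pod disappearing
controller.run()                  # drift is repaired without any new user input
print(controller.pods)
```

The loop never records what it did previously. It only compares current desired state with current observed state, which is why rerunning it after a failure is safe.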

Operating Kubernetes well means reading controller intent and controller feedback. If a Deployment is not progressing, inspect the Deployment, its ReplicaSets and Pods, Events, scheduler decisions, image pulls, probes, quotas, policies, and node state. The YAML alone is not the whole system.

The API server is the front door

All durable cluster state changes go through kube-apiserver. Components do not usually edit etcd directly. This gives Kubernetes a uniform security and validation path: authentication, authorization, admission, defaulting, validation, persistence, watch delivery, and audit.
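The uniform path described above can be sketched as a chain of stages that every write must pass before it is stored. This is a simplified model, not kube-apiserver code: the stage order follows the list in the paragraph, the policy data (`KNOWN_USERS`, `PERMISSIONS`, the `locked` namespace) is hypothetical, and the real server splits admission into separate mutating and validating phases.

```python
import copy


class Rejected(Exception):
    """Raised when a stage refuses the request."""

    def __init__(self, stage, reason):
        super().__init__(f"{stage}: {reason}")
        self.stage = stage


# Hypothetical policy data for the sketch.
KNOWN_USERS = {"alice", "bob"}
PERMISSIONS = {("alice", "create")}


def authenticate(req):
    if req.get("user") not in KNOWN_USERS:
        raise Rejected("authentication", "unknown user")


def authorize(req):
    if (req["user"], req["verb"]) not in PERMISSIONS:
        raise Rejected("authorization", "verb not allowed")


def admit(req):
    if req["object"]["metadata"].get("namespace") == "locked":
        raise Rejected("admission", "namespace is locked")


def apply_defaults(req):
    req["object"]["spec"].setdefault("replicas", 1)


def validate(req):
    if req["object"]["spec"]["replicas"] < 0:
        raise Rejected("validation", "replicas must be non-negative")


STAGES = [authenticate, authorize, admit, apply_defaults, validate]


def handle(req, store, audit_log):
    """Run every stage in order; persist and audit only if none rejects."""
    req = copy.deepcopy(req)  # never mutate the caller's request
    try:
        for stage in STAGES:
            stage(req)
    except Rejected as err:
        audit_log.append(("rejected", err.stage))
        raise
    name = req["object"]["metadata"]["name"]
    store[name] = req["object"]
    audit_log.append(("stored", name))
    return store[name]


store, audit_log = {}, []
request = {"user": "alice", "verb": "create",
           "object": {"metadata": {"name": "web"}, "spec": {}}}
print(handle(request, store, audit_log))
```

Because every writer goes through the same chain, a rule added at one stage applies equally to kubectl, controllers, and GitOps tools.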

Nodes execute, the control plane decides

Worker nodes run kubelet, kube-proxy or an equivalent dataplane, a container runtime, CNI networking, CSI storage plugins, and workloads. The control plane stores state and makes decisions. Kubelet then turns assigned Pods into running containers and reports status back.

Common Mistakes

Mistake	Consequence	Better practice
Treating YAML as deployment truth without reading status	Teams miss admission, scheduling, image pull, probe, and runtime failures.	Always inspect `status`, `conditions`, and `Events`.
Using labels casually	Services, Deployments, policies, monitoring, and cost allocation select the wrong objects.	Design stable label taxonomy early.
Putting operational identity in annotations only	Selectors and policies cannot use annotations.	Use labels for identity and selection, annotations for tool metadata.
Assuming namespace equals hard tenancy	Nodes, CRDs, cluster-scoped resources, shared controllers, and network paths can cross namespaces.	Combine namespaces with RBAC, quotas, Pod Security Admission, network policy, and node isolation where needed.
Forgetting owner references and finalizers	Resources leak or deletions hang.	Understand garbage collection and finalizer cleanup paths.
Running production like minikube	Hidden gaps in HA, storage, upgrades, security, DNS, and load balancing.	Treat local clusters as learning and integration tools, not as proof of production behavior.
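The first row of the table, reading status instead of trusting the YAML, can be sketched as a small check over a Deployment-shaped dictionary. The function name `condition_problems` and the `EXPECTED` mapping are assumptions made for this example. The mapping is explicit because a condition with status `True` is not always healthy (pressure conditions on Nodes are the usual counterexample).

```python
# Assumed healthy values for two Deployment condition types.
EXPECTED = {"Available": "True", "Progressing": "True"}


def condition_problems(obj, expected=EXPECTED):
    """List conditions that are missing or differ from the expected status."""
    reported = {c["type"]: c for c in obj.get("status", {}).get("conditions", [])}
    problems = []
    for ctype, want in expected.items():
        cond = reported.get(ctype)
        if cond is None:
            problems.append(f"{ctype}: not reported")
        elif cond["status"] != want:
            reason = cond.get("reason", "no reason given")
            problems.append(f"{ctype}={cond['status']} ({reason})")
    return problems


# Hand-written example data shaped like a stuck rollout.
deployment = {
    "spec": {"replicas": 3},
    "status": {
        "availableReplicas": 1,
        "conditions": [
            {"type": "Available", "status": "False",
             "reason": "MinimumReplicasUnavailable"},
            {"type": "Progressing", "status": "False",
             "reason": "ProgressDeadlineExceeded"},
        ],
    },
}

for problem in condition_problems(deployment):
    print(problem)
```

The spec in this example is perfectly valid, and only the status shows that the rollout has stalled. Events and Pod-level status would then explain why.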

Review Checklist

Can you explain the difference between desired state, observed state, spec, status, and Events?
Can you name the control plane components and the worker node components?
Can you trace what happens when a Deployment is applied?
Can you explain why labels power selectors and annotations do not?
Can you explain how owner references, garbage collection, and finalizers interact?
Can you distinguish Kubernetes core APIs from add-on APIs such as Gateway API?
Can you describe why dockershim removal did not mean container images stopped working?
Can you state why PodSecurityPolicy should not be used on modern clusters?

Exact Coverage Routing

Use this section when you need to search by the exact operational phrase rather than by the broader concept.

Phrase	Primary note	What to study there
Declarative desired state	01 Kubernetes Mental Model and Architecture	How spec, status, server-side apply, and controller loops turn API objects into runtime behavior.
Controller manager	01 Kubernetes Mental Model and Architecture	Built-in controllers, ownership, garbage collection, and reconciliation boundaries.
Cloud controller manager	01 Kubernetes Mental Model and Architecture	Provider-owned load balancers, node metadata, routes, and cloud integration boundaries.
When not to run a database on Kubernetes	03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs	The operational conditions that make managed databases safer than cluster-hosted databases.
Stateful workload design tradeoffs	03 Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs	Identity, storage, failover, repair, and backup implications for StatefulSets.
Flannel overview	05 Kubernetes Networking CNI NetworkPolicy and Service Mesh	Simple overlay networking, limits, and when richer policy-aware CNIs are required.
Environment variables vs mounted files	06 Configuration Secrets ServiceAccounts and Runtime Identity	Reload semantics, leakage risk, config size, and rollout behavior.
Workload identity on cloud providers	06 Configuration Secrets ServiceAccounts and Runtime Identity	Mapping Kubernetes ServiceAccounts to cloud IAM without static long-lived credentials.
StatefulSets with PVC templates	07 Storage Volumes PVCs StorageClasses CSI and Stateful Data	Per-replica storage identity, retention, expansion, restore, and migration risks.
SELinux overview	09 Security RBAC Pod Security Admission and Supply Chain	Host-level mandatory access controls and how they relate to SecurityContext and runtimes.
kubectl events	10 Observability Logging Metrics Tracing Events and Probes	Event querying, event retention limits, and how to combine Events with status and logs.
Structured logging	10 Observability Logging Metrics Tracing Events and Probes	Log fields, correlation IDs, severity, sampling, and incident searchability.
SLO-oriented observability	10 Observability Logging Metrics Tracing Events and Probes	Telemetry based on user symptoms, error budgets, burn rates, and runbook triggers.
Debugging with ephemeral containers	11 Troubleshooting Debugging and Incident Response	How to inspect minimal images and live Pods without modifying the workload spec.
Port forwarding	10 Observability Logging Metrics Tracing Events and Probes	Temporary local access for inspection, with security and audit limits.
Capturing diagnostics	11 Troubleshooting Debugging and Incident Response	Evidence bundles for Pods, Services, Nodes, PVCs, policies, and rollouts.
Ingress returns 404 or 502	11 Troubleshooting Debugging and Incident Response	Controller class, route match, backend Service, endpoint readiness, and upstream protocol checks.
TLS certificate failures	11 Troubleshooting Debugging and Incident Response	Secret shape, certificate chain, SNI, issuer, renewal, and controller reload checks.
Rollout stuck	11 Troubleshooting Debugging and Incident Response	Deployment progress, ReplicaSet ownership, PDBs, readiness, quota, image, and admission causes.
NetworkPolicy blocks traffic	11 Troubleshooting Debugging and Incident Response	Namespace selectors, pod selectors, ingress, egress, DNS, and CNI enforcement checks.
Debugging order of operations	11 Troubleshooting Debugging and Incident Response	A deterministic investigation sequence: symptom, then owner, then controller, then Events, then dataplane.
Incident response checklists	11 Troubleshooting Debugging and Incident Response	Stabilization, evidence capture, mitigation, communication, recovery, and follow-up.
Helm charts	12 Helm Kustomize Manifests and Release Engineering	Chart structure, values, templates, hooks, rollback, and release ownership.
Argo Rollouts overview	12 Helm Kustomize Manifests and Release Engineering	Progressive delivery using canary, blue green, analysis, and automated promotion controls.
Blue green deployments	12 Helm Kustomize Manifests and Release Engineering	Safe traffic switching, rollback, schema compatibility, and capacity cost.
Canary deployments	12 Helm Kustomize Manifests and Release Engineering	Gradual exposure, metrics gates, blast radius, and abort behavior.
Reconciliation design	13 GitOps Controllers Operators CRDs and Platform APIs	Controller inputs, idempotency, status conditions, finalizers, rate limits, and ownership.
Crossplane overview	13 GitOps Controllers Operators CRDs and Platform APIs	Using Kubernetes APIs to provision external infrastructure through provider controllers.
Cluster bootstrap overview	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	How control plane, CNI, CSI, DNS, ingress, policy, and GitOps foundations come online.
kubeadm overview	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Self-managed cluster bootstrapping, certificates, upgrades, and operational ownership.
Bare metal clusters	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Load balancing, storage, power, networking, hardware failure, and upgrade responsibilities.
Talos overview	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Immutable Kubernetes-focused OS operations and API-driven node management.
k3s and lightweight clusters	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Edge, lab, and small-footprint clusters without confusing them with full HA production designs.
API deprecations	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Release planning, manifest scanning, conversion, and upgrade gates.
CNI upgrades	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Dataplane compatibility, policy behavior, rollback, and node disruption risks.
CSI upgrades	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Driver compatibility, snapshots, expansion, attach behavior, and restore testing.
Ingress controller upgrades	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Routing compatibility, annotations, Gateway API migration, TLS reload, and rollback plans.
Backup testing	14 Cluster Operations Upgrades Backup Restore and Disaster Recovery	Proving backups by restoring real objects, etcd state, PVC data, and control plane dependencies.

Study Order

Read 00 Kubernetes Mastery Roadmap for the sequence.
Read 01 Kubernetes Mental Model and Architecture until the reconciliation and component model is fluent.
Use 00 Kubernetes Mastery Roadmap to practice basic commands.
Revisit each concept with a local cluster, then compare the local behavior with production constraints.

[NOTES]

Ordered notes

No. 1, 8 min

Kubernetes Mastery Roadmap

Purpose: This note gives a rigorous learning path for mastering Kubernetes from core mental models to production operations without confusing local practice with production readiness. Kubernetes Mastery Roadmap This...

No. 2, 12 min, 3 diagrams

Kubernetes Mental Model and Architecture

Purpose: This note explains Kubernetes as a distributed reconciliation system, with enough architectural detail to debug real clusters and evaluate production designs. Kubernetes Mental Model and Architecture...

No. 3, 7 min, 2 diagrams

Containers Pods and Workload Primitives

Purpose: Explain the Pod level primitives that every Kubernetes workload controller builds on, including container lifecycle, sidecar patterns, readiness, disruption behavior, identity, and node local special cases....

No. 4, 7 min, 2 diagrams

Deployments ReplicaSets StatefulSets DaemonSets Jobs and CronJobs

Purpose: Compare Kubernetes workload controllers and show how to operate stateless services, stateful services, node agents, batch jobs, scheduled jobs, rolling updates, and rollbacks safely. Deployments, ReplicaSets,...

No. 5, 12 min, 3 diagrams

Services DNS Ingress Gateway API and Traffic Routing

Purpose: explain how Kubernetes exposes Pods through Services, DNS, Ingress, and Gateway API, and how traffic actually moves from clients to endpoints in production clusters. Related notes: Kubernetes, 00 Kubernetes...

No. 6, 12 min, 3 diagrams

Kubernetes Networking CNI NetworkPolicy and Service Mesh

Purpose: explain the Kubernetes data network below Services, including CNI plugins, NetworkPolicy, egress control, service mesh, mTLS, and practical troubleshooting. Related notes: Kubernetes, 00 Kubernetes Mastery...

No. 7, 10 min, 4 diagrams

Configuration Secrets ServiceAccounts and Runtime Identity

Purpose: explain how Kubernetes carries application configuration, protects secret material, and assigns runtime identity to workloads with production grade controls. Configuration, Secrets, ServiceAccounts, and...

No. 8, 9 min, 4 diagrams

Storage Volumes PVCs StorageClasses CSI and Stateful Data

Purpose: explain Kubernetes storage primitives, dynamic provisioning, CSI behavior, StatefulSet data patterns, and operational recovery for stateful workloads. Storage, Volumes, PVCs, StorageClasses, CSI, and Stateful...

No. 9, 10 min, 3 diagrams

Scheduling Resources Requests Limits QoS and Autoscaling

Purpose: Explain how Kubernetes places Pods, reserves resources, enforces limits, prioritizes workloads, spreads risk, and scales Pods or nodes under real production constraints. Scheduling, resources, requests,...

No. 10, 10 min, 2 diagrams

Security RBAC Pod Security Admission and Supply Chain

Purpose: explain how Kubernetes security controls compose across identity, admission, pod hardening, secrets, and software supply chain enforcement. Security, RBAC, Pod Security Admission, and Supply Chain This note...

No. 11, 8 min, 1 diagram

Observability Logging Metrics Tracing Events and Probes

Purpose: explain how to observe Kubernetes workloads through logs, metrics, traces, events, probes, audit records, SLOs, alerts, and runbooks. Observability, Logging, Metrics, Tracing, Events, and Probes This note...

No. 12, 10 min, 1 diagram

Troubleshooting Debugging and Incident Response

Purpose: provide a practical Kubernetes troubleshooting and incident response playbook for workload, networking, storage, node, policy, and rollout failures. Troubleshooting, Debugging, and Incident Response This note...

No. 13, 6 min, 1 diagram

Helm Kustomize Manifests and Release Engineering

Purpose: explain how Kubernetes manifests become reliable releases through apply semantics, Helm, Kustomize, validation, policy gates, and repeatable delivery workflows. Helm, Kustomize, Manifests, and Release...

No. 14, 6 min, 3 diagrams

GitOps Controllers Operators CRDs and Platform APIs

Purpose: explain how GitOps, controllers, operators, CRDs, and platform APIs turn Kubernetes into a reconciled platform rather than a collection of manual commands. GitOps, Controllers, Operators, CRDs, and Platform...

No. 15, 7 min, 2 diagrams

Cluster Operations Upgrades Backup Restore and Disaster Recovery

Purpose: explain the operational practices required to run, upgrade, back up, restore, and recover Kubernetes clusters safely. Cluster Operations, Upgrades, Backup, Restore, and Disaster Recovery Cluster operations are...

No. 16, 5 min, 1 diagram

Multi Tenancy Policy Governance and Cost Management

Purpose: explain how Kubernetes clusters can safely host multiple teams, environments, or tenants through isolation, policy, governance, and cost controls. Multi Tenancy, Policy, Governance, and Cost Management...

No. 17, 4 min, 3 diagrams

Production Patterns Anti Patterns and Reference Architectures

Purpose: explain practical Kubernetes production architectures, repeatable patterns, and dangerous anti patterns for application and platform teams. Production Patterns, Anti Patterns, and Reference Architectures...

No. 18, 6 min, 1 diagram

Kubernetes Ecosystem Tools and Learning Projects

Purpose: map the Kubernetes ecosystem into practical tool categories and give hands on projects that build production judgment. Kubernetes Ecosystem Tools and Learning Projects The Kubernetes ecosystem is large because...

No. 20, 12 min, 1 diagram

Kubernetes

Purpose: This note is the entry point for a dense Kubernetes compendium, connecting the learning path, the architecture model, and the operational vocabulary needed to reason about clusters. Kubernetes Kubernetes is a...