Storage Volumes PVCs StorageClasses CSI and Stateful Data

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/kubernetes/07 Storage Volumes PVCs StorageClasses CSI and Stateful Data.md.

Purpose: explain Kubernetes storage primitives, dynamic provisioning, CSI behavior, StatefulSet data patterns, and operational recovery for stateful workloads.

Storage, Volumes, PVCs, StorageClasses, CSI, and Stateful Data

This note connects Kubernetes, Software Engineering#Required topic coverage matrix, 07 Storage Volumes PVCs StorageClasses CSI and Stateful Data, and 14 Cluster Operations Upgrades Backup Restore and Disaster Recovery. Kubernetes storage is a contract between Pods, the scheduler, storage controllers, nodes, and the underlying platform. The important question is not "can the Pod mount a disk"; it is "what durability, locality, sharing, restore, and failure semantics does this workload need".

Mental Model

[Diagram not rendered in this snapshot]

Core objects:

| Object | Scope | Purpose |
| --- | --- | --- |
| Volume | Pod spec | makes storage available to containers |
| PersistentVolume | cluster | represents real storage capacity |
| PersistentVolumeClaim | namespace | requests storage for a workload |
| StorageClass | cluster | defines dynamic provisioning behavior |
| CSI driver | cluster components | implements storage operations for a backend |
| VolumeSnapshot | namespace | requests point-in-time copy through CSI |

Pod Volumes

A Pod volume lives in the Pod spec. Some volume types are ephemeral and follow the Pod. Others attach persistent storage.

apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
  namespace: apps
spec:
  volumes:
    - name: cache
      emptyDir:
        sizeLimit: 2Gi
    - name: config
      configMap:
        name: app-config
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
        - name: config
          mountPath: /etc/app
          readOnly: true

emptyDir

emptyDir is created when a Pod is assigned to a node and deleted when the Pod is removed from that node. It survives container restarts inside the same Pod, but not Pod rescheduling.

| Use case | Good fit | Risk |
| --- | --- | --- |
| scratch space | yes | data disappears with Pod |
| build workspace | yes | node disk pressure can evict Pod |
| cache | yes if rebuildable | cold start after reschedule |
| database data | no | data loss |
| inter-container handoff | yes | keep size bounded |

Memory-backed emptyDir:

volumes:
  - name: fast-temp
    emptyDir:
      medium: Memory
      sizeLimit: 512Mi

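Files written to a memory-backed emptyDir count against the memory limit of the container that wrote them and add to node memory pressure. Set sizeLimit and container memory limits together, and test eviction behavior.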
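
One way to keep a memory-backed volume bounded is to pair sizeLimit with an explicit container memory limit that leaves headroom for the application itself. The following is a minimal sketch; the Pod name, mount path, and sizes are illustrative assumptions, not values from this note.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-temp-demo          # hypothetical name
  namespace: apps
spec:
  volumes:
    - name: fast-temp
      emptyDir:
        medium: Memory          # tmpfs, backed by node RAM
        sizeLimit: 512Mi        # upper bound for data in the volume
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      resources:
        requests:
          memory: 1Gi
        limits:
          memory: 1Gi           # assumed: 512Mi tmpfs plus 512Mi for the process
      volumeMounts:
        - name: fast-temp
          mountPath: /tmp/fast  # hypothetical path
```

Filling /tmp/fast in a test environment is a quick way to observe how the Pod behaves when the limit is reached.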

hostPath

hostPath mounts a path from the node filesystem into a Pod.

volumes:
  - name: node-logs
    hostPath:
      path: /var/log
      type: Directory

Production guidance:

| Position | Reason |
| --- | --- |
| avoid for application data | binds Pod to node layout and can bypass storage controls |
| restrict with admission policy | host filesystem access is high privilege |
| use for node agents only | log collectors, CSI plugins, and monitoring agents may need it |
| prefer Directory, File, or explicit type | avoids surprising path creation |
| mount read-only when possible | reduces node compromise impact |
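
The guidance above can be combined into a single manifest. This is a sketch for a node agent that only reads host logs; the Pod name and mount path are assumptions, and a real agent would normally run as a DaemonSet rather than a bare Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-reader              # hypothetical name
  namespace: kube-system
spec:
  volumes:
    - name: node-logs
      hostPath:
        path: /var/log
        type: Directory         # fail if the path is missing instead of creating it
  containers:
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: node-logs
          mountPath: /host/var/log   # hypothetical path inside the container
          readOnly: true             # the agent cannot modify node files
```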

PersistentVolumes and PersistentVolumeClaims

A PersistentVolume (PV) is a cluster-scoped object that represents a piece of real storage. A PersistentVolumeClaim (PVC) is a namespaced request for storage that Kubernetes binds to a matching PV.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

Pod usage:

apiVersion: v1
kind: Pod
metadata:
  name: postgres
  namespace: data
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

PVC lifecycle:

[Diagram: PVC lifecycle; not rendered in this snapshot]

Commands:

kubectl get pvc -A
kubectl describe pvc postgres-data -n data
kubectl get pv
kubectl describe pv pvc-6b07-data-postgres
kubectl get events -n data --sort-by=.lastTimestamp

StorageClasses and Dynamic Provisioning

A StorageClass defines which provisioner creates volumes and which parameters it uses.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: csi.example.com
parameters:
  type: ssd
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Key fields:

| Field | Choices | Guidance |
| --- | --- | --- |
| provisioner | CSI driver name | must match installed CSI driver |
| reclaimPolicy | Delete or Retain | Retain for critical data, Delete for disposable or managed lifecycle |
| allowVolumeExpansion | true or false | enable for production databases if backend supports it |
| volumeBindingMode | Immediate or WaitForFirstConsumer | use WaitForFirstConsumer for zonal or local storage |
| parameters | driver-specific | review encryption, IOPS, filesystem, replication, and topology |

Dynamic provisioning flow:

[Diagram: dynamic provisioning flow; not rendered in this snapshot]

With WaitForFirstConsumer, binding and provisioning are delayed until a Pod that uses the PVC is being scheduled, so the scheduler can choose a node and zone that satisfy both compute and storage topology.
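
Delayed binding can also be combined with allowedTopologies to restrict where volumes may be provisioned. The sketch below assumes the driver reports the standard topology.kubernetes.io/zone key; some drivers use their own topology keys. The class name and zone names are hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-zonal                  # hypothetical name
provisioner: csi.example.com
volumeBindingMode: WaitForFirstConsumer # provision only after a consuming Pod is scheduled
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone  # assumed key; check what the driver reports
        values:
          - zone-a                        # hypothetical zones
          - zone-b
```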

CSI Drivers

CSI, the Container Storage Interface, lets vendors implement storage operations outside Kubernetes core.

Typical CSI components:

| Component | Runs as | Responsibility |
| --- | --- | --- |
| controller plugin | Deployment or StatefulSet | create, delete, attach, detach, snapshot, expand |
| node plugin | DaemonSet | mount, unmount, stage, publish volumes on nodes |
| external provisioner | sidecar | watches PVCs and creates PVs |
| external attacher | sidecar | manages VolumeAttachment objects |
| external resizer | sidecar | handles expansion |
| external snapshotter | sidecar | handles snapshots |

Inspect CSI:

kubectl get csidrivers
kubectl get storageclass
kubectl get pods -A | grep -i csi
kubectl get volumeattachments
kubectl describe csidriver csi.example.com

Access Modes

Access mode describes how a volume may be mounted by nodes and Pods. It is a scheduling and attach contract, not a full application-level concurrency guarantee.

| Mode | Meaning | Typical backend | Best use |
| --- | --- | --- | --- |
| RWO, ReadWriteOnce | read-write by a single node | block disk | databases, queues, single-writer apps |
| ROX, ReadOnlyMany | read-only by many nodes | file or replicated volume | shared static data |
| RWX, ReadWriteMany | read-write by many nodes | NFS, CephFS, cloud file share | shared uploads, legacy shared filesystem apps |
| RWOP, ReadWriteOncePod | read-write by one Pod | supported CSI block volumes | strict single Pod writer |
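
Because RWO restricts access per node, two Pods on the same node can still mount the same RWO volume. When a strict single-Pod writer is required, a claim can request ReadWriteOncePod instead. A minimal sketch, assuming the cluster version and the CSI driver both support this mode; the claim name and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-data      # hypothetical name
  namespace: data
spec:
  accessModes:
    - ReadWriteOncePod          # only one Pod in the whole cluster may use the claim
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi             # illustrative size
```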

RWO vs RWX Tradeoffs

| Decision | RWO | RWX |
| --- | --- | --- |
| performance | often higher for databases | depends on network filesystem |
| data safety | simpler single writer model | application must handle concurrent writers |
| failover | attach and detach may take time | many Pods can mount at once |
| topology | often zonal | may be regional or network reachable |
| operational complexity | lower | higher, especially permissions and locking |
| best fit | stateful databases | shared content and multi-replica file access |

Use RWX only when the application actually requires shared writable filesystem semantics. For many systems, object storage, a database, or a queue is a better shared data primitive.

Reclaim Policies

| Policy | Behavior after PVC deletion | Use when |
| --- | --- | --- |
| Delete | backend volume is deleted | ephemeral environments or operator-managed restore path |
| Retain | PV and backend data remain | critical data needs manual recovery gate |

Retain recovery sketch:

kubectl get pv
kubectl patch pv pvc-6b07-data-postgres -p '{"spec":{"claimRef": null}}'
kubectl apply -f replacement-pvc.yaml

Use care. Manually rebinding retained volumes can attach old data to the wrong workload if labels, namespaces, and claim names are not reviewed.
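
The recovery sketch applies a replacement-pvc.yaml that the note does not show. One plausible shape, assuming the retained PV is the pvc-6b07-data-postgres example used above, pins the claim to that PV through volumeName so it cannot bind to any other volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: data
spec:
  accessModes:
    - ReadWriteOnce                     # must be offered by the retained PV
  storageClassName: fast-ssd            # must match the PV's storageClassName
  volumeName: pvc-6b07-data-postgres    # bind to this specific retained PV
  resources:
    requests:
      storage: 100Gi                    # must not exceed the PV capacity
```

After applying it, kubectl get pvc postgres-data -n data should show the claim as Bound to the expected PV before any workload is started against it.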

Volume Expansion

PVC expansion works only when the StorageClass and CSI driver support it.

kubectl patch pvc postgres-data -n data -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
kubectl describe pvc postgres-data -n data
kubectl get events -n data --sort-by=.lastTimestamp

Expansion checklist:

| Check | Why |
| --- | --- |
| allowVolumeExpansion: true | Kubernetes rejects expansion otherwise |
| driver supports controller expansion | backend disk must grow |
| driver supports node expansion | filesystem must grow on node |
| filesystem supports online growth | some workloads need restart |
| monitoring adjusted | disk alert thresholds should reflect new capacity |

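Never try to shrink a PVC by editing the requested size. Kubernetes does not support shrinking volumes, and the API rejects a request below the volume's current capacity. To reduce size, provision a smaller PVC and migrate the data.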

Snapshots

CSI snapshots provide point-in-time copies when the driver supports the snapshot API.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap-20260615
  namespace: data
spec:
  volumeSnapshotClassName: fast-ssd-snapshots
  source:
    persistentVolumeClaimName: postgres-data
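
The snapshot above references a VolumeSnapshotClass named fast-ssd-snapshots, which this note does not define. A minimal sketch, assuming the same csi.example.com driver as the StorageClass and a Retain deletion policy chosen for illustration:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: fast-ssd-snapshots
driver: csi.example.com     # must match the CSI driver that owns the source volume
deletionPolicy: Retain      # assumed: keep the backend snapshot if the VolumeSnapshot is deleted
```

With deletionPolicy: Delete, removing the VolumeSnapshot object also removes the backend snapshot, so the choice belongs in the retention policy.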

Restore into a new PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restore
  namespace: data
spec:
  storageClassName: fast-ssd
  dataSource:
    name: postgres-data-snap-20260615
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

Snapshot guidance:

| Topic | Guidance |
| --- | --- |
| consistency | application quiescing or database-native checkpointing may be required |
| scope | snapshot protects a volume, not every dependency |
| retention | keep policy outside the workload namespace when possible |
| restore test | a backup that has not been restored is unproven |
| portability | CSI snapshots may not move across clusters or providers |

StatefulSets and PVC Templates

StatefulSets create stable Pod identities and stable PVCs per replica.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi

Resulting claims:

data-postgres-0
data-postgres-1
data-postgres-2
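
The StatefulSet sets serviceName: postgres, which refers to a headless Service that has to be created separately. A minimal sketch, assuming the default PostgreSQL port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres            # must match spec.serviceName in the StatefulSet
  namespace: data
spec:
  clusterIP: None           # headless: DNS resolves to individual Pod addresses
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432            # assumed default PostgreSQL port
```

Each replica then gets a stable DNS name of the form postgres-0.postgres.data.svc plus the cluster domain suffix.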

StatefulSet storage properties:

| Property | Operational meaning |
| --- | --- |
| stable ordinal | Pod postgres-0 returns with the same identity |
| stable PVC | deleting a Pod does not delete its PVC |
| ordered rollout | Pods are created in ordinal order; rolling updates and scale down proceed in reverse ordinal order by default |
| scale down retention | PVCs remain after scale down by default |
| per-replica data | each replica has independent storage |
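
PVC retention can be stated explicitly on clusters where the persistentVolumeClaimRetentionPolicy field is available. The fragment below is a sketch that spells out the default behavior; it is not a complete StatefulSet.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain     # keep PVCs when the StatefulSet is deleted
    whenScaled: Retain      # keep PVCs when replicas are scaled down
  # serviceName, replicas, selector, template, and volumeClaimTemplates
  # stay as in the StatefulSet manifest earlier in this section
```

Setting either value to Delete makes Kubernetes remove the PVCs automatically, which is only appropriate for data that can be rebuilt.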

Local PersistentVolumes

Local PVs expose node-local disks through the PV and PVC model. They can provide high performance, but data locality becomes a scheduling constraint.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node-a
spec:
  capacity:
    storage: 500Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-a

StorageClass for local PVs:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

Tradeoffs:

| Strength | Cost |
| --- | --- |
| low latency and high throughput | Pod can run only where data exists |
| simple hardware model | node loss can mean data unavailability |
| good for replicated databases | requires database-level replication |
| predictable failure domain | operations must track disk health |

Backup and Restore

Kubernetes objects and persistent data need separate backup plans.

[Diagram not rendered in this snapshot]

Backup layers:

| Layer | Captures | Does not capture |
| --- | --- | --- |
| GitOps repo | desired manifests | live generated state and data |
| etcd backup | API objects | external cloud disks in a directly useful app-consistent form |
| CSI snapshot | volume blocks | application consistency by itself |
| database dump | logical records | full filesystem state |
| application export | domain objects | platform metadata |

Restore runbook outline:

  1. Identify workload, namespace, PVC names, StorageClass, and application version.
  2. Stop writers or isolate the restore target.
  3. Restore into a new PVC or new namespace first.
  4. Start a validation Pod or application replica against restored data.
  5. Run application-level integrity checks.
  6. Promote through a controlled cutover.
  7. Record recovery time, recovery point, and gaps.

Commands:

kubectl get pvc -n data
kubectl get volumesnapshot -n data
kubectl describe volumesnapshot postgres-data-snap-20260615 -n data
kubectl apply -f restore-pvc.yaml
kubectl run restore-check -n data --image=busybox:1.36 -- sleep 3600
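
A Pod started with kubectl run as shown does not mount any PVC, so it cannot inspect restored data by itself. A validation Pod that mounts the restored claim read-only could look like the sketch below; the Pod name and mount path are assumptions, and a real check would run application-level integrity tools rather than a directory listing.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restore-check-pvc           # hypothetical name
  namespace: data
spec:
  restartPolicy: Never
  volumes:
    - name: restored
      persistentVolumeClaim:
        claimName: postgres-data-restore
  containers:
    - name: check
      image: busybox:1.36
      # List the restored files, then stay up for manual inspection.
      command: ["sh", "-c", "ls -la /restored && sleep 3600"]
      volumeMounts:
        - name: restored
          mountPath: /restored      # hypothetical path
          readOnly: true            # validation should not modify restored data
```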

Data Locality and Scheduling

Storage can constrain where Pods run. This matters for zonal disks, local PVs, and topology-aware CSI drivers.

| Feature | Why it matters |
| --- | --- |
| WaitForFirstConsumer | avoids provisioning storage in a zone where the Pod cannot run |
| PV node affinity | pins local PV use to specific nodes |
| Pod affinity and anti-affinity | spreads replicas across failure domains |
| topology spread constraints | reduces correlated failure |
| storage topology labels | bind volume to zone, region, rack, or node |

Review with:

kubectl describe pod postgres-0 -n data
kubectl describe pvc data-postgres-0 -n data
kubectl describe pv pvc-6b07-data-postgres
kubectl get nodes --show-labels
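
As an illustration of topology spread constraints for a stateful workload, the fragment below spreads the postgres replicas across zones. It is a sketch that belongs under spec.template.spec of the StatefulSet and assumes nodes carry the standard topology.kubernetes.io/zone label.

```yaml
# Fragment of spec.template.spec in the postgres StatefulSet
topologySpreadConstraints:
  - maxSkew: 1                                # zones may differ by at most one replica
    topologyKey: topology.kubernetes.io/zone  # assumed node label
    whenUnsatisfiable: DoNotSchedule          # keep the Pod Pending rather than stack a zone
    labelSelector:
      matchLabels:
        app: postgres
```

Combined with WaitForFirstConsumer, each replica's volume is then provisioned in the zone the scheduler picked for that replica.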

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| using emptyDir for durable state | data loss on Pod reschedule | use PVC or external durable service |
| using hostPath for app data | node lock-in and security risk | use PV, local PV, or CSI driver |
| default StorageClass not reviewed | unexpected cost or durability | set storageClassName explicitly |
| Immediate binding for zonal disks | Pod and volume can land in incompatible zones | use WaitForFirstConsumer |
| assuming snapshots equal backups | crash consistency may be insufficient | combine with app quiescing and restore tests |
| deleting PVC during cleanup | backend data may be deleted by reclaim policy | check PV reclaim policy first |
| using RWX for database writes | corruption or locking failures | use RWO or database-native clustering |
| scaling StatefulSet without capacity plan | many PVCs provision at once | precheck quotas and storage backend limits |
| no restore drill | recovery process fails under pressure | schedule restore tests |
| ignoring volume attach limits | Pods stuck Pending or ContainerCreating | monitor cloud and node attach limits |
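
Namespace-level limits are one way to make the quota precheck concrete. The ResourceQuota below is a sketch with illustrative numbers; the quota name is hypothetical, and the per-class keys follow the `<class>.storageclass.storage.k8s.io/...` pattern for the fast-ssd class used in this note.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota                   # hypothetical name
  namespace: data
spec:
  hard:
    persistentvolumeclaims: "10"        # total PVCs in the namespace
    requests.storage: 1Ti               # total requested capacity, all classes
    fast-ssd.storageclass.storage.k8s.io/requests.storage: 500Gi
    fast-ssd.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
```

kubectl describe resourcequota storage-quota -n data shows used versus hard values before a scale-up.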

Storage Failure Troubleshooting

PVC Pending:

kubectl describe pvc postgres-data -n data
kubectl get storageclass fast-ssd -o yaml
kubectl get events -n data --sort-by=.lastTimestamp
kubectl logs -n kube-system -l app=csi-provisioner   # label and namespace depend on the installed CSI driver

Pod stuck Pending:

kubectl describe pod postgres-0 -n data
kubectl get pvc -n data
kubectl get pv
kubectl get nodes --show-labels

Pod stuck ContainerCreating with mount errors:

kubectl describe pod postgres-0 -n data
kubectl get volumeattachment
kubectl describe volumeattachment csi-1234567890abcdef
kubectl logs -n kube-system -l app=csi-node   # label and namespace depend on the installed CSI driver

Filesystem full:

kubectl exec -n data postgres-0 -- df -h
kubectl describe pvc data-postgres-0 -n data
kubectl patch pvc data-postgres-0 -n data -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

Volume attached to wrong or dead node:

kubectl get volumeattachment
kubectl describe node node-a
kubectl describe pod postgres-0 -n data
kubectl delete pod postgres-0 -n data

Data missing after restart:

kubectl get pod postgres-0 -n data -o yaml | grep -A20 volumes:
kubectl get pvc -n data
kubectl describe pv pvc-6b07-data-postgres
kubectl rollout history statefulset/postgres -n data

Production Guidance

| Area | Guidance |
| --- | --- |
| class selection | define storage classes by workload tier, not by ambiguous names |
| durability | document replication, zone, encryption, snapshot, and backup semantics |
| binding | prefer WaitForFirstConsumer for topology-sensitive storage |
| expansion | enable and test volume expansion before production incidents |
| access mode | choose RWO for single-writer state, RWX only for true shared filesystem needs |
| reclaim | use Retain for critical manually managed data |
| snapshots | pair snapshots with app consistency and restore validation |
| StatefulSets | understand PVC retention before scaling down or deleting |
| local PVs | use only with replicated applications or accepted node-loss risk |
| monitoring | alert on PVC usage, inode pressure, attach failures, snapshot failures, and backend quotas |
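
For the monitoring row, PVC usage and inode alerts can be built on the kubelet volume stats metrics. The rule file below is a sketch that assumes Prometheus scrapes the kubelet; the group name, alert names, thresholds, and durations are illustrative, and some volume types do not report these metrics.

```yaml
groups:
  - name: pvc-capacity                  # hypothetical group name
    rules:
      - alert: PersistentVolumeFillingUp
        # Fires when less than 15% of the volume's bytes are free.
        expr: |
          kubelet_volume_stats_available_bytes
            / kubelet_volume_stats_capacity_bytes < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is below 15% free space"
      - alert: PersistentVolumeInodesLow
        # Fires when less than 10% of the volume's inodes are free.
        expr: |
          kubelet_volume_stats_inodes_free
            / kubelet_volume_stats_inodes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is below 10% free inodes"
```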

Review Checklist

  • Every PVC has an explicit storageClassName.
  • The StorageClass reclaim policy matches the data criticality.
  • volumeBindingMode is WaitForFirstConsumer for zonal or local storage.
  • Access mode is justified, especially any RWX claim.
  • StatefulSet PVC retention is understood before deletion or scale down.
  • Volume expansion is tested for the chosen CSI driver.
  • SnapshotClass and restore flow are documented.
  • Backups include application consistency, not only volume blocks.
  • Restore has been tested into a separate namespace or cluster.
  • Local PV workloads have database-level replication or accepted data loss boundaries.
  • Monitoring covers capacity, inodes, attach, mount, provision, snapshot, and backend quota failures.
  • Runbooks include PVC Pending, mount failure, expansion, and restore procedures.