Kubernetes Networking CNI NetworkPolicy and Service Mesh - Kubernetes Compendium

Purpose: explain the Kubernetes data network below Services, including CNI plugins, NetworkPolicy, egress control, service mesh, mTLS, and practical troubleshooting.

Network layers

Kubernetes networking is a stack. When traffic fails, identify the layer before changing manifests.

Rendering diagram...

Core contracts:

Contract	Meaning
Pod to Pod	Pods should be able to communicate across nodes without application aware NAT, unless policy or infrastructure blocks it.
Pod to Service	Pod traffic to a ClusterIP should reach a ready endpoint selected by that Service.
Pod to DNS	Pods need access to CoreDNS for normal Service discovery.
Node to Pod	kubelet, probes, and some infrastructure paths need node to Pod reachability.
External to Service	Ingress, Gateway, LoadBalancer, or NodePort translate external traffic into Service or Pod paths.

CNI overview

CNI, the Container Network Interface, is the plugin contract used by kubelet and the container runtime to attach Pods to networks. The CNI plugin assigns Pod IPs, creates interfaces, installs routes, and may enforce policy.

Kubernetes itself does not implement the Pod network. It delegates that work to the installed CNI. This is why two conformant clusters can have very different packet paths.

Typical CNI responsibilities:

Allocate Pod IP addresses.
Create the Pod network interface.
Connect the Pod to the node network namespace.
Configure routes or overlays so Pods can reach each other across nodes.
Enforce NetworkPolicy if supported.
Optionally implement Service load balancing, encryption, observability, and egress gateways.

Pod network designs

Design	How it works	Strengths	Costs
Overlay	Encapsulates Pod traffic across nodes, often VXLAN or Geneve	Works on simple underlays, easy cluster portability	Encapsulation overhead and MTU tuning required.
Routed Pod CIDRs	Node or fabric routes Pod CIDRs directly	Efficient and transparent	Requires route propagation, BGP, or cloud route integration.
Cloud native VPC IPs	Pods receive VPC routable addresses	Strong cloud integration and security group fit	IP exhaustion and provider limits can dominate design.
eBPF datapath	Kernel eBPF programs handle routing, policy, and sometimes Services	High performance and observability	Plugin expertise and kernel compatibility matter.

MTU matters. Overlay encapsulation reduces effective packet size. If clients see intermittent TLS stalls, gRPC resets, or large response hangs, check MTU and path fragmentation early.

kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
kubectl -n apps run netshoot --rm -it --image=nicolaka/netshoot:latest --restart=Never -- bash
ip addr
ip route
ping -M do -s 1372 POD_IP
tracepath POD_IP

CNI plugin tradeoffs

Calico

Calico is a widely used CNI focused on routed networking and policy. It can run with BGP, overlays, or cloud integrations. It supports Kubernetes NetworkPolicy and Calico specific policy resources.

Strength	Tradeoff
Mature policy model	Advanced Calico resources are not portable Kubernetes APIs.
Strong bare metal and hybrid support	BGP and route design require operational skill.
Good fit for explicit security segmentation	Misordered or overlapping policies can be hard to reason about.

Common Calico checks:

kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
kubectl get ippools.crd.projectcalico.org
kubectl get networkpolicy -A
kubectl get globalnetworkpolicy.crd.projectcalico.org

Cilium

Cilium uses eBPF for networking, policy, observability, and optional kube-proxy replacement. It can provide rich flow visibility through Hubble and supports L3, L4, and L7 aware policy features through Cilium specific APIs.

Strength	Tradeoff
Powerful observability and eBPF datapath	Kernel, feature flag, and upgrade compatibility matter.
Can replace kube-proxy	Changes the Service datapath and debugging tools.
Rich policy and identity model	Cilium specific policies reduce portability.

Common Cilium checks:

kubectl -n kube-system get pods -l k8s-app=cilium -o wide
kubectl -n kube-system exec ds/cilium -- cilium status
kubectl -n kube-system exec ds/cilium -- cilium service list
kubectl -n kube-system exec ds/cilium -- cilium policy get

Flannel

Flannel is a simple Pod networking plugin. It is often used for learning, small clusters, and environments that want a minimal overlay. Flannel traditionally does not implement Kubernetes NetworkPolicy by itself.

Strength	Tradeoff
Simple mental model	Limited built in policy story.
Easy for labs and small clusters	Fewer advanced observability and security features.
Lower operational surface	May need another component for policy enforcement.

Common Flannel checks:

kubectl -n kube-flannel get pods -o wide
kubectl -n kube-system get pods | grep flannel
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.podCIDR}{"\n"}{end}'

Plugin selection matrix

Requirement	Better fit
Simple learning cluster	Flannel or managed provider default
Strong Kubernetes NetworkPolicy baseline	Calico, Cilium, or cloud CNI with policy support
Deep flow visibility	Cilium with Hubble, or mesh telemetry where appropriate
Bare metal BGP routing	Calico or Cilium depending on team skill
kube-proxy replacement	Cilium or another eBPF capable datapath
Cloud security group integration	Provider CNI, sometimes with Calico or Cilium policy layered on top

NetworkPolicy

NetworkPolicy is the Kubernetes API for Pod level network allow rules. It is namespaced and selects Pods by labels. It controls ingress to selected Pods and egress from selected Pods.

Critical caveat: NetworkPolicy enforcement requires CNI support. If the installed CNI does not implement NetworkPolicy, creating NetworkPolicy objects may have no traffic effect. Behavior beyond TCP, UDP, and SCTP can be plugin specific, including ICMP handling, ARP, DNS details, node local paths, and advanced L7 matching.

Default behavior:

Situation	Behavior
No NetworkPolicy selects a Pod for ingress	All ingress is allowed to that Pod.
At least one ingress policy selects a Pod	Only ingress allowed by all matching policy rules is allowed.
No NetworkPolicy selects a Pod for egress	All egress is allowed from that Pod.
At least one egress policy selects a Pod	Only egress allowed by matching policy rules is allowed.
Multiple policies select the same Pod	Allowed traffic is additive. Policies do not deny traffic directly.

Default deny

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: apps
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Apply default deny only with a planned allow list. Otherwise you will often break DNS, metrics scraping, readiness probes, webhooks, and dependency access.

Allow frontend to backend

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080

Allow DNS egress

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: apps
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Allow egress to an internal CIDR

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-database-subnet
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.40.0.0/16
            except:
              - 10.40.99.0/24
      ports:
        - protocol: TCP
          port: 5432

ipBlock is usually intended for external or infrastructure CIDRs. Do not assume Pod IP allow lists remain stable. Prefer selectors for in-cluster Pods.

NetworkPolicy common mistakes

Mistake	Result	Fix
CNI does not enforce policy	Policies exist but traffic remains open	Verify CNI support before relying on policy.
Forgetting DNS egress	Apps fail with resolution errors	Allow UDP and TCP 53 to CoreDNS or node local DNS.
Label selector too broad	Unintended Pods gain access	Use standard labels and verify selected Pods.
Label selector too narrow	Legitimate clients lose access	Test from real client Pods and inspect labels.
Assuming policies deny	A later policy can allow traffic because allows are additive	Model allow lists, not ordered firewall denies.
Blocking kubelet probes	Pods become unready or restart	Understand probe source path for your CNI and cluster.
Using IPs for Pods	Rules break on reschedule	Use Pod and namespace selectors.
Ignoring SCTP or non TCP behavior	Results vary by plugin	Confirm plugin behavior for protocols beyond TCP, UDP, and SCTP.

Policy review commands:

kubectl -n apps get networkpolicy
kubectl -n apps describe networkpolicy allow-frontend-to-api
kubectl -n apps get pod --show-labels
kubectl -n apps run client --rm -it --image=curlimages/curl:8.8.0 --restart=Never -- sh
curl -sv http://api:8080/healthz
nslookup api.apps.svc.cluster.local

Egress control

Egress is harder than ingress because destinations can be IPs, DNS names, external SaaS, cloud metadata services, or private networks. Kubernetes NetworkPolicy only models IP blocks and selectors in the standard API. DNS name aware egress control is plugin or proxy specific.

Egress control patterns:

Pattern	Use when	Tradeoffs
NetworkPolicy egress allow lists	Internal dependencies and stable CIDRs	Poor fit for dynamic SaaS IPs.
Egress gateway	Need fixed source IP, audit, or central firewalling	Adds a choke point and routing complexity.
Service mesh egress	Need identity aware outbound control	Mesh complexity and sidecar or ambient constraints.
Cloud NAT or firewall	Need provider level enforcement	May not know Kubernetes workload identity.
Explicit HTTP proxy	Need URL or domain control	Apps must support proxy config or transparent proxying.

Metadata service protection example:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - ipBlock:
            cidr: 10.50.0.0/16
      ports:
        - protocol: TCP
          port: 443

Production guidance:

Decide whether egress policy is for blast radius, compliance, cost control, or routing.
Keep DNS and time sync dependencies explicit.
Block cloud metadata access by default unless workload identity requires it.
Prefer workload identity over static cloud credentials.
Log denied flows where the CNI supports it.
Test upgrades because policy engines and datapaths can change behavior.

Service mesh overview

A service mesh adds workload identity, traffic policy, telemetry, and often mTLS between services. The classic model injects a proxy sidecar into each Pod. Newer models may use node proxies or ambient datapaths, depending on the mesh.

Mesh capabilities:

Capability	What it gives
mTLS	Encrypted and authenticated service to service traffic.
Traffic splitting	Canary, mirroring, retries, timeouts, and circuit breaking.
Identity	Workloads identified by service accounts or mesh identities instead of IPs.
Telemetry	Request metrics, traces, logs, and topology.
Policy	Authorization rules at L4 or L7, depending on the mesh.

Mesh costs:

Cost	Impact
Operational complexity	Control plane upgrades, proxy versions, injection policy, and debugging paths.
Latency and resources	Proxies consume CPU, memory, and add hops.
Failure modes	Misconfigured mTLS or authorization can break healthy apps.
Cognitive load	Developers must understand app, Kubernetes, and mesh routing layers.

mTLS overview

mTLS means both client and server present certificates. In a mesh, the control plane usually issues workload certificates to proxies. The proxies authenticate each other, encrypt traffic, and authorize requests based on workload identity.

Rendering diagram...

Important distinctions:

Topic	Meaning
TLS termination	Where encrypted traffic becomes plaintext.
mTLS identity	Both sides authenticate with certificates.
AuthorizationPolicy	Mesh specific rule deciding which identities can call which workloads.
End to end encryption	May require app level TLS if plaintext between proxy and app is unacceptable.

Istio and Linkerd tradeoffs

Dimension	Istio	Linkerd
Scope	Broad traffic management, security, policy, gateways, extensibility	Focused service mesh with simpler operations
Data plane	Envoy sidecars or ambient mode depending on deployment	Lightweight Rust proxy sidecars
Traffic features	Very rich routing, retries, fault injection, gateways	Core traffic policy with less surface area
Complexity	Higher, more powerful, more knobs	Lower, easier for many teams to operate
Ecosystem	Large ecosystem and many integrations	Strong simplicity and reliability emphasis
Best fit	Large platforms needing advanced routing and policy	Teams that want mTLS and telemetry with less operational weight

Mesh adoption checklist:

Prove the need with concrete requirements, not trend pressure.
Start with one namespace and one non-critical service path.
Define ownership for mesh control plane, certificates, policy, and upgrades.
Establish golden signals before enforcing mTLS or authorization.
Document escape hatches for broken injection, proxy startup, and policy rollback.
Measure latency and resource overhead before broad rollout.

Mesh versus NetworkPolicy

Need	NetworkPolicy	Service mesh
Block Pod traffic by label and port	Strong fit if CNI enforces it	Possible but heavier
Authenticate workload identity	Not provided by standard NetworkPolicy	Strong fit
Encrypt service to service traffic	Not provided by standard NetworkPolicy	Strong fit
HTTP path authorization	Not standard	Mesh or ingress policy feature
Restrict DNS name egress	Not standard	Possible with mesh or plugin specific policy
Portable Kubernetes API	Stronger	Mesh APIs vary

Use both when needed. NetworkPolicy gives baseline segmentation at the network layer. Mesh policy gives identity aware application traffic control. Do not treat mesh mTLS as a replacement for all network segmentation.

Troubleshooting network failures

Use a controlled debug Pod in the same namespace and, when needed, on a specific node.

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  namespace: apps
spec:
  restartPolicy: Never
  containers:
    - name: netshoot
      image: nicolaka/netshoot:latest
      command:
        - sleep
        - "3600"

kubectl -n apps apply -f netshoot.yaml
kubectl -n apps exec -it netshoot -- bash
dig kubernetes.default.svc.cluster.local
dig api.apps.svc.cluster.local
curl -sv http://api:8080/healthz
nc -vz api 8080
ip route
ss -tnp

Flow:

Rendering diagram...

Direct Pod path

kubectl -n apps get pods -o wide
kubectl -n apps exec -it netshoot -- curl -sv http://POD_IP:8080/healthz

If direct Pod IP fails:

Check	Command
App listening	`kubectl -n apps exec POD -- ss -lntp`
Pod labels and policy	`kubectl -n apps get pod POD --show-labels`
CNI Pods	`kubectl -n kube-system get pods -o wide`
Node route	`kubectl get node NODE -o yaml` plus node route inspection
MTU	`tracepath POD_IP`

Service path

kubectl -n apps get svc api -o yaml
kubectl -n apps get endpointslice -l kubernetes.io/service-name=api -o yaml
kubectl -n apps exec -it netshoot -- curl -sv http://api:8080/healthz

If Pod IP works but Service fails, focus on Service selectors, ports, kube-proxy, eBPF service maps, conntrack, and EndpointSlice readiness.

DNS path

kubectl -n apps exec -it netshoot -- cat /etc/resolv.conf
kubectl -n apps exec -it netshoot -- dig api.apps.svc.cluster.local
kubectl -n kube-system logs deploy/coredns --tail=100
kubectl -n kube-system get endpointslice -l k8s-app=kube-dns

If DNS fails but Service IP works, do not chase kube-proxy first. Check CoreDNS health, DNS policy, node local DNS, and egress policy.

Mesh path

kubectl -n apps get pod POD -o jsonpath='{.spec.containers[*].name}{"\n"}'
kubectl -n apps logs POD -c istio-proxy --tail=100
kubectl -n apps describe pod POD

Mesh specific tools differ:

Mesh	Useful checks
Istio	`istioctl proxy-status`, `istioctl proxy-config clusters`, `istioctl analyze`
Linkerd	`linkerd check`, `linkerd viz stat`, `linkerd viz tap`

Common failure patterns

Symptom	Likely layer	Typical cause
DNS timeout	DNS, policy, CNI	DNS egress blocked or CoreDNS unreachable.
DNS NXDOMAIN	DNS name	Wrong namespace, typo, missing Service.
Pod IP works, Service IP fails	Service datapath	Wrong Service port, no EndpointSlice, kube-proxy or eBPF issue.
Service works inside namespace only	DNS or policy	Short name depends on search path, cross namespace policy missing.
Cross node Pod traffic fails	CNI or underlay	Overlay blocked, BGP issue, cloud route missing, MTU.
Only large responses fail	MTU	Encapsulation overhead and blocked fragmentation.
Works before NetworkPolicy apply	Policy	Missing allow for DNS, backend, probes, or mesh proxy ports.
Works without sidecar only	Mesh	mTLS mode, authorization policy, or proxy config.
External egress source IP wrong	NAT path	Node SNAT, cloud NAT, or missing egress gateway.

Production guidance

Area	Guidance
CNI choice	Pick based on policy, observability, cloud integration, and operator skill, not benchmarks alone.
IP planning	Size Pod CIDRs and Service CIDRs before cluster creation. Readdressing later is painful.
Policy rollout	Start with observe, then namespace default deny, then workload allow lists.
DNS	Treat CoreDNS as production infrastructure with metrics, autoscaling, and alerting.
Egress	Decide the allowed outbound model before compliance asks for it.
Mesh	Adopt for identity, encryption, telemetry, or advanced traffic policy, not for every cluster by default.
Upgrades	Test CNI, kube-proxy, kernel, and mesh upgrades in a realistic staging cluster.
Observability	Keep packet, flow, DNS, proxy, and application telemetry correlated by namespace, Pod, node, and Service.

Review checklist

CNI plugin is identified, supported, and healthy on every node.
The team knows whether kube-proxy, IPVS, iptables, or eBPF handles Services.
Pod CIDR, Service CIDR, node CIDR, and VPC CIDR do not overlap.
NetworkPolicy enforcement is verified with real traffic tests.
NetworkPolicy caveats are documented: CNI support is required, and behavior beyond TCP, UDP, and SCTP can be plugin specific.
Default deny policies include explicit DNS, dependency, probe, and telemetry allowances.
Egress to cloud metadata endpoints is intentionally allowed or blocked.
CoreDNS has enough replicas, resource requests, cache settings, and alerts.
MTU is validated for overlay or VPN paths.
Mesh injection policy is explicit and reversible.
mTLS mode is known for every meshed namespace.
Authorization policies are tested from allowed and denied callers.
Troubleshooting runbooks separate DNS, Service, CNI, policy, mesh, and application failures.