Kubernetes Networking CNI NetworkPolicy and Service Mesh

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/kubernetes/05 Kubernetes Networking CNI NetworkPolicy and Service Mesh.md.

Purpose: explain the Kubernetes data network below Services, including CNI plugins, NetworkPolicy, egress control, service mesh, mTLS, and practical troubleshooting.

Related notes: Kubernetes, 00 Kubernetes Mastery Roadmap, 04 Services DNS Ingress Gateway API and Traffic Routing, Networking, 09 Security RBAC Pod Security Admission and Supply Chain, 10 Observability Logging Metrics Tracing Events and Probes.

Network layers

Kubernetes networking is a stack. When traffic fails, identify the layer before changing manifests.

[Diagram: network layers; not rendered in this snapshot]

Core contracts:

| Contract | Meaning |
| --- | --- |
| Pod to Pod | Pods should be able to communicate across nodes without application aware NAT, unless policy or infrastructure blocks it. |
| Pod to Service | Pod traffic to a ClusterIP should reach a ready endpoint selected by that Service. |
| Pod to DNS | Pods need access to CoreDNS for normal Service discovery. |
| Node to Pod | kubelet, probes, and some infrastructure paths need node to Pod reachability. |
| External to Service | Ingress, Gateway, LoadBalancer, or NodePort translate external traffic into Service or Pod paths. |

CNI overview

CNI, the Container Network Interface, is the plugin contract used by kubelet and the container runtime to attach Pods to networks. The CNI plugin assigns Pod IPs, creates interfaces, installs routes, and may enforce policy.

Kubernetes itself does not implement the Pod network. It delegates that work to the installed CNI. This is why two conformant clusters can have very different packet paths.

Typical CNI responsibilities:

  1. Allocate Pod IP addresses.
  2. Create the Pod network interface.
  3. Connect the Pod to the node network namespace.
  4. Configure routes or overlays so Pods can reach each other across nodes.
  5. Enforce NetworkPolicy if supported.
  6. Optionally implement Service load balancing, encryption, observability, and egress gateways.
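
The first responsibility, Pod IP allocation, can be illustrated with a small model. This is only a sketch of one common scheme, in which each node receives a slice of a cluster-wide Pod CIDR and Pods get addresses from their node's slice. The CIDR `10.244.0.0/16`, the `/24` node size, the node names, and both function names are assumptions chosen for illustration; real plugins use their own IPAM and may work quite differently.

```python
import ipaddress

def node_pod_cidrs(cluster_cidr, node_prefix, nodes):
    """Hand each node the next unused subnet of the cluster Pod CIDR."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_prefix)
    return {node: next(subnets) for node in nodes}

def first_pod_ips(node_cidr, count):
    """Return the first `count` addresses a node could give to Pods."""
    hosts = node_cidr.hosts()   # skips the network and broadcast addresses
    next(hosts)                 # assumption: first host is kept for the node-side gateway
    return [str(next(hosts)) for _ in range(count)]

# Hypothetical cluster: a /16 Pod CIDR carved into one /24 per node.
cidrs = node_pod_cidrs("10.244.0.0/16", 24, ["node-a", "node-b"])
for node, cidr in cidrs.items():
    print(node, cidr, first_pod_ips(cidr, 2))
```

A model like this also makes the capacity tradeoff visible: a `/16` split into `/24` slices supports at most 256 nodes with roughly 250 Pods each.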

Pod network designs

| Design | How it works | Strengths | Costs |
| --- | --- | --- | --- |
| Overlay | Encapsulates Pod traffic across nodes, often VXLAN or Geneve | Works on simple underlays, easy cluster portability | Encapsulation overhead and MTU tuning required. |
| Routed Pod CIDRs | Node or fabric routes Pod CIDRs directly | Efficient and transparent | Requires route propagation, BGP, or cloud route integration. |
| Cloud native VPC IPs | Pods receive VPC routable addresses | Strong cloud integration and security group fit | IP exhaustion and provider limits can dominate design. |
| eBPF datapath | Kernel eBPF programs handle routing, policy, and sometimes Services | High performance and observability | Plugin expertise and kernel compatibility matter. |

MTU matters. Overlay encapsulation reduces effective packet size. If clients see intermittent TLS stalls, gRPC resets, or large response hangs, check MTU and path fragmentation early.

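# Find the nodes and the CNI agent Pods running on them.
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
# Start a throwaway debug Pod; the remaining commands run inside its shell.
kubectl -n apps run netshoot --rm -it --image=nicolaka/netshoot:latest --restart=Never -- bash
ip addr
ip route
# Non-fragmentable ping: 1372 bytes of payload plus 28 bytes of IPv4 and ICMP headers probes a 1400 byte path MTU.
ping -M do -s 1372 POD_IP
tracepath POD_IP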
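
The payload size passed to `ping -s` follows from simple header arithmetic, sketched below. The 50 byte figure is the usual VXLAN overhead on an IPv4 underlay; other encapsulations, IPv6 underlays, and encryption add different amounts, so treat the constants and the function names as illustrative assumptions and confirm the real values for your plugin.

```python
# Bytes an IPv4 VXLAN tunnel adds around each Pod packet:
# outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14.
VXLAN_IPV4_OVERHEAD = 50
# An ICMP echo carries an IPv4 header (20) and an ICMP header (8) before the payload.
ICMP_ECHO_HEADERS = 28

def pod_mtu(underlay_mtu, overhead):
    """MTU left for Pod traffic once encapsulation is subtracted."""
    return underlay_mtu - overhead

def max_ping_payload(path_mtu):
    """Largest `ping -M do -s` value that fits the path without fragmenting."""
    return path_mtu - ICMP_ECHO_HEADERS

mtu = pod_mtu(1500, VXLAN_IPV4_OVERHEAD)
print(mtu, max_ping_payload(mtu))   # 1450 1422
print(max_ping_payload(1400))       # 1372
```

If the largest payload that succeeds is smaller than the computed value, something on the path has a lower MTU than the Pod interface assumes.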

CNI plugin tradeoffs

Calico

Calico is a widely used CNI focused on routed networking and policy. It can run with BGP, overlays, or cloud integrations. It supports Kubernetes NetworkPolicy and Calico specific policy resources.

| Strength | Tradeoff |
| --- | --- |
| Mature policy model | Advanced Calico resources are not portable Kubernetes APIs. |
| Strong bare metal and hybrid support | BGP and route design require operational skill. |
| Good fit for explicit security segmentation | Misordered or overlapping policies can be hard to reason about. |

Common Calico checks:

kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
kubectl get ippools.crd.projectcalico.org
kubectl get networkpolicy -A
kubectl get globalnetworkpolicy.crd.projectcalico.org

Cilium

Cilium uses eBPF for networking, policy, observability, and optional kube-proxy replacement. It can provide rich flow visibility through Hubble and supports L3, L4, and L7 aware policy features through Cilium specific APIs.

| Strength | Tradeoff |
| --- | --- |
| Powerful observability and eBPF datapath | Kernel, feature flag, and upgrade compatibility matter. |
| Can replace kube-proxy | Changes the Service datapath and debugging tools. |
| Rich policy and identity model | Cilium specific policies reduce portability. |

Common Cilium checks:

kubectl -n kube-system get pods -l k8s-app=cilium -o wide
kubectl -n kube-system exec ds/cilium -- cilium status
kubectl -n kube-system exec ds/cilium -- cilium service list
kubectl -n kube-system exec ds/cilium -- cilium policy get

Flannel

Flannel is a simple Pod networking plugin. It is often used for learning, small clusters, and environments that want a minimal overlay. Flannel traditionally does not implement Kubernetes NetworkPolicy by itself.

| Strength | Tradeoff |
| --- | --- |
| Simple mental model | Limited built in policy story. |
| Easy for labs and small clusters | Fewer advanced observability and security features. |
| Lower operational surface | May need another component for policy enforcement. |

Common Flannel checks:

kubectl -n kube-flannel get pods -o wide
kubectl -n kube-system get pods | grep flannel
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.podCIDR}{"\n"}{end}'

Plugin selection matrix

| Requirement | Better fit |
| --- | --- |
| Simple learning cluster | Flannel or managed provider default |
| Strong Kubernetes NetworkPolicy baseline | Calico, Cilium, or cloud CNI with policy support |
| Deep flow visibility | Cilium with Hubble, or mesh telemetry where appropriate |
| Bare metal BGP routing | Calico or Cilium depending on team skill |
| kube-proxy replacement | Cilium or another eBPF capable datapath |
| Cloud security group integration | Provider CNI, sometimes with Calico or Cilium policy layered on top |

NetworkPolicy

NetworkPolicy is the Kubernetes API for Pod level network allow rules. It is namespaced and selects Pods by labels. It controls ingress to selected Pods and egress from selected Pods.

Critical caveat: NetworkPolicy enforcement requires CNI support. If the installed CNI does not implement NetworkPolicy, creating NetworkPolicy objects may have no traffic effect. Behavior beyond TCP, UDP, and SCTP can be plugin specific, including ICMP handling, ARP, DNS details, node local paths, and advanced L7 matching.

Default behavior:

| Situation | Behavior |
| --- | --- |
| No NetworkPolicy selects a Pod for ingress | All ingress is allowed to that Pod. |
| At least one ingress policy selects a Pod | Only ingress allowed by at least one rule of a selecting policy is allowed. |
| No NetworkPolicy selects a Pod for egress | All egress is allowed from that Pod. |
| At least one egress policy selects a Pod | Only egress allowed by at least one rule of a selecting policy is allowed. |
| Multiple policies select the same Pod | Allowed traffic is additive. Policies do not deny traffic directly. |
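
The additive behavior is easier to reason about as code. The following is a deliberately simplified model, not how any CNI implements enforcement: it assumes a single namespace, `matchLabels` selectors only, ingress only, and TCP ports only. The `Policy` and `IngressRule` classes, the `ingress_allowed` function, and the `app` label key are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class IngressRule:
    from_labels: dict   # labels a source Pod must carry; {} matches any Pod
    ports: set          # allowed TCP ports; an empty set means every port

@dataclass
class Policy:
    pod_selector: dict                          # {} selects every Pod in the namespace
    ingress: list = field(default_factory=list)

def matches(selector, labels):
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def ingress_allowed(policies, dst_labels, src_labels, port):
    selecting = [p for p in policies if matches(p.pod_selector, dst_labels)]
    if not selecting:
        return True     # no policy selects the Pod: all ingress is allowed
    # Selected Pods accept traffic that ANY rule of ANY selecting policy allows.
    return any(
        matches(r.from_labels, src_labels) and (not r.ports or port in r.ports)
        for p in selecting
        for r in p.ingress
    )

api, frontend, batch = {"app": "api"}, {"app": "frontend"}, {"app": "batch"}
default_deny = Policy(pod_selector={})
allow_frontend = Policy({"app": "api"}, [IngressRule({"app": "frontend"}, {8080})])

print(ingress_allowed([], api, batch, 8080))                                # True
print(ingress_allowed([default_deny], api, frontend, 8080))                 # False
print(ingress_allowed([default_deny, allow_frontend], api, frontend, 8080)) # True
print(ingress_allowed([default_deny, allow_frontend], api, batch, 8080))    # False
```

Note that the default deny policy never "wins" over the allow policy. It only moves the Pod from the unselected state into the selected state, after which the union of allow rules decides.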

Default deny

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: apps
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Apply default deny only with a planned allow list. Otherwise you will often break DNS, metrics scraping, readiness probes, webhooks, and dependency access.

Allow frontend to backend

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080

Allow DNS egress

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: apps
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Allow egress to an internal CIDR

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-database-subnet
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.40.0.0/16
            except:
              - 10.40.99.0/24
      ports:
        - protocol: TCP
          port: 5432

ipBlock is usually intended for external or infrastructure CIDRs. Do not assume Pod IP allow lists remain stable. Prefer selectors for in-cluster Pods.
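
To check which destinations an `ipBlock` with `except` ranges covers, the matching logic can be reproduced with the standard `ipaddress` module. This is a sketch of the documented semantics (inside `cidr`, outside every `except` range); the function name `ip_block_allows` is an assumption, and actual enforcement is done by the CNI.

```python
import ipaddress

def ip_block_allows(ip, cidr, excepts=()):
    """True when ip is inside cidr and outside every except range."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(e) for e in excepts)

# Values taken from the allow-egress-to-database-subnet policy.
cidr, excepts = "10.40.0.0/16", ("10.40.99.0/24",)
print(ip_block_allows("10.40.1.5", cidr, excepts))   # True
print(ip_block_allows("10.40.99.7", cidr, excepts))  # False
print(ip_block_allows("10.41.0.1", cidr, excepts))   # False
```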

NetworkPolicy common mistakes

| Mistake | Result | Fix |
| --- | --- | --- |
| CNI does not enforce policy | Policies exist but traffic remains open | Verify CNI support before relying on policy. |
| Forgetting DNS egress | Apps fail with resolution errors | Allow UDP and TCP 53 to CoreDNS or node local DNS. |
| Label selector too broad | Unintended Pods gain access | Use standard labels and verify selected Pods. |
| Label selector too narrow | Legitimate clients lose access | Test from real client Pods and inspect labels. |
| Assuming policies deny | A later policy can allow traffic because allows are additive | Model allow lists, not ordered firewall denies. |
| Blocking kubelet probes | Pods become unready or restart | Understand probe source path for your CNI and cluster. |
| Using IPs for Pods | Rules break on reschedule | Use Pod and namespace selectors. |
| Ignoring ICMP and other traffic outside TCP, UDP, and SCTP | Results vary by plugin | Confirm plugin behavior for protocols beyond TCP, UDP, and SCTP. |

Policy review commands:

kubectl -n apps get networkpolicy
kubectl -n apps describe networkpolicy allow-frontend-to-api
kubectl -n apps get pod --show-labels
kubectl -n apps run client --rm -it --image=curlimages/curl:8.8.0 --restart=Never -- sh
curl -sv http://api:8080/healthz
nslookup api.apps.svc.cluster.local

Egress control

Egress is harder than ingress because destinations can be IPs, DNS names, external SaaS, cloud metadata services, or private networks. Kubernetes NetworkPolicy only models IP blocks and selectors in the standard API. DNS name aware egress control is plugin or proxy specific.

Egress control patterns:

| Pattern | Use when | Tradeoffs |
| --- | --- | --- |
| NetworkPolicy egress allow lists | Internal dependencies and stable CIDRs | Poor fit for dynamic SaaS IPs. |
| Egress gateway | Need fixed source IP, audit, or central firewalling | Adds a choke point and routing complexity. |
| Service mesh egress | Need identity aware outbound control | Mesh complexity and sidecar or ambient constraints. |
| Cloud NAT or firewall | Need provider level enforcement | May not know Kubernetes workload identity. |
| Explicit HTTP proxy | Need URL or domain control | Apps must support proxy config or transparent proxying. |

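Metadata service protection example. The policy below never names the metadata endpoint. It limits egress from the api Pods to DNS and to TCP 443 on 10.50.0.0/16, so the cloud metadata address (commonly 169.254.169.254) is blocked simply because no rule allows it, unless another policy selecting the same Pods does: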

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - ipBlock:
            cidr: 10.50.0.0/16
      ports:
        - protocol: TCP
          port: 443

Production guidance:

  1. Decide whether egress policy is for blast radius, compliance, cost control, or routing.
  2. Keep DNS and time sync dependencies explicit.
  3. Block cloud metadata access by default unless workload identity requires it.
  4. Prefer workload identity over static cloud credentials.
  5. Log denied flows where the CNI supports it.
  6. Test upgrades because policy engines and datapaths can change behavior.

Service mesh overview

A service mesh adds workload identity, traffic policy, telemetry, and often mTLS between services. The classic model injects a proxy sidecar into each Pod. Newer models may use node proxies or ambient datapaths, depending on the mesh.

Mesh capabilities:

| Capability | What it gives |
| --- | --- |
| mTLS | Encrypted and authenticated service to service traffic. |
| Traffic splitting | Canary, mirroring, retries, timeouts, and circuit breaking. |
| Identity | Workloads identified by service accounts or mesh identities instead of IPs. |
| Telemetry | Request metrics, traces, logs, and topology. |
| Policy | Authorization rules at L4 or L7, depending on the mesh. |

Mesh costs:

| Cost | Impact |
| --- | --- |
| Operational complexity | Control plane upgrades, proxy versions, injection policy, and debugging paths. |
| Latency and resources | Proxies consume CPU, memory, and add hops. |
| Failure modes | Misconfigured mTLS or authorization can break healthy apps. |
| Cognitive load | Developers must understand app, Kubernetes, and mesh routing layers. |

mTLS overview

mTLS means both client and server present certificates. In a mesh, the control plane usually issues workload certificates to proxies. The proxies authenticate each other, encrypt traffic, and authorize requests based on workload identity.
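
What "both sides present certificates" means at the TLS library level can be sketched with Python's standard `ssl` module. This is not how a mesh proxy is configured; it only shows the one setting that turns ordinary TLS into mutual TLS. The function names and the optional file path parameters are hypothetical, and a mesh would typically validate a workload identity in the certificate instead of a hostname, which this sketch does not attempt.

```python
import ssl

def server_context(ca_file=None, cert_file=None, key_file=None):
    """Server side TLS settings that also demand a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    # The "mutual" part: the handshake fails unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # the server's own identity
    return ctx

def client_context(ca_file=None, cert_file=None, key_file=None):
    """Client side TLS settings that verify the server and offer a client cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # the client's own identity
    return ctx

print(server_context().verify_mode == ssl.CERT_REQUIRED)   # True
print(client_context().verify_mode == ssl.CERT_REQUIRED)   # True
```

With plain TLS only the client context verifies its peer. Setting `CERT_REQUIRED` on the server context is what makes authentication symmetric.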

[Diagram: mTLS overview; not rendered in this snapshot]

Important distinctions:

| Topic | Meaning |
| --- | --- |
| TLS termination | Where encrypted traffic becomes plaintext. |
| mTLS identity | Both sides authenticate with certificates. |
| AuthorizationPolicy | Mesh specific rule deciding which identities can call which workloads. |
| End to end encryption | May require app level TLS if plaintext between proxy and app is unacceptable. |

Istio and Linkerd tradeoffs

| Dimension | Istio | Linkerd |
| --- | --- | --- |
| Scope | Broad traffic management, security, policy, gateways, extensibility | Focused service mesh with simpler operations |
| Data plane | Envoy sidecars or ambient mode depending on deployment | Lightweight Rust proxy sidecars |
| Traffic features | Very rich routing, retries, fault injection, gateways | Core traffic policy with less surface area |
| Complexity | Higher, more powerful, more knobs | Lower, easier for many teams to operate |
| Ecosystem | Large ecosystem and many integrations | Strong simplicity and reliability emphasis |
| Best fit | Large platforms needing advanced routing and policy | Teams that want mTLS and telemetry with less operational weight |

Mesh adoption checklist:

  1. Prove the need with concrete requirements, not trend pressure.
  2. Start with one namespace and one non-critical service path.
  3. Define ownership for mesh control plane, certificates, policy, and upgrades.
  4. Establish golden signals before enforcing mTLS or authorization.
  5. Document escape hatches for broken injection, proxy startup, and policy rollback.
  6. Measure latency and resource overhead before broad rollout.

Mesh versus NetworkPolicy

| Need | NetworkPolicy | Service mesh |
| --- | --- | --- |
| Block Pod traffic by label and port | Strong fit if CNI enforces it | Possible but heavier |
| Authenticate workload identity | Not provided by standard NetworkPolicy | Strong fit |
| Encrypt service to service traffic | Not provided by standard NetworkPolicy | Strong fit |
| HTTP path authorization | Not standard | Mesh or ingress policy feature |
| Restrict DNS name egress | Not standard | Possible with mesh or plugin specific policy |
| Portable Kubernetes API | Stronger | Mesh APIs vary |

Use both when needed. NetworkPolicy gives baseline segmentation at the network layer. Mesh policy gives identity aware application traffic control. Do not treat mesh mTLS as a replacement for all network segmentation.

Troubleshooting network failures

Use a controlled debug Pod in the same namespace and, when needed, on a specific node.

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  namespace: apps
spec:
  restartPolicy: Never
  containers:
    - name: netshoot
      image: nicolaka/netshoot:latest
      command:
        - sleep
        - "3600"
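
Save the manifest as netshoot.yaml, apply it, then run the checks from a shell inside the Pod:

kubectl -n apps apply -f netshoot.yaml
kubectl -n apps exec -it netshoot -- bash
# DNS: cluster API Service first, then the application Service.
dig kubernetes.default.svc.cluster.local
dig api.apps.svc.cluster.local
# Service path: HTTP request and raw TCP connect.
curl -sv http://api:8080/healthz
nc -vz api 8080
# Local routing table and open TCP sockets.
ip route
ss -tnp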

Flow:

[Diagram: troubleshooting flow; not rendered in this snapshot]

Direct Pod path

kubectl -n apps get pods -o wide
kubectl -n apps exec -it netshoot -- curl -sv http://POD_IP:8080/healthz

If direct Pod IP fails:

| Check | Command |
| --- | --- |
| App listening | kubectl -n apps exec POD -- ss -lntp |
| Pod labels and policy | kubectl -n apps get pod POD --show-labels |
| CNI Pods | kubectl -n kube-system get pods -o wide |
| Node route | kubectl get node NODE -o yaml plus node route inspection |
| MTU | tracepath POD_IP |

Service path

kubectl -n apps get svc api -o yaml
kubectl -n apps get endpointslice -l kubernetes.io/service-name=api -o yaml
kubectl -n apps exec -it netshoot -- curl -sv http://api:8080/healthz

If Pod IP works but Service fails, focus on Service selectors, ports, kube-proxy, eBPF service maps, conntrack, and EndpointSlice readiness.
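
EndpointSlice readiness is the item most often misread in raw YAML, so a small filter can help. The sketch below assumes input shaped like the JSON list that `kubectl get endpointslice -o json` returns; the function name `ready_backends` and the sample addresses are made up for illustration.

```python
def ready_backends(endpoint_slices):
    """Return "ip:port" for every endpoint that should receive Service traffic."""
    backends = []
    for es in endpoint_slices.get("items", []):
        ports = [p["port"] for p in es.get("ports") or [] if "port" in p]
        for ep in es.get("endpoints") or []:
            # Only an explicit ready=false excludes an endpoint; a missing
            # or null condition means unknown and is treated as ready.
            if (ep.get("conditions") or {}).get("ready", True) is False:
                continue
            for addr in ep.get("addresses", []):
                backends.extend(f"{addr}:{port}" for port in ports)
    return backends

# Hypothetical slice for the api Service: one ready and one unready endpoint.
sample = {"items": [{
    "ports": [{"name": "http", "port": 8080, "protocol": "TCP"}],
    "endpoints": [
        {"addresses": ["10.244.1.7"], "conditions": {"ready": True}},
        {"addresses": ["10.244.2.9"], "conditions": {"ready": False}},
    ],
}]}
print(ready_backends(sample))   # ['10.244.1.7:8080']
```

An empty result for a Service with running Pods usually points at a selector mismatch or failing readiness probes, not at kube-proxy.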

DNS path

kubectl -n apps exec -it netshoot -- cat /etc/resolv.conf
kubectl -n apps exec -it netshoot -- dig api.apps.svc.cluster.local
kubectl -n kube-system logs deploy/coredns --tail=100
kubectl -n kube-system get endpointslice -l kubernetes.io/service-name=kube-dns

If DNS fails but Service IP works, do not chase kube-proxy first. Check CoreDNS health, DNS policy, node local DNS, and egress policy.
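
When reading /etc/resolv.conf, it helps to know which names the resolver will actually try. The following sketch models the common search list behavior with the `ndots:5` setting that Kubernetes writes by default. Resolver libraries differ in details, so treat it as an approximation; `lookup_order` is a hypothetical helper name.

```python
def lookup_order(name, search, ndots=5):
    """List the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]                      # already absolute: no search expansion
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded       # enough dots: try the name as given first
    return expanded + [absolute]           # otherwise walk the search list first

# Search list a Pod in the apps namespace typically receives.
search = ["apps.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(lookup_order("api", search)[0])        # api.apps.svc.cluster.local.
print(lookup_order("api.shop", search)[1])   # api.shop.svc.cluster.local.
```

This is why a short name such as `api` resolves only from Pods in the same namespace: a Pod elsewhere expands it against its own namespace first. It also shows why external names with few dots trigger several failed cluster lookups before the absolute query is sent.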

Mesh path

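# List container names to confirm a mesh proxy sidecar was injected.
kubectl -n apps get pod POD -o jsonpath='{.spec.containers[*].name}{"\n"}'
# istio-proxy is the Istio sidecar container name; Linkerd uses linkerd-proxy.
kubectl -n apps logs POD -c istio-proxy --tail=100
kubectl -n apps describe pod POD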

Mesh specific tools differ:

| Mesh | Useful checks |
| --- | --- |
| Istio | istioctl proxy-status, istioctl proxy-config clusters, istioctl analyze |
| Linkerd | linkerd check, linkerd viz stat, linkerd viz tap |

Common failure patterns

| Symptom | Likely layer | Typical cause |
| --- | --- | --- |
| DNS timeout | DNS, policy, CNI | DNS egress blocked or CoreDNS unreachable. |
| DNS NXDOMAIN | DNS name | Wrong namespace, typo, missing Service. |
| Pod IP works, Service IP fails | Service datapath | Wrong Service port, no EndpointSlice, kube-proxy or eBPF issue. |
| Service works inside namespace only | DNS or policy | Short name depends on search path, cross namespace policy missing. |
| Cross node Pod traffic fails | CNI or underlay | Overlay blocked, BGP issue, cloud route missing, MTU. |
| Only large responses fail | MTU | Encapsulation overhead and blocked fragmentation. |
| Works before NetworkPolicy apply | Policy | Missing allow for DNS, backend, probes, or mesh proxy ports. |
| Works without sidecar only | Mesh | mTLS mode, authorization policy, or proxy config. |
| External egress source IP wrong | NAT path | Node SNAT, cloud NAT, or missing egress gateway. |
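
The layer by layer order used in the troubleshooting sections can be written down as a tiny decision function. It is only a mnemonic, assuming three yes or no observations gathered from the debug Pod (direct Pod IP request, Service request, DNS lookup); the function name and return strings are hypothetical.

```python
def suspect_layer(pod_ip_ok, service_ok, dns_ok):
    """Map three basic checks to the layer that deserves attention first."""
    if not pod_ip_ok:
        return "pod path: app listener, NetworkPolicy, CNI Pods, node routes, MTU"
    if not service_ok:
        return "service path: selectors, ports, EndpointSlice readiness, kube-proxy or eBPF maps"
    if not dns_ok:
        return "dns path: CoreDNS health, DNS policy, node local DNS, egress policy"
    return "network path looks healthy: check mesh policy and the application"

print(suspect_layer(pod_ip_ok=True, service_ok=False, dns_ok=True))
```

Checking the lowest layer first avoids a common trap: editing Service or DNS configuration when the Pod itself is unreachable.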

Production guidance

| Area | Guidance |
| --- | --- |
| CNI choice | Pick based on policy, observability, cloud integration, and operator skill, not benchmarks alone. |
| IP planning | Size Pod CIDRs and Service CIDRs before cluster creation. Readdressing later is painful. |
| Policy rollout | Start with observe, then namespace default deny, then workload allow lists. |
| DNS | Treat CoreDNS as production infrastructure with metrics, autoscaling, and alerting. |
| Egress | Decide the allowed outbound model before compliance asks for it. |
| Mesh | Adopt for identity, encryption, telemetry, or advanced traffic policy, not for every cluster by default. |
| Upgrades | Test CNI, kube-proxy, kernel, and mesh upgrades in a realistic staging cluster. |
| Observability | Keep packet, flow, DNS, proxy, and application telemetry correlated by namespace, Pod, node, and Service. |

Review checklist

  1. CNI plugin is identified, supported, and healthy on every node.
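  2. The team knows whether kube-proxy (for example in iptables or IPVS mode) or an eBPF datapath handles Services.
  3. Pod CIDR, Service CIDR, and node or VPC ranges do not overlap unintentionally. VPC native CNIs that draw Pod IPs from VPC subnets are the deliberate exception.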
  4. NetworkPolicy enforcement is verified with real traffic tests.
  5. NetworkPolicy caveats are documented: CNI support is required, and behavior beyond TCP, UDP, and SCTP can be plugin specific.
  6. Default deny policies include explicit DNS, dependency, probe, and telemetry allowances.
  7. Egress to cloud metadata endpoints is intentionally allowed or blocked.
  8. CoreDNS has enough replicas, resource requests, cache settings, and alerts.
  9. MTU is validated for overlay or VPN paths.
  10. Mesh injection policy is explicit and reversible.
  11. mTLS mode is known for every meshed namespace.
  12. Authorization policies are tested from allowed and denied callers.
  13. Troubleshooting runbooks separate DNS, Service, CNI, policy, mesh, and application failures.
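
The CIDR overlap item in the checklist is easy to automate before cluster creation. The sketch below uses the standard `ipaddress` module; the range names and values are invented, with a deliberate mistake in the Service range, and `overlapping` is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations

def overlapping(cidrs):
    """Return every pair of named ranges that share at least one address."""
    nets = {name: ipaddress.ip_network(c) for name, c in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]

# Hypothetical plan: the Service range was accidentally carved out of the Pod range.
plan = {
    "pod": "10.244.0.0/16",
    "service": "10.244.128.0/20",
    "node": "192.168.0.0/24",
}
print(overlapping(plan))   # [('pod', 'service')]
```

Ranges that are meant to nest, such as a node subnet inside its VPC, would be reported too, so the output is a list to review and not a verdict.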