eBPF Observability Uprobes Kprobes Tracepoints and CO-RE

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/linux-systems-engineering/16 eBPF Observability Uprobes Kprobes Tracepoints and CO-RE.md.

Purpose: Use eBPF observability safely for syscall, process, file, TCP, DNS, latency, off-CPU, lock, and application tracing while managing overhead, privacy, portability, and production troubleshooting.

Related notes: Linux Systems Engineering, 02 Processes Threads Scheduling Signals and Jobs, 04 Filesystems VFS Block IO Page Cache and Storage, 05 Linux Networking TCP IP Routing Firewalling and DNS, 06 System Calls ABI libc and User Kernel Boundaries, 14 eBPF Fundamentals Verifier Maps Programs and Helpers, 15 eBPF Networking XDP TC Cilium and Service Dataplanes, 17 Production Operations Troubleshooting and Runbooks, 18 Linux Ecosystem Tools and Learning Projects

eBPF observability is event-driven instrumentation at kernel and user-space boundaries. It can answer questions that ordinary logs and metrics miss: which process opened this file, which syscall is slow, where TCP connects fail, which lock path blocks, which user-space function contributes latency, or which DNS names a workload resolves. It is powerful because it can run close to the event source. It is risky because the event source may be hot, sensitive, unstable, or different across kernel and binary versions.

On a local learning machine, use bpftrace one-liners, toy programs, disposable VMs, and known workloads. Trace too broadly once so you understand the cost. On production hosts and clusters, prefer narrow attach points, bounded duration, sampling, aggregation, redaction, and a clear exit condition. The question is not "can eBPF see this?" The question is "can this observation be collected without changing the incident more than it explains?"

Choosing an Instrumentation Surface

| Surface | Characteristics | Use | Caution |
| --- | --- | --- | --- |
| tracepoint | relatively stable kernel event ABI | syscall, scheduler, block, network events | fields still vary by kernel and config |
| raw tracepoint | lower overhead and raw context | hot tracepoint paths | less friendly decoding |
| kprobe | dynamic kernel function entry | missing tracepoint, deep debugging | function names and arguments can change |
| kretprobe | dynamic kernel function return | return values and latency | return probes add overhead and can miss some paths |
| fentry | BTF-typed function entry | efficient kernel function tracing | needs kernel BTF and fentry/fexit (BPF trampoline) support |
| fexit | BTF-typed function exit | return values with lower overhead than kretprobe in many cases | same portability requirements as fentry |
| uprobe | user-space function entry | app or library instrumentation | binary symbols, ASLR, inlining, versions |
| uretprobe | user-space function return | app latency and return values | higher overhead, recursive calls need correlation |
| USDT | user-level static probes | app-declared stable probe points | requires probes compiled into binary or runtime |

Prefer stable static points first: application metrics, OpenTelemetry spans, logs, kernel tracepoints, and USDT probes. Use dynamic kprobes and uprobes when the stable surface lacks the needed signal.

Syscall Tracing

Syscall tracing maps user-space behavior to kernel entry points. It is useful for permission denials, unexpected file access, network calls, process spawning, and latency at the user-kernel boundary. Tracepoints such as syscall enter and exit events are usually a better production surface than kprobes on syscall implementation functions.

Example bpftrace for local learning:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); }'

Production shape:

  • filter by cgroup, PID namespace, UID, command, or service
  • aggregate counts instead of printing every event
  • sample arguments only when needed
  • never collect pathnames or arguments fleet-wide without data classification

Syscall names are not the same as application intent. A high openat count may simply reflect normal dynamic-linker activity, config reloads, logging, or filesystem cache behavior. Correlate with process, path class, latency, and error code before drawing conclusions.
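The "aggregate counts" and "correlate with error code" advice can be illustrated on the user-space side. The following is a minimal Python sketch, assuming a hypothetical event shape of (comm, syscall, ret) tuples that some tracer has already delivered; `classify_ret` and `aggregate` are illustrative names, not part of bpftrace or any library.

```python
import errno
from collections import Counter

def classify_ret(ret):
    """Map a raw syscall return value to 'ok' or an errno name."""
    if ret >= 0:
        return "ok"
    # Linux syscalls report failure as -errno in the raw return value.
    return errno.errorcode.get(-ret, f"errno_{-ret}")

def aggregate(events):
    """Count events by (comm, syscall, outcome) instead of keeping each one."""
    counts = Counter()
    for comm, syscall, ret in events:
        counts[(comm, syscall, classify_ret(ret))] += 1
    return counts

# Hypothetical sample events: (comm, syscall, raw return value).
events = [
    ("nginx", "openat", 7),
    ("nginx", "openat", -errno.ENOENT),
    ("nginx", "openat", -errno.ENOENT),
    ("cron", "openat", -errno.EACCES),
]

for key, n in sorted(aggregate(events).items()):
    print(key, n)
```

Keying on the outcome rather than the pathname keeps cardinality low and avoids exporting file names at all.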

Process Exec Tracing

Exec tracing answers "what actually started?" It is valuable for incident response, cron surprises, container entrypoints, shell escapes, and deployment verification.

Common fields:

| Field | Why it matters |
| --- | --- |
| PID and parent PID | process tree and ancestry |
| UID and GID | actor and privilege |
| cgroup or container ID | workload attribution |
| command and args | executed program and intent |
| timestamp | timeline reconstruction |
| return code from exec | failed execution attempts |

Sensitive-data warning: command-line arguments often contain tokens, passwords, connection strings, file paths, customer identifiers, or incident secrets. In production, hash, truncate, or allowlist fields before export.
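One possible shape for "hash, truncate, or allowlist" is sketched below in Python. Everything here is an assumption for illustration: the `SAFE_FLAGS` allowlist, the 64-character limit, and the unsalted SHA-256 fingerprint are hypothetical choices, and a real deployment would likely prefer a keyed hash so that low-entropy secrets cannot be guessed from the digest.

```python
import hashlib

# Hypothetical allowlist: flags whose values are considered safe to export.
SAFE_FLAGS = {"--config", "--port", "-c"}
MAX_ARG_LEN = 64

def _digest(value):
    """Short, stable fingerprint so equal values still correlate."""
    return "sha256:" + hashlib.sha256(value.encode()).hexdigest()[:12]

def redact_argv(argv):
    """Keep argv[0], flag names, and allowlisted values; hash everything else."""
    if not argv:
        return []
    out = [argv[0][:MAX_ARG_LEN]]
    value_allowed = False  # True when the previous token was an allowlisted flag
    for arg in argv[1:]:
        keep_next = False
        if arg.startswith("-") and "=" in arg:
            flag, value = arg.split("=", 1)
            if flag in SAFE_FLAGS:
                out.append(arg[:MAX_ARG_LEN])
            else:
                out.append(f"{flag}={_digest(value)}")
        elif arg.startswith("--") or (arg.startswith("-") and len(arg) == 2):
            out.append(arg[:MAX_ARG_LEN])        # bare flag name
            keep_next = arg in SAFE_FLAGS
        elif arg.startswith("-"):
            # Short flag with an attached value, e.g. -pSECRET.
            out.append(arg[:2] + _digest(arg[2:]))
        else:
            out.append(arg[:MAX_ARG_LEN] if value_allowed else _digest(arg))
        value_allowed = keep_next
    return out

print(redact_argv(["psql", "--port", "5432", "--password=hunter2"]))
```

The default is to hash, so an argument form the sketch does not recognize is redacted rather than exported.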

File Access Tracing

File tracing can attach to syscalls, VFS functions, LSM hooks, or tracepoints depending on the question.

| Question | Better surface |
| --- | --- |
| which process attempted to open a path | syscall tracepoint or LSM hook |
| why did access fail | syscall exit plus errno, LSM audit if available |
| which filesystem path is hot | VFS or syscall aggregation |
| which block device is slow | block tracepoints, not path syscalls |
| who changed a sensitive file | auditd or fanotify may be better for durable policy |

Pathnames are hard. A file can be renamed while it is being observed, a full path may not reconstruct cleanly from dentries, and each container resolves paths through its own mount namespace. A path from the host namespace may differ from the path inside the container. For production security monitoring, prefer established audit mechanisms unless eBPF is needed for a specific gap.

TCP Connection Tracing

TCP tracing can expose connect attempts, accepts, retransmits, resets, state transitions, and latency. Useful surfaces include syscall tracepoints for connect, tracepoints in TCP state handling, kprobes or fentry on TCP functions when tracepoints are insufficient, and cgroup socket hooks for workload attribution.

Example fields:

| Field | Use |
| --- | --- |
| source and destination tuple | flow identity |
| PID, command, cgroup | workload attribution |
| TCP state | handshake and close behavior |
| errno or reset reason | failure classification where available |
| latency | connect or request phase timing |

Use 05 Linux Networking TCP IP Routing Firewalling and DNS for the packet and socket model before assuming every TCP failure is an application bug.

DNS Tracing

DNS can be traced at multiple layers:

| Layer | What it shows | Blind spot |
| --- | --- | --- |
| application library uprobe | requested name before resolver policy | language and library specific |
| libc resolver uprobe | names sent through glibc path | apps may bypass libc |
| UDP/TCP port 53 packet parsing | wire query and response | encrypted DNS and local cache behavior |
| CoreDNS or resolver logs | server-side answer path | misses client-side cache and NSS behavior |
| eBPF socket or packet tracing | tuple and payload where visible | privacy and encryption limits |

DNS names can be sensitive. They may reveal tenants, internal services, experiments, and incident targets. In production, aggregate by suffix, hash full names, or sample only failed responses when possible.
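The "aggregate by suffix, hash full names, or sample only failed responses" guidance could look roughly like the Python sketch below. It assumes a hypothetical input of (name, rcode) pairs, and it keeps the last two labels as the suffix, which is a naive simplification that does not consult the public suffix list.

```python
import hashlib
from collections import Counter

def dns_key(name, keep_labels=2):
    """Return (suffix, digest): a coarse suffix plus a hash of the full name."""
    labels = [part for part in name.lower().rstrip(".").split(".") if part]
    suffix = ".".join(labels[-keep_labels:]) if labels else "."
    digest = hashlib.sha256(".".join(labels).encode()).hexdigest()[:12]
    return suffix, digest

def aggregate_failures(responses):
    """Count only failed responses, keyed by (suffix, rcode)."""
    counts = Counter()
    for name, rcode in responses:
        if rcode != "NOERROR":
            suffix, _ = dns_key(name)
            counts[(suffix, rcode)] += 1
    return counts

# Hypothetical observations: (query name, response code).
responses = [
    ("x.svc.cluster.local", "NXDOMAIN"),
    ("y.svc.cluster.local", "NXDOMAIN"),
    ("tenant-42.api.example.com.", "NOERROR"),
]
print(dict(aggregate_failures(responses)))
```

The digest lets an investigator confirm that two events refer to the same name without the exported data containing the name itself.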

Latency Histograms

Latency histograms are one of the best eBPF observability patterns. The BPF program records start time in a map, computes duration on completion, and increments a bucket. User space reads compact aggregated state.

Key choice matters. PID alone can collide across threads or reused processes. For syscalls, use PID/TID plus operation-specific identifiers where possible. Always delete start records on completion to avoid map leaks. Add fallback cleanup for long-lived missing exits if the workload or probe can miss events.
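The start-map and bucket pattern can be modeled in user space to make the key and cleanup choices concrete. This Python sketch is an illustration under stated assumptions, not BPF code: the class name, the (pid, tid) key, and the 10-second staleness limit are all hypothetical, and log2 bucketing is used because it is a common histogram shape in tracing tools.

```python
class LatencyHistogram:
    """User-space model of the start-map plus log2-bucket pattern."""

    def __init__(self, max_age_ns=10_000_000_000):
        self.start = {}      # (pid, tid) -> start timestamp in ns
        self.buckets = {}    # log2 bucket index -> count
        self.max_age_ns = max_age_ns

    def on_enter(self, pid, tid, now_ns):
        self.start[(pid, tid)] = now_ns

    def on_exit(self, pid, tid, now_ns):
        # pop() both reads and deletes, so completed entries never leak.
        began = self.start.pop((pid, tid), None)
        if began is None:
            return  # exit without a matching enter, e.g. attached mid-call
        delta = max(now_ns - began, 0)
        # bit_length gives the log2 bucket: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3
        bucket = delta.bit_length()
        self.buckets[bucket] = self.buckets.get(bucket, 0) + 1

    def expire_stale(self, now_ns):
        """Fallback cleanup for entries whose exit event was never seen."""
        stale = [k for k, t in self.start.items() if now_ns - t > self.max_age_ns]
        for key in stale:
            del self.start[key]
        return len(stale)

hist = LatencyHistogram()
hist.on_enter(100, 101, 0)
hist.on_exit(100, 101, 1000)
print(hist.buckets)
```

Two threads of the same process get separate entries because the key includes the thread ID, which is the collision the paragraph above warns about.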

Off-CPU Profiling

Off-CPU profiling asks where threads spend time not running: blocked on IO, locks, futexes, scheduler delays, or sleeping. eBPF can observe scheduler switches, capture stack traces for blocked tasks, and build aggregate blocked-time profiles.

Production cautions:

  • stack capture is expensive
  • stack unwinding needs frame pointers or other unwind data, and symbolization needs symbols or debug info, depending on stack source
  • blocked time is not always bad; sleeping event loops are normal
  • container attribution requires cgroup or namespace correlation
  • high-cardinality stack maps can consume memory

Use off-CPU profiles with CPU profiles. A service can be slow because it is burning CPU, waiting on storage, blocked on locks, rate limited, or starved by scheduling.
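The accounting behind an off-CPU profile can be sketched in Python as a simplified model. It assumes a hypothetical stream of scheduler-switch records of the form (ts_ns, prev_tid, prev_stack, next_tid), where prev_stack is a label for the stack captured when the thread left the CPU. Unlike real tools, it does not distinguish voluntary blocking from involuntary preemption.

```python
from collections import defaultdict

def off_cpu_by_stack(switches):
    """Sum off-CPU nanoseconds per stack from switch records."""
    switched_out = {}             # tid -> (timestamp, stack at switch-out)
    totals = defaultdict(int)     # stack -> total off-CPU ns
    for ts, prev_tid, prev_stack, next_tid in switches:
        # The outgoing thread starts its off-CPU interval now.
        switched_out[prev_tid] = (ts, prev_stack)
        # The incoming thread ends its interval, if we saw it leave.
        left = switched_out.pop(next_tid, None)
        if left is not None:
            left_ts, left_stack = left
            totals[left_stack] += ts - left_ts
    return dict(totals)

# Hypothetical records: two threads alternating on one CPU.
switches = [
    (100, 1, "read->io_schedule", 2),
    (250, 2, "futex_wait", 1),
    (400, 1, "read->io_schedule", 2),
    (900, 2, "futex_wait", 1),
]
print(off_cpu_by_stack(switches))
```

Aggregating by stack instead of emitting one record per switch keeps output proportional to the number of distinct stacks, which is why stack cardinality is the main memory concern.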

Lock Contention Tracing

Lock contention tracing can use kernel lock tracepoints, scheduler signals, futex tracing, or application uprobes around lock functions. The right surface depends on whether the lock is kernel internal, pthread/futex-based, runtime-level, or application-defined.

| Lock type | Possible observation |
| --- | --- |
| kernel spinlock or mutex | lock tracepoints or kernel probes where available |
| pthread mutex | futex syscalls, libc uprobes, app runtime probes |
| JVM, Go, Rust runtime locks | runtime-specific probes or symbols if exposed |
| database locks | application metrics and logs are often better |

Do not infer lock ownership from one signal. Combine wait duration, stack traces, owner hints if available, and application-level context.

Application Uprobes

Uprobes attach to user-space instruction addresses. They can instrument functions in binaries and shared libraries without modifying source. Uretprobes observe function returns and can compute latency or inspect return values.

Practical problems:

  • stripped binaries may lack symbols
  • optimized code may inline or eliminate functions
  • shared library versions change offsets
  • ASLR complicates address resolution, and containers complicate binary path resolution
  • language runtimes may move work away from the function you chose
  • function arguments follow ABI rules, not source-level names

Local example:

sudo bpftrace -e 'uprobe:/usr/lib/x86_64-linux-gnu/libc.so.6:getaddrinfo { @[comm] = count(); }'

Production guidance:

  • pin binary build IDs or package versions
  • prefer USDT or runtime-supported probes when available
  • test against the exact container image
  • filter by cgroup before reading arguments
  • avoid attaching to extremely hot functions without sampling

USDT Probes

USDT means user-level statically defined tracing. Applications or runtimes compile named probe points into binaries. Unlike arbitrary uprobes, USDT probes are intentional instrumentation contracts. They are common in runtimes, databases, and some system services.

USDT is often the best bridge between application semantics and eBPF mechanics. It can expose "request started", "query planned", or "garbage collection began" more directly than guessing from syscalls.

Limitations:

  • probes must exist in the binary or runtime
  • fields are only as good as the provider contract
  • some environments strip or package binaries without probes
  • high-rate probes still need sampling and aggregation

OpenTelemetry Relationship

OpenTelemetry is an instrumentation framework and telemetry data model for traces, metrics, and logs. eBPF is a kernel mechanism for collecting data or enforcing policy at low-level hooks. They are complementary.

| Need | OpenTelemetry | eBPF |
| --- | --- | --- |
| business transaction trace | strong when app is instrumented | inferred and incomplete |
| kernel latency or syscall failures | weak unless app records it | strong at boundary |
| network flow attribution | app-level view | host and kernel view |
| low-level process/file evidence | usually absent | strong but sensitive |
| semantic labels | strong | must be derived |

The best production systems connect them carefully: eBPF fills blind spots, while OpenTelemetry provides request context. Avoid pretending eBPF can reconstruct encrypted application semantics or user intent without application cooperation.

Overhead Management

Overhead comes from attach frequency, per-event work, map operations, stack capture, user memory reads, string handling, event emission, user-space decoding, and downstream export.

| Control | Effect |
| --- | --- |
| early filters | reduce work before maps and events |
| aggregation in maps | reduce event volume |
| sampling | bound cost on hot paths |
| per-CPU counters | reduce contention |
| ring-buffer drop counters | reveal lost visibility |
| duration limits | prevent forgotten incident tracers |
| feature flags | roll back high-cost probes quickly |

Production rule: export the tracer's own health. At minimum track load failures, attach failures, map update failures, buffer drops, events processed, events exported, and CPU or memory use of the user-space agent.
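A tracer's self-reported health can be as small as a handful of counters and one derived ratio. The Python sketch below is illustrative only: the field names mirror the list above, and the 1% drop threshold is a hypothetical default rather than a recommended value.

```python
from dataclasses import dataclass

@dataclass
class TracerHealth:
    events_processed: int = 0
    events_exported: int = 0
    buffer_drops: int = 0
    map_update_failures: int = 0

    def drop_ratio(self):
        """Fraction of kernel-side events that never reached user space."""
        seen = self.events_processed + self.buffer_drops
        return self.buffer_drops / seen if seen else 0.0

    def is_trustworthy(self, max_drop_ratio=0.01):
        """Hypothetical gate: no map failures and drops under the threshold."""
        return self.map_update_failures == 0 and self.drop_ratio() <= max_drop_ratio

health = TracerHealth(events_processed=9_000, events_exported=9_000, buffer_drops=1_000)
print(f"drop ratio {health.drop_ratio():.1%}, trustworthy={health.is_trustworthy()}")
```

Reporting the ratio alongside results lets a reader judge how much of the event stream a conclusion actually rests on.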

Sampling and Cardinality

Sampling decides which events become detailed records. Cardinality decides how many unique keys maps and downstream systems must hold. Both are reliability controls.

High-cardinality keys:

  • full pathnames
  • full DNS names
  • PID plus timestamp plus command-line
  • complete stack traces
  • source and destination tuple at internet scale
  • Kubernetes pod UID plus container plus process plus request label

Prefer hierarchical aggregation: service, namespace, cgroup, executable, error code, latency bucket. Keep raw details for sampled exemplars or incident windows.
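Bounding cardinality and sampling exemplars can both be sketched in a few lines of Python. This is a minimal illustration under assumptions: `BoundedCounter`, the overflow key, and the 1-in-N rule are hypothetical names and policies, and a kernel-side equivalent would rely on a fixed-size BPF map instead.

```python
class BoundedCounter:
    """Counter that caps unique keys and folds the rest into one overflow key."""

    OVERFLOW = ("__other__",)

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.counts = {}

    def add(self, key):
        # New keys beyond the cap are counted under OVERFLOW, so the map holds
        # at most max_keys regular entries plus the overflow entry.
        if key not in self.counts and len(self.counts) >= self.max_keys:
            key = self.OVERFLOW
        self.counts[key] = self.counts.get(key, 0) + 1

def keep_exemplar(seq, every_n=100):
    """Deterministic 1-in-N sampling for detailed raw records."""
    return seq % every_n == 0

counter = BoundedCounter(max_keys=2)
for key in [("svc-a", "ENOENT"), ("svc-b", "EACCES"), ("svc-c", "EIO"),
            ("svc-a", "ENOENT"), ("svc-d", "EIO")]:
    counter.add(key)
print(counter.counts)
```

A growing overflow count is itself a useful signal: it shows the chosen key is too fine-grained for the configured budget.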

Privacy and Sensitive Data

eBPF observability can see data that application logging intentionally avoids: arguments, file names, DNS names, socket addresses, process command lines, sometimes buffers, and user memory. Production collection needs explicit data handling.

Guidance:

  • classify each captured field before rollout
  • avoid payload capture by default
  • hash or truncate sensitive names
  • redact command-line arguments unless allowlisted
  • separate local forensic scripts from fleet agents
  • restrict who can run ad hoc tracers
  • define retention for raw event streams

Root on a local lab is not a privacy model. Production hosts hold tenant data, credentials, and incident-sensitive artifacts.

Verifier Failure Troubleshooting

Verifier failures are normal development feedback.

| Error shape | Likely cause | Fix pattern |
| --- | --- | --- |
| invalid read from stack | stack slot not initialized | write before read, zero structs |
| invalid access to packet | missing bounds proof | check every header against data_end |
| R type mismatch | helper argument type wrong | follow helper prototype for program type |
| unbounded loop | max iteration cannot be proven | clamp loop count to a constant bound |
| map value may be NULL | map lookup not checked | branch after lookup before dereference |
| unreleased reference | socket or kptr reference not released | call release helper on all paths |
| program too large or complex | state explosion | simplify branches, split with tail calls |

Capture full verifier logs in CI for BPF programs. Compiler changes can alter bytecode shape enough to change verifier outcomes.

Missing BTF Troubleshooting

CO-RE and fentry/fexit depend on BTF availability. First check:

ls -l /sys/kernel/btf/vmlinux
sudo bpftool btf dump file /sys/kernel/btf/vmlinux format raw | head
sudo bpftool feature probe kernel | grep -i btf

If BTF is missing:

  • install the distribution kernel BTF or debug package if available
  • generate a BTF file only if your build process supports it and the kernel allows it
  • fall back to tracepoints, kprobes, or non-CO-RE builds where appropriate
  • treat vendor kernels and backports as separate targets

If BTF exists but relocation fails:

  • confirm the target type exists on that kernel
  • check field renames or layout differences
  • inspect compiled object BTF
  • verify the loader is using the expected object and target kernel
  • check architecture-specific differences

Kernel Compatibility

Kernel version is a weak proxy. Distribution kernels backport features, disable configs, or carry patches. A 5.15 enterprise kernel and an upstream 5.15 kernel may not expose the same practical BPF surface.

Compatibility matrix fields:

| Field | Why it matters |
| --- | --- |
| kernel release and distro | feature and backport baseline |
| architecture | JIT, ABI, register conventions, stack unwinding |
| BTF presence | CO-RE and fentry/fexit |
| helper and map support | program load success |
| lockdown and capabilities | permissions to load and attach |
| cgroup mode | workload attribution and cgroup hooks |
| container runtime | cgroup and namespace mapping |

Use bpftool feature probe in preflight checks. Do not rely only on uname -r.
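A few of the matrix fields can be gathered without loading any BPF program. The Python sketch below is a partial illustration, assuming the conventional BTF location `/sys/kernel/btf/vmlinux` and using the presence of `cgroup.controllers` at the cgroup root as a cgroup v2 indicator. The `preflight` function is hypothetical and does not replace `bpftool feature probe`, which covers helpers, map types, and program types.

```python
import os
import platform

def preflight(btf_path="/sys/kernel/btf/vmlinux",
              cgroup_v2_marker="/sys/fs/cgroup/cgroup.controllers"):
    """Collect a few compatibility-matrix fields from the local host."""
    return {
        "kernel_release": platform.release(),
        "architecture": platform.machine(),
        # CO-RE and fentry/fexit need kernel BTF to be exposed here.
        "btf_present": os.path.isfile(btf_path),
        # A unified cgroup v2 hierarchy exposes cgroup.controllers at its root.
        "cgroup_v2": os.path.exists(cgroup_v2_marker),
    }

def can_use_core(report):
    """CO-RE relocation needs target kernel BTF; other checks still apply."""
    return report["btf_present"]

report = preflight()
for field, value in report.items():
    print(f"{field}: {value}")
```

Recording this report next to each tracer run makes later "works on one distro only" investigations much shorter.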

CO-RE Portability Troubleshooting

CO-RE failures usually come from a mismatch between compiled expectations and target kernel types.

Runbook:

llvm-objdump -h program.bpf.o
bpftool btf dump file program.bpf.o format raw | head
bpftool btf dump file /sys/kernel/btf/vmlinux format c > /tmp/vmlinux.h
grep -n "struct task_struct" /tmp/vmlinux.h | head

Decision table:

| Symptom | Interpretation | Action |
| --- | --- | --- |
| no .BTF in object | object was not built with BTF | fix compile flags and target |
| target has no vmlinux BTF | host lacks kernel BTF | install BTF package or use fallback |
| field relocation fails | field absent or renamed | use CO-RE existence checks or version-specific fallback |
| program loads on one distro only | backport or config difference | expand feature matrix |
| fentry attach fails | function not traceable or BTF mismatch | use tracepoint or kprobe fallback |

Production Runbook

Before running a tracer:

  1. State the question in one sentence.
  2. Pick the narrowest stable attach point.
  3. Define filters and sampling.
  4. Define captured fields and privacy handling.
  5. Define duration and rollback.
  6. Watch tracer health while it runs.
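Step 5 (duration and rollback) can be enforced mechanically rather than by memory. The Python sketch below wraps an arbitrary tracer command in a hard time budget. `run_bounded` is a hypothetical helper, and it assumes the wrapped tool detaches its probes when it receives SIGTERM, which should be verified for the specific tool in use.

```python
import subprocess

def run_bounded(cmd, max_seconds, grace_seconds=5):
    """Run a tracer command and stop it when the time budget expires."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=max_seconds)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()                  # SIGTERM first so the tool can clean up
        try:
            output, _ = proc.communicate(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                   # escalate if SIGTERM is ignored
            output, _ = proc.communicate()
    return {"timed_out": timed_out, "returncode": proc.returncode, "output": output}
```

A call such as `run_bounded(["bpftrace", "-e", "<program>"], 30)` would cap a one-off tracer at 30 seconds, where the program text is a placeholder and the command needs the usual privileges.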

During collection:

sudo bpftool prog show
sudo bpftool map show
sudo bpftool link show
top -H -p "$(pgrep -d, -x your-agent)"
journalctl -u your-agent --since -10m

After collection:

  • detach programs or stop the agent
  • remove temporary pinned maps and links
  • record kernel version, tool version, attach points, filters, and known drops
  • separate confirmed evidence from inference

Common Mistakes

| Mistake | Result | Better practice |
| --- | --- | --- |
| tracing every syscall on every process | high overhead and noisy data | filter by service, cgroup, syscall, and time |
| printing every event | user-space drain becomes bottleneck | aggregate and sample |
| reading full arguments by default | secret leakage | allowlist fields and redact |
| using kprobes as stable APIs | breakage after kernel update | prefer tracepoints, fentry with BTF, or compatibility tests |
| ignoring dropped events | false confidence | export drop counters |
| assuming host paths equal container paths | wrong file conclusions | include mount namespace or cgroup context |
| treating eBPF as OpenTelemetry replacement | missing request semantics | combine kernel evidence with app instrumentation |

eBPF observability is strongest when it is a scalpel: narrow, bounded, and connected to a concrete hypothesis. It is weakest when used as a permanent firehose of everything the kernel can expose.