Purpose: Provide a study and practice path for mastering Linux systems engineering from user space fundamentals through kernel internals, containers, production operations, and eBPF.

Linux Systems Mastery Roadmap

This roadmap is ordered by operational dependency rather than academic purity. Learn the kernel boundary before eBPF, process state before incident response, memory accounting before container limits, and VFS behavior before debugging storage latency. The goal is to build a mental model that survives production ambiguity.

Phase 1: Build The Boundary Model

Start with two notes: 01 Linux Mental Model User Space Kernel and Hardware, and 06 System Calls ABI libc and User Kernel Boundaries.

You should be able to explain:

Why Linux is the kernel while a distribution supplies policy, packaging, service management, and defaults.
How firmware, bootloader, kernel, initramfs, kernel command line, and PID 1 participate in boot.
Why user space cannot directly access hardware, page tables, scheduler state, or most privileged CPU instructions.
How libc wrappers, system call numbers, registers, errno, vDSO, file descriptors, and stable ABI rules shape application behavior.
Why /proc, /sys, /dev, tmpfs, devtmpfs, and other virtual filesystems are APIs, not ordinary persistent storage.

Practice:

uname -a
cat /proc/cmdline
cat /proc/self/status
readlink /proc/self/exe
ls -l /proc/self/fd
strace -f -e trace=process,file,network true
systemctl status
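
To see why /proc behaves like an API rather than stored data, the following minimal sketch compares the size the filesystem reports with the bytes a read actually returns. It assumes a Linux host with GNU or BusyBox `stat`; the variable names are illustrative.

```shell
#!/bin/sh
# /proc files are generated by the kernel at read time, so the size that
# stat reports (typically 0) does not match the bytes a read returns.
file=/proc/self/status

reported_size=$(stat -c %s "$file")   # size from inode metadata
actual_bytes=$(wc -c < "$file")       # bytes produced by reading

echo "stat reports $reported_size bytes; read returned $actual_bytes bytes"
```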

Phase 2: Master Processes, Memory, Files, and Networks

Read:

The key shift is to stop treating command output as the system. ps, top, free, ip, ss, df, du, mount, and journalctl are views over kernel state and user space policy. When views disagree, find the object they are viewing: task, memory cgroup, page cache, inode, mount, socket, route, conntrack entry, unit, or namespace.
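
As one illustration of a view and the object behind it, `free` takes its figures from /proc/meminfo. A minimal sketch, assuming a kernel that exposes MemAvailable (Linux 3.14 or newer), reads the same counters directly; the variable names are illustrative.

```shell
#!/bin/sh
# Read the kernel's memory counters directly instead of through `free`.
# Values in /proc/meminfo are reported in kB.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_available_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

echo "MemTotal:     ${mem_total_kb} kB"
echo "MemAvailable: ${mem_available_kb} kB"
```

These are host-wide counters. When a tool and a container limit disagree, the object to inspect next is the memory cgroup, not another tool.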

Practice labs:

Lab	Commands	Learning target
Fork and exec	`strace -f bash -lc 'echo hi | wc -c'`	A pipeline is assembled from pipe, fork (clone), execve, and wait calls.
Zombie	Run a tiny parent that does not wait for a child.	Zombies are process table entries waiting for parent collection, not running tasks.
Page cache	`dd`, `sync`, `echo 3 > /proc/sys/vm/drop_caches` on a lab host only.	File IO and memory pressure interact through reclaim and cache.
Inode exhaustion	Create many tiny files on a disposable filesystem.	Disk free space and inode availability fail independently.
Network namespace	`ip netns add lab`, veth pair, bridge, route, `tcpdump`.	Containers are built from the same primitives.
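
The zombie lab can be sketched in plain shell, for a disposable lab host. This is one possible setup, not the only one: it assumes a `sleep` that accepts fractional seconds (GNU coreutils and most BusyBox builds do), and the variable names are illustrative.

```shell
#!/bin/sh
# The inner shell backgrounds a short-lived child, then replaces itself with
# `sleep`, which never calls wait(). The exited child stays a zombie until
# its parent exits.
sh -c 'sleep 0.1 & exec sleep 2' &
parent=$!
sleep 0.5   # give the child time to exit

zombie_state=""
for stat_file in /proc/[0-9]*/stat; do
    line=$(cat "$stat_file" 2>/dev/null) || continue
    rest=${line##*') '}    # drop "pid (comm) "; next fields are state, ppid
    set -- $rest
    if [ "$2" = "$parent" ]; then
        zombie_state=$1
        echo "child ${stat_file%/stat} state=$1 ppid=$2"
    fi
done

wait "$parent"   # after the parent exits, the zombie is reparented and reaped
```

A state of `Z` confirms the point in the table: the entry holds only an exit status and consumes no CPU.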

Production caution:

Do not drop caches, flush firewall rules, detach mounts, kill process groups, or change sysctls on production hosts without an explicit rollback path.
Prefer read-only inspection first: /proc, /sys, systemctl show, journalctl, ip -details, ss -tinp, perf stat, bpftool prog show.

Phase 3: Operate Services and Security Policy

Read 07 systemd Boot Init Units Timers Journald and Services, 08 Permissions Users Groups Capabilities and LSMs, and 12 Linux Security Hardening Secrets and Incident Response.

systemd is not merely a service launcher. It is the user space coordinator for dependency graph activation, cgroup placement, service supervision, logging integration, socket activation, timers, transient units, hardening directives, and resource controls. Security failures often appear as service failures because PID 1, PAM, sudo, capabilities, LSM policy, seccomp, mount permissions, and cgroup placement meet at unit start.

Checklist:

Can you explain Requires= vs After= without mixing requirement and ordering?
Can you find the effective unit after vendor file, administrator override, and runtime drop-ins?
Can you tell whether a denial came from Unix mode bits, capabilities, SELinux, AppArmor, Landlock, seccomp, read-only mounts, or missing devices?
Can you harden a service without breaking its needed filesystem, network, or capability access?
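
The first and last checklist items can be made concrete with a sketch that writes a hypothetical unit into a scratch directory for reading, not installation. The unit names `app.service` and `db.service`, the binary path, and the data directory are all assumptions made for illustration.

```shell
#!/bin/sh
# Write a hypothetical unit to a scratch directory; nothing is installed.
unit_dir=$(mktemp -d)
cat > "$unit_dir/app.service" <<'EOF'
[Unit]
Description=Hypothetical application
# Requirement: starting app.service also pulls in db.service.
Requires=db.service
# Ordering: if both are being started, db.service goes first.
After=db.service

[Service]
ExecStart=/usr/local/bin/app
# Hardening: each directive removes access, so add back only what is needed.
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ReadWritePaths=/var/lib/app
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
EOF

cat "$unit_dir/app.service"
```

On a lab host running systemd, `systemctl cat` and `systemctl show -p Requires -p After` on a real unit show the effective result after vendor files and drop-ins are merged.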

Phase 4: Understand Containers As Linux

Read 09 cgroups Namespaces Containers and Runtime Isolation.

Containers are processes with constrained views and constrained resources. Namespaces change which resources a process can see. Cgroups account for and limit resource use. Mounts assemble a root filesystem. Capabilities and LSMs reduce privilege. Seccomp filters system calls. Runtimes implement OCI specifications. Kubernetes schedules and configures those primitives through containerd or CRI-O, CNI, CSI, kubelet, and node agents.

Practice:

unshare --mount --uts --ipc --pid --fork --user --map-root-user bash
cat /proc/self/cgroup
systemd-run --scope -p MemoryMax=200M -p CPUQuota=50% stress-ng --vm 1 --vm-bytes 300M

Use local machines for destructive namespace and cgroup experiments. In production clusters, inspect the relationship between pod, container, cgroup, network namespace, and host processes before changing anything.
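
A read-only way to inspect that relationship is to print the cgroup membership and namespace identifiers of a process. The sketch below assumes a Linux /proc; `TARGET_PID` is a hypothetical environment variable introduced here, and it defaults to the current shell.

```shell
#!/bin/sh
# Show which cgroup and which namespaces a process belongs to.
# Two processes share a namespace when the bracketed inode numbers match.
pid=${TARGET_PID:-$$}

echo "cgroup membership for pid $pid:"
cat "/proc/$pid/cgroup"

echo "namespaces for pid $pid:"
for ns in /proc/"$pid"/ns/*; do
    printf '%-20s %s\n' "${ns##*/}" "$(readlink "$ns")"
done
```

Comparing the output for a host process and a containerized process shows which namespaces the container actually unshared. Reading another user's process this way usually requires root.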

Phase 5: Build Observability and Performance Discipline

Read 10 Observability Logs Metrics Tracing and Debugging, 11 Performance Engineering perf Flamegraphs and Capacity, and 17 Production Operations Troubleshooting and Runbooks.

Performance work is evidence discipline. First identify the constrained resource, then select the lowest overhead tool that can validate or falsify the hypothesis. Avoid starting with the most powerful tracer because it may require privileges, expose sensitive data, or change timing.

Layered workflow:

Ask what changed, what is affected, and what resource is saturated.
Check logs and counters for a fast boundary: journalctl, dmesg, service metrics, systemctl status.
Inspect live state: top, ps, pidstat, vmstat, iostat, ss, ip, ethtool, /proc/pressure/*.
Trace only a bounded target: strace -p, perf record -p, bpftrace one-liners, tcpdump with filters.
Capture enough evidence for rollback or escalation.
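
For the live-state step, pressure stall information gives a cheap first answer about which resource is contended. The sketch below assumes a kernel built with PSI (Linux 4.20 or newer) and falls back to a message otherwise; the helper name `parse_some_avg10` is illustrative.

```shell
#!/bin/sh
# Extract the 10-second "some" average from PSI-formatted input on stdin.
# A PSI line looks like: some avg10=0.00 avg60=0.00 avg300=0.00 total=0
parse_some_avg10() {
    sed -n 's/^some avg10=\([0-9.]*\) .*/\1/p'
}

for resource in cpu memory io; do
    psi_file=/proc/pressure/$resource
    if [ -r "$psi_file" ]; then
        echo "$resource some avg10: $(parse_some_avg10 < "$psi_file")%"
    else
        echo "$resource: PSI not available on this kernel"
    fi
done
```

The `some` value is the share of recent wall-clock time in which at least one task was stalled on that resource, which makes it a reasonable hint for where to point a tracer next.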

Phase 6: Learn Kernel Architecture Without Cargo Culting Patches

Read 13 Kernel Architecture Modules Drivers and Device Model.

You do not need to patch the kernel to operate Linux well, but you do need to know what the kernel is doing. Understand monolithic kernel design, modules, devices, VFS, network stack, memory manager, scheduler, RCU, locking, workqueues, softirqs, kernel threads, panics, oops reports, taint flags, config options, and build boundaries.

Production guidance:

Prefer upstream or distribution-supported kernels for production.
Treat out-of-tree modules as operational risk: ABI mismatch, taint, crash risk, and upgrade constraints.
Build custom kernels on learning machines or dedicated lab hosts unless there is a business case and rollback plan.
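
Taint is easy to check before trusting a crash report. The sketch below decodes a few bits of /proc/sys/kernel/tainted; it covers only a subset of the documented flags, and the function name `decode_taint` is illustrative.

```shell
#!/bin/sh
# Decode selected bits of the kernel taint mask (subset only).
decode_taint() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    if [ $(( value & 1 )) -ne 0 ];    then echo "P: proprietary module loaded"; fi
    if [ $(( value & 128 )) -ne 0 ];  then echo "D: kernel oops or BUG occurred"; fi
    if [ $(( value & 512 )) -ne 0 ];  then echo "W: kernel warning issued"; fi
    if [ $(( value & 4096 )) -ne 0 ]; then echo "O: out-of-tree module loaded"; fi
    if [ $(( value & 8192 )) -ne 0 ]; then echo "E: unsigned module loaded"; fi
    return 0
}

current=$(cat /proc/sys/kernel/tainted 2>/dev/null || echo 0)
echo "taint mask: $current"
decode_taint "$current"
```

A nonzero mask does not mean the host is broken, but it does change how much upstream or vendor support to expect for a kernel bug report.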

Phase 7: Use eBPF As Constrained Kernel Extension