Compendium

Software Engineering

Staff and Principal-level notes on architecture, distributed systems, reliability, security, delivery, leadership, and AI-native engineering.

Notes: 16
Reading: 382 min
References: 197
Diagrams: 94

[INDEX]

Study map

This is the canonical entry point for the Software Engineering knowledge base. Use it to move from broad system judgment to focused topic notes without losing the whole-system context. The goal is Staff and Principal engineering depth: concept mastery, design judgment, operational correctness, verification discipline, and the ability to change complex systems without creating hidden risk.

This note is a map, not a textbook. Leaf notes own depth, proofs, examples, checklists, code, and operational playbooks. This index owns routing, coverage, study order, and the relationships between domains.

How to use this index

Use this page in four modes:

Mode	Use when	Start here	What good output looks like
Orientation	You need the shape of the field before diving deeper.	00 Staff Principal Software Engineering and the #Domain map.	You can explain how correctness, reliability, security, performance, and execution fit together.
Design review	You are evaluating an architecture, RFC, migration, incident fix, or platform change.	#Staff and Principal standard and #System review lens.	You identify invariants, failure modes, tradeoffs, verification gates, and owner boundaries.
Topic study	You need to master a specific area such as consensus, memory ordering, TLA+, queues, or release safety.	#Required topic coverage matrix.	You know the primary note, adjacent notes, and the question the topic helps answer.
Execution planning	You need to sequence learning or project work at senior depth.	#Staff and Principal study path.	You have a staged path from fundamentals to cross-org technical leadership.

Navigation rules:

Start with the numbered notes when you need breadth. They are dense MOCs for each domain.
Jump to existing vault anchors when they already own a topic, especially Data Structures/Data Structures, Design Patterns/Design patterns, Event-Driven Architectures and Event Sourcing, Software testing, Software Supply Chain Security, kubernetes/Kubernetes, AI-Enhanced Software Development, Indexing Large Codebases for AI-Assisted Development, Context-Aware Systems and MCP Protocols, and LLMOps and Model Deployment.
Prefer the domain note before creating a new leaf note. If a topic only needs a routing sentence, keep it in the MOC. If it needs examples, proofs, diagrams, incident stories, or implementation detail, split it into an atomic note.
Read laterally. Most important software engineering problems cross boundaries: a queueing issue can be a product SLO issue, a database isolation issue can be a security issue, and a deployment strategy can be an organizational design issue.
Treat every note as a tool for decisions. Ask: what decision does this help me make, what invariant does it protect, and what evidence would show it is working?

Core map

00 Staff Principal Software Engineering: Staff and Principal mental model, execution loop, system-property thinking, and review prompts.
01 Engineering Fundamentals: programming models, concurrency, memory ordering, cache coherency, nonblocking algorithms, liveness, and mutability.
02 Architecture and Design: code architecture, boundaries, state machines, architecture governance, ADRs, and design review.
03 Data Structures Algorithms and Complexity: complexity analysis, storage-oriented structures, concurrent data structures, and algorithmic patterns.
04 Databases Storage and Transactions: storage engines, indexes, isolation, transactions, replication, distributed databases, and migrations.
05 Distributed Systems: consistency, time, CAP, PACELC, consensus, replicated state machines, quorums, retries, networking, and failure patterns.
06 Caching Queues and Streaming: caching, invalidation, queueing theory, delivery semantics, retry design, streaming, and Kafka.
07 APIs Contracts and Integration: API contracts, idempotency, event contracts, schema evolution, and integration risk.
08 Reliability Observability and Operations: failure modes, observability, alerts, incidents, control planes, and network operations.
09 Security and Supply Chain: threat modeling, access boundaries, secrets, supply chain controls, application security, and security review.
10 Testing Verification and Quality Bars: test layers, formal methods, TLA+, concurrency testing, quality bars, and review checklists.
11 Performance Capacity and Cost: latency, throughput, CPU, memory, contention, capacity planning, load testing, and cost engineering.
12 Delivery Migrations and Release Engineering: release strategies, migration safety, rollback, CI/CD, and GitOps.
13 Technical Leadership and Execution: Conway's Law, strategy, leverage, review systems, decisions, and operating rhythm.
14 AI Native Software Engineering: AI-assisted development, agentic systems, retrieval, context engineering, LLMOps, and model deployment.

Domain map

Domain	Primary note	Core question	Typical artifacts
Engineering judgment	00 Staff Principal Software Engineering	What system property are we changing, and who owns the risk?	Review prompts, decision frames, escalation criteria, learning plans.
Programming foundations	01 Engineering Fundamentals	What behavior does the code have under concurrency, memory effects, mutation, and failure?	Invariants, concurrency contracts, liveness analysis, correctness notes.
Architecture	02 Architecture and Design	What boundaries, states, and dependencies make change safer or more expensive?	ADRs, context diagrams, state machines, interface contracts.
Algorithms and structures	03 Data Structures Algorithms and Complexity	What complexity, access pattern, and data shape does the system rely on?	Complexity budgets, data-structure choices, proof sketches, benchmark plans.
Data systems	04 Databases Storage and Transactions	What guarantees does persistent state actually provide?	Isolation analysis, migration plans, schema evolution rules, backup and restore criteria.
Distributed systems	05 Distributed Systems	What happens when time, networks, membership, and partial failure become unreliable?	Consistency choices, quorum design, consensus notes, retry policies.
Flow control	06 Caching Queues and Streaming	How do we absorb load, hide latency, and move work without violating correctness?	Cache policy, invalidation model, queue topology, stream processing contract.
Integration	07 APIs Contracts and Integration	What do producers and consumers depend on, and how does that contract evolve?	API specs, event schemas, compatibility rules, idempotency keys.
Reliability	08 Reliability Observability and Operations	How does the system fail, and how do humans detect the failure and recover from it?	SLOs, dashboards, alerts, runbooks, incident reviews.
Security	09 Security and Supply Chain	What trust boundary can be crossed, and what prevents abuse or compromise?	Threat models, permission matrices, secret handling rules, supply chain attestations.
Verification	10 Testing Verification and Quality Bars	What evidence is strong enough to trust this behavior?	Test strategy, model checks, fuzz tests, quality gates, review checklists.
Performance and cost	11 Performance Capacity and Cost	Where are the bottlenecks, limits, and economic tradeoffs?	Capacity models, latency budgets, load tests, cost envelopes.
Delivery	12 Delivery Migrations and Release Engineering	How do we ship, migrate, and recover without depending on luck?	Release plans, rollback plans, migration scripts, progressive delivery controls.
Leadership	13 Technical Leadership and Execution	How do people, ownership, and incentives shape the technical system?	Strategy docs, decision logs, team topology, operating rhythms.
AI-native engineering	14 AI Native Software Engineering	How do AI tools and systems change build, test, operate, and govern loops?	Prompt protocols, evals, retrieval design, agent boundaries, LLMOps controls.

Knowledge graph

(Diagram: knowledge graph.)

System review lens

Use this lens for architecture reviews, incident follow-ups, migration plans, and production readiness checks.

Lens	Question	Evidence to look for	Related notes
Correctness	What invariant must remain true under retries, concurrency, partial failure, deploys, and repair jobs?	Explicit invariants, idempotency keys, transaction boundaries, model checks, race tests.	01 Engineering Fundamentals, 04 Databases Storage and Transactions, 10 Testing Verification and Quality Bars
Reliability	What does the user observe when dependencies fail, slow down, enter split brain, or return stale data?	SLOs, graceful degradation, retry budgets, circuit breakers, alert quality, runbooks.	05 Distributed Systems, 08 Reliability Observability and Operations, 11 Performance Capacity and Cost
Operability	Can humans detect, understand, mitigate, and repair bad behavior quickly?	Dashboards, structured logs, traces, safe admin actions, incident playbooks, rollback commands.	08 Reliability Observability and Operations, 12 Delivery Migrations and Release Engineering
Evolvability	Can the design absorb new requirements without hidden coupling or data breakage?	Stable contracts, compatibility tests, bounded contexts, ADRs, schema evolution policy.	02 Architecture and Design, 07 APIs Contracts and Integration, 13 Technical Leadership and Execution
Security	What trust boundary exists, and what prevents privilege escalation, data exposure, or supply chain compromise?	Threat model, least privilege, secret controls, dependency provenance, abuse cases.	09 Security and Supply Chain, Software Supply Chain Security
Performance	What happens at peak, during contention, and when the hot path shifts?	Latency budgets, throughput targets, load tests, profiles, queue depth, capacity model.	06 Caching Queues and Streaming, 11 Performance Capacity and Cost, Littles law and efficient queue strategy
Delivery safety	How is the change deployed, observed, rolled back, and cleaned up?	Progressive rollout, migration phases, rollback plan, release gates, ownership.	12 Delivery Migrations and Release Engineering, 10 Testing Verification and Quality Bars

Example review prompt:

> If this change is retried, deployed halfway, processed out of order, observed through stale caches, or rolled back after partial writes, what invariant still holds?
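The prompt becomes easier to apply when the invariant is written as an executable check. The following is a minimal sketch, assuming a hypothetical in-memory `Ledger` whose requests carry an idempotency key. The class, method names, and keys are illustrative and do not come from any leaf note.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical in-memory ledger, used only to illustrate the review prompt."""

    balance: int = 0
    # Maps idempotency key -> amount that was applied for that key.
    applied: dict = field(default_factory=dict)

    def credit(self, key: str, amount: int) -> int:
        # A retried request carries the same key, so its side effect
        # is applied at most once. The current balance is returned either way.
        if key in self.applied:
            if self.applied[key] != amount:
                raise ValueError("idempotency key reused with a different payload")
            return self.balance
        self.applied[key] = amount
        self.balance += amount
        return self.balance

    def invariant_holds(self) -> bool:
        # Invariant: the balance equals the sum of distinct applied credits.
        return self.balance == sum(self.applied.values())


ledger = Ledger()
ledger.credit("req-1", 50)
ledger.credit("req-1", 50)  # retry of the same request, no second credit
ledger.credit("req-2", 25)
```

A real service would persist the key and the side effect in the same transaction. The sketch covers only the retry case from the prompt, not partial writes or out-of-order processing.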

Required topic coverage matrix

This matrix is the minimum advanced-topic routing table for this knowledge base. "Mastery signal" names the behavior that shows the topic is not just memorized.

Topic	Primary location	Related locations	Mastery signal
Consistency	05 Distributed Systems#Consistency models	04 Databases Storage and Transactions#Isolation and correctness, 07 APIs Contracts and Integration#Event contracts	You can choose and defend a consistency model for a user-visible workflow.
Linearizability	05 Distributed Systems#Consistency models	10 Testing Verification and Quality Bars#Formal methods and model checking	You can distinguish real-time ordering from serializability and test the claim.
Serializability	04 Databases Storage and Transactions#Isolation and correctness	04 Databases Storage and Transactions#Transactions	You can explain anomalies and pick isolation levels based on invariants.
Eventual consistency	05 Distributed Systems#Consistency models	06 Caching Queues and Streaming#Message delivery semantics, 07 APIs Contracts and Integration#Event contracts	You can design reconciliation, read-your-writes expectations, and conflict handling.
Mutex	01 Engineering Fundamentals#Concurrency primitives	11 Performance Capacity and Cost#Contention	You can explain mutual exclusion, convoying, priority inversion, and lock scope.
Semaphores	01 Engineering Fundamentals#Concurrency primitives	06 Caching Queues and Streaming#Queueing fundamentals, 11 Performance Capacity and Cost#Contention	You can use permits to bound concurrency without hiding overload.
Condition variables	01 Engineering Fundamentals#Concurrency primitives	10 Testing Verification and Quality Bars#Concurrency testing	You can reason about wait predicates, missed wakeups, and spurious wakeups.
Memory ordering	01 Engineering Fundamentals#Memory models and ordering	11 Performance Capacity and Cost#CPU and memory performance	You can explain acquire, release, fences, and why data races invalidate reasoning.
Atomic operations	01 Engineering Fundamentals#Memory models and ordering	01 Engineering Fundamentals#Nonblocking algorithms	You can use CAS or fetch-add while preserving a clear invariant.
Lock free programming, also written lock-free programming	01 Engineering Fundamentals#Lock-free and wait-free programming	11 Performance Capacity and Cost#Lock-free and wait-free tradeoffs	You can separate progress guarantees from raw speed and identify ABA risks.
Wait-free algorithms	01 Engineering Fundamentals#Nonblocking algorithms	03 Data Structures Algorithms and Complexity#Concurrent data structures	You can explain bounded per-thread progress and when the complexity is justified.
Deadlocks	01 Engineering Fundamentals#Liveness failures	10 Testing Verification and Quality Bars#Concurrency testing	You can identify circular wait and remove it through ordering, timeouts, or ownership changes.
Livelocks	01 Engineering Fundamentals#Liveness failures	08 Reliability Observability and Operations#Failure modes	You can detect systems doing work without progress and add backoff or coordination.
Starvation	01 Engineering Fundamentals#Liveness failures	11 Performance Capacity and Cost#Contention	You can identify unfair scheduling and design bounded waiting.
Cache coherency	01 Engineering Fundamentals#Cache coherency	11 Performance Capacity and Cost#CPU and memory performance	You can connect false sharing, cache lines, and memory visibility to latency.
Algorithmic complexity	03 Data Structures Algorithms and Complexity#Complexity as an engineering tool	11 Performance Capacity and Cost#Capacity planning	You can turn asymptotic complexity into an operational capacity limit.
Storage structures	03 Data Structures Algorithms and Complexity#Storage oriented structures	04 Databases Storage and Transactions#Storage engine mental model	You can choose between B-trees, LSM trees, hash indexes, and log structures by workload.
Advanced databases	04 Databases Storage and Transactions#Advanced databases	03 Data Structures Algorithms and Complexity#Storage oriented structures	You can explain storage, indexing, transactions, replication, and recovery as one system.
MVCC	04 Databases Storage and Transactions#Isolation and correctness	04 Databases Storage and Transactions#Transactions	You can explain snapshot visibility, write skew, vacuum pressure, and anomaly boundaries.
Replication	05 Distributed Systems#Replication	04 Databases Storage and Transactions#Replication and storage correctness	You can choose sync, async, leader, follower, and conflict models based on loss tolerance.
Quorum	05 Distributed Systems#Quorums	04 Databases Storage and Transactions#Distributed databases	You can reason about read and write quorums, failure tolerance, and stale reads.
CAP theorem	05 Distributed Systems#CAP PACELC and failure tradeoffs	04 Databases Storage and Transactions#Distributed databases	You can apply CAP only under partition and avoid using it as a vague slogan.
PACELC	05 Distributed Systems#CAP PACELC and failure tradeoffs	11 Performance Capacity and Cost#Latency and throughput	You can connect normal-case latency choices to failure-case consistency choices.
Consensus algorithms	05 Distributed Systems#Consensus algorithms	08 Reliability Observability and Operations#Control planes	You can explain leader election, log replication, commit, membership, and split brain.
Replicated state machines	05 Distributed Systems#Replicated state machines	02 Architecture and Design#State machines	You can model commands, deterministic application, replay, and recovery.
The clock problem	05 Distributed Systems#Time clocks and ordering	10 Testing Verification and Quality Bars#Formal methods and model checking	You can explain wall clocks, monotonic clocks, logical clocks, and clock skew failure modes.
Idempotency	07 APIs Contracts and Integration#Idempotent APIs	05 Distributed Systems#Idempotency and retries, 12 Delivery Migrations and Release Engineering#Migration safety	You can design duplicate-safe side effects across APIs, jobs, queues, and deploys.
Advanced networking	05 Distributed Systems#Advanced networking	08 Reliability Observability and Operations#Network operations, kubernetes/Kubernetes	You can debug latency, loss, DNS, load balancing, connection pools, and service discovery.
Caching	06 Caching Queues and Streaming#Caching patterns	06 Caching Queues and Streaming#Cache invalidation, 11 Performance Capacity and Cost#Latency and throughput	You can state what is cached, why it is safe, how it expires, and how it is invalidated.
Queueing theory	06 Caching Queues and Streaming#Queueing fundamentals	Littles law and efficient queue strategy, 11 Performance Capacity and Cost#Capacity planning	You can use Little's Law to connect arrival rate, service time, queue depth, and latency.
Streaming	06 Caching Queues and Streaming#Streaming systems	Event-Driven Architectures and Event Sourcing, 07 APIs Contracts and Integration#Event contracts	You can reason about ordering, replay, partitions, offsets, schema evolution, and poison records.
API contracts	07 APIs Contracts and Integration#Contract types	07 APIs Contracts and Integration#Schema evolution	You can evolve producers and consumers independently without breaking compatibility.
Observability	08 Reliability Observability and Operations#Observability pillars	08 Reliability Observability and Operations#Alert quality	You can design telemetry around user-visible symptoms and debuggable causes.
Incident response	08 Reliability Observability and Operations#Incident response	13 Technical Leadership and Execution#Leadership operating rhythm	You can coordinate mitigation, communicate clearly, and produce useful learning.
Threat modeling	09 Security and Supply Chain#Threat modeling	09 Security and Supply Chain#Access boundaries	You can name assets, actors, trust boundaries, abuse cases, and mitigations.
Supply chain security	09 Security and Supply Chain#Supply chain controls	Software Supply Chain Security	You can defend dependency provenance, builds, artifacts, secrets, and CI/CD boundaries.
TLA+	10 Testing Verification and Quality Bars#Formal methods and model checking	05 Distributed Systems#Consensus algorithms, 02 Architecture and Design#State machines	You can model a small state machine and find a counterexample before code exists.
Concurrency testing	10 Testing Verification and Quality Bars#Concurrency testing	01 Engineering Fundamentals#Liveness failures	You can combine stress, schedule control, race detection, and invariant checks.
Performance profiling	11 Performance Capacity and Cost#CPU and memory performance	11 Performance Capacity and Cost#Load testing	You can distinguish CPU, IO, allocation, lock contention, and queueing bottlenecks.
Capacity planning	11 Performance Capacity and Cost#Capacity planning	06 Caching Queues and Streaming#Queueing fundamentals	You can forecast load, saturation, headroom, and degradation behavior.
Migration safety	12 Delivery Migrations and Release Engineering#Migration safety	04 Databases Storage and Transactions#Data migration playbook	You can split schema, code, and data changes into reversible phases.
Rollback and rollforward	12 Delivery Migrations and Release Engineering#Rollback and rollforward	08 Reliability Observability and Operations#Incident response	You can choose when to revert, when to roll forward, and what data cleanup is needed.
GitOps	12 Delivery Migrations and Release Engineering#GitOps operating model	kubernetes/Kubernetes, 08 Reliability Observability and Operations#Control planes	You can keep desired state, live state, drift, and reconciliation clear.
Conway's Law	13 Technical Leadership and Execution#Conways Law	02 Architecture and Design#Architecture and organization	You can connect team topology to module boundaries, ownership, and communication cost.
Technical strategy	13 Technical Leadership and Execution#Technical strategy	00 Staff Principal Software Engineering#The execution loop	You can turn ambiguous technical direction into sequenced bets and decision points.
AI-assisted development	14 AI Native Software Engineering#AI assisted development quality bar	AI-Enhanced Software Development, 10 Testing Verification and Quality Bars#Quality bars	You can raise throughput without lowering review, evidence, or ownership standards.
Retrieval and context	14 AI Native Software Engineering#Retrieval and context	Indexing Large Codebases for AI-Assisted Development, Context-Aware Systems and MCP Protocols	You can design context systems with freshness, relevance, permissions, and auditability.
LLMOps	14 AI Native Software Engineering#LLMOps	LLMOps and Model Deployment, 08 Reliability Observability and Operations#Observability pillars	You can evaluate, deploy, monitor, and roll back model behavior with production discipline.

Cross-domain trails

Use these trails when the question is broader than one note.

Question	Trail
How do I design a reliable stateful service?	02 Architecture and Design#State machines -> 04 Databases Storage and Transactions#Transactions -> 05 Distributed Systems#Replication -> 08 Reliability Observability and Operations#Observability pillars -> 12 Delivery Migrations and Release Engineering#Migration safety
How do I make retries safe?	07 APIs Contracts and Integration#Idempotent APIs -> 05 Distributed Systems#Idempotency and retries -> 06 Caching Queues and Streaming#Poison messages and retries -> 10 Testing Verification and Quality Bars#Quality bars
How do I debug production latency?	11 Performance Capacity and Cost#Latency and throughput -> 06 Caching Queues and Streaming#Queueing fundamentals -> 08 Reliability Observability and Operations#Observability pillars -> 05 Distributed Systems#Advanced networking
How do I ship a risky data change?	04 Databases Storage and Transactions#Data migration playbook -> 12 Delivery Migrations and Release Engineering#Migration safety -> 10 Testing Verification and Quality Bars#Quality bars -> 08 Reliability Observability and Operations#Incident response
How do I evaluate a platform architecture?	00 Staff Principal Software Engineering#System property checklist -> 02 Architecture and Design#Boundaries -> 13 Technical Leadership and Execution#Conways Law -> 09 Security and Supply Chain#Threat modeling
How do I design event-driven systems?	Event-Driven Architectures and Event Sourcing -> 06 Caching Queues and Streaming#Streaming systems -> 07 APIs Contracts and Integration#Event contracts -> 05 Distributed Systems#Time clocks and ordering
How do I review AI-native engineering work?	14 AI Native Software Engineering#AI assisted development quality bar -> Indexing Large Codebases for AI-Assisted Development -> Context-Aware Systems and MCP Protocols -> 10 Testing Verification and Quality Bars#Quality bars

Staff and Principal standard

A senior engineer can implement a feature. A Staff or Principal engineer can reason about the system property the feature changes.

Correctness: the invariant remains true under retries, concurrency, partial failure, deploys, backfills, and repair jobs.
Reliability: users get predictable behavior when dependencies fail, slow down, enter split brain, or return stale data.
Operability: humans can detect, understand, mitigate, and repair production behavior with bounded confusion.
Evolvability: future product changes do not require unplanned rewrites or unsafe coupling across ownership lines.
Simplicity: the design minimizes concepts, states, owners, and failure modes while still meeting the real requirement.
Verification: tests, simulations, model checks, reviews, and runtime signals are strong enough for the blast radius.
Leadership: the decision improves the technical system and the human system that owns it.

Staff and Principal depth is visible in the questions asked before implementation:

What is the smallest durable invariant?
What is the largest plausible blast radius?
What state can become inconsistent, orphaned, duplicated, stale, or unowned?
What happens if the operation runs twice, runs halfway, runs out of order, or runs during deploy?
Which dependencies are trusted, which are only best effort, and which must fail closed?
What must be observable before rollout, during rollout, after rollback, and after cleanup?
What future change would this design make easier, and what future change would it make harder?

Staff and Principal study path

This path is ordered by dependency, not by difficulty. Move forward when you can use the topic in a real design review.

(Diagram: study path.)

1. Foundations: reason about local correctness

Read:

01 Engineering Fundamentals
03 Data Structures Algorithms and Complexity

Practice:

Explain mutexes, semaphores, atomics, memory ordering, and liveness failures without relying on framework behavior.
Turn a complex function into explicit invariants, state transitions, and failure cases.
Estimate the operational impact of an algorithmic choice under realistic load.
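As one possible exercise for the first practice item, the sketch below shows a wait predicate checked in a loop around a condition variable. `BoundedBuffer` and `run_demo` are hypothetical names for illustration, and the code is a teaching sketch rather than a production queue.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Hypothetical bounded FIFO buffer guarded by one condition variable."""

    def __init__(self, capacity: int) -> None:
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def put(self, item: int) -> None:
        with self._cond:
            # Re-check the predicate in a loop: a wakeup only means the
            # state may have changed, not that there is room now.
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()

    def get(self) -> int:
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item


def run_demo(n: int = 100) -> list:
    """Move n integers through a small buffer with one producer and one consumer."""
    buf = BoundedBuffer(capacity=4)
    out = []
    producer = threading.Thread(target=lambda: [buf.put(i) for i in range(n)])
    consumer = threading.Thread(target=lambda: out.extend(buf.get() for _ in range(n)))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out
```

Replacing either `while` with `if` is a useful way to practice explaining why the predicate must be re-checked after every wakeup.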

Exit standard:

You can identify when a bug is caused by mutation, concurrency, aliasing, hidden state, or complexity growth.

2. Architecture: design boundaries that survive change

Read:

02 Architecture and Design
07 APIs Contracts and Integration
Event-Driven Architectures and Event Sourcing

Practice:

Draw module boundaries and name the contracts between them.
Model a workflow as a state machine before choosing tables, queues, or APIs.
Write an ADR that states the rejected alternatives and the operating consequences.
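For the state-machine practice item, a minimal sketch might look like the following. The order workflow, its states, and its event names are assumptions chosen for illustration, not a model taken from the architecture note.

```python
from enum import Enum


class OrderState(Enum):
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# Allowed transitions keyed by (current state, event).
# Anything not listed here is rejected.
TRANSITIONS = {
    (OrderState.CREATED, "pay"): OrderState.PAID,
    (OrderState.CREATED, "cancel"): OrderState.CANCELLED,
    (OrderState.PAID, "ship"): OrderState.SHIPPED,
    (OrderState.PAID, "cancel"): OrderState.CANCELLED,
}


def apply(state: OrderState, event: str) -> OrderState:
    """Return the next state, or raise if the transition is not modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state.name}") from None


def terminal_states() -> set:
    """States with no outgoing transition in the table."""
    sources = {state for (state, _event) in TRANSITIONS}
    return set(OrderState) - sources
```

Writing the table first makes the open questions visible before tables, queues, or APIs are chosen, for example whether a shipped order can ever be cancelled.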

Exit standard:

You can explain how a design changes failure modes, team ownership, migration paths, and future options.

3. Data and distributed systems: handle partial failure honestly

Read:

04 Databases Storage and Transactions
05 Distributed Systems
06 Caching Queues and Streaming
Littles law and efficient queue strategy

Practice:

Compare isolation levels through concrete anomaly examples.
Design idempotent APIs and queue consumers that tolerate duplicate delivery.
Explain when to use cache invalidation, leases, quorums, logical clocks, or consensus.
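For the quorum part of the last practice item, the sketch below checks the usual overlap condition R + W > N against brute-force enumeration. The function names are hypothetical. The model assumes strict quorums over a fixed set of N replicas and ignores sloppy quorums and membership changes.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Arithmetic rule: every read quorum meets every write quorum iff r + w > n."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Check the same property by listing every possible pair of quorums."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


def rule_matches_enumeration(max_n: int = 5) -> bool:
    """Compare the rule with enumeration for every small configuration."""
    return all(
        quorums_overlap(n, r, w) == overlap_by_enumeration(n, r, w)
        for n in range(1, max_n + 1)
        for r in range(1, n + 1)
        for w in range(1, n + 1)
    )
```

Overlap is one ingredient, not a complete consistency guarantee: concurrent writes and replica lag can still produce stale or conflicting reads, which is where the related notes on consistency models apply.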

Exit standard:

You can make correctness claims under stale reads, retries, partitions, clock skew, replica lag, and reprocessing.

4. Reliability, security, and verification: prove enough before trust

Read:

08 Reliability Observability and Operations
09 Security and Supply Chain
10 Testing Verification and Quality Bars
Software testing
Software Supply Chain Security

Practice:

Build a test strategy that matches blast radius rather than code volume.
Write a small TLA+ model or state-machine model for a workflow with concurrency or retries.
Create a threat model that covers assets, actors, trust boundaries, abuse paths, and controls.
Design alerts around user-visible symptoms and actionable causes.
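For the state-machine model practice item, the sketch below is a Python stand-in for exhaustive state exploration, not TLA+. It enumerates every interleaving of two hypothetical workers that each read a shared counter and then write back the value plus one, and it reports the schedules that violate the invariant "final counter equals the number of workers". All names are illustrative.

```python
def schedules(remaining: tuple):
    """Yield every interleaving as a tuple of worker ids.

    remaining[w] is the number of steps worker w still has to run.
    Per-worker step order is preserved by construction.
    """
    if not any(remaining):
        yield ()
        return
    for worker, left in enumerate(remaining):
        if left:
            rest = remaining[:worker] + (left - 1,) + remaining[worker + 1:]
            for tail in schedules(rest):
                yield (worker,) + tail


def run(schedule: tuple) -> int:
    """Execute one interleaving of non-atomic increments and return the counter."""
    shared = 0
    local = {}
    for worker in schedule:
        if worker not in local:
            local[worker] = shared        # step 1: read the shared value
        else:
            shared = local[worker] + 1    # step 2: write back read value + 1
    return shared


def find_counterexamples(workers: int = 2) -> list:
    """Return every schedule where an increment is lost."""
    return [s for s in schedules((2,) * workers) if run(s) != workers]
```

Each returned schedule is a concrete counterexample trace, which is the same kind of evidence a model checker produces before any production code exists.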

Exit standard:

You can say what evidence is sufficient, what evidence is missing, and what residual risk remains.

5. Performance, capacity, and delivery: ship under real constraints

Read:

11 Performance Capacity and Cost
12 Delivery Migrations and Release Engineering
kubernetes/Kubernetes
kubernetes/One-Day Kubernetes Crash Course

Practice:

Build a capacity model using arrival rate, service time, utilization, and queue depth.
Profile before optimizing and separate CPU, IO, allocation, lock, and network bottlenecks.
Plan a reversible database migration with expand, migrate, contract phases.
Define rollout, rollback, observability, and cleanup gates for a production change.
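For the capacity-model practice item, a first-pass calculation could look like the sketch below. It assumes an M/M/1 queue (Poisson arrivals, exponential service times, one server), which is a deliberately simple model, and the function names and example numbers are hypothetical.

```python
def utilization(arrival_rate: float, service_time: float, servers: int = 1) -> float:
    """Fraction of capacity in use: arrival rate x service time / servers."""
    return arrival_rate * service_time / servers


def mm1_time_in_system(arrival_rate: float, service_time: float) -> float:
    """Mean time in system (queueing plus service) under the M/M/1 assumption."""
    rho = utilization(arrival_rate, service_time)
    if rho >= 1:
        raise ValueError("unstable: utilization is at or above 1")
    return service_time / (1 - rho)


def in_flight(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: mean items in the system = arrival rate x mean time in system."""
    return arrival_rate * time_in_system


# Example numbers are assumptions: 80 requests/s with a 10 ms mean service time.
rho = utilization(80.0, 0.010)
latency = mm1_time_in_system(80.0, 0.010)
depth = in_flight(80.0, latency)
```

With these inputs the model gives roughly 80% utilization, 50 ms mean time in system, and about 4 requests in flight. Re-running it at 95 requests/s shows how quickly latency grows as utilization approaches 1.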

Exit standard:

You can ship changes with measurable safety rather than optimism.

6. Technical leadership: scale judgment through people and systems

Read:

13 Technical Leadership and Execution

Practice:

Translate ambiguous business pressure into technical strategy and decision points.
Use Conway's Law to reason about ownership, interfaces, and review boundaries.
Run reviews that improve the system without turning every concern into a blocker.

Exit standard:

You can make high-leverage technical decisions legible to engineers, managers, security, operations, and product leaders.

7. AI-native engineering: use AI with production discipline

Read:

14 AI Native Software Engineering
AI-Enhanced Software Development
Indexing Large Codebases for AI-Assisted Development
Context-Aware Systems and MCP Protocols
LLMOps and Model Deployment

Practice:

Define evals before using an AI behavior in a critical workflow.
Treat retrieval and context as governed systems with freshness, relevance, permissions, and auditability.
Review generated code by invariants, tests, threat model, and operational behavior, not by surface plausibility.

Exit standard:

You can use AI to improve throughput while preserving evidence, ownership, and production accountability.

Existing vault anchors

Use these notes as established source nodes instead of duplicating depth in this MOC:

Data Structures/Data Structures
Design Patterns/Design patterns
Event-Driven Architectures and Event Sourcing
Software Engineering glossary
Software testing
Software Supply Chain Security
SWE Review topics
Littles law and efficient queue strategy
kubernetes/Kubernetes
kubernetes/One-Day Kubernetes Crash Course
AI-Enhanced Software Development
Indexing Large Codebases for AI-Assisted Development
Context-Aware Systems and MCP Protocols
LLMOps and Model Deployment

Maintenance rules

Keep this note canonical: every major Software Engineering domain should be reachable from here in one hop.
Keep leaf depth out of this file unless the example improves routing or decision quality.
Preserve wikilinks when renaming or splitting notes.
Add new topics to the coverage matrix only when they are required for Staff or Principal judgment.
Prefer domain notes for intermediate routing and atomic notes for deep worked examples.

[NOTES]

Ordered notes

No. 1, 20 min, 2 diagrams

Staff Principal Software Engineering

This note defines the operating model for senior individual contributor engineering at Staff and Principal scope. The rest of the folder breaks this model into specific disciplines....

No. 2, 19 min, 3 diagrams

Engineering Fundamentals

Engineering fundamentals are the ideas that let you predict system behavior below the framework level. They connect source code to runtime behavior: state ownership, memory layout,...

No. 3, 25 min, 14 diagrams

Architecture and Design

Architecture is the set of hard to change decisions that shape a system's behavior, constraints, economics, and ability to evolve. It is not only diagrams, frameworks, or service counts. It is...

No. 4, 20 min

Data Structures Algorithms and Complexity

This note connects algorithmic fundamentals to production engineering decisions. In production systems, algorithms are not only interview exercises. They shape latency, memory...

No. 5, 25 min, 11 diagrams

Databases Storage and Transactions

Databases are correctness systems, not only persistence tools. A database is a contract between application invariants, storage media, concurrency control, recovery logic, and...

No. 6, 29 min, 12 diagrams

Distributed Systems

Distributed systems are systems where independent components communicate over unreliable networks and fail independently. Their central difficulty is not scale by itself. It is the combination of...

No. 7, 34 min, 7 diagrams

Caching Queues and Streaming

Caches, queues, and streams are coordination tools. They move work across time, space, and process boundaries. They improve latency, cost, throughput, and resilience, but they also create...

No. 8, 23 min, 6 diagrams

APIs Contracts and Integration

APIs are long lived contracts. Integration quality determines how safely systems can evolve, how quickly teams can ship, and how much production risk appears at service boundaries. A good...

No. 9, 37 min, 5 diagrams

Reliability Observability and Operations

Reliability is a product property. Operations are the feedback loop that keeps reliability real. A system is reliable when users can complete the work they came to do, within a...

No. 10, 20 min, 5 diagrams

Security and Supply Chain

Security engineering is the disciplined reduction of exploitable risk under adversarial conditions. Supply chain security extends that discipline across the path from source code to running...

No. 11, 21 min, 4 diagrams

Testing Verification and Quality Bars

Testing is not only bug detection. It is evidence for system properties: correctness, compatibility, resilience, performance, security, operability, and maintainability. A good...

No. 12, 23 min, 2 diagrams

Performance Capacity and Cost

Performance engineering is the discipline of predicting, measuring, and controlling how a system consumes scarce resources while serving real demand. Capacity engineering asks whether the...

No. 13, 22 min, 6 diagrams

Delivery Migrations and Release Engineering

High quality software depends on safe change, not only good design. Release engineering is the discipline of turning code, configuration, database changes, infrastructure...

No. 14, 22 min, 6 diagrams

Technical Leadership and Execution

Technical leadership converts judgment into repeatable organizational capability. It is not the act of making every hard decision personally. It is the work of shaping strategy,...

No. 15, 21 min, 9 diagrams

AI Native Software Engineering

AI native software engineering applies normal engineering rigor to systems where language models assist, decide, retrieve, generate, test, review, operate, or act through tools. The...

No. 17, 21 min, 2 diagrams

Software Engineering

This is the canonical entry point for the Software Engineering knowledge base. Use it to move from broad system judgment to focused topic notes without losing the whole system context. The goal is...