Architecture and Design

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/Software Engineering/02 Architecture and Design.md.

Architecture is the set of hard-to-change decisions that shape a system's behavior, constraints, economics, and ability to evolve. It is not only diagrams, frameworks, or service counts. It is the structure of ownership, state, dependencies, data movement, runtime topology, delivery paths, and operational feedback.

A design is architectural when changing it later would require migration, coordination, retraining, downtime, contract changes, data reshaping, organizational change, or a long compatibility period.

Core thesis

Good architecture makes important things explicit:

  • Invariants: facts the system must preserve.
  • Boundaries: places where knowledge, authority, and change are contained.
  • Dependencies: what can know about what.
  • State ownership: where truth lives and who is allowed to mutate it.
  • Failure modes: what breaks, how far it spreads, and how recovery works.
  • Evolution paths: how the system can change without freezing delivery.
  • Organizational fit: whether teams can operate the architecture they are given.

Architecture is successful when local changes stay local, irreversible decisions are rare, hidden coupling is surfaced early, and system behavior remains understandable under stress.

Hard-to-change decisions

Not every design decision deserves architectural weight. A decision becomes architectural when it changes the cost curve of future work.

| Decision area | Why it is hard to change | Typical consequences | Questions before deciding |
|---|---|---|---|
| Data model | Data accumulates, clients depend on shape, migrations are risky. | Long-lived schema compatibility, backfills, reporting coupling. | What facts are authoritative? What is derived? |
| Service boundaries | Boundaries create ownership, network calls, contracts, and deployment units. | Coordination cost, latency, operational surface, release coupling. | Does the boundary match a business capability? |
| Tenancy model | Tenant isolation affects storage, authorization, billing, limits, and operations. | Security posture, migration complexity, noisy-neighbor controls. | What must be isolated by policy, performance, or law? |
| Authorization model | Access rules spread through APIs, UI, data, jobs, and audit trails. | Privilege bugs, audit failures, retrofitting policy engines. | Where is policy evaluated and enforced? |
| Integration style | Synchronous APIs, events, queues, and files create different failure semantics. | Retries, ordering, idempotency, observability, user experience. | What must happen immediately and what can be eventual? |
| Consistency model | Strong consistency and eventual consistency imply different user and repair flows. | Locking, reconciliation, conflict handling, support workflows. | Which invariants need strict consistency? |
| Deployment model | Runtime topology affects releases, rollbacks, secrets, routing, and debugging. | Release coordination, blast radius, operational skill requirements. | Can teams deploy and recover independently? |
| Build and dependency model | Package structure and dependency direction shape every change. | Slow builds, accidental coupling, difficult refactors. | Which modules are stable policy and which are replaceable detail? |
| Observability model | What is measured determines what can be operated. | Unknown failures, slow incident response, weak SLOs. | What signal proves the design is healthy? |

Use deliberate friction for these decisions: ADRs, design review, prototypes, migration plans, and fitness functions.

Architecture goals

  • Make the important things explicit: invariants, boundaries, dependencies, owners, failure modes.
  • Keep local changes local.
  • Put volatility behind stable contracts.
  • Keep state ownership clear.
  • Make failure containment intentional.
  • Let teams ship independently without creating accidental distributed transactions.
  • Preserve a path to migrate away from wrong choices.
  • Make runtime behavior observable enough to debug with evidence.
  • Prefer reversible decisions for volatile areas and stable contracts for slow-changing areas.

Architectural forces

Architecture is tradeoff management. The same choice can be correct in one context and harmful in another.

| Force | Pulls toward | Watch for |
|---|---|---|
| Delivery speed | Fewer boundaries, simpler deployment, shared process. | Hidden coupling and unclear ownership. |
| Team autonomy | Explicit module or service boundaries. | Premature distribution and duplicated platform work. |
| Reliability | Isolation, backpressure, retries, graceful degradation. | Complexity that operators cannot reason about. |
| Product discovery | Flexible domain model, reversible decisions. | Overfitting to today's workflow. |
| Compliance | Strong audit, policy enforcement, data classification. | Policy scattered across code paths. |
| Scale | Partitioning, asynchronous work, specialized storage. | Solving hypothetical scale before actual bottlenecks. |
| Cost | Consolidation, shared infrastructure, right-sized resources. | Underprovisioning critical paths or losing isolation. |
| Maintainability | Cohesion, stable APIs, small change sets. | Layering that hides domain behavior. |

Code architecture

Core ideas:

  • Cohesion: things that change together live together.
  • Coupling: dependencies are explicit, minimal, and pointed in the right direction.
  • Encapsulation: consumers rely on contracts, not internals.
  • Composition: small parts combine without hidden global state.
  • Dependency inversion: high-level policy does not depend on low-level details.
  • Ports and adapters: domain logic is separated from infrastructure.
  • Functional core, imperative shell: pure decisions inside, effects at the boundary.
  • Explicit state transitions: lifecycle rules are first-class code, not scattered conditionals.
  • Side-effect isolation: writes, network calls, queues, emails, and clocks are pushed to edges.
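
The functional core, imperative shell split can be sketched in Python as follows. Everything here is illustrative: `Subscription`, `decide_renewal`, `renew_check`, and the 30-day grace period are hypothetical. The decision is a pure function of its inputs, while the clock and the write are supplied by the shell.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

GRACE_PERIOD = timedelta(days=30)  # hypothetical business rule


@dataclass(frozen=True)
class Subscription:
    id: str
    expires_at: datetime


# Functional core: a pure decision. No clock, I/O, or global state.
def decide_renewal(sub: Subscription, now: datetime) -> str:
    if now < sub.expires_at:
        return "active"
    if now < sub.expires_at + GRACE_PERIOD:
        return "grace"
    return "expired"


# Imperative shell: gathers inputs (including the current time) and performs effects.
def renew_check(sub: Subscription,
                clock: Callable[[], datetime],
                save_status: Callable[[str, str], None]) -> str:
    status = decide_renewal(sub, clock())
    save_status(sub.id, status)
    return status
```

Because the clock is injected, tests pin time instead of patching a global, and the core can be tested exhaustively without any infrastructure.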

Dependency direction

Good dependency direction protects business rules from volatile details.

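[Diagram: dependency direction, with adapters depending inward on application services and ports, which depend on the domain]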

In this shape, application services orchestrate use cases, domain code owns rules, ports describe required capabilities, and adapters implement those capabilities. The domain should not import controllers, ORMs, queue clients, HTTP clients, or framework-specific request objects.

| Layer | Owns | Should depend on | Should not depend on |
|---|---|---|---|
| Domain | Entities, value objects, invariants, policies, state transitions. | Language primitives, domain types. | Database, HTTP, UI, queues, framework lifecycle. |
| Application | Use cases, transactions, orchestration, authorization calls. | Domain, ports. | Concrete infrastructure internals. |
| Interface | Controllers, resolvers, CLI commands, serializers. | Application services, DTOs. | Database tables as API contracts. |
| Infrastructure | Repositories, clients, queue publishers, telemetry, storage. | Ports, external SDKs. | UI decisions or domain shortcuts. |
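
A minimal ports-and-adapters sketch, assuming a hypothetical withdrawal use case: the domain `Account` enforces its own rule, `AccountRepository` is the port (a `typing.Protocol`), `WithdrawFunds` is the application service, and `InMemoryAccountRepository` is one adapter. None of these names come from a specific framework.

```python
from dataclasses import dataclass
from typing import Protocol


# Domain: owns the rule and imports nothing from infrastructure.
@dataclass
class Account:
    id: str
    balance_cents: int

    def withdraw(self, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if amount_cents > self.balance_cents:
            raise ValueError("insufficient funds")
        self.balance_cents -= amount_cents


# Port: a capability the application needs, described in domain terms.
class AccountRepository(Protocol):
    def get(self, account_id: str) -> Account: ...
    def save(self, account: Account) -> None: ...


# Application service: orchestrates the use case through the port only.
class WithdrawFunds:
    def __init__(self, accounts: AccountRepository) -> None:
        self._accounts = accounts

    def __call__(self, account_id: str, amount_cents: int) -> int:
        account = self._accounts.get(account_id)
        account.withdraw(amount_cents)
        self._accounts.save(account)
        return account.balance_cents


# Adapter: one concrete implementation of the port (in memory here).
class InMemoryAccountRepository:
    def __init__(self) -> None:
        self._rows: dict[str, Account] = {}

    def get(self, account_id: str) -> Account:
        return self._rows[account_id]

    def save(self, account: Account) -> None:
        self._rows[account.id] = account
```

A database-backed adapter would implement the same two methods; the domain and application code would not change.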

Code architecture checklist

  • Domain rules can be tested without starting infrastructure.
  • Public module APIs are smaller than internal implementation.
  • Imports do not bypass module boundaries.
  • Database schema is not the only domain model.
  • Side effects are explicit and injectable.
  • Time, randomness, identity generation, and external calls are controlled at boundaries.
  • Use cases have clear transaction boundaries.
  • Error types distinguish validation, conflict, authorization, dependency failure, and unknown failure.
  • Cross-cutting concerns are centralized without hiding business decisions.
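
One way to make the error categories in the checklist explicit is a small exception hierarchy. The class names and the HTTP-style status mapping below are assumptions, not a standard; the useful property is that the interface layer can map categories to responses and retry decisions without parsing error messages.

```python
class AppError(Exception):
    """Base class for errors a use case reports to its caller."""


class ValidationError(AppError):
    """Input is malformed or breaks a rule; retrying unchanged will not help."""


class ConflictError(AppError):
    """State changed underneath the caller, e.g. a version mismatch."""


class AuthorizationError(AppError):
    """The caller is not allowed to perform this action."""


class DependencyError(AppError):
    """A downstream system failed; retrying later may succeed."""


# Hypothetical mapping from category to an HTTP-style status code.
STATUS_BY_ERROR = {
    ValidationError: 400,
    AuthorizationError: 403,
    ConflictError: 409,
    DependencyError: 503,
}


def status_for(error: Exception) -> int:
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(error, error_type):
            return status
    return 500  # anything unrecognized is an unknown failure


def is_retryable(error: Exception) -> bool:
    # In this sketch only dependency failures are treated as retryable.
    return isinstance(error, DependencyError)
```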

Modularity

Modularity is the ability to change one part without understanding or redeploying everything. It is not achieved by folders alone. A module needs a purpose, a public contract, private internals, tests around behavior, and dependency rules.

| Module quality | Strong signal | Weak signal |
|---|---|---|
| Cohesion | Module contains a business capability or stable technical abstraction. | Module is a grab bag of helpers. |
| Encapsulation | Consumers import from one public entry point. | Consumers deep-import internal files. |
| Replaceability | Implementation can change behind a contract. | Every caller knows storage details. |
| Testability | Behavior is tested through public API. | Tests assert private structure. |
| Ownership | A team or maintainer can reason about the module end to end. | Ownership is split by technical layer only. |

Modular design tactics

  • Define public entry points and block private imports.
  • Keep DTOs at boundaries and domain types inside modules.
  • Prefer explicit events or commands over shared mutable objects.
  • Separate stable policies from replaceable infrastructure.
  • Make dependencies flow inward toward domain policy.
  • Use package or workspace boundaries when social pressure makes folder boundaries too weak.
  • Add contract tests for modules used by many consumers.

Boundaries

A boundary is a line that restricts knowledge. Good boundaries are not only about separation. They define who owns decisions.

| Boundary | Good sign | Bad sign |
|---|---|---|
| Module | Clear public API, hidden internals. | Consumers import private internals. |
| Service | Owns data and behavior together. | Other services write its database. |
| Team | Owns a business capability. | Many teams must coordinate for simple changes. |
| API | Stable contract, versioning strategy. | Clients depend on undocumented behavior. |
| Event | Fact in domain language. | Event is a database row dump. |
| Transaction | Invariant protected in one consistency boundary. | Distributed writes are assumed to be atomic. |
| Deployment | Independent release and rollback. | One deploy requires synchronized changes everywhere. |
| Security | Policy enforced where authority exists. | Authorization is left to UI or caller discipline. |

Boundary design questions

  • What language is used on each side of the boundary?
  • Which side owns the invariant?
  • Which side owns persistence?
  • Is the boundary synchronous, asynchronous, or both?
  • Can either side evolve without a flag day?
  • What is the failure behavior if the other side is unavailable?
  • Is the boundary aligned with team ownership?
  • Does the boundary reduce coordination or just move it to runtime?

Domain modeling

Domain modeling turns business concepts into explicit software concepts. The goal is not to mirror every noun in the business. The goal is to capture the rules, decisions, lifecycles, and language that make the system correct.

| Concept | Use when | Example |
|---|---|---|
| Entity | Identity matters across time. | Account, subscription, order, deployment. |
| Value object | Equality is based on value, and the value enforces its own invariants. | Money, date range, email address, resource limit. |
| Aggregate | A consistency boundary protects related invariants. | Order with line items, account with quota allocation. |
| Domain service | Rule spans multiple entities but is still domain logic. | Pricing eligibility, risk scoring, scheduling policy. |
| Repository | Domain-oriented collection abstraction. | Load account by id, save subscription aggregate. |
| Policy | Named decision rule that can vary. | Upgrade eligibility, retry policy, fraud threshold. |
| Domain event | A fact that happened in domain language. | PaymentCaptured, WorkspaceProvisioned, QuotaExceeded. |

Modeling heuristics

  • Model behaviors before storage tables.
  • Name concepts with the language used by domain experts.
  • Make invalid states impossible to represent where the programming language permits.
  • Put invariants near the data they protect.
  • Avoid anemic models when behavior is scattered across services.
  • Avoid overactive entities when orchestration belongs in application services.
  • Treat reporting shapes and API shapes as projections, not necessarily the core model.
  • Create explicit types for money, identifiers, tenant scope, limits, permissions, and lifecycle state.
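
Explicit value types can be sketched with frozen dataclasses. `Money` and `TenantId` are hypothetical examples; storing money as integer minor units plus a currency code is a common convention that avoids floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: immutable, and equality compares values
class Money:
    amount_minor: int  # integer minor units (e.g. cents); negative allowed for refunds
    currency: str      # three-letter uppercase code such as "USD"

    def __post_init__(self) -> None:
        # The value object rejects invalid states at construction time.
        if len(self.currency) != 3 or not (self.currency.isalpha() and self.currency.isupper()):
            raise ValueError(f"invalid currency code: {self.currency!r}")

    def __add__(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount_minor + other.amount_minor, self.currency)


@dataclass(frozen=True)
class TenantId:
    value: str

    def __post_init__(self) -> None:
        if not self.value:
            raise ValueError("tenant id must not be empty")
```

Passing `TenantId` instead of a bare string makes it harder to confuse a tenant id with a user id at a call site.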

Example aggregate boundary

[Diagram: example aggregate boundary]

The aggregate owns rules that must be true immediately. Projections can lag if the product can tolerate eventual consistency and repair flows exist.
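
A sketch of the "account with quota allocation" aggregate from the modeling table, assuming allocations are set per workspace. `QuotaAccount` and `QuotaExceeded` are hypothetical names; the point is that the invariant (total allocations never exceed the quota) is checked inside the aggregate, so no caller can bypass it. Persistence is omitted; a repository would load and save the whole aggregate in one transaction.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    pass


@dataclass
class QuotaAccount:
    """Aggregate root: every allocation change goes through this object."""
    account_id: str
    quota: int
    _allocations: dict[str, int] = field(default_factory=dict)

    @property
    def allocated(self) -> int:
        return sum(self._allocations.values())

    def allocate(self, workspace_id: str, amount: int) -> None:
        """Set the allocation for one workspace, replacing any previous value."""
        if amount <= 0:
            raise ValueError("allocation must be positive")
        current = self._allocations.get(workspace_id, 0)
        # Invariant: total allocations never exceed the account quota.
        if self.allocated - current + amount > self.quota:
            raise QuotaExceeded(f"{self.account_id}: quota {self.quota} exceeded")
        self._allocations[workspace_id] = amount

    def release(self, workspace_id: str) -> None:
        self._allocations.pop(workspace_id, None)
```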

State machines

State machines make lifecycle behavior explicit. They are essential for orders, payments, workflows, deployments, jobs, retries, provisioning, entitlement changes, and incident response.

Good state machine design:

  • States are named business facts, not UI steps.
  • Transitions are explicit and guarded.
  • Illegal transitions are rejected.
  • Side effects are attached to transitions through reliable mechanisms.
  • Retry behavior is defined per transition.
  • Terminal states are clear.
  • Observability includes state age, transition count, and stuck states.
  • Every state has an owner for support and operations.

Example:

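[Diagram: lifecycle state machine with states Requested, Validating, Accepted, Provisioning, Active, Failed, Suspended, and Deleted]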

State machine table

| Transition | Guard | Side effect | Failure handling | Signal |
|---|---|---|---|---|
| Requested to Validating | Request is well formed. | Reserve idempotency key. | Reject duplicate command or return previous result. | Validation started count. |
| Validating to Accepted | Policy passes. | Persist accepted request. | Emit validation failure reason. | Acceptance rate. |
| Accepted to Provisioning | Capacity is available. | Enqueue provisioning job. | Stay accepted and retry scheduling. | Queue age. |
| Provisioning to Active | Required resources exist. | Publish activation event. | Reconcile actual resources against desired state. | Time to active. |
| Provisioning to Failed | Retry budget exhausted or fatal error. | Record reason and notify owner. | Allow explicit retry or cancellation. | Failed by reason. |
| Active to Suspended | Suspension policy applies. | Revoke access or pause workload. | Retry access update with compensation. | Suspended count. |
| Active to Deleted | Deletion authorized. | Start cleanup workflow. | Tombstone and continue cleanup asynchronously. | Deletion lag. |
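
The transitions in the state machine table can be encoded as data, so illegal transitions are rejected in one place. This is a sketch: the `State` enum and `transition` function are hypothetical, guards and side effects are left to the caller, and only the transitions the table lists are allowed.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    VALIDATING = "validating"
    ACCEPTED = "accepted"
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    FAILED = "failed"
    SUSPENDED = "suspended"
    DELETED = "deleted"


# Allowed transitions from the table; anything not listed is illegal.
TRANSITIONS: dict[State, set[State]] = {
    State.REQUESTED: {State.VALIDATING},
    State.VALIDATING: {State.ACCEPTED},
    State.ACCEPTED: {State.PROVISIONING},
    State.PROVISIONING: {State.ACTIVE, State.FAILED},
    State.ACTIVE: {State.SUSPENDED, State.DELETED},
}
TERMINAL = {State.DELETED}


class IllegalTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if current in TERMINAL:
        raise IllegalTransition(f"{current.value} is terminal")
    if target not in TRANSITIONS.get(current, set()):
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the rules in one table also makes them easy to render for support tooling and to check in tests.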

State machine checklist

  • All states are documented with meaning and ownership.
  • All transitions are named and guarded.
  • Repeated commands are idempotent.
  • Side effects can be retried safely.
  • Terminal states are not accidentally mutable.
  • Reconciliation can repair drift between desired and actual state.
  • Metrics expose stuck state age and transition failure rates.
  • Support tooling can explain why an object is in its current state.
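
A minimal sketch of idempotent command handling, assuming each command carries a client-supplied idempotency key plus a fingerprint of its payload (both hypothetical conventions here). A repeated command returns the stored result instead of re-running the side effect. The in-memory dictionary is not safe under concurrency; a real store would reserve the key atomically, for example with a unique constraint.

```python
from typing import Any, Callable


class IdempotencyStore:
    """Remembers each command's result by idempotency key (in memory, illustrative)."""

    def __init__(self) -> None:
        self._results: dict[str, tuple[str, Any]] = {}

    def run(self, key: str, fingerprint: str, handler: Callable[[], Any]) -> Any:
        if key in self._results:
            seen_fingerprint, result = self._results[key]
            if seen_fingerprint != fingerprint:
                # Same key reused for a different request: reject rather than guess.
                raise ValueError(f"idempotency key {key!r} reused with a different payload")
            return result  # duplicate command: return the previous result
        result = handler()
        self._results[key] = (fingerprint, result)
        return result
```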

Event storming

Event storming is a collaborative modeling technique for discovering domain behavior, boundaries, commands, policies, actors, external systems, and pain points. It is most useful before service boundaries are fixed.

The core object is a domain event: something meaningful that happened in the business.

| Event storming artifact | Meaning | Example |
|---|---|---|
| Domain event | A fact that happened. | InvoicePaid, DeploymentFailed, UserInvited. |
| Command | An intent to change the system. | PayInvoice, StartDeployment, InviteUser. |
| Actor | Person or system initiating a command. | Customer, support agent, scheduler. |
| Policy | Rule that reacts to events or permits commands. | If payment fails, retry after delay. |
| Aggregate | Consistency boundary that handles commands. | Invoice, deployment, workspace. |
| External system | System outside the model boundary. | Payment processor, email provider, identity provider. |
| Read model | Projection optimized for a query. | Invoice dashboard, deployment timeline. |
| Hot spot | Uncertainty, conflict, risk, or missing policy. | Refund rules differ by region. |

Event storming flow

[Diagram: event storming flow]

Event storming checklist

  • Events are named in past tense and domain language.
  • Commands are named as intentions, not transport messages.
  • Policies explain why follow-up work happens.
  • Hot spots are recorded as design risks.
  • Aggregates are discovered from consistency needs, not table names.
  • External systems are marked where authority leaves the system.
  • Read models are separated from write-side invariants.
  • The resulting model informs module and service boundaries.

C4 thinking

C4 is a way to describe architecture at different zoom levels: context, containers, components, and code. Its value is disciplined perspective, not diagram ceremony.

| Level | Purpose | Audience | Answers |
|---|---|---|---|
| Context | System in its environment. | Product, leadership, security, adjacent teams. | Who uses it? What systems does it interact with? |
| Container | Major deployable or runtime units. | Engineers, operators, platform teams. | What runs where? How do parts communicate? |
| Component | Internal structure of a container. | Engineers owning the code. | What modules collaborate inside this runtime? |
| Code | Classes, functions, packages, schemas. | Implementers and reviewers. | How is behavior represented? |

Context diagram

[Diagram: C4 context diagram]

Container diagram

[Diagram: C4 container diagram]

Component diagram

[Diagram: C4 component diagram]

C4 diagrams should be paired with text that explains decisions, not just labels. A useful diagram states boundaries, protocols, ownership, and trust levels.

Architecture decision records

An ADR records a decision and the reasoning that made it correct at the time. It is not a status report and not a design manifesto. It is a durable explanation for future maintainers.

An ADR should capture:

  • Context.
  • Decision.
  • Alternatives considered.
  • Consequences.
  • Invariants affected.
  • Rollback or migration path.
  • Owners.
  • Review date.

Use ADRs for choices that alter long-term system properties: database selection, messaging strategy, cache consistency, tenancy model, authorization model, deployment model, API compatibility, service extraction, event schema policy, or build system layout.

ADR template

# ADR: <decision title>

## Status
Accepted

## Context
What forces, constraints, incidents, product goals, and existing system facts matter?

## Decision
What are we choosing?

## Alternatives considered
What else was plausible, and why did we reject it?

## Consequences
What gets easier, harder, riskier, cheaper, or more expensive?

## Invariants affected
Which correctness rules depend on this decision?

## Migration or rollback
How can we reverse, replace, or phase this decision later?

## Owners and review
Who owns the decision, and when should it be revisited?

ADR quality checklist

  • The decision is concrete enough that reviewers can tell what changed.
  • Alternatives are real options, not strawmen.
  • Consequences include negative tradeoffs.
  • Migration and rollback are addressed honestly.
  • The decision names affected teams and operational owners.
  • The ADR can be understood without private meeting context.
  • The ADR links to diagrams, incidents, benchmarks, or prototypes when relevant.

Service boundaries

Service boundaries should be drawn around business capabilities, ownership, data authority, and independent change. A service is not just a process. It is a contract, an operational responsibility, a data owner, and a source of failure modes.

Good service boundary signals

  • The service owns a coherent business capability.
  • It owns its data and exposes behavior through APIs or events.
  • Most changes inside the service do not require changes elsewhere.
  • The owning team can deploy, observe, and recover it.
  • Calls into the service use domain language.
  • Cross-service workflows are explicit and resilient.

Weak service boundary signals

  • The service is a CRUD wrapper around another service's database.
  • Multiple services must update in lockstep for common features.
  • The service boundary follows technical layers instead of business capability.
  • Clients need to know internal tables, states, or retry behavior.
  • Operational ownership is unclear.
  • The system relies on distributed transactions without designing for them.

Boundary decision table

| Question | Bias toward same module | Bias toward separate service |
|---|---|---|
| Do changes usually happen together? | Yes. | No. |
| Is one consistency boundary required? | Yes. | No, eventual consistency is acceptable. |
| Does one team own the capability? | Yes. | Separate teams need autonomy. |
| Is the interaction latency-critical and chatty? | Yes. | No, coarse-grained calls are enough. |
| Are scaling needs different? | No. | Yes, workload shape is independent. |
| Are data security rules different? | No. | Yes, isolation is required. |
| Can the organization operate another service? | No. | Yes, with clear ownership. |

Monolith, modular monolith, and microservices

The architecture question is not "monolith or microservices." The real question is where to put boundaries so the system can evolve at acceptable cost.

| Style | Description | Best when | Risks |
|---|---|---|---|
| Monolith | One deployable with weak or informal internal boundaries. | Small team, early product discovery, simple operations. | Boundary erosion, slow builds, broad regression risk. |
| Modular monolith | One deployable with strong internal modules and explicit dependency rules. | Domain is still evolving but team needs maintainability. | Requires discipline and tooling to enforce boundaries. |
| Microservices | Multiple independently deployable services with owned data and contracts. | Teams need autonomy, capabilities are mature, operational maturity exists. | Distributed failure, observability burden, data consistency complexity. |

Migration path

[Diagram: migration path from monolith to modular monolith to extracted services]

The modular monolith is often the best intermediate architecture because it builds the habits needed for services without forcing network, deployment, and data distribution costs too early.

Extraction checklist

  • The candidate module already has a clear public API.
  • Its data ownership is clear.
  • Cross-module calls are coarse-grained.
  • The module has contract tests.
  • Operational ownership is assigned.
  • Observability exists before extraction.
  • Failure behavior is designed for network calls.
  • Migration can run old and new paths in parallel.
  • Rollback does not require data loss.
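
Running old and new paths in parallel is often done as a shadow comparison. The sketch below assumes side-effect-free paths (reads or pure computations); `parallel_run` and its arguments are hypothetical. The old path stays authoritative, and new-path mismatches or failures are recorded without affecting callers.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("migration")


def parallel_run(request: str,
                 old_path: Callable[[str], T],
                 new_path: Callable[[str], T],
                 mismatches: list) -> T:
    """Serve from the old path; run the new path in shadow and record differences."""
    result = old_path(request)
    try:
        candidate = new_path(request)
        if candidate != result:
            mismatches.append((request, result, candidate))
            log.warning("mismatch for %s: old=%r new=%r", request, result, candidate)
    except Exception:
        # A failing new path must never affect the caller during migration.
        log.exception("new path failed for %s", request)
        mismatches.append((request, result, None))
    return result
```

Once the mismatch rate stays at zero for long enough, traffic can be switched to the new path, with the old path kept as the rollback target.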

Conway's Law as architecture input

Conway's Law observes that organizations design systems whose structure mirrors their own communication structure. It is not trivia. It is an architecture constraint.

Implications:

  • If one service needs five teams to change it, the architecture encodes five-team coupling.
  • If a platform team owns shared infrastructure without product feedback loops, the platform can become a bottleneck.
  • If domain boundaries are unclear in the organization, service boundaries become unclear in code.
  • If operational ownership is split from development ownership, reliability feedback is delayed.
  • If teams are organized by technical layer, the system tends to grow layer-oriented handoffs.

Use Conway's Law deliberately:

  • Align service ownership with team ownership.
  • Align API boundaries with communication boundaries.
  • Create platform abstractions only when product teams can consume them independently.
  • Avoid architectures that require coordination patterns the organization cannot execute.
  • Treat team topology as a design input, not an afterthought.
  • Change organization and architecture together when either one blocks the other.

Organization and architecture mapping

| Organization shape | Architecture pressure | Common failure | Corrective move |
|---|---|---|---|
| Frontend, backend, database teams | Layered handoffs. | Simple features wait in many team queues. | Form stream-aligned product teams. |
| One platform team for all infrastructure | Centralized platform dependency. | Product teams wait for platform changes. | Provide self-service paved roads. |
| Many autonomous teams, no shared standards | Fragmented services and tools. | High operational variance. | Create lightweight platform contracts. |
| Product teams without operational ownership | Reliability feedback delayed. | Incidents repeat. | Put build-run accountability with owners. |
| Shared service with unclear owner | Everyone depends, nobody improves. | Slow change and brittle integrations. | Assign owner or split capability. |

Distributed design

Once work crosses process or service boundaries, local assumptions stop being safe. Networks fail, messages duplicate, clocks drift, events arrive late, and partial success becomes normal.

Distributed systems checklist

  • Every command has an idempotency key or equivalent deduplication.
  • External side effects can be retried without duplicate harm.
  • Timeouts are shorter than user patience and longer than normal latency.
  • Retries use backoff and stop conditions.
  • Queues have dead-letter or parking behavior.
  • Consumers tolerate duplicate, delayed, and out-of-order events.
  • Events have schema versioning and compatibility rules.
  • Sagas or process managers own multi-step workflows.
  • Reconciliation jobs repair drift between systems.
  • Traces correlate work across services.
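
A sketch of retries with backoff and explicit stop conditions. `TransientError` and `retry_with_backoff` are hypothetical names, and capped exponential backoff with full jitter is one common strategy rather than the only correct one. Non-transient errors are never retried, and the operation itself must be idempotent for retries to be safe.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """A failure that may succeed on retry (timeout, 503, connection reset)."""


def retry_with_backoff(op: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op on TransientError with capped exponential backoff and full jitter.

    Stop conditions: any other exception propagates immediately, and the last
    TransientError is re-raised once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return op()
        except TransientError:
            if attempt >= max_attempts:
                raise
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))  # jitter spreads retries from many clients
        attempt += 1
```

The injectable `sleep` keeps tests fast and deterministic about the number of waits.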

Synchronous vs asynchronous communication

| Communication | Use for | Avoid when | Design concerns |
|---|---|---|---|
| Synchronous request | Immediate decisions, user-visible validation, simple reads. | Caller cannot tolerate dependency failure. | Latency, timeout, cascading failure, versioning. |
| Asynchronous command | Work can be queued and retried. | User needs immediate final result. | Idempotency, queue age, retry policy. |
| Domain event | Other components need to react to facts. | Publisher expects a specific receiver action. | Schema evolution, ordering, consumer isolation. |
| Batch or file | Large transfer, reconciliation, legacy integration. | Low-latency interaction. | Completeness, replay, audit, partial processing. |

Event-driven architecture

Events are facts, not remote procedure calls in disguise. An event-driven design works when producers publish stable domain facts and consumers own their reactions.

Good event names:

  • PaymentCaptured
  • InvoiceVoided
  • WorkspaceProvisioned
  • DeploymentFailed
  • SubscriptionPlanChanged

Weak event names:

  • UpdateHappened
  • DatabaseRowChanged
  • SendEmailNow
  • UserServiceCallback

Event design checklist

  • Event name is past tense.
  • Payload contains enough information for consumers without leaking internals.
  • Event schema has versioning rules.
  • Publisher does not assume which consumers exist.
  • Consumers are idempotent.
  • Event publishing is tied reliably to state changes, often with the outbox pattern.
  • Replay behavior is defined.
  • Sensitive fields are minimized and classified.
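
A sketch of a versioned event envelope and an idempotent consumer. The `Event` fields, the `PaymentCaptured` v1 to v2 change (adding an explicit currency, with v1 assumed to be USD), and `RevenueProjection` are all hypothetical. The consumer upcasts old versions and skips event ids it has already processed, so duplicates and replays do not double count. In production, the processed-id set and the projection update would need to commit atomically.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    event_id: str            # unique per event; consumers deduplicate on it
    type: str                # past-tense domain fact, e.g. "PaymentCaptured"
    schema_version: int
    payload: dict[str, Any]


def upcast(event: Event) -> Event:
    """Translate older schema versions into the current one (v2)."""
    if event.type == "PaymentCaptured" and event.schema_version == 1:
        p = event.payload
        # Hypothetical change: v1 had "amount" only; v2 names units and currency.
        return Event(event.event_id, event.type, 2,
                     {"payment_id": p["payment_id"],
                      "amount_minor": p["amount"],
                      "currency": "USD"})
    return event


class RevenueProjection:
    """Idempotent consumer: duplicate deliveries and replays are ignored."""

    def __init__(self) -> None:
        self.processed: set[str] = set()
        self.total_minor: dict[str, int] = {}

    def handle(self, event: Event) -> None:
        if event.event_id in self.processed:
            return  # duplicate delivery
        event = upcast(event)
        if event.type == "PaymentCaptured":
            p = event.payload
            self.total_minor[p["currency"]] = (
                self.total_minor.get(p["currency"], 0) + p["amount_minor"])
        self.processed.add(event.event_id)
```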

Data architecture

Data decisions outlive code decisions. Architecture must distinguish writes, reads, projections, analytics, caches, search indexes, and integration copies.

| Data type | Meaning | Architectural concern |
|---|---|---|
| Authoritative state | Source of truth for a fact. | Ownership, consistency, backups, migrations. |
| Derived state | Recomputable from authoritative state. | Rebuild path, lag, invalidation. |
| Cache | Performance copy. | Expiration, stampede control, correctness boundaries. |
| Projection | Query-optimized view. | Rebuilds, schema drift, lag visibility. |
| Audit log | Evidence of decisions or changes. | Immutability, retention, access controls. |
| Event log | Ordered record of facts. | Replay, versioning, consumer compatibility. |
| Analytics data | Aggregated or denormalized reporting data. | Freshness, privacy, lineage. |

Data ownership rules

  • One component should own writes for a given authoritative fact.
  • Other components should read through contracts, projections, or replicated views.
  • Shared databases are acceptable only when module boundaries and ownership are still enforced.
  • Cross-service joins should be treated as coupling and designed deliberately.
  • Derived data must have a rebuild or reconciliation strategy.
  • Schema migrations need compatibility windows when multiple versions run at once.
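
Derived data is safe to keep only if it can be recomputed. In the hypothetical example below, subscriptions (id to plan) are authoritative and per-plan counts are a projection; `rebuild_plan_counts` recomputes the projection from scratch, and `reconcile` reports drift between the stored projection and the authoritative data.

```python
from collections import Counter


def rebuild_plan_counts(subscriptions: dict[str, str]) -> Counter:
    """Recompute the projection from authoritative state (subscription id -> plan)."""
    return Counter(subscriptions.values())


def reconcile(subscriptions: dict[str, str], projection: Counter) -> dict[str, tuple[int, int]]:
    """Return {plan: (projected, expected)} for every plan whose count drifted."""
    expected = rebuild_plan_counts(subscriptions)
    drift = {}
    for plan in set(expected) | set(projection):
        # Counter returns 0 for missing keys, so absent plans compare cleanly.
        if expected[plan] != projection[plan]:
            drift[plan] = (projection[plan], expected[plan])
    return drift
```

A scheduled job could run `reconcile`, alert on non-empty drift, and replace the projection with the rebuilt one.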

Architecture fitness functions

A fitness function is an executable or reviewable check that tells whether the architecture still satisfies an intended property. It turns architecture from a slide into a guardrail.

| Fitness function | Protects | Example |
|---|---|---|
| Dependency rule tests | Module boundaries. | Domain cannot import infrastructure. |
| Contract tests | API compatibility. | Consumer expectations run against provider build. |
| Migration tests | Data evolution. | New schema accepts old and new app versions. |
| Latency budgets | User experience and capacity. | Checkout API p95 latency stays under 300 ms. |
| Resilience tests | Failure containment. | Payment provider timeout does not break browsing. |
| Security policy tests | Authorization model. | Cross-tenant reads are rejected. |
| Cost checks | Runtime economics. | Background workers stay within budget envelope. |
| Observability checks | Operability. | Every workflow emits trace id, state, and result. |
| Ownership checks | Socio-technical fit. | Every service has a runbook, owner, SLO, and escalation path. |

Example fitness function diagram

[Diagram: example fitness function]

Fitness functions should be cheap enough to run often and strict enough to catch architectural drift before it becomes expensive.
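
A dependency rule test can be a few lines of static analysis. This sketch uses Python's standard `ast` module; the `app.domain` package layout and the forbidden list are assumptions. Dedicated tools such as ArchUnit (Java) or import-linter (Python) cover more cases, including relative imports, which this sketch ignores.

```python
import ast

# Hypothetical layering rule: modules under "app.domain" must not import these.
FORBIDDEN_FOR_DOMAIN = ("app.infrastructure", "app.interface", "sqlalchemy", "requests")


def imported_modules(source: str) -> set[str]:
    """Return absolute module names imported by Python source (relative imports skipped)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def dependency_violations(module_name: str, source: str) -> list[str]:
    """List forbidden imports for a domain module; other modules are unrestricted."""
    if not module_name.startswith("app.domain"):
        return []
    return sorted(
        name for name in imported_modules(source)
        if any(name == bad or name.startswith(bad + ".") for bad in FORBIDDEN_FOR_DOMAIN)
    )
```

In CI, a test would walk the source tree, call `dependency_violations` for each file, and fail on any non-empty result.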

API architecture

APIs are product and architecture boundaries. They should represent stable capabilities, not expose internal storage.

API design checklist

  • Resource or command names use domain language.
  • Requests carry idempotency keys for unsafe operations.
  • Errors are stable and actionable.
  • Authorization is enforced server-side.
  • Pagination, filtering, and sorting are explicit.
  • Versioning and deprecation policy are documented.
  • Backward compatibility is tested.
  • Internal fields are not leaked accidentally.
  • Rate limits and abuse controls are part of the contract.
  • Observability links API calls to domain outcomes.

API compatibility table

| Change | Usually compatible? | Breakage risk |
|---|---|---|
| Add optional response field | Yes. | If clients reject unknown fields. |
| Add required request field | No. | Breaks existing clients. |
| Rename enum value | No. | Breaks parsing and behavior. |
| Add enum value | Maybe. | Breaks exhaustive clients. |
| Change error code | No. | Breaks client handling. |
| Add endpoint | Yes. | Rarely breaking. |
| Change field meaning | No. | Worst kind of silent break. |
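
On the client side, a tolerant reader keeps two of the changes in the compatibility table compatible in practice: unknown response fields are ignored, and unknown enum values map to a fallback instead of failing. `PaymentStatus` and `parse_payment` are hypothetical names used for illustration.

```python
from enum import Enum
from typing import Any


class PaymentStatus(Enum):
    PENDING = "pending"
    CAPTURED = "captured"
    FAILED = "failed"
    UNKNOWN = "unknown"  # fallback for values the server adds later


def parse_payment(body: dict[str, Any]) -> dict[str, Any]:
    """Tolerant reader: read only the fields this client needs, ignore the rest."""
    try:
        status = PaymentStatus(body.get("status"))
    except ValueError:  # unrecognized or missing value
        status = PaymentStatus.UNKNOWN
    return {"id": body["id"], "status": status}
```

The fallback only helps if the client's behavior for `UNKNOWN` is safe, for example showing a neutral state instead of assuming success.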

Security architecture

Security architecture should be structural. It should not rely on every caller remembering every rule.

Key design points:

  • Identity: who is making the request?
  • Authentication: how is identity proven?
  • Authorization: what is this identity allowed to do?
  • Tenancy: what scope limits access?
  • Auditability: what decision was made and why?
  • Secrets: where do credentials live, and how are they rotated?
  • Trust boundaries: where does untrusted data enter the system?
  • Data classification: what data requires special handling?

Trust boundary diagram

[Diagram: trust boundaries]

Every arrow crossing a trust boundary needs validation, authentication or authorization, logging, and failure behavior.
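
A sketch of one trust-boundary crossing, a document read, assuming authentication has already produced a `Principal`. It shows input validation, authorization with the tenant scope taken from the authenticated principal, an audit record, and explicit failure behavior. All names (`Principal`, `read_document`, the `documents:read` permission, the in-memory stores) are hypothetical. Cross-tenant reads are reported as not found so that ids from other tenants do not leak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str                # set by authentication, never by the request body
    permissions: frozenset[str]


class Forbidden(Exception):
    pass


class NotFound(Exception):
    pass


DOCUMENTS = {"d1": {"tenant_id": "t1", "title": "Q3 plan"}}  # hypothetical store
AUDIT_LOG: list[tuple[str, str, str]] = []


def read_document(principal: Principal, document_id: str) -> dict:
    # 1. Validate untrusted input at the boundary.
    if not document_id.isalnum() or len(document_id) > 64:
        raise ValueError("invalid document id")
    # 2. Authorize the action itself.
    if "documents:read" not in principal.permissions:
        AUDIT_LOG.append((principal.user_id, document_id, "denied: permission"))
        raise Forbidden("missing documents:read")
    # 3. Enforce tenant scope server-side; other tenants' data looks absent.
    doc = DOCUMENTS.get(document_id)
    if doc is None or doc["tenant_id"] != principal.tenant_id:
        AUDIT_LOG.append((principal.user_id, document_id, "denied: not found in tenant"))
        raise NotFound(document_id)
    # 4. Record the allowed decision as well as the denials.
    AUDIT_LOG.append((principal.user_id, document_id, "allowed"))
    return doc
```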

Operational architecture

Architecture includes how the system runs. A design that cannot be observed, deployed, backed up, restored, or debugged is incomplete.

| Operational concern | Architecture question |
|---|---|
| Deployability | Can each part be deployed and rolled back safely? |
| Observability | Can operators see requests, jobs, events, state transitions, and dependency health? |
| Incident response | Is there a runbook for expected failures? |
| Capacity | What saturates first, and how is it detected? |
| Backpressure | What happens when downstream systems slow down? |
| Recovery | Can authoritative data be restored and derived data rebuilt? |
| Drift | Can desired and actual state be compared and reconciled? |
| Support | Can support explain a user's state without database spelunking? |

Operability checklist

  • Each critical workflow has logs, metrics, and traces.
  • Dashboards show user impact, not only infrastructure health.
  • Alerts map to action and ownership.
  • Runbooks include diagnosis, mitigation, and rollback.
  • Backups are tested with restore drills.
  • Queues expose depth, age, retries, and dead-letter counts.
  • Deployments include health checks and rollback criteria.
  • Feature flags have owners and removal dates.
  • Manual repair paths are audited.

Architecture examples

Example: payment flow with outbox and worker

[Diagram: payment flow with outbox and worker]

Architectural properties:

  • The user request does not depend on every downstream consumer.
  • The outbox ties event publication to database state.
  • Idempotency protects retries.
  • The worker can be monitored and replayed.
  • Payment state transitions remain explicit.

Example: modular monolith with bounded contexts

[Diagram: modular monolith with bounded contexts]

This design keeps deployment simple while still making business boundaries explicit. If a module later becomes a service, the contract and ownership already exist.

Example: poor boundary

[Diagram: poor boundary, with services integrating through a shared database]

The database becomes the real integration contract. This may be acceptable inside a disciplined modular monolith, but it is dangerous when labeled as independent services because ownership and invariants become unclear.

Example: better boundary

Here each service owns its writes. Other consumers use APIs, events, or projections with explicit contracts.
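
On the consuming side of this boundary, a projection turns another service's published events into a local read model without touching that service's tables. Below is a minimal sketch. It assumes hypothetical `PaymentSucceeded` and `PaymentRefunded` events with the fields shown. A real projection would also record the last applied event position so it can resume and deduplicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical published contract: a domain fact, not a mirror of a table row.
    kind: str
    customer_id: str
    amount_cents: int = 0


class CustomerSpendProjection:
    """Read model owned by the consumer and built only from the producer's events."""

    def __init__(self) -> None:
        self.total_by_customer: dict[str, int] = {}

    def apply(self, event: Event) -> None:
        current = self.total_by_customer.get(event.customer_id, 0)
        if event.kind == "PaymentSucceeded":
            self.total_by_customer[event.customer_id] = current + event.amount_cents
        elif event.kind == "PaymentRefunded":
            self.total_by_customer[event.customer_id] = current - event.amount_cents
        # Unknown event kinds are ignored, so the producer can add new facts safely.

    @classmethod
    def rebuild(cls, history: list[Event]) -> "CustomerSpendProjection":
        """Derived state can be discarded and rebuilt by replaying the event log."""
        projection = cls()
        for event in history:
            projection.apply(event)
        return projection


if __name__ == "__main__":
    history = [
        Event("PaymentSucceeded", "c1", 5000),
        Event("PaymentSucceeded", "c2", 1200),
        Event("PaymentRefunded", "c1", 2000),
    ]
    print(CustomerSpendProjection.rebuild(history).total_by_customer)
    # {'c1': 3000, 'c2': 1200}
```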

Design review questions

Invariants and domain

  • What are the main invariants?
  • Which component owns each invariant?
  • What invalid states are currently possible?
  • Which concepts are domain facts and which are UI or persistence details?
  • What state is authoritative and what state is derived?
  • What lifecycle needs a state machine?
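
The last question is easiest to answer when the lifecycle is written down as data, so invalid states become impossible to reach rather than merely unlikely. Below is a minimal sketch. It assumes a hypothetical payment lifecycle; the states and transitions are illustrative and are not taken from any specific payment provider.

```python
from enum import Enum


class PaymentState(Enum):
    PENDING = "pending"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    FAILED = "failed"
    REFUNDED = "refunded"


# Every legal transition is listed here; anything absent is an invalid state change.
ALLOWED: dict[PaymentState, set[PaymentState]] = {
    PaymentState.PENDING: {PaymentState.AUTHORIZED, PaymentState.FAILED},
    PaymentState.AUTHORIZED: {PaymentState.CAPTURED, PaymentState.FAILED},
    PaymentState.CAPTURED: {PaymentState.REFUNDED},
    PaymentState.FAILED: set(),    # terminal
    PaymentState.REFUNDED: set(),  # terminal
}


class InvalidTransition(Exception):
    pass


def transition(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


if __name__ == "__main__":
    state = transition(PaymentState.PENDING, PaymentState.AUTHORIZED)
    state = transition(state, PaymentState.CAPTURED)
    print(state)  # PaymentState.CAPTURED
    try:
        transition(PaymentState.FAILED, PaymentState.CAPTURED)
    except InvalidTransition as err:
        print(err)  # failed -> captured is not allowed
```

Keeping the table in one place gives a single owner for the lifecycle invariant and makes the allowed moves reviewable in a design review.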

Boundaries and dependencies

  • What are the boundaries and what does each boundary hide?
  • What crosses each boundary: commands, queries, events, files, shared tables?
  • Which dependencies point toward stable policy?
  • Which dependencies point toward volatile details?
  • Where are private internals being imported?
  • What happens if the boundary moves later?

Data and consistency

  • Which writes must be atomic?
  • Which updates can be eventual?
  • How are conflicts detected and resolved?
  • How are derived views rebuilt?
  • What is the migration strategy for schema and event changes?
  • What data must be retained, deleted, encrypted, or audited?
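
One common answer to the conflict question is optimistic concurrency. Each row carries a version, and a write succeeds only if the version the writer read is still current. Below is a minimal sketch using SQLite. It assumes a hypothetical `accounts` table and leaves resolution to the caller, which typically re-reads and retries or shows the conflict to the user.

```python
import sqlite3


class ConflictError(Exception):
    """Another writer changed the row after it was read (or the row no longer exists)."""


def make_accounts_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, email TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    return conn


def read_account(conn: sqlite3.Connection, account_id: str) -> tuple[str, int]:
    return conn.execute(
        "SELECT email, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def update_email(conn: sqlite3.Connection, account_id: str, new_email: str,
                 expected_version: int) -> int:
    """Compare-and-set: apply the write only if the version is unchanged."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET email = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_email, account_id, expected_version),
        )
    if cur.rowcount == 0:  # no row matched the expected version
        raise ConflictError(f"{account_id} is no longer at version {expected_version}")
    return expected_version + 1


if __name__ == "__main__":
    conn = make_accounts_db()
    with conn:
        conn.execute("INSERT INTO accounts VALUES ('a1', 'old@example.com', 1)")
    _, seen = read_account(conn, "a1")                   # two clients both read version 1
    update_email(conn, "a1", "first@example.com", seen)  # succeeds; version is now 2
    try:
        update_email(conn, "a1", "second@example.com", seen)
    except ConflictError as err:
        print("conflict:", err)  # conflict: a1 is no longer at version 1
```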

Failure and operations

  • What happens if the same command arrives twice?
  • What happens if side effects partially complete?
  • What is synchronous and what is asynchronous?
  • What is the retry, timeout, and compensation strategy?
  • What is the rollback strategy?
  • What operational signals prove the design works?
  • What manual repair path exists, and is it audited?
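
For the partial-completion and compensation questions, one pattern is a saga-style sequence in which every step with an external effect is paired with a compensating action. Below is a minimal sketch. It assumes hypothetical step names and in-process callables. It does not persist progress or retry failed compensations, which a durable saga log or workflow engine would have to do.

```python
from typing import Callable

# (name, action, compensation); in real systems compensations should be idempotent.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; if one fails, undo the completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # The failing step never completed, so only earlier steps are compensated.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return False
        completed.append(step)
        log.append(f"done: {name}")
    return True


if __name__ == "__main__":
    def decline_card() -> None:
        raise RuntimeError("card declined")

    log: list[str] = []
    ok = run_with_compensation(
        [
            ("reserve_stock", lambda: None, lambda: None),
            ("charge_card", decline_card, lambda: None),
            ("create_shipment", lambda: None, lambda: None),
        ],
        log,
    )
    print(ok, log)
    # False ['done: reserve_stock', 'failed: charge_card (card declined)', 'compensated: reserve_stock']
```

The log doubles as the audit trail for a manual repair path. When a compensation itself fails, an operator needs to see exactly which steps ran.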

Organization

  • Who owns the design after launch?
  • Which teams must coordinate to change it?
  • Does the architecture match team communication paths?
  • Can the organization operate the runtime topology?
  • What new skills, runbooks, or support tools are required?

Architecture smells

| Smell | What it suggests | Possible response |
| --- | --- | --- |
| Every feature touches many modules. | Boundaries do not match change patterns. | Revisit domain boundaries and ownership. |
| Services share a database casually. | Data authority is unclear. | Define write ownership and access contracts. |
| Business rules live in controllers. | Domain logic is coupled to transport. | Move rules into the domain or application layer. |
| Events mirror tables. | Events are leaking persistence internals. | Rename events around domain facts. |
| Many small synchronous calls for one page. | Chatty distributed design. | Add coarse APIs or read models. |
| Retries cause duplicate work. | Idempotency is missing. | Add command ids, dedupe, and safe side effects. |
| Teams need meetings for simple changes. | Architecture encodes coordination overhead. | Align modules and services to team ownership. |
| Diagrams disagree with code. | Architecture documentation is stale. | Add fitness functions and update docs in review. |
| Shared utilities contain domain rules. | Hidden coupling through convenience abstractions. | Move rules to owning domain modules. |
| Feature flags never retire. | Temporary architecture became permanent. | Track owners, expiry, and removal work. |

Architecture review checklist

  • The decision is actually architectural and deserves review.
  • The problem is stated before the solution.
  • Invariants and ownership are explicit.
  • Boundaries map to domain and team realities.
  • Data authority is clear.
  • Dependency direction is intentional.
  • Failure behavior is designed, not assumed.
  • Security and tenancy are structural.
  • Observability proves the design works.
  • Migration and rollback paths are credible.
  • Tradeoffs are documented in an ADR.
  • Fitness functions protect the decision from drift.
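
A fitness function can be as small as a test that fails when dependency direction is violated. Below is a minimal sketch using Python's standard `ast` module. It assumes a hypothetical layering rule in which domain modules must not import the `infrastructure`, `web`, or `sqlalchemy` packages. Relative imports are not resolved here, and dedicated tools such as ArchUnit (Java) or import-linter (Python) handle these rules more thoroughly.

```python
import ast

# Hypothetical layering rule: domain code must not depend on these packages.
FORBIDDEN_FOR_DOMAIN = ("infrastructure", "web", "sqlalchemy")


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by a Python source file."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)  # relative imports (level > 0) are skipped
    return names


def dependency_violations(module_name: str, source: str) -> list[str]:
    """Fitness function: list forbidden imports found in one domain module."""
    violations = []
    for name in sorted(imported_modules(source)):
        if name.split(".")[0] in FORBIDDEN_FOR_DOMAIN:
            violations.append(f"{module_name} imports {name}")
    return violations


if __name__ == "__main__":
    source = "from orders.domain import Order\nimport infrastructure.db\n"
    print(dependency_violations("orders/domain/pricing.py", source))
    # ['orders/domain/pricing.py imports infrastructure.db']
```

In CI, the same check would walk every file under the domain package and assert that the combined list is empty. That turns the ADR's dependency rule into something the build enforces.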

Practical architecture workflow

  1. Describe the business capability and the invariant it protects.
  2. Identify the current and future sources of volatility.
  3. Model the domain with events, commands, policies, and aggregates.
  4. Draw the C4 context and container views.
  5. Choose module or service boundaries based on ownership and consistency.
  6. Define dependency direction and public contracts.
  7. Design state machines for lifecycle-heavy concepts.
  8. Decide synchronous and asynchronous communication explicitly.
  9. Define data ownership, migrations, and rebuild paths.
  10. Add observability, runbooks, and fitness functions.
  11. Record the decision in an ADR.
  12. Review after real usage and incidents.