Technical Leadership and Execution

Source: Victor Bona's Obsidian Compendium snapshot, Knowledge base/Software Engineering/13 Technical Leadership and Execution.md.

Technical leadership converts judgment into repeatable organizational capability. It is not the act of making every hard decision personally. It is the work of shaping strategy, standards, architecture, execution systems, and communication so that many engineers can make better decisions without waiting for permission.

The test of technical leadership is whether the organization becomes more capable after the leader leaves the room.

Executive model

Technical leadership operates across five connected systems:

| System | Leadership question | Primary outputs | Failure mode |
| --- | --- | --- | --- |
| Strategy | What technical direction makes the business more likely to win? | Technical strategy, bets, sequencing, investment themes | Local optimization and disconnected initiatives |
| Architecture | What constraints make good outcomes easier and bad outcomes harder? | Boundaries, standards, platform contracts, governance | Diagrams with no enforcement or adoption |
| Execution | How does work move from intent to reliable production change? | Operating rhythm, planning cadence, dependency management, readiness checks | Busy teams with weak delivery truth |
| Quality | What level of evidence is required before we trust a change? | Review systems, test strategy, quality bars, incident learning | Review theater or hero-based quality |
| People | How do engineers learn judgment and increase ownership? | Mentoring, standards, delegation, staff-plus leverage | Principal bottlenecks and fragile expertise |

Conway's Law

Conway's Law: organizations design systems that mirror their communication structures.

Use Conway's Law as a design input, not as a complaint after the architecture becomes painful.

| Organizational reality | Architectural implication | Leadership response |
| --- | --- | --- |
| Two teams have weak coordination | Avoid a hot path that requires frequent cross-team changes | Create clearer ownership, an API contract, or a platform boundary |
| One domain has one accountable team | Give that team service, data, and runtime ownership | Align roadmap, support model, and incident accountability |
| A shared component has many consumers | Treat it as a product, not a library someone happens to own | Define SLOs, versioning, docs, adoption path, and deprecation rules |
| Product teams need speed but share infrastructure | Build paved roads with escape hatches | Make the default easy and the exception explicit |
| Many teams touch the same schema | Data ownership is unclear | Assign domain ownership, migration rules, and compatibility guarantees |
| Operations and product ownership are separated | Incidents become handoffs | Reconnect on-call, readiness, and service ownership |

Reverse Conway maneuver

The reverse Conway maneuver intentionally changes team boundaries so the desired architecture becomes easier to build.

Use a reverse Conway maneuver when:

  • The target architecture requires a stronger boundary than the current org structure supports.
  • A platform capability needs product management, documentation, and support, not only code ownership.
  • A domain is split across too many teams for coherent data and behavior decisions.
  • A shared service has become a coordination sink.

Avoid it when:

  • The problem is only missing documentation or an unclear API.
  • Teams are already overloaded and the reorg would reduce delivery capacity.
  • The architecture is not yet understood well enough to justify moving people.
  • The proposed team boundary optimizes leadership charts instead of operational ownership.

Conway diagnostic checklist

  • Does every critical runtime path have one accountable owning team?
  • Does every shared service have an explicit consumer contract?
  • Can a team change its owned service without negotiating with unrelated teams?
  • Are data ownership, schema migration authority, and backfill responsibility clear?
  • Do incident responders have authority to change the systems they operate?
  • Are platform teams measured by adoption, reliability, and developer experience?
  • Are cross-team dependencies visible before planning commitments are made?
  • Are escalation paths explicit for boundary disputes?
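
The first question in this checklist can be checked mechanically where a service catalog exists. The following is a minimal Python sketch, assuming a hypothetical catalog of (service, owning team, critical-path flag) rows. The service and team names are invented for illustration and do not come from the source.

```python
from collections import defaultdict

# Hypothetical catalog rows: (service, owning_team, is_critical_path).
# A service listed twice has two teams claiming ownership.
CATALOG = [
    ("checkout-api", "payments", True),
    ("checkout-api", "storefront", True),
    ("search-indexer", None, True),
    ("report-export", "analytics", False),
]


def audit_ownership(catalog):
    """Return critical services that do not have exactly one owning team."""
    owners = defaultdict(set)
    critical = set()
    for service, team, is_critical in catalog:
        if is_critical:
            critical.add(service)
        if team is not None:
            owners[service].add(team)

    findings = {}
    for service in sorted(critical):
        teams = sorted(owners[service])
        if len(teams) != 1:
            findings[service] = teams  # empty list means orphaned
    return findings


print(audit_ownership(CATALOG))
# {'checkout-api': ['payments', 'storefront'], 'search-indexer': []}
```

An empty list flags an orphaned critical service, and a list with several teams flags shared ownership that needs one accountable owner.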

Technical strategy

A technical strategy is a set of choices that concentrates engineering effort on the constraints that matter most. It is not a list of all desirable improvements.

Strategy should explain:

  • The current technical reality.
  • The desired future state.
  • The constraints that shape the path.
  • The explicit bets being made.
  • The choices intentionally not being made.
  • Which decisions are reversible and which are expensive to reverse.
  • The sequence of investments.
  • The risks and kill criteria.
  • The metrics used to learn.
  • The owners and decision checkpoints.

Strategy contents

| Section | Purpose | Good signal | Weak signal |
| --- | --- | --- | --- |
| Context | Establish why strategy is needed now | Links engineering work to business, reliability, cost, security, or speed constraints | Generic modernization language |
| Current state | Make reality inspectable | Names known bottlenecks, incidents, costs, and friction | Lists preferences without evidence |
| Desired state | Define the target operating model | Describes capabilities and constraints, not only technologies | Vendor or framework shopping |
| Bets | Concentrate investment | States what must be true for the bet to pay off | Many safe statements nobody can argue with |
| Non-goals | Protect focus | Names tempting work that is out of scope | Everything remains implicitly possible |
| Sequence | Make execution plausible | Orders work by dependency, risk, and learning | Parallel initiatives without capacity reasoning |
| Metrics | Create feedback | Measures adoption, reliability, cycle time, cost, and quality | Vanity metrics or only completion percentage |
| Governance | Keep decisions alive | Defines review cadence and reversal triggers | Strategy published once and forgotten |

Strategy pyramid

[Diagram: strategy pyramid, not rendered in this snapshot]

Technical strategy example

| Field | Example |
| --- | --- |
| Current state | Product teams share a large deployment unit. Release coordination is slow, incident blast radius is high, and schema ownership is unclear. |
| Desired state | Teams own independently deployable services around stable domains, with shared platform capabilities for auth, observability, deployment, and data migration safety. |
| Strategic bet | Domain ownership plus platform paved roads will reduce coordination cost without creating infrastructure chaos. |
| Non-goals | We will not split every module, replace the entire stack, or create one-off service templates per team. |
| First sequence | Map domains, assign owners, define service readiness bar, extract one low-risk service, validate deployment and observability path, then repeat. |
| Metrics | Lead time, failed deployment rate, incident blast radius, number of cross-team release blockers, service readiness compliance. |
| Reversal trigger | If service count increases operational load faster than platform support reduces coordination load, pause extraction and invest in platform maturity. |
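
Two of the metrics named in this example, lead time and failed deployment rate, can be computed from deployment records. The sketch below assumes lead time means commit-to-deploy time and uses invented sample records. Neither the record shape nor that definition comes from the source.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: (commit_time, deploy_time, succeeded).
DEPLOYS = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 15, 0), True),     # 6 h
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 10, 0), False),   # 24 h
    (datetime(2025, 1, 9, 8, 0), datetime(2025, 1, 9, 10, 0), True),     # 2 h
    (datetime(2025, 1, 10, 8, 0), datetime(2025, 1, 10, 20, 0), True),   # 12 h
]


def failed_deployment_rate(deploys):
    """Fraction of deployments that failed."""
    if not deploys:
        raise ValueError("no deployments to measure")
    failures = sum(1 for _, _, succeeded in deploys if not succeeded)
    return failures / len(deploys)


def median_lead_time_hours(deploys):
    """Median hours from commit to deploy (an assumed definition of lead time)."""
    if not deploys:
        raise ValueError("no deployments to measure")
    return median(
        (deployed - committed).total_seconds() / 3600
        for committed, deployed, _ in deploys
    )


print(failed_deployment_rate(DEPLOYS))   # 0.25
print(median_lead_time_hours(DEPLOYS))   # 9.0
```

Tracking the median rather than the mean keeps one unusually slow release from dominating the trend, though either choice is a judgment call.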

Strategy review checklist

  • Does the strategy identify the constraint that matters most?
  • Does it explain why now?
  • Does it state what the organization will stop doing?
  • Does it connect architecture choices to delivery or reliability outcomes?
  • Does it separate principles from decisions?
  • Does it identify irreversible or expensive choices?
  • Does it include a sequence that can survive limited capacity?
  • Does it define evidence that would change the plan?
  • Does every program have an accountable owner?
  • Can team leads explain the strategy without repeating slogans?

Principal engineer leverage

Principal engineer leverage comes from increasing the quality and speed of decisions across many teams. The role is measured less by personal output and more by the decision environment it creates.

High-leverage activities:

  • Create a shared technical narrative.
  • Define quality bars that teams can apply without escalation.
  • Mentor through reviews and standards.
  • Remove architectural bottlenecks.
  • Standardize recurring decisions.
  • Build tools that shorten feedback loops.
  • Intervene in high-risk designs early.
  • Turn incidents into systemic improvements.
  • Translate business ambiguity into technical options.
  • Make invisible constraints visible before planning locks.

Low-leverage traps:

  • Becoming the only reviewer.
  • Owning every hard decision personally.
  • Replacing team accountability with heroics.
  • Writing strategy without adoption mechanisms.
  • Treating architecture as diagrams instead of constraints.
  • Saying yes to every escalation.
  • Using influence to preserve personal taste instead of organizational capability.
  • Measuring impact only by code volume.

Leverage ladder

| Level | Activity | Scope of impact | Sustainability |
| --- | --- | --- | --- |
| Do | Personally solve a hard problem | One system or team | Low if repeated too often |
| Pair | Help another engineer solve it | One engineer and one system | Medium |
| Review | Improve a decision before it ships | Team or program | Medium |
| Standardize | Turn a recurring decision into a pattern | Many teams | High |
| Automate | Put the standard into tooling | Whole organization | High |
| Govern | Create a feedback loop around the standard | Whole organization over time | Highest |

Principal engineer allocation model

| Time horizon | Focus | Typical artifacts |
| --- | --- | --- |
| Daily | Unblock high-risk work, review critical decisions, protect quality bars | Review comments, risk notes, escalation decisions |
| Weekly | Align execution with strategy, resolve dependencies, improve standards | Architecture review agenda, decision records, planning feedback |
| Monthly | Reassess technical bets, audit ownership, update platform roadmap | Strategy update, risk register, capability map |
| Quarterly | Shape investment themes and organizational capability | Technical roadmap, org design input, governance changes |

When a principal should intervene

Intervene early when:

  • A decision is hard to reverse.
  • The blast radius crosses team or customer boundaries.
  • The architecture changes ownership or operational responsibility.
  • The team lacks prior experience with the risk class.
  • The decision sets a precedent for many future decisions.
  • The plan optimizes local delivery while increasing global complexity.

Avoid intervention when:

  • The decision is reversible and within team ownership.
  • The team has a clear quality bar and feedback loop.
  • The risk is contained.
  • The main issue is style preference.
  • The intervention would remove learning ownership from the team.

Engineering operating rhythm

Operating rhythm is the cadence by which engineering turns signals into decisions. Good rhythm prevents drift. Bad rhythm creates meetings that do not change behavior.

Weekly rhythm

| Cadence item | Purpose | Inputs | Outputs |
| --- | --- | --- | --- |
| Reliability review | Detect production risk early | Incidents, SLOs, alerts, error budgets | Remediation owners, risk acceptance, follow-up dates |
| Delivery review | Understand execution truth | Roadmap status, blockers, dependency graph | Replanned commitments, escalation paths |
| Architecture office hours | Pull risk forward | Design sketches, ADR drafts, migration plans | Feedback, review routing, decision criteria |
| Dependency triage | Prevent hidden blockers | Cross-team requests, vendor constraints, platform queues | Dependency owner, date, fallback plan |
| Standards review | Keep quality bars current | Review findings, incidents, repeated defects | Updated templates, checks, examples |

Monthly rhythm

| Cadence item | Purpose | Outputs |
| --- | --- | --- |
| Strategy refresh | Reconcile strategy with new evidence | Updated bets, changed priorities, stopped work |
| Ownership audit | Find orphaned systems and ambiguous boundaries | Ownership map changes, service catalog updates |
| Tech debt review | Rank debt by risk and opportunity cost | Funded remediation, accepted debt, retired complaints |
| Platform review | Evaluate developer experience and platform adoption | Paved road improvements, support model changes |
| Architecture governance review | Check whether governance is helping flow | Review threshold changes, template updates |

Operating rhythm diagram

[Diagram: operating rhythm, not rendered in this snapshot]

Operating rhythm health checklist

  • Meetings produce decisions, owners, and dates.
  • Production signals influence planning.
  • Planning commitments include dependency risk.
  • Architecture reviews happen before implementation is locked.
  • Decisions are recorded where future teams can find them.
  • Follow-ups are closed or explicitly accepted as risk.
  • Standards are updated when incidents expose gaps.
  • Review load is measured and reduced when it becomes a bottleneck.

Architecture governance

Architecture governance is the system that helps teams make coherent technical decisions without centralizing every choice. Good governance is lightweight, risk-based, and educational.

It should answer:

  • Which decisions need review?
  • Who reviews them?
  • What evidence is required?
  • What standards apply?
  • How are exceptions approved?
  • How are decisions revisited?
  • How do teams learn from prior decisions?

Governance by decision risk

| Decision type | Review level | Evidence required | Example |
| --- | --- | --- | --- |
| Local reversible implementation detail | Team review | Code review and tests | Internal refactor |
| Local durable design choice | Team design review | Design note, migration plan, test plan | New module boundary |
| Cross-team interface | Architecture review | API contract, ownership, versioning, support model | Shared platform API |
| Data ownership or migration | Architecture plus data review | Backward compatibility, rollback, audit, backfill plan | Customer data schema split |
| Security or compliance boundary | Security review | Threat model, controls, logging, access model | New auth flow |
| Production critical system | Readiness review | SLOs, runbooks, observability, capacity, incident owner | Payments pipeline |
| Irreversible platform bet | Leadership review | Strategy alignment, cost model, exit plan, adoption plan | Cloud provider or database migration |
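
A risk-based routing table like this one can be encoded so teams can self-serve the answer to "which review do I need?". The sketch below is one possible encoding in Python. The `Decision` fields are hypothetical names, and combining several review levels for one decision is an assumption of this sketch, since the table lists one level per decision type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    # Hypothetical risk attributes; names are illustrative only.
    durable: bool = False
    crosses_team_boundary: bool = False
    touches_data_ownership: bool = False
    touches_security_boundary: bool = False
    production_critical: bool = False
    irreversible_platform_bet: bool = False


def required_reviews(decision: Decision) -> List[str]:
    """Map risk attributes to the review levels named in the table."""
    reviews = []
    if decision.irreversible_platform_bet:
        reviews.append("leadership review")
    if decision.touches_security_boundary:
        reviews.append("security review")
    if decision.touches_data_ownership:
        reviews.append("architecture plus data review")
    elif decision.crosses_team_boundary:
        reviews.append("architecture review")
    if decision.production_critical:
        reviews.append("readiness review")
    if not reviews:
        # Low-risk local decisions stay with the team.
        reviews.append("team design review" if decision.durable else "team review")
    return reviews


print(required_reviews(Decision()))
# ['team review']
print(required_reviews(Decision(touches_data_ownership=True, production_critical=True)))
# ['architecture plus data review', 'readiness review']
```

Keeping the default branch at team level reflects the delegation principle: central review is the exception that a risk attribute must justify.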

Governance workflow

[Diagram: governance workflow, not rendered in this snapshot]

Governance anti-patterns

| Anti-pattern | Symptom | Correction |
| --- | --- | --- |
| Review board as permission gate | Teams wait weeks for approval | Define risk thresholds and delegate low-risk decisions |
| Architecture by taste | Feedback is opinion-heavy and inconsistent | Publish decision criteria and examples |
| Exceptions without expiry | Temporary choices become permanent | Require owner, date, and reversal condition |
| Governance after implementation | Review can only approve or block | Require early design review for high-risk work |
| No adoption path | Standards exist but teams do not use them | Provide templates, tooling, migration help, and examples |
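
The correction for "exceptions without expiry" (owner, date, and reversal condition) lends itself to a small record type and a periodic check. A minimal sketch, assuming a hypothetical `StandardException` record; the standards, teams, and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StandardException:
    standard: str            # which standard is being bypassed
    owner: str               # who is accountable for the exception
    expires_on: date         # when the exception must be revisited
    reversal_condition: str  # what would end the exception early


def overdue(exceptions, today):
    """Return exceptions whose expiry date has passed and that need a decision."""
    return [e for e in exceptions if e.expires_on < today]


EXCEPTIONS = [
    StandardException("service-template", "team-a", date(2025, 3, 1),
                      "platform template supports the needed runtime"),
    StandardException("logging-format", "team-b", date(2025, 9, 1),
                      "migration to the shared logger is complete"),
]

print([e.standard for e in overdue(EXCEPTIONS, date(2025, 6, 1))])
# ['service-template']
```

Running a check like this in the monthly governance review keeps temporary choices from quietly becoming permanent.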

Architecture review checklist

  • Is the problem statement clear and evidence-based?
  • Are the options real alternatives, not one proposal plus strawmen?
  • Are ownership and operational responsibilities explicit?
  • Are interfaces, data contracts, and compatibility rules specified?
  • Are failure modes and rollback paths credible?
  • Are security, privacy, cost, and reliability risks addressed?
  • Are dependencies and sequencing visible?
  • Is the decision reversible? If not, is the evidence strong enough?
  • Are metrics defined for adoption and outcome?
  • Is there a revisit trigger?

Review systems

Review systems convert expertise into shared judgment. A good review system is explicit enough to be teachable and lightweight enough to preserve flow.

Good review systems:

  • Have explicit criteria.
  • Focus on risk.
  • Teach judgment.
  • Preserve flow.
  • Escalate only meaningful ambiguity.
  • Link decisions to future evidence.
  • Reduce repeated feedback by updating standards and tools.

Review types:

  • Code review.
  • Design review.
  • Production readiness review.
  • Security review.
  • Data migration review.
  • Incident review.
  • Architecture review.
  • Operational readiness review.
  • Dependency review.

Review routing matrix

| Change characteristic | Review needed | Reviewer profile |
| --- | --- | --- |
| Small local change with tests | Code review | Team peer |
| New public API | Design and code review | Team peer plus API owner |
| Shared library change | Code review plus consumer impact review | Maintainer and representative consumer |
| User data migration | Data migration review | Data owner, service owner, operations |
| New external dependency | Dependency review | Owning team, security, platform if needed |
| New production service | Readiness review | Service owner, SRE or platform, security if exposed |
| Cross-domain architecture | Architecture review | Domain owners and principal or staff reviewer |
| Incident remediation | Incident review | Service owner, affected teams, quality owner |

Review quality rubric

| Dimension | Weak review | Strong review |
| --- | --- | --- |
| Correctness | Spots syntax or obvious bugs | Tests assumptions, invariants, and edge cases |
| Design | Comments on personal style | Evaluates boundaries, coupling, ownership, and reversibility |
| Risk | Treats all issues equally | Prioritizes blast radius and failure modes |
| Evidence | Says "seems fine" | Asks for or verifies meaningful evidence |
| Teaching | Gives commands | Explains the principle behind the request |
| Flow | Blocks on minor preferences | Separates required changes from suggestions |

Review comment template

Use this shape for high-signal review comments:

Concern: <what could go wrong>
Reason: <why this matters, including risk or invariant>
Evidence: <line, test, incident, metric, or standard>
Request: <specific required change or question>
Severity: <blocking, should fix, suggestion>

Example:

Concern: The migration assumes all rows have a valid workspace_id.
Reason: Historical imports created rows before workspace assignment was enforced, so this can fail during backfill.
Evidence: Import path before 2025-03 did not require workspace_id.
Request: Add a preflight query and a remediation path before the migration runs.
Severity: blocking
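
The preflight query requested in this example comment might look like the following. This is a sketch using Python's standard-library `sqlite3` with an in-memory database. The table name `documents` and its columns are hypothetical, and "valid" is simplified here to "not NULL".

```python
import sqlite3


def count_rows_missing_workspace(conn):
    """Count rows that would break a migration requiring workspace_id."""
    # "documents" is a hypothetical table name used only for this sketch.
    query = "SELECT COUNT(*) FROM documents WHERE workspace_id IS NULL"
    return conn.execute(query).fetchone()[0]


# Build a small in-memory table that mimics historical imports.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, workspace_id INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (workspace_id) VALUES (?)",
    [(1,), (None,), (2,), (None,)],
)

missing = count_rows_missing_workspace(conn)
if missing:
    print(f"Blocked: {missing} rows need remediation before the migration runs.")
else:
    print("Preflight passed.")
# Blocked: 2 rows need remediation before the migration runs.
```

The point of the preflight is to turn the reviewer's concern into evidence: the migration is gated on a count of zero instead of on an assumption about historical data.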

Review system checklist

  • Review criteria are documented.
  • Review routing is based on risk, not title.
  • Reviewers label blocking feedback clearly.
  • Repeated feedback becomes a standard, lint rule, template, or example.
  • Teams can make low-risk decisions without central approval.
  • High-risk decisions are reviewed before implementation locks in.
  • Review latency is measured.
  • Reviewers are rotated and mentored.
  • Review outcomes are connected to later production evidence.

Decision quality

Decision quality is the quality of the reasoning process given the information available at the time. It is not the same as outcome quality: a good decision can have a bad outcome if reality changes, and a bad decision can still turn out well through luck.

Decision record

Every significant decision should answer:

  • What are we deciding?
  • Why does this decision matter now?
  • What constraints matter?
  • What options exist?
  • What evidence supports each option?
  • What are the consequences?
  • What would make us reverse this?
  • Who owns follow-up?
  • When will we revisit the decision?

Decision classification

| Decision class | Reversibility | Recommended process |
| --- | --- | --- |
| Type 1 | Expensive or impossible to reverse | Slow down, gather evidence, review broadly, document carefully |
| Type 2 | Reversible with moderate cost | Decide with accountable owner, instrument outcome, revisit |
| Type 3 | Local and easily reversible | Let the team decide, review through normal code or design review |

Decision quality checklist

  • The decision is phrased as a choice, not as an implementation task.
  • Constraints are separated from preferences.
  • At least two real options are considered.
  • The recommended option has explicit tradeoffs.
  • The decision names who benefits and who pays the cost.
  • Unknowns are stated.
  • Reversal criteria are explicit.
  • The follow-up owner has authority to act.
  • The decision record is discoverable.

ADR template

# ADR: <decision title>

Date: <YYYY-MM-DD>
Status: Proposed | Accepted | Superseded
Owner: <team or role>
Reviewers: <teams or roles>

## Context

<What is happening, why now, and what constraints matter.>

## Decision

<The choice being made.>

## Options considered

| Option | Benefits | Costs | Risks | Reversibility |
| --- | --- | --- | --- | --- |
| <option> | <benefits> | <costs> | <risks> | <high, medium, low> |

## Consequences

<Expected operational, delivery, cost, security, and ownership effects.>

## Evidence and validation

<Tests, metrics, incidents, prototypes, benchmarks, or user evidence.>

## Reversal or revisit criteria

<What would cause us to change this decision.>

## Follow-up

| Action | Owner | Due date | Evidence |
| --- | --- | --- | --- |
| <action> | <owner> | <date> | <evidence> |

Execution systems

Execution systems make delivery truth visible. They connect strategy to shipped, operated, and learned-from changes.

Strong execution is not the same as aggressive commitment. Strong execution means the organization can see work clearly, sequence it honestly, manage dependencies, preserve quality, and adapt when evidence changes.

Execution loop

[Diagram: execution loop, not rendered in this snapshot]

Execution control points

| Control point | Question | Evidence |
| --- | --- | --- |
| Intake | Should this work exist? | Strategy link, user impact, risk reduction, opportunity cost |
| Shaping | Is the problem understood enough to plan? | Problem statement, options, constraints, success criteria |
| Planning | Can the sequence survive reality? | Dependencies, capacity, milestones, fallback plan |
| Readiness | Is the change safe to expose? | Tests, observability, rollback, runbook, support owner |
| Launch | Are we learning safely? | Rollout plan, metrics, alerting, incident owner |
| Follow-up | Did the work produce the intended outcome? | Outcome metrics, incident review, adoption data |

Planning artifact checklist

  • Problem statement includes user, business, or operational impact.
  • Success criteria are observable.
  • Scope and non-scope are explicit.
  • Dependencies have owners and dates.
  • Risks have mitigations or acceptance.
  • Rollout and rollback are described.
  • Quality bars are named before implementation starts.
  • Milestones prove learning, not only activity.
  • Work is sliced so partial delivery creates value or reduces risk.

Execution risk register template

| Risk | Impact | Likelihood | Owner | Mitigation | Trigger | Status |
| --- | --- | --- | --- | --- | --- | --- |
| <risk> | <customer, delivery, cost, security, reliability impact> | <low, medium, high> | <owner> | <action> | <signal that risk is materializing> | <open, accepted, mitigated> |

Dependency management

Dependencies are commitments between teams. They need ownership, dates, fallback paths, and escalation rules.

Poor dependency management creates hidden queues. Technical leaders make those queues visible before teams commit to plans.

Dependency types

| Dependency type | Example | Management tactic |
| --- | --- | --- |
| Technical dependency | Platform API required before product work can ship | Contract-first design, early integration test, fallback path |
| Data dependency | Migration must finish before feature rollout | Preflight checks, phased backfill, compatibility window |
| Organizational dependency | Another team owns a required service | Named owner, planning commitment, escalation path |
| Vendor dependency | External provider must approve or deliver capability | Deadline buffer, alternative provider, degraded mode |
| Compliance dependency | Legal or security approval required | Early review, evidence packet, clear risk acceptance |
| Operational dependency | On-call or support model not ready | Readiness checklist, runbook, training, launch gate |

Dependency board template

| Dependency | Needed by | Providing team | Owning person | Required date | Current status | Fallback | Escalation date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| <dependency> | <consumer> | <provider> | <owner> | <date> | <status> | <fallback> | <date> |

Dependency management checklist

  • Each dependency has one named accountable owner.
  • The provider and consumer agree on the contract.
  • The required date is tied to a milestone, not a vague quarter.
  • The fallback plan is credible.
  • Integration risk is tested early.
  • Escalation happens before the plan is already broken.
  • Dependencies are reviewed in the operating rhythm.
  • Completed dependencies are validated by the consuming team.
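
Several of these checks can run automatically against a dependency board. A minimal Python sketch, assuming a hypothetical `Dependency` record with a subset of the board columns; the sample dependency is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Dependency:
    # Hypothetical record mirroring a few dependency board columns.
    name: str
    owner: Optional[str]
    fallback: Optional[str]
    escalation_date: date
    done: bool = False


def board_problems(dep: Dependency, today: date) -> List[str]:
    """List checklist violations for one open dependency."""
    if dep.done:
        return []
    problems = []
    if not dep.owner:
        problems.append("no accountable owner")
    if not dep.fallback:
        problems.append("no fallback plan")
    if today >= dep.escalation_date:
        problems.append("escalation date reached")
    return problems


dep = Dependency(
    name="auth token API",
    owner=None,
    fallback="static keys",
    escalation_date=date(2025, 5, 1),
)
print(board_problems(dep, date(2025, 5, 2)))
# ['no accountable owner', 'escalation date reached']
```

Judgment-based items, such as whether a fallback is credible, still need a human review; the sketch only surfaces missing or overdue entries.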

Org design and ownership

Org design is an architecture decision. Team boundaries determine communication paths, incentives, operational responsibility, and the cost of change.

Ownership model

| Ownership area | Definition | Required clarity |
| --- | --- | --- |
| Product ownership | Who decides what user outcome matters | Product goals, roadmap priority, success metrics |
| Technical ownership | Who decides implementation and architecture | Service boundaries, standards, technology choices |
| Operational ownership | Who responds when it breaks | On-call, runbooks, SLOs, incident authority |
| Data ownership | Who defines meaning, schema, access, and lifecycle | Data contracts, privacy, retention, migration authority |
| Platform ownership | Who provides reusable capabilities | API contracts, support model, adoption path, deprecation policy |

Team topology patterns

| Team type | Purpose | Leadership concern |
| --- | --- | --- |
| Stream-aligned team | Owns a user or business flow end to end | Give it enough autonomy and platform support |
| Platform team | Provides internal capabilities that reduce cognitive load | Treat platform as a product with adoption metrics |
| Enabling team | Helps other teams learn a capability | Avoid permanent dependency and measure skill transfer |
| Complicated subsystem team | Owns specialized technical domain | Protect expertise while preventing an ivory tower |

Ownership smell catalog

| Smell | Consequence | Fix |
| --- | --- | --- |
| Many teams can change a service but none operate it | Incidents become blame and delay | Assign operational owner and change authority |
| Platform team builds without product feedback | Low adoption and shadow tooling | Add product management, support intake, and adoption metrics |
| Domain data is owned by infrastructure | Business rules drift into technical layers | Move semantic ownership to domain team |
| Team owns code but not roadmap | Technical debt accumulates without funding | Align roadmap capacity with ownership responsibilities |
| Service has no deprecation owner | Dead paths remain forever | Add lifecycle owner and retirement process |

Org and architecture review questions

  • Does the proposed team structure reduce coordination on high-frequency work?
  • Does each team have the authority required for its accountability?
  • Are shared capabilities owned as products?
  • Are support and on-call responsibilities aligned with change authority?
  • Are domain boundaries reflected in data ownership?
  • Are experts enabling others or becoming permanent gatekeepers?
  • Does the org design make the desired architecture easier to evolve?

Communicating tradeoffs

Technical leaders communicate tradeoffs so stakeholders can make informed decisions. The goal is not to win a technical argument. The goal is to expose cost, risk, timing, reversibility, and uncertainty in language the audience can act on.

Tradeoff framing

| Technical concern | Stakeholder translation |
| --- | --- |
| Coupling | Future changes will require coordination across more teams |
| Latency | Users will wait longer or workflows will feel slower |
| Operational complexity | Incidents will be harder to diagnose and resolve |
| Migration risk | Existing customers or data may be affected during transition |
| Vendor lock-in | Future negotiation and exit options become weaker |
| Test coverage gap | We have less evidence that the change behaves correctly |
| Inconsistent patterns | Engineers will spend more time rediscovering local rules |
| Missing observability | We may not know quickly when the system is failing |

Tradeoff memo template

# Tradeoff memo: <topic>

## Decision needed

<The decision stakeholders must make.>

## Recommendation

<The recommended option and why.>

## Options

| Option | Benefit | Cost | Risk | Reversibility | Time impact |
| --- | --- | --- | --- | --- | --- |
| <option> | <benefit> | <cost> | <risk> | <high, medium, low> | <impact> |

## What we know

<Evidence and constraints.>

## What is uncertain

<Important unknowns and how we will reduce them.>

## Decision deadline

<Date or event that creates urgency.>

## Consequences of waiting

<What gets worse, better, or remains optional if no decision is made.>

Executive communication checklist

  • Start with the decision needed.
  • State the recommendation before the details.
  • Translate technical risk into business or operational impact.
  • Separate facts, assumptions, and opinions.
  • Name cost, timing, and reversibility.
  • Identify the decision deadline.
  • Explain what happens if no decision is made.
  • Avoid jargon unless the audience already uses it.
  • End with a clear ask.

Mentoring through standards

Mentoring scales when expectations are visible and reusable. A standard is a teaching tool when it explains the reason behind the rule and includes examples.

Good standards:

  • Encode lessons from incidents and reviews.
  • Explain the principle, not only the rule.
  • Include examples of acceptable and unacceptable patterns.
  • Are easy to apply during planning and review.
  • Are enforced by tooling where possible.
  • Have an exception process.
  • Are periodically retired or simplified.

Mentoring modes

| Mode | Use when | Output |
| --- | --- | --- |
| Pairing | The engineer needs live judgment transfer | Shared implementation and reasoning |
| Review | The work is mostly ready but needs quality feedback | Specific corrections and general principle |
| Design critique | The problem framing or boundary is unclear | Better options and decision criteria |
| Written standard | The same issue appears repeatedly | Durable guidance and examples |
| Office hours | Many teams need access to expertise | Faster routing and shared learning |
| Post-incident teaching | Production exposed a systemic gap | Updated standard, training, and checks |

Standard template

# Standard: <name>

## Purpose

<What risk this standard reduces or what capability it enables.>

## Applies to

<Systems, teams, change types, or contexts.>

## Rule

<The expected practice.>

## Rationale

<Why the rule exists.>

## Examples

### Good

<Concrete acceptable pattern.>

### Avoid

<Concrete pattern that creates risk.>

## Exceptions

<Who can approve exceptions, what evidence is required, and when to revisit.>

## Enforcement

<Review checklist, lint rule, CI check, template, or readiness gate.>

Mentoring checklist for staff-plus engineers

  • Give the principle behind feedback.
  • Distinguish blocker, recommendation, and preference.
  • Ask the engineer to explain the tradeoff back.
  • Turn repeated comments into a standard or tool.
  • Delegate decisions when risk is contained.
  • Preserve team ownership after giving advice.
  • Follow up on outcomes, not only implementation.
  • Publicly document patterns so access to judgment is not relationship-based.

Examples

Example: service extraction decision

| Dimension | Assessment |
| --- | --- |
| Problem | A domain inside a monolith changes frequently and blocks unrelated releases. |
| Constraint | The team does not yet have mature service observability or on-call experience. |
| Option A | Keep module in monolith and improve internal boundaries. |
| Option B | Extract service immediately. |
| Option C | Create an internal module boundary, add ownership and metrics, then extract after readiness criteria are met. |
| Recommendation | Option C, because it reduces coordination cost while avoiding premature distributed-system complexity. |
| Quality bar | Clear domain API, migration plan, telemetry, rollback, service readiness checklist. |
| Revisit trigger | If the module boundary still blocks release flow after two planning cycles, restart extraction review. |

Example: platform adoption problem

| Symptom | Diagnosis | Leadership action |
| --- | --- | --- |
| Teams bypass the deployment platform | The platform optimizes central control more than team flow | Interview consumers, measure friction, simplify paved road, publish escape hatch |
| Platform team says teams are noncompliant | Standards may be unclear or costly to follow | Turn requirements into templates, automation, and migration support |
| Executives see inconsistent delivery | Platform value is not tied to business outcomes | Report lead time, failed deployment rate, adoption, and support load |

Example: incident to standard

| Incident finding | Systemic standard |
| --- | --- |
| Alert fired without actionable context | Every page must include service, symptom, likely causes, dashboard, and runbook link |
| Rollback required manual database repair | High-risk migrations require preflight, compatibility window, rollback decision point, and owner |
| Customer impact was discovered through support | Critical user journeys require synthetic checks or direct product telemetry |
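
A standard such as the alert-context rule becomes enforceable once it is expressed as a check. A minimal sketch, assuming alert pages are represented as dictionaries; the field names and the example URL are hypothetical and not tied to any particular alerting tool.

```python
# Fields taken from the standard: service, symptom, likely causes,
# dashboard, and runbook link. The key names are assumptions of this sketch.
REQUIRED_PAGE_FIELDS = ("service", "symptom", "likely_causes", "dashboard", "runbook")


def missing_page_fields(page):
    """Return required fields that are absent or empty in an alert page payload."""
    return [field for field in REQUIRED_PAGE_FIELDS if not page.get(field)]


page = {
    "service": "checkout-api",
    "symptom": "p99 latency above SLO",
    "likely_causes": [],  # present but empty, so it still counts as missing
    "dashboard": "https://example.com/d/checkout",
}
print(missing_page_fields(page))
# ['likely_causes', 'runbook']
```

A check like this could run in CI against alert definitions, which moves the standard from a review comment into tooling.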

Templates

Architecture review agenda

# Architecture review: <topic>

## Desired decision

<Approve, reject, request changes, or identify missing evidence.>

## Context

<Problem, constraints, user impact, business impact.>

## Options

<Options and tradeoffs.>

## Risk areas

- Ownership:
- Data:
- Security:
- Reliability:
- Cost:
- Migration:
- Operations:

## Decision

<Decision and rationale.>

## Follow-up

| Action | Owner | Date | Evidence |
| --- | --- | --- | --- |
| <action> | <owner> | <date> | <evidence> |

Production readiness checklist

  • Owning team is named.
  • On-call or support path is defined.
  • SLO or service objective is documented.
  • Dashboards show user impact and system health.
  • Alerts are actionable and routed.
  • Runbook covers common failure modes.
  • Rollback or mitigation path is tested.
  • Capacity assumptions are documented.
  • Security and access controls are reviewed.
  • Data backup, retention, and deletion requirements are satisfied.
  • Launch plan includes staged rollout and stop criteria.
  • Post-launch review date is scheduled.

Technical initiative one-pager

# Initiative: <name>

## Why now

<The constraint, opportunity, or risk.>

## Outcome

<Observable result.>

## Scope

<Included work.>

## Non-scope

<Explicit exclusions.>

## Owners

Business owner:
Technical owner:
Operational owner:

## Dependencies

| Dependency | Owner | Needed by | Fallback |
| --- | --- | --- | --- |
| <dependency> | <owner> | <date> | <fallback> |

## Risks

| Risk | Mitigation | Trigger |
| --- | --- | --- |
| <risk> | <mitigation> | <trigger> |

## Evidence of success

<Metrics, adoption, reliability, cost, or delivery signal.>

Weekly technical leadership review

# Weekly technical leadership review

Date: <YYYY-MM-DD>

## Reliability

- Incidents:
- Error budget or SLO concerns:
- Follow-ups at risk:

## Delivery

- Commitments at risk:
- Dependency blockers:
- Scope changes:

## Architecture

- Decisions needed:
- Reviews scheduled:
- Boundary or ownership concerns:

## Quality

- Repeated review findings:
- Standards needing updates:
- Tooling gaps:

## People and leverage

- Mentoring opportunities:
- Decisions to delegate:
- Bottlenecks to remove:

Leadership failure modes

| Failure mode | How it appears | Countermeasure |
| --- | --- | --- |
| Hero architecture | One senior person must approve everything | Publish criteria, delegate low-risk decisions, mentor reviewers |
| Strategy theater | Strategy exists but does not affect planning | Tie roadmap intake and review gates to strategic bets |
| Local optimization | Teams improve their area while system complexity rises | Use cross-team architecture and dependency reviews |
| Quality drift | Standards are known but not enforced | Add automation, templates, readiness checks, and review calibration |
| Meeting gravity | Cadence grows without decisions | Require owner, decision, date, or remove the meeting |
| Debt laundry list | Every annoyance competes for attention | Rank debt by risk, cost of delay, and strategic constraint |
| Governance drag | Review slows teams without improving outcomes | Measure review latency and narrow required review thresholds |
| Mentoring by proximity | Only favored engineers receive context | Write standards, run office hours, rotate review exposure |