Chaos Engineering Designer
Designs controlled chaos experiments with steady-state hypotheses and blast radius controls. Use when planning failure injection, testing resilience assumptions, or running game days. Chaos Monkey, Litmus, Gremlin, fault tolerance.
Design controlled failure experiments that produce actionable learning.
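As an illustration of how a steady-state hypothesis and a blast radius control can work together, here is a minimal sketch. It assumes a single metric (for example a request success rate) and a fleet of interchangeable hosts. The names SteadyStateHypothesis, BlastRadius, run_experiment, and the staged escalation plan are hypothetical and are not part of Chaos Monkey, Litmus, or Gremlin. The experiment first confirms steady state, then widens fault injection one stage at a time and stops at the first guardrail that trips.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class SteadyStateHypothesis:
    # Hypothetical: the named metric must stay within [low, high] throughout the experiment.
    metric: str
    low: float
    high: float

    def holds(self, observed: float) -> bool:
        return self.low <= observed <= self.high


@dataclass
class BlastRadius:
    # Hypothetical guardrail: never target more than max_fraction of the fleet.
    max_fraction: float

    def allows(self, targeted: int, fleet_size: int) -> bool:
        return fleet_size > 0 and targeted / fleet_size <= self.max_fraction


def run_experiment(
    hypothesis: SteadyStateHypothesis,
    radius: BlastRadius,
    fleet_size: int,
    stages: List[int],
    observe: Callable[[int], float],
) -> Tuple[str, Optional[int]]:
    """Escalate fault injection stage by stage and stop at the first tripped guardrail.

    stages: number of hosts to target at each step, smallest first.
    observe: returns the metric while `targeted` hosts are faulted (0 = baseline).
    Returns the outcome and the largest stage that completed safely.
    """
    # Confirm steady state before injecting anything; otherwise results are meaningless.
    if not hypothesis.holds(observe(0)):
        return ("aborted: steady state not met before injection", None)
    last_safe: Optional[int] = None
    for targeted in stages:
        # Check the blast radius before injecting, not after.
        if not radius.allows(targeted, fleet_size):
            return ("aborted: blast radius exceeded", last_safe)
        if not hypothesis.holds(observe(targeted)):
            return ("hypothesis refuted", last_safe)
        last_safe = targeted
    return ("hypothesis held", last_safe)
```

Returning the last safe stage, not just pass or fail, is a deliberate choice: a refuted hypothesis at five hosts that held at two is itself actionable learning about where resilience breaks down.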
Incident Postmortem Writer
Writes structured, blameless postmortems with timelines and root cause analysis. Use when documenting a resolved incident, conducting 5-Whys analysis, or generating action items from outage data. Incident review, systemic gaps, error budget.
Turn chaotic incident recollections into clear, actionable postmortems that prevent recurrence.
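A postmortem timeline becomes more useful when it yields comparable intervals across incidents. The sketch below is one way to derive them, assuming the timeline records four milestones as ISO 8601 timestamps. The milestone names, the function timeline_durations, and the convention that customer impact ends at mitigation are all hypothetical choices, not a standard template.

```python
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical milestone names, in the order they must occur.
MILESTONES = ("started", "detected", "mitigated", "resolved")


def timeline_durations(events: Dict[str, str]) -> Dict[str, timedelta]:
    """Derive standard intervals from ISO 8601 timestamps for each milestone."""
    t = {name: datetime.fromisoformat(events[name]) for name in MILESTONES}
    # Reject out-of-order timelines; they usually mean a mis-copied timestamp.
    for earlier, later in zip(MILESTONES, MILESTONES[1:]):
        if t[later] < t[earlier]:
            raise ValueError(f"{later} precedes {earlier}")
    return {
        "time_to_detect": t["detected"] - t["started"],
        "time_to_mitigate": t["mitigated"] - t["detected"],
        # Assumption: customer impact ends at mitigation, not full resolution.
        "customer_impact": t["mitigated"] - t["started"],
        "time_to_resolve": t["resolved"] - t["started"],
    }
```

A long time_to_detect relative to time_to_mitigate points at a monitoring gap rather than a response gap, which is the kind of systemic finding a blameless review should surface.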
Disaster Recovery Planner
Designs disaster recovery plans with RTO/RPO targets and failover architecture. Use when planning for regional outages, choosing between active-active and warm standby, or scheduling DR drills. Failover, business continuity, multi-region.
Design DR plans as if the disaster will happen on a Friday evening when the senior engineer is on vacation. Every procedure must be executable by the least experienced person on the on-call rotation.
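RTO and RPO targets only mean something if each DR drill is scored against them. The sketch below shows one such check, assuming the drill records when the primary failed, when service was restored, and the timestamp of the last write known to have reached the standby. The names DrTargets and evaluate_drill are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Union


@dataclass
class DrTargets:
    rto: timedelta  # Recovery Time Objective: max tolerable time until service is restored
    rpo: timedelta  # Recovery Point Objective: max tolerable window of lost writes


def evaluate_drill(
    targets: DrTargets,
    failure_at: datetime,
    restored_at: datetime,
    last_replicated_write_at: datetime,
) -> Dict[str, Union[timedelta, bool]]:
    """Compare what a DR drill actually achieved against the plan's targets."""
    # Achieved RTO: from the moment the primary failed until traffic was served again.
    achieved_rto = restored_at - failure_at
    # Achieved RPO: writes after the last replicated one are assumed lost.
    achieved_rpo = failure_at - last_replicated_write_at
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= targets.rto,
        "rpo_met": achieved_rpo <= targets.rpo,
    }
```

For example, with a one-hour RTO and a five-minute RPO, a drill that fails at 18:00, restores at 19:20, and last replicated at 17:58 meets the RPO but misses the RTO by twenty minutes, which is the signal to simplify the failover runbook.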
On-Call Process Designer
Designs sustainable on-call systems with rotation schedules, escalation policies, and handoff procedures. Use when formalizing an ad-hoc on-call rotation, addressing burnout from uneven page distribution, or setting up follow-the-sun coverage. PagerDuty, Opsgenie, compensation, page budgets.
Design on-call systems where engineers can have a life outside work while still being available when production genuinely needs them.
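Two mechanics behind a sustainable rotation are an even schedule and a page budget that flags overload. The sketch below assumes weekly shifts, a round-robin order, and a convention where the secondary is the next engineer in the list; weekly_rotation and over_page_budget are hypothetical helpers, not PagerDuty or Opsgenie APIs.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def weekly_rotation(engineers: List[str], start: date, weeks: int) -> List[Tuple[date, str, str]]:
    """Round-robin weekly shifts; the secondary is the next engineer in the list."""
    if len(engineers) < 2:
        raise ValueError("primary/secondary coverage needs at least two engineers")
    schedule = []
    for week in range(weeks):
        primary = engineers[week % len(engineers)]
        secondary = engineers[(week + 1) % len(engineers)]
        schedule.append((start + timedelta(weeks=week), primary, secondary))
    return schedule


def over_page_budget(pages: Dict[str, int], budget: int) -> List[str]:
    """Engineers whose page count for the period exceeds the agreed budget."""
    return sorted(name for name, count in pages.items() if count > budget)
```

Each engineer serves as primary once and as secondary once per cycle, so coverage load is even by construction; the page budget then catches the uneven distribution that an even schedule alone cannot prevent.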
Incident Playbook Designer
Creates incident response playbooks with severity classifications and communication templates. Use when designing runbooks for specific failure modes, defining escalation triggers, or standardizing incident communication. MTTR, triage, status page.
Design playbooks that a stressed, sleep-deprived engineer can follow at 3 AM. Every step must be concrete and verifiable. "Check the database" is not a step.
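Severity classification holds up at 3 AM when it depends on measurable facts rather than judgment. The sketch below encodes one such rule set. The thresholds (25% and 1% of users), the three-level scale, and the RESPONSE policy mapping severity to paging and status page update intervals are hypothetical values chosen for illustration; a real playbook would set its own.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = 1  # critical: broad outage or data loss
    SEV2 = 2  # major: noticeable customer impact
    SEV3 = 3  # minor: limited or internal impact


# Hypothetical policy: who is engaged, and status page update interval in minutes.
RESPONSE = {
    Severity.SEV1: ("page incident commander and on-call primary", 30),
    Severity.SEV2: ("page on-call primary", 60),
    Severity.SEV3: ("open ticket for next business day", None),
}


def classify(users_affected_pct: float, data_loss: bool, customer_facing: bool) -> Severity:
    """Map measurable impact to a severity so triage is not a judgment call."""
    if data_loss or users_affected_pct >= 25:
        return Severity.SEV1
    if customer_facing and users_affected_pct >= 1:
        return Severity.SEV2
    return Severity.SEV3
```

Because every input is a yes/no fact or a number read from a dashboard, two responders looking at the same incident arrive at the same severity and therefore the same escalation and communication cadence.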