Testland

qa-resilience-drills

Resilience drills: 6 skills (backup-verification-author, dr-drill-runner, error-budget-tests, mttr-mtbf-tracker, restore-time-tests, slo-negotiation-prep) and 2 agents (dr-drill-orchestrator, reliability-review-agent). Production-grade DR drills + backup verification + RTO + error-budget gating + incident metrics.

Part of role bundle: qa-role-performance

Production-grade resilience discipline - DR drills, backup verification, restore-time SLAs, error budgets, MTTR/MTBF tracking. Distinct from qa-chaos (experiment-authoring) - this plugin covers measured, scheduled drills + the metrics they feed.

Components

Type | Name | Description
--- | --- | ---
Skill | dr-drill-runner | Per-tier RTO + RPO; pre-drill checklist; drill workflow (announce → fail-over → verify → fail-back → cleanup); post-drill report; cold/warm/hot tier-specific patterns; cadence (monthly/quarterly/annual)
Skill | backup-verification-author | Per-backup-type integrity (SHA-256 + signature); restore-to-test-env spot check; partial restore; cross-region replication SLA; retention-policy verification; encryption + key recovery
Skill | restore-time-tests | Time-to-functional (TTF) segments; baseline timed restore; parallel-restore optimization; PITR latency; partial object-store restore; trend tracking; cold-start latency
Skill | error-budget-tests | SLI calculation; budget consumption; multi-window multi-burn-rate alerting; freeze trigger when the budget is exhausted; rolling-window reset; weekly stakeholder reporting
Skill | mttr-mtbf-tracker | Per-incident schema (detected/acknowledged/mitigated/resolved); MTTD / MTTA / MTTR / MTBF formulae; ITIL alignment; postmortem integration; mitigation vs resolution distinction
Skill | slo-negotiation-prep | Build-an-X prep pack for the QA / SRE / Product SLO conversation: current error-budget consumption + MTTR/MTBF trend + framed decision question + 3-5 option matrix (impact / reversibility / stakeholder cost) + recommended posture with cited alternatives
Agent | dr-drill-orchestrator | Executes a planned DR drill end to end: pre-drill checklist, failover, RTO/RPO monitor, fail-back, post-drill report
Agent | reliability-review-agent | Composes error-budget burn + MTTR/MTBF into a weekly manager-facing reliability review narrative
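The multi-window multi-burn-rate alerting listed for error-budget-tests can be sketched as below. This is a minimal illustration, not the plugin's implementation: the `BurnRule` type, `evaluate` function, and the window/threshold pairs are assumptions, taken from the values commonly suggested for a 30-day SLO window (14.4x over 1h/5m and 6x over 6h/30m page; 1x over 3d/6h tickets). An alert fires only when both the long and the short window exceed the threshold, which filters out brief spikes and clears quickly once the burn stops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    long_window: str
    short_window: str
    threshold: float
    severity: str


# Assumed thresholds for a 30-day SLO window; tune per service.
RULES = [
    BurnRule("1h", "5m", 14.4, "page"),
    BurnRule("6h", "30m", 6.0, "page"),
    BurnRule("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """Budget spend speed: 1.0 means the budget lasts exactly one SLO window."""
    return error_ratio / (1.0 - slo)


def evaluate(error_ratios: dict[str, float], slo: float) -> list[str]:
    """Return the severity of every rule whose long AND short windows both burn too fast."""
    fired = []
    for rule in RULES:
        long_br = burn_rate(error_ratios[rule.long_window], slo)
        short_br = burn_rate(error_ratios[rule.short_window], slo)
        if long_br > rule.threshold and short_br > rule.threshold:
            fired.append(rule.severity)
    return fired
```

For a 99.9% SLO, a 2% error ratio over the last hour is a burn rate of roughly 20, so the first rule pages as long as the last 5 minutes are also above 14.4x.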

Install

/plugin marketplace add testland/qa
/plugin install qa-resilience-drills@testland-qa

Skills

backup-verification-author

Author a backup-verification harness: per-backup-type integrity checks (SHA-256 / encrypted-payload signature), restore-to-test-env spot-check cadence, partial-restore (single-table / single-object) verification, cross-region replication validation, and retention-policy assertions. "An untested backup is not a backup."
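The SHA-256 integrity check is the simplest piece of such a harness. The sketch below assumes each backup artifact has a digest recorded in a manifest at backup time; the `sha256_file` and `verify_backup` names are hypothetical. Signature verification of encrypted payloads and the restore-to-test-env step are out of scope here.

```python
import hashlib
import hmac
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # stream 1 MiB at a time so large dumps never sit fully in memory


def sha256_file(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, manifest_digest: str) -> bool:
    """True if the artifact still matches the digest recorded when it was written."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_file(path), manifest_digest.lower())
```

A mismatch means the artifact changed after it was written (bit rot, truncated upload, tampering) and the backup should be treated as unusable until a restore proves otherwise.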

dr-drill-runner

Author and execute a single DR drill for one service: the runbook (per-tier RTO + RPO), the pre-drill checklist (data sync state, alert silencing, customer comms), the timestamped drill workflow (announce, fail-over, verify the standby, fail-back, cleanup), and an auditor-ready post-drill report. Follows the Google Cloud DR planning guide and covers cold / warm / hot standby tier-specific patterns. To coordinate drills across multiple services or teams, use dr-drill-orchestrator.
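Timestamping each step is what lets the post-drill report state measured RTO and RPO against the tier targets. The sketch below is an assumption about how that could look, not the skill's actual format: the `DrillLog` class and the event names are hypothetical. It measures RTO as fail-over start until the standby is verified serving, and RPO as the gap between the last write confirmed on the standby and fail-over start.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical event names marking the boundaries of
# announce -> fail-over -> verify -> fail-back -> cleanup.
EVENTS = ("announced", "failover_started", "standby_verified",
          "failback_done", "cleanup_done")


@dataclass
class DrillLog:
    rto_target: timedelta
    rpo_target: timedelta
    last_replicated_write: datetime  # newest write confirmed on the standby
    stamps: dict[str, datetime] = field(default_factory=dict)

    def mark(self, event: str, at: datetime) -> None:
        """Record an event, enforcing workflow order and non-decreasing time."""
        if len(self.stamps) == len(EVENTS):
            raise ValueError("drill already complete")
        expected = EVENTS[len(self.stamps)]
        if event != expected:
            raise ValueError(f"expected {expected!r}, got {event!r}")
        if self.stamps and at < max(self.stamps.values()):
            raise ValueError("timestamps must not go backwards")
        self.stamps[event] = at

    def measured_rto(self) -> timedelta:
        # Outage window: fail-over start until the standby is verified serving.
        return self.stamps["standby_verified"] - self.stamps["failover_started"]

    def measured_rpo(self) -> timedelta:
        # Data-loss window: last replicated write until fail-over start.
        return self.stamps["failover_started"] - self.last_replicated_write

    def report(self) -> dict[str, bool]:
        return {
            "rto_met": self.measured_rto() <= self.rto_target,
            "rpo_met": self.measured_rpo() <= self.rpo_target,
        }
```

Enforcing the event order in `mark` keeps a skipped step (for example, fail-back recorded before verification) from silently producing a plausible-looking RTO.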

error-budget-tests

Build error-budget gate tests: SLO + error-budget calculation per the Google SRE workbook ("difference between target uptime and actual uptime"), burn-rate alerting, a monthly-budget exhaustion test, and a freeze trigger when the budget is consumed. Based on the sre.google "Embracing Risk" reference.
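As a rough illustration of the budget math and the freeze trigger, here is a request-based sketch: the allowed number of bad events is `(1 - SLO) * total`, and the gate trips once the bad events reach that allowance. The `error_budget` function and `BudgetStatus` type are hypothetical; a time-based SLO would use minutes in the window instead of request counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    sli: float                # fraction of good events in the window
    allowed_bad: float        # error budget expressed in events
    consumed_fraction: float  # 1.0 means the budget is fully spent
    freeze: bool


def error_budget(good: int, total: int, slo: float) -> BudgetStatus:
    if total <= 0:
        raise ValueError("need at least one event in the window")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction such as 0.999")
    sli = good / total
    allowed_bad = (1.0 - slo) * total
    consumed = (total - good) / allowed_bad
    # Small tolerance absorbs float rounding in (1 - slo), e.g. 1 - 0.999.
    return BudgetStatus(sli, allowed_bad, consumed, freeze=consumed >= 1.0 - 1e-9)
```

With a 99.9% SLO over one million requests the budget is about 1,000 bad requests; 600 failures consume 60% of it, and 1,000 or more trips the freeze.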

mttr-mtbf-tracker

Reference for tracking MTTR (Mean Time To Recovery) / MTBF (Mean Time Between Failures) / MTTD (Mean Time To Detection) / MTTA (Mean Time To Acknowledge): incident-record schema, calculation formulae, dashboards-as-code, and target-vs-actual alerting. Aligns with ITIL incident management, ISO/IEC 20000, and the Google SRE incident response chapter.
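A minimal sketch of the per-incident schema and the formulae, under stated assumptions: the field names are hypothetical, MTTR is measured to mitigation (user impact stopped) with time-to-resolve reported separately, and MTBF is computed as total uptime in the observation window divided by the number of incidents. Teams define these intervals differently, so the definitions should be pinned down in the schema rather than inferred.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    started: datetime       # failure began (often backfilled from the postmortem)
    detected: datetime
    acknowledged: datetime
    mitigated: datetime     # user impact stopped
    resolved: datetime      # underlying cause fixed


def _mean(deltas: Iterable[timedelta]) -> timedelta:
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def incident_metrics(incidents: list[Incident], window_start: datetime,
                     window_end: datetime) -> dict[str, timedelta]:
    if not incidents:
        raise ValueError("no incidents in window")
    downtime = sum((i.mitigated - i.started for i in incidents), timedelta())
    uptime = (window_end - window_start) - downtime
    return {
        "MTTD": _mean(i.detected - i.started for i in incidents),
        "MTTA": _mean(i.acknowledged - i.detected for i in incidents),
        "MTTR": _mean(i.mitigated - i.started for i in incidents),
        "MTTResolve": _mean(i.resolved - i.started for i in incidents),
        "MTBF": uptime / len(incidents),
    }
```

Keeping `mitigated` and `resolved` as separate fields is what makes the mitigation vs resolution distinction measurable instead of a matter of interpretation.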

restore-time-tests

Build restore-time SLA tests: per-database and per-object-store baseline measurement, RTO objective verification, parallel-restore optimization tests, and point-in-time-recovery (PITR) latency. Asserts that `time-to-functional` (TTF) stays at or below the documented RTO, and flags silent regressions when restore time grows over months.
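The two checks described above, a hard RTO bound and a slow-growth regression flag, could be combined roughly as follows. The `check_restore` function and its `window` and `max_growth` parameters are hypothetical; comparing the median of the oldest runs with the median of the newest runs is one simple trend test among many, chosen here because medians shrug off a single noisy restore.

```python
from statistics import median


def check_restore(history_s: list[float], rto_s: float,
                  window: int = 5, max_growth: float = 0.20) -> list[str]:
    """Return findings for timed restores (oldest first); empty means healthy."""
    if not history_s:
        raise ValueError("need at least one timed restore")
    findings = []
    latest = history_s[-1]
    if latest > rto_s:
        findings.append(f"latest TTF {latest:.0f}s exceeds RTO {rto_s:.0f}s")
    # Trend check only once there are enough runs for two non-overlapping windows.
    if len(history_s) >= 2 * window:
        baseline = median(history_s[:window])
        recent = median(history_s[-window:])
        if recent > baseline * (1 + max_growth):
            findings.append(f"median TTF grew {recent / baseline - 1:.0%} vs baseline")
    return findings
```

A restore that still meets the RTO but has grown a third slower since the baseline is exactly the silent regression the second check is meant to surface.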

slo-negotiation-prep

Build-an-X workflow that produces the manager's prep pack for the QA / SRE / Product SLO conversation: current error-budget consumption, the MTTR/MTBF trend, a single framed decision question, an explicit 3-5 option matrix scored on impact / reversibility / stakeholder cost, and a recommended posture with cited alternatives. Distinct from `error-budget-tests`, which computes the SLI / SLO / budget math that this skill consumes, and from `mttr-mtbf-tracker`, the pure-reference incident schema whose per-incident metrics this skill consumes. Use it when the budget is burning or a proposed change will stress the SLO. The output is the evidence pack the manager carries into the meeting, not a recommendation about which option to pick.
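The option matrix is the most structured part of the pack. A minimal sketch, assuming 1-5 scores on each axis (the `Option` type, `render_matrix` function, and scoring scale are hypothetical): it enforces the 3-5 option rule and renders the options side by side without ranking them, so the choice stays with the people in the room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    impact: int            # 1 (low) .. 5 (high) effect on the SLO / budget
    reversibility: int     # 1 (one-way door) .. 5 (trivially reversible)
    stakeholder_cost: int  # 1 (cheap) .. 5 (expensive for some stakeholder)


def render_matrix(question: str, options: list[Option]) -> str:
    """Render the framed question plus an unranked option table."""
    if not 3 <= len(options) <= 5:
        raise ValueError("the prep pack expects 3-5 options")
    for o in options:
        for score in (o.impact, o.reversibility, o.stakeholder_cost):
            if not 1 <= score <= 5:
                raise ValueError(f"{o.name}: scores must be 1-5")
    rows = [f"Decision: {question}",
            "Option | Impact | Reversibility | Stakeholder cost"]
    # Keep the author's order; sorting by any one column would imply a recommendation.
    rows += [f"{o.name} | {o.impact} | {o.reversibility} | {o.stakeholder_cost}"
             for o in options]
    return "\n".join(rows)
```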