qa-okr-author
Build-an-X workflow that drafts a QA team's quarterly OKR set - one to three Objectives, each with 3 - 5 measurable Key Results - from the team's current state (risk matrix, defect-trend narrative, test-run history, test-pyramid balance, compliance coverage). Every numeric target cites its source artifact (e.g., `defect-trend-narrator` 2026-Q1 escape rate). QA-specific by design - generic OKR generators (Tability, Asana, ClickUp) don't know test metrics; the differentiation is the domain. Distinct from `test-strategy-author` (which authors the strategy doc) and from `risk-matrix-recommender` (which calibrates risk inputs). Use at the start of each quarter to draft the OKR set the manager edits and the team commits to.
Overview
The QA manager opens a blank document at quarter-start to draft OKRs. Generic OKR templates (Tability, Asana, ClickUp) all share the same flaw: they don't know what to measure for a QA team. "Improve quality" is not an Objective; "achieve 95% pass rate" is not a Key Result without a baseline and a documented method. This skill produces a draft anchored on the team's actual current-state data, with every numeric target citing the artifact it came from.
Per the canonical OKR framework (Doerr 2018, Measure What Matters; Grove's original formulation at Intel), each Objective is "a significant, concrete, clearly defined goal that is inspirational," and each Key Result is "measurable success criteria using 0 - 100% scales or numerical values with no ambiguity." Doerr recommends that organizations target roughly a 70% success rate on stretch goals; consistently hitting 100% means the OKRs need to be more aspirational. This skill therefore emits two kinds of KR: Committed (grading target 1.0, binary outcomes) and Aspirational (grading target 0.7, stretch).
When to use
Do not use this skill when:
Step 1 - Capture the inputs
Required:
| Input | Source | Why load-bearing |
|---|---|---|
| Quarterly objective(s) | Manager-provided; aligned with engineering / product OKRs | The skill drafts KRs under objectives the team owns; it won't invent strategic direction |
| Current-state metrics | At least one of: risk-matrix output, defect-trend-narrator recent report, test-run-summary-author cross-run-trend, test-pyramid-balancer audit | Every KR needs a baseline - without it, the target is unanchored |
| Time horizon | Quarterly (default) or other (semi-annual) | OKR cadence; per Doerr, quarterly is the canonical rhythm |
| Prior OKR set | If exists; the prior quarter's KRs + their grading | Continuity: drift from prior commitments is itself a signal |
The skill halts with MISSING_BASELINE (supply ≥1 current-state metric source) if no measurable input is offered.
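The halt condition above can be sketched as a small input gate. This is a minimal illustration, not the skill's actual implementation: `OkrInputs`, `validate_inputs`, and `MissingBaselineError` are hypothetical names introduced here, and only the `MISSING_BASELINE` message text comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class MissingBaselineError(Exception):
    """Hypothetical error type for the MISSING_BASELINE halt."""


@dataclass
class OkrInputs:
    # Field names are illustrative; they mirror the rows of the input table.
    objectives: List[str]
    metric_sources: List[str] = field(default_factory=list)
    horizon: str = "quarterly"
    prior_okr_set: Optional[dict] = None


def validate_inputs(inputs: OkrInputs) -> OkrInputs:
    """Return the inputs unchanged, or halt if a required input is missing."""
    if not inputs.objectives:
        # The skill drafts KRs under manager-owned objectives; it does not invent them.
        raise ValueError("at least one manager-provided objective is required")
    if not inputs.metric_sources:
        raise MissingBaselineError(
            "MISSING_BASELINE (supply >=1 current-state metric source)"
        )
    return inputs
```

A call such as `validate_inputs(OkrInputs(objectives=["Reduce escape-defect rate"]))` would raise, because no current-state metric source was supplied.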
Step 2 - Walk the QA-OKR shape catalog
Five canonical QA Objective shapes the skill recognises (catalog, not prescription). Each maps to a measurable KR family. The manager picks 1 - 3; the skill drafts the KRs.
Shape 1 - Strengthen the test pyramid
Anchored on test-pyramid-balancer. Used when the suite is E2E-heavy and shifting weight downward improves cycle time + maintainability.
| KR axis | Example KR | Baseline source |
|---|---|---|
| Layer ratio | unit:integration:E2E reaches 70:20:10 | current ratio per test-pyramid-balancer |
| Cycle time | regression suite duration < 45 min per shard | current per test-run-summary-author |
| E2E suite budget | E2E test count ≤ 200, growth rate ≤ 5/quarter | e2e-suite-budget |
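To make the layer-ratio KR concrete, the arithmetic can be sketched as below. The test counts are hypothetical, chosen only so that they reproduce a 41:14:45 split; `layer_ratio` and `gap_to_target` are illustrative helpers, not part of `test-pyramid-balancer`.

```python
from typing import Tuple


def layer_ratio(unit: int, integration: int, e2e: int) -> Tuple[int, ...]:
    """Express test counts per layer as rounded percentages of the whole suite."""
    total = unit + integration + e2e
    if total == 0:
        raise ValueError("suite has no tests")
    return tuple(round(100 * n / total) for n in (unit, integration, e2e))


def gap_to_target(current: Tuple[int, ...], target: Tuple[int, ...]) -> Tuple[int, ...]:
    """Percentage points each layer must move (positive = grow, negative = shrink)."""
    return tuple(t - c for c, t in zip(current, target))


# Hypothetical counts: 410 unit, 140 integration, 450 E2E tests.
current = layer_ratio(410, 140, 450)
gap = gap_to_target(current, (70, 20, 10))
```

With these counts the E2E layer has to shrink by 35 percentage points, which is the kind of gap that usually makes the ratio KR aspirational rather than committed.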
Shape 2 - Reduce escape-defect rate
Anchored on defect-trend-narrator. Used when production defects are above the team's tolerance.
| KR axis | Example KR | Baseline source |
|---|---|---|
| Volume | P1 escapes < 2/quarter; P2 escapes < 10/quarter | current per defect-trend-narrator quarterly report |
| Time-to-detect | MTTD on P1 < 4 hours | per mttr-mtbf-tracker |
| Category-specific | regression-class escapes -50% vs prior quarter | per defect-clusterer + defect-trend-narrator |
Shape 3 - Cut regression cycle time
Anchored on test-run-summary-author. Used when CI is the bottleneck.
| KR axis | Example KR | Baseline source |
|---|---|---|
| Wall-clock | regression suite < 60 min per shard, 4× parallel | test-run-summary-author |
| Parallelisation | sharding factor ≥ 8 with no shard >90 min | CI config + test-run-summary-author |
| CI cost | per-PR CI cost -30% via TIA | regression-suite-selector |
Shape 4 - Reduce flake-budget consumption
Anchored on ai-flake-detector + flaky-test-quarantine. Used when flake rate is above the team's tolerance. Flakiness is widespread and well documented: about 16% of tests at Google show some flakiness per the Google Testing Blog. For KR targets, a flake rate below 5% is aspirational and one under 10% is a reasonable committed bar.
| KR axis | Example KR | Baseline source |
|---|---|---|
| Quarantine ceiling | quarantine list ≤ 5 at any point | current per flaky-test-quarantine |
| Flake rate | flake rate < 3% of CI runs (vs 8% current baseline) | per ai-flake-detector weekly history |
| Repair velocity | mean time-to-repair on quarantined test < 5 days | per flaky-test-quarantine |
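The flake-rate bands mentioned for this shape (below 5% aspirational, under 10% committed) can be sketched as a simple classifier. The function names and the "too lenient" label are assumptions made for illustration; the run counts are hypothetical.

```python
def flake_rate_pct(flaky_runs: int, total_runs: int) -> float:
    """Share of CI runs that hit at least one flaky failure, as a percentage."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return 100.0 * flaky_runs / total_runs


def target_band(target_pct: float) -> str:
    """Classify a proposed flake-rate target using the bands named in the text."""
    if target_pct < 5.0:
        return "aspirational"
    if target_pct < 10.0:
        return "committed"
    return "too lenient"  # label is an assumption; the text names no third band


# Hypothetical history: 80 flaky runs out of 1000 gives the 8% example baseline.
baseline = flake_rate_pct(80, 1000)
band_for_3_pct = target_band(3.0)
```

Under these bands the example target of under 3% would be drafted as an Aspirational KR.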
Shape 5 - Close compliance evidence gaps
Anchored on compliance-readiness-reviewer. Used in regulated industries (healthcare, finance, automotive).
| KR axis | Example KR | Baseline source |
|---|---|---|
| Per-control coverage | SOC 2 Trust Service Criteria coverage ≥ 95% | compliance-readiness-reviewer |
| Evidence freshness | every control's evidence ≤ 90 days old | soc2-evidence-collector |
| Audit pass-rate | external audit findings ≤ 3, no high-severity | prior audit history |
Other Objective shapes are valid; these are the most-cited in QA-manager-facing literature.
Step 3 - Draft committed vs aspirational Key Results
Per the canonical framework, each KR is either Committed (target grading 1.0, binary outcomes the team promises to ship) or Aspirational (target grading 0.7, stretch goals where 70% completion is success).
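The two grading targets can be sketched as follows. The linear-progress score is an assumption introduced for illustration (the framework itself does not prescribe a formula here), and `kr_score` and `kr_met` are hypothetical helper names.

```python
def kr_score(baseline: float, target: float, actual: float) -> float:
    """Fraction of the baseline-to-target distance covered, clamped to [0, 1].

    Works for metrics that should fall (escapes, MTTD) and for ones that should rise.
    Assumption: linear progress between baseline and target.
    """
    if baseline == target:
        raise ValueError("target must differ from baseline")
    progress = (actual - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


def kr_met(kind: str, score: float) -> bool:
    """Committed KRs need a full 1.0; Aspirational KRs succeed at 0.7 or above."""
    threshold = {"committed": 1.0, "aspirational": 0.7}[kind]
    return score >= threshold


# MTTD moving from 11h toward a 4h target and landing at 6h covers 5/7 of the distance.
mttd_ok = kr_met("aspirational", kr_score(11, 4, 6))
# P1 escapes moving from 4 toward 2 and landing at 3 covers half the distance.
p1_ok = kr_met("committed", kr_score(4, 2, 3))
```

In this sketch the MTTD result counts as a success for an Aspirational KR, while the same partial progress on a Committed KR is a miss.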
The skill flags each KR explicitly:
## Objective 2 — Reduce escape-defect rate
**Quarter:** 2026-Q3 (Jul–Sep)
**Rationale:** The current quarterly P1 escape count is 4 (per `defect-trend-narrator` 2026-Q2 report, citing the `tracker-export-2026-Q2.json` rows matching `filter(severity=P1, found_in=production)`). Industry context: PractiTest 2026 finds 19.9% of teams use AI for risk identification; the team is below this. Escape rate is the team's primary revenue-tied quality metric.
### Key Results
| # | Type | KR | Baseline | Source |
|---|---|---|---|---|
| KR2.1 | **Committed** | P1 escapes reach ≤ 2/quarter | 4/quarter (2026-Q2) | `defect-trend-narrator` |
| KR2.2 | **Committed** | P2 escapes reach ≤ 8/quarter | 13/quarter (2026-Q2) | `defect-trend-narrator` |
| KR2.3 | **Aspirational** | MTTD on P1 reaches ≤ 4h (median) | 11h (2026-Q2) | `mttr-mtbf-tracker` |
| KR2.4 | **Aspirational** | Regression-class escapes fall 50% vs Q2 (18 → 9) | 18 (2026-Q2) | `defect-clusterer` + `defect-trend-narrator` |
### Scoring (per Doerr / Grove canon)
- Committed KRs (2.1, 2.2): grading target 1.0 — anything <1.0 is a miss.
- Aspirational KRs (2.3, 2.4): grading target 0.7 — 70% achievement is success.
### Risk if all KRs are committed
The team avoids risk by setting only committed KRs at safe levels. Per Doerr: "Consistently meeting 100% indicates OKRs need re-evaluation." Mix at least one aspirational KR per Objective.

Step 4 - Cite every numeric target
The skill refuses to emit a KR target without citing the baseline. The output's audit appendix is the load-bearing artifact that lets the team verify the draft is grounded:
### Audit (sources for every numeric target)
| KR | Target | Baseline | Source artifact / query |
|---|---|---|---|
| KR1.1 | unit:integration:E2E = 70:20:10 | 41:14:45 | `test-pyramid-balancer` 2026-Q2 output |
| KR1.2 | regression duration < 45 min/shard | 67 min/shard | `test-run-summary-author cross-run-trend` 2026-Q2 |
| KR2.1 | P1 escapes ≤ 2/quarter | 4/quarter | `defect-trend-narrator` filter(severity=P1, found_in=production, window=2026-Q2) |
| KR2.3 | MTTD P1 ≤ 4h median | 11h median | `mttr-mtbf-tracker` per-incident log, 2026-Q2 |
| KR3.1 | flake rate < 3% of runs | 8% | `ai-flake-detector` weekly export 2026-Q2 |
| KR3.2 | quarantine list ≤ 5 | 11 | `flaky-test-quarantine` snapshot 2026-06-30 |

If a baseline is not retrievable, the KR is flagged [BASELINE_NEEDED] in the draft and excluded from the committed set until the team supplies the data.
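A minimal sketch of that baseline gate, assuming a simple in-memory representation of draft KRs. `DraftKr` and `gate_on_baseline` are hypothetical names, and `KR9.9` is an invented entry used only to show the flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DraftKr:
    kr_id: str
    kind: str                       # "committed" or "aspirational"
    target: str
    baseline: Optional[str] = None  # None means the baseline was not retrievable
    source: Optional[str] = None    # artifact or query the baseline came from


def gate_on_baseline(krs: List[DraftKr]) -> Tuple[List[DraftKr], List[str]]:
    """Return (committed set, draft lines); ungrounded KRs are flagged and held back."""
    committed_set: List[DraftKr] = []
    draft_lines: List[str] = []
    for kr in krs:
        grounded = bool(kr.baseline and kr.source)
        flag = "" if grounded else " [BASELINE_NEEDED]"
        draft_lines.append(f"{kr.kr_id} ({kr.kind}): {kr.target}{flag}")
        if grounded and kr.kind == "committed":
            committed_set.append(kr)
    return committed_set, draft_lines


krs = [
    DraftKr("KR2.1", "committed", "P1 escapes <= 2/quarter", "4/quarter", "defect-trend-narrator"),
    DraftKr("KR9.9", "committed", "hypothetical KR with no retrievable baseline"),
]
committed_set, draft_lines = gate_on_baseline(krs)
```

Only `KR2.1` reaches the committed set here; the ungrounded entry stays visible in the draft with its flag so the team can see what data is still owed.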
Step 5 - Cross-check against organisational alignment
OKRs are not authored in isolation. The skill emits an alignment check section the manager fills before committing:
### Alignment check
| Layer | OKR or theme | This QA OKR set's contribution |
|---|---|---|
| Company quarterly theme | "Reduce mean revenue-affecting incident cost" | Objective 2 (escape rate) directly contributes |
| Engineering OKR | "Cut release cycle to weekly" | Objective 3 (regression cycle time) directly contributes |
| Product OKR | "Ship Feature X with high-stakes user impact" | Objective 2 ties to risk-prevention; Objective 5 ties to compliance review |
| SRE OKR | "Maintain 99.9% SLO" | Objective 2 (escape rate) and Objective 4 (flake budget) tie via `error-budget-tests` |

Per Doerr, OKRs at the team level should "ladder up" to company OKRs. The skill makes the laddering explicit so the team can validate alignment in stakeholder review.
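One way to sanity-check the laddering before stakeholder review is sketched below. The mapping restates the contribution column of the alignment table as objective numbers; the helper names and the drafted set `[1, 2, 3]` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def unaligned_objectives(drafted: Iterable[int], alignment: Dict[str, List[int]]) -> List[int]:
    """Drafted Objectives that no organisational layer lists as a contribution."""
    contributing = {obj for objs in alignment.values() for obj in objs}
    return sorted(set(drafted) - contributing)


def uncovered_layers(drafted: Iterable[int], alignment: Dict[str, List[int]]) -> List[str]:
    """Layers whose listed Objectives are all absent from the drafted set."""
    drafted_set = set(drafted)
    return [layer for layer, objs in alignment.items() if not drafted_set & set(objs)]


# Objective numbers per layer, as filled in by the manager in the alignment table.
alignment = {
    "Company quarterly theme": [2],
    "Engineering OKR": [3],
    "Product OKR": [2, 5],
    "SRE OKR": [2, 4],
}
orphans = unaligned_objectives([1, 2, 3], alignment)
gaps = uncovered_layers([1, 2, 3], alignment)
```

For a drafted set of Objectives 1 to 3, Objective 1 surfaces as not laddering to any layer, which is a prompt for the manager to add a row or reconsider the Objective.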
Step 6 - Hand off to retro / quarterly review
The OKR set is the start of the loop, not the end. Hand-offs at quarter-end:
Worked example - quarter-start draft for a 6-engineer QA team
Input:
Output:
# QA OKRs — 2026-Q3 (Jul–Sep)
## Objective 1 — Reduce escape-defect rate
**Why:** Q2 P1 escapes (4) caused ~$X revenue impact per the customer-success retro. Q1 was 6; the trend is improving. Q3 target accelerates the trend.
| # | Type | KR | Baseline | Source |
|---|---|---|---|---|
| KR1.1 | Committed | P1 escapes ≤ 2 | 4 (Q2) | `defect-trend-narrator` |
| KR1.2 | Committed | P2 escapes ≤ 8 | 13 (Q2) | `defect-trend-narrator` |
| KR1.3 | Aspirational | MTTD P1 ≤ 4h median | 11h (Q2) | `mttr-mtbf-tracker` |
| KR1.4 | Aspirational | Regression-class escapes fall 50% (18 → 9) | 18 (Q2) | `defect-clusterer` |
## Objective 2 — Cut regression cycle time
**Why:** Engineering's "release weekly" OKR depends on regression < 60 min/shard. Q2 was 67 min.
| # | Type | KR | Baseline | Source |
|---|---|---|---|---|
| KR2.1 | Committed | Regression suite < 60 min/shard | 67 min (Q2) | `test-run-summary-author` |
| KR2.2 | Committed | Sharding factor ≥ 8, no shard > 75 min | 6, max shard 67 min (Q2) | CI config + summary |
| KR2.3 | Aspirational | Per-PR CI cost -30% via TIA | $0.42/PR (Q2) | `regression-suite-selector` adoption |
### Alignment check
| Layer | OKR | Contribution |
|---|---|---|
| Company Q3 theme | "Reduce revenue-affecting incident cost" | Objective 1 directly |
| Engineering | "Release weekly" | Objective 2 directly |
| SRE | "Maintain 99.9% SLO" | Objective 1 (via escape rate) |
### Audit
| KR | Target | Baseline | Source |
|---|---|---|---|
| KR1.1 | ≤2 | 4 | `defect-trend-narrator` filter(severity=P1, found_in=production, window=2026-Q2) |
| KR1.2 | ≤8 | 13 | same filter, severity=P2 |
| KR1.3 | ≤4h median | 11h median | `mttr-mtbf-tracker` log 2026-Q2 |
| KR1.4 | 9 | 18 | `defect-clusterer` category=regression, 2026-Q2 |
| KR2.1 | <60 min/shard | 67 min/shard | `test-run-summary-author` cross-run-trend 2026-Q2 |
| KR2.2 | shard≥8 | shard=6 | `playwright.config.ts` workers + `test-run-summary-author` |
| KR2.3 | -30% per-PR | $0.42/PR | CI billing export + `regression-suite-selector` adoption rate |

Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Setting KRs without baselines | "Reach 99% pass rate" - from what? The KR is unanchored; success can't be measured | Step 4 enforces baseline citation; flag [BASELINE_NEEDED] if missing |
| 100% committed KRs, no aspirational | Per Doerr: "Consistently meeting 100% indicates OKRs need re-evaluation" | At least one aspirational KR per Objective |
| KRs that aren't team-controllable | "Customer churn -50%" - QA can't move that lever alone | KRs are scoped to outcomes QA can directly cause |
| More than three Objectives | Loss of focus; per the canonical framework, 1 - 3 Objectives is the recommended range | Step 2 limits the manager's pick to 1 - 3 Objectives |
| KR with no measurable axis ("improve quality") | Not gradeable; the team cannot tell if it succeeded | Step 3 rejects un-measurable KRs |
| KRs that drift from the prior quarter without acknowledging the drift | Continuous re-targeting hides chronic underperformance | Step 1 ingests prior OKR set; drift surfaces in the rationale |
| Generic OKR template adopted without QA-specific KRs | The team commits to "ship more features" goals that don't measure quality | This skill is QA-domain-specific by design |
| Authoring OKRs without alignment to engineering / product | The team commits to goals nobody else cares about | Step 5 alignment check is required |
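Several of the anti-patterns above are mechanical enough to lint for. The sketch below is one possible shape, assuming the draft is held as a dict of Objective title to KR dicts; `lint_okr_set` and the dict keys are hypothetical, and only the structural checks (Objective count, KR count, aspirational mix, baselines) are covered.

```python
from typing import Dict, List


def lint_okr_set(okr_set: Dict[str, List[dict]]) -> List[str]:
    """Return one finding per structural anti-pattern detected in a draft OKR set."""
    findings: List[str] = []
    if len(okr_set) > 3:
        findings.append(f"too many Objectives: {len(okr_set)} (ceiling is 3)")
    for objective, krs in okr_set.items():
        if not 3 <= len(krs) <= 5:
            findings.append(f"{objective}: {len(krs)} KRs (expected 3 to 5)")
        if not any(kr["type"] == "aspirational" for kr in krs):
            findings.append(f"{objective}: no aspirational KR")
        for kr in krs:
            if kr.get("baseline") is None:
                findings.append(f"{objective} / {kr['id']}: [BASELINE_NEEDED]")
    return findings


# Draft mirroring the second Objective of the worked example.
clean = {
    "Cut regression cycle time": [
        {"id": "KR2.1", "type": "committed", "baseline": "67 min"},
        {"id": "KR2.2", "type": "committed", "baseline": "6 shards"},
        {"id": "KR2.3", "type": "aspirational", "baseline": "$0.42/PR"},
    ]
}
# Hypothetical bad draft: all committed, too few KRs, one missing baseline.
risky = {
    "Improve quality": [
        {"id": "KR1.1", "type": "committed", "baseline": None},
        {"id": "KR1.2", "type": "committed", "baseline": "13"},
    ]
}
clean_findings = lint_okr_set(clean)
risky_findings = lint_okr_set(risky)
```

Checks such as team-controllability or alignment still need human judgment; a lint like this only catches what the draft's structure reveals.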