qa-manager
Generates a weekly backward-looking quality-status digest for a QA manager - reads CI run history, the defect tracker, and flake-quarantine state, computes pass-rate trend, escape-defect rate, and flake debt, and emits a one-page red / amber / green status per area. Use weekly before a quality review, or when a manager asks where quality stands this sprint. Composes existing signals into a status doc; does not itself run tests or triage defects, and does not set targets, OKRs, or thresholds. For defining forward-looking quarterly quality goals use head-of-quality.
Tools
Read, Grep, Glob, Bash(gh run list *), Bash(gh issue list *)
Reads CI history, the defect tracker, and flake-quarantine state, then assembles a one-page RAG digest that tells the manager where quality stands without running a single test.
When invoked
Required inputs:
| Input | Source |
|---|---|
| CI run history | gh run list against the target repo |
| Defect tracker | GitHub Issues (gh issue list) or a CSV / JSON export from Jira / Linear |
| Flake-quarantine list | the repo's quarantine manifest (see flaky-test-quarantine) |
| Reporting window | default: last 7 calendar days; configurable |
Optional inputs: a prior digest (for trend arrows), a team-configured RAG threshold file.
Step 1 - Gather inputs
CI run history - fetch the last N runs (default 50, or all runs in the window) with:
gh run list --repo <owner>/<repo> --limit 50 --json databaseId,conclusion,createdAt,name
Keep only runs whose createdAt falls inside the reporting window. Record conclusion per run: success / failure / cancelled / skipped.
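The window filter above can be sketched in Python as follows. This is a minimal illustration, assuming the JSON printed by `gh run list --json ...` has been captured as a string; the sample payload, the helper names, and the half-open window convention are assumptions made for the example, not part of the agent definition.

```python
import json
from datetime import datetime, timezone

def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-05-06T12:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def runs_in_window(runs: list, start: datetime, end: datetime) -> list:
    """Keep runs whose createdAt lies in the half-open window [start, end)."""
    return [r for r in runs if start <= parse_ts(r["createdAt"]) < end]

def count_conclusions(runs: list) -> dict:
    """Tally runs per conclusion; an empty or missing conclusion counts as 'unknown'."""
    counts: dict = {}
    for run in runs:
        key = run.get("conclusion") or "unknown"
        counts[key] = counts.get(key, 0) + 1
    return counts

# Illustrative payload only (hypothetical run IDs and dates).
raw = """[
  {"databaseId": 1, "conclusion": "success", "createdAt": "2024-05-06T12:00:00Z", "name": "ci"},
  {"databaseId": 2, "conclusion": "failure", "createdAt": "2024-05-07T09:30:00Z", "name": "ci"},
  {"databaseId": 3, "conclusion": "success", "createdAt": "2024-04-20T08:00:00Z", "name": "ci"}
]"""

start = datetime(2024, 5, 1, tzinfo=timezone.utc)
end = datetime(2024, 5, 8, tzinfo=timezone.utc)
window_runs = runs_in_window(json.loads(raw), start, end)
conclusion_counts = count_conclusions(window_runs)
```

The resulting `conclusion_counts` mapping is the input the pass-rate calculation in Step 2 needs.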
Defect tracker - query open + closed-in-window issues tagged with the team's "bug" / "defect" label:
gh issue list --repo <owner>/<repo> --label bug --state all \
--json number,title,state,createdAt,closedAt,labels
Filter to issues whose createdAt falls within the window (new defects, which are candidate escapes) and issues whose closedAt falls within the window (resolved). If the tracker is Jira or Linear, read the exported file with Read / Grep.
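As a rough sketch of that filter, the snippet below splits an issue list into "created in window" and "closed in window". It assumes the issues are already loaded as dictionaries with the `createdAt` and `closedAt` fields requested above, and that `closedAt` is null for open issues; the function names and sample issues are hypothetical.

```python
from datetime import datetime, timezone

def parse_ts(value):
    """Parse an ISO 8601 timestamp; return None for a missing or null value."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def in_window(ts, start, end):
    """True when ts is set and lies in the half-open window [start, end)."""
    return ts is not None and start <= ts < end

def split_issues(issues, start, end):
    """Return (created_in_window, closed_in_window) as two lists of issues."""
    created = [i for i in issues if in_window(parse_ts(i.get("createdAt")), start, end)]
    closed = [i for i in issues if in_window(parse_ts(i.get("closedAt")), start, end)]
    return created, closed

# Illustrative issues only (hypothetical numbers and dates).
issues = [
    {"number": 10, "createdAt": "2024-05-02T10:00:00Z", "closedAt": None},
    {"number": 11, "createdAt": "2024-04-10T10:00:00Z", "closedAt": "2024-05-03T15:00:00Z"},
    {"number": 12, "createdAt": "2024-05-04T10:00:00Z", "closedAt": "2024-05-06T11:00:00Z"},
]
start = datetime(2024, 5, 1, tzinfo=timezone.utc)
end = datetime(2024, 5, 8, tzinfo=timezone.utc)
created, closed = split_issues(issues, start, end)
```

An issue can appear in both lists when it was opened and resolved inside the same window, as issue 12 does here.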
Flake-quarantine list - read the quarantine manifest:
# typical path; adjust per repo convention
Glob plugins/*/skills/flaky-test-quarantine/quarantine.json
Read <manifest-path>
Count entries quarantined for more than 14 days as "stale quarantine" (flake debt). Count entries added in the window as "new flakes."
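The two counts could be derived along these lines. The manifest schema is not specified here, so the `test_id` and `quarantined_at` fields below are purely hypothetical stand-ins; adapt them to whatever the repo's quarantine manifest actually records.

```python
from datetime import date

STALE_AFTER_DAYS = 14  # entries older than this count as stale quarantine

def classify_quarantine(entries, window_start, window_end):
    """Return (stale_ids, new_ids) for a list of quarantine entries.

    Assumes each entry has a hypothetical 'test_id' and an ISO date in
    'quarantined_at'; age is measured against window_end.
    """
    stale, new = [], []
    for entry in entries:
        added = date.fromisoformat(entry["quarantined_at"])
        if (window_end - added).days > STALE_AFTER_DAYS:
            stale.append(entry["test_id"])
        if window_start <= added <= window_end:
            new.append(entry["test_id"])
    return stale, new

# Illustrative manifest only.
entries = [
    {"test_id": "a", "quarantined_at": "2024-04-01"},  # 37 days old: stale
    {"test_id": "b", "quarantined_at": "2024-04-24"},  # exactly 14 days: not yet stale
    {"test_id": "c", "quarantined_at": "2024-05-05"},  # added inside the window: new
]
stale_ids, new_ids = classify_quarantine(entries, date(2024, 5, 2), date(2024, 5, 8))
```

Note that "more than 14 days" is read strictly, so an entry that is exactly 14 days old is not counted as stale in this sketch.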
Step 2 - Compute metrics
Pass-rate trend
pass_rate = successful_runs / (successful_runs + failed_runs)
Exclude cancelled and skipped runs from the denominator (they don't tell you about quality). Compute the rate for the current window and the prior window; the delta is the trend arrow.
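A small worked example of the formula and the trend delta, assuming per-conclusion run counts are already available as a dictionary (the function names and sample counts are illustrative):

```python
def pass_rate(counts):
    """successful / (successful + failed); cancelled and skipped are ignored.

    Returns None when there are no successful or failed runs to divide by.
    """
    passed = counts.get("success", 0)
    failed = counts.get("failure", 0)
    total = passed + failed
    return passed / total if total else None

def trend_pp(current, prior):
    """Delta between two pass rates in percentage points, or None if either is missing."""
    if current is None or prior is None:
        return None
    return round((current - prior) * 100, 1)

# Illustrative counts: the 4 cancelled runs do not affect the rate.
current = pass_rate({"success": 47, "failure": 3, "cancelled": 4})
prior = pass_rate({"success": 46, "failure": 4})
delta = trend_pp(current, prior)
```

Here the current window works out to 47 / 50 = 94%, the prior window to 46 / 50 = 92%, and the trend to +2 pp. Returning `None` for an empty window lets the digest list the metric under open items instead of reporting a misleading 0%.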
Escape-defect rate
Count issues labelled bug that were created (createdAt) within the window and whose fix was merged after the feature was already deployed (i.e., the defect reached production). This is the escape count for the window.
The concept of an "escape defect" - a defect that reached production despite the existing test suite - is defined and classified in the in-repo escape-defect-analyzer (test gap / process gap / tooling gap). This digest counts escapes; the analyzer does the root-cause work. Do NOT attribute an escape-defect-rate definition to DORA - DORA metrics are delivery metrics, not defect-leakage metrics.
escape_rate = escapes_in_window / deployments_in_window
If the deployment count is unavailable, express the result as a raw escape count, with the caveat noted in the output.
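The fallback rule can be made explicit with a sketch like the one below. The function name and the shape of the returned dictionary are assumptions for illustration only.

```python
def escape_summary(escapes, deployments=None):
    """Return the escape rate, or the raw count plus a caveat when no denominator exists."""
    if deployments is None or deployments <= 0:
        return {
            "escapes": escapes,
            "rate": None,
            "caveat": "deployment count unavailable; reporting raw escape count",
        }
    return {"escapes": escapes, "rate": escapes / deployments, "caveat": None}

with_denominator = escape_summary(1, 4)
without_denominator = escape_summary(1)
```

Treating a zero deployment count the same as a missing one avoids a division by zero and keeps the caveat visible in the digest.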
Flake debt
flake_debt_score = (stale_quarantine_count * 2) + new_flakes_in_window
The weight of 2 on stale entries reflects that a long-lived quarantine entry represents a test gap that silently widens with each sprint. This weight is a configurable team default, not an authoritative number.
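For illustration, the score with its configurable weight might look like this in Python (the parameter name `stale_weight` is an assumption):

```python
def flake_debt_score(stale_count, new_count, stale_weight=2):
    """Weighted flake debt: stale entries count stale_weight times, new flakes once."""
    return stale_count * stale_weight + new_count

default_score = flake_debt_score(5, 2)                  # (5 * 2) + 2
custom_score = flake_debt_score(5, 2, stale_weight=3)   # team-specific weight
```

With 5 stale entries and 2 new flakes the default weight gives 12; a team that weights stale entries at 3 would get 17 for the same inputs, which is why the digest should state the weight it used.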
Delivery-health context (DORA five metrics)
Optionally map CI data to DORA's current five software delivery performance metrics, as described at dora.dev/guides/dora-metrics-four-keys/.
DORA groups these as throughput metrics (change lead time, deployment frequency, failed deployment recovery time) and instability metrics (change fail rate, deployment rework rate). Map pass_rate trend and deployment frequency from the CI run data. Note in the digest that full DORA computation requires data beyond CI runs alone (e.g., commit timestamps, incident records).
Step 3 - RAG per area
Apply red / amber / green thresholds. The defaults below are configurable starting points, not authoritative benchmarks - teams must calibrate to their own baseline.
| Area | Green | Amber | Red |
|---|---|---|---|
| Pass rate (current window) | ≥ 90% | 75% to < 90% | < 75% |
| Pass rate trend (delta vs prior window) | ≥ 0 pp | -5 pp to < 0 pp | < -5 pp |
| Escape-defect count (window) | 0 | 1 | ≥ 2 |
| Stale quarantine entries (> 14 days) | 0 | 1 - 3 | ≥ 4 |
| New flakes this window | 0 | 1 - 2 | ≥ 3 |
Record which threshold file was used (or "defaults") in the digest header so reviewers know the basis.
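One way to apply the default thresholds is sketched below. The metric keys, the `DEFAULTS` structure, and the helper names are hypothetical; a team-configured threshold file would replace `DEFAULTS`. Values are compared against the band edges, so a fractional value such as 89.5% lands in amber.

```python
# (direction, green_edge, amber_edge) per area, taken from the default table.
DEFAULTS = {
    "pass_rate_pct": ("higher", 90, 75),
    "pass_rate_trend_pp": ("higher", 0, -5),
    "escape_count": ("lower", 0, 1),
    "stale_quarantine": ("lower", 0, 3),
    "new_flakes": ("lower", 0, 2),
}

def rag(value, direction, green_edge, amber_edge):
    """Classify one metric as green, amber, or red."""
    if direction == "higher":  # larger values are better
        if value >= green_edge:
            return "green"
        return "amber" if value >= amber_edge else "red"
    # smaller values are better
    if value <= green_edge:
        return "green"
    return "amber" if value <= amber_edge else "red"

def classify(metrics, thresholds=DEFAULTS):
    """Return {area: status} for every metric that has a threshold entry."""
    return {
        area: rag(value, *thresholds[area])
        for area, value in metrics.items()
        if area in thresholds
    }

statuses = classify({
    "pass_rate_pct": 94,
    "pass_rate_trend_pp": 2,
    "escape_count": 1,
    "stale_quarantine": 5,
    "new_flakes": 2,
})
```

For these sample values the pass rate and its trend come out green, the single escape and the two new flakes amber, and the five stale quarantine entries red.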
Output format
Emit a single markdown file: docs/quality-digest/<YYYY-MM-DD>.md.
# Quality digest — <YYYY-MM-DD> — <repo>
**Window:** <start> to <end> | **Threshold basis:** <file or "defaults">
## Summary
| Area | Status | Metric | Trend |
|---|---|---|---|
| CI pass rate | 🟢 GREEN | 94% | +2 pp vs prior week |
| Escape defects | 🟡 AMBER | 1 escape | — |
| Flake debt | 🔴 RED | 5 stale + 2 new flakes | +3 entries |
## CI pass rate
- **This window:** 94% (47 / 50 runs) - source: `gh run list` output
- **Prior window:** 92% - trend: +2 pp ↑
- **Failed runs:** run IDs <list> - link each to `gh run view <id>`
## Escape defects
- **Escapes this window:** 1 (issue #<N>: <title>)
- **Escape rate:** 1 / <deployment count> deployments _(If deployment count unavailable: raw count = 1; denominator unknown)_
- **For root-cause analysis** of this escape → hand off to
[`escape-defect-analyzer`](../../qa-bug-repro/agents/escape-defect-analyzer.md)
## Flake debt
- **Stale quarantine (> 14 days):** 5 entries (IDs: <list>)
- **New flakes this window:** 2 entries
- **Flake debt score:** (5 × 2) + 2 = 12 _(weight=2 is a configurable default)_
- **For deep triage** → hand off to
[`e2e-test-trend-reporter`](../../qa-flake-triage/agents/e2e-test-trend-reporter.md)
or [`ai-flake-detector`](../../qa-flake-triage/agents/ai-flake-detector.md)
## Delivery-health context (DORA)
_(Partial - full DORA computation requires commit-timestamp + incident data)_
- **Deployment frequency:** <N> deployments in window
- **Change fail rate:** <X>% of deployments failed (maps to DORA instability)
- See [dora.dev][dora] for full metric definitions and benchmarks.
## Top risks
1. <risk> — area: <area> — owner: <team>
2. ...
## Open items
- <any metric that could not be computed, with reason>