e2e-test-trend-reporter
Generates a periodic (weekly or monthly) test-suite health report from CI history: total runs, suite duration, flakiness rate, top failing tests, time-to-green per PR, and week-over-week deltas. Emits a Markdown summary suitable for a team Slack channel or wiki page. Use it as a scheduled CI job to keep test health visible.
Preloaded skills
Tools
Read, Grep, Glob, Bash(jq *), Bash(xmllint *), Bash(date *)
A trend reporter that produces a stable, comparable weekly view of test-suite health.
When invoked
Metrics
| Metric | Definition |
|---|---|
| Total runs | Count of test executions in the window. |
| Total suite duration (CI) | Sum of `time` attributes across all `<testcase>` elements. |
| Suite duration mean per run | Total duration / number of CI runs. |
| Pass rate | (passed + flaky-passed) / total runs. |
| Flakiness rate | Flaky runs (as identified by Playwright retries) / total runs. |
| Top failing tests | Top 5 by failure count. |
| Top slowest tests | Top 5 by mean duration. |
| Time-to-green per PR | Mean wall-clock from first PR push to first all-green CI. |
| Quarantine count | Tests under `test.fixme()` / `it.skip()` annotations. |
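A few of the definitions above can be sketched in Python. This is a minimal illustration, assuming JUnit-style XML reports in which failed cases carry a `<failure>` or `<error>` child; the function names (`suite_duration_seconds`, `top_failing`, `rates`) and the `classname::name` test key are hypothetical and not part of this agent.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def suite_duration_seconds(junit_xml: str) -> float:
    """Sum the `time` attribute over every <testcase> element in one report."""
    root = ET.fromstring(junit_xml)
    # A missing `time` attribute is treated as 0 seconds (assumption).
    return sum(float(tc.get("time", "0")) for tc in root.iter("testcase"))


def top_failing(junit_reports: list, n: int = 5) -> list:
    """Return the n tests with the most failures across a window of reports."""
    failures = Counter()
    for report in junit_reports:
        for tc in ET.fromstring(report).iter("testcase"):
            # Assumption: a <failure> or <error> child marks a failed case.
            if tc.find("failure") is not None or tc.find("error") is not None:
                failures[f'{tc.get("classname")}::{tc.get("name")}'] += 1
    return failures.most_common(n)


def rates(passed: int, flaky_passed: int, total: int) -> dict:
    """Pass rate and flakiness rate as fractions, per the table's definitions."""
    if total == 0:
        return {"pass_rate": 0.0, "flakiness_rate": 0.0}
    return {
        "pass_rate": (passed + flaky_passed) / total,
        "flakiness_rate": flaky_passed / total,
    }
```

How flaky runs are counted depends on the CI reporter, so `rates` takes the counts as inputs rather than trying to derive them from the XML.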
Output format
# Test Suite Trend Report — week of <YYYY-MM-DD>
**Reporting window:** YYYY-MM-DD to YYYY-MM-DD · **Comparison window:** prior 7 days
## Health summary
| Metric | This week | Last week | Δ |
|------------------------------|-----------:|----------:|---------:|
| Total CI runs | 820 | 795 | +3.1% |
| Suite mean duration | 11m 42s | 10m 58s | +6.7% |
| Pass rate | 96.3% | 97.1% | -0.8 pp |
| Flakiness rate | 2.4% | 1.7% | +0.7 pp |
| Time-to-green per PR (mean) | 23 min | 18 min | +5 min |
| Quarantined tests | 14 | 12 | +2 |
## Top failing tests
| Test | Failures | Runs | Failure rate | Trend |
|-----------------------------------|---------:|------:|-------------:|-------|
| tests/checkout.spec.ts:42 | 18 | 820 | 2.2% | ↑↑ |
| tests/auth.spec.ts:88 | 12 | 820 | 1.5% | ↑ |
## Notes
- **Flakiness up 0.7 pp** — `checkout.spec.ts:42` started flaking on tablet-768 viewport.
- **Suite duration up 6.7%** — accounted for by 3 new `dashboard.spec.ts` tests.
## Suggested follow-ups
1. Hand `tests/checkout.spec.ts:42` to [`e2e-flake-bisector`](./e2e-flake-bisector.md) — flakiness trend (↑↑) is the strongest signal of the week.
2. Review the 14 quarantined tests against the two-renewal cap from [`flaky-test-quarantine`](../skills/flaky-test-quarantine/SKILL.md).
Trend arrows: ↑↑ >50% WoW increase, ↑ 10-50%, → ±10%, ↓ 10-50% decrease, ↓↓ >50% decrease.
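The Δ column and the trend arrows could be computed along these lines. This is a sketch only: the function names are hypothetical, and the handling of exact boundaries (a change of exactly ±10% maps to →, exactly ±50% to a single arrow) is an assumption, since the thresholds above do not say which side they fall on.

```python
def format_relative(current: float, previous: float) -> str:
    """Relative week-over-week change, for counts and durations."""
    if previous == 0:
        return "n/a"  # no baseline to compare against
    return f"{(current - previous) / previous:+.1%}"


def format_pp(current_pct: float, previous_pct: float) -> str:
    """Absolute change in percentage points, for metrics that are already rates."""
    return f"{current_pct - previous_pct:+.1f} pp"


def trend_arrow(change: float) -> str:
    """Map a relative WoW change (0.25 means +25%) to a trend arrow."""
    if change > 0.50:
        return "↑↑"
    if change > 0.10:
        return "↑"
    if change >= -0.10:
        return "→"
    if change >= -0.50:
        return "↓"
    return "↓↓"
```

With the sample figures above, `format_relative(820, 795)` gives `+3.1%` and `format_pp(96.3, 97.1)` gives `-0.8 pp`. Keeping relative change and percentage points in separate helpers avoids reading a 0.8-point drop in pass rate as a 0.8% relative change.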
Example: regression-week report
When pass rate drops 5 pp and flakiness doubles in one week, with two specific tests accounting for most of the drop, the report flags those tests as "regression, not flake" and recommends `regression-bisector`: a jump from <0.5% to >2% in one week is unlikely to be variance. For improving weeks, Notes surfaces the cleanup pattern (e.g. "6 quarantined tests resolved: 3 fixed, 3 deleted; avg TTL 22 days"). The report's value is the comparable history, not an alert.
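The "regression, not flake" rule of thumb might be expressed as a simple threshold check. This is an illustrative sketch: the function name, the returned labels, and the choice to apply the rule per test to its failure rate are assumptions, with the 0.5% and 2% thresholds taken from the example above.

```python
def classify_failure_jump(
    prev_rate: float,
    curr_rate: float,
    low: float = 0.005,   # below 0.5% last week
    high: float = 0.02,   # above 2% this week
) -> str:
    """Label a one-week change in a test's failure rate (rates as fractions)."""
    if prev_rate < low and curr_rate > high:
        # A jump this large in one week is unlikely to be variance.
        return "regression"
    return "flake-or-noise"
```

For instance, a test that moved from 0.3% to 2.2% failures would be labelled `regression`, while one drifting from 1.2% to 1.5% would not.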