flaky-test-quarantine
Builds a quarantine workflow for flaky tests - marks the test with the framework's skip/fixme/retry annotation, records the failure-rate observation and a bisect link in the annotation body, sets an auto-expiry date, and produces a CI report listing every quarantined test that has expired and needs re-evaluation. Use when a flaky test is blocking the trunk and must be removed from the gating path without losing track of it.
Overview
A "flaky test" is a test that produces inconsistent pass/fail results across runs without an underlying code change (google-flaky). Practitioner sources such as the Google Testing Blog recommend isolating flaky tests from the gating path rather than leaving them there to mask real regressions or silently ignoring them (google-flaky).
Terminology note: "flaky test" is a practitioner-emergent term popularized by the Google Testing Blog. ISTQB does not maintain a canonical entry for it. This skill cites industry-engineering sources, not ISTQB authority.
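This skill defines a quarantine workflow with five required parts:

1. Mark the test with the framework's quarantine annotation (Step 1).
2. Annotate it with the failure rate, bisect status, issue link, and expiry (Step 2).
3. Set an auto-expiry date (Step 3).
4. Run a re-evaluation report that lists expired quarantines (Step 4).
5. Apply pruning rules when a quarantine expires (Step 5).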
When to use
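Use this skill when a flaky test is blocking trunk and must be removed from the gating path without losing track of it.

Do not use it for deterministic failures. If the test fails 100% of the time after a code change, it is a regression: use regression-bisector and fix it. Do not quarantine it.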
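The rule above amounts to a decision over repeated runs. The sketch below makes that decision explicit. It assumes every outcome was collected at one fixed commit. The names `classifyRuns`, `Outcome`, and `Verdict` are hypothetical. With a small sample, a flaky test can fail every run by chance and be misclassified as a regression, so collect enough runs before deciding.

```typescript
// Hypothetical types: one outcome per run of the same test at one fixed commit.
type Outcome = 'pass' | 'fail';
type Verdict = 'healthy' | 'flaky' | 'regression';

function classifyRuns(outcomes: readonly Outcome[]): Verdict {
  if (outcomes.length === 0) {
    throw new Error('need at least one run to classify');
  }
  const failures = outcomes.filter((o) => o === 'fail').length;
  if (failures === 0) return 'healthy';
  // Fails on every run: deterministic, so treat it as a regression
  // (bisect and fix) instead of quarantining it.
  if (failures === outcomes.length) return 'regression';
  // Mixed pass/fail at the same commit: the definition of a flaky test.
  return 'flaky';
}

console.log(classifyRuns(['pass', 'fail', 'pass', 'pass'])); // flaky
console.log(classifyRuns(['fail', 'fail', 'fail'])); // regression
```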
Step 1 - Mark the test
Playwright
test.fixme() is the canonical Playwright primitive for "this test is broken; do not run past this point" (pw-test):
```typescript
import { test } from '@playwright/test';

test('checkout flow flaky test', async ({ page }) => {
  test.fixme(
    true,
    'Quarantined 2026-05-04 (#1234) — fails ~12% of runs on tablet-768; bisect inconclusive. Re-evaluate by 2026-06-04.',
  );
  // ... test body, no longer runs
});
```

test.fixme(condition, description) skips the test and shows the description in the report. Unlike test.skip(), fixme carries the explicit "this needs to be fixed" intent (pw-test).
If the goal is to allow retries before quarantining, use the retries config first (pw-retries):
```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failing tests twice on CI only; locally, fail fast.
  retries: process.env.CI ? 2 : 0,
});
```

A test that passes on retry is reported with the flaky status, which is distinct from passed and failed. Track these separately: flaky-but-passing tests are quarantine candidates, not yet quarantined (pw-retries).
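One way to track flaky-but-passing tests separately is to read them out of a machine-readable report. The sketch below assumes the report came from Playwright's JSON reporter (`npx playwright test --reporter=json`) and was parsed with `JSON.parse`. Only the fields this code reads are typed, so check them against the JSON reporter schema of your Playwright version. `flakyCandidates` and the `Pw*` interface names are hypothetical.

```typescript
// Minimal subset of the JSON reporter's shape (assumption: verify per version).
interface PwTest {
  projectName: string;
  status: 'expected' | 'unexpected' | 'flaky' | 'skipped';
}
interface PwSpec {
  title: string;
  file: string;
  tests: PwTest[];
}
interface PwSuite {
  specs: PwSpec[];
  suites?: PwSuite[]; // nested describe() blocks
}
interface PwJsonReport {
  suites: PwSuite[];
}

// Returns "file > title [project]" for every test that failed and then
// passed on retry: a quarantine candidate, not yet a quarantined test.
function flakyCandidates(report: PwJsonReport): string[] {
  const found: string[] = [];
  const walk = (suite: PwSuite): void => {
    suite.specs.forEach((spec) => {
      spec.tests.forEach((t) => {
        if (t.status === 'flaky') {
          found.push(`${spec.file} > ${spec.title} [${t.projectName}]`);
        }
      });
    });
    (suite.suites ?? []).forEach(walk);
  };
  report.suites.forEach(walk);
  return found;
}
```

Each project (such as tablet-768) produces its own test entry under a spec, so the same spec can be flaky on one project and stable on another.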
Cypress
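Cypress sets retries globally with the retries option in the Cypress configuration file (for example retries: { runMode: 2, openMode: 0 }). The same object can be passed as test configuration to a describe() or it() block to override it for one suite or test. To quarantine individual tests, use it.skip(...) or the cypress-grep plugin's tagging convention.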
Jest / Vitest
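test.skip(...) is the primitive for quarantining an existing test. test.todo(...) records only a title (Jest throws if it is given a test callback), so it discards the test body and suits planned tests rather than quarantined ones. For periodic auto-evaluation, skip conditionally: Vitest provides test.skipIf(condition), and in Jest the usual pattern is (condition ? test.skip : test)(...). Alternatively, introduce a project-specific tagging convention parsed by your CI.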
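The conditional-skip approach needs a condition that flips on its own when the quarantine expires. The hypothetical helper `quarantineActive` returns true until the Re-evaluate by date has passed. After that, the skip lifts and the test runs again. It treats the date as inclusive, matching the expiry check in Step 4, and compares UTC calendar dates. The usages in the comments assume Vitest (`test.skipIf`) and Jest (conditional `test.skip`) respectively.

```typescript
// Hypothetical helper: is a quarantine with this "Re-evaluate by" date still active?
// The date is inclusive: the skip lifts on the day after reEvaluateBy (UTC).
function quarantineActive(reEvaluateBy: string, now: Date = new Date()): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(reEvaluateBy)) {
    throw new Error(`expected YYYY-MM-DD, got "${reEvaluateBy}"`);
  }
  const today = now.toISOString().slice(0, 10); // UTC calendar date
  // ISO-8601 dates sort lexicographically, so string comparison is enough.
  return today <= reEvaluateBy;
}

// Vitest:
//   test.skipIf(quarantineActive('2026-06-04'))('checkout flow', async () => { /* ... */ });
// Jest:
//   (quarantineActive('2026-06-04') ? test.skip : test)('checkout flow', async () => { /* ... */ });
```

There is a trade-off. Once the date passes, the test rejoins the gating path without anyone reviewing it, so the expired-quarantine report remains the primary signal.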
JUnit / TestNG (JVM)
JUnit 5: @Disabled("Quarantined 2026-05-04 (#1234) — ..."). TestNG: @Test(enabled = false, description = "..."). For per-method retries before quarantine, use @RetryingTest(N) from the JUnit Pioneer extension library (it is not part of JUnit 5 itself), or TestNG's @Test(retryAnalyzer = ...) with an IRetryAnalyzer implementation.
Step 2 - Annotate with failure rate + bisect link + expiry
The annotation body is the load-bearing part of the workflow. Every quarantine record carries:
| Field | Required | Format |
|---|---|---|
| Date | yes | YYYY-MM-DD of the quarantine. |
| Issue link | yes | #1234 or full URL - links a tracked ticket. |
| Failure rate | yes | ~12% of runs - measured, not guessed. |
| Bisect status | yes | bisect inconclusive / bisected to commit abc1234 / not yet bisected. |
| Re-evaluate by | yes | YYYY-MM-DD - the auto-expiry date. |
| Owner | optional | @team-handle for routing. |
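Building the annotation from measured counts keeps the failure rate honest. In this sketch, the `QuarantineRecord` shape and `formatAnnotation` are hypothetical. The field order follows the example annotation in this section, with a plain hyphen after the issue link. The failure and run counts are assumed to come from repeated runs at the same commit, for example with Playwright's `--repeat-each` option.

```typescript
// Hypothetical record holding the Step 2 fields.
interface QuarantineRecord {
  date: string; // YYYY-MM-DD the test was quarantined
  issue: string; // "#1234" or a full URL
  failures: number; // failed runs observed...
  runs: number; // ...out of this many runs at the same commit
  scope?: string; // e.g. a Playwright project such as "tablet-768"
  bisect: string; // "bisect inconclusive", "bisected to commit abc1234", ...
  reEvaluateBy: string; // YYYY-MM-DD auto-expiry date
  owner?: string; // optional "@team-handle"
}

function formatAnnotation(r: QuarantineRecord): string {
  // 0 failures means healthy and runs === failures means a regression
  // (see "When to use"); neither belongs in quarantine.
  if (!(r.runs > 0 && r.failures > 0 && r.failures < r.runs)) {
    throw new Error('expected 0 < failures < runs for a flaky test');
  }
  const pct = Math.round((100 * r.failures) / r.runs);
  const scope = r.scope ? ` on ${r.scope}` : '';
  const owner = r.owner ? ` Owner: ${r.owner}.` : '';
  return (
    `Quarantined ${r.date} (${r.issue}) - fails ~${pct}% of runs${scope}; ` +
    `${r.bisect}. Re-evaluate by ${r.reEvaluateBy}.${owner}`
  );
}
```

For example, 6 failures in 50 runs on tablet-768 yields "fails ~12% of runs on tablet-768".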
The format is parseable by the re-evaluation report (Step 4):
```text
Quarantined 2026-05-04 (#1234) — fails ~12% of runs on tablet-768;
bisect inconclusive. Re-evaluate by 2026-06-04. Owner: @web-platform.
```

Step 3 - Auto-expiry
Default TTL: 30 days. Picked because:
Adjust per project:
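The expiry date can be derived rather than typed by hand. The sketch below assumes the 30-day default stated above. `computeReEvaluateBy` and `DEFAULT_TTL_DAYS` are hypothetical names. A strict 30-day TTL from 2026-05-04 lands on 2026-06-03. The example annotation's 2026-06-04 corresponds to 31 days, or one calendar month.

```typescript
// Default TTL from this section; override per project.
const DEFAULT_TTL_DAYS = 30;

// Returns the "Re-evaluate by" date (YYYY-MM-DD) for a quarantine date.
function computeReEvaluateBy(quarantinedOn: string, ttlDays: number = DEFAULT_TTL_DAYS): string {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(quarantinedOn);
  if (!m) throw new Error(`expected YYYY-MM-DD, got "${quarantinedOn}"`);
  // Date.UTC normalizes day overflow into the following month/year, and
  // working in UTC keeps the result independent of the runner's time zone.
  const ms = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]) + ttlDays);
  return new Date(ms).toISOString().slice(0, 10);
}
```

Renewing a quarantine (Step 5) is the same call with the renewal date as the new start date.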
Step 4 - Re-evaluation report
A nightly (or weekly) CI job greps all quarantine annotations, extracts the Re-evaluate by date, and lists expired entries. A minimal Bash version against a Playwright suite:
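```bash
#!/usr/bin/env bash
# scripts/list-expired-quarantines.sh
# Prints one "EXPIRED: path:line: text" line per quarantine annotation under
# tests/ whose "Re-evaluate by" date is earlier than today (UTC). Matching the
# annotation text itself also covers test.skip/@Disabled-style quarantines.
set -euo pipefail

TODAY=$(date -u +%Y-%m-%d)
RE='Re-evaluate by ([0-9]{4}-[0-9]{2}-[0-9]{2})'

# grep -rn prefixes each match with "path:line:". grep exits 1 when nothing
# matches, which is not an error here, hence "|| true".
{ grep -rnE "$RE" tests/ || true; } | while IFS= read -r line; do
  if [[ "$line" =~ $RE ]]; then
    EXPIRY="${BASH_REMATCH[1]}"
    # ISO-8601 dates sort lexicographically, so a string comparison works.
    if [[ "$EXPIRY" < "$TODAY" ]]; then
      echo "EXPIRED: $line"
    fi
  fi
done
```

Run it as a scheduled GitHub Action and post the output to a Slack channel or open a tracking issue per expired entry.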
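The same check can be written in TypeScript, which also makes it easy to pull out the issue link and owner for routing. This sketch assumes the file contents have already been read into memory (for example with `node:fs`). `findExpired`, `ExpiredEntry`, and the regular expressions are hypothetical and only recognize the Step 2 annotation format. Like the script, it treats the Re-evaluate by date itself as not yet expired.

```typescript
// Hypothetical result type for one expired quarantine annotation.
interface ExpiredEntry {
  file: string;
  line: number; // 1-based line holding the "Re-evaluate by" date
  reEvaluateBy: string;
  issue?: string; // e.g. "#1234", from "Quarantined YYYY-MM-DD (...)"
  owner?: string; // e.g. "@web-platform"
}

const EXPIRY_RE = /Re-evaluate by (\d{4}-\d{2}-\d{2})/;
const ISSUE_RE = /Quarantined \d{4}-\d{2}-\d{2} \(([^)]+)\)/;
const OWNER_RE = /Owner: (@[\w-]+(?:\/[\w-]+)?)/;

// files: path -> contents. today: UTC date as YYYY-MM-DD.
function findExpired(files: { [path: string]: string }, today: string): ExpiredEntry[] {
  const expired: ExpiredEntry[] = [];
  Object.keys(files).forEach((file) => {
    const lines = files[file].split('\n');
    lines.forEach((text, i) => {
      const m = EXPIRY_RE.exec(text);
      // ISO-8601 dates compare correctly as plain strings.
      if (!m || m[1] >= today) return;
      // The annotation may wrap, so also check the previous line for the issue link.
      const prev = i > 0 ? lines[i - 1] : '';
      const issue = ISSUE_RE.exec(text) ?? ISSUE_RE.exec(prev);
      const owner = OWNER_RE.exec(text);
      expired.push({
        file,
        line: i + 1,
        reEvaluateBy: m[1],
        issue: issue ? issue[1] : undefined,
        owner: owner ? owner[1] : undefined,
      });
    });
  });
  return expired;
}
```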
Step 5 - Pruning rules
When a quarantine expires and the test is re-evaluated, the team has three options:
| Outcome | Action |
|---|---|
| Underlying issue fixed | Remove test.fixme() and re-run; close the issue. |
| Underlying issue still present | Renew the quarantine for one more TTL with an updated annotation, but never more than two consecutive renewals; at that point, delete the test or rewrite it. |
| The test is no longer relevant | Delete the test outright; close the issue. |
The two-renewal cap is the lever that prevents quarantine from becoming a permanent dead-letter. Past two renewals, either the team has lost interest in the assertion or the test is unfixable in its current form. The first case signals "delete"; the second signals "delete or rewrite".
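The table reduces to a small decision function. In this sketch, the `ReEvaluation` fields, `PruneAction`, and `decideOutcome` are hypothetical. The function checks relevance first, on the assumption that an irrelevant test is deleted even if it has started passing.

```typescript
// Hypothetical inputs gathered during the re-evaluation.
interface ReEvaluation {
  stillRelevant: boolean; // does the assertion still matter?
  issueFixed: boolean; // does the test now pass reliably?
  consecutiveRenewals: number; // renewals already applied to this quarantine
}

type PruneAction = 'unquarantine' | 'renew' | 'delete-or-rewrite' | 'delete';

// The two-renewal cap from this step.
const MAX_CONSECUTIVE_RENEWALS = 2;

function decideOutcome(e: ReEvaluation): PruneAction {
  // No longer relevant: delete outright and close the issue.
  if (!e.stillRelevant) return 'delete';
  // Fixed: remove the quarantine annotation, re-run, close the issue.
  if (e.issueFixed) return 'unquarantine';
  // Still broken: renew for one more TTL unless the cap is used up.
  return e.consecutiveRenewals < MAX_CONSECUTIVE_RENEWALS ? 'renew' : 'delete-or-rewrite';
}
```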
CI integration
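```yaml
# .github/workflows/quarantine-report.yml
name: quarantine-report

on:
  schedule:
    - cron: '0 9 * * 1' # Mondays 09:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write # required by `gh issue create`

jobs:
  list-expired:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: List expired quarantines
        run: bash scripts/list-expired-quarantines.sh > expired.txt
      - name: Open tracking issue per expired entry
        # hashFiles() returns a hash for any existing file, even an empty one,
        # so check for a non-empty expired.txt in the shell instead.
        run: |
          if [ ! -s expired.txt ]; then
            echo "No expired quarantines."
            exit 0
          fi
          while IFS= read -r line; do
            entry="${line#EXPIRED: }"   # drop the script's prefix
            gh issue create --title "Expired quarantine: ${entry%%:*}" --body "$line" < /dev/null
          done < expired.txt
        env:
          GH_TOKEN: ${{ github.token }}
```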
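Because the job runs every week, the loop above opens a fresh issue for the same expired entry on every run until the annotation changes. The sketch below is one way to avoid duplicates. It groups entries by file and skips any file whose title already matches an open issue. It assumes the open titles were fetched beforehand, for example with `gh issue list --state open --json title --jq '.[].title'`. It also assumes each input line has an optional `EXPIRED: ` prefix followed by a path and a colon. `planIssues` and `IssuePlan` are hypothetical.

```typescript
// Hypothetical plan for one `gh issue create` call.
interface IssuePlan {
  title: string;
  body: string;
}

function planIssues(expiredLines: string[], openTitles: string[]): IssuePlan[] {
  const byFile: { [file: string]: string[] } = {};
  const order: string[] = []; // first-seen file order
  expiredLines.forEach((raw) => {
    const line = raw.trim();
    if (line === '') return;
    const entry = line.replace(/^EXPIRED: /, '');
    const file = entry.split(':')[0];
    if (!byFile[file]) {
      byFile[file] = [];
      order.push(file);
    }
    byFile[file].push(entry);
  });
  return order
    .map((file) => ({ title: `Expired quarantine: ${file}`, body: byFile[file].join('\n') }))
    // Skip files that already have an open tracking issue.
    .filter((plan) => openTitles.indexOf(plan.title) === -1);
}
```

Grouping by file keeps the title stable even when line numbers shift, at the cost of one issue covering several expired tests in the same file.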