flake-pattern-reference
Reference catalog of flake patterns - async/timing, test ordering, shared parallel state, resource leaks, network, locator drift, environment variance, randomness - with detection heuristics and remediation per pattern. Use when triaging an unknown flake to identify the category before bisecting.
Terminology note: "flaky test" is a practitioner-emergent term popularized by the Google Testing Blog (google-causes, google-flaky); ISTQB does not maintain a canonical entry. This catalog reflects industry-engineering consensus, not ISTQB authority.
A flake is rarely random - it almost always falls into one of eight recurring patterns. Identifying the pattern early shrinks the bisect search space dramatically. This catalog is a reference, not a workflow; the matching workflow is in flaky-test-quarantine, and the agent that drives a structured bisect is e2e-flake-bisector.
The Google Testing Blog observed a near-linear correlation between test size and flakiness rate across ~4.2M tests (google-causes) - larger tests touch more of the eight patterns at once.
Pattern 1: async / timing
The most common flake category in UI and integration tests.
| Signal | What's happening |
|---|---|
| Fails ~5–20% of runs; passes when the machine is faster | Test waits for an arbitrary `setTimeout(N)` instead of a deterministic event. |
| Fails on CI but never locally | CI runners have different cold-start timings than dev laptops. |
| Fails after a dependency upgrade with no test code change | Library's internal timing changed (e.g. Playwright auto-wait window). |
Remediation:
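One possible remediation, sketched under assumptions: replace the fixed delay with a bounded poll on the condition the test actually cares about. The helper name `waitFor` and its default timeout and interval are hypothetical, not taken from any library; frameworks such as Playwright already ship retrying assertions that serve the same purpose.

```typescript
// Poll `condition` until it returns true, or fail once `timeoutMs` has elapsed.
// Hypothetical helper for illustration; the defaults are arbitrary assumptions.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await condition()) return; // deterministic event observed
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Short sleep between checks; the total wait adapts to machine speed.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: instead of `await sleep(2000)`, wait for the state itself, e.g.
//   await waitFor(() => queue.length === 0);
```

A fast machine returns almost immediately and a slow CI runner gets the full timeout, so the test no longer encodes a guess about how long the operation takes.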
Pattern 2: test ordering
Tests pass alone, fail when run with siblings.
| Signal | What's happening |
|---|---|
| `npm test -- --testNamePattern='^X$'` passes; full run fails | Test relies on state from a previously-run test. |
| Adding a new test breaks an unrelated existing one | Implicit ordering dependency exposed by the new test pushing the old test into a different position. |
| Random-order test runners (Jest `randomize`) flag the suite | Suite is order-dependent. |
Remediation:
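A minimal sketch of one common fix, assuming the ordering dependency comes from state shared between tests: give every test a fresh fixture from a factory. The `Cart` type, the two toy tests, and the `run` helper are all hypothetical stand-ins for a real suite.

```typescript
interface Cart {
  items: string[];
}

// Factory: each call returns an independent object, so no test can see
// what an earlier test left behind.
function makeCart(): Cart {
  return { items: [] };
}

type CartTest = (cart: Cart) => boolean;

// Two toy "tests"; each returns true when its assertion holds.
const addsItem: CartTest = (cart) => {
  cart.items.push("apple");
  return cart.items.length === 1;
};
const startsEmpty: CartTest = (cart) => cart.items.length === 0;

// Run tests in the given order, asking `cartFor` for the fixture each time.
function run(tests: CartTest[], cartFor: () => Cart): boolean[] {
  return tests.map((test) => test(cartFor()));
}

// Shared fixture: the outcome depends on order.
const shared = makeCart();
console.log(run([addsItem, startsEmpty], () => shared)); // [ true, false ]

// Fresh fixture per test: both orders pass.
console.log(run([addsItem, startsEmpty], makeCart)); // [ true, true ]
console.log(run([startsEmpty, addsItem], makeCart)); // [ true, true ]
```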
Pattern 3: shared parallel state
Tests pass sequentially, fail when run in parallel workers.
| Signal | What's happening |
|---|---|
| Fails ~50% of runs in CI matrix; passes locally with `-j 1` | Two workers writing to the same DB row / file / port. |
| Fails more often as worker count goes up | Shared-state contention that grows with the number of concurrent workers. |
| Error message mentions "duplicate key" / "address in use" / "file already exists" | Direct collision evidence. |
Remediation:
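One way to avoid such collisions, sketched under assumptions, is to derive every shared resource name from the worker's identity. Jest exposes `JEST_WORKER_ID` and Playwright exposes `TEST_WORKER_INDEX` as environment variables; the helper names, the `_w` suffix, and the port arithmetic below are hypothetical choices.

```typescript
type Env = Record<string, string | undefined>;

// Read the worker identity from the environment; fall back to 0 when the
// suite is not running under a parallel runner.
function workerId(env: Env): number {
  const raw = env.JEST_WORKER_ID ?? env.TEST_WORKER_INDEX ?? "0";
  const id = Number.parseInt(raw, 10);
  return Number.isNaN(id) ? 0 : id;
}

// Give each worker its own schema / directory / bucket name.
function workerScoped(base: string, env: Env): string {
  return `${base}_w${workerId(env)}`;
}

// Give each worker its own port by offsetting from a base port.
function workerPort(basePort: number, env: Env): number {
  return basePort + workerId(env);
}

// In a real suite the argument would be `process.env`.
console.log(workerScoped("orders_test", { JEST_WORKER_ID: "3" })); // orders_test_w3
console.log(workerPort(4000, { TEST_WORKER_INDEX: "2" })); // 4002
```

Passing the environment as a parameter keeps the helpers pure and easy to unit test.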
Pattern 4: resource leaks
Tests pass on a fresh machine, fail after the test process has run for hours.
| Signal | What's happening |
|---|---|
| Fails increasingly often as suite duration grows | Memory or file-descriptor leak in the test setup. |
| `EMFILE` / `EADDRINUSE` errors mid-suite | File-descriptor or port exhaustion. |
| Long-running processes (Playwright browsers, Cypress runners) crash mid-suite | Process accumulating zombies. |
Remediation:
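A hedged sketch of one leak-prevention approach: register a cleanup callback at the moment each resource is acquired, then run all callbacks in reverse order from the suite's teardown hook. The `CleanupStack` class is hypothetical and not part of any test framework.

```typescript
type Cleanup = () => void;

// Hypothetical helper: collects cleanup callbacks and runs them last-in,
// first-out, so resources are released in reverse acquisition order.
class CleanupStack {
  private readonly stack: Cleanup[] = [];

  defer(fn: Cleanup): void {
    this.stack.push(fn);
  }

  get pending(): number {
    return this.stack.length;
  }

  // Runs every callback even if some throw, then reports the failure count.
  runAll(): void {
    let failures = 0;
    while (this.stack.length > 0) {
      const fn = this.stack.pop()!;
      try {
        fn();
      } catch {
        failures += 1;
      }
    }
    if (failures > 0) {
      throw new Error(`${failures} cleanup step(s) failed`);
    }
  }
}

// Usage inside a test: defer the release right next to the acquisition.
const cleanups = new CleanupStack();
const released: string[] = [];
cleanups.defer(() => released.push("server"));
cleanups.defer(() => released.push("browser"));
cleanups.runAll(); // would typically be called from an afterEach hook
console.log(released); // [ 'browser', 'server' ]
```

Asserting `pending === 0` at the end of a suite is a cheap way to notice a resource that was acquired but never released.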
Pattern 5: network / external service
Tests pass when the upstream is healthy, fail otherwise.
| Signal | What's happening |
|---|---|
| Fails on the same handful of tests that hit the same external URL | Real network call to a flaky third party. |
| Fails right after a deploy of a non-test service | Test is hitting prod / staging of a sibling service. |
| `ETIMEDOUT` / `ECONNRESET` in error logs | Network-layer error, not test-logic error. |
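The last signal in the table can be checked mechanically. The sketch below assumes failures surface as Node-style errors carrying a string `code` property; the function name and the two-entry code set are illustrative, and a real suite might track more codes.

```typescript
// Error codes the table above treats as network-layer evidence.
const NETWORK_ERROR_CODES = new Set(["ETIMEDOUT", "ECONNRESET"]);

// Returns true when `err` looks like a network-layer failure rather than
// a failed assertion. Hypothetical helper for triage scripts.
function isNetworkLayerError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && NETWORK_ERROR_CODES.has(code);
}

const reset = Object.assign(new Error("socket hang up"), { code: "ECONNRESET" });
console.log(isNetworkLayerError(reset)); // true
console.log(isNetworkLayerError(new Error("expected 3 but got 4"))); // false
```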
Remediation:
Pattern 6: locator drift
UI tests pass when the page looks one way, fail when it shifts.
| Signal | What's happening |
|---|---|
| Fails after an unrelated CSS change | Selector matched by position rather than identity. |
| `selector matched 2 elements` errors | Ambiguous selector now matches more than one node. |
| Fails only at certain viewports | Layout shifts cause mobile / desktop selectors to differ. |
Remediation:
Pattern 7: environment variance
Tests pass on Linux CI, fail on macOS dev machines (or vice versa).
| Signal | What's happening |
|---|---|
| Fails only on a specific CI runner / OS | OS-specific path separator, line ending, or filesystem case sensitivity. |
| Snapshot tests fail with sub-pixel diffs across OS | OS font / anti-aliasing differences (see playwright-snapshots). |
| Fails in timezone configurations not set to UTC | Timezone-sensitive assertion. |
Remediation:
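One possible approach, sketched under assumptions, is to normalize OS-dependent values before asserting on them. The three helper names are hypothetical; they cover the line-ending, path-separator, and timezone signals from the table, not filesystem case sensitivity or font rendering.

```typescript
// Convert CRLF and lone CR line endings to LF before comparing text.
function normalizeNewlines(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Compare paths in one canonical form. Intended for assertions only:
// a backslash is a legal filename character on POSIX systems.
function toPosixPath(p: string): string {
  return p.replace(/\\/g, "/");
}

// Derive the calendar date in UTC, independent of the machine's timezone.
// toISOString() always reports UTC, unlike getDate() or toLocaleDateString().
function utcDateKey(d: Date): string {
  return d.toISOString().slice(0, 10);
}

console.log(JSON.stringify(normalizeNewlines("a\r\nb\rc\n"))); // "a\nb\nc\n"
console.log(toPosixPath("src\\pages\\index.ts")); // src/pages/index.ts
console.log(utcDateKey(new Date("2024-03-10T23:30:00-05:00"))); // 2024-03-11
```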
Pattern 8: randomness
Tests use random data without a controlled seed.
| Signal | What's happening |
|---|---|
| Failures don't reproduce on retry | Test data was randomized; the failing combination is gone. |
| Test asserts a property that holds "almost always" | Property-based test exposing a real edge case (this is good - fix the production bug). |
| Faker-generated data triggers a layout overflow | Random string longer than the assertion expected. |
Remediation:
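A minimal sketch of seed control, assuming the test generates its own data rather than relying on a library: use a small deterministic generator and print the seed on failure so the failing input can be replayed. The generator is a mulberry32-style PRNG; `randomString` and the seed value are hypothetical.

```typescript
// Small deterministic PRNG (mulberry32-style). The same seed always yields
// the same sequence of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate a lowercase string of length 1..maxLen from the given generator.
function randomString(rng: () => number, maxLen: number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz";
  const len = 1 + Math.floor(rng() * maxLen);
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

// In a real suite the seed might come from an environment variable and be
// logged on failure; here it is a fixed, arbitrary value.
const seed = 12345;
const first = randomString(mulberry32(seed), 20);
const replay = randomString(mulberry32(seed), 20);
console.log(first === replay); // true: same seed, same data
```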
Triage decision tree
```
Test fails ~50% of runs?
├── Yes → likely "shared parallel state" or "test ordering"
└── No → fails ~5–20% of runs?
    ├── Yes → likely "async/timing" or "network"
    └── No → fails only on specific OS / runner?
        ├── Yes → "environment variance"
        └── No → fails after long suite duration?
            ├── Yes → "resource leaks"
            └── No → fails after unrelated UI change?
                ├── Yes → "locator drift"
                └── No → does the test use random data?
                    ├── Yes → "randomness"
                    └── No → bisect with `e2e-flake-bisector`
```
For systematic bisection, hand the test off to the e2e-flake-bisector agent, which varies one axis at a time per the patterns above.
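The decision tree above can be sketched as a function for use in triage scripts. The `FlakeSignals` shape and the numeric bands are assumptions: "~50%" is read here as 40–60% and "~5–20%" as 5–20%, and neither threshold comes from the catalog itself.

```typescript
// Hypothetical summary of what is known about a flaky test.
interface FlakeSignals {
  failureRate: number; // fraction of runs that fail, between 0 and 1
  osOrRunnerSpecific: boolean;
  worsensWithSuiteDuration: boolean;
  followsUnrelatedUiChange: boolean;
  usesRandomData: boolean;
}

// Walk the tree top to bottom and return the candidate pattern names.
function triage(s: FlakeSignals): string[] {
  // Assumed band for "~50% of runs".
  if (s.failureRate >= 0.4 && s.failureRate <= 0.6) {
    return ["shared parallel state", "test ordering"];
  }
  // Assumed band for "~5–20% of runs".
  if (s.failureRate >= 0.05 && s.failureRate <= 0.2) {
    return ["async/timing", "network"];
  }
  if (s.osOrRunnerSpecific) return ["environment variance"];
  if (s.worsensWithSuiteDuration) return ["resource leaks"];
  if (s.followsUnrelatedUiChange) return ["locator drift"];
  if (s.usesRandomData) return ["randomness"];
  return ["bisect with e2e-flake-bisector"];
}

console.log(
  triage({
    failureRate: 0.12,
    osOrRunnerSpecific: false,
    worsensWithSuiteDuration: false,
    followsUnrelatedUiChange: false,
    usesRandomData: false,
  }),
); // [ 'async/timing', 'network' ]
```

The result is a starting hypothesis only; a test can match more than one pattern at once.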