Testland

Flaky Playwright Tests: Five Root Causes and Fixes

Testland, June 9, 2026

Five root causes of flaky Playwright tests with code fixes: auto-wait limits, non-retrying assertions, brittle locators, network races, and hard sleeps.

[Figure: the five root causes of Playwright test flakiness (auto-wait limits, non-retrying assertions, brittle locators, network races, and hard sleeps), drawn as equal segments without ranking any cause by frequency.]

The test that passes every time locally and fails three runs out of five in CI is a classic Playwright mystery. The usual suspect is "timing," but the diagnosis stops there and the fix never comes. Part of the confusion is that Playwright's auto-wait creates a false sense of immunity: if the framework handles waiting automatically, how can a test still flake on timing? Because auto-wait covers element actionability, not network state or app logic.

For detection and categorization across any framework, "Fixing flaky tests: a systematic approach" covers the full diagnostic process. This post is the Playwright-specific fix list.

Prerequisites

These fixes assume an existing Playwright suite with intermittent failures. Examples were tested with Playwright 1.60 and TypeScript 5.x, with CI assumed. If you're starting from scratch, see the Playwright TypeScript setup guide first.

Auto-wait covers actionability, not app state

Playwright's actionability checks confirm that an element is visible, stable, able to receive events, and enabled before acting on it. For a click, Playwright will wait until all four conditions are true. That's a lot of protection against DOM timing.

What it doesn't cover: network responses, in-flight state updates, server-side rendering cycles, or any app-level logic that runs after the element becomes visible. A button can be fully actionable while the cart data behind it is still loading.

force: true is the tell. When a test passes only with force: true, the check being skipped was the safety net warning you the element wasn't truly ready. The fix isn't to disable the check:

// Playwright 1.60 - the "fix" that hides the problem
await page.getByRole('button', { name: 'Checkout' }).click({ force: true });

// Playwright 1.60 - wait for readiness explicitly, then act
const checkout = page.getByRole('button', { name: 'Checkout' });
await expect(checkout).toBeEnabled();
await checkout.click();

Low-level actions like focus, press, and dispatchEvent also skip actionability checks by design. Use them deliberately, not as workarounds for timing problems.

Non-retrying assertions fail on slow renders

Playwright's assertion docs are direct: "using non-retrying assertions can lead to a flaky test." The best practices page puts it just as plainly: "Don't use manual assertions that are not awaiting the expect."

The distinction matters. Web-first assertions like expect(locator).toBeVisible() poll until the condition is true or the timeout (default 5s) expires. locator.isVisible() evaluates exactly once, at whatever moment it's called. If the element takes 200ms to render after a state change, that one-shot check will fail intermittently depending on CI load.

// Playwright 1.60 - evaluates once, at exactly the wrong moment
const visible = await page.getByText('Order confirmed').isVisible();
expect(visible).toBe(true);

// Playwright 1.60 - polls until the condition is met or times out
await expect(page.getByText('Order confirmed')).toBeVisible();

For conditions that aren't covered by a built-in web-first assertion, expect.poll accepts any function, sync or async, and re-runs it until the chained matcher passes or the timeout expires.
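As a minimal sketch of where expect.poll fits, suppose an order's status is only available from a backend endpoint and never shows up in the DOM. The endpoint path /api/orders/:id, the status field, and the helper name fetchOrderStatus are assumptions made up for illustration, and the relative URL assumes a baseURL in the Playwright config. Only request.get, response.ok(), and response.json() are Playwright's own API.

```typescript
import type { APIRequestContext } from '@playwright/test';

// Hypothetical helper: reads an order's status from an assumed endpoint.
// It returns 'unknown' instead of throwing so a poll can simply try again.
export async function fetchOrderStatus(
  request: APIRequestContext,
  orderId: string,
): Promise<string> {
  const response = await request.get(`/api/orders/${orderId}`);
  if (!response.ok()) return 'unknown';
  const body = (await response.json()) as { status?: string };
  return body.status ?? 'unknown';
}

// Inside a test, poll the helper rather than calling it once:
//   await expect
//     .poll(() => fetchOrderStatus(request, '42'), { timeout: 10_000 })
//     .toBe('shipped');
```

The helper deliberately swallows a failed response: a one-off 404 while the order is still being written becomes another polling round instead of a failed test.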

CSS and XPath selectors break when the DOM changes

The Playwright locators docs are explicit: "CSS and XPath are not recommended as the DOM can often change leading to non resilient tests." The best practices section reinforces it: prefer user-facing attributes to XPath or CSS selectors.

The failure mode is predictable. A CSS selector like .checkout-form > div:nth-child(3) .submit-btn encodes the entire structural path to a button. One design change (a wrapping div added, a class renamed, a section reordered) and the selector silently targets nothing, or worse, the wrong element.

// Playwright 1.60 - breaks the next time a designer touches this component
await page.locator('.checkout-form > div:nth-child(3) .submit-btn').click();

// Playwright 1.60 - survives DOM refactors as long as the button is still a button
await page.getByRole('button', { name: 'Place order' }).click();

getByRole ties the test to the accessible contract: the ARIA role and the user-visible label. That contract tends to stay stable even when the underlying markup changes. It also doubles as an accessibility check. If the role or label breaks, so does the locator, which is the right behavior.
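Role locators can still match more than one element, and Playwright's strict mode then rejects the action. A sketch of one way to keep the locator unique without falling back to structural CSS, assuming a hypothetical product list where every row carries its own "Add to cart" button (the page layout and the helper name addToCartButton are invented for this example):

```typescript
import type { Locator, Page } from '@playwright/test';

// Hypothetical page: each list item is a product row with its own button.
// Scoping by the row's visible text narrows the match to a single button.
export function addToCartButton(page: Page, productName: string): Locator {
  return page
    .getByRole('listitem')
    .filter({ hasText: productName })
    .getByRole('button', { name: 'Add to cart' });
}
```

In a test this would read as await addToCartButton(page, 'Product 2').click(). The chain still depends only on roles and visible text, so it keeps the same resilience as a plain getByRole call.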

Assertions that race async data

A click triggers an API call. The assertion fires before the response arrives. This is one of the most common flake patterns in Playwright suites, and the fix is a deterministic sync point: wait for the response before asserting on its contents.

The waitForResponse pattern requires setting up the promise before the action that triggers the request. Set it up after, and you'll miss the response on fast connections:

// Playwright 1.60 - the assertion races the API response
await page.getByRole('button', { name: 'Load items' }).click();
await expect(page.getByRole('listitem')).toHaveCount(5);

// Playwright 1.60 - deterministic sync point: response first, assertion second
const itemsResponse = page.waitForResponse('**/api/items');
await page.getByRole('button', { name: 'Load items' }).click();
await itemsResponse;
await expect(page.getByRole('listitem')).toHaveCount(5);

toHaveCount is itself a retrying assertion, so both layers contribute: waitForResponse anchors the test to the network event, and toHaveCount handles any remaining render delay. For third-party endpoints, page.route() mocking removes external variability entirely.
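A minimal sketch of the page.route() approach, assuming a hypothetical third-party exchange-rate service. The URL https://rates.example.com, the payload shape, and the helper name mockExchangeRates are invented for illustration; page.route and route.fulfill are the real Playwright calls.

```typescript
import type { Page } from '@playwright/test';

// Hypothetical third-party endpoint and canned payload, for illustration only.
const RATES_URL = 'https://rates.example.com/**';
const CANNED_RATES = { base: 'USD', rates: { EUR: 0.9 } };

// Every request matching RATES_URL is answered locally with the same body,
// so the test never touches the real service.
export async function mockExchangeRates(page: Page): Promise<void> {
  await page.route(RATES_URL, (route) =>
    route.fulfill({ status: 200, json: CANNED_RATES }),
  );
}
```

Call the helper before page.goto() or before the action that triggers the request, so the first call to the service is already intercepted.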

One pattern to avoid: networkidle as a readiness signal. The Playwright API reference flags it directly: "Don't use this method for testing, rely on web assertions to assess readiness instead."

Hard sleeps and real-time logic

page.waitForTimeout carries an explicit Discouraged warning in the Playwright API reference: "Never wait for timeout in production. Tests that wait for time are inherently flaky." The method still exists and won't throw, which is exactly why it persists in codebases long after teams stop intending to use it. Sleep-and-pray passes on a fast local machine and fails under CI load, where containers compete for CPU and a 3000ms sleep can end well before the thing you were waiting for actually happened.

Replace sleeps with web-first assertions wherever possible. For tests that depend on timers, sessions, or date-dependent logic, the Clock API is the right tool: "Utilizing Clock functionality allows developers to manipulate and control time within tests."

// Playwright 1.60 - passes on your laptop, fails under CI load
await page.waitForTimeout(3000);
await expect(page.getByText('Session expired')).toBeVisible();

// Playwright 1.60 - control time instead of waiting for it
await page.clock.install();
await page.clock.fastForward('30:00');
await expect(page.getByText('Session expired')).toBeVisible();

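This removes the ambient dependency on wall-clock time, with one condition: the fake clock has to be installed before the page schedules the timer under test, which in practice means before navigation. With that in place, the test is deterministic regardless of where it runs.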
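The full sequence, including navigation, might look like the sketch below. The /dashboard path, the fixed start time, the 30-minute session length, and the helper name openAndExpireSession are all assumptions for illustration, and the relative URL assumes a baseURL in the Playwright config.

```typescript
import type { Page } from '@playwright/test';

// Hypothetical helper: the path and the 30-minute timeout are assumptions.
export async function openAndExpireSession(page: Page): Promise<void> {
  // Install the fake clock first, so the page's session timer is created
  // under the controlled clock rather than the real one.
  await page.clock.install({ time: new Date('2026-01-01T09:00:00') });
  await page.goto('/dashboard');
  // Jump 30 minutes ahead ('mm:ss' format) without waiting in real time.
  await page.clock.fastForward('30:00');
}
```

A test would call the helper and then assert with await expect(page.getByText('Session expired')).toBeVisible(). Pinning the start time also keeps any date-dependent rendering identical on every run.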

Retries and traces: contain the flake while you fix it

Retries are containment, not cure. They keep the build green while the root cause is diagnosed, but they don't fix the flake. Playwright's test retries docs are clear: "When enabled, failing tests will be retried multiple times until they pass."

The HTML report marks a test that passed after a retry as "flaky" (distinct from "failed"). That distinction matters for tracking which tests still need attention.

One caveat: tests inside test.describe.serial blocks restart the entire group on retry, not just the failing test.

Pair retries with traces. The trace viewer docs recommend: "Traces should be run on continuous integration on the first retry of a failed test by setting the trace: 'on-first-retry' option." This gives a full timeline of the failure without generating trace artifacts on every passing run.

// playwright.config.ts - Playwright 1.60
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: { trace: 'on-first-retry' },
});

For uploading trace artifacts on failure, the GitHub Actions setup guide covers the workflow configuration.

Where cross-test flake actually comes from

Playwright's browser contexts give each test isolated state for free: "Playwright creates a context for each test," and contexts are "equivalent to incognito-like profiles." So when tests still affect each other despite that isolation, the browser isn't the source. Look at backend data and shared fixtures instead.

A test that creates a user record and doesn't clean up will affect any test that queries the same table. The systematic approach post covers data isolation patterns in detail.
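One way to keep backend writes from leaking between tests is to give each test its own record and delete it afterwards. The sketch below assumes a hypothetical REST API (POST /api/users to create, DELETE /api/users/:id to remove, an id field in the response) and a baseURL in the Playwright config; the helper names uniqueEmail and createTempUser are invented here.

```typescript
import type { APIRequestContext } from '@playwright/test';

// Builds an email that no other worker or retry will reuse.
export function uniqueEmail(
  workerIndex: number,
  retry: number,
  now: number = Date.now(),
): string {
  return `user-w${workerIndex}-r${retry}-${now}@example.test`;
}

// Hypothetical endpoints: POST /api/users creates, DELETE /api/users/:id removes.
export async function createTempUser(
  request: APIRequestContext,
  email: string,
): Promise<() => Promise<void>> {
  const created = await request.post('/api/users', { data: { email } });
  if (!created.ok()) {
    throw new Error(`Could not create user: HTTP ${created.status()}`);
  }
  const { id } = (await created.json()) as { id: string };
  // The returned function undoes the write; call it during teardown.
  return async () => {
    await request.delete(`/api/users/${id}`);
  };
}
```

In a test, testInfo.workerIndex and testInfo.retry can feed uniqueEmail, and the returned cleanup function belongs in test.afterEach or a fixture's teardown so it runs even when the test fails.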

Common questions about Playwright flakiness

Does Playwright retry assertions automatically?

It depends which assertion you use. Web-first assertions (expect(locator).toBeVisible(), expect(locator).toHaveCount(), etc.) poll until the condition is true or the timeout expires. That's the retrying behavior. Locator query methods like locator.isVisible() and locator.textContent() evaluate once: isVisible() returns immediately, and textContent() returns whatever text is there as soon as the element exists. If you're calling one of those inside a manual expect, there's no retry, and timing windows will produce flakes.

When does force: true make sense?

Rarely. The legitimate cases are narrow: decorative overlays that intentionally intercept pointer events but don't affect the underlying element's functionality, or custom focus-trap components where you need to fire an event without the element being in the viewport. Outside those edge cases, force: true is a code smell. It silences the safety net instead of fixing the timing problem the safety net detected.

How do you find which tests are flaky across a long CI history?

Start with trace: 'on-first-retry' in the CI config and check the HTML reporter's flaky status. It distinguishes a test that failed and was retried from one that failed outright. The trace gives a frame-by-frame record of what happened on the failing run. For trend tracking across many runs and identifying which tests are consistently unreliable over time, see the test observability metrics guide.

Getting started

The five patterns above cover most of what causes Playwright flakiness in practice. Here's the quickest path to measurable improvement:

  1. Turn on retries: 2 and trace: 'on-first-retry' in your CI config. This gives you visibility before you change anything else.
  2. Grep the suite for force: true, waitForTimeout, and isVisible() used inside a manual expect. Each is a direct replacement target using the patterns above.
  3. Convert the worst structural CSS selectors to getByRole or getByLabel. Prioritize the ones that have already failed in CI.
  4. Re-run the flagged tests 10 times (for example with npx playwright test --repeat-each=10) and compare the flaky count before and after.
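Step 2 can be a plain grep, but a small scanner makes the results easier to sort and count. The sketch below is one possible shape, not an official tool: the names findFlakeSmells and SmellHit are invented, and the matching is purely textual, so it will also flag isVisible() calls that are used legitimately.

```typescript
// Patterns named in step 2. Matching is textual, so expect some false positives.
const SMELLS: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: 'force: true', pattern: /force:\s*true/ },
  { name: 'waitForTimeout', pattern: /\.waitForTimeout\(/ },
  { name: 'isVisible()', pattern: /\.isVisible\(/ },
];

export interface SmellHit {
  line: number; // 1-based line number within the scanned source
  name: string; // which pattern matched
  text: string; // the trimmed source line
}

// Scans one file's contents and reports every line that matches a pattern.
export function findFlakeSmells(source: string): SmellHit[] {
  const hits: SmellHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { name, pattern } of SMELLS) {
      if (pattern.test(text)) {
        hits.push({ line: index + 1, name, text: text.trim() });
      }
    }
  });
  return hits;
}
```

Feed it the contents of each spec file and sort the hits by file to get a replacement backlog. Web-first assertions such as toBeVisible() are not flagged, because the pattern only matches a literal .isVisible( call.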

Playwright's Test Agents include a healer that "executes the test suite and automatically repairs failing tests." Selector-level fixes are starting to be automated. The timing-design problems above remain on you for now.

For the full diagnostic process across frameworks, start with "Fixing flaky tests: a systematic approach". For measuring how your flake rate trends over time, the test observability metrics guide covers the instrumentation side.