test-isolation-patterns

Pure reference catalog of test-isolation and fixture-lifecycle patterns - fixture scope (per-test / per-describe / shared / global), Meszaros's four-phase test pattern, Fowler's Fresh-Fixture-vs-Shared-Fixture trade-off, database isolation (transaction-rollback / database-per-worker / template-database), parallel-safety patterns, and cleanup discipline (afterEach / afterAll / tagged-cleanup). Distinct from `test-code-conventions` §6 (file-level fixture coupling rule) - this catalog is the architecture-tier reference. Preloaded by `framework-architecture-auditor` to anchor the §A3 fixture-coupling and §A6 retry/wait audits.

Overview

A test that fails sometimes for non-obvious reasons is non-deterministic. Per Martin Fowler - Eradicating Non-Determinism in Tests: "A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment… Once you start ignoring a regression test failure, then that test is useless and you might as well throw it away." The dominant cause is broken isolation - one test affecting another, the environment leaking, fixtures sharing state. This catalog is the canonical reference for the isolation patterns that prevent it.

This skill is a pure reference with no execution steps. It is the catalog the framework-architecture-auditor cites when auditing fixture coupling (§A3), retry/wait policy consistency (§A6), and CI integration health (§A8). It complements test-code-conventions §6 (the file-level rule against global-fixture hubs) with the cross-cutting architecture patterns. It also complements flake-pattern-reference, which catalogs flake symptoms; this skill catalogs the prevention patterns.

When to use

  • Designing a new framework - pick the fixture scope and isolation strategy.
  • Auditing an existing framework where flake-rate is rising (broken isolation drives the concurrency and test-order-dependency flake categories, together about a third of flakes per Luo et al. 2014).
  • Migrating from sequential to parallel execution - the parallel-safety patterns become load-bearing.
  • Refactoring fixture inheritance chains - apply the cleanup-discipline patterns.

Pattern 1 - The four-phase test pattern

Canonical source: Gerard Meszaros - xUnit Test Patterns: Refactoring Test Code (2007). Referenced in the Wikipedia entry on test fixture.

Every test has four phases:

| Phase | What |
| --- | --- |
| 1. Setup | Establish the pre-conditions / fixture |
| 2. Exercise | Interact with the System Under Test |
| 3. Verify | Determine whether the expected outcome was obtained |
| 4. Teardown | Return to a clean state |

Phases 1 and 4 together are fixture management. Patterns 2-6 below cover how to do them safely.
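The four phases can be sketched with Python's standard `unittest` module, where `setUp` and `tearDown` carry phases 1 and 4. This is a minimal illustration, not a prescribed layout: `count_words` and `CountWordsTest` are hypothetical names invented for the example.

```python
import os
import tempfile
import unittest


def count_words(path: str) -> int:
    """Hypothetical system under test: count whitespace-separated words."""
    with open(path, encoding="utf-8") as handle:
        return len(handle.read().split())


class CountWordsTest(unittest.TestCase):
    def setUp(self):
        # Phase 1, Setup: build the fixture (a temp file with known content).
        fd, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("alpha beta gamma")

    def tearDown(self):
        # Phase 4, Teardown: remove the file so nothing outlives the test.
        os.remove(self.path)

    def test_counts_three_words(self):
        # Phase 2, Exercise: call the system under test.
        result = count_words(self.path)
        # Phase 3, Verify: compare against the expected outcome.
        self.assertEqual(result, 3)
```

Keeping Exercise and Verify as the only statements in the test body makes it easy to see which phase a failure belongs to.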

Pattern 2 - Fixture scope

Most test runners offer three or four fixture scopes; pick the tightest scope that meets the constraint.

| Scope | Lifecycle | Use when |
| --- | --- | --- |
| Per-test (function-scoped) | Setup before each test; teardown after each | Default. Maximally isolated. Slowest. Always parallel-safe. |
| Per-describe (class / module-scoped) | Setup before the first test in the group; teardown after the last | Setup is expensive and the group of tests genuinely shares it (read-only) |
| Shared (session / worker-scoped) | Setup once for the whole run; teardown at end | Setup is unaffordable per-describe (e.g., spinning up a Docker stack) and the tests don't mutate it |
| Global (module-loading) | Setup at module-import time; no teardown | Anti-pattern in nearly all cases. Use only for truly immutable language-level fixtures (constants, configuration). |

The single rule that prevents most flake: never share mutable fixtures across tests. If a fixture is mutated by any test, it must be per-test scoped.
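As a rough sketch of that rule, a mutable fixture can be rebuilt in a per-test hook so that one test's mutation never reaches another. The example uses Python's standard `unittest`; `make_cart` and `CartTest` are hypothetical names chosen for illustration.

```python
import unittest


def make_cart() -> list:
    """Hypothetical fixture factory: returns a new list on every call."""
    return ["apple"]


class CartTest(unittest.TestCase):
    def setUp(self):
        # Per-test scope: every test method receives its own cart object.
        self.cart = make_cart()

    def test_add_item(self):
        self.cart.append("pear")  # the mutation stays inside this test
        self.assertEqual(self.cart, ["apple", "pear"])

    def test_starts_with_one_item(self):
        # Holds in any execution order because the cart is rebuilt per test.
        self.assertEqual(len(self.cart), 1)
```

Had the cart been created once in `setUpClass`, the second test would pass or fail depending on whether `test_add_item` ran first.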

Framework-specific scope syntax (illustrative; cite the per-framework skill for tool-specific details)

  • Jest / Vitest: beforeEach / afterEach (per-test); beforeAll / afterAll (per-describe when declared inside a describe block, per-file at top level).
  • Playwright Test: test.beforeEach / test.beforeAll; test.use({}) for per-test config; fixtures via test.extend().
  • pytest: @pytest.fixture(scope="function" | "class" | "module" | "session").
  • JUnit 5: @BeforeEach / @BeforeAll; @TestInstance(Lifecycle.PER_CLASS).
  • RSpec: before(:each) / before(:all).

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Per-describe fixture that any test in the describe mutates | One test fails; the next "starts" from the mutated state |
| Shared fixture mutated through a leaky abstraction (e.g., factory returns a shared object) | Cross-test mutation without an obvious culprit; flake follows |
| Per-test scope for genuinely expensive setup (a 30s Docker spin-up per test) | Suite time explodes; team skips tests |
| Global fixture for anything that has state | Cannot reset between test runs; CI run pollutes the next run |
| Inheritance hierarchy of fixtures (BaseTest → AppTest → DomainTest → SpecificTest) | Per framework-architecture-auditor §A2, depth-3+ chains break unpredictably |

Pattern 3 - Fresh Fixture vs Shared Fixture trade-off

Canonical source: Martin Fowler - Eradicating Non-Determinism in Tests.

Fowler's framing: "I prefer the former [Fresh Fixture], as it's often easier - and in particular easier to find the source of a problem… [but] rebuilding the database each time can add a lot of time to test runs, so that argues for switching to a clean-up strategy."

| Approach | Setup cost | Isolation | When |
| --- | --- | --- | --- |
| Fresh Fixture (rebuild from scratch every test) | High | Maximum | Default; use unless measured slow |
| Cleanup strategy (preserve the fixture, undo changes at teardown) | Low | Strong if cleanup is comprehensive | When Fresh Fixture's cost is prohibitive |
| Persistent Fresh Fixture (fresh per test, persisted via transaction-rollback) | Low | Maximum | The pragmatic middle for DB-backed tests |

The transaction-rollback pattern (Persistent Fresh Fixture): begin a transaction at test start, do all of the test's DB work inside it, and roll back at test end, so the database is unchanged from one test to the next. The pattern works for any DB that supports transactions; DatabaseCleaner (Ruby), pytest-django's db fixture, and Spring's @Transactional test annotation all implement it.
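The mechanism can be sketched with Python's built-in `sqlite3` module. This is an illustration of the idea rather than how any of the named frameworks implement it; `make_db`, `rolled_back`, and `count_users` are hypothetical helpers.

```python
import sqlite3
from contextlib import contextmanager


def make_db() -> sqlite3.Connection:
    # isolation_level=None turns off the module's implicit transactions,
    # so the BEGIN / ROLLBACK statements below are the only ones issued.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('seed')")
    return conn


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Wrap one test's DB work in a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")  # runs on pass, failure, or exception


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = make_db()
with rolled_back(conn):
    conn.execute("INSERT INTO users VALUES ('temp')")
    assert count_users(conn) == 2  # visible inside the transaction
assert count_users(conn) == 1      # gone after the rollback
```

The sketch only holds while all of the test's work goes through the one connection that owns the transaction; code that opens a second connection or commits on its own escapes the rollback.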

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Fresh Fixture that takes 60+ seconds per test | Suite time becomes prohibitive; team skips tests |
| Cleanup strategy that misses one mutation surface (cache; queue; file system) | Cross-test coupling through the missed surface |
| Transaction-rollback that doesn't actually rollback (autocommit, DDL changes) | Silent state leakage |
| Shared Fixture documented as "immutable" but tests mutate it anyway | The documentation is unverified; flake follows |

Pattern 4 - Database / external-store isolation

The dominant source of test flake at scale. Five canonical strategies, each with trade-offs.

4a - Transaction-rollback (the default)

Each test runs in a transaction; teardown rolls it back. Works for: relational DBs with full transaction support. Doesn't work for: DDL changes on engines where DDL commits implicitly (e.g., MySQL, Oracle), work spread across multiple DB connections, queues, caches.

4b - Database-per-test-worker

Each parallel worker gets its own database (named app_test_worker_1, app_test_worker_2, etc.). Created once at startup; reused across tests within the worker; dropped at suite end. Works for: parallel execution with mutation-heavy tests. Cost: pre-suite setup time + N× DB storage.
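The naming scheme, together with a cap on worker count, might look like the sketch below. `MAX_WORKERS`, `worker_db_name`, and `all_worker_db_names` are hypothetical, and 1-based worker IDs are an assumption; how the worker ID reaches the test process depends on the runner.

```python
MAX_WORKERS = 8  # hypothetical cap; size it to the CI storage budget


def worker_db_name(worker_id: int, prefix: str = "app_test_worker") -> str:
    """Return the database name for one parallel worker (1-based IDs assumed)."""
    if not 1 <= worker_id <= MAX_WORKERS:
        raise ValueError(f"worker_id {worker_id} is outside 1..{MAX_WORKERS}")
    return f"{prefix}_{worker_id}"


def all_worker_db_names(worker_count: int) -> list:
    """Names to create once before the suite and drop once after it."""
    return [worker_db_name(i) for i in range(1, worker_count + 1)]
```

Rejecting IDs above the cap turns a misconfigured worker count into an immediate error instead of an unbounded number of databases.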

4c - Template database / pristine clone

Pre-create a template database with seed data; clone it per test (or per worker). PostgreSQL's CREATE DATABASE … TEMPLATE template_db is the canonical mechanism. Works for: tests needing complex seed state. Cost: template maintenance.
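The clone-from-template idea can be tried without a PostgreSQL server. The sketch below swaps in SQLite and uses the `Connection.backup` method of Python's `sqlite3` module as a stand-in for `CREATE DATABASE … TEMPLATE`; the table and helper names are hypothetical.

```python
import sqlite3


def build_template() -> sqlite3.Connection:
    """Create the seeded template once (SQLite stand-in for template_db)."""
    template = sqlite3.connect(":memory:")
    template.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price INTEGER)")
    template.executemany(
        "INSERT INTO products VALUES (?, ?)", [("A1", 100), ("B2", 250)]
    )
    template.commit()
    return template


def clone_from(template: sqlite3.Connection) -> sqlite3.Connection:
    """Give a test (or worker) its own copy of the template's contents."""
    clone = sqlite3.connect(":memory:")
    template.backup(clone)  # copies the template database into the clone
    return clone


def count_products(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

A test can then mutate its clone freely; the template and every other clone keep the original seed rows.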

4d - Containerised DB-per-test

Each test gets a fresh Docker container (Testcontainers is the canonical library). Maximum isolation; highest cost. Works for: integration tests where the DB version / extensions / config matter. Don't use for: unit tests.

4e - In-memory substitution

Use SQLite in-memory instead of the production DB engine. Fast; works for simple SQL. Doesn't work for: production-specific features (PostgreSQL JSON, Postgres extensions, MySQL spatial types). The Practical Test Pyramid (Ham Vocke, published on martinfowler.com) flags this as risky for integration tests, because they then run against a different database type than production.

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Tests that mutate a shared DB without isolation | Cross-test coupling; the dominant source of flake at scale |
| In-memory substitution masking production-engine differences | Tests pass locally; fail in production |
| Transaction-rollback for tests that do DDL (CREATE TABLE in test) | DDL triggers an implicit commit in engines such as MySQL and Oracle (PostgreSQL DDL is transactional), so rollback doesn't undo it |
| Database-per-worker without a maximum-worker limit | Storage explodes; CI cost surges |
| Containerised DB-per-test for unit tests | 5-second container startup × 1000 unit tests ≈ 83 minutes of startup alone; unworkable |

Pattern 5 - Parallel safety

Canonical source: Fowler - Eradicating Non-Determinism in Tests on isolation as the parallel-safety prerequisite, plus Luo et al. FSE 2014, which attributes 20% of flakes to concurrency problems (race conditions and deadlocks).

Parallel execution magnifies every isolation bug. The patterns that make parallel safe:

| Pattern | What it does |
| --- | --- |
| Worker-scoped fixtures | Each parallel worker has its own state (DB, file system path, port range) |
| Unique identifiers per test | Test names, file paths, generated IDs include the worker ID (worker_${WORKER_ID}_user_${TEST_ID}) |
| Ephemeral output paths | Tests write to tmp/${WORKER_ID}/${TEST_ID}/ and clean up at teardown |
| Port range allocation | Each worker gets a port range (30000 + WORKER_ID * 100) to avoid binding conflicts |
| No global singletons | No process.env writes, no global config mutation, no static state |
| Idempotent setup | Re-running the setup produces the same state (so a flaky-and-retried test isn't tainted) |
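Several of these patterns reduce to deriving every shared resource from the worker ID. A minimal sketch under that assumption follows; `WorkerResources` and its constants are hypothetical, and the block size of 100 ports per worker is taken from the formula in the table.

```python
from dataclasses import dataclass
from pathlib import Path

BASE_PORT = 30000        # start of the range, as in the table's formula
PORTS_PER_WORKER = 100   # assumed block size per worker


@dataclass(frozen=True)
class WorkerResources:
    worker_id: int

    @property
    def port_range(self) -> range:
        # 30000 + WORKER_ID * 100, then a block of 100 ports for this worker.
        start = BASE_PORT + self.worker_id * PORTS_PER_WORKER
        return range(start, start + PORTS_PER_WORKER)

    def unique_user(self, test_id: str) -> str:
        # Mirrors worker_${WORKER_ID}_user_${TEST_ID}.
        return f"worker_{self.worker_id}_user_{test_id}"

    def output_dir(self, root: Path, test_id: str) -> Path:
        # Idempotent: calling it again for the same test is harmless.
        path = root / str(self.worker_id) / test_id
        path.mkdir(parents=True, exist_ok=True)
        return path
```

Because the port blocks are contiguous and non-overlapping, two workers can never bind the same port as long as each stays inside its own range.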

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| process.env.X = "..." in a test (writes to a shared global) | The write persists for every later test in the same process; nothing resets it |
| Hard-coded port 3000 in tests (port collisions) | First worker binds; others fail |
| Tests writing to /tmp/test.log (path collision) | Workers stomp each other's files |
| Test-name-based DB seeding (collides across workers if names overlap) | Cross-worker state pollution |
| Per-test setup that does setTimeout / sleep to "let things settle" | Flake source: async-wait is the largest flake category at 45% per Luo et al. 2014; use proper event-based synchronisation |
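Event-based synchronisation, as opposed to a fixed sleep, can be sketched with `threading.Event` from the Python standard library. The background job here is a hypothetical stand-in for whatever asynchronous work the test waits on.

```python
import threading


def start_background_job(done: threading.Event, results: list) -> threading.Thread:
    """Hypothetical async work that signals completion instead of being polled."""
    def job():
        results.append("ready")
        done.set()  # tell the waiter the work has finished

    thread = threading.Thread(target=job)
    thread.start()
    return thread


def wait_for_job(timeout: float = 5.0) -> list:
    done = threading.Event()
    results: list = []
    thread = start_background_job(done, results)
    # Returns as soon as the event fires; the timeout is an upper bound,
    # not a delay that every run has to pay.
    if not done.wait(timeout):
        raise TimeoutError("background job did not signal completion")
    thread.join()
    return results
```

A fast machine finishes immediately and a slow one gets the full timeout, which is the property a fixed `sleep` cannot offer.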

Pattern 6 - Cleanup discipline

Canonical source: Meszaros's xUnit Test Patterns (2007) - the Garbage-Collected Teardown vs In-line Teardown vs Implicit Teardown vs Setup Decorator patterns.

The four canonical cleanup approaches:

| Pattern | Mechanism |
| --- | --- |
| In-line Teardown | Each test explicitly cleans up at end (last line of the test body) |
| Implicit Teardown | afterEach / afterAll hooks the runner calls automatically |
| Garbage-Collected Teardown | Cleanup happens when the language's GC reclaims the fixture (typical in garbage-collected languages such as C# / Java; resources that need deterministic release use IDisposable / AutoCloseable instead) |
| Tagged Cleanup | Fixture registers itself with a "cleanup queue" at setup; queue drains at suite end |

Rule: Implicit Teardown via the runner's afterEach hook is the default. In-line Teardown is acceptable when the cleanup is specific to one test. Tagged Cleanup is for fixtures whose lifetime is variable (held across multiple tests, then released).
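A small sketch of why the runner's hook is the default: in Python's `unittest`, `tearDown` plays the afterEach role and runs even when the test body fails. The test case is defined inside a function only so the deliberate failure stays contained; all names are hypothetical.

```python
import unittest


def run_failing_case():
    """Run one deliberately failing test and report what teardown released."""
    released = []

    class OrderTest(unittest.TestCase):
        def setUp(self):
            self.resource = "connection-1"  # hypothetical fixture

        def tearDown(self):
            # Implicit Teardown: unconditional, with no check of pass/fail state.
            released.append(self.resource)

        def test_that_fails(self):
            self.fail("simulated assertion failure")

    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(OrderTest).run(result)
    return result, released
```

Calling `run_failing_case()` reports one failure and still shows the resource as released, so the next test does not inherit orphaned state.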

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| No teardown ("the next test will clean up") | Failing test orphans state; the next test fails too |
| Teardown that swallows errors silently | Real cleanup failures are invisible; flake follows |
| Teardown that depends on test-pass state (if (test.passed) cleanup()) | Failing tests don't clean up; cascading flake |
| Teardown order-dependent on setup order | Refactoring setup breaks teardown |

Pattern 7 - Network / external-service isolation

Tests should not depend on external services they don't control. Three patterns:

| Pattern | When |
| --- | --- |
| Stub (canned response) | The test doesn't care about the network; use a stub library (nock, WireMock, Mountebank, msw-handlers, wiremock-stubs, mountebank-imposters) |
| Contract test | The test cares whether the service contract holds; use Pact or schemathesis |
| Real network call in a controlled environment | Smoke / canary test in a staging tier with a dedicated test partition |
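The stub row can be illustrated without any of the named libraries by injecting the network call as a parameter. This is a hand-rolled sketch, not the API of any stub tool; `price_in_eur` and `make_stub_rate` are hypothetical.

```python
from typing import Callable, List, Tuple


def price_in_eur(amount_usd: float, fetch_rate: Callable[[str, str], float]) -> float:
    """Hypothetical system under test: converts using an injected rate lookup."""
    return round(amount_usd * fetch_rate("USD", "EUR"), 2)


def make_stub_rate(rate: float):
    """Build a per-test stub that returns a canned rate and records its calls."""
    calls: List[Tuple[str, str]] = []

    def stub(base: str, quote: str) -> float:
        calls.append((base, quote))
        return rate  # canned response; no network involved

    return stub, calls
```

Each test builds its own stub, which avoids the cross-coupling that a single suite-wide stub configuration invites.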

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Unit tests calling the real external API | Tests fail when the API is down; tests pass when the API silently changes |
| Stubs that drift from production response shape | Tests pass with stubs that don't match reality |
| One global stub for the whole suite | Tests cross-couple through the stub configuration |
| Contract test with no contract refresh | Stub goes stale; tests pass while production breaks |

Cross-cutting anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Implicit ordering (test B depends on test A's side effects) | Per Fowler: "isolation… gives you more flexibility in running subsets of tests and parallelizing tests." Ordering breaks both. |
| Tests that "sleep until it works" | Timing-fragile; async-wait is 45% of all flakes per Luo et al. 2014 |
| Tests that read system time without overrides | Tests fail at midnight / DST / leap year |
| Tests that read random data without seeding | Non-reproducible failures |
| Tests that depend on file-system layout | OS / CI-runner-specific failures |
| Tests that depend on locale / timezone of the runner | Internationalisation-dependent flake |
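Two of these, system time and unseeded randomness, are commonly addressed by passing the clock value and the random generator in as arguments. A minimal sketch, with hypothetical function names:

```python
import random
from datetime import datetime, timezone


def is_weekend(now: datetime) -> bool:
    """Takes the current time as an argument instead of reading the system clock."""
    return now.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday


def pick_sample(items: list, rng: random.Random, k: int = 2) -> list:
    """Takes the generator as an argument so a test can supply a seeded one."""
    return rng.sample(items, k)


# In a test, both inputs are pinned: an explicit UTC timestamp and a fixed seed.
saturday = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
assert is_weekend(saturday)
assert pick_sample(list(range(10)), random.Random(42)) == pick_sample(
    list(range(10)), random.Random(42)
)
```

Using a timezone-aware timestamp also removes the dependency on the runner's local timezone.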

Pattern-selection guide

| Scenario | Recommended pattern |
| --- | --- |
| Default (unit / integration test) | Per-test fixture scope + Fresh Fixture |
| DB-backed integration test | Per-test fixture + transaction-rollback (Persistent Fresh Fixture) |
| Slow expensive E2E setup | Per-describe Shared Fixture documented as immutable + transactional teardown |
| Parallel execution | Worker-scoped DB + unique IDs per worker + ephemeral output paths |
| External service interaction | Stubs by default; contract tests at API surface; real-network only in smoke / canary |
| Multi-worker DB-heavy suite | Database-per-worker + template-database cloning |
| Mutation-heavy unit tests | Per-test fixture + in-memory mock |

References

  • Martin Fowler - Eradicating Non-Determinism in Tests (the load-bearing reference for Fresh-vs-Shared-Fixture trade-off and the "non-deterministic test is useless" rule): https://martinfowler.com/articles/nonDeterminism.html
  • Ham Vocke - The Practical Test Pyramid, published on martinfowler.com (cited for the in-memory-substitution anti-pattern): https://martinfowler.com/articles/practical-test-pyramid.html
  • Gerard Meszaros - xUnit Test Patterns: Refactoring Test Code (2007) - the canonical reference for the four-phase test pattern and all named fixture / teardown patterns: ISBN 978-0131495050.
  • Wikipedia - Test fixture (cites Meszaros's four-phase pattern): https://en.wikipedia.org/wiki/Test_fixture
  • Luo et al. (FSE 2014) - An Empirical Analysis of Flaky Tests (the original academic taxonomy of flake categories: 45% async-wait, 20% concurrency, 12% test-order-dependency, from 201 fixes across 51 projects) which this catalog's patterns prevent: https://mir.cs.illinois.edu/marinov/publications/LuoETAL14FlakyTestsAnalysis.pdf
  • Google Testing Blog, "Flaky Tests at Google and How We Mitigate Them" - flake prevalence (about 16% of tests show some flakiness): https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html
  • Testcontainers - https://testcontainers.com/ (the canonical containerised-DB-per-test reference)
  • ISTQB glossary - test isolation: https://glossary.istqb.org/en_US/term/independent-testing
  • ISTQB glossary - test fixture: https://glossary.istqb.org/en_US/term/test-fixture
  • ISTQB glossary - flaky test: https://glossary.istqb.org/en_US/term/flaky-test
  • test-code-conventions §6, flake-pattern-reference, framework-architecture-auditor - companion file-level / symptom-level / audit-level references.
  • object-model-patterns, test-data-patterns, test-step-design-patterns - sister architecture-tier pattern catalogs.