Testland

test-step-design-patterns

Pure reference catalog of test-step design patterns at the architecture tier - step granularity (one logical action per step), abstraction layers (mechanical → page → business), step extraction rules (when to inline / when to extract to a helper / when to extract to a Page Object method), the declarative-vs-imperative phrasing rule, FIRST principles (Fast / Independent / Repeatable / Self-validating / Timely), and the AAA / Given-When-Then mapping. Distinct from `test-code-conventions` §1 (AAA structure at the file level) and from `manual-step-to-gherkin` (Gherkin-specific translation) - this catalog is the cross-framework architecture-tier reference for what a step IS, when it should exist, and where it should live.


Overview

This skill is a pure reference - no execution steps. It is the catalog the framework-architecture-auditor, test-code-critic, and playwright-codegen-reviewer cite when auditing step granularity at the architecture tier - within an "Act" phase, what is one step? It complements test-code-conventions §1 (AAA structure at the file level).

When to use

  • Designing a test framework - pick the abstraction layers up front (mechanical / page / business).
  • Auditing existing tests where step count per test is high (>15 actions per test signals granularity problems).
  • Refactoring codegen output where every UI mechanic became its own step.
  • Onboarding engineers who are writing their first tests - point them at the canonical citations.

Pattern 1 - FIRST principles

Canonical source: Robert C. "Uncle Bob" Martin - Clean Code (2008), chapter 9 "Unit Tests", reaffirmed in The Clean Coder and across his blog. The FIRST mnemonic is the foundational quality bar for every test step.

| Letter | Principle |
| --- | --- |
| F - Fast | Tests must be fast. Slow tests don't get run; tests that don't get run don't catch bugs. |
| I - Independent | Tests must not depend on each other. Each test sets up its own world. Per Fowler on test isolation, this is the prerequisite to parallel execution and selective re-runs. |
| R - Repeatable | Tests run consistently on any environment (laptop, CI, prod-like). No "runs only on Tuesdays" / "passes on Linux only." |
| S - Self-validating | Tests pass or fail on their own. No human reads logs to determine the outcome. |
| T - Timely | Tests are written close to (ideally before) the production code. Stale tests rot. |

FIRST is the underlying rationale for most of the patterns below. A step that violates FIRST is the smell; the patterns prescribe the fix.
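A minimal sketch of Independent, Repeatable and Self-validating in code. The `Cart` class, its happy-hour rule and the tiny `run` helper are hypothetical, invented for illustration rather than taken from any framework: each test builds its own fixture through `freshCart()`, the clock is injected instead of read from the system, and a test reports its result only by returning or throwing.

```typescript
// Hypothetical domain object, used only to illustrate FIRST.
class Cart {
  private readonly items: string[] = [];

  // The clock is injected so results never depend on the real time (Repeatable).
  constructor(private readonly now: () => Date) {}

  add(sku: string): void {
    this.items.push(sku);
  }

  count(): number {
    return this.items.length;
  }

  // Made-up business rule: happy hour is 17:00-18:59 UTC.
  isHappyHour(): boolean {
    const hour = this.now().getUTCHours();
    return hour >= 17 && hour < 19;
  }
}

// Independent: every test calls this factory instead of sharing one module-level Cart.
function freshCart(isoTime: string = "2024-01-01T18:00:00Z"): Cart {
  const fixed = new Date(isoTime);
  return new Cart(() => fixed);
}

type TestCase = { name: string; fn: () => void };

// Self-validating: a test passes if it returns and fails if it throws; nobody reads logs.
function run(tests: TestCase[]): { passed: string[]; failed: string[] } {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const t of tests) {
    try {
      t.fn();
      passed.push(t.name);
    } catch {
      failed.push(t.name);
    }
  }
  return { passed, failed };
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const tests: TestCase[] = [
  {
    name: "adds one item",
    fn: () => {
      const cart = freshCart();
      cart.add("sku-001");
      check(cart.count() === 1, "expected exactly one item");
    },
  },
  {
    name: "recognises happy hour at 18:00 UTC",
    fn: () => check(freshCart().isHappyHour(), "expected happy hour"),
  },
  {
    name: "is outside happy hour at 09:00 UTC",
    fn: () => check(!freshCart("2024-01-01T09:00:00Z").isHappyHour(), "expected no happy hour"),
  },
];

// Running the suite in either order gives the same result, because no state is shared.
const forward = run(tests);
const reversed = run([...tests].reverse());
```

Replacing `freshCart()` with one shared, module-level `new Cart(() => new Date())` would make the happy-hour tests depend on the wall clock and let item counts leak between tests, which is the kind of FIRST violation the patterns below are meant to prevent.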

Pattern 2 - Step granularity

A "step" is the minimal unit a test reader can name in business terms. One logical action per step - not one click, not one assertion, but one meaningful operation.

The single-purpose-step rule

Each step should do exactly one of: Arrange (setup), Act (the operation under test), Assert (verify outcome), or Annotate (logging / labelling). A step that mixes two should be split.

| Smell | Refactor |
| --- | --- |
| await page.click('#submit'); await expect(toast).toBeVisible(); | Two steps: the Act (submit) and the Assert (toast visible). Split. |
| const user = await createUser({...}); await login(user); | Two Arrange steps. Acceptable as one logical "Arrange a logged-in user" if the helper is named that way (Pattern 4). |
| await page.click('#row-1'); await page.click('#row-1-edit'); await page.fill('#name', 'New'); | Three mechanical actions (two clicks and a fill) comprising one business action ("edit row 1's name"). Extract to a single business step (Pattern 4). |

Step count per test

Aim for 3-8 steps per test body (counting Arrange / Act / Assert phases as steps). >15 is the threshold the framework-architecture-auditor flags as a smell - the test is doing too much or operating at the wrong abstraction.
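Read literally, those thresholds fit in a small classifier. This is a hypothetical helper, not the framework-architecture-auditor's implementation; the "above target" band for 9-15 steps and the "below target" label are assumptions about how to treat the ranges the guidance leaves unnamed.

```typescript
// Hypothetical classifier for the step-count guidance: aim for 3-8, flag more than 15.
type StepCountVerdict = "below target" | "within target" | "above target" | "smell";

function classifyStepCount(steps: number): StepCountVerdict {
  if (steps < 0 || Math.floor(steps) !== steps) {
    throw new RangeError(`step count must be a non-negative integer, got ${steps}`);
  }
  if (steps > 15) return "smell"; // doing too much, or operating at the wrong abstraction
  if (steps > 8) return "above target"; // worth a look, but not yet the flagged threshold
  if (steps >= 3) return "within target";
  return "below target"; // e.g. one opaque helper call that does everything
}
```

The hard part in practice is deciding what counts as a step: the number passed in should count business-level operations, not raw source lines.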

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| One step that does everything: await fullCheckoutFlow(); | Single line of action; the test reads as "did the helper work?" not as "did the SUT work?" |
| 30 mechanical clicks comprising one logical flow | Brittle to UI changes; the test reads as a script, not a specification |
| Mixed Act + Assert in one line (expect(await page.click(...)).toBeTruthy()) | Cannot distinguish "the action failed" from "the assertion failed" in the diagnostic |
| One step with two unrelated assertions (expect(cart.total).toBe(10); expect(user.role).toBe('admin');) | Test fails on the first assert; the second is never evaluated. Split per test-code-critic §2 single-responsibility |
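The diagnostic argument behind the single-purpose-step rule can be sketched with a hypothetical runner in which every step declares exactly one phase. When a step throws, the failure message names that phase and step, so "the action failed" and "the assertion failed" read differently. `Phase`, `runSteps` and the fake form are illustrative assumptions, not a framework API.

```typescript
// Each step declares exactly one phase, per the single-purpose-step rule.
type Phase = "arrange" | "act" | "assert" | "annotate";
type Step = { phase: Phase; name: string; run: () => void };

// Hypothetical runner: stops at the first failing step and reports its phase and name.
function runSteps(steps: Step[]): string {
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return `FAILED in ${step.phase} step "${step.name}": ${reason}`;
    }
  }
  return "PASSED";
}

// A fake system under test whose submit action can be made to break.
function makeForm(submitWorks: boolean) {
  let toastVisible = false;
  return {
    submit(): void {
      if (!submitWorks) throw new Error("submit button not found");
      toastVisible = true;
    },
    isToastVisible: (): boolean => toastVisible,
  };
}

// The Act and the Assert live in separate steps.
function stepsFor(submitWorks: boolean): Step[] {
  const form = makeForm(submitWorks);
  return [
    { phase: "act", name: "submit the form", run: () => form.submit() },
    {
      phase: "assert",
      name: "toast is visible",
      run: () => {
        if (!form.isToastVisible()) throw new Error("toast not visible");
      },
    },
  ];
}
```

Folding both into one expression, as in `expect(await page.click(...)).toBeTruthy()`, would leave only one step name to report, whichever part actually broke.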

Pattern 3 - Abstraction layers

The dominant test-code smell at scale is mechanical steps in the test body. The fix is layered abstraction. Three layers, named consistently:

| Layer | Vocabulary | Example |
| --- | --- | --- |
| Business layer (the test body) | Domain verbs the PM / business stakeholder would recognise | customer.signsIn(), cart.addsItem(sku), checkout.placesOrder() |
| Page / Component layer (Page Objects, Tasks, Service Objects) | Page-specific or component-specific operations | LoginPage.submit({ email, password }), CartPage.applyCoupon(code) |
| Mechanical layer (the framework primitives) | Click, type, navigate, request | page.click(), page.fill(), request.post() |

The test body lives at the business layer. The test never reaches into the mechanical layer directly. If it does, the team has either no abstraction or the abstraction leaks.

Worked example (cross-framework)

Bad (test body at the mechanical layer):

```typescript
import { test, expect } from '@playwright/test';

test('places an order', async ({ page }) => {
  await page.goto('/login');
  await page.fill('#email', 'customer@example.com'); // placeholder address
  await page.fill('#password', 'pass');
  await page.click('#submit');
  await page.goto('/product/sku-001');
  await page.click('button.add-to-cart');
  await page.goto('/checkout');
  await page.fill('#shipping-address', '123 Main St');
  await page.click('#place-order');
  await expect(page.locator('.confirmation')).toBeVisible();
});
```

Good (test body at the business layer):

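```typescript
// './fixtures' is a hypothetical project module that calls test.extend() to provide
// the customer, cart and checkout fixtures and re-exports expect.
import { test, expect } from './fixtures';

test('places an order', async ({ customer, cart, checkout }) => {
  await customer.signsIn();
  await cart.addsItem('sku-001');
  await checkout.placesOrderWithDefaultShipping();
  await expect(checkout.confirmation).toBeVisible();
});
```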

The mechanics live in the Page Object / Task / fixture. The test body reads as the specification, not the implementation.
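The same three layers can be sketched without a browser. In the hypothetical code below, `FakePage` stands in for the mechanical layer (it only records calls), `LoginPage`, `ProductPage` and `CheckoutPage` form the page layer, and `Customer` is the business layer the test body talks to. The selectors mirror the bad example; every class and method name is an illustrative assumption, not a Playwright API, and all three business verbs hang off `Customer` for brevity rather than the separate `customer`, `cart` and `checkout` fixtures of the good example.

```typescript
// Mechanical layer: a stand-in for framework primitives that just records each call.
class FakePage {
  readonly log: string[] = [];
  goto(url: string): void { this.log.push(`goto ${url}`); }
  fill(selector: string, value: string): void { this.log.push(`fill ${selector}=${value}`); }
  click(selector: string): void { this.log.push(`click ${selector}`); }
}

// Page layer: page-specific operations built from mechanical calls.
class LoginPage {
  constructor(private readonly page: FakePage) {}
  submit(creds: { email: string; password: string }): void {
    this.page.goto("/login");
    this.page.fill("#email", creds.email);
    this.page.fill("#password", creds.password);
    this.page.click("#submit");
  }
}

class ProductPage {
  constructor(private readonly page: FakePage) {}
  addToCart(sku: string): void {
    this.page.goto(`/product/${sku}`);
    this.page.click("button.add-to-cart");
  }
}

class CheckoutPage {
  constructor(private readonly page: FakePage) {}
  placeOrder(address: string): void {
    this.page.goto("/checkout");
    this.page.fill("#shipping-address", address);
    this.page.click("#place-order");
  }
}

// Business layer: domain verbs, the only vocabulary the test body uses.
class Customer {
  private readonly login: LoginPage;
  private readonly product: ProductPage;
  private readonly checkout: CheckoutPage;
  constructor(page: FakePage) {
    this.login = new LoginPage(page);
    this.product = new ProductPage(page);
    this.checkout = new CheckoutPage(page);
  }
  signsIn(): void { this.login.submit({ email: "customer@example.com", password: "pass" }); }
  addsItem(sku: string): void { this.product.addToCart(sku); }
  placesOrderWithDefaultShipping(): void { this.checkout.placeOrder("123 Main St"); }
}

// The test body reads as the specification; the mechanics sit one layer down.
const page = new FakePage();
const customer = new Customer(page);
customer.signsIn();
customer.addsItem("sku-001");
customer.placesOrderWithDefaultShipping();
```

The test body issues three domain verbs while the page layer performs the same nine mechanical calls as the bad example; if the checkout form gains a field, only `CheckoutPage.placeOrder` changes.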

When to keep the test mechanical

Some tests legitimately operate at the mechanical layer - testing the mechanical surface itself:

  • Accessibility tests asserting keyboard navigation order.
  • Visual regression tests asserting pixel-level rendering.
  • Selector-resilience tests asserting getByRole works across viewports.

For these, the test body at the mechanical layer is correct. Tag them (@a11y, @visual) so reviewers don't refactor them by mistake.
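A hypothetical sketch of how an abstraction-layer audit could honour those tags. The `TestInfo` shape, the mechanical-step counts and the convention that tags appear as `@`-prefixed words in the test title are assumptions; Playwright, for example, can select such title tags on the command line with `--grep`.

```typescript
// Hypothetical shape of a discovered test.
type TestInfo = { title: string; mechanicalSteps: number };

// Tags that mark a test as mechanical on purpose.
const MECHANICAL_BY_DESIGN = ["@a11y", "@visual"];

// Extracts @-prefixed words from a title, e.g. "tab order @a11y" -> ["@a11y"].
function tagsOf(title: string): string[] {
  return title.split(/\s+/).filter((word) => word.charAt(0) === "@");
}

// Returns the titles an audit should flag for mechanical leakage, skipping tagged tests.
function flagMechanicalLeaks(tests: TestInfo[]): string[] {
  return tests
    .filter((t) => t.mechanicalSteps > 0)
    .filter((t) => !tagsOf(t.title).some((tag) => MECHANICAL_BY_DESIGN.indexOf(tag) !== -1))
    .map((t) => t.title);
}

const flagged = flagMechanicalLeaks([
  { title: "places an order", mechanicalSteps: 9 },
  { title: "tab order follows the visual layout @a11y", mechanicalSteps: 6 },
  { title: "checkout renders correctly @visual", mechanicalSteps: 2 },
  { title: "customer signs in", mechanicalSteps: 0 },
]);
```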

Pattern 4 - Step extraction rules

When should a step move out of the test body?

| Heuristic | Action |
| --- | --- |
| The step appears in 3+ tests | Extract to a Page Object method / Task / fixture |
| The step is mechanical (click, fill, navigate) but the test isn't about that mechanic | Extract |
| The step is business-meaningful and only used here | Keep inline; name it well |
| The step is 5+ lines of mechanical operations | Always extract |
| The step requires explanatory comments | The comment is a smell; the abstraction is missing - extract |
| The step is the first thing in 80% of tests | Extract to a fixture (per-test or per-describe setup) |
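The heuristics can be read as a decision function. In this hypothetical sketch the `StepProfile` fields are assumptions, and so is the precedence (fixture rule first, then the extract rules, else keep inline), since the table does not rank its rows.

```typescript
// Hypothetical description of one candidate step.
type StepProfile = {
  testsUsingIt: number; // how many tests contain this step
  isMechanical: boolean; // click / fill / navigate rather than a domain verb
  testIsAboutTheMechanic: boolean; // e.g. an @a11y keyboard-order test
  mechanicalLines: number; // lines of mechanical operations the step spans
  needsExplanatoryComment: boolean;
  shareOfTestsWhereFirst: number; // 0..1: fraction of tests that open with this step
};

type ExtractionAction = "extract to fixture" | "extract" | "keep inline";

function extractionAction(s: StepProfile): ExtractionAction {
  if (s.shareOfTestsWhereFirst >= 0.8) return "extract to fixture";
  if (s.testsUsingIt >= 3) return "extract"; // to a Page Object method / Task / fixture
  if (s.mechanicalLines >= 5) return "extract"; // "always extract", per the table
  if (s.isMechanical && !s.testIsAboutTheMechanic) return "extract";
  if (s.needsExplanatoryComment) return "extract"; // the comment marks a missing abstraction
  return "keep inline"; // business-meaningful and used once: keep it, name it well
}
```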

The "rule of three" for extraction

Per Refactoring (Fowler 1999, 2nd ed. 2018), the first time you write a piece of code, write it inline. At the second occurrence, note the duplication but tolerate it. At the third occurrence, extract.

This applies to test steps: don't pre-extract on the first test ("we might need this later" is YAGNI). Extract when the third test needs the same step.
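Applied to test bodies, the rule of three reduces to counting how many tests contain the same step. A minimal sketch, assuming each test is available as a list of step strings and that steps match after trimming whitespace (a real tool would need smarter normalisation):

```typescript
// Returns steps that occur in at least `threshold` distinct tests (the rule of three by default).
function extractionCandidates(tests: string[][], threshold: number = 3): string[] {
  // Prototype-free objects so a step named e.g. "constructor" is counted correctly.
  const testsPerStep: Record<string, number> = Object.create(null);
  for (const body of tests) {
    // Count each step once per test, so a step repeated inside one test isn't over-counted.
    const seen: Record<string, boolean> = Object.create(null);
    for (const raw of body) {
      const step = raw.trim();
      if (step === "" || seen[step]) continue;
      seen[step] = true;
      testsPerStep[step] = (testsPerStep[step] || 0) + 1;
    }
  }
  return Object.keys(testsPerStep)
    .filter((step) => (testsPerStep[step] || 0) >= threshold)
    .sort();
}

const tests: string[][] = [
  ["await customer.signsIn();", "await cart.addsItem('sku-001');"],
  ["await customer.signsIn();", "await cart.addsItem('sku-002');"],
  ["await customer.signsIn();", "await checkout.placesOrderWithDefaultShipping();"],
];
const candidates = extractionCandidates(tests);
```

With only the first two tests the count stops at two and nothing is extracted: that is the "note the duplication" stage.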

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Extracting every step on the first test | YAGNI; the abstraction doesn't match real usage |
| Never extracting (every test is 30 lines of mechanics) | Mechanical leakage; brittle |
| Extracting to a helper that doesn't belong to a layer (helpers/random-stuff.ts) | Sprawl; the team can't find the helper |
| Extracted helper that takes 8 parameters | The helper is doing too much; split it |
| Helper that hides Act vs Arrange (named setupAndDoThing()) | Test reads as if it's doing one thing when it does two |
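For the 8-parameter helper, one common fix is a test-data builder: defaults cover what the test does not care about, and each named method states the one detail that matters. `CustomerBuilder` below is a hypothetical, synchronous sketch in the spirit of the `aCustomer()` builder used in Pattern 5; its fields are assumptions.

```typescript
// Hypothetical customer record; the fields are illustrative.
type Customer = {
  email: string;
  role: "customer" | "admin";
  orgId: number;
  cart: string[];
};

class CustomerBuilder {
  // Sensible defaults replace a long positional parameter list.
  private customer: Customer = {
    email: "customer@example.com",
    role: "customer",
    orgId: 1,
    cart: [],
  };

  withRole(role: Customer["role"]): this {
    this.customer = { ...this.customer, role };
    return this;
  }

  inOrg(orgId: number): this {
    this.customer = { ...this.customer, orgId };
    return this;
  }

  withCartItem(sku: string): this {
    this.customer = { ...this.customer, cart: [...this.customer.cart, sku] };
    return this;
  }

  build(): Customer {
    // Return a copy so the built object and the builder never share state.
    return { ...this.customer, cart: [...this.customer.cart] };
  }
}

const aCustomer = (): CustomerBuilder => new CustomerBuilder();

// Reads as "an admin customer in org 42 with one item", with no positional arguments to misorder.
const admin = aCustomer().withRole("admin").inOrg(42).withCartItem("sku-001").build();
```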

Pattern 5 - AAA / Given-When-Then mapping

Two equivalent step-grouping vocabularies. Same content, different name conventions.

| AAA (xUnit tradition) | Given-When-Then (BDD tradition) |
| --- | --- |
| Arrange | Given |
| Act | When |
| Assert | Then |
| (Annotate) | (no equivalent; goes in step comments) |

Both express the same three phases. The team picks one and uses it consistently. Don't mix: if some tests use AAA comments and others use Given-When-Then helpers, the cognitive cost grows.
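To show that the two vocabularies carry the same content, here is a hypothetical `scenario()` helper that offers Given/When/Then names but simply delegates to an Arrange/Act/Assert runner. It is a sketch, not the API of Cucumber or any other BDD tool, and `Order` and `placeOrder` are invented for the example.

```typescript
// Generic three-phase runner.
type Phases<World, Result> = {
  arrange: () => World;
  act: (world: World) => Result;
  assert: (result: Result) => void;
};

function aaa<World, Result>(p: Phases<World, Result>): void {
  const world = p.arrange();
  const result = p.act(world);
  p.assert(result);
}

// Given-When-Then is the same runner with the phases renamed.
function scenario<World, Result>(
  given: () => World,
  when: (world: World) => Result,
  then: (result: Result) => void,
): void {
  aaa({ arrange: given, act: when, assert: then });
}

// Hypothetical order model for the example.
type Order = { status: "pending" | "confirmed"; items: string[] };

const placeOrder = (cart: string[]): Order => ({
  status: cart.length > 0 ? "confirmed" : "pending",
  items: [...cart],
});

function expectConfirmed(order: Order): void {
  if (order.status !== "confirmed") throw new Error(`expected confirmed, got ${order.status}`);
}

// The same test in both vocabularies.
aaa({ arrange: () => ["sku-001"], act: placeOrder, assert: expectConfirmed });
scenario(() => ["sku-001"], placeOrder, expectConfirmed);
```

In a real suite a team would expose only one of these two entry points, per the consistency rule above.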

When AAA is the right choice

  • xUnit-family frameworks (Jest, pytest, JUnit, NUnit).
  • Tests that aren't user-facing scenarios (data-quality, security, perf).
  • The team prefers comment-anchored phase separation.

When Given-When-Then is the right choice

  • BDD frameworks (Cucumber, SpecFlow, Reqnroll, Behave) where the syntax is enforced.
  • Tests that are user-facing scenarios (E2E, integration).
  • The team wants product stakeholders to read the tests.

The phase-separation rule

Whatever vocabulary the team uses, the phases must be visually separable. The pattern from test-code-conventions §1:

```typescript
// aCustomer() is a project-specific test-data builder, not shown here.
test('places an order', async () => {
  // Arrange
  const customer = await aCustomer().withCartItem('sku-001').build();

  // Act
  const order = await customer.placesOrder();

  // Assert
  expect(order.status).toBe('confirmed');
});
```

Or, with blank-line separation (no comments needed):

```typescript
test('places an order', async () => {
  const customer = await aCustomer().withCartItem('sku-001').build();

  const order = await customer.placesOrder();

  expect(order.status).toBe('confirmed');
});
```

Either works; a test with neither fails the test-code-critic §1 AAA review.

Pattern 6 - Declarative vs imperative step phrasing

Even at the business layer, two phrasings exist:

| Imperative | Declarative |
| --- | --- |
| "The customer enters the email and password and clicks submit" | "The customer signs in" |
| "The customer adds 5 items to the cart and proceeds to checkout" | "The customer initiates checkout with a 5-item cart" |
| "The customer is created with role=admin and org_id=42" | "An admin customer in org 42" |

The declarative form is preferred per Cucumber's better-Gherkin guidance (which applies broadly, not just to Gherkin): "scenarios should describe the intended behaviour of the system, not the implementation."

The test: would the wording need to change if the implementation changed? If yes, the step is imperative; rewrite declaratively.
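That question can be made concrete. In the hypothetical sketch below, the declarative step `signsIn()` depends only on an `AuthDriver` interface; swapping the UI-form driver for an API driver changes the mechanics but not a word of the test body. All names, and the `/api/sessions` endpoint, are illustrative assumptions.

```typescript
// How sign-in is carried out: an implementation detail behind the declarative step.
interface AuthDriver {
  authenticate(email: string, password: string): string; // returns a session id
}

class UiFormDriver implements AuthDriver {
  readonly actions: string[] = [];
  authenticate(email: string, _password: string): string {
    this.actions.push("fill #email", "fill #password", "click #submit");
    return `ui-session-for-${email}`;
  }
}

class ApiDriver implements AuthDriver {
  readonly requests: string[] = [];
  authenticate(email: string, _password: string): string {
    this.requests.push("POST /api/sessions"); // hypothetical endpoint
    return `api-session-for-${email}`;
  }
}

class Customer {
  private sessionId: string | null = null;
  constructor(private readonly auth: AuthDriver, private readonly email: string) {}

  // Declarative: states what happens, not how.
  signsIn(): void {
    this.sessionId = this.auth.authenticate(this.email, "pass");
  }

  isSignedIn(): boolean {
    return this.sessionId !== null;
  }
}

// The test body is identical for both drivers: its wording survives the implementation change.
function customerSignsInTest(driver: AuthDriver): boolean {
  const customer = new Customer(driver, "customer@example.com");
  customer.signsIn();
  return customer.isSignedIn();
}

const ui = new UiFormDriver();
const api = new ApiDriver();
const results = [customerSignsInTest(ui), customerSignsInTest(api)];
```

An imperative step such as "enters the email and password and clicks submit" would have needed rewriting as soon as the API driver replaced the form.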

When imperative is correct

  • Tests that test the imperative mechanic (e.g., "the form submits on Enter key press" - Enter is the mechanic under test).
  • Accessibility tests where the keyboard sequence IS the test.

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Declarative step that hides a critical mechanic (customer.signsIn() when the test is about SSO redirect) | The test no longer tests what it claims to test |
| Imperative step that re-describes the system internals | Brittle to refactors; the test breaks when the implementation changes for unrelated reasons |
| Mixing imperative and declarative within one test | Reader can't tell what abstraction level they're operating at |

Pattern 7 - Step naming

Per Roy Osherove's The Art of Unit Testing (2013), test names follow the pattern <system_under_test>_<scenario>_<expected_outcome>. Step naming (within the test) follows similar discipline:

| Smell | Refactor |
| --- | --- |
| await doThing() | Name what the thing is: await customer.signsIn() |
| await test1() | Helpers don't get numeric names; describe what they do |
| const x = await getX() | Single-letter variables hide what's being created |
| await page.click('#submit') | Wrap in a named Page Object method: await loginPage.submit() |
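A rough lint for the smells in the table, plus a formatter for the `<system_under_test>_<scenario>_<expected_outcome>` test-name pattern. Both are hypothetical sketches; the generic-name list and the regular expressions are assumptions that would need tuning for a real codebase.

```typescript
// Builds a test name in the <system_under_test>_<scenario>_<expected_outcome> pattern.
function testName(sut: string, scenario: string, expected: string): string {
  return [sut, scenario, expected].join("_");
}

// Illustrative list of names that say nothing about what a helper does.
const GENERIC_NAMES = ["doThing", "doStuff", "helper", "run", "process"];

// Returns the naming smells found in a helper or variable name.
function namingSmells(name: string): string[] {
  const smells: string[] = [];
  if (/^[A-Za-z]$/.test(name)) smells.push("single-letter name hides what is being created");
  if (/\d+$/.test(name)) smells.push("numeric name; describe what the helper does");
  if (GENERIC_NAMES.indexOf(name) !== -1) smells.push("generic verb; name what the thing is");
  return smells;
}

// Flags a raw framework call that should be wrapped in a named Page Object method.
function isRawMechanicalCall(step: string): boolean {
  return /\bpage\.(click|fill|goto|type|press)\(/.test(step);
}
```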

The reader test

A reviewer should be able to read the test body aloud and have it sound like a specification:

"A customer signs in. They add SKU-001 to their cart. They place an order with default shipping. The order is confirmed."

If reading aloud doesn't produce a specification - if it produces "click, type, click, click, navigate, click, expect-truthy" - the steps are at the wrong abstraction.
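The reader test can be roughed out in code: turn each step call into a sentence and give up as soon as a step is mechanical. The sketch below is hypothetical; it assumes steps are written as `subject.camelCaseVerb(...)`, the mechanical-verb list is an assumption, and assertion lines (`expect(...)`) are out of scope.

```typescript
// Illustrative list of framework-level verbs that signal the mechanical layer.
const MECHANICAL_VERBS = ["click", "fill", "goto", "type", "press", "request"];

// "await customer.signsIn();" -> { subject: "customer", verb: "signsIn", words: "signs in" }
function parseStep(step: string): { subject: string; verb: string; words: string } | null {
  const match = /^(?:await\s+)?(\w+)\.(\w+)\(/.exec(step.trim());
  if (match === null) return null;
  const subject = match[1] || "";
  const verb = match[2] || "";
  const words = verb.replace(/([A-Z])/g, " $1").toLowerCase();
  return { subject, verb, words };
}

// Reads the test body aloud; returns null if any step is mechanical or unparseable.
function readAloud(steps: string[]): string | null {
  const sentences: string[] = [];
  for (const step of steps) {
    const parsed = parseStep(step);
    if (parsed === null || MECHANICAL_VERBS.indexOf(parsed.verb) !== -1) return null;
    const subject = parsed.subject.charAt(0).toUpperCase() + parsed.subject.slice(1);
    sentences.push(`${subject} ${parsed.words}.`);
  }
  return sentences.join(" ");
}
```

Arguments such as the SKU are dropped; a fuller version would need to render them.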

Pattern selection guide

| Scenario | Pattern |
| --- | --- |
| Test reads as 20 clicks-and-types | Extract to Page Objects / Tasks (Pattern 4) and rewrite at business layer (Pattern 3) |
| Test does Arrange and Act in the same line | Split (Pattern 2 single-purpose rule) |
| Three tests share the same first 4 lines | Extract to fixture / helper (Pattern 4 rule of three) |
| Test phases not visually separable | Add blank lines or AAA comments (Pattern 5) |
| Step name reads as implementation detail | Rewrite declaratively (Pattern 6) |
| Step requires explanatory comment to understand | The abstraction is missing - extract (Pattern 4) |
| Test mixes business and mechanical vocabulary | Pick one layer per test (Pattern 3) |

Cross-cutting anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| Tests with 30+ steps (one logical thing per "step" but the whole test does 10 logical things) | Per test-code-critic §2, single-responsibility violation; split into multiple tests |
| Helpers that wrap one framework call (async function click(sel) { await page.click(sel); }) | Wrapping for the sake of wrapping; adds no abstraction |
| Step extracted into a helper but the helper takes a boolean flag that branches behavior | Two helpers masquerading as one |
| Step that does retry / wait / fallback inside | Hides flakiness; the test passes when it should fail |
| Tests that look fine read top-down, but whose helpers contain hidden assertions | The test seems to assert one thing but actually asserts more (or different things) |
| Tests where the assertion is inside the Page Object method | Violates the object-model-patterns no-assertion rule |
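The last row is easiest to see side by side. In this hypothetical sketch, `AssertingCartPage` hides a check inside a Page Object method, while `CartPage` only reports state and leaves the verdict to the test body. Neither class comes from a real library.

```typescript
// Anti-pattern: the Page Object asserts, so the test body appears to assert nothing.
class AssertingCartPage {
  constructor(private readonly prices: number[]) {}
  checkTotal(): void {
    const total = this.prices.reduce((sum, price) => sum + price, 0);
    if (total <= 0) throw new Error("cart total must be positive"); // hidden assertion
  }
}

// Preferred: the Page Object exposes state; the test owns the assertion.
class CartPage {
  constructor(private readonly prices: number[]) {}
  total(): number {
    return this.prices.reduce((sum, price) => sum + price, 0);
  }
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// The expected value sits where a reader looks for it: in the test body.
function cartTotalTest(prices: number[], expected: number): void {
  const cart = new CartPage(prices);
  check(cart.total() === expected, `expected total ${expected}, got ${cart.total()}`);
}

cartTotalTest([4, 6], 10);
```

The hidden check passes for a cart totalling 9 even when the scenario expects 10, because it only tests positivity; the explicit assertion in the test body catches the difference.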


References

  • Robert C. Martin - Clean Code: A Handbook of Agile Software Craftsmanship (2008), chapter 9 "Unit Tests" (the FIRST principles): ISBN 978-0132350884. The canonical reference for the FIRST mnemonic. https://www.oreilly.com/library/view/clean-code-a/9780136083238/
  • Roy Osherove - The Art of Unit Testing (2nd ed. 2013) (the <sut>_<scenario>_<expected> naming pattern cited in test-code-conventions §3): ISBN 978-1617290893.
  • Kent Beck - Test-Driven Development by Example (2002) - the canonical TDD reference for step / test design rhythm: ISBN 978-0321146533.
  • Martin Fowler - Refactoring: Improving the Design of Existing Code (2nd ed. 2018) - the "rule of three" for extraction (Pattern 4): https://martinfowler.com/books/refactoring.html
  • Martin Fowler - Eradicating Non-Determinism in Tests (cited for the Independent principle): https://martinfowler.com/articles/nonDeterminism.html
  • Cucumber documentation - Better Gherkin (declarative vs imperative phrasing rule, Pattern 6): https://cucumber.io/docs/bdd/better-gherkin/
  • Gerard Meszaros - xUnit Test Patterns (2007) - the named-pattern catalog for Test Method, Assertion Method, Custom Assertion, Inline Resource: ISBN 978-0131495050.
  • ISTQB glossary - test step: https://glossary.istqb.org/en_US/term/test-step
  • ISTQB glossary - test procedure (the imperative form, by ISTQB convention): https://glossary.istqb.org/en_US/term/test-procedure-1
  • test-code-conventions, test-code-critic, framework-architecture-auditor, playwright-codegen-reviewer, manual-step-to-gherkin - the related-tier components.
  • object-model-patterns, test-isolation-patterns, test-data-patterns - sister architecture-tier pattern catalogs.