test-code-conventions
Pure-reference catalog of test code conventions - AAA structure (Arrange / Act / Assert), per-test single-responsibility, descriptive naming patterns (`<system_under_test>_<scenario>_<expected>` vs nested describe), assertion specificity, mocking rationale (state vs behavior verification, fake vs mock preference), fixture-coupling rules, and the magic-number / hard-coded-string anti-pattern. The agents in this plugin (test-code-critic, assertion-quality-reviewer, mocking-anti-pattern-detector, e2e-selector-quality-critic) load this as their shared rule book. Use when you need the underlying rationale for any verdict one of those agents issues, as a team's onboarding reference for "what makes a test code-reviewable," or as the source-of-truth the critics' verdicts cite back to.
test-code-conventions
Overview
This skill is a pure reference - no actions, no workflows. It catalogs the test-code conventions that the rest of qa-test-review's critics enforce. When a critic flags an issue, it cites this reference for the reviewer to learn the underlying rule.
When to use
§1 - AAA structure
The canonical test shape - Arrange, Act, Assert - splits each test into three phases:
test('addItem increases cart count', () => {
// Arrange
const cart = new Cart();
// Act
cart.addItem({ sku: 'BOOK-001', qty: 1 });
// Assert
expect(cart.itemCount).toBe(1);
});The phases are visually separated (blank line; comment; or // arrange / act / assert headers). The benefits:
Each test has exactly one Act. Two Acts = two tests. Splitting on multiple Acts is the canonical refactor when a single test grows too long.
§2 - Single-responsibility per test
A test that asserts cart.count === 1 AND cart.totalPrice === 10 AND cart.lastUpdated > now() is three tests in disguise. When one assertion fails, the test stops; the other two failures are masked.
The rule: one logical assertion per test. "Logical" means related to the same observable property - multiple expect calls that all verify "the cart has one item" (count = 1, items.length = 1, contents[0].sku = '...') are one logical assertion.
Splitting:
// Bad — three logical assertions
test('addItem updates cart', () => {
cart.addItem(...);
expect(cart.itemCount).toBe(1); // assertion 1
expect(cart.totalPrice).toBe(10); // assertion 2 (different property)
expect(cart.lastUpdated).toBeGreaterThan(t0); // assertion 3 (different property)
});
// Good — three tests, each assertion isolated
test('addItem increments count', () => { /* ... */ });
test('addItem updates totalPrice', () => { /* ... */ });
test('addItem updates lastUpdated', () => { /* ... */ });§3 - Naming patterns
Two well-established conventions:
A - <system_under_test>_<scenario>_<expected> (Roy Osherove)
test('addItem_validQty_incrementsCount', () => { /* ... */ });
test('addItem_zeroQty_throwsValidationError', () => { /* ... */ });
test('addItem_negativeQty_throwsValidationError', () => { /* ... */ });The triple makes the test self-documenting; no need to read the body to understand what's verified.
B - Nested describe + it
describe('Cart', () => {
describe('addItem', () => {
describe('with valid qty', () => {
it('increments count', () => { /* ... */ });
});
describe('with zero qty', () => {
it('throws ValidationError', () => { /* ... */ });
});
});
});Both are valid; the team picks one and stays consistent. Mixing the two in one suite is the smell.
Avoid: test('it works'), test('test 1'), test('addItem 1'), test('addItem 2'). Generic names are zero debugging help when the failure surfaces.
§4 - Assertion specificity
Assertions should narrow the verification window to exactly what's being asserted. Vague matchers hide regressions:
| Vague | Specific | Why specific is better |
|---|---|---|
expect(x).toBeTruthy() | expect(x).toEqual({ id: 1, name: 'foo' }) | "truthy" passes for 1, 'a', {}, [] - many bug shapes pass. |
expect(arr).toBeDefined() | expect(arr).toEqual(['BOOK-001']) | "defined" passes for [], null-prototype, … |
expect(err).toBeInstanceOf(Error) | expect(err.code).toBe('VALIDATION_ERROR') | Catches "right type, wrong reason" regressions. |
expect(response.status).toBeGreaterThan(199) | expect(response.status).toBe(201) | 200, 204, 299 also pass - masks status-code regressions. |
expect(html).toContain('error') | expect(html).toMatch(/<div class="error">.*Invalid/) | "error" matches "no errors" too. |
The general principle: the assertion should fail on any change to the SUT's behavior that isn't intentional. If the assertion passes for behaviors that shouldn't, it's too loose.
§5 - Mocking
The full taxonomy per mocks-stubs:
"Dummy objects: passed around but never actually used. Usually they are just used to fill parameter lists." (mocks-stubs)
"Fake objects: actually have working implementations, but usually take some shortcut which makes them not suitable for production." (mocks-stubs)
"Stubs: provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in." (mocks-stubs)
"Spies: stubs that also record some information based on how they were called." (mocks-stubs)
"Mocks: objects pre-programmed with expectations which form a specification of the calls they are expected to receive." (mocks-stubs)
Per mocks-stubs: "Only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification."
Convention rules:
§6 - Fixture coupling
Tests that share fixtures (parameters, builders, factory functions) should be coupled at the smallest scope:
| Scope | When to use |
|---|---|
| Inline (per test) | Default. The test owns the data; reviewer sees what's being tested. |
describe-block | Multiple tests verify the same scenario differently. |
File-level (beforeEach) | Shared setup with no per-test variation. |
| Cross-file factory | Shared shapes, not shared instances. Use builders / factories. |
Anti-pattern: a giant globalFixtures.ts that every test imports. Tests now break in unrelated ways when the global is touched; the test no longer "owns" what it verifies.
§7 - Magic numbers and strings
Test code is more tolerant of magic numbers than production code - the test's job is often to assert against specific values. But meaningful magic matters:
// Bad
expect(cart.totalPrice).toBe(43.21); // why 43.21?
// Better
const PRICE_PER_BOOK = 10.99;
const QTY = 4;
const EXPECTED_TAX_RATE = 0.0825;
const EXPECTED_TOTAL = PRICE_PER_BOOK * QTY * (1 + EXPECTED_TAX_RATE);
expect(cart.totalPrice).toBeCloseTo(EXPECTED_TOTAL, 2);The named constants double as documentation. The reviewer sees the math; the assertion failure message becomes interpretable.
§8 - E2E selectors
Per pw-best-practices: "Your DOM can easily change so having your tests depend on your DOM structure can lead to failing tests."
Per tl-queries (Testing Library priority order):
| Priority | Query | When to use |
|---|---|---|
| 1 | getByRole('button', { name: 'Submit' }) | Default. Tests via the accessibility tree - same path as users. |
| 2 | getByLabelText('Email') | Form fields with associated <label>. |
| 3 | getByPlaceholderText, getByText, getByDisplayValue | When labels aren't available. |
| 4 | getByAltText, getByTitle | Images, tooltips. |
| 5 | getByTestId('submit-button') | Last resort. Per tl-queries: "The user cannot see (or hear) these." |
CSS class selectors (.button-primary) and XPath (//div[@class='cart']//button[1]) are not on the priority list - per pw-best-practices they are explicitly identified as brittle.
The same convention holds for Playwright (page.getByRole(...)), Cypress (cy.findByRole(...) via cypress-testing-library), and Selenium (when the test framework supports role-based queries via extensions).
§9 - Web-first assertions (E2E)
Per pw-best-practices: avoid "manual assertions without waiting. Using isVisible() checks immediately without awaiting, rather than web-first assertions like toBeVisible() that wait for conditions to be met."
// Bad — race condition
expect(page.locator('.toast').isVisible()).toBe(true);
// Good — auto-waits
await expect(page.locator('.toast')).toBeVisible();The web-first form auto-waits for the assertion to become true within the test's timeout, eliminating the wait-N-seconds-and-hope pattern.
§10 - Slow setup is a smell
A test that takes >1s in setup (creating fixtures, seeding DB, warming caches) has a coupling problem. The remedies:
The whole-suite cost of slow setup compounds: 10 tests × 2s = +20s per CI run × 50 PRs/day = 1000s/day burned.