test-data-patterns
Pure reference catalog of the cross-language object-construction patterns for test data - Test Data Builder (Pryce/Freeman), Factory (with traits and associations), Object Mother, Fixture composition (per-test / per-describe / shared), Snapshot (defers to `golden-file-conventions` for the operational details), and Production-Data Anonymisation. Distinct from per-language data wrappers in this plugin (`factory-bot-data` Ruby, `faker-data` JS, `mimesis-data` Python, `bogus-data` .NET) which document tool-specific configuration; this catalog is the architecture-tier reference for choosing **which pattern** before reaching for the tool. Preloaded by `framework-architecture-auditor` as the data-construction-tier reference.
Overview
This skill is a pure reference - no execution steps. It is the catalog the framework-architecture-auditor cites when it audits a test framework's data-construction approach. It complements factory-bot-data (Ruby), faker-data (JS), mimesis-data (Python), bogus-data (.NET), synthetic-pii-generator (cross-language), and golden-file-conventions (snapshot pattern). Those skills document the tools; this skill documents the patterns.
When to use
Do not use this skill to:
Pattern 1 - Test Data Builder
Canonical source: Steve Freeman and Nat Pryce, Growing Object-Oriented Software, Guided by Tests (2009) - the Test Data Builder pattern is named in chapter 22. Also discussed in Pryce's blog post Test Data Builders (the cross-language origin reference).
Definition: A Test Data Builder is a class with chainable methods (.withName("Alice").withOrgId(42).build()) that constructs domain objects step-by-step. Every field has a sensible default; the test overrides only the fields it cares about.
Example (cross-language pseudocode):
```javascript
// Default-everything builder; override only what matters
const user = aUser()
  .withRole("admin")
  .withOrg(anOrg().withPlan("enterprise"))
  .build();
```
When to use Test Data Builder
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Builders with `.set<Field>(value)` for every field (no defaults) | Loses the pattern's benefit; every test specifies every field |
| Builders that mutate the object in place rather than returning a new one | Test cross-coupling: builders shared between tests leak state |
| Builders that perform side effects (`build()` writes to DB) | Mixes two concerns; the test cannot tell whether it's constructing or persisting |
| Builders for objects with 2 fields | Overhead exceeds benefit; use a struct literal |
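The definition and the anti-patterns above can be sketched in TypeScript as follows. This is a minimal illustration, not a reference implementation: the `User` and `Org` types, the default values, and the builder class names are all assumptions invented for the example. Each `with*` method returns a new builder, and `build()` only constructs.

```typescript
// Hypothetical domain types, invented for this sketch.
interface Org {
  readonly name: string;
  readonly plan: string;
}

interface User {
  readonly name: string;
  readonly role: string;
  readonly org: Org;
}

class OrgBuilder {
  private readonly fields: Org;

  constructor(fields: Org) {
    this.fields = fields;
  }

  // Returns a new builder, so a builder shared between tests cannot leak state.
  withPlan(plan: string): OrgBuilder {
    return new OrgBuilder({ ...this.fields, plan });
  }

  // build() only constructs; it never persists.
  build(): Org {
    return { ...this.fields };
  }
}

class UserBuilder {
  private readonly fields: { readonly name: string; readonly role: string };
  private readonly orgBuilder: OrgBuilder;

  constructor(
    fields: { readonly name: string; readonly role: string },
    orgBuilder: OrgBuilder,
  ) {
    this.fields = fields;
    this.orgBuilder = orgBuilder;
  }

  withName(name: string): UserBuilder {
    return new UserBuilder({ ...this.fields, name }, this.orgBuilder);
  }

  withRole(role: string): UserBuilder {
    return new UserBuilder({ ...this.fields, role }, this.orgBuilder);
  }

  withOrg(orgBuilder: OrgBuilder): UserBuilder {
    return new UserBuilder(this.fields, orgBuilder);
  }

  build(): User {
    return { ...this.fields, org: this.orgBuilder.build() };
  }
}

// Entry points carry the defaults; the default values here are arbitrary.
const anOrg = (): OrgBuilder => new OrgBuilder({ name: "Acme", plan: "free" });
const aUser = (): UserBuilder =>
  new UserBuilder({ name: "Alice", role: "member" }, anOrg());

const baseUser = aUser();
const adminUser = baseUser
  .withRole("admin")
  .withOrg(anOrg().withPlan("enterprise"))
  .build();
const plainUser = baseUser.build(); // unaffected by the overrides above
```

Because `baseUser` is never mutated, it can be reused as a starting point across tests without the cross-coupling described in the table.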
Pattern 2 - Factory (with traits and associations)
Canonical source: Thoughtbot's factory_bot (Ruby) is the cross-language reference implementation. The pattern itself predates the library - Joshua Kerievsky's Refactoring to Patterns (2004) traces it to the Gang of Four's Factory Method, which is here adapted for test data.
Definition: A Factory is a registered, named template for creating an object. Traits are named modifiers (:admin, :disabled, :premium) that compose with the base template. Associations express relationships (user.org, org.plan).
Example (Ruby FactoryBot, cited as the canonical implementation):
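```ruby
# Assumes the factory_bot and faker gems are loaded and an :org factory is defined elsewhere.
FactoryBot.define do
  factory :user do
    name  { Faker::Name.name }
    email { Faker::Internet.email }

    # Traits are named modifiers layered on top of the base template.
    trait :admin do
      role { :admin }
    end

    trait :with_org do
      association :org, factory: :org
    end
  end
end

# Test usage (with FactoryBot::Syntax::Methods included):
admin_user = create(:user, :admin, :with_org)
```
Cross-language equivalents (cite the canonical per-language tool):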
When to use Factory
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Factory definitions that hard-code IDs (`id: 1`) | Tests collide in parallel; factories must let the DB / Faker assign |
| Traits that overlap silently (`:admin` and `:premium` both set `role`) | Order-dependent behaviour; trait composition becomes unpredictable |
| Factories that persist by default (`create` is the only mode) | Slow tests, unnecessary DB writes. Factories should expose build / attributes / create strategies (see `factory-bot-data`) |
| One mega-factory for the entire domain | Becomes a god-object; every test pulls a fully-populated graph |
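Outside Ruby, the same shape (a named template, composable traits, and separate build and create strategies) might look like the sketch below. Everything in it is hypothetical: `Factory`, `InMemoryStore`, and the `Account` type are invented for illustration and do not mirror any particular library's API.

```typescript
// Hypothetical entity, invented for this sketch.
interface Account {
  id?: number;
  name: string;
  role: "member" | "admin";
  disabled: boolean;
}

// Minimal stand-in for a persistence layer; it, not the factory, assigns ids.
class InMemoryStore {
  private nextId = 1;
  readonly rows: Account[] = [];

  insert(row: Account): Account {
    const saved: Account = { ...row, id: this.nextId++ };
    this.rows.push(saved);
    return saved;
  }
}

class Factory<T extends object> {
  private readonly defaults: () => T;
  private readonly traits: Record<string, Partial<T>>;

  constructor(defaults: () => T, traits: Record<string, Partial<T>>) {
    this.defaults = defaults;
    this.traits = traits;
  }

  // build: in memory only. Defaults first, then traits in the order given, then overrides.
  build(traitNames: string[] = [], overrides: Partial<T> = {}): T {
    const result = this.defaults();
    for (const traitName of traitNames) {
      const trait = this.traits[traitName];
      if (trait === undefined) {
        throw new Error(`Unknown trait: ${traitName}`);
      }
      Object.assign(result, trait);
    }
    return Object.assign(result, overrides);
  }

  // create: build, then persist through whatever store the caller passes in.
  create(
    store: { insert(row: T): T },
    traitNames: string[] = [],
    overrides: Partial<T> = {},
  ): T {
    return store.insert(this.build(traitNames, overrides));
  }
}

let sequence = 0; // keeps generated names unique without hard-coding ids
const accountFactory = new Factory<Account>(
  () => ({ name: `user-${++sequence}`, role: "member", disabled: false }),
  {
    admin: { role: "admin" },
    disabled: { disabled: true },
  },
);

const store = new InMemoryStore();
const draft = accountFactory.build(["admin"]); // no store touched
const saved = accountFactory.create(store, ["admin", "disabled"]); // persisted
```

The two traits here set different fields, so their order does not matter. If two traits set the same field, the later one wins in this sketch, which is the silent overlap the table warns about.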
Pattern 3 - Object Mother
Canonical source: Martin Fowler - Object Mother. Predates Test Data Builder; superseded by it for most use cases but still useful for stable canonical objects.
Definition: A central class (the "Mother") exposes named methods that return fully-constructed canonical test objects: ObjectMother.standardUser(), ObjectMother.adminUser(), ObjectMother.userInOrgWithFiveMembers().
Fowler's framing: an Object Mother is a kind of class used in testing to help create example objects for tests. Useful when the team has a small, stable set of canonical fixtures.
When to use Object Mother
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Object Mother that grows to 50+ methods | Becomes a god-class; engineers can't find the right method |
| Methods that return shared mutable instances | Tests cross-couple through the Mother's return values |
| Mixing Mother with Builder (some objects via Mother, some via Builder) | Inconsistent test idiom; vocabulary drift |
| Mother methods with implicit dependencies (`adminUser()` requires `seedOrgs()` to have run) | Implicit ordering creates flakiness |
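A small Object Mother that avoids the shared-instance and implicit-dependency anti-patterns above could look like this sketch. The method names come from the definition; the `UserRecord` and `OrgRecord` types and all field values are assumptions made up for the example.

```typescript
// Hypothetical domain types, invented for this sketch. They are deliberately
// mutable to show that one caller's mutation never reaches another caller.
interface OrgRecord {
  name: string;
  members: string[];
}

interface UserRecord {
  name: string;
  role: "member" | "admin";
  org: OrgRecord | null;
}

class ObjectMother {
  // Every method returns a freshly constructed object, never a shared instance.
  static standardUser(): UserRecord {
    return { name: "Sam Standard", role: "member", org: null };
  }

  static adminUser(): UserRecord {
    return { name: "Ada Admin", role: "admin", org: null };
  }

  // Self-contained: builds its own org instead of relying on an earlier seed step.
  static userInOrgWithFiveMembers(): UserRecord {
    const members = ["Ana", "Ben", "Cyd", "Dee", "Eli"];
    return { name: "Ana", role: "member", org: { name: "Acme", members } };
  }
}

const first = ObjectMother.standardUser();
first.name = "mutated in one test";
const second = ObjectMother.standardUser(); // still the canonical object
```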
Pattern 4 - Fixture composition
Canonical source: Gerard Meszaros, xUnit Test Patterns: Refactoring Test Code (2007) - the seminal reference for Fresh Fixture, Shared Fixture, Implicit Setup, Delegated Setup, and Setup Decorator patterns. The Wikipedia entry on test fixture describes the four-phase test pattern (setup / exercise / verify / teardown) attributed to Meszaros.
Definition: Fixture composition is the pattern of building per-test state from reusable fragments. The three flavours:
| Flavour | When |
|---|---|
| Fresh Fixture | Each test creates its own state from scratch. Most isolated; slowest. |
| Shared Fixture | Multiple tests share one initialised state. Fastest; brittle to test ordering. |
| Persistent Fresh Fixture | Fresh state per test, but persisted in a transaction that rolls back at teardown. The pragmatic middle ground. |
Fowler on the trade-off (Eradicating Non-Determinism in Tests): "I prefer the former [Fresh Fixture], as it's often easier - and in particular easier to find the source of a problem." But: "rebuilding the database each time can add a lot of time to test runs, so that argues for switching to a clean-up strategy."
When to use Fresh Fixture (default)
When to use Shared Fixture
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Shared Fixture that some tests quietly mutate | Cross-test coupling; failures depend on test order |
| Fresh Fixture for a 30-minute E2E seed | Test suite time becomes infeasible; team starts skipping tests |
| Multiple fixture flavours in the same suite without explicit convention | Engineers can't tell what to write; bugs creep in |
| Fixture inheritance hierarchies >2 levels deep | Per framework-architecture-auditor §A2, depth-3+ chains break unpredictably |
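The three flavours can be contrasted in a few lines. The sketch below uses a hypothetical in-memory `FakeDb` in place of a real database, and `begin` / `rollback` are simplified stand-ins for a real transaction API; none of these names come from a specific test framework.

```typescript
// Hypothetical in-memory stand-in for a database, invented for this sketch.
class FakeDb {
  private rows: string[] = [];
  private savepoint: string[] | null = null;

  insert(row: string): void {
    this.rows.push(row);
  }

  count(): number {
    return this.rows.length;
  }

  begin(): void {
    this.savepoint = [...this.rows];
  }

  rollback(): void {
    if (this.savepoint !== null) {
      this.rows = this.savepoint;
      this.savepoint = null;
    }
  }
}

// Fresh Fixture: every call builds brand-new state from scratch.
function freshFixture(): FakeDb {
  const db = new FakeDb();
  db.insert("seed-user");
  return db;
}

// Shared Fixture: built once and frozen, so the read-only contract is enforced.
const sharedFixture = Object.freeze({
  plans: Object.freeze(["free", "enterprise"]),
});

// Transaction-wrapped fixture (the table's "Persistent Fresh Fixture"):
// the test body may write, and teardown rolls everything back.
function inRolledBackTransaction<T>(db: FakeDb, testBody: (tx: FakeDb) => T): T {
  db.begin();
  try {
    return testBody(db);
  } finally {
    db.rollback();
  }
}

const db = freshFixture();
const countInsideTest = inRolledBackTransaction(db, (tx) => {
  tx.insert("temp-row");
  return tx.count();
});
// After the rollback, db holds only the seed row again.
```

Freezing the shared fixture is one way to document immutability in code rather than by convention alone.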
Pattern 5 - Snapshot / golden-file
Canonical source: Jest snapshot testing is the cross-language reference for the test-code side; Michael Feathers' Working Effectively with Legacy Code coined the term "characterisation tests", the legacy-code-tier version of snapshot testing.
Definition: A snapshot test compares the current output of code under test to a previously-saved canonical output ("the golden file"). When the test runs, it serialises the output, compares it against the file, and fails if they differ. Engineers explicitly approve a new golden file when the change is intentional.
This skill's role: Snapshot is a recurring concept in test-data conversations, but the operational details (file naming, sanitisation of timestamps / IDs / PII, per-OS variants, review workflow) are documented by golden-file-conventions. Reach for that skill for the operational catalog; this section is only the pattern's catalog entry.
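The serialise / compare / approve loop in the definition can be sketched without any framework. The `SnapshotStore` class below is hypothetical and keeps golden values in memory purely for illustration; writing the golden on first run is an assumption of this sketch, and file handling and sanitisation are left to golden-file-conventions.

```typescript
type Outcome = "written" | "match" | "mismatch";

// Hypothetical in-memory golden store; a real suite would persist these as files.
class SnapshotStore {
  private readonly golden = new Map<string, string>();

  // Serialise the output, then compare it with the stored golden value.
  // A missing golden is written on first run; `approve` overwrites it deliberately.
  check(snapshotName: string, actual: unknown, approve = false): Outcome {
    const serialised = JSON.stringify(actual, null, 2);
    const expected = this.golden.get(snapshotName);
    if (expected === undefined || approve) {
      this.golden.set(snapshotName, serialised);
      return "written";
    }
    return expected === serialised ? "match" : "mismatch";
  }
}

const snapshots = new SnapshotStore();
const firstRun = snapshots.check("invoice", { total: 10 });        // "written"
const secondRun = snapshots.check("invoice", { total: 10 });       // "match"
const afterChange = snapshots.check("invoice", { total: 11 });     // "mismatch"
const approved = snapshots.check("invoice", { total: 11 }, true);  // "written"
```

`JSON.stringify` is sensitive to key order, so a real serialiser would normally sort keys or otherwise normalise the output first.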
When to use Snapshot
When NOT to use Snapshot
Pattern 6 - Production-Data Anonymisation
Canonical source: ISO/IEC 25024 (data quality) and the GDPR / CCPA legal requirements; practitioner adoption is documented across Tonic.ai, Gretel.ai, K2view, and synthetic-pii-generator (the marketplace's existing skill for synthesising PII).
Definition: Anonymisation is the technique of using production data (or production-shaped data) for testing after removing or masking personally-identifiable information (PII), commercially-sensitive data, and any field that would breach privacy / compliance if leaked to a test environment.
Why this is a pattern, not just a tool concern: The pattern dictates that no production data enters a test environment without anonymisation - even if the test environment is "internal only." Cross-environment data leakage is the dominant security failure mode in test-data management (2025 Verizon DBIR cited cross-environment data leakage as a top-10 breach pattern).
The three anonymisation flavours
| Flavour | Definition | When |
|---|---|---|
| Masking | Replace sensitive fields with deterministic placeholders (X*** for surnames; static fake date) | When the production shape must be preserved; field-level reversible if needed |
| Synthesis | Generate fake data that statistically resembles production (length distributions, locale mix) | When no mapping back to production is acceptable; safest |
| Tokenisation | Replace sensitive values with tokens that map back via a secured lookup | When the test environment needs to round-trip data to production (rare in QA) |
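To make the three flavours concrete, here is a deliberately simplified sketch. The `Customer` type, the fixed placeholder date, the seeded generator, and the in-memory `TokenVault` are all assumptions for illustration; a real tokenisation vault would use unpredictable tokens and a secured store, and real synthesis would model far more than surname length.

```typescript
// Hypothetical record shape, invented for this sketch.
interface Customer {
  surname: string;
  birthDate: string;
}

// Masking: replace sensitive fields with fixed, deterministic placeholders.
function maskCustomer(customer: Customer): Customer {
  return { ...customer, surname: "X***", birthDate: "1970-01-01" };
}

// Small seeded generator so the synthesised output is reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

// Synthesis: new values that follow production's length distribution
// without copying any production value.
function synthesiseSurnames(
  production: readonly string[],
  count: number,
  seed: number,
): string[] {
  const rng = makeRng(seed);
  const lengths = production.map((s) => s.length);
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const result: string[] = [];
  for (let i = 0; i < count; i++) {
    const targetLength = lengths[Math.floor(rng() * lengths.length)] ?? 5;
    let surname = "";
    for (let j = 0; j < targetLength; j++) {
      surname += letters.charAt(Math.floor(rng() * letters.length));
    }
    result.push(surname);
  }
  return result;
}

// Tokenisation: swap values for tokens that map back through a lookup.
class TokenVault {
  private readonly byValue = new Map<string, string>();
  private readonly byToken = new Map<string, string>();

  tokenise(value: string): string {
    const existing = this.byValue.get(value);
    if (existing !== undefined) return existing;
    const token = `tok_${this.byValue.size + 1}`; // sequential only for the sketch
    this.byValue.set(value, token);
    this.byToken.set(token, value);
    return token;
  }

  detokenise(token: string): string | undefined {
    return this.byToken.get(token);
  }
}
```

Only the vault keeps a route back to the original value; the masked and synthesised outputs carry no such mapping in this sketch.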
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Copying production DB to staging "for realism" | Cross-environment data leakage; GDPR / CCPA / HIPAA breach surface |
| Anonymisation that preserves the join keys | Sensitive relations (who bought what) survive the anonymisation |
| Anonymisation applied in CI but not in local development | Engineers have raw prod data on their laptops |
| Anonymisation as a one-time operation | Production data changes; anonymisation must run continuously |
Pattern-selection guide
| Need | Pattern | When to mix |
|---|---|---|
| Override a few fields, default the rest | Test Data Builder | Use Factory underneath the Builder for DB persistence |
| Many variants of one entity | Factory with traits | Combine with Builder for the test-API surface |
| Small stable set of canonical objects | Object Mother | Generally legacy; consider migrating to Builder + Factory |
| Per-test independence | Fresh Fixture | Always the default; reach for Shared only when measured slow |
| Read-only shared state | Shared Fixture | Document immutability; one mutation kills the contract |
| Large structured output | Snapshot / golden-file | See golden-file-conventions for operational details |
| Production-shaped privacy-safe data | Anonymisation | Always for production-sourced data; pair with synthetic-pii-generator |
Cross-cutting anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Mixing all six patterns in one codebase | Engineers can't tell what to write; vocabulary fragments |
| Test data written as inline literals in tests (`{ name: "Alice", id: 1 }`) at scale | Refactors break 200 tests when one schema field changes |
| Test data setup that takes >5s per test | Suite time becomes infeasible; teams skip tests |
| Data construction and persistence collapsed into one method (`createUser()` always writes to DB) | Cannot test the construction logic without the DB |
| Implicit Setup (relies on global state from a previous test) | Tests become order-dependent; flake follows |
| Test data with PII / production keys | Compliance + security breach surface |