qa-test-data-privacy

PII detection, masking, and synthetic data generation for test environments: 8 skills (data-masking-techniques-reference, faker-synthetic-data, k-anonymity-verifier, pii-categories-reference, pii-masking-pipeline-builder, presidio-pii-detection, synthea-healthcare-data, test-data-governance-reference) and 1 agent (pii-leak-critic).

Install this plugin

/plugin install qa-test-data-privacy@testland-qa

Part of role bundle: qa-role-security

Components

Type	Name	Description
skill	pii-categories-reference	Catalog of PII categories across GDPR, CPRA, NIST SP 800-122, HIPAA Safe Harbor
skill	data-masking-techniques-reference	Masking operators + NIST 800-188 privacy models (k-anonymity, l-diversity, t-closeness, DP)
skill	presidio-pii-detection	Microsoft Presidio analyzer + anonymizer for PII scanning + masking
skill	faker-synthetic-data	Faker libraries (Python, JavaScript, Java, .NET) for synthetic substitution
skill	synthea-healthcare-data	MITRE Synthea synthetic-patient simulator (FHIR / C-CDA / CSV output)
skill	pii-masking-pipeline-builder	Build a deployable masking pipeline spec from a source-data inventory
agent	pii-leak-critic	Audits masked output for leaks; classifies findings by regime; emits block/pass verdict
skill	k-anonymity-verifier	Verify k-anonymity / l-diversity / t-closeness on masked datasets (ARX, pycanon)
skill	test-data-governance-reference	Pure reference: test-data lifecycle governance (retention, cross-env promotion, deletion)

Differentiation

This plugin covers detection, masking, and synthetic substitution of existing data. Sibling plugins:

qa-test-data - fixture construction (Test Data Builder, Factory, Object Mother, etc.). Its synthetic-pii-generator generates fresh fake PII; this plugin detects + masks existing PII.
qa-compliance - regulatory feature testing (does GDPR Art. 17 erasure work? does CCPA delete-on-request work?). This plugin engineers the data those tests run against.
qa-secrets - credentials / API keys (different scope from personal data).

Install

/plugin marketplace add testland/qa
/plugin install qa-test-data-privacy@testland-qa

Skills

data-masking-techniques-reference

Pure-reference catalog of data-masking techniques and de-identification privacy models. Enumerates the seven canonical masking operators (substitution, shuffling, number/date variance, encryption, hashing, nulling, masking-out / character-scrambling) plus tokenisation, redaction, format-preserving encryption, and Microsoft Presidio's six built-in operators. Distinguishes reversible techniques (pseudonymisation candidates per GDPR Art. 4(5)) from irreversible techniques (anonymisation candidates). Maps techniques to NIST SP 800-188 privacy models - k-anonymity, l-diversity, t-closeness, differential privacy. Cites ISO/IEC 20889:2018 for the standard taxonomy. Use to pick the right masking operator per field type and risk level.
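
As a rough illustration of how a few of these operators differ in practice, the sketch below implements masking-out, keyed hashing, date variance, and nulling with the Python standard library only. The function names, the key, and the 30-day window are assumptions made for this example; they are not taken from the skill itself.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key for illustration; a real pipeline would load it from a secret store.
SECRET_KEY = b"test-env-only-key"


def mask_out(value: str, visible: int = 4, fill: str = "*") -> str:
    """Masking-out: hide everything except the last `visible` characters."""
    if visible <= 0 or len(value) <= visible:
        # Values too short to partially reveal are hidden completely.
        return fill * len(value)
    hidden = len(value) - visible
    return fill * hidden + value[hidden:]


def keyed_hash(value: str) -> str:
    """Hashing: one-way digest; equal inputs always produce equal digests."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def date_variance(value: date, max_days: int = 30) -> date:
    """Date variance: shift a date by an offset in [-max_days, +max_days].

    The offset is derived from the value itself, so repeated runs agree.
    """
    digest = hmac.new(SECRET_KEY, value.isoformat().encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return value + timedelta(days=offset)


def null_out(_value):
    """Nulling: drop the value entirely."""
    return None


masked_card = mask_out("4111111111111111")
shifted_birth_date = date_variance(date(1990, 5, 17))
```

Masking-out and nulling lose information outright, while the keyed hash keeps equality joins working across tables. Whether any of these counts as anonymisation for a given dataset still depends on the privacy model applied to the result.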

faker-synthetic-data

Author and run Faker libraries (Python `Faker`, JavaScript `@faker-js/faker`, Java `JavaFaker`, .NET `Bogus`) for generating synthetic substitute data when masking pipelines remove real PII. Covers locale-aware generators, deterministic seeding for test reproducibility, the common provider methods (name / email / address / phone / SSN / credit card / IBAN / date / UUID / text), pytest fixture integration, and the trade-off between random and deterministic substitution for referential integrity. Use after a PII detector flags fields that need synthetic replacement. Distinct from synthetic-pii-generator, which assembles fixtures from scratch; this skill covers the underlying library that such build skills compose.
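
The referential-integrity trade-off can be sketched without any third-party dependency. The example below is a stdlib-only stand-in, not Faker's API: the name pools, the salt, and the function name are hypothetical, and a real pipeline would more likely draw replacement values from a seeded Faker instance. It shows why deterministic substitution keeps joins intact: the same real value maps to the same substitute in every table.

```python
import hashlib
import random

# Hypothetical replacement pools; a real pipeline would use a Faker provider instead.
FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley"]
LAST_NAMES = ["Hart", "Ibarra", "Jensen", "Kowalski", "Lund", "Moreau"]


def deterministic_name(real_value: str, salt: str = "fixture-v1") -> str:
    """Return the same substitute for the same real value on every run."""
    digest = hashlib.sha256(f"{salt}:{real_value}".encode("utf-8")).digest()
    # Seed a private generator from the digest so the choice is repeatable.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


customers = [{"id": 1, "name": "Jane Roe"}]
orders = [{"order": 10, "customer_name": "Jane Roe"}]

# Each table is masked independently, yet the substitutes still line up.
masked_customers = [{**c, "name": deterministic_name(c["name"])} for c in customers]
masked_orders = [{**o, "customer_name": deterministic_name(o["customer_name"])} for o in orders]
```

Purely random substitution would give each table a different replacement and break the join. The cost of the deterministic variant is that anyone holding the salt and a candidate real value can confirm a match, so the salt needs the same care as a key.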

k-anonymity-verifier

Verifies that a masked dataset satisfies k-anonymity, l-diversity, and t-closeness by computing equivalence classes over chosen quasi-identifiers and reporting re-identification risk. Covers quasi-identifier selection heuristics, threshold guidance, pycanon API (k_anonymity / l_diversity / t_closeness / report), ARX Java API and GUI workflow, SmartNoise for differential-privacy comparison, and CI-gate integration. Distinct from data-masking-techniques-reference (which catalogs masking operators but defers k-anonymity measurement to dedicated tooling) and from presidio-pii-detection (which detects PII spans but offers no equivalence-class analysis). Use when you need to confirm whether a masked dataset meets a stated k, l, or t threshold before promoting it to a non-production environment.
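
To make the equivalence-class idea concrete, here is a minimal pure-Python sketch that computes k and distinct l for a small table and applies a threshold gate. It does not use pycanon or ARX; the function names, column names, and thresholds are assumptions for illustration, and it covers only the distinct variant of l-diversity.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def compute_k(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(members) for members in classes.values())


def compute_distinct_l(rows, quasi_identifiers, sensitive):
    """Distinct l is the fewest distinct sensitive values in any class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len({m[sensitive] for m in members}) for members in classes.values())


# Hypothetical masked dataset: age and ZIP code already generalised.
rows = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "945", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "945", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ["age_band", "zip3"]

k = compute_k(rows, QUASI_IDENTIFIERS)
l = compute_distinct_l(rows, QUASI_IDENTIFIERS, "diagnosis")

# A CI gate would fail the build when either threshold is missed.
gate_passed = k >= 2 and l >= 2
```

In this sample the table is 2-anonymous, but the second class holds a single diagnosis, so the gate fails on l. That is the case k-anonymity alone does not catch.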

pii-categories-reference

Pure-reference catalog of personally identifiable information (PII) categories across GDPR, CCPA/CPRA, NIST SP 800-122, and HIPAA. Defines what counts as personal data under each regime, enumerates the explicit identifiers each regulator lists (GDPR Art. 4(1) and Art. 9 special categories; CPRA sensitive personal information; NIST direct-identifier vs linkable distinction; HIPAA Safe Harbor 18 identifiers), and maps overlapping fields across jurisdictions so a masking pipeline knows which regulator's rules apply. Use as the authoritative source when authoring or reviewing masking rules, classifying a dataset's risk level, or scoping which fields a PII detector must catch.

pii-masking-pipeline-builder

Build workflow that produces a PII masking pipeline spec from a source-data inventory. Walks the author through (1) classifying each field against pii-categories-reference, (2) picking a masking operator from data-masking-techniques-reference, (3) deciding between pseudonymisation (reversible, still in GDPR scope) and anonymisation (irreversible, outside GDPR scope), (4) ordering the pipeline (detect → operator → audit), and (5) emitting a deployable config for Presidio + Faker + Synthea wrappers. Output is a YAML pipeline spec plus a per-field rationale table. Use after classifying a dataset's PII risk; this is the workflow that translates classification into runnable masking config.
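
The five steps can be sketched as a small data structure. Everything below is hypothetical: the field names, category labels, operator names, and the shape of the spec are assumptions for illustration, not the format the skill emits. The sketch builds the spec as a plain dictionary and leaves YAML serialisation out, since that would need a third-party library.

```python
from dataclasses import dataclass

# Reversible operators are treated as pseudonymisation candidates; all others
# as anonymisation candidates. This split is illustrative only.
REVERSIBLE_OPERATORS = {"encryption", "tokenisation"}
PIPELINE_ORDER = ("detect", "operator", "audit")


@dataclass(frozen=True)
class FieldRule:
    field: str
    category: str   # result of the classification step
    operator: str   # chosen masking operator
    rationale: str  # why this operator fits the field

    @property
    def outcome(self) -> str:
        if self.operator in REVERSIBLE_OPERATORS:
            return "pseudonymisation candidate"
        return "anonymisation candidate"


RULES = [
    FieldRule("email", "direct identifier", "substitution", "UI tests need a valid-looking address"),
    FieldRule("customer_id", "linkable identifier", "tokenisation", "joins must survive across tables"),
    FieldRule("notes", "free text", "redaction", "content is never asserted on"),
]


def build_spec(rules):
    """Assemble the pipeline spec as a plain dict, ready for serialisation."""
    return {
        "order": list(PIPELINE_ORDER),
        "fields": {r.field: {"operator": r.operator, "outcome": r.outcome} for r in rules},
    }


def rationale_table(rules) -> str:
    """Render the per-field rationale as tab-separated text."""
    lines = ["field\tcategory\toperator\toutcome\trationale"]
    lines += ["\t".join([r.field, r.category, r.operator, r.outcome, r.rationale]) for r in rules]
    return "\n".join(lines)


spec = build_spec(RULES)
table = rationale_table(RULES)
```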

presidio-pii-detection

Author and run Microsoft Presidio PII detection - wraps presidio-analyzer (PII detector) + presidio-anonymizer (replace/redact/mask/hash/encrypt operators) for scanning datasets, log streams, and free-text fields. Covers AnalyzerEngine + AnonymizerEngine setup, built-in recognizers (PERSON, EMAIL_ADDRESS, CREDIT_CARD, US_SSN, IBAN_CODE, country-specific IDs across US/UK/Spain/Italy/Poland/Singapore/Australia/India and more), custom PatternRecognizer authoring, score thresholds, and CI gating. Use when scanning *existing* data for PII (vs synthesising fresh fixtures with synthetic-pii-generator).
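
The pattern-plus-score idea behind custom recognizers, score thresholds, and CI gating can be illustrated with a stdlib-only stand-in. This is not Presidio's API: the `Finding` class, the regexes, the scores, and the `EMPLOYEE_ID` entity are all hypothetical. In a real setup, Presidio's analyzer and a custom PatternRecognizer would take the place of `scan`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


# Hypothetical patterns and confidence scores, chosen for illustration only.
PATTERNS = {
    "EMAIL_ADDRESS": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), 0.9),
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.85),
    # Stand-in for an organisation-specific custom recognizer.
    "EMPLOYEE_ID": (re.compile(r"\bEMP-\d{6}\b"), 0.6),
}


def scan(text: str, threshold: float = 0.5):
    """Return findings whose pattern score meets the threshold, in text order."""
    findings = []
    for entity_type, (pattern, score) in PATTERNS.items():
        if score < threshold:
            continue
        for match in pattern.finditer(text):
            findings.append(Finding(entity_type, match.start(), match.end(), score))
    return sorted(findings, key=lambda f: f.start)


def ci_gate(text: str, threshold: float = 0.5) -> int:
    """Exit-code style gate: 1 when any PII is found, 0 otherwise."""
    return 1 if scan(text, threshold) else 0


sample = "Contact jane.roe@example.com, SSN 078-05-1120, badge EMP-004217."
findings = scan(sample, threshold=0.7)
```

Raising the threshold to 0.7 drops the lower-confidence custom pattern, which is the usual tuning trade-off: fewer false positives in CI at the price of missed low-confidence hits.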

synthea-healthcare-data

Author and run Synthea (MITRE's open-source synthetic patient population simulator) to produce HIPAA-safe synthetic medical records for testing health IT systems. Covers Gradle build, population-size and state-specific generation, FHIR R4 / STU3 / DSTU2 / C-CDA / CSV / CPCDS output formats, disease-module customisation, and the lifecycle-simulation approach (birth-through-death patient journeys with realistic demographics). Use when testing FHIR servers, EHR integrations, claims processing, or any health IT system that needs realistic patient records without HIPAA exposure (distinct from faker-synthetic-data which is generic; this is health-domain-specific).

test-data-governance-reference

Pure-reference catalog of test-data lifecycle governance: retention schedules for test datasets, cross-environment data-sharing agreements, deletion of test data containing real PII, refresh cadence, access controls, and the legal basis for each policy under GDPR Art. 5 storage limitation and NIST SP 800-122. Use when defining a data-steward role for test environments, authoring a retention policy for a test database, scoping a data-sharing agreement before promoting a dataset from production to staging, or determining the deletion timeline for any test fixture that contains live personal data.

Agents

pii-leak-critic

Adversarial agent that audits a masking-pipeline output (or a candidate test fixture) for PII leaks the pipeline missed. Runs Presidio detection on a sampled output, cross-references hits against the per-field operator spec, classifies leaks by regulatory regime (GDPR Art. 4(1), CPRA SPI, NIST direct vs linkable, HIPAA Safe Harbor 18), and emits a block/pass verdict. Use after a pii-masking-pipeline-builder spec runs (audit step) or before promoting a test fixture set to a shared environment.
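
A minimal sketch of the cross-referencing and verdict step is shown below, assuming detection has already produced `(field, entity_type)` hits on the masked output. The regime mapping, the operator names, and the rule that hits in synthetically substituted fields are expected rather than leaks are assumptions for illustration, not the agent's actual logic.

```python
from dataclasses import dataclass

# Illustrative mapping from detected entity type to regime labels.
REGIME_LABELS = {
    "EMAIL_ADDRESS": ("GDPR Art. 4(1)", "HIPAA Safe Harbor 18"),
    "US_SSN": ("GDPR Art. 4(1)", "CPRA SPI", "HIPAA Safe Harbor 18"),
}

# Operators whose output is expected to look like PII (hypothetical names).
PII_SHAPED_OPERATORS = {"synthetic"}


@dataclass(frozen=True)
class Leak:
    field: str
    entity_type: str
    regimes: tuple


def audit(detections, operator_spec):
    """Cross-reference detector hits against the per-field operator spec.

    detections: list of (field, entity_type) hits found in masked output.
    operator_spec: mapping of field name to the operator applied to it.
    """
    leaks = []
    for field, entity_type in detections:
        operator = operator_spec.get(field)
        if operator in PII_SHAPED_OPERATORS:
            # A synthetic email still matches an email pattern; not a leak.
            continue
        # Hits in redacted, hashed, or unlisted fields are treated as leaks.
        leaks.append(Leak(field, entity_type, REGIME_LABELS.get(entity_type, ())))
    verdict = "block" if leaks else "pass"
    return verdict, leaks


operator_spec = {"email": "synthetic", "notes": "redact"}
detections = [("email", "EMAIL_ADDRESS"), ("notes", "US_SSN")]
verdict, leaks = audit(detections, operator_spec)
```

Here the hit in `notes` blocks promotion because a redacted field should contain no detectable SSN, while the hit in `email` is ignored because the spec says the value is synthetic.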