Testland
qa-test-data-privacy

PII detection, masking, and synthetic data generation for test environments: 8 skills (data-masking-techniques-reference, faker-synthetic-data, k-anonymity-verifier, pii-categories-reference, pii-masking-pipeline-builder, presidio-pii-detection, synthea-healthcare-data, test-data-governance-reference) and 1 agent (pii-leak-critic).

Install this plugin

/plugin install qa-test-data-privacy@testland-qa

Part of role bundle: qa-role-security

qa-test-data-privacy

PII detection, masking, and synthetic data generation for test environments: 7 skills (pii-categories-reference, data-masking-techniques-reference, presidio-pii-detection, faker-synthetic-data, synthea-healthcare-data, k-anonymity-verifier, test-data-governance-reference), 1 build skill (pii-masking-pipeline-builder), and 1 agent (pii-leak-critic).

Components

| Type | Name | Description |
| --- | --- | --- |
| skill | pii-categories-reference | Catalog of PII categories across GDPR, CPRA, NIST SP 800-122, HIPAA Safe Harbor |
| skill | data-masking-techniques-reference | Masking operators + NIST 800-188 privacy models (k-anonymity, l-diversity, t-closeness, DP) |
| skill | presidio-pii-detection | Microsoft Presidio analyzer + anonymizer for PII scanning + masking |
| skill | faker-synthetic-data | Faker libraries (Python, JavaScript, Java, .NET) for synthetic substitution |
| skill | synthea-healthcare-data | MITRE Synthea synthetic-patient simulator (FHIR / C-CDA / CSV output) |
| skill | pii-masking-pipeline-builder | Build a deployable masking pipeline spec from a source-data inventory |
| agent | pii-leak-critic | Audits masked output for leaks; classifies findings by regime; emits block/pass verdict |
| skill | k-anonymity-verifier | Verify k-anonymity / l-diversity / t-closeness on masked datasets (ARX, pycanon) |
| skill | test-data-governance-reference | Pure reference: test-data lifecycle governance (retention, cross-env promotion, deletion) |

Differentiation

This plugin covers detection, masking, and synthetic substitution of existing data. Neighbouring plugins:

  • qa-test-data - fixture construction (Test Data Builder, Factory, Object Mother, etc.). Its synthetic-pii-generator generates fresh fake PII; this plugin detects + masks existing PII.
  • qa-compliance - regulatory feature testing (does GDPR Art. 17 erasure work? does CCPA delete-on-request work?). This plugin engineers the data those tests run against.
  • qa-secrets - credentials / API keys (different scope from personal data).

Install

/plugin marketplace add testland/qa
/plugin install qa-test-data-privacy@testland-qa

Skills

data-masking-techniques-reference

Pure-reference catalog of data-masking techniques and de-identification privacy models. Enumerates the seven canonical masking operators (substitution, shuffling, number/date variance, encryption, hashing, nulling, masking-out / character-scrambling) plus tokenisation, redaction, format-preserving encryption, and Microsoft Presidio's six built-in operators. Distinguishes reversible techniques (pseudonymisation candidates per GDPR Art. 4(5)) from irreversible techniques (anonymisation candidates). Maps techniques to NIST SP 800-188 privacy models - k-anonymity, l-diversity, t-closeness, differential privacy. Cites ISO/IEC 20889:2018 for the standard taxonomy. Use to pick the right masking operator per field type and risk level.
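A few of the operators named above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's own implementation: the function names, the `SECRET_KEY` value, and the 30-day variance window are assumptions chosen for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key for illustration; a real pipeline would load it from a secret store.
SECRET_KEY = b"example-key-not-for-production"


def hash_value(value: str) -> str:
    """Hashing operator: keyed HMAC-SHA256, stable for equal inputs."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_out(value: str, visible: int = 4, fill: str = "*") -> str:
    """Masking-out operator: hide everything except the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def null_out(_value):
    """Nulling operator: drop the value entirely."""
    return None


def date_variance(value: date, record_key: str, max_days: int = 30) -> date:
    """Date-variance operator: shift by a per-record offset in [-max_days, max_days].

    The offset is derived from the record key, so every date belonging to the
    same record moves by the same amount and intervals between them survive.
    """
    digest = hmac.new(SECRET_KEY, record_key.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return value + timedelta(days=offset)


masked_card = mask_out("4111111111111111")          # last four digits stay visible
email_token = hash_value("jane.doe@example.com")    # 64 hex characters
shifted_dob = date_variance(date(1990, 5, 17), record_key="patient-1")
```

Which operator suits a field depends on whether downstream tests need the value's format (masking-out), its join behaviour (hashing), its approximate magnitude (variance), or nothing at all (nulling).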

faker-synthetic-data

Author and run Faker libraries (Python `Faker`, JavaScript `@faker-js/faker`, Java `JavaFaker`, .NET `Bogus`) for generating synthetic substitute data when masking pipelines remove real PII. Covers locale-aware generators, deterministic seeding for test reproducibility, the common provider methods (name / email / address / phone / SSN / credit card / IBAN / date / UUID / text), pytest fixture integration, and the trade-off between random vs deterministic substitution for referential integrity. Use after a PII detector flags fields that need synthetic replacement. This skill is distinct from synthetic-pii-generator, which assembles fixtures from scratch; it covers the underlying libraries that such build skills compose.
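The random vs deterministic trade-off can be shown without Faker itself. The sketch below uses only the standard library; `NAME_POOL`, the salt, and both function names are hypothetical, and in a real pipeline the replacement values would come from a Faker provider rather than a hard-coded list.

```python
import hashlib
import random

# Hypothetical replacement pool; a Faker provider would normally supply these values.
NAME_POOL = ["Alex Morgan", "Sam Rivera", "Jordan Lee", "Taylor Kim", "Casey Patel"]


def substitute_random(_original: str, rng: random.Random) -> str:
    """Random substitution: every call draws independently from the pool."""
    return rng.choice(NAME_POOL)


def substitute_deterministic(original: str, salt: str = "test-env-1") -> str:
    """Deterministic substitution: the same original always maps to the same fake."""
    digest = hashlib.sha256((salt + original).encode("utf-8")).digest()
    return NAME_POOL[int.from_bytes(digest[:8], "big") % len(NAME_POOL)]


# Two tables that refer to the same person by name.
customers = [{"id": 1, "name": "Jane Doe"}]
orders = [{"order_id": 10, "customer_name": "Jane Doe"}]

masked_customers = [
    {**c, "name": substitute_deterministic(c["name"])} for c in customers
]
masked_orders = [
    {**o, "customer_name": substitute_deterministic(o["customer_name"])} for o in orders
]
```

With the deterministic variant, the masked customer and the masked order still agree on the name, so joins keep working. The cost is that a small pool maps different originals to the same fake value, and anyone who knows the salt and the pool can test guesses against the output. Random substitution avoids the second problem but breaks cross-table consistency unless a mapping table is stored.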

k-anonymity-verifier

Verifies that a masked dataset satisfies k-anonymity, l-diversity, and t-closeness by computing equivalence classes over chosen quasi-identifiers and reporting re-identification risk. Covers quasi-identifier selection heuristics, threshold guidance, pycanon API (k_anonymity / l_diversity / t_closeness / report), ARX Java API and GUI workflow, SmartNoise for differential-privacy comparison, and CI-gate integration. Distinct from data-masking-techniques-reference (which catalogs masking operators but defers k-anonymity measurement to dedicated tooling) and from presidio-pii-detection (which detects PII spans but offers no equivalence-class analysis). Use when you need to confirm whether a masked dataset meets a stated k, l, or t threshold before promoting it to a non-production environment.
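The equivalence-class computation behind k-anonymity and l-diversity is small enough to sketch directly. The code below is a standard-library illustration of the idea, not the pycanon or ARX implementation; the column names, sample rows, and thresholds are made up, and `l_diversity` here means distinct l-diversity (the count of distinct sensitive values per class).

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(members) for members in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len({m[sensitive] for m in members}) for members in classes.values())


# Hypothetical masked dataset: age generalised to bands, ZIP truncated to 3 digits.
rows = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
]
QUASI_IDENTIFIERS = ["age_band", "zip3"]

k = k_anonymity(rows, QUASI_IDENTIFIERS)               # 2: both classes hold two rows
l = l_diversity(rows, QUASI_IDENTIFIERS, "diagnosis")  # 1: the 40-49 class is all asthma

# A CI gate would fail the build when either threshold is missed.
meets_thresholds = k >= 2 and l >= 2                   # False for this sample
```

The sample passes k = 2 but fails l = 2, which is the situation l-diversity exists to catch: every record hides in a group of two, yet everyone in the 40-49 group is known to have the same diagnosis. Both functions raise `ValueError` on an empty dataset, so a gate should handle that case explicitly.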

pii-categories-reference

Pure-reference catalog of personally identifiable information (PII) categories across GDPR, CCPA/CPRA, NIST SP 800-122, and HIPAA. Defines what counts as personal data under each regime, enumerates the explicit identifiers each regulator lists (GDPR Art. 4(1) and Art. 9 special categories; CPRA sensitive personal information; NIST direct-identifier vs linkable distinction; HIPAA Safe Harbor 18 identifiers), and maps overlapping fields across jurisdictions so a masking pipeline knows which regulator's rules apply. Use as the authoritative source when authoring or reviewing masking rules, classifying a dataset's risk level, or scoping which fields a PII detector must catch.

pii-masking-pipeline-builder

Build-an-X workflow that produces a PII masking pipeline spec from a source-data inventory. Walks the author through (1) classifying each field against pii-categories-reference, (2) picking a masking operator from data-masking-techniques-reference, (3) deciding between pseudonymisation (reversible, so the data stays in GDPR scope) and anonymisation (irreversible, so the data falls out of GDPR scope), (4) ordering the pipeline (detect → operator → audit), and (5) emitting a deployable config for Presidio + Faker + Synthea wrappers. Output is a YAML pipeline spec plus a per-field rationale table. Use after classifying a dataset's PII risk; this is the workflow that translates classification into runnable masking config.
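The detect → operator → audit ordering from step (4) can be sketched as three small functions. This is an illustrative stand-in built on two regular expressions, not the config the skill emits and not Presidio's API; the patterns, labels, and the `run_pipeline` name are assumptions, and the sketch assumes detected spans do not overlap.

```python
import re

# Hypothetical detectors; a real pipeline would delegate detection to a PII engine.
DETECTORS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    """Stage 1: return (label, start, end) for every detector match."""
    findings = []
    for label, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.start(), match.end()))
    return sorted(findings, key=lambda f: f[1])


def apply_operator(text, findings):
    """Stage 2: replace each span with <LABEL>, right to left so offsets stay valid."""
    for label, start, end in sorted(findings, key=lambda f: f[1], reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


def audit(masked_text):
    """Stage 3: re-run detection on the output; anything found is a leak."""
    return detect(masked_text)


def run_pipeline(text):
    masked = apply_operator(text, detect(text))
    verdict = "block" if audit(masked) else "pass"
    return masked, verdict


masked, verdict = run_pipeline("Contact jane@example.com, SSN 123-45-6789.")
# masked  == "Contact <EMAIL>, SSN <US_SSN>."
# verdict == "pass"
```

Running the audit as a separate final stage means a gap in the operator step shows up as a blocked output rather than a silent leak. An audit that reuses the same detectors, as here, only catches operator failures; catching detector blind spots needs an independent check.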

presidio-pii-detection

Author and run Microsoft Presidio PII detection - wraps presidio-analyzer (PII detector) + presidio-anonymizer (replace/redact/mask/hash/encrypt operators) for scanning datasets, log streams, and free-text fields. Covers AnalyzerEngine + AnonymizerEngine setup, built-in recognizers (PERSON, EMAIL_ADDRESS, CREDIT_CARD, US_SSN, IBAN_CODE, country-specific IDs across US/UK/Spain/Italy/Poland/Singapore/Australia/India and more), custom PatternRecognizer authoring, score thresholds, and CI gating. Use when scanning *existing* data for PII (vs synthesising fresh fixtures with synthetic-pii-generator).

synthea-healthcare-data

Author and run Synthea (MITRE's open-source synthetic patient population simulator) to produce HIPAA-safe synthetic medical records for testing health IT systems. Covers Gradle build, population-size and state-specific generation, FHIR R4 / STU3 / DSTU2 / C-CDA / CSV / CPCDS output formats, disease-module customisation, and the lifecycle-simulation approach (birth-through-death patient journeys with realistic demographics). Use when testing FHIR servers, EHR integrations, claims processing, or any health IT system that needs realistic patient records without HIPAA exposure (distinct from faker-synthetic-data which is generic; this is health-domain-specific).

test-data-governance-reference

Pure-reference catalog of test-data lifecycle governance: retention schedules for test datasets, cross-environment data-sharing agreements, deletion of test data containing real PII, refresh cadence, access controls, and the legal basis for each policy under GDPR Art. 5 storage limitation and NIST SP 800-122. Use when defining a data-steward role for test environments, authoring a retention policy for a test database, scoping a data-sharing agreement before promoting a dataset from production to staging, or determining the deletion timeline for any test fixture that contains live personal data.