qa-shift-right

Production-side QA per ISTQB-canonical shift right ("a test approach to test a system continuously in production"): 4 skills (feature-flag-experiment-validator, prod-canary-validator, rum-to-synthetic-gap-analyzer, synthetic-monitor-author) and 2 agents (canary-and-experiment-coordinator, observability-to-test).

Part of role bundle: qa-role-performance

The plugin covers synthetic monitors that exercise critical journeys, canary-deploy validators that compare metrics statistically against the baseline, significance validators for A/B and feature-flag experiments, and the loop from a production incident to a regression test that catches it next time.

Components

| Type | Name | Description |
|---|---|---|
| Skill | synthetic-monitor-author | Build-an-X synthetic monitor: pick journey + platform (Datadog/Checkly/Pingdom/etc.) + Playwright-style script + per-step assertions + multi-region cadence + alert thresholds. |
| Skill | prod-canary-validator | Build-an-X canary verdict: per-metric absolute + relative thresholds, two-sample statistical tests (chi-square / Welch's t-test), promote/pause/rollback verdict. |
| Skill | feature-flag-experiment-validator | Build-an-X A/B test analysis: chi-square / Welch's / Mann-Whitney U per metric, FDR multiple-comparisons correction, practical-vs-statistical significance, ship/don't-ship verdict. |
| Agent | observability-to-test | Closes the loop: production-signal → regression test (cheapest catching layer per test pyramid) + fix PR + postmortem update. |
| Agent | canary-and-experiment-coordinator | Coordinates a simultaneous canary deploy + A/B experiment, catching cohort contamination and sequencing the validators. |
| Skill | rum-to-synthetic-gap-analyzer | Finds high-traffic user journeys with no synthetic monitor by analyzing RUM / CrUX data. |

Install

/plugin marketplace add testland/qa
/plugin install qa-shift-right@testland-qa

Skills

feature-flag-experiment-validator

Validates the statistical significance of an A/B / feature-flag experiment result - computes per-metric effect size + p-value (chi-square for proportions, Welch's t-test for continuous metrics), applies a multiple-comparison correction (Bonferroni / Benjamini-Hochberg) when more than one metric is tested, surfaces the distinction between practical and statistical significance, and emits a ship/don't-ship verdict per metric. Use to keep PMs / engineers from "shipping the winning variant" based on under-powered or multiply-tested results - the rigorous version of "the variant looks better in the dashboard."
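The per-metric flow described above can be sketched in a few lines. This is a minimal, stdlib-only illustration and not the skill's actual implementation: the function names, the `min_lift` practical-significance cutoff, and the sample counts are all hypothetical, and only proportion metrics (chi-square on a 2x2 table) are handled.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Returns (absolute lift of B over A, p-value). Assumes every expected
    cell count is non-zero.
    """
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return conv_b / n_b - conv_a / n_a, p_value

def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected

def verdicts(metrics, alpha=0.05, min_lift=0.01):
    """metrics maps name -> (conv_a, n_a, conv_b, n_b).

    A metric ships only if it is significant after correction AND its
    lift clears the (hypothetical) practical-significance cutoff.
    """
    names = list(metrics)
    results = [chi_square_2x2(*metrics[name]) for name in names]
    significant = benjamini_hochberg([p for _, p in results], alpha)
    return {
        name: "ship" if sig and lift >= min_lift else "don't ship"
        for name, (lift, _), sig in zip(names, results, significant)
    }
```

Calling `verdicts({"checkout": (100, 1000, 150, 1000), "signup": (200, 1000, 205, 1000)})` would mark only the checkout metric as shippable, since the signup difference is well within noise.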

prod-canary-validator

Builds a canary-validation workflow that compares a canary deploy's metrics against the baseline (current main) - picks the metric set (error rate, p50/p95/p99 latency, business KPIs like checkout-completion), defines per-metric thresholds (absolute + relative-to-baseline), runs a statistical-comparison check (effect size + significance) over the canary's observation window, and emits a promote/rollback verdict. Use as the gate between canary deploy and full rollout - the deterministic version of "the on-call eyeballs the dashboard for 30 min."
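As an illustration of how absolute thresholds, relative thresholds, and a significance check might combine into one verdict, here is a small stdlib-only sketch for a single error-rate metric. The `Threshold` class, the function names, and the decision rule (roll back on a hard-ceiling breach, or on a relative regression that is also statistically significant) are assumptions made for this example, not the skill's documented behavior. A two-proportion z-test stands in for the significance check.

```python
import math
from dataclasses import dataclass

@dataclass
class Threshold:
    absolute_max: float  # hard ceiling on the canary's error rate
    relative_max: float  # max allowed ratio of canary rate to baseline rate

def two_proportion_p_value(err_a, n_a, err_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled variance)."""
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (err_b / n_b - err_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def canary_verdict(base_err, base_n, canary_err, canary_n, threshold, alpha=0.05):
    """Return "promote" or "rollback" for one error-rate metric."""
    base_rate = base_err / base_n
    canary_rate = canary_err / canary_n
    # Absolute check: breach of the hard ceiling rolls back regardless of baseline.
    if canary_rate > threshold.absolute_max:
        return "rollback"
    # Relative check, written as a product so a zero baseline is handled.
    regressed = canary_rate > base_rate * threshold.relative_max
    p_value = two_proportion_p_value(base_err, base_n, canary_err, canary_n)
    if regressed and p_value < alpha:
        return "rollback"
    return "promote"
```

For example, with `Threshold(absolute_max=0.05, relative_max=1.5)`, a baseline of 50 errors in 10,000 requests and a canary of 12 errors in 1,000 requests would be rolled back, while a canary of 6 errors in 1,000 stays within the relative threshold and would be promoted.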

rum-to-synthetic-gap-analyzer

Reads Real User Monitoring data (Datadog RUM, Sentry Performance, GA4 Core Web Vitals / CrUX) to identify high-traffic user journeys that have no synthetic monitor coverage: ranks journeys by session volume times business value, diffs the ranked list against existing synthetic monitors, and emits a prioritized gap list ready to feed into synthetic-monitor-author. Use when an observability stack has RUM instrumented but the team suspects synthetic coverage is sparse, biased toward low-traffic paths, or was never systematically derived from real usage data.
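The ranking-and-diff step can be pictured with a short sketch. The input shape (a list of dicts with `name`, `sessions`, and `business_value` keys) and the function name are hypothetical; real RUM exports from Datadog, Sentry, or GA4 would first need to be mapped into something like this.

```python
def rank_gaps(journeys, monitored):
    """Return unmonitored journeys as (name, score), highest score first.

    journeys:  list of dicts with "name", "sessions", "business_value"
    monitored: set of journey names that already have a synthetic monitor
    Score is session volume times business value, as described above.
    """
    gaps = [
        (j["name"], j["sessions"] * j["business_value"])
        for j in journeys
        if j["name"] not in monitored
    ]
    # Sort by descending score; break ties by name for a stable report.
    gaps.sort(key=lambda pair: (-pair[1], pair[0]))
    return gaps

# Illustrative data only, not real traffic numbers.
journeys = [
    {"name": "checkout", "sessions": 12000, "business_value": 5},
    {"name": "search", "sessions": 40000, "business_value": 1},
    {"name": "login", "sessions": 30000, "business_value": 3},
    {"name": "settings", "sessions": 500, "business_value": 1},
]
gap_list = rank_gaps(journeys, monitored={"login"})
```

Here `login` drops out because it is already monitored, and `checkout` outranks `search` despite lower traffic because of its higher business weight.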

synthetic-monitor-author

Drafts a synthetic monitor configuration for one critical user journey - picks the platform (Datadog Synthetics, Pingdom, Checkly, New Relic, etc.), authors the scripted-transaction body (Playwright-style for browser checks; HTTP-step for API checks), wires the cadence (typical 1-15 min), defines per-step assertions (DOM presence, API status, response shape) and aggregate alert thresholds (consecutive-failure count + on-call routing). Use when a critical journey needs continuous-in-production verification per ISTQB-canonical shift-right ("a test approach to test a system continuously in production").
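Two of the pieces named above, per-step assertions and a consecutive-failure alert threshold, can be sketched independently of any vendor. This is a platform-neutral illustration under stated assumptions: `check_api_step` and `ConsecutiveFailureAlert` are hypothetical names, and real platforms such as Datadog Synthetics or Checkly express the same ideas through their own configuration.

```python
from dataclasses import dataclass

def check_api_step(status, body, expected_status=200, required_keys=()):
    """Assert API status and response shape for one step.

    Returns a list of failure messages; an empty list means the step passed.
    """
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != {expected_status}")
    missing = [key for key in required_keys if key not in body]
    if missing:
        failures.append(f"missing keys: {missing}")
    return failures

@dataclass
class ConsecutiveFailureAlert:
    """Fire once when the failure streak reaches the threshold."""
    threshold: int = 3
    streak: int = 0

    def record(self, run_passed: bool) -> bool:
        # A passing run resets the streak; a failing run extends it.
        self.streak = 0 if run_passed else self.streak + 1
        # Fire only on the run that reaches the threshold, so one outage
        # pages on-call once rather than on every subsequent failure.
        return self.streak == self.threshold
```

Alerting on a streak rather than on a single failed run is a common way to avoid paging on one-off network blips; the right threshold depends on the monitor's cadence.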