qa-shift-right

Production-side QA per ISTQB-canonical shift right ("a test approach to test a system continuously in production"): 4 skills (feature-flag-experiment-validator, prod-canary-validator, rum-to-synthetic-gap-analyzer, synthetic-monitor-author) and 2 agents (canary-and-experiment-coordinator, observability-to-test).

Part of role bundle: qa-role-performance

The plugin covers synthetic monitors that exercise critical journeys, canary-deploy validators that compare canary metrics statistically against a baseline, A/B and feature-flag experiment significance validators, and the loop from a production-side incident to an added regression test.

Components

Type	Name	Description
Skill	synthetic-monitor-author	Build-an-X synthetic monitor: pick journey + platform (Datadog/Checkly/Pingdom/etc.) + Playwright-style script + per-step assertions + multi-region cadence + alert thresholds.
Skill	prod-canary-validator	Build-an-X canary verdict: per-metric absolute + relative thresholds, two-sample statistical tests (chi-square / Welch's t-test), promote/pause/rollback verdict.
Skill	feature-flag-experiment-validator	Build-an-X A/B test analysis: chi-square / Welch's / Mann-Whitney U per metric, FDR multiple-comparisons correction, practical-vs-statistical significance, ship/don't-ship verdict.
Agent	observability-to-test	Closes the loop: production-signal → regression test (cheapest catching layer per test pyramid) + fix PR + postmortem update.
Agent	canary-and-experiment-coordinator	Coordinates a simultaneous canary deploy + A/B experiment, catching cohort contamination and sequencing the validators.
Skill	rum-to-synthetic-gap-analyzer	Finds high-traffic user journeys with no synthetic monitor by analyzing RUM / CrUX data.

Install

/plugin marketplace add testland/qa
/plugin install qa-shift-right@testland-qa

Skills

feature-flag-experiment-validator

Validates the statistical significance of an A/B / feature-flag experiment result - computes per-metric effect size + p-value (chi-square for proportions, Welch's t-test for continuous metrics), applies a multiple-comparison correction (Bonferroni / Benjamini-Hochberg) when more than one metric is tested, surfaces the practical-vs-statistical-significance distinction, and emits a ship/don't-ship verdict per metric. Use to keep PMs / engineers from "shipping the winning variant" based on underpowered or multiply-tested results - the rigorous version of "the variant looks better in the dashboard."
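The correction and verdict steps described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the function names, the `min_lift` practical-significance floor, and the assumption that a positive lift is always the desirable direction are all hypothetical. The proportion test is the pooled two-proportion z-test, which is equivalent to a 2x2 chi-square test without continuity correction.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Map each metric name to True if it survives Benjamini-Hochberg at alpha."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return {
        name: rank <= cutoff_rank
        for rank, (name, _) in enumerate(ranked, start=1)
    }


def ship_verdicts(p_values, lifts, min_lift, alpha=0.05):
    """Ship only if a metric is significant after correction AND its observed
    lift clears the (hypothetical) practical-significance floor min_lift."""
    significant = benjamini_hochberg(p_values, alpha)
    return {
        metric: "ship" if significant[metric] and lifts[metric] >= min_lift
        else "don't ship"
        for metric in p_values
    }
```

With three metrics at p = 0.01, 0.04 and 0.5, only the first survives the correction, even though 0.04 would pass an uncorrected 0.05 threshold on its own.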

prod-canary-validator

Builds a canary-validation workflow that compares a canary deploy's metrics against the baseline (current main) - picks the metric set (error rate, p50/p95/p99 latency, business KPIs like checkout-completion), defines per-metric thresholds (absolute + relative-to-baseline), runs a statistical-comparison check (effect size + significance) over the canary's observation window, and emits a promote/rollback verdict. Use as the gate between canary deploy and full rollout - the deterministic version of "the on-call eyeballs the dashboard for 30 min."
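A rough sketch of the threshold part of such a gate follows. The `Threshold` type, the metric names, and the limits are hypothetical, every metric is assumed to be lower-is-better, and the statistical-comparison step is deliberately left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    absolute_max: float  # canary value must not exceed this
    relative_max: float  # canary / baseline ratio must not exceed this


def canary_verdict(baseline, canary, thresholds):
    """Return ("promote" | "rollback", list of breach descriptions).

    Assumes lower-is-better metrics such as error rate or latency.
    """
    breaches = []
    for metric, limit in thresholds.items():
        base, can = baseline[metric], canary[metric]
        if can > limit.absolute_max:
            breaches.append(
                f"{metric}: {can} exceeds absolute limit {limit.absolute_max}")
        elif base > 0 and can / base > limit.relative_max:
            breaches.append(
                f"{metric}: {can / base:.2f}x baseline exceeds "
                f"{limit.relative_max}x")
    return ("rollback" if breaches else "promote"), breaches
```

Checking both limits matters because a metric can stay under its absolute ceiling while still regressing sharply relative to the baseline, as a p95 latency that moves from 200 ms to 320 ms under a 500 ms ceiling would.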

rum-to-synthetic-gap-analyzer

Reads Real User Monitoring data (Datadog RUM, Sentry Performance, GA4 Core Web Vitals / CrUX) to identify high-traffic user journeys that have no synthetic monitor coverage: ranks journeys by session volume times business value, diffs the ranked list against existing synthetic monitors, and emits a prioritized gap list ready to feed into synthetic-monitor-author. Use when an observability stack has RUM instrumented but the team suspects synthetic coverage is sparse, biased toward low-traffic paths, or was never systematically derived from real usage data.
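The ranking-and-diff step can be illustrated with a short sketch. The journey records, field names, and business-value weights below are hypothetical placeholders; in practice they would come from whatever RUM export the team has.

```python
def rank_gaps(journeys, monitored):
    """Return unmonitored journeys, highest priority first.

    journeys:  list of dicts with "name", "sessions", "business_value"
    monitored: set of journey names that already have a synthetic monitor
    Priority is session volume times business value.
    """
    gaps = [j for j in journeys if j["name"] not in monitored]
    return sorted(
        gaps,
        key=lambda j: j["sessions"] * j["business_value"],
        reverse=True,
    )


journeys = [
    {"name": "checkout", "sessions": 40_000, "business_value": 10},
    {"name": "search", "sessions": 120_000, "business_value": 2},
    {"name": "profile-edit", "sessions": 3_000, "business_value": 1},
    {"name": "login", "sessions": 90_000, "business_value": 5},
]
monitored = {"login", "profile-edit"}

for gap in rank_gaps(journeys, monitored):
    print(gap["name"], gap["sessions"] * gap["business_value"])
```

Here `checkout` outranks `search` despite having a third of the traffic, because the business-value weight dominates.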

synthetic-monitor-author

Drafts a synthetic monitor configuration for one critical user journey - picks the platform (Datadog Synthetics, Pingdom, Checkly, New Relic, etc.), authors the scripted-transaction body (Playwright-style for browser checks; HTTP-step for API checks), wires the cadence (typically every 1-15 min), defines per-step assertions (DOM presence, API status, response shape) and aggregate alert thresholds (consecutive-failure count + on-call routing). Use when a critical journey needs continuous-in-production verification per ISTQB-canonical shift-right ("a test approach to test a system continuously in production").
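The consecutive-failure alert threshold mentioned above can be sketched as follows. This is a platform-neutral illustration: the function name and the default of three failures are assumptions, and real platforms expose this rule through their own configuration rather than through code like this.

```python
def should_alert(results, consecutive_failures=3):
    """Decide whether to page on-call for one monitor.

    results: check outcomes, oldest first (True = passed).
    Alerts only when the most recent `consecutive_failures` runs all
    failed, so a single flaky run does not page anyone.
    """
    streak = 0
    for passed in reversed(results):
        if passed:
            break
        streak += 1
    return streak >= consecutive_failures


print(should_alert([True, False, True, False, False]))   # two trailing failures
print(should_alert([True, True, False, False, False]))   # three trailing failures
```

Requiring a streak trades a few minutes of detection latency for fewer false pages; at a 5-minute cadence a threshold of three means roughly 10 to 15 minutes before the alert fires.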

Agents

canary-and-experiment-coordinator

Coordinates a release that runs a canary deploy and a feature-flag A/B experiment simultaneously - audits the user-assignment overlap to detect canary cohort contamination of the experiment split, sequences the two validators (prod-canary-validator then feature-flag-experiment-validator), and reconciles their verdicts into a single promote/hold/rollback decision. Use when a team ships a canary deploy and an active A/B experiment at the same time and needs to confirm the two cohort splits are statistically independent before trusting either verdict.
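One way to check whether the two cohort splits are independent is a chi-square test on the 2x2 table of deploy cohort against experiment arm. The sketch below assumes that approach and a hypothetical table layout; it is not necessarily what the agent does internally. It uses the identity that, for one degree of freedom, the chi-square survival function equals `erfc(sqrt(x / 2))`, and it assumes every row and column total is non-zero.

```python
import math


def cohort_independence_p_value(table):
    """p-value for independence of deploy cohort and experiment arm.

    table[i][j] = number of users in deploy cohort i (0 = baseline,
    1 = canary) and experiment arm j (0 = control, 1 = variant).
    A small p-value suggests the canary cohort is skewed toward one arm.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


balanced = [[4500, 4500], [500, 500]]   # canary users split evenly across arms
skewed = [[4500, 4500], [200, 800]]     # canary users mostly in the variant arm
print(cohort_independence_p_value(balanced))
print(cohort_independence_p_value(skewed))
```

If the skewed case shows up in practice, neither validator's verdict can be read at face value, since a canary regression would be attributed to the experiment variant or the other way around.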

observability-to-test

Closes the loop between production observability signals and the test suite - reads a synthetic-monitor failure / Sentry error / Datadog incident / log alert, isolates the failing condition (input + state + system version), proposes the regression test that would have caught it (unit + integration + E2E layers per the test pyramid), and emits a PR adding the test plus the bug-repro package. Use after every production-side incident - converts "we caught it in prod" into "we'll catch it earlier next time."