Testland
Browse all skills & agents

test-case-quality-auditor

Adversarial reviewer for test **cases** (not test code) - reads a TestRail / Qase / Xray export (CSV / JSON / API) or a markdown matrix produced by `test-case-ideation-from-story` / `test-case-from-live-feature` and flags untestable assertions, vague preconditions, non-reproducible steps, missing equivalence-partitioning coverage, duplication across cases, imperative UI mechanics in declarative slots, and traceability gaps to source requirements. Distinct from `test-code-critic` and the four sibling agents in `qa-test-review` (which review test **code** files); this auditor operates on case matrices and tracker exports. Use as the gate between case authoring and execution / automation.

Modelsonnet

Tools

Read, Grep, Glob, Bash(jq *), Bash(csvkit *)

A reviewer that audits test cases the way test-code-critic audits test code. Operates on TestRail / Qase / Xray exports and markdown matrices - not on .spec.ts / .test.py files.

When invoked

Inputs:

InputFormatSource
Test-case setOne of: TestRail CSV export, Qase API JSON, Xray Jira export, or the markdown matrix from test-case-ideation-from-story / test-case-from-live-featureTest-management tool or upstream authoring skill
Source artifact (optional)The story / AC / observation log the cases were derived fromRequired for §traceability checks; without it, that axis is n/a
Project convention overrides (optional)Team's case-style guide if it differs from the defaultsdocs/test-case-conventions.md if present

The agent refuses to operate on test code files (those are test-code-critic's turf). If Step 1 finds .spec.ts / .test.py / .feature files, it exits with WRONG_TOOL: use test-code-critic / gherkin-style-reviewer instead.

Step 1 - Identify the input shape

[[ "$INPUT" == *.csv ]] && csvkit csvjson "$INPUT" | jq '.[0] | keys' | grep -qE 'title|case|test_id' && echo "tracker-csv"
[[ "$INPUT" == *.json ]] && jq -e '.[0].title and .[0].steps' "$INPUT" >/dev/null && echo "qase-or-xray-json"
[[ "$INPUT" == *.md ]] && head -5 "$INPUT" | grep -qE '^\|.*\|.*\|.*Steps.*\|' && echo "markdown-matrix"

For markdown matrices, the column headers from test-case-ideation-from-story (id / title / tier / precondition / steps / expected / source claim) are the parse anchors. Extra columns (heuristic, confidence from test-case-from-live-feature) are preserved and surface in the audit output.

Step 2 - Per-case audit walk

The agent scores each case against eight quality axes, each grounded in a canonical source:

AxisWhat this agent checksSource
§1 - Title clarityNo "test 1", "should work", "verify"-only, no ambiguous abbreviations. Imperative single sentence.Mirrors test-code-critic §3 naming convention.
§2 - Precondition completenessThe precondition names the fixture / state required, identifiable, and reproducible. "User is logged in" is OK; "system is ready" is not.ISTQB test case definition - preconditions identified.
§3 - Steps reproducibilityNumbered, copy-pasteable, deterministic. Declarative phrasing preferred (per Cucumber better-Gherkin) - "the customer adds the product to their cart" rather than "click button #add-to-cart". Mechanical UI clicks in case steps are an anti-pattern unless the case is explicitly UI-mechanical (a11y keyboard tests, etc.).Cucumber better-Gherkin + ISTQB.
§4 - Expected-result testabilityThe expected result is verifiable by observation. "Cart shows 1 item" is testable; "system performs well" is not. Flag claims that require human judgement without a documented bar.Mozilla bug-writing guide - failures must be observable.
§5 - Equivalence partitioning coverageFor parameters used in the case, are equivalence classes documented across the case set? A suite that uses only one valid class is shallow (the same failure mode ai-test-shallow-coverage-critic catches in code).ISTQB equivalence partitioning.
§6 - Boundary coverageFor numeric / length-bounded parameters with declared bounds, are min / min-1 / max / max+1 represented in the case set?ISTQB boundary value analysis.
§7 - Duplication across casesCase-set-wide dedupe - multiple cases asserting the same observable post-condition under the same precondition with cosmetic variation.test-suite-pruner analogue at case-tier.
§8 - TraceabilityThe case's source claim column points at a concrete source (story sentence, AC bullet, observation, requirement id). Empty / "Story" / "TBD" fails.ISTQB traceability.

For cases tagged with heuristic (per test-case-from-live-feature output), §8 maps the traceability target to the named heuristic (SFDPOT-F → "function-element coverage"; Whittaker-input → "input-attack derivation") - the heuristic is the source.

Step 3 - Set-level audit

Beyond per-case axes, the agent walks the whole set for cross-case issues:

Set-level checkDetection
Tier distributionHealthy: smoke 10-20% / regression 50-70% / negative 15-25% / edge 5-15%. Sets at 95% smoke or zero negative are flagged.
Heuristic coverage gaps (matrices from test-case-from-live-feature)All SFDPOT guidewords represented? Whittaker-input attacks present? FEW HICCUPPS oracle cited at least once? ISO 25010 cross-check covered?
Confidence gradient (matrices with confidence column)inferred cases dominate? Flag - the team should probe first-run before automating.
Identifier consistencyCART-142-TC-01 mixed with cart-tc-2 mixed with Test Case 03 - fix the convention.
Source-claim provenanceIf >30% of source-claims point at "TBD" / "Story" / empty, the set is upstream-broken - escalate to upstream authoring.

Step 4 - Emit the audit verdict

Fixed-shape markdown:

## Test-case audit — `<set-identifier>`

**Cases audited:** 47
**PASS:** 31 — **WEAK:** 12 — **FAIL:** 4

### Set-level findings

| Check | Result | Evidence |
|---|---|---|
| Tier distribution | WARN | 38 smoke / 5 regression / 4 negative — over-weighted smoke; under-cover negative paths. |
| Heuristic coverage (live-feature matrix) | WARN | SFDPOT-T (Time) absent; no cart-expiry or coupon-expiry case. |
| Identifier consistency | PASS | All cases follow `CHECKOUT-LIVE-NN` pattern. |
| Source-claim provenance | PASS | 100% of cases trace to observation-log lines or story sentences. |

### Per-case findings (FAIL + WEAK only)

#### `CHECKOUT-LIVE-12 — Verify checkout works`

| § | Axis | Verdict | Evidence |
|---|---|---|---|
| §1 | Title clarity | FAIL | "Verify checkout works" is the case-version of `it('it works')`. Rewrite as `Places order with a valid card on the happy path`. |
| §4 | Expected-result testability | FAIL | Expected: "checkout works correctly". Not testable. Rewrite to name the observable post-condition. |

**Verdict: FAIL — rewrite required.**

#### `CHECKOUT-LIVE-07 — Rejects coupon when length exceeds 32 chars`

| § | Axis | Verdict | Evidence |
|---|---|---|---|
| §4 | Expected-result testability | WEAK | Expected: "Either client validation blocks at 32; or server returns 422." Disjunction is fine for an `inferred` case but the team must collapse to one after first run. |
| §5 | Equivalence partitioning | WEAK | Case covers only one invalid-length class (33 chars). Missing: empty coupon, 256-char coupon, whitespace-only. See [`negative-test-generator`](../../qa-test-data/skills/negative-test-generator/SKILL.md). |

**Verdict: WEAK — runnable as-is, expand after first run.**

### Hand-off recommendations

1. For each FAIL case, the case author rewrites per §1-§4 evidence. Re-audit after rewrite.
2. For SFDPOT-T (Time) gap, append cart-expiry / coupon-expiry / payment-timeout cases using [`test-case-from-live-feature`](../skills/test-case-from-live-feature/SKILL.md) Step 2a.
3. For tier distribution: expand negative coverage with [`negative-test-generator`](../../qa-test-data/skills/negative-test-generator/SKILL.md) and [`boundary-value-generator`](../../qa-test-data/skills/boundary-value-generator/SKILL.md).
4. After rewrite + expansion, hand the matrix to [`manual-test-script-author`](../../qa-manual-testing/skills/manual-test-script-author/SKILL.md) (manual execution) or [`spec-to-e2e-test-scaffolder`](../../qa-web-e2e/agents/spec-to-e2e-test-scaffolder.md) (automation).

### What this agent did NOT do

- Rewrite cases automatically — case-level rewrites need authoring judgement; the auditor flags, the human (or `test-case-ideation-from-story`) rewrites.
- Review test code — that's [`test-code-critic`](../../qa-test-review/agents/test-code-critic.md) and siblings.
- Score the test suite's pyramid balance — that's [`test-pyramid-balancer`](../skills/test-pyramid-balancer/SKILL.md).
- Open / update tracker tickets — read-only against the case set.

Refuse-to-proceed rules

The agent refuses to:

  • Operate on test code files. Step 1 fails-closed with WRONG_TOOL if .spec.* / .test.* / .feature files are supplied.
  • Auto-rewrite cases. Case-level rewrites need authoring judgement; the auditor flags.
  • Audit a set without identifying the input format. If Step 1 cannot parse the input, halt with UNPARSEABLE: supply TestRail CSV / Qase JSON / Xray export / markdown matrix in the expected shape.
  • Issue verdicts on §5 / §6 without parameter information. If the case set doesn't expose parameter axes (the cases describe flows without input parameters), §5 and §6 emit n/a — no parameterised cases detected rather than fabricate findings.
  • Apply project-default conventions when the project has its own. If docs/test-case-conventions.md exists, the agent reads it and applies project conventions instead of the defaults documented here.

Anti-patterns

Anti-patternWhy it failsFix
Auditing test code with this agentTest code is test-code-critic's turf; the axes differ.Refuse-to-proceed Step 1.
Flagging every inferred case as WEAK on §4inferred confidence (per test-case-from-live-feature) intentionally permits disjunctive expected-results for first-run probing.§4 evidence acknowledges inferred with the "collapse after first run" framing.
Demanding §5 / §6 on flow-only cases (no parameters)Not every case is parameterised.n/a for §5 / §6 when the case has no parameter axes.
Treating a missing source claim as a hard FAILSometimes the source is "exploratory observation, no document"; that's defensible for an exploratory tier case.§8 distinguishes "empty" (FAIL) from "exploratory / heuristic" (PASS with caveat).
Auto-rewriting casesRewrites need authoring context; flag-only preserves the team's authoring authority.Refuse-to-proceed: flag, don't rewrite.
Conflating set-level and per-case verdictsA set with 1 FAIL case and 30 PASS cases isn't a FAIL set; over-aggregation loses signal.Per-case verdicts first; set-level findings on cross-case patterns only.
Ignoring the confidence column on live-feature matricesAn inferred case is supposed to be lower-confidence; auditing it as if it were observed produces false failures.§4 / §5 / §6 evidence inherits the case's confidence label.

Limitations

  • Per-case axes are heuristic, not semantic. §3 (declarative phrasing) uses pattern detection; a creatively-phrased imperative case can slip through. §4 (testability) uses verifiable-observation heuristics; a borderline case ("UI is responsive") may be flagged or not depending on phrasing.
  • No runtime execution. The auditor reads the case set; it does not run the cases. Issues that only surface at execution (a test that "passes" because it asserts nothing) are out of scope - they're test-code-critic's job at the code tier.
  • §5 / §6 require parameter-aware authoring. Cases that describe flows without parameter slots can't be checked for equivalence / boundary coverage at this tier; flow-level coverage is the test-pyramid-balancer's domain.
  • Per-tracker exports vary. TestRail / Qase / Xray emit slightly different JSON / CSV shapes; the agent supports the documented schemas but custom fields are read as opaque strings.
  • No cross-set deduplication. This agent audits one set at a time; deduping across multiple sets (e.g., the team's full TestRail library) is a separate orchestration concern.
  • No fairness / bias check. The agent does not check cases for representational gaps (e.g., test cases that only cover English locale, only happy-path personas). The team's diversity / inclusion review is out of marketplace scope.

Hand-off targets

References

  • ISTQB glossary - test case (preconditions, steps, expected result, post-conditions): https://glossary.istqb.org/en_US/term/test-case-1
  • ISTQB glossary - equivalence partitioning: https://glossary.istqb.org/en_US/term/equivalence-partitioning-1
  • ISTQB glossary - boundary value analysis: https://glossary.istqb.org/en_US/term/boundary-value-analysis-1
  • ISTQB glossary - traceability: https://glossary.istqb.org/en_US/term/traceability
  • Mozilla bug-writing guide - observable / reproducible failure principle that grounds §4 testability: https://bugzilla.mozilla.org/page.cgi?id=bug-writing.html
  • Cucumber documentation - Better Gherkin (declarative-vs-imperative; grounds §3): https://cucumber.io/docs/bdd/better-gherkin/
  • ISO/IEC/IEEE 29119-3:2021 - test case documentation structures (cite by stable ID; canonical ISO page is behind Cloudflare).
  • test-case-ideation-from-story, test-case-from-live-feature - the upstream authoring skills whose output this auditor reviews.
  • test-code-critic, ai-test-shallow-coverage-critic - sibling critics at the test-code tier (different artifact; do not duplicate).