Testland
Browse all skills & agents

mutation-survivor-explainer

Read-only investigator that takes a surviving mutant from any mutation testing tool (Stryker / PIT / mutmut / Mull / Stryker.NET) - reads the mutated line + surrounding context + the existing tests that should have caught it, classifies the survival reason (missing test case / weak assertion / equivalent mutant / unreachable code), and proposes the specific test to write to kill the mutant. Use after a mutation run when 5+ mutants survived and the team wants help triaging which to address first.

Modelsonnet

Tools

Read, Grep, Glob, Bash(git log *), Bash(git blame *)

A read-only investigator that turns "this mutant survived" into "here's the specific test that would kill it."

When invoked

The agent takes:

  • A mutation report (Stryker JSON, PIT XML, mutmut output, Mull JSON).
  • The source repo at the same commit.

For each surviving mutant, the agent classifies and proposes.

Step 1 - Parse the report

Per-tool output shapes differ; the agent normalizes:

interface SurvivedMutant {
  tool: 'stryker' | 'pit' | 'mutmut' | 'mull' | 'stryker-net';
  file: string;
  line: number;
  mutator: string;        // 'ConditionalBoundary' / 'ArithmeticOperator' / etc.
  original: string;       // the original code
  mutated: string;        // the mutated form
  testsRun: string[];     // tests that ran but didn't kill it
}

Step 2 - Classify

ClassSignalRecommended action
missing-caseThe mutated branch corresponds to a code path no test exercises.Add a test for the unreachable case.
weak-assertionA test runs the mutated line but the assertion is too loose to detect the change.Tighten the assertion (per assertion-quality-reviewer).
equivalent-mutantThe mutated code is semantically identical to the original.Mark and exclude.
unreachableMutated code is in a dead-code path (genuinely never executed).Remove the dead code OR mark as intentional.
flaky-killerA test does kill it but only intermittently (timing-dependent).Stabilize the test (see parallel-isolation-checker).

Step 3 - Heuristics per mutator

ConditionalBoundary (<<=)

// Original
if (qty < maxQty) { /* ... */ }

// Mutated (survived)
if (qty <= maxQty) { /* ... */ }

The boundary case is missing. Recommend: add a test where qty === maxQty and assert the original behavior (off the path under the original condition).

ArithmeticOperator (+-)

// Original
const total = subtotal + tax;

// Mutated (survived)
const total = subtotal - tax;

If tests pass for both, either:

  • Tests use tax = 0 (both forms equal).
  • Assertions are loose (e.g., expect(total).toBeGreaterThan(0)).

Recommend: add a test with tax > 0 and exact-equality assertion.

Statement Removal

// Original
notifyUser(orderId);
return success;

// Mutated (survived)
return success;

The notification's effect isn't asserted. Recommend: add a behavior verification (e.g., spy on notifyUser and assert it was called).

Constant Mutation (420)

// Original
const PAGE_SIZE = 42;

// Mutated (survived)
const PAGE_SIZE = 0;

If tests pass with PAGE_SIZE = 0, either:

  • Tests don't exercise pagination.
  • Tests use a separate constant.

Recommend: add a pagination test that asserts page size matches the constant.

Step 4 - Propose the specific test

Per surviving mutant, emit:

**Surviving mutant:** `src/cart.ts:42` — ConditionalBoundary

**Original:** `if (qty < maxQty) throw new Error('Cap exceeded');`
**Mutated:** `if (qty <= maxQty) throw new Error('Cap exceeded');`
**Class:** missing-case (boundary)

**Tests that ran but didn't kill it:**
- `cart.spec.ts > addItem qty=1``1 < 100` and `1 <= 100`; both throw nothing.
- `cart.spec.ts > addItem qty=100``100 < 100` is false (no throw); `100 <= 100` is true (throws).
  Wait — the second test SHOULD distinguish. Why didn't it?

**Recommendation:** check the test assertion. If it asserts only
`expect(() => cart.addItem({ qty: 100 })).not.toThrow()`, the
mutated form throws but the test catches the throw and asserts
"didn't throw" → fails. Mutant should be killed.

If the mutant survived despite this test, the test's expectation
might be wrong (perhaps the original was always meant to be `<=`).

**Action:** verify the original boundary semantics with the PM /
spec; either:
- The original `<` is correct: add an explicit test `qty === maxQty`
  that asserts `addItem` throws.
- The mutated `<=` reveals the original was off-by-one: the test
  is correct; the production code is wrong.

Either way, the surviving mutant is signal — investigate.

Step 5 - Refuse-to-proceed rules

The agent refuses to:

  • Auto-rewrite tests. Recommendation only; the team writes the test (or accepts the equivalent-mutant explanation).
  • Mark mutants as equivalent without surfacing the reasoning. The reviewer must agree.
  • Generate tests for code marked with mutation-suppression pragmas (the team explicitly opted out).

Output format

## Mutation survivor analysis — `<run-id>`

**Tool:** stryker | pit | mutmut | mull | stryker-net
**Survivors analyzed:** N
**Classified:**

| Class                | Count | Recommended action |
|----------------------|------:|--------------------|
| missing-case          |    14 | Add per-case tests. |
| weak-assertion        |     7 | Tighten assertions. |
| equivalent-mutant     |     3 | Exclude (with rationale). |
| unreachable / dead    |     2 | Remove or document. |

### Per-survivor detail

(Step 4 format, one per survivor)

### Top-priority recommendations (5-10)

1. ... (file:line + class + suggested test shape)

Anti-patterns

Anti-patternWhy it failsFix
Generating tests for every survivorAuto-generated tests are low-quality; produces noise.Recommend, don't write (Refuse rules).
Skipping equivalent-mutant flagTeam chases impossible-to-kill mutants; frustration.Always classify (Step 2).
One mega-recommendation per fileSpecific suggestions get buried.Per-survivor recommendation (Step 4).
Ignoring the test-that-should-have-killed-it analysis"Add another test" is the lazy answer; the existing test may be the bug.Investigate why the existing test missed it (Step 4 example).

Limitations

  • Per-tool output parsing. New tool versions may change report shape; the parser needs maintenance.
  • No semantic understanding of business logic. "What should this code do?" is a human call.
  • Heuristics, not proofs. Class predictions are best-effort.

References