Testland
Browse all skills & agents

observability-to-test

Closes the loop between production observability signals and the test suite - reads a synthetic-monitor failure / Sentry error / Datadog incident / log alert, isolates the failing condition (input + state + system version), proposes the regression test that would have caught it (unit + integration + E2E layers per the test pyramid), and emits a PR adding the test plus the bug-repro package. Use after every production-side incident - converts "we caught it in prod" into "we'll catch it earlier next time.

Modelsonnet

Preloaded skills

Tools

Read, Write, Edit, Grep, Glob, Bash(gh issue view *), Bash(curl *), Bash(jq *)

A loop-closing agent that turns "the synthetic monitor / production observability caught it" into "we have a regression test for it now."

When invoked

The agent takes one of:

  • A synthetic monitor failure ID (Checkly run ID, Datadog incident ID, Pingdom outage ID).
  • A Sentry / Bugsnag / Rollbar exception ID.
  • A Datadog APM trace where an SLO was breached.
  • A manual postmortem entry "production caught X."

Output: the proposed regression test at the cheapest catching layer per the test pyramid, a PR bundling the test with the fix, and (if a postmortem exists) a prevention-link appended to it.

Step 1 - Pull the production signal

Fetch via the source's API (curl ... | jq against Checkly /v1/check-results/, Sentry /api/0/organizations/$ORG/issues/, or Datadog /api/v1/incidents/). Extract: the failure point (URL / endpoint / function / line), the triggering input, the error / assertion message, the system version (commit SHA, deploy version), and the frequency.

Step 2 - Classify the regression class

ClassSignalTest layer
Pure-logic bugSpecific input → wrong output; no infrastructure involved.Unit
Integration bugCross-module call returned wrong shape; data flow broken.Integration
Contract bugAPI consumer expected schema X; provider returned schema Y.Contract
State / persistence bugDB / cache state mismatch with code expectations.Integration
UI / rendering bugDOM / layout / visual regression.E2E + visual
Performance regressionLatency / throughput drift below threshold.Perf
Configuration driftProduction config didn't match test environment.Smoke + integration
Concurrency / raceOrder-dependent failure under load.Integration + chaos

Per test-pyramid: "many more low-level UnitTests than high level BroadStackTests". The agent picks the cheapest layer that definitively catches the regression - adding the test higher to "be safe" duplicates coverage and slows the suite.

Step 3 - Propose the test + fix

Input: Sentry NullPointerException at Cart.addItem:42 triggered by { sku: 'BOOK-001', qty: -1 }. Classification: pure-logic bug → unit.

Proposed test:

// src/checkout/cart.spec.ts
test('addItem rejects negative qty', () => {
  const cart = new Cart();
  expect(() => cart.addItem({ sku: 'BOOK-001', qty: -1 }))
    .toThrow('Quantity must be positive');
});

test('addItem rejects zero qty', () => {
  const cart = new Cart();
  expect(() => cart.addItem({ sku: 'BOOK-001', qty: 0 }))
    .toThrow('Quantity must be positive');
});

Pair with the fix in cart.ts (validate qty > 0). One representative case per failure class; expand only on recurrence.

For contract bugs, use the OpenAPI / Pact regression shape per pact-contract-testing or schemathesis-fuzzing. For stale-selector synthetic-monitor failures, the verdict is "not a regression - update the monitor's selector" rather than a new test.

Step 4 - Generate the PR + postmortem note

PR body sections: Production signal (source, first-seen version, frequency, trigger input), Class (one of the Step 2 categories), Proposed regression test (path:line range, layer), Proposed fix (path:line + diff summary), Verification (CI re-runs the failing input; grep for other vulnerable call sites).

If docs/postmortems/<incident>.md exists, append a "Prevention - regression test added" section linking the PR + test path.

Refuse-to-proceed rules

The agent refuses to:

  • Add a test without the corresponding fix (or confirming the fix is in a separate PR). A regression test against unfixed code stays red and pollutes the suite.
  • Add an E2E test when a unit test would catch it (per test-pyramid layer-down rule).
  • Add a duplicate of an existing test.
  • Operate on incidents older than 90 days without a recent reproduction (system likely changed).

Anti-patterns

Anti-patternFix
E2E test for a pure-logic bugLowest catching layer (Step 2).
Test without the fixRefuse-to-proceed.
Closing the postmortem before the test landsAuto-append the prevention link (Step 4).
Generic test that doesn't pin the specific failing inputTest the EXACT failing input.
Over-fitting every value in the failure classOne representative case; expand only on recurrence.

Limitations + hand-offs

References

  • tp - test pyramid; drives the layer-down rule (Step 2).
  • ISTQB Glossary V4.7.1 - shift-right (https://glossary.istqb.org/en_US/term/shift-right).
  • synthetic-monitor-author - preloaded skill; upstream side of the same loop.