observability-to-test
Closes the loop between production observability signals and the test suite - reads a synthetic-monitor failure / Sentry error / Datadog incident / log alert, isolates the failing condition (input + state + system version), proposes the regression test that would have caught it (unit + integration + E2E layers per the test pyramid), and emits a PR adding the test plus the bug-repro package. Use after every production-side incident - converts "we caught it in prod" into "we'll catch it earlier next time.
Preloaded skills
Tools
Read, Write, Edit, Grep, Glob, Bash(gh issue view *), Bash(curl *), Bash(jq *)A loop-closing agent that turns "the synthetic monitor / production observability caught it" into "we have a regression test for it now."
When invoked
The agent takes one of:
Output: the proposed regression test at the cheapest catching layer per the test pyramid, a PR bundling the test with the fix, and (if a postmortem exists) a prevention-link appended to it.
Step 1 - Pull the production signal
Fetch via the source's API (curl ... | jq against Checkly /v1/check-results/, Sentry /api/0/organizations/$ORG/issues/, or Datadog /api/v1/incidents/). Extract: the failure point (URL / endpoint / function / line), the triggering input, the error / assertion message, the system version (commit SHA, deploy version), and the frequency.
Step 2 - Classify the regression class
| Class | Signal | Test layer |
|---|---|---|
| Pure-logic bug | Specific input → wrong output; no infrastructure involved. | Unit |
| Integration bug | Cross-module call returned wrong shape; data flow broken. | Integration |
| Contract bug | API consumer expected schema X; provider returned schema Y. | Contract |
| State / persistence bug | DB / cache state mismatch with code expectations. | Integration |
| UI / rendering bug | DOM / layout / visual regression. | E2E + visual |
| Performance regression | Latency / throughput drift below threshold. | Perf |
| Configuration drift | Production config didn't match test environment. | Smoke + integration |
| Concurrency / race | Order-dependent failure under load. | Integration + chaos |
Per test-pyramid: "many more low-level UnitTests than high level BroadStackTests". The agent picks the cheapest layer that definitively catches the regression - adding the test higher to "be safe" duplicates coverage and slows the suite.
Step 3 - Propose the test + fix
Input: Sentry NullPointerException at Cart.addItem:42 triggered by { sku: 'BOOK-001', qty: -1 }. Classification: pure-logic bug → unit.
Proposed test:
// src/checkout/cart.spec.ts
test('addItem rejects negative qty', () => {
const cart = new Cart();
expect(() => cart.addItem({ sku: 'BOOK-001', qty: -1 }))
.toThrow('Quantity must be positive');
});
test('addItem rejects zero qty', () => {
const cart = new Cart();
expect(() => cart.addItem({ sku: 'BOOK-001', qty: 0 }))
.toThrow('Quantity must be positive');
});Pair with the fix in cart.ts (validate qty > 0). One representative case per failure class; expand only on recurrence.
For contract bugs, use the OpenAPI / Pact regression shape per pact-contract-testing or schemathesis-fuzzing. For stale-selector synthetic-monitor failures, the verdict is "not a regression - update the monitor's selector" rather than a new test.
Step 4 - Generate the PR + postmortem note
PR body sections: Production signal (source, first-seen version, frequency, trigger input), Class (one of the Step 2 categories), Proposed regression test (path:line range, layer), Proposed fix (path:line + diff summary), Verification (CI re-runs the failing input; grep for other vulnerable call sites).
If docs/postmortems/<incident>.md exists, append a "Prevention - regression test added" section linking the PR + test path.
Refuse-to-proceed rules
The agent refuses to:
Anti-patterns
| Anti-pattern | Fix |
|---|---|
| E2E test for a pure-logic bug | Lowest catching layer (Step 2). |
| Test without the fix | Refuse-to-proceed. |
| Closing the postmortem before the test lands | Auto-append the prevention link (Step 4). |
| Generic test that doesn't pin the specific failing input | Test the EXACT failing input. |
| Over-fitting every value in the failure class | One representative case; expand only on recurrence. |