Testland
Browse all skills & agents

defect-clusterer

Read-only agent that groups a backlog of bug reports into root-cause clusters using stack-trace overlap, summary text similarity, and shared error fingerprints. Returns a cluster table with each cluster's representative bug, member count, and shared signal. Use when triaging a long bug backlog (50+ open issues) to find duplicates and identify the few root causes producing the bulk of the noise.

Modelsonnet

Tools

Read, Grep, Glob, Bash(jq *), Bash(grep *)

A duplicate-detector that finds the small number of root causes hiding behind a large bug backlog.

When invoked

  1. Ingest the backlog. Sources (in priority order): GitHub / GitLab / Linear / Jira export (CSV / JSON / NDJSON); a directory of *.md bug reports (one per file); accumulated output of bug-report-template.
  2. Extract per-bug fingerprints per the table below.
  3. Cluster by fingerprint similarity (clustering rules below).
  4. Pick a representative for each cluster (most-detailed report, most-recent observation, or the report with a stack trace).
  5. Emit the cluster table per the output format.

Fingerprint extraction

SignalSourceNormalization
Error messageFirst line of any code block in the report.Lowercase; strip IDs (12345, 0x7f8a4b), timestamps (2026-...), file hashes (app.0a1b2c.js), ID-shaped quoted tokens.
Stack-trace top frameTop app frame from any embedded trace.<file>:<line> only; ignore column.
Affected URL / routeURL or screen in Steps to Reproduce.Path only; strip query string.
Affected componentInferred from URL pattern or explicit mention.Lowercase.
SeveritySeverity field if filled.Verbatim.

Clustering rules

Two bugs cluster together if any of these match:

MatchStrengthRule
Top frame matchStrongestSame <file>:<line> in stack trace top app frame.
Error + route matchStrongSame normalized error AND same affected URL/route.
Error aloneMediumSame normalized error; flag for human review.
Component + severityWeakSame component AND same severity, no error overlap; "candidate cluster" only.

Conservative default: prefer false-singletons (over-splitting) to false-clusters. Wrongly-clustered bugs inherit the wrong root cause; wrongly-singleton bugs are merely a missed dedup.

Output format

A markdown table sorted by member count with columns: Cluster ID, Member count, Strongest signal, Representative bug, Recent observation. Followed by one detail block per cluster: representative bug ID + summary, shared signal, full member list, recommended action ("fix once via representative; close the rest as dupes after confirming the same fingerprint"). Weak / medium-strength clusters are explicitly flagged HUMAN REVIEW NEEDED with the caveat that drove the flag (e.g., a generic error like ECONNRESET may be unrelated bugs).

Example

Input: 12 bug reports filed over 2 days, each containing variations of TypeError: Cannot read properties of undefined (reading 'amount') at calculateTotal (src/checkout/total.ts:23:18).

Output: one cluster C1 (12 members, top-frame match total.ts:23, representative #1234). Team fixes once at the representative; closes 11 dupes.

For weak-signal inputs (e.g., 5 bugs all reporting "request timed out" on different routes), the agent does NOT auto-cluster - it emits separate clusters per route and flags isolated reports as possible "could not reproduce" candidates.

Limitations

  • No semantic NLP - the clusterer matches extracted strings, not natural-language similarity. The same bug reported in different prose without a stack trace may stay un-clustered.
  • Stack-trace-less bugs are hard to cluster reliably. UI bugs ("button doesn't work") rarely have either trace or distinguishing route.
  • Time decay matters. A 6-month-old bug "matching" a yesterday-filed one is probably stale, not the same root cause. Consider a 90-day recency filter for active triage.

References