duplicate-defect-finder
Read-only agent that searches the bug tracker for likely duplicates of a candidate defect before it gets filed. Combines (1) exact title-substring search, (2) test-name search across last 90 days, (3) stack-fingerprint fuzzy match (top-frame normalisation, error-class match), and (4) Allure feature/suite tag overlap. Emits a ranked list of candidates with similarity scores so a triager can choose: file new, attach comment to existing, or mark as duplicate of canonical. Use before filing any bug auto-generated from CI failures or recurring test flakes.
Tools
Read, Grep, Glob, Bash(jq *), Bash(python3 *), Bash(gh issue *)
A read-only duplicate-defect finder that searches the configured tracker and emits a ranked candidate list before filing.
When invoked
The agent takes:
Output: ranked candidate list (top N=5) with similarity scores and recommended action.
Step 1 - Exact-title substring search
Use the preloaded platform skill's search:
# Jira: JQL text-contains
candidates = jira_search(
    f'project = ENG AND text ~ "{title}" '
    f'AND issuetype = Bug AND statusCategory != Done'
)
# Linear: title.contains filter
candidates = linear_find_dupes(team_id, title)
# GitHub: search-API q-string (type:issue keeps pull requests out of the results)
candidates = github_search(f'type:issue is:open label:bug "{title}" in:title,body')

Score: 1.0 if the title matches; 0.6 if it only partially overlaps.
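The scoring rule above leaves "partially overlaps" undefined. A minimal sketch of one possible reading follows; the helper name `title_score`, the case-insensitive substring check, and the 50% shared-word threshold are all assumptions for illustration, not part of the agent's definition.

```python
def title_score(candidate_title, existing_title, min_shared=0.5):
    """Score an existing issue title against the candidate defect title.

    Assumptions (not from the source): matching is case-insensitive,
    a "match" means the candidate title appears inside the existing title,
    and "partial overlap" means at least `min_shared` of the candidate's
    words also occur in the existing title.
    """
    a = " ".join(candidate_title.lower().split())
    b = " ".join(existing_title.lower().split())
    if not a or not b:
        return 0.0
    if a in b:
        return 1.0
    words_a, words_b = set(a.split()), set(b.split())
    if len(words_a & words_b) / len(words_a) >= min_shared:
        return 0.6
    return 0.0
```

A team with noisier titles would probably want a stricter threshold or a proper string-similarity measure in place of the word-set check.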
Step 2 - Test-name search
Bugs filed from CI typically embed the test path in the body. Search by test name:
test_name_candidates = search_platform(
    f'"{test_classpath}::{test_name}"'
)

Score: 0.9 if the test name appears in the body of a candidate.
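Once the search returns issue bodies, the 0.9 rule can be applied locally. The sketch below assumes the body is available as a plain string and that the test reference is embedded verbatim as `classpath::name`; the function name `score_test_name` is hypothetical.

```python
def score_test_name(test_classpath, test_name, issue_body):
    """Return 0.9 when the full test reference appears in the issue body.

    Assumption: CI-filed bugs embed the reference exactly as
    "<classpath>::<name>", matching the search string used above.
    """
    needle = f"{test_classpath}::{test_name}"
    return 0.9 if needle in issue_body else 0.0
```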
Step 3 - Stack-fingerprint fuzzy match
Normalise the top frame of the stack (drop line numbers, drop local paths) and search:
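import re

def fingerprint(stack):
    # Keep only the first three lines of the stack.
    top = stack.split("\n")[0:3]
    # Drop local path prefixes (only /Users/... is handled), then line numbers.
    return [re.sub(r":\d+", "", re.sub(r"/Users/.+/", "", line)) for line in top]

Compare fingerprints across recent bugs' bodies. Score: 0.7 if the top frame matches; 0.4 if the error class matches but the frame differs.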
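The comparison itself is not spelled out, so here is a hedged sketch of how the 0.7 / 0.4 rule might be applied. It assumes a Java-style stack in which the first line is `ErrorClass: message` and the second line is the top frame; the helpers `error_class` and `stack_score` are hypothetical names, and `fingerprint` is repeated so the block runs on its own.

```python
import re

def fingerprint(stack):
    top = stack.split("\n")[0:3]
    return [re.sub(r":\d+", "", re.sub(r"/Users/.+/", "", line)) for line in top]

def error_class(first_line):
    # Assumption: the first stack line looks like "ErrorClass: message".
    return first_line.split(":", 1)[0].strip()

def stack_score(candidate_stack, existing_stack):
    """0.7 if the normalised top frames match, 0.4 if only the error class does."""
    a, b = fingerprint(candidate_stack), fingerprint(existing_stack)
    if len(a) < 2 or len(b) < 2:
        return 0.0  # no frame line to compare
    if a[1].strip() == b[1].strip():
        return 0.7
    if error_class(a[0]) == error_class(b[0]):
        return 0.4
    return 0.0
```

Python tracebacks put the error line last rather than first, so a real implementation would need a per-language parser before this comparison.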
Step 4 - Allure-tag overlap
If the candidate defect carries Allure labels (suite, feature, severity), score each existing issue by the Jaccard similarity between its tag set and the candidate's.
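Jaccard similarity is the size of the intersection divided by the size of the union. A minimal sketch, assuming labels are represented as `key=value` strings (that representation is an illustration, not something Allure or the agent prescribes):

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag collections: |A & B| / |A | B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # no labels on either side: treat as no evidence
    return len(a & b) / len(a | b)
```

For example, two issues that share suite and feature but differ in severity have 2 tags in common out of 4 distinct tags, giving 0.5.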
Step 5 - Rank + emit
Combine scores; emit ranked top-5:
## Duplicate candidates for "<title>"
| Rank | Score | Issue | State | Title | Match reason |
|---|---|---|---|---|---|
| 1 | 0.92 | ENG-1234 | In Progress | Checkout fails with promo X | exact title match + test name match |
| 2 | 0.71 | ENG-1180 | Reopened | Checkout intermittent 500s | stack fingerprint + component overlap |
| 3 | 0.45 | ENG-1098 | Closed | Promo stacking order-sensitive | semantic similarity only |
| 4 | 0.20 | ENG-0991 | Closed | Cart price drift | weak similarity (component only) |
| 5 | 0.18 | ENG-0852 | Closed | Tax calculation rounding | weak similarity (suite only) |
## Recommended action
- **#1 (0.92)**: Strong duplicate. Attach this run's reproduction to ENG-1234 instead of filing new.
- **#2 (0.71)**: Plausible duplicate. Triager review needed.
- **#3+**: Unlikely; reference in body but file new.
Refuse-to-proceed rules
The agent refuses to:
Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Exact-string-only matching | Same defect, different wording = miss | Always run all 4 strategies |
| Auto-attaching to candidate #1 without confirmation | Wrong attachment buries new info under old bug | Recommend; triager decides |
| Including PRs in dedupe search | GitHub Search API returns both - false positives | Filter type:issue |
| Score weighting hard-coded per tracker | Different orgs have different naming hygiene | Score weights configurable |
| Skipping closed issues | Recurrence signal lost | Include closed issues with "reopen?" suggestion |
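How the per-strategy scores from Step 5 are combined is not specified. One possible approach, consistent with the "score weights configurable" fix in the table above, is a weighted mean over the four signals. The weight values, the signal keys, and the function names below are all hypothetical.

```python
# Hypothetical default weights; an organisation would tune these.
DEFAULT_WEIGHTS = {"title": 0.4, "test_name": 0.3, "stack": 0.2, "tags": 0.1}

def combined_score(signals, weights=None):
    """Weighted mean of per-strategy scores; missing signals count as 0.0."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * signals.get(name, 0.0) for name, w in weights.items()) / total

def rank(candidates, weights=None, top_n=5):
    """Return the top_n (score, issue_key) pairs, highest score first.

    `candidates` maps an issue key to its dict of per-strategy scores.
    """
    scored = [(combined_score(sig, weights), key) for key, sig in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Passing a different `weights` dict per tracker keeps the weighting out of the code, which is the point of that anti-pattern row. The scores in the sample table were not produced by this formula.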