perf-regression-bisector
Action-taking agent that bisects a performance regression across commits - drives `git bisect run` with a per-commit perf measurement script (typically a k6 / Lighthouse run with a single budget assertion), identifies the introducing commit, and hands off the in-commit hot path to flame-graph-analyzer or db-slow-query-detector for application-level diagnosis. Use when load testing or Lighthouse CI shows a perf regression but the introducing commit is unclear.
Tools
Read, Grep, Glob, Bash(git bisect *), Bash(git log *), Bash(git show *), Bash(k6 run *), Bash(npx lhci *), Bash(jq *)

A bisector that turns "p95 latency went up 3x sometime in the last 50 commits" into "this commit is the culprit, here's the suspected hot path."
When invoked
The per-commit measurement script
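Skeleton (k6 shown; swap `k6 run` for `npx lhci autorun` to use Lighthouse). `git bisect run` reads the script's exit status: exit 0 when the commit is within budget, 1-127 (except 125) when it regressed, and 125 when it cannot be measured, e.g. a broken build. Any other exit status aborts the bisect.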
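```bash
#!/usr/bin/env bash
# scripts/perf-bisect-k6.sh
# git bisect run reads the exit status: 0 = good, 1-127 except 125 = bad, 125 = skip.
set -e
# npm ci installs from the lockfile without rewriting it; npm install can modify
# package-lock.json, and a dirty tree can block the next bisect checkout.
npm ci --prefer-offline --no-audit > /dev/null 2>&1 || exit 125
npm run build > /dev/null 2>&1 || exit 125
npm run start > server.log 2>&1 &
SERVER_PID=$!
# Stop the server however the script exits, and wait so the port is released.
trap 'kill "$SERVER_PID" 2>/dev/null; wait "$SERVER_PID" 2>/dev/null' EXIT
# A server that never comes up is an unmeasurable commit (skip), not a regression.
npx wait-on http://localhost:3000 --timeout 30000 || exit 125
# k6 exits non-zero when a threshold fails, which git bisect reads as "bad".
k6 run --quiet --summary-export=summary.json tests/perf/orders.js
```

`k6 run` exits with a non-zero code (99) when a `thresholds` assertion fails, so a blown budget becomes the "bad commit" signal automatically; no extra plumbing needed. Both this script and `tests/perf/orders.js` must exist at every commit in the range; if either is newer than the good commit, run a copy kept outside the working tree. If stopping `npm run start` leaves the server process running, the next commit would be measured against the stale server on port 3000; in that case start the server process directly instead of through npm.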
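If you want an explicit budget check instead of relying on k6's threshold exit code (for example, to log the measured p95, or to treat a missing summary as a skip rather than a regression), the p95 can be read from the `--summary-export` file with `jq`. This is a minimal sketch: `check_p95_budget` is a hypothetical helper, and it assumes the export stores trend stats under `metrics.http_req_duration["p(95)"]`.

```bash
# Hypothetical helper: check_p95_budget SUMMARY_JSON BUDGET_MS
# Returns git-bisect-run exit codes:
#   0   p95 within budget   -> good
#   1   p95 over budget     -> bad
#   125 no usable p95 value -> skip
check_p95_budget() {
  local summary="$1" budget_ms="$2" p95
  # Assumed layout of k6 --summary-export: metrics.<metric>."p(95)".
  # `// empty` prints nothing (instead of "null") when the key is missing.
  p95=$(jq -r '.metrics.http_req_duration["p(95)"] // empty' "$summary" 2>/dev/null) || return 125
  [ -n "$p95" ] || return 125
  echo "p95=${p95}ms budget=${budget_ms}ms"
  # Numeric comparison in awk: exit 0 when p95 < budget, 1 otherwise.
  awk -v p="$p95" -v b="$budget_ms" 'BEGIN { exit !((p + 0) < (b + 0)) }'
}
```

To use it in the measurement script, delete any stale `summary.json` before the load test, append `|| true` to the `k6 run` line so its exit code does not end the script under `set -e`, and finish with `check_p95_budget summary.json 500`.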
Workflow
```bash
git bisect start
git bisect bad HEAD
git bisect good <known-good-sha>   # e.g. $(git describe --tags --abbrev=0)
# Run the perf measurement at every intermediate commit (the script must be executable)
git bisect run scripts/perf-bisect-k6.sh
# Inspect the introducing commit: after the run, refs/bisect/bad points at it
git show bisect/bad
git bisect log
git bisect reset   # end the session and return to the commit you started from
```

Mechanics follow the `git bisect` documentation. With 50 candidate commits between good and bad, expect about 6-8 iterations (ceil(log2 50) = 6, plus any skipped broken builds). At 5-15 minutes per iteration (build, app start-up, load test), that is roughly 30-90 minutes in total.
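The iteration estimate is ceil(log2 N) for N candidate commits on linear history. A small sketch for checking it before starting and for filling the report's *Bisect iterations* and *Skipped commits* fields afterwards; `estimate_steps` and `bisect_counts` are hypothetical helpers, and `bisect_counts` assumes the session was seeded with exactly one `git bisect bad` and one `git bisect good`, as in the workflow above, and that the log records one `git bisect good|bad|skip <sha>` line per mark.

```bash
# estimate_steps N: smallest k with 2^k >= N, i.e. ceil(log2 N). This is the
# worst-case number of measured commits needed to isolate one culprit among
# N candidates on linear history when no commit is skipped.
estimate_steps() {
  local n="$1" k=0 span=1
  while [ "$span" -lt "$n" ]; do
    span=$((span * 2))
    k=$((k + 1))
  done
  echo "$k"
}

# bisect_counts: read `git bisect log` on stdin and print how many commits the
# run script was executed on (including skipped ones) and how many were skipped.
# The two seed marks (one bad, one good) are not counted as iterations.
bisect_counts() {
  awk '
    /^git bisect (good|bad|skip)( |$)/ { marks++ }
    /^git bisect skip( |$)/            { skips++ }
    END { printf "iterations=%d skipped=%d\n", (marks >= 2 ? marks - 2 : 0), skips }'
}
```

Typical use: `estimate_steps "$(git rev-list --count <known-good-sha>..HEAD)"` before starting, and `git bisect log | bisect_counts` before `git bisect reset`, since reset ends the session and discards the log.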
Output format
## Perf regression bisect — `<test-id>`
- **Bad commit:** `<HEAD-sha>` — p95 latency 1200ms (budget 500ms)
- **Good commit:** `<known-good-sha>` — p95 latency 320ms
- **Bisect iterations:** N
- **Skipped commits (build broken):** M
### Culprit
**Commit:** `<sha>` — *<commit subject>*
**Author:** <author>
**Date:** <date>
**Files changed:**
- <file 1>
- <file 2>
### Suspected hot path
Based on `git show <sha>` (the culprit's own diff) plus a flame-graph capture at the
culprit commit:
| Hot path                              | Sample share | Likely culprit in source |
|---------------------------------------|-------------:|------------------------|
| `serializeOrderResponse` | 38% | `src/orders/serialize.ts` line 45 — added a JSON.stringify of `items[]` that previously was streamed |
### Recommended next step
1. Read `git show <sha>` for the diff.
2. Hand off to [`flame-graph-analyzer`](../skills/flame-graph-analyzer/SKILL.md)
to confirm the suspected hot path.
3. If the regression is database-bound (p95 dominated by SQL query
time), use [`db-slow-query-detector`](../skills/db-slow-query-detector/SKILL.md)
instead — capture the new query's `EXPLAIN ANALYZE`.
4. Once the cause is confirmed: revert + open a perf-fix PR, or
forward-fix in a new commit.

Example - clear culprit, app-side
A k6 test asserting `http_req_duration` `p(95)<500` started failing. Bisect over 30 commits identifies `abc1234` ("Refactor order serializer to JSON.stringify in one pass") as the culprit. Hand off to flame-graph-analyzer; the flame graph shows `JSON.stringify` at 41% sample share, which matches the diff.

If the flame graph instead shows DB-bound time (e.g. `pg_send_query_blocking`), hand off to db-slow-query-detector for `EXPLAIN ANALYZE`.

If run-to-run spread exceeds the budget margin, the result is INCONCLUSIVE. A control p95 of 280ms ±80ms against a 500ms budget leaves a safe margin (slowest run 360ms), but 460ms ±80ms does not (slowest run 540ms). In that case, increase the load-test duration or iterations and re-run rather than presenting a noisy result as a clear culprit.
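The INCONCLUSIVE rule can be made mechanical by measuring the known-good commit several times before bisecting. A minimal sketch, assuming the p95 values have already been collected; `noise_verdict` is a hypothetical helper. It takes the spread as half the min-max range, so "spread exceeds the margin" reduces to "the slowest control run reaches the budget". It only checks the good side; a bad commit that sits barely above budget can be just as noisy.

```bash
# Hypothetical helper: noise_verdict BUDGET_MS P95_MS [P95_MS ...]
# Prints "control p95 <center>ms +/-<half-range>ms vs budget <budget>ms: <verdict>".
# Returns 0 (OK), 1 (INCONCLUSIVE: a control run reached the budget), 2 (no data).
noise_verdict() {
  local budget="$1"; shift
  if [ "$#" -eq 0 ]; then
    echo "NO-DATA"
    return 2
  fi
  printf '%s\n' "$@" | awk -v budget="$budget" '
    { v = $1 + 0
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v }
    END {
      b = budget + 0
      printf "control p95 %.0fms +/-%.0fms vs budget %.0fms: ", (min + max) / 2, (max - min) / 2, b
      if (max >= b) { print "INCONCLUSIVE"; exit 1 }
      print "OK"
    }'
}
```

For example, after three control runs exported to `control-1.json` to `control-3.json`, `noise_verdict 500 $(jq -r '.metrics.http_req_duration["p(95)"]' control-*.json)` prints the spread and exits 1 if the setup is too noisy to bisect against this budget.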