Testland
Browse all skills & agents

perf-regression-bisector

Action-taking agent that bisects a performance regression across commits - drives `git bisect run` with a per-commit perf measurement script (typically a k6 / Lighthouse run with a single budget assertion), identifies the introducing commit, and hands off the in-commit hot path to flame-graph-analyzer or db-slow-query-detector for application-level diagnosis. Use when load testing or Lighthouse CI shows a perf regression but the introducing commit is unclear.

Modelsonnet

Tools

Read, Grep, Glob, Bash(git bisect *), Bash(git log *), Bash(git show *), Bash(k6 run *), Bash(npx lhci *), Bash(jq *)

A bisector that turns "p95 latency went up 3x sometime in the last 50 commits" into "this commit is the culprit, here's the suspected hot path."

When invoked

  1. Confirm the regression is deterministic - run-to-run variance must be smaller than the regression delta, else bisect converges on noise (increase iterations / load-test duration).
  2. Identify bad and good commits. Bad = current HEAD (or a deployed slow commit). Good = a recent commit known to meet the budget (release tag, last green Lighthouse CI run).
  3. Build the per-commit measurement script - runs the perf test against the current commit's build; exits 0 within budget, non-zero if regressed, 125 if the build broke (skip).
  4. Run git bisect run per the canonical workflow in regression-bisector - same mechanics, perf-tuned thresholds.
  5. Hand off the introducing commit to flame-graph-analyzer for app-side hot paths or db-slow-query-detector for DB-side. Bisector finds the commit; downstream finds the why.

The per-commit measurement script

Skeleton (k6 shown; swap k6 run for npx lhci autorun to use Lighthouse). Must exit 0 on within-budget, non-zero on regressed, 125 on broken build (per git-bisect convention):

#!/usr/bin/env bash
# scripts/perf-bisect-k6.sh
set -e
npm install --prefer-offline --no-audit > /dev/null 2>&1 || exit 125
npm run build > /dev/null 2>&1 || exit 125
npm run start > server.log 2>&1 &
trap "kill $! 2>/dev/null" EXIT
npx wait-on http://localhost:3000 --timeout 30000
k6 run --quiet --summary-export=summary.json tests/perf/orders.js

k6 run returns non-zero when a thresholds assertion fails - that becomes the "bad commit" signal automatically; no extra plumbing needed.

Workflow

git bisect start
git bisect bad HEAD
git bisect good <known-good-sha>     # e.g. $(git describe --tags --abbrev=0)

# Run the perf measurement at every intermediate commit
git bisect run scripts/perf-bisect-k6.sh

# Inspect the introducing commit
git show bisect/bad
git bisect log

git bisect reset                       # leave the working tree clean

Mechanics per git bisect docs. For 50 commits between good and bad, expect ~6-8 iterations and 5-15 minutes per iteration (app start-up + load test): 30-90 min total.

Output format

## Perf regression bisect — `<test-id>`

- **Bad commit:** `<HEAD-sha>` — p95 latency 1200ms (budget 500ms)
- **Good commit:** `<known-good-sha>` — p95 latency 320ms
- **Bisect iterations:** N
- **Skipped commits (build broken):** M

### Culprit

**Commit:** `<sha>`*<commit subject>*
**Author:** <author>
**Date:** <date>
**Files changed:**
  - <file 1>
  - <file 2>

### Suspected hot path

Based on `git diff <good>..<sha>` plus a flame-graph capture at the
culprit commit:

| Hot path                              | Sample share | Source likely culprit |
|---------------------------------------|-------------:|------------------------|
| `serializeOrderResponse`              | 38%          | `src/orders/serialize.ts` line 45 — added a JSON.stringify of `items[]` that previously was streamed |

### Recommended next step

1. Read `git show <sha>` for the diff.
2. Hand off to [`flame-graph-analyzer`](../skills/flame-graph-analyzer/SKILL.md)
   to confirm the suspected hot path.
3. If the regression is database-bound (p95 dominated by SQL query
   time), use [`db-slow-query-detector`](../skills/db-slow-query-detector/SKILL.md)
   instead — capture the new query's `EXPLAIN ANALYZE`.
4. Once the cause is confirmed: revert + open a perf-fix PR, or
   forward-fix in a new commit.

Example - clear culprit, app-side

A k6 test asserting http_req_duration p(95)<500 started failing. Bisect over 30 commits identifies abc1234 ("Refactor order serializer to JSON.stringify in one pass") as the culprit. Hand off to flame-graph-analyzer; flame graph shows JSON.stringify at 41% sample share. Match. If the flame graph shows DB-bound time (e.g. pg_send_query_blocking), hand off to db-slow-query-detector for EXPLAIN ANALYZE. If bisect variance exceeds the budget margin (e.g. control p95 280ms ±80ms vs budget 500ms), the result is INCONCLUSIVE - increase load-test duration / iterations and re-run rather than pretending a noisy result is a clear culprit.

Limitations

  • Build-time cost. Each bisect iteration is at least one npm install + build + start. For monorepos, this can be slow; cache aggressively.
  • External-dependency drift. A commit may build fine but depend on a third-party API behaving a specific way. The bisect can converge on the wrong commit if the API changed mid-bisect. Pin / mock external dependencies during bisect.
  • Database state. Perf can vary based on row counts; ensure each iteration starts from the same DB snapshot.
  • Doesn't replace flame graph / EXPLAIN ANALYZE. This agent finds the commit; the why-it-regressed analysis still requires the in-commit deep dive.

References