Precision in PROOF Debriefs: The Debriefer's 60-Second Test

Testland, July 2, 2026

Run the debriefer's 60-second test on every field of a PROOF debrief before the report ships: five checks that separate action-ready fields from filler.

The debriefer's 60-second test at a glance: Past, Results, Obstacles, Outlook, and Feelings each get about a minute before a PROOF report ships.

A PROOF report can hit every field in the canonical order (Past, Results, Obstacles, Outlook, Feelings) and still come back to the tester with a follow-up question. Getting the shape right is not the same as passing the debrief. A session runs one to two hours; the debriefer's read on whether a single field is sharp enough to act on takes about a minute. One reusable test decides that minute-long verdict, repeated five times per report. This post assumes the reader already runs SBTM sessions, writes charters, and knows the PROOF order from "Session-Based Test Management Is the Audit Trail Exploratory Testing Needed." It packages the test as a checklist a tester or debriefer can run before the report ships.

Why a PROOF report still gets bounced back

A PROOF report is the short write-up a tester files at the end of an exploratory testing session; the debrief that follows is where a lead acts on it. A report can follow the canonical order exactly and still fail the reader. The order is structural, not evaluative: it tells a debriefer where to look, not whether what's there is usable.

The use case is concrete. A tester finishes a charter session, writes the report under the five headings, and brings it to a debrief with a lead or peer. What hangs on that write-up is every downstream decision: whether a finding blocks the release, whether the area earns a follow-up session, and what the next charter should cover. A report the debriefer can't act on stalls all three.

The five fields themselves take one line each to define, per Wikipedia's session-based testing entry: Past is what happened during the session, Results is what was achieved, Obstacles is what got in the way of good testing, Outlook is what still needs to be done, and Feelings is how the tester feels about all this. Together they are the debriefing agenda Jonathan Bach built into SBTM in 2000.

"This is the seminal article on Session-Based Test Management, written by my brother Jon and I based on the process we pioneered at Hewlett-Packard," says James Bach, co-creator of Session-Based Test Management at Satisfice. The 2000 paper defines the fields; it does not grade them. The real failure mode isn't a missing field: it's a debriefer reading a complete report and picking up the phone to ask a question the report should already answer.

The debriefer's 60-second test, defined

Call it the Debriefer's 60-Second Test. For each PROOF field, in canonical order, the debriefer answers one question before moving to the next. A debrief is already defined as "a short discussion between the manager and tester (or testers) about the session report"; the 60-second test turns "short" into five separate, testable checks instead of a shared assumption.

Field	The 60-second check
Past	Can the debriefer picture what was actually run, without re-reading the charter?
Results	Can the debriefer state each finding's severity and reproducibility without opening a ticket?
Obstacles	Can the debriefer tell if this is a one-off or a pattern across sessions?
Outlook	Can the debriefer name the next charter's scope from this field alone?
Feelings	Can the debriefer tell which specific finding the tester is confident vs. unsure about?

The test runs twice per report: once by the tester, as a pre-send self-check, and once by the debriefer, as the bounce-back criterion. A field that fails either read gets rewritten before the debrief starts, not during it.
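The table and the two-read rule can be sketched as a small data structure. This is a minimal illustration, not part of SBTM or any tool: the names `CHECKS`, `fields_to_rewrite`, and `bounce_questions` are hypothetical, and the pass/fail answers still come from a human reading each field.

```python
# The five checks from the table above, in canonical PROOF order.
CHECKS = {
    "Past": "Can the debriefer picture what was actually run, "
            "without re-reading the charter?",
    "Results": "Can the debriefer state each finding's severity and "
               "reproducibility without opening a ticket?",
    "Obstacles": "Can the debriefer tell if this is a one-off or a "
                 "pattern across sessions?",
    "Outlook": "Can the debriefer name the next charter's scope from "
               "this field alone?",
    "Feelings": "Can the debriefer tell which specific finding the "
                "tester is confident vs. unsure about?",
}


def fields_to_rewrite(tester_read, debriefer_read):
    """Return the fields that failed either read, in PROOF order.

    Each argument maps a field name to True (passed) or False (failed).
    Assumption of this sketch: a field missing from either read counts
    as failed, so a skipped check cannot pass silently.
    """
    return [
        name
        for name in CHECKS
        if not (tester_read.get(name, False) and debriefer_read.get(name, False))
    ]


def bounce_questions(failed_fields):
    """Pair each failed field with the check it failed."""
    return [f"{name}: {CHECKS[name]}" for name in failed_fields]


# Usage: the tester's self-check passes everything, but the debriefer
# cannot name the next charter's scope from Outlook.
tester = {name: True for name in CHECKS}
debriefer = dict(tester, Outlook=False)
print(fields_to_rewrite(tester, debriefer))  # ['Outlook']
```

Keeping the two reads as separate inputs mirrors the rule that a field must pass both the pre-send self-check and the debriefer's read.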

Passing the 60-second test, field by field

One scenario runs through all five checks: a CSV bulk-import screen for contact lists, chartered with four files (clean, malformed-email, 5,000-row stress, and UTF-8 accented names). The mission: find where partial-row failures diverge from the import summary.

Past: what happened, not what the charter promised

The check: can the debriefer picture what was actually run, without re-reading the charter?

Weak:
Tested CSV import.

Sharp:
Uploaded 4 CSV variants (clean, malformed-email, 5,000-row
stress, UTF-8 accented) against the contacts importer.

The weak version fails because the debriefer can't picture coverage from it: "Tested CSV import" could mean one file or ten. The sharp version names the inputs, so a debriefer who never saw the charter can tell what ground the session covered.

Results: findings, not a link dump

The check: can the debriefer state each finding's severity and reproducibility without opening a ticket?

Weak:
Found some bugs, see JIRA-2201, JIRA-2202, JIRA-2204.

Sharp:
Two confirmed bugs: malformed-email rows silently skip instead
of failing the row (JIRA-2201); accented names get mangled on
re-export after a successful import (JIRA-2202). Both reproduce
on the 10-row and 5,000-row files.

A ticket number is a pointer, not a finding. Results earns its place once raw observations become two or three synthesized claims, much as observability tooling condenses raw production data into a few signals. The debriefer shouldn't need three tab switches to know what is broken.
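As a pre-send aid, the "pointer, not a finding" distinction can be roughly approximated in code. The sketch below is a heuristic under stated assumptions, not a substitute for the debriefer's read: the ticket pattern assumes Jira-style keys like the ones in the examples (it would also match strings such as "UTF-8"), and both the function name `looks_like_link_dump` and the five-words-per-ticket threshold are hypothetical.

```python
import re

# Assumption: ticket IDs look like JIRA-2201 (uppercase key, dash, digits).
TICKET = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def looks_like_link_dump(results, min_words_per_ticket=5):
    """Flag a Results field that cites tickets but barely describes them.

    Heuristic only: it counts the words left after ticket IDs are
    removed. It cannot tell whether those words state severity or
    reproducibility; that judgment stays with the reader.
    """
    tickets = TICKET.findall(results)
    if not tickets:
        return False  # no pointers at all, so not a link dump
    prose = TICKET.sub(" ", results)
    words = re.findall(r"[A-Za-z0-9'-]+", prose)
    return len(words) < min_words_per_ticket * len(tickets)


weak = "Found some bugs, see JIRA-2201, JIRA-2202, JIRA-2204."
sharp = (
    "Two confirmed bugs: malformed-email rows silently skip instead "
    "of failing the row (JIRA-2201); accented names get mangled on "
    "re-export after a successful import (JIRA-2202). Both reproduce "
    "on the 10-row and 5,000-row files."
)
print(looks_like_link_dump(weak))   # True
print(looks_like_link_dump(sharp))  # False
```

A field that passes this check can still fail the 60-second test; the heuristic only catches the most obvious case.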

Obstacles: recurring friction, not a one-day complaint

The check: can the debriefer tell if this is a one-off or a pattern across sessions?

Weak:
Import kept timing out today, annoying.

Sharp:
The staging import endpoint has hit its 30s timeout on every
file over 2,000 rows across the last three sessions chartered
on this screen, not just today. Needs a fixture or a
staging-side fix before further sessions are useful.

"Annoying" is a mood, not an obstacle report. Obstacles flags friction the next session will hit again, not a one-day complaint. The sharp version names a session count, a threshold, and a fix path, giving the debriefer enough to escalate instead of shrugging it off.

Outlook: a scoped gap list, not the whole backlog

The check: can the debriefer name the next charter's scope from this field alone?

Weak:
Still need to test more edge cases, error handling, file
types, larger files, concurrent imports, permissions, and the
export path too.

Sharp:
Two items left in this charter's scope: duplicate-email
handling within one file, and the partial-failure summary
shown after a mixed-success import. Concurrent imports and the
export path belong to a separate charter.

An unbounded Outlook means the charter was scoped too broadly, not that the tester ran out of time. Seven items across seven surfaces is a backlog, not a next session. The sharp version leaves two items a debriefer can hand back as the charter's next mission.

Feelings: a risk call, not a mood

The check: can the debriefer tell which specific finding the tester is confident versus unsure about?

Feelings records calibrated judgment about a specific finding, not general sentiment about the session.

Weak:
Import feels shaky overall.

Sharp:
Confident the malformed-row bug is real and worth blocking on.
Less confident about the encoding bug: only reproduced on one
machine's file, so worth one retest before writing it up as a
P1.

"Shaky overall" gives the debriefer nothing to act on: not which bug, not what to check next. The sharp version attaches a confidence level to each named finding, so a debriefer can decide where to spend the next retest.

Applying the PROOF debrief outside a charter

The PROOF shape works past the charter it was built for. Nothing in Bach's original proposal extends it to non-charter investigations; this is a practitioner adaptation, not a citation. Applied to a systematic flaky-test investigation, the same five checks separate a report that answers questions from one that invites them:

FLAKY-TEST INVESTIGATION: test_checkout_retry
PAST: Reran the test 40x locally and 20x in CI; diffed network
      timing logs.
RESULTS: Root cause is a race between the retry click and a
      toast animation, not the API.
OBSTACLES: Rarely fails locally; local repro rate was 1 in 40
      vs CI's 1 in 20.
OUTLOOK: Two more flaky tests share the same toast-animation
      pattern; not yet checked.
FEELINGS: Confident in the root cause; unsure whether the fix
      belongs in the test or the app.

Run the 60-second test against each line and it passes the same way a charter report does: specific inputs, a named root cause, a bounded next step, and a confidence level attached to one claim. The shape transfers even without a charter.
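A labeled report like the one above is also easy to check mechanically for shape. The sketch below assumes the `LABEL: text` layout with indented continuation lines used in that example; `parse_proof` and `shape_problems` are hypothetical names. It confirms only that all five fields are present and in canonical order, which is exactly the part of the job the 60-second test does not cover.

```python
import re

PROOF_ORDER = ["PAST", "RESULTS", "OBSTACLES", "OUTLOOK", "FEELINGS"]
LABEL = re.compile(r"^(PAST|RESULTS|OBSTACLES|OUTLOOK|FEELINGS):\s*(.*)$")


def parse_proof(text):
    """Split a labeled report into {field: text}, joining wrapped lines."""
    parts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LABEL.match(line)
        if match:
            current = match.group(1)
            parts[current] = [match.group(2)] if match.group(2) else []
        elif current is not None:
            # Continuation of the field that was opened most recently.
            parts[current].append(line)
        # Lines before the first label (such as a title) are ignored.
    return {name: " ".join(chunks) for name, chunks in parts.items()}


def shape_problems(fields):
    """List structural problems only; says nothing about sharpness."""
    problems = [
        f"missing or empty: {name}"
        for name in PROOF_ORDER
        if not fields.get(name)
    ]
    present = [name for name in fields if name in PROOF_ORDER]
    if present != [name for name in PROOF_ORDER if name in fields]:
        problems.append("fields out of canonical order")
    return problems


REPORT = """\
FLAKY-TEST INVESTIGATION: test_checkout_retry
PAST: Reran the test 40x locally and 20x in CI; diffed network
      timing logs.
RESULTS: Root cause is a race between the retry click and a
      toast animation, not the API.
OBSTACLES: Rarely fails locally; local repro rate was 1 in 40
      vs CI's 1 in 20.
OUTLOOK: Two more flaky tests share the same toast-animation
      pattern; not yet checked.
FEELINGS: Confident in the root cause; unsure whether the fix
      belongs in the test or the app.
"""

fields = parse_proof(REPORT)
print(shape_problems(fields))  # []
```

An empty problem list means the report is shape-compliant; whether each field is action-ready still takes the human read.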

Five questions about writing tighter PROOF fields

What if a PROOF report's Results field genuinely needs more than one paragraph?

Let it run long when findings are genuinely distinct, but lead each one with a one-line severity and reproducibility summary. The 60-second check is a per-finding check, not a per-field word limit: a debriefer who can state each finding's status from the first line still passes it.

How do you trim an overloaded Outlook field without dropping real gaps?

Split it. Keep what belongs to the next charter, and move the rest into a backlog note or a separate charter proposal. A field listing seven surfaces is naming seven future charters; write it as candidates and let the debriefer pick one instead of inheriting all seven.

Is "none" ever an acceptable Obstacles entry in a PROOF report?

Yes, when it's true and specific. "None" for a session with no environment, data, or tooling friction is a real, checkable claim. It fails only as a placeholder for a skipped field. A debriefer can tell the difference by asking whether obstacles were actually checked for.

How does a debriefer bounce a PROOF report without turning debrief into a grading exercise?

Bounce a field, not the whole report, and ask a specific question instead of delivering a verdict: "Which of the two Results bugs reproduces on the stress file?" rather than "Results is weak." The 60-second test already names what failed, so the bounce-back reads as one targeted question, not a rewrite request.

Does the 60-second test change when the PROOF report goes to a non-technical stakeholder?

The five checks stay the same; only the vocabulary in a passing answer changes. A non-technical stakeholder still needs to picture what ran, state each finding's severity, and name a scoped next step, without ticket IDs or endpoint names. If the report needs a simplified version for that audience, write one instead of diluting the original.

Run this checklist against the last PROOF report sent, before the next one ships. For a weak charter or session, not just a weak report, revisit the full SBTM structure. LLM-assisted drafting will make PROOF fields shape-compliant faster, not action-compliant. The debriefer's 60-second read stays the actual gate.