hiring-rubric-author
Build-an-X workflow that produces a per-role QA hiring rubric - takes a role description (manual QA / SDET / automation engineer / test lead / quality manager) plus the question bank from `interview-question-author` and emits a competency-anchored scoring rubric with 4-level behavioral anchors (no-hire / borderline / hire / strong-hire) per competency. Distinct from `interview-question-author` (sibling skill that produces the questions) and from `calibration-guide-author` (sibling that produces the gold-standard answer guide). Use after the question bank exists and before the first interview is scheduled - the rubric is what brings interviewer scoring into agreement.
Overview
Without a rubric, two interviewers asking the same question produce different scores; the literature on structured interviewing is clear that the questions alone are not sufficient - the scoring rubric is what converts them into a comparable signal. This skill produces the rubric half of the structured-interview pair.
Anchored rubrics outperform free-form scoring because the anchor descriptions at each level (no-hire / borderline / hire / strong-hire) constrain what each score means. An interviewer who reads "level 3: candidate explains the AAA pattern with a worked example and identifies one of: assertion strength, mocking pitfalls, or fixture coupling" cannot drift the score on tone or rapport - the anchor is concrete.
When to use
Do not use this skill to:
Step 1 - Capture the inputs
Required:
| Input | Notes |
|---|---|
| Role + seniority | Same as the upstream question bank - manual QA / SDET / automation / test lead / quality manager × junior / mid / senior / staff+ |
| Question bank | The output of interview-question-author. Each question's competency tag drives the rubric's competency-by-question matrix. |
| Team's competency model | Optional. If absent, the skill falls back to the ISTQB-aligned default model in Step 2. |
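The competency-by-question matrix mentioned in the table can be sketched as a simple grouping of questions by their competency tag. This is a minimal illustration only: the record shape (`id`, `competency`) and the function name `competency_matrix` are assumptions, since the actual output format of `interview-question-author` is not specified here.

```python
from collections import defaultdict

# Hypothetical question-bank shape; the real format emitted by
# interview-question-author may differ.
question_bank = [
    {"id": "Q1", "competency": "Test analysis & design"},
    {"id": "Q2", "competency": "Defect lifecycle"},
    {"id": "Q3", "competency": "Test analysis & design"},
]


def competency_matrix(questions):
    """Group question ids by competency tag, preserving bank order."""
    matrix = defaultdict(list)
    for question in questions:
        matrix[question["competency"]].append(question["id"])
    return dict(matrix)


matrix = competency_matrix(question_bank)
for competency, question_ids in matrix.items():
    print(f"{competency}: {', '.join(question_ids)}")
```

Each key in the resulting mapping becomes a rubric dimension, and each listed question needs its own set of anchors in Step 3.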
Step 2 - Pick the competency dimensions
A QA hiring rubric scores against 5 - 8 competency dimensions. The default set for each role, drawn from ISTQB Foundation Level v4.0 competencies and adapted to interviewable behaviour:
manual-qa-engineer / qa-automation-engineer
sdet
test-lead
quality-manager
The skill emits the dimensions selected for the role; the team can add or remove dimensions before locking the rubric.
Step 3 - Author the 4-level anchors per dimension
For each (competency × question) cell, the rubric needs four behavioural anchors. The anchor describes what the candidate said or did, not what the interviewer felt - this is the load-bearing principle that reduces interviewer noise.
### Test analysis & design — Q3 (Behavioral, STAR: late-defect catch)
| Score | Anchor (what the candidate said / did) |
|---|---|
| **1 — no hire** | Cannot articulate a partition / boundary / decision-table technique. Describes the catch as "I just got lucky." Or attributes the catch to a tool ("the linter caught it"). |
| **2 — borderline** | Names one ISTQB technique correctly but cannot apply it to the catch they describe. STAR is partial: missing Result or missing the candidate's specific Action (says "we" throughout). |
| **3 — hire** | Identifies the specific technique that caught the defect (e.g., "we had no negative test for the empty-cart case — equivalence partitioning would have flagged it"). STAR complete: situation, task, the candidate's specific action, measurable result + retro learning. |
| **4 — strong hire** | Generalises beyond the specific defect: identifies a systemic gap (e.g., "we had no convention requiring a negative test per public method; I added that to our `test-code-conventions` doc"), and ties the change to a measurable downstream improvement. |
**Probe-trigger:** If the candidate scores 2 on STAR completeness, probe for the missing component; do not deduct further on the second pass.
**Time-budget impact:** A score of 4 typically takes 2 extra minutes; budget accordingly.

Each anchor is concrete enough that two interviewers reading the same transcript would arrive at the same score - that is the only test of the anchor's quality.
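One way to keep every (competency × question) cell honest is to hold the anchors in a small data structure and check that all four levels are filled in. The sketch below assumes a hypothetical `RubricCell` class and abbreviates the anchor text from the example above; it is not the skill's actual internal representation.

```python
from dataclasses import dataclass, field

# The four levels used throughout this rubric.
LEVELS = {1: "no hire", 2: "borderline", 3: "hire", 4: "strong hire"}


@dataclass
class RubricCell:
    """One (competency x question) cell; hypothetical structure for illustration."""
    competency: str
    question_id: str
    anchors: dict = field(default_factory=dict)  # score (1-4) -> anchor text

    def missing_levels(self):
        """Return the scores that have no non-empty anchor text."""
        return [score for score in LEVELS if not self.anchors.get(score, "").strip()]


cell = RubricCell(
    competency="Test analysis & design",
    question_id="Q3",
    anchors={
        1: "Cannot articulate a partition / boundary / decision-table technique.",
        2: "Names one ISTQB technique correctly but cannot apply it to the catch.",
        3: "Identifies the specific technique that caught the defect; STAR complete.",
        # Level 4 deliberately left out to show the check.
    },
)
print(cell.missing_levels())  # [4]
```

A cell with any missing level is not ready to lock; the check says nothing about whether the anchor text itself is behavioural, which still needs a human read.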
Step 4 - Compute the role-level summary score
The rubric outputs a per-dimension score and a summary recommendation. The summary is not a simple average:
| Per-dimension scoring rule | Summary recommendation |
|---|---|
| All dimensions ≥ 3, ≥ 1 dimension at 4 | Strong hire |
| All dimensions ≥ 3, none at 4 | Hire |
| 1 dimension at 2, all others ≥ 3 | Borderline - debrief required |
| ≥ 2 dimensions at 2, no 1s | No hire - competency gap |
| Any dimension at 1 | No hire - fundamental gap |
The summary refuses to average across competencies - a candidate weak in defect lifecycle and strong in tooling depth is not "average"; the role demands both. Per-dimension floors are the load-bearing constraint.
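The floor-based rule in the table can be sketched as a short function. This is an illustrative reading of the table, not the skill's own implementation; the name `summary_recommendation` and the input shape (a mapping of dimension name to a 1-4 score) are assumptions.

```python
def summary_recommendation(scores):
    """Map per-dimension scores (1-4) to a summary using floors, never an average."""
    values = list(scores.values())
    if not values or any(v not in (1, 2, 3, 4) for v in values):
        raise ValueError("every dimension needs a score from 1 to 4")

    if min(values) == 1:
        return "No hire - fundamental gap"
    twos = values.count(2)
    if twos >= 2:
        return "No hire - competency gap"
    if twos == 1:
        return "Borderline - debrief required"
    # From here on every dimension is at 3 or 4.
    if 4 in values:
        return "Strong hire"
    return "Hire"


candidate = {
    "Test analysis & design": 4,
    "Defect lifecycle": 2,
    "Tooling depth": 4,
}
print(summary_recommendation(candidate))  # Borderline - debrief required
```

The example candidate averages above 3, yet the single 2 in defect lifecycle forces a debrief, which is the behaviour the per-dimension floors are meant to guarantee.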
Step 5 - Emit the rubric
The output is a single markdown document with:
## HAND-OFF — required next steps
1. Pair with `calibration-guide-author` to produce gold-standard model answers and common pitfalls per question — without those, the anchors here are aspirational.
2. Run a calibration interview (one panel scores the same recorded interview together) before the first real candidate. Per the structured-interview research, calibration is the dominant variable in inter-rater agreement.
3. Lock the rubric at the start of the hiring round; mid-round changes invalidate prior candidates' scores.
4. After the round, run a `defect-trend-narrator`-style retro on the rubric: which competencies discriminated; which were noise; which scored everyone at 3 (a sign the anchor is too generous).

Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Free-text "1 - 5 score" with no anchors | The score is the interviewer's opinion, not a behavioural observation. | Step 3 anchors are mandatory; no anchorless dimensions. |
| Anchors that describe the interviewer's feeling ("I was impressed", "the candidate seemed confident") | They capture tone, not behaviour; interviewer noise is the dominant source of score disagreement. | Anchors describe what the candidate said or did verbatim. |
| Averaging dimension scores into a summary | Hides the load-bearing competency gaps. | Step 4's per-dimension floor; no averages. |
| Using the same rubric across seniority levels | Behaviour that earns a 3 for a mid-level role falls short of a 3 for a senior role; the same number means different things at each level. | Per-seniority anchors; junior-3 ≠ senior-3. |
| Rubrics with 10+ dimensions | Interviewer can't hold them all; scoring fragments. | Cap at 5 - 8 dimensions. |
| Rubric authored without the question bank | Anchors drift from the actual questions; scoring becomes generic. | Step 1 hard-requires the question bank as input. |
| "Cultural fit" as a dimension | Documented bias amplifier; legally fraught. | Use the team's Definition of Done / engineering values translated into behavioural anchors instead. |
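A few of the anti-patterns in the table are mechanically checkable before the rubric is locked. The sketch below is a hypothetical lint pass: the function name `lint_rubric`, the rubric shape (dimension name mapped to score-to-anchor text), and the short phrase list are all assumptions for illustration, and a phrase match is only a prompt for human review.

```python
# Illustrative phrases taken from the anti-pattern table; not an exhaustive list.
FEELING_PHRASES = ("i was impressed", "seemed confident")


def lint_rubric(rubric):
    """Return warnings for a rubric given as {dimension: {score: anchor text}}."""
    warnings = []
    if not 5 <= len(rubric) <= 8:
        warnings.append(f"{len(rubric)} dimensions; expected 5 to 8")
    for dimension, anchors in rubric.items():
        if "cultural fit" in dimension.lower():
            warnings.append(f"'{dimension}' is a cultural-fit dimension")
        for score, text in anchors.items():
            if any(phrase in text.lower() for phrase in FEELING_PHRASES):
                warnings.append(
                    f"'{dimension}' score {score} describes the interviewer's feeling"
                )
    return warnings


example = {
    "Test analysis & design": {3: "Identifies the specific technique that caught the defect."},
    "Cultural fit": {3: "I was impressed by the candidate."},
}
for warning in lint_rubric(example):
    print(warning)
```

The remaining anti-patterns (averaging, shared rubrics across seniority, authoring without the question bank) are process decisions and are not something a text check like this can detect.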