load-test-plan-designer
Designs a load-test plan from a service's SLOs and endpoint inventory - maps each SLO to load scenarios, defines ramp / soak / spike profiles, sets pass/fail threshold expressions, and outputs a tool-agnostic plan ready to implement in k6 or Gatling. Use when planning a performance test before writing the script; not when choosing the load tool (see load-test-tool-selector in qa-load-testing) or bisecting a perf regression (see perf-regression-bisector).
Tools
Read, Grep, Glob

Turns a service's SLO targets and endpoint inventory into a structured, tool-agnostic load-test plan - scenarios, profiles, and pass/fail thresholds ready to hand to a k6 or Gatling implementer.
When invoked
The agent expects three inputs:
| Input | Description | Example |
|---|---|---|
| SLO targets / doc | Latency and availability SLOs per endpoint or service tier | "p95 < 300 ms, 99.9% availability" |
| Endpoint inventory | List of endpoints + expected request weight / relative traffic mix | GET /api/orders 60%, POST /api/checkout 20% |
| Expected traffic shape | Peak RPS, daily growth pattern, known spike events | "300 RPS peak, 3x spike on sale days" |
Without all three, the agent requests the missing information before producing the plan.
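A minimal sketch of that completeness check, assuming the three inputs arrive as fields on a single request object. The field names (`sloTargets`, `endpointInventory`, `trafficShape`) are hypothetical and only illustrate the idea; the agent's real input handling is not specified here.

```javascript
// Hypothetical field names; the agent does not prescribe a fixed input schema.
const REQUIRED_INPUTS = ["sloTargets", "endpointInventory", "trafficShape"];

// Returns the names of required inputs that are absent or empty.
function missingInputs(request) {
  return REQUIRED_INPUTS.filter((key) => {
    const value = request[key];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value) || typeof value === "string") return value.length === 0;
    return false;
  });
}

const request = {
  sloTargets: ["p95 < 300 ms", "99.9% availability"],
  endpointInventory: [
    { endpoint: "GET /api/orders", weight: 0.6 },
    { endpoint: "POST /api/checkout", weight: 0.2 },
  ],
  // trafficShape is intentionally omitted to show the check firing.
};

console.log(missingInputs(request)); // lists "trafficShape" as missing
```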
Step 1 - Derive scenarios from SLOs
Each SLO maps to one or more test scenarios. For every SLO in the input:
Example mapping table:
| SLO | Governs | Scenario |
|---|---|---|
| p95 latency < 300 ms at peak load | POST /api/checkout | checkout-latency-peak |
| p95 latency < 100 ms at average load | GET /api/orders | orders-latency-average |
| Error rate < 0.1% under 3x spike | All endpoints | spike-error-budget |
| Availability 99.9% over 4-hour soak | All endpoints | soak-availability |
One SLO can produce multiple scenarios if its boundary conditions differ (e.g., the same latency target checked at average load and at peak load yields two scenarios, each with its own load profile).
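The one-scenario-per-boundary-condition rule can be sketched as a small expansion step. The record shape below (`id`, `governs`, `conditions`) is an assumption made for illustration, not a format the agent requires.

```javascript
// Hypothetical SLO records; each lists the load conditions it must hold under.
const slos = [
  { id: "checkout-latency", governs: "POST /api/checkout", conditions: ["peak"] },
  { id: "orders-latency", governs: "GET /api/orders", conditions: ["average", "peak"] },
];

// Emits one scenario per (SLO, boundary condition) pair.
function deriveScenarios(sloList) {
  return sloList.flatMap((slo) =>
    slo.conditions.map((condition) => ({
      name: `${slo.id}-${condition}`,
      governs: slo.governs,
      condition,
    }))
  );
}

const scenarios = deriveScenarios(slos);
console.log(scenarios.map((s) => s.name));
// checkout-latency-peak, orders-latency-average, orders-latency-peak
```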
Step 2 - Choose load profiles
Per the k6 test-types guide, the six canonical profiles are smoke, average-load, stress, soak, spike, and breakpoint. Assign each scenario to the most appropriate profile:
| Profile | Load level | Duration | When to use | k6 / Gatling shape |
|---|---|---|---|---|
| Smoke | 1-2 VUs | < 2 min | Script validation before real tests [k6-types: "validate that your script works and that the system performs adequately under minimal load"] | Flat single VU |
| Average-load | Normal peak VUs | 5-60 min | Baseline SLO verification at expected traffic [k6-types] | Ramp-up, plateau, ramp-down |
| Stress | 120-200% of peak VUs | 5-60 min | SLO headroom - does the service hold above normal? [k6-types] | Ramp-up higher than average, plateau |
| Soak | Normal peak VUs | Hours | Long-duration reliability SLOs; memory leaks, connection pool exhaustion [k6-types: "assess the reliability and performance of your system over extended periods"] | Slow ramp, long plateau |
| Spike | 5-10x peak VUs | 2-5 min | Flash-sale / viral event scenarios [k6-types] | Near-instant surge, short plateau, drop |
| Breakpoint | Incrementally increasing | Until failure | Capacity ceiling discovery; not a pass/fail scenario [k6-types: "gradually increase load to identify the capacity limits of the system"] | Continuous ramp, no plateau |
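As a rough illustration of the "ramp-up, plateau, ramp-down" shape, the sketch below builds an array in the form k6 accepts for its `stages` option (`duration` plus `target` VUs). The helper name `rampPlateauStages` and all numbers are illustrative assumptions, not recommended values.

```javascript
// Builds a k6-style stages array: ramp up to the peak, hold, then ramp down to zero.
// Durations use k6's string format (for example "30s", "5m").
function rampPlateauStages(peakVUs, rampUp, plateau, rampDown) {
  return [
    { duration: rampUp, target: peakVUs },   // ramp-up
    { duration: plateau, target: peakVUs },  // plateau
    { duration: rampDown, target: 0 },       // ramp-down
  ];
}

// Illustrative numbers only: 100 VUs stands in for "normal peak".
const averageLoad = rampPlateauStages(100, "5m", "30m", "5m");
// Spike: near-instant surge to 7x peak, short plateau, fast drop.
const spike = rampPlateauStages(700, "30s", "2m", "30s");

console.log(averageLoad);
console.log(spike);
```

In a k6 script an implementer would place such an array under `options.stages`; the plan itself only needs the ramp, plateau, and ramp-down values.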
Open vs closed injection model (Gatling)
Gatling distinguishes two injection models [gatling-injection]:
Key Gatling injection profiles [gatling-injection]:
| Profile | Model | Use for |
|---|---|---|
| rampUsers(n).during(d) | Open | Gradual ramp; average-load plateau approach |
| constantUsersPerSec(r).during(d) | Open | Steady arrival rate; average-load plateau |
| atOnceUsers(n) | Open | Instant spike; spike profile onset |
| stressPeakUsers(n).during(d) | Open | Spike with recovery observation |
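An open model fixes the arrival rate, while a closed model caps the number of concurrent users. When a plan states a target in one form and the implementer needs the other, Little's law (average concurrency = arrival rate x average time in system) gives a rough conversion. The sketch below is an approximation that ignores think time; the function name and figures are illustrative assumptions.

```javascript
// Little's law: concurrency = arrival rate (per second) * time in system (seconds).
// Response time is taken in milliseconds to keep the arithmetic in whole numbers.
function estimateConcurrency(arrivalRatePerSec, avgResponseMs) {
  return (arrivalRatePerSec * avgResponseMs) / 1000;
}

// 300 requests/s with a 200 ms average response time.
console.log(estimateConcurrency(300, 200)); // 60
```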
Step 3 - Threshold expressions
Every scenario requires explicit pass/fail thresholds. Per the k6 thresholds docs, threshold expressions follow the form <aggregation_method> <operator> <value>; the pass/fail result is reported at the end of the run, and k6 exits with a non-zero code if any threshold failed.
Derive each threshold directly from the SLO in Step 1:
| SLO | k6 threshold expression | Metric |
|---|---|---|
| p95 latency < 300 ms | 'p(95)<300' on http_req_duration | Trend |
| p99 latency < 500 ms | 'p(99)<500' on http_req_duration | Trend |
| Error rate < 0.1% | 'rate<0.001' on http_req_failed | Rate |
| Average latency < 150 ms | 'avg<150' on http_req_duration | Trend |
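The derivation in the table can be sketched as two small formatters that turn SLO numbers into k6 threshold strings. The helper names are hypothetical; the metric names and expression syntax are the ones shown in the table.

```javascript
// Formats a percentile latency SLO as a k6 threshold expression, e.g. p(95)<300.
function latencyThreshold(percentile, maxMs) {
  return `p(${percentile})<${maxMs}`;
}

// Formats an error-rate SLO, given as a fraction (0.001 means 0.1%).
function errorRateThreshold(maxFraction) {
  return `rate<${maxFraction}`;
}

const thresholds = {
  http_req_duration: [latencyThreshold(95, 300), latencyThreshold(99, 500)],
  http_req_failed: [errorRateThreshold(0.001)],
};

console.log(thresholds);
```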
Long-form syntax [k6-thresholds] allows abortOnFail: true to stop the run early when the budget is already exhausted:
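```javascript
export const options = {
  thresholds: {
    http_req_duration: [{
      threshold: 'p(95)<300',
      abortOnFail: true,      // stop the run as soon as this threshold fails
      delayAbortEval: '10s',  // wait 10s before evaluating, so early samples cannot abort the run
    }],
    http_req_failed: ['rate<0.001'],
  },
};
```

Thresholds can be scoped to a tagged subset of requests (e.g., checkout endpoints only) using k6 metric tags, keeping plan sections independent [k6-thresholds].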
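A sketch of tag-scoped thresholds, using k6's `metric{tag:value}` key form. It assumes the implementer tags each request with a custom `endpoint` tag; that tag name and its values are hypothetical.

```javascript
// Assumption: requests carry a custom `endpoint` tag set by the script author.
// Each key scopes the threshold to samples whose tag matches.
const scopedThresholds = {
  "http_req_duration{endpoint:checkout}": ["p(95)<300"],
  "http_req_duration{endpoint:orders}": ["p(95)<100"],
  http_req_failed: ["rate<0.001"], // unscoped: applies to all requests
};

console.log(Object.keys(scopedThresholds));
```

Scoping this way lets a checkout breach fail its own threshold without being averaged away by faster endpoints.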
Step 4 - Error-budget framing
Per the Google SRE Workbook - Implementing SLOs: "the error budget is 100% minus the SLO." For a 99.9% availability SLO the error budget is 0.1%, meaning a service receiving 3 million requests per four-week window has a budget of 3,000 failures [sre-workbook].
Use this framing to calibrate the soak scenario:
Error-budget framing prevents thresholds from being arbitrary round numbers.
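The budget arithmetic can be sketched as below. The first call reproduces the 3 million request example; the second assumes, purely for illustration, a 4-hour soak at a steady 300 requests per second.

```javascript
// Allowed failures = total requests * error budget fraction (0.001 for a 99.9% SLO).
// Math.round guards against floating-point noise in the product.
function allowedFailures(totalRequests, errorBudgetFraction) {
  return Math.round(totalRequests * errorBudgetFraction);
}

// Four-week window from the example above.
console.log(allowedFailures(3000000, 0.001)); // 3000

// Assumed soak: 300 requests/s held for 4 hours.
const soakRequests = 300 * 4 * 60 * 60; // 4,320,000 requests
console.log(allowedFailures(soakRequests, 0.001)); // 4320
```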
Output format
The agent emits one Markdown document:
## Load-test plan - <service> - <date>
### SLO inventory
(table: SLO | metric | current baseline | source)
### Scenario matrix
(table: scenario name | SLO governed | profile | target VUs | duration | tool hint)
### Profile definitions
For each scenario:
- **Ramp:** start VUs → peak VUs over T seconds
- **Plateau:** hold peak VUs for T seconds/minutes/hours
- **Ramp-down:** peak → 0 over T seconds
- **Injection model:** open | closed (for Gatling implementers)
### Threshold expressions
(table: scenario | metric | expression | abort-on-fail)
### Error-budget constraints
(table: SLO | window | total events | budget count | soak threshold derivation)
### Hand-off notes
- Implement in k6: preload `../../qa-load-testing/skills/k6-load-testing/SKILL.md`
- Implement in Gatling: preload `../../qa-load-testing/skills/gatling-load-testing/SKILL.md`
- Tool not yet chosen: hand to `../../qa-load-testing/agents/load-test-tool-selector.md`

The plan is intentionally tool-agnostic: it names metrics and thresholds in prose and table form, not in k6 JS or Gatling Scala syntax. The implementer translates it using the appropriate skill.
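For readers who want to generate the scenario matrix programmatically, the sketch below renders rows into the Markdown table layout listed above. The row field names and the sample values are assumptions for illustration; the agent itself emits the table directly.

```javascript
// Column order follows the scenario matrix definition in the output format.
const COLUMNS = ["scenario name", "SLO governed", "profile", "target VUs", "duration", "tool hint"];

// Renders an array of scenario rows (hypothetical field names) as a Markdown table.
function renderScenarioMatrix(rows) {
  const lines = [
    `| ${COLUMNS.join(" | ")} |`,
    `|${COLUMNS.map(() => "---").join("|")}|`,
    ...rows.map(
      (r) => `| ${[r.name, r.slo, r.profile, r.targetVUs, r.duration, r.toolHint].join(" | ")} |`
    ),
  ];
  return lines.join("\n");
}

// Sample row with illustrative values.
const matrix = renderScenarioMatrix([
  {
    name: "checkout-latency-peak",
    slo: "p95 < 300 ms",
    profile: "average-load",
    targetVUs: 100,
    duration: "30 min",
    toolHint: "k6 or Gatling",
  },
]);

console.log(matrix);
```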
Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Picking the load tool inside this plan | Tool selection requires stack/CI/budget context the plan does not have | Hand off to load-test-tool-selector |
| Setting thresholds as arbitrary round numbers (p95 < 500 ms "because it sounds fast") | No connection to SLO; will pass while the SLO is breached or fail while the SLO is met | Derive every threshold from a stated SLO (Step 1 + Step 3) |
| Designing one mega-scenario that covers all endpoints | Masks which endpoint caused a breach | One scenario per SLO boundary condition |
| Using only a spike or stress profile | Missing slow degradation; soak scenarios catch memory leaks and connection pool exhaustion that short runs miss [k6-types] | Include a soak scenario for any service with an availability SLO |
| Running the breakpoint profile against production | Breakpoint is a capacity ceiling finder; it will push the service to failure [k6-types: "gradually increase load to identify the capacity limits of the system"] | Breakpoint runs only in isolated staging environments |
| Conflating plan design with regression bisection | If a regression already exists in load-test data, planning a new test does not find the commit | Hand to perf-regression-bisector |
Limitations
Hand-off targets
After producing the plan: