Testland
Browse all skills & agents

nfr-extractor

Reads a PRD, design doc, or product brief and pulls out the non-functional requirements (performance, accessibility, security, internationalization, reliability, observability) as concrete, threshold-bound, testable assertions. Maps every NFR to its measurement source (Lighthouse, axe, OWASP ASVS, WCAG criterion, etc.) so the test suite knows what to assert against. Use after acceptance-criteria-extractor handles functional requirements.

nfr-extractor

Overview

ISTQB defines non-functional testing as "A test type to evaluate that a component or system complies with non-functional requirements" (istqb-nft). The eight ISO/IEC 25010:2011 product-quality characteristics define the canonical NFR categories (cite by stable standard ID; iso.org is paywalled / Cloudflare-protected):

ISO/IEC 25010 characteristicCommon shorthandThis skill covers
Functional Suitabilityfunctional(out - handled by acceptance-criteria-extractor)
Performance Efficiencyperformance / perfyes
Compatibilitycompatibilityyes
Usabilitya11y / UXyes (a11y subset)
Reliabilityreliabilityyes
Securitysecurityyes
Maintainability(developer-facing)(out)
Portability(developer-facing)(out)

Plus two practitioner-emergent NFR families that are increasingly treated as first-class:

FamilySource
Internationalization (i18n)W3C i18n best practices
ObservabilityDORA / SRE practitioner literature

Acceptance-criteria-extractor handles functional requirements (Given/When/Then). This skill handles everything else - the threshold-and-measurement requirements that don't fit Gherkin's shape.

When to use

  • A PRD mentions "performance," "fast," "accessible," "secure," or similar - words the testability-reviewer would flag for failing the Observable heuristic.
  • A team is shifting from informal to formal NFRs in tickets.
  • A non-functional regression is being added to a release plan and needs a measurable baseline.

Step 1 - Scan the doc for each NFR family

Per family, look for the canonical signal phrases:

FamilySignal phrases
Performance"fast", "responsive", "within Xms", "p95", "throughput", "latency", "time to"
Accessibility"WCAG", "screen reader", "keyboard navigation", "color contrast", "ARIA"
Security"auth", "encrypt", "OWASP", "CSP", "rate limit", "secret"
Compatibility"browser", "iOS", "Android", "version", "device"
Reliability"uptime", "SLO", "SLA", "graceful degradation", "circuit breaker", "retry"
Internationalization"locale", "RTL", "translation", "Unicode", "timezone"
Observability"metric", "log", "trace", "alert", "dashboard"

Each signal becomes a candidate NFR - but NOT every signal is testable as authored. Run each through Step 2.

Step 2 - Make every NFR threshold-bound

Per ISTQB testability (istqb-testability) and the testability-reviewer heuristics, an NFR is testable only when it has:

  1. A threshold (numeric or boolean).
  2. A measurement source (which tool / metric provides the value).
  3. A scope (which surface / user / scenario it applies to).
Untestable NFRThreshold-bound rewrite
"The site should be fast""p95 page-load time on /dashboard is ≤2.5s, measured by Lighthouse CI under 4G throttling."
"Accessible to screen reader users""WCAG 2.2 AA conformance on /checkout; axe-core scan returns zero serious or critical violations."
"Secure against attacks""OWASP ASVS Level 2 controls 1-7; verified by npm audit --audit-level=high exit zero AND ZAP baseline scan zero alerts."
"Works on all browsers""Functional E2E tests pass on Chrome ≥120, Firefox ≥120, Safari ≥17 (latest two majors per WebRumStats data)."
"Highly available""Uptime ≥99.9% measured monthly via external prober; recovery from any single-AZ failure ≤5 min."

The agent never fabricates a threshold. If the doc says "fast" without a number, the agent emits an explicit question ("What's the target p95? Common defaults: 2.5s for landing, 1.5s for cached, 200ms for API."), it does not invent 2.5s.

Step 3 - Map to canonical measurement sources

Each NFR must name how it's measured. Canonical sources by family:

Performance

MetricSource
LCP (Largest Contentful Paint)Web Vitals (Google) - page-load perception
INP (Interaction to Next Paint)Web Vitals - replaces FID since March 2024
CLS (Cumulative Layout Shift)Web Vitals
p95 / p99 latencyk6 / JMeter / Gatling / load-test runner
ThroughputSame load-test runners

Web Vitals canonical thresholds: LCP ≤2.5s, INP ≤200ms, CLS ≤0.1.

Accessibility

StandardSource
WCAG 2.2 AA / AAAhttps://www.w3.org/TR/WCAG22/
ARIA practiceshttps://www.w3.org/WAI/ARIA/apg/
Toolsaxe-core, pa11y, Lighthouse a11y, IBM Equal Access, WAVE

Per-criterion citations use the WCAG SC ID format (WCAG 2.2 SC 1.4.3 Contrast (Minimum)) so the test report back-references the standard.

Security

StandardSource
OWASP Top 10https://owasp.org/www-project-top-ten/
OWASP ASVShttps://owasp.org/www-project-application-security-verification-standard/
CWEhttps://cwe.mitre.org/
ToolsZAP / Burp / Snyk / Trivy / Semgrep

Compatibility

Source
MDN browser-compat-data (https://github.com/mdn/browser-compat-data)
Can I use (https://caniuse.com/)
Per-team analytics (real device share)

Reliability

ConceptSource
SLO / SLA / SLIGoogle SRE Workbook (free at sre.google/workbook)
Error budgetsSame
Chaos engineering principleshttps://principlesofchaos.org/

Internationalization

Source
W3C i18n Web FAQ (https://www.w3.org/International/)
Unicode CLDR (https://cldr.unicode.org/)
ICU MessageFormat

Output format

# NFRs extracted from `<doc-id>`

**Source doc:** `<path or URL>`
**Date extracted:** YYYY-MM-DD
**NFRs found:** N
**Threshold gaps flagged:** M

## Performance

| ID    | Requirement                                                | Threshold | Measurement | Scope                |
|-------|------------------------------------------------------------|-----------|-------------|----------------------|
| PRF-1 | LCP on /dashboard                                          | ≤2.5s     | Lighthouse CI; Web Vitals 75th-percentile field data | logged-in user, 4G, 1280x800 |
| PRF-2 | p95 latency on POST /api/orders                            | ≤200ms    | k6 load test, 50 RPS for 5 min | production-like staging |

## Accessibility

| ID    | Requirement                                                | Threshold        | Measurement | Scope            |
|-------|------------------------------------------------------------|------------------|-------------|------------------|
| A11Y-1| WCAG 2.2 AA conformance                                    | zero serious/critical violations | axe-core scan | every page in app sitemap |

## Security

| ID    | Requirement                                                | Threshold        | Measurement | Scope                  |
|-------|------------------------------------------------------------|------------------|-------------|------------------------|
| SEC-1 | No high-severity dependency vulnerabilities                | zero high/critical | npm audit --audit-level=high | every PR; production deps |
| SEC-2 | OWASP ASVS Level 2 baseline                                | pass              | ZAP baseline scan + manual ASVS checklist | release readiness gate |

## Compatibility / i18n / Reliability / Observability

(similar tables, only included when the source doc raised them)

## Threshold gaps (HUMAN INPUT REQUIRED)

| Vague phrase                       | Where in doc          | Suggested concretization                                       |
|------------------------------------|-----------------------|----------------------------------------------------------------|
| "fast checkout"                    | line 12               | LCP ≤1.5s (typical for cached transactional flow)?            |
| "supports global users"            | line 28               | Locale list? RTL support? Timezone handling? Unicode in fields? |

The agent does NOT pick the answers — author must confirm.

Examples

Example 1: PRD with mixed NFRs

Input:

"Checkout must feel snappy and work for users worldwide. We need to meet enterprise security requirements."

Output (excerpt):

**NFRs found:** 0 (all phrases are vague)
**Threshold gaps flagged:** 3

| Vague phrase                              | Suggested concretization                                                                  |
|-------------------------------------------|-------------------------------------------------------------------------------------------|
| "feel snappy"                              | LCP ≤2.5s, INP ≤200ms (Web Vitals defaults)?                                            |
| "work for users worldwide"                 | Specify: which locales? RTL? timezones? Unicode validation? CLDR-based number/date formatting? |
| "enterprise security requirements"         | OWASP ASVS Level 2 (typical baseline) or Level 3 (strict)? SOC 2 Type II audit? FedRAMP? |

The output is a question list, not a fabricated set of thresholds. Author returns; agent re-extracts on next pass.

Example 2: PRD with concrete thresholds

Input:

"Page load on /dashboard must be ≤1.5s p95 measured by Lighthouse CI on the GitHub Actions ubuntu-latest runner. Color contrast ratio ≥4.5:1 for normal text per WCAG 2.2 SC 1.4.3."

Output:

## Performance

| ID    | Requirement                                                | Threshold | Measurement | Scope          |
|-------|------------------------------------------------------------|-----------|-------------|----------------|
| PRF-1 | Page load on /dashboard                                    | ≤1.5s p95 | Lighthouse CI on GitHub Actions ubuntu-latest | /dashboard route |

## Accessibility

| ID    | Requirement                                                | Threshold | Measurement | Scope          |
|-------|------------------------------------------------------------|-----------|-------------|----------------|
| A11Y-1| Color contrast ratio for normal text                       | ≥4.5:1    | WCAG 2.2 SC 1.4.3 | every page |

**Threshold gaps:** 0 — all NFRs are threshold-bound and measurement-cited.

Anti-patterns

Anti-patternWhy it failsFix
Inventing thresholds the doc didn't specifyTests pass for fabricated baselines; real users see worse perf.Flag as threshold gap; require author input.
One generic "performance" NFR for every pageDifferent surfaces have different budgets (cached vs non-cached).One NFR per measurable scope (route × viewport × auth state).
Mapping every security NFR to "OWASP Top 10"Top 10 is awareness-tier; ASVS is verifiable. Compliance maps to ASVS.Specify ASVS Level (1/2/3) and which controls.
Treating "uptime SLA" as a test requirementSLA is a runtime / monitoring concern, not a pre-deploy test.Map to monitoring metric + alert rule, not to a CI gate.

References

  • istqb-nft - ISTQB Glossary: non-functional testing.
  • istqb-testability - testability heuristic underlying the threshold-bound rule.
  • ISO/IEC 25010:2011 - eight product-quality characteristics. Cite by stable standard ID; spec is paywalled.
  • W3C WCAG 2.2 - https://www.w3.org/TR/WCAG22/
  • OWASP ASVS - https://owasp.org/www-project-application-security-verification-standard/
  • Web Vitals - https://web.dev/articles/vitals (LCP / INP / CLS).
  • testability-reviewer - the upstream agent that flags vague phrases for this skill to formalize.
  • acceptance-criteria-extractor - sibling skill for functional requirements.