hypothesis-testing
Authors property-based tests in Python using Hypothesis - wires `@given` with `strategies` (`st.integers`, `st.text`, `st.lists`, `st.from_regex`, `st.composite`), uses `assume()` / `.filter()` for preconditions, configures via `@settings(max_examples=..., deadline=...)`, and exploits Hypothesis's automatic shrinking to find the falsifying example. Integrates with pytest fixtures + parametrize. Use when a Python project needs PBT to catch edge cases the example-based tests miss - bug clusters around input ranges / boundary values / interaction between fields.
Overview
Hypothesis is the canonical Python property-based testing library (hyp-quickstart). Per ISTQB, property-based testing is "a test approach in which test results are verified using specified relations between inputs and expected results of a test case."
The shape: instead of asserting add(2, 3) == 5, assert add(a, b) == add(b, a) for all a, b. Hypothesis generates the inputs and shrinks failures to the simplest reproducer.
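A minimal sketch of that shape, assuming a hypothetical `add` function stands in for the code under test:

```python
from hypothesis import given, strategies as st


def add(a, b):
    """Hypothetical function under test, used only for illustration."""
    return a + b


@given(st.integers(), st.integers())
def test_add_is_commutative(a, b):
    # The property must hold for every generated pair, not one hand-picked case.
    assert add(a, b) == add(b, a)
```

Calling `test_add_is_commutative()` directly, or collecting it with pytest, runs the body against many generated pairs.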
"The framework automatically shrinks failing inputs to find the simplest reproduction case." (hyp-quickstart)
When to use
If only one or two specific examples need verification, parametrize or fixtures suffice - Hypothesis is overkill.
Step 1 - Install
pip install hypothesis
Pin a version in requirements-dev.txt / pyproject.toml. Hypothesis is well maintained, but it multiplies test runtime: each @given test runs its body up to 100 times by default (max_examples), so budget for that overhead.
Step 2 - Basic property test
Per hyp-quickstart:
from hypothesis import given, strategies as st
@given(st.integers(0, 100))
def test_example(n):
    assert n < 50  # deliberately false for n >= 50, to demonstrate shrinking
The decorator runs the test up to 100 times (the default max_examples) with n drawn from [0, 100], and stops generating at the first failure. On failure, Hypothesis reports the falsifying example (the smallest n that violates the assertion) - typically 50 here.
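Shrinking can also be observed outside a test. The sketch below uses `hypothesis.find`, which searches a strategy for the minimal value satisfying a predicate; here the predicate is the negation of the assertion above, so the result is the smallest counterexample.

```python
from hypothesis import find, strategies as st

# find() returns the minimal example for which the predicate is true,
# using the same shrinking logic that reports falsifying examples.
smallest_failure = find(st.integers(0, 100), lambda n: not (n < 50))
print(smallest_failure)
```

This is meant for exploration; in a test suite the `@given` form is the usual choice.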
Step 3 - Strategies catalog
Per hyp-quickstart, built-in generators:
| Strategy | Generates | Useful for |
|---|---|---|
| st.integers(min_value, max_value) | Bounded / unbounded integers | Numeric inputs. |
| st.floats(min_value, max_value, allow_nan, allow_infinity) | Floats with optional special-value handling | Numeric edge cases (NaN, ±Inf, denormals). |
| st.text(alphabet, min_size, max_size) | Strings | Text inputs. |
| st.binary(min_size, max_size) | Bytes | Binary protocol inputs. |
| st.lists(elements, min_size, max_size, unique) | Lists | Collection inputs. |
| st.dictionaries(keys, values) | Dicts | Map inputs. |
| st.tuples(*element_strategies) | Tuples | Multi-field inputs. |
| st.from_regex(regex, fullmatch=True) | Strings matching a regex | Format-validated inputs (emails, dates). |
| st.sampled_from(iterable) | One of a fixed set | Enum-like inputs. |
| st.builds(callable, **kwargs) | Construct objects from strategies | Domain objects. |
| st.composite (decorator) | Custom strategy combining draws | Dependent fields. |
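Several catalog entries are typically combined to build domain objects. The following is a sketch only: the `Order` dataclass, its fields, and the SKU format are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class Order:
    """Hypothetical domain object, not part of any real project."""
    sku: str
    quantity: int
    currency: str


# st.builds calls Order(...) with one value drawn from each keyword strategy.
orders = st.builds(
    Order,
    sku=st.from_regex(r"[A-Z]{3}-[0-9]{4}", fullmatch=True),
    quantity=st.integers(min_value=1, max_value=99),
    currency=st.sampled_from(["USD", "EUR", "GBP"]),
)


@given(orders)
def test_order_fields_respect_strategy_bounds(order):
    assert 1 <= order.quantity <= 99
    assert order.currency in {"USD", "EUR", "GBP"}
    assert len(order.sku) == 8  # three letters, a hyphen, four digits
```

Explicit character classes such as `[0-9]` are used instead of `\d` so that only ASCII digits are generated.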
Step 4 - Composite strategies (dependent fields)
Per hyp-quickstart, @st.composite lets later values depend on earlier ones via draw():
from datetime import date
from hypothesis import given, strategies as st
@st.composite
def valid_dates(draw):
    year = draw(st.integers(1900, 2100))
    month = draw(st.integers(1, 12))
    if month in (1, 3, 5, 7, 8, 10, 12):
        day = draw(st.integers(1, 31))
    elif month in (4, 6, 9, 11):
        day = draw(st.integers(1, 30))
    else:
        # Feb: account for leap years
        is_leap = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
        max_day = 29 if is_leap else 28
        day = draw(st.integers(1, max_day))
    return date(year, month, day)
@given(valid_dates())
def test_date_round_trip(d):
    assert date.fromisoformat(d.isoformat()) == d
Each draw() call requests a value from a strategy, so later draws can depend on earlier results; the composite function returns the constructed value.
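Another common dependent-field case is a collection together with a valid position in it. A minimal sketch, assuming the hypothetical strategy name `list_and_index`:

```python
from hypothesis import given, strategies as st


@st.composite
def list_and_index(draw):
    # Draw the list first; the index bounds depend on its length.
    xs = draw(st.lists(st.integers(), min_size=1))
    i = draw(st.integers(min_value=0, max_value=len(xs) - 1))
    return xs, i


@given(list_and_index())
def test_index_is_always_valid(pair):
    xs, i = pair
    # No IndexError is possible because i was drawn within range.
    assert xs[i] in xs
```

Drawing the index from a range derived from the list avoids discarding out-of-range pairs with `assume()`.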
Step 5 - Filtering and assumptions
Two ways to constrain inputs:
from hypothesis import assume, given, strategies as st
# Filter at strategy level (preferred: Hypothesis can sample efficiently)
@given(st.integers(0, 100).filter(lambda x: x % 2 == 0))
def test_even_squares(n):
    assert (n * n) % 2 == 0
# Filter at test level (fallback when the precondition spans several inputs)
@given(st.integers(), st.integers())
def test_division(a, b):
    assume(b != 0)  # discard cases where b == 0
    quotient, remainder = divmod(a, b)
    # Division identity; the original a - result * b form was true by construction
    assert quotient * b + remainder == a
    assert abs(remainder) < abs(b)
Per hyp-quickstart, use .filter() at the strategy level when possible (Hypothesis can sample efficiently). Use assume() inside the test when the precondition involves multiple inputs.
Heavy filtering is a smell - if 90% of generated cases are discarded, redesign the strategy.
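One way to redesign a heavily filtered strategy is to construct valid values directly with `.map()` rather than generating and rejecting. A sketch for the even-number case, where the name `evens` is an assumption of this example:

```python
from hypothesis import given, strategies as st

# Build even numbers by construction: every drawn k yields a valid 2 * k,
# so nothing is discarded (a filter on x % 2 == 0 rejects about half).
evens = st.integers(min_value=0, max_value=50).map(lambda k: 2 * k)


@given(evens)
def test_generated_values_are_even(n):
    assert n % 2 == 0
    assert 0 <= n <= 100
```

The same idea applies to composite strategies: derive later fields from earlier draws instead of filtering invalid combinations.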
Step 6 - Settings and reproducibility
from hypothesis import given, settings, strategies as st
@settings(max_examples=500, deadline=2000)  # 500 cases, 2 s (2000 ms) deadline per case
@given(st.integers())
def test_expensive(n):
    expensive_function(n)  # placeholder for the slow code under test
Common settings:
| Setting | Default | Use |
|---|---|---|
| max_examples | 100 | More cases for higher confidence; budget against runtime. |
| deadline | 200 ms | Time budget per generated example; None to disable. |
| derandomize | False | True = same examples each run; useful for CI determinism. |
| phases | all | Disable Phase.shrink to skip shrinking on slow tests. |
| verbosity | normal | quiet / normal / verbose / debug. |
For CI, set derandomize=True so that every run generates the same examples and failures are reproducible. With the default random seed, the same property test can pass locally and then fail on CI for an input that is hard to regenerate.
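Settings can also be grouped into named profiles so that CI and local runs differ without editing each test. The sketch below would typically live in `conftest.py`; the profile names, their values, and the `HYPOTHESIS_PROFILE` environment variable are assumptions made for this example.

```python
import os

from hypothesis import settings

# Illustrative values only; tune them to the project's runtime budget.
settings.register_profile("ci", max_examples=200, derandomize=True, deadline=None)
settings.register_profile("dev", max_examples=50)

# HYPOTHESIS_PROFILE is a hypothetical variable name chosen for this sketch.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```

Tests decorated with their own `@settings(...)` still override the loaded profile for the options they specify.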
Step 7 - Round-trip and metamorphic properties
Two of the most useful property patterns:
Round-trip
import json
from hypothesis import given, strategies as st
@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(d):
    assert json.loads(json.dumps(d)) == d
If decode(encode(x)) == x holds for all valid x, the encode/decode pair is lossless over that input domain.
Metamorphic (relating two computations on related inputs)
from hypothesis import given, strategies as st
@given(st.lists(st.integers()))
def test_sort_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)
@given(st.lists(st.integers()), st.integers())
def test_sort_commutes_with_offset(xs, offset):
    sorted_xs = sorted(xs)
    sorted_offset = sorted(x + offset for x in xs)
    assert all(s + offset == o for s, o in zip(sorted_xs, sorted_offset))
Metamorphic tests are powerful when the function's "correct output" is hard to specify but the relationship between its outputs on related inputs is easy to state.
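A third pattern, the invariant, checks facts that must hold for every output regardless of input. A minimal sketch for sorting, using the built-in `sorted` as the function under test:

```python
from collections import Counter

from hypothesis import given, strategies as st


@given(st.lists(st.integers()))
def test_sorted_invariants(xs):
    result = sorted(xs)
    # Invariant 1: sorting neither adds nor drops elements.
    assert len(result) == len(xs)
    assert Counter(result) == Counter(xs)
    # Invariant 2: each element is <= its successor.
    assert all(a <= b for a, b in zip(result, result[1:]))
```

Together these two invariants pin down sorting completely, which is not always achievable; partial invariants are still useful.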
Step 8 - pytest integration
Per hyp-quickstart: "Hypothesis works seamlessly with pytest fixtures and parametrize decorators."
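import pytest
from hypothesis import given, strategies as st
@pytest.fixture(scope="module")  # broader than function scope; see the note below
def db_conn():
    # ... setup ...
    yield conn
    # ... teardown ...
@given(st.integers(0, 1000))
def test_with_fixture(db_conn, n):
    db_conn.insert(id=n)
    assert db_conn.find(id=n) == n
Hypothesis re-runs the test body with a new n each time, but pytest sets the fixture up once per test function (or less often for broader scopes), never once per generated case. State therefore carries over between examples. For that reason Hypothesis's function_scoped_fixture health check rejects function-scoped fixtures under @given; use a module- or session-scoped fixture, or suppress the health check deliberately.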
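The parametrize side of the integration can be sketched as follows: pytest supplies the explicit cases and Hypothesis fills in the remaining argument. The test name and the base/format-spec pairs are choices made for this example.

```python
import pytest
from hypothesis import given, strategies as st


@pytest.mark.parametrize(("base", "spec"), [(2, "b"), (8, "o"), (16, "x")])
@given(n=st.integers(min_value=0, max_value=10**9))
def test_int_round_trips_through_text(base, spec, n):
    # pytest provides base and spec; Hypothesis generates n for each pair.
    assert int(format(n, spec), base) == n
```

Passing the strategy by keyword (`n=...`) keeps it unambiguous which arguments come from pytest and which from Hypothesis.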
Step 9 - CI integration
- run: pytest --hypothesis-seed=42 # deterministic seed for reproducibility
# OR via @settings(derandomize=True)
When a property test fails, the failure report includes the falsifying example. Copy it into a regression test:
from hypothesis import example, given
@given(...)
@example(n=42) # the falsifying example from the prior run
def test_my_property(n):
    ...
@example cases run before generated ones, which locks the regression in permanently.
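A concrete version of that pattern might look like the sketch below. The `normalize_whitespace` helper and the pinned inputs are hypothetical; they stand in for a real function and for falsifying examples copied from an earlier failure report.

```python
from hypothesis import example, given, strategies as st


def normalize_whitespace(s):
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(s.split())


@given(st.text())
@example("")         # pinned input (illustrative, not a real past failure)
@example(" \t\n ")   # pinned input (illustrative, not a real past failure)
def test_normalize_is_idempotent(s):
    once = normalize_whitespace(s)
    # Applying the function a second time must change nothing.
    assert normalize_whitespace(once) == once
```

The pinned inputs are checked on every run, whatever the seed, while the generated cases continue to explore new inputs.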
Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Heavy assume() filtering (>50% rejection rate) | Slow; Hypothesis warns; sometimes fails the test entirely. | Restructure the strategy (Step 5). |
| Random seed in CI (default Hypothesis behavior) | Tests pass locally, fail on CI; un-reproducible. | derandomize=True or --hypothesis-seed=<fixed> (Step 6). |
| Asserting on specific generated values | Defeats the property; regression tests should use @example. | Property tests assert relationships; specifics go in @example (Step 9). |
| Overly broad strategies (st.text() for an email field) | Wastes generation budget on non-meaningful inputs. | Use st.from_regex(EMAIL_PATTERN) or domain-specific composite. |
| Property that's secretly an example test (one assertion on n=10) | No property; just an example. | Re-formulate as a real property (round-trip / metamorphic / invariant). |
| max_examples=10000 for a 5-second-per-case test | CI never finishes. | Budget max_examples as CI time budget / per-case runtime. |
| Mocking inside the property test | Mocks don't satisfy properties; defeats PBT. | Use real implementations OR property-test pure functions. |
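The runtime budgeting in the table is simple arithmetic. A sketch, assuming a hypothetical helper `max_examples_for_budget` and a 10-minute CI budget:

```python
def max_examples_for_budget(budget_seconds, seconds_per_case):
    """Hypothetical helper: the largest max_examples that fits the time budget."""
    if seconds_per_case <= 0:
        raise ValueError("seconds_per_case must be positive")
    return max(1, int(budget_seconds // seconds_per_case))


# The anti-pattern from the table: 10000 cases at 5 s each.
total_hours = 10_000 * 5 / 3600
print(round(total_hours, 1))  # 13.9

# A 10-minute budget at 5 s per case leaves room for 120 examples.
print(max_examples_for_budget(600, 5))  # 120
```

The result is an upper bound for that one test; shrinking after a failure takes additional time.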