papermill-tests
Use Papermill to parameterize and execute notebooks in CI as regression tests - `papermill input.ipynb output.ipynb -p alpha 0.6` (CLI) or `pm.execute_notebook(...)` (Python API). Pairs with nbval (output assertion) and testbook (function unit tests) for full-coverage notebook QA.
papermill-tests
Papermill executes notebooks programmatically with injected parameters, producing an output notebook with results. Per the Papermill execute docs, it pairs naturally with regression testing: run a parameterized notebook in CI, assert on outputs.
When to use
Step 1 - Install
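pip install papermill
Per the Papermill execute docs. Executing a notebook also needs a Jupyter kernel matching its kernelspec; for Python notebooks, install `ipykernel` in the same environment.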
Step 2 - Tag the parameters cell
In your notebook, give exactly one code cell the tag `parameters`. Its assignments become the defaults:
# Cell tagged "parameters"
alpha = 0.5
ratio = 0.2
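input_path = "data/sales.parquet"
Papermill does not rewrite this cell. At execution time it inserts a new cell tagged `injected-parameters` directly after it, so injected values override these defaults and any parameter you do not pass keeps its default.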
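The injection order can be modeled with the standard library alone. The sketch below treats a notebook as a plain nbformat-v4 dict; `code_cell`, `make_notebook`, and `inject_parameters` are hypothetical helpers that imitate the documented behavior (insert right after the `parameters` cell, or at the top when no cell is tagged). They are not papermill's code, and the `# Parameters` header line is only an approximation of what papermill writes.

```python
import copy


def code_cell(source, tags=()):
    """Build an nbformat-v4 code cell as a plain dict (hypothetical helper)."""
    return {
        "cell_type": "code",
        "metadata": {"tags": list(tags)},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }


def make_notebook(*cells):
    """Wrap cells in a minimal nbformat 4.4 notebook dict (no cell ids needed)."""
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": list(cells)}


def inject_parameters(nb, params):
    """Imitate papermill's injection step on a copy of `nb` (not papermill's code)."""
    nb = copy.deepcopy(nb)
    body = "".join(f"{name} = {value!r}\n" for name, value in params.items())
    injected = code_cell("# Parameters\n" + body, tags=["injected-parameters"])
    cells = nb["cells"]
    tagged = [i for i, c in enumerate(cells) if "parameters" in c["metadata"].get("tags", [])]
    # Right after the first tagged cell; at the very top if no cell is tagged.
    position = tagged[0] + 1 if tagged else 0
    cells.insert(position, injected)
    return nb


nb = make_notebook(
    code_cell('alpha = 0.5\nratio = 0.2\ninput_path = "data/sales.parquet"\n', tags=["parameters"]),
    code_cell("print(alpha, ratio, input_path)\n"),
)
out = inject_parameters(nb, {"alpha": 0.6, "ratio": 0.1})

# Run the parameters cell, then the injected cell, in one shared namespace.
ns = {}
for cell in out["cells"][:2]:
    exec(cell["source"], ns)
print(ns["alpha"], ns["ratio"], ns["input_path"])
```

Because the injected cell runs after the defaults, `alpha` ends up as 0.6 while `input_path` keeps its default. If the injected cell were placed at the top instead, the later default assignments would win.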
Step 3 - Python API execution
import papermill as pm
pm.execute_notebook(
'path/to/input.ipynb',
'path/to/output.ipynb',
parameters=dict(alpha=0.6, ratio=0.1)
)
Per the Papermill execute docs.
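The `parameters` dict maps onto the CLI flags of the next step. As a sketch, the hypothetical `to_cli_args` helper below builds the equivalent command line, choosing `-r` for strings so they are not re-typed and `-p` for numbers, booleans, and `None`. It only builds the argument list; it does not run papermill.

```python
import shlex


def to_cli_args(input_path, output_path, parameters):
    """Hypothetical helper: build papermill CLI arguments from a parameters dict."""
    args = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        if isinstance(value, str):
            # -r passes strings through verbatim, avoiding -p auto-typing
            args += ["-r", name, value]
        elif value is None or isinstance(value, (bool, int, float)):
            # Scalars are written as text; -p auto-typing is expected to parse them back
            args += ["-p", name, str(value)]
        else:
            raise TypeError(f"{name}: use -y or -f for lists and dicts")
    return args


args = to_cli_args(
    "path/to/input.ipynb",
    "path/to/output.ipynb",
    {"alpha": 0.6, "ratio": 0.1, "input_path": "data/sales.parquet"},
)
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would execute it, assuming `papermill` is on `PATH`.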
Step 4 - CLI execution
# Local in/out
papermill local/input.ipynb local/output.ipynb -p alpha 0.6 -p ratio 0.1
# S3 output (needs the papermill[s3] extra)
papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p ratio 0.1
Parameter flags per the Papermill execute docs:
| Flag | Meaning |
|---|---|
| `-p NAME VAL` | Simple parameter (auto-typed) |
| `-r NAME VAL` | Raw string (kept as a string) |
| `-f file.yaml` | Parameters from a YAML file |
| `-y "key: val"` | Inline YAML (supports lists, dicts) |
| `-b base64yaml` | Base64-encoded YAML |
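`-p` is convenient but lossy for string-like values. The function below approximates the CLI's auto-typing under assumed rules (`True`/`False`/`None` first, then `int`, then `float`, otherwise a string); it is a sketch, not papermill's source. The last lines build a `-b` payload with the standard `base64` module.

```python
import base64


def resolve_cli_value(raw):
    """Approximate the CLI's -p auto-typing (assumed rules, see lead-in)."""
    special = {"True": True, "False": False, "None": None}
    if raw in special:
        return special[raw]
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw  # not a literal: stays a string, as with -r


for raw in ["0.6", "1000", "1.10", "01234", "True", "data/sales.parquet"]:
    print(f"-p x {raw} -> {resolve_cli_value(raw)!r}")

# -b expects base64-encoded YAML; encode the same parameters -y would take inline.
yaml_text = "alpha: 0.6\nratio: 0.1\n"
encoded = base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")
print(f"papermill input.ipynb output.ipynb -b {encoded}")
```

Under these rules `1.10` comes back as `1.1` and `01234` as `1234`; passing either with `-r` keeps the original string.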
Step 5 - Use as regression test
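import json
import nbformat
import papermill as pm
def test_analysis_with_known_inputs(tmp_path):
    # tmp_path is pytest's per-test temporary directory fixture
    out_path = tmp_path / "out.ipynb"
    pm.execute_notebook(
        'analysis.ipynb',
        str(out_path),
        parameters=dict(seed=42, n_samples=1000),
    )
    nb = nbformat.read(str(out_path), as_version=4)
    # Assumes the last code cell ends with print(json.dumps(result))
    final_cell = [c for c in nb.cells if c.cell_type == "code"][-1]
    stdout = [o for o in final_cell.outputs if o.output_type == "stream" and o.name == "stdout"]
    result = json.loads(stdout[-1].text)
    assert abs(result['mean'] - 0.5) < 0.01
    assert result['n'] == 1000
If a cell raises, `execute_notebook` raises `PapermillExecutionError`, which fails the test, after writing the partially executed output notebook. Either way the output notebook is artifact-friendly: attach it to the CI run so reviewers can inspect cell outputs when an assertion fails.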
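The test relies on the notebook's last cell printing a JSON summary. A slightly more tolerant extractor, sketched with only the standard library, skips trailing markdown cells, joins list-form `text` fields as stored on disk, and parses the last non-empty stdout line. `final_json_result` is a hypothetical helper, and the notebook dict below is synthetic.

```python
import json


def final_json_result(nb):
    """Parse the last JSON line printed to stdout by the notebook's final code cell.

    Hypothetical helper; assumes the cell ends with print(json.dumps(result)).
    Works on a dict from json.load() or on an nbformat NotebookNode.
    """
    code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    if not code_cells:
        raise ValueError("notebook has no code cells")
    stdout = ""
    for output in code_cells[-1].get("outputs", []):
        if output.get("output_type") == "stream" and output.get("name") == "stdout":
            text = output["text"]
            # On disk, multiline text may be stored as a list of lines
            stdout += "".join(text) if isinstance(text, list) else text
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("final code cell printed nothing to stdout")
    return json.loads(lines[-1])


# Synthetic executed notebook: a progress line, then the JSON summary, then a markdown cell.
executed = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stdout",
                    "text": ["fitting...\n", '{"mean": 0.503, "n": 1000}\n'],
                }
            ],
        },
        {"cell_type": "markdown", "source": "Done."},
    ]
}
result = final_json_result(executed)
assert abs(result["mean"] - 0.5) < 0.01
assert result["n"] == 1000
```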
Step 6 - Parameter sweeps in CI
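A matrix runs one job per combination of its axes, so the sweep below amounts to 3 seeds × 2 sample sizes = 6 notebook runs. The same expansion can be reproduced locally with `itertools.product`; `sweep_plan` is a hypothetical helper that mirrors the `out-<seed>-<n_samples>.ipynb` naming used in the workflow.

```python
import itertools


def sweep_plan(grid, template="out-{}.ipynb"):
    """Expand a matrix-style grid into (output_path, parameters) pairs.

    Hypothetical helper: one entry per combination of axis values, with axes
    taken in the dict's insertion order, as in the workflow's file names.
    """
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        yield template.format("-".join(str(v) for v in values)), params


grid = {"seed": [42, 123, 7], "n_samples": [100, 1000]}
plan = list(sweep_plan(grid))
for out_path, params in plan:
    print(out_path, params)

# To execute the plan locally (needs papermill and a kernel), something like:
# import papermill as pm
# for out_path, params in plan:
#     pm.execute_notebook("analysis.ipynb", out_path, parameters=params)
```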
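# GitHub Actions matrix sweep (job excerpt; job name is illustrative)
jobs:
  notebook-sweep:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        seed: [42, 123, 7]
        n_samples: [100, 1000]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Plus whatever analysis.ipynb itself imports
      - run: pip install papermill ipykernel
      - run: |
          papermill analysis.ipynb out-${{ matrix.seed }}-${{ matrix.n_samples }}.ipynb \
            -p seed ${{ matrix.seed }} \
            -p n_samples ${{ matrix.n_samples }}
      # always(): upload the output notebook even when papermill fails
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: papermill-output-${{ matrix.seed }}-${{ matrix.n_samples }}
          path: out-${{ matrix.seed }}-${{ matrix.n_samples }}.ipynb
Step 7 - Pair with nbval / testbook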
| Tool | Strength | Pair with papermill how |
|---|---|---|
| nbval | Full-notebook output regression | Run papermill first (parameter inject) → run nbval on output |
| testbook | Function-level unit tests | testbook executes notebooks with nbclient directly rather than through papermill; run papermill first, then point testbook at the output notebook so the injected parameters apply |
Papermill is the engine; nbval and testbook are the assertion layers. Use all three for production notebook QA.
Step 8 - TQDM progress descriptions
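Start each cell you want labeled with a `#papermill_description=` comment on its first line (two cells shown; `load_dataset` and `model` are placeholders):
#papermill_description=load_data
df = load_dataset()
#papermill_description=train_model
model.fit(df)
Per the Papermill execute docs, these descriptions appear in papermill's TQDM progress bar, which makes CI progress output easier to follow.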
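Before relying on those labels in CI logs, a quick check can list which code cells lack one. The stdlib sketch below reads a notebook dict (for example from `json.load` on the `.ipynb` file) and assumes, as described above, that the comment must be the first line of the cell; `cell_descriptions`, `undescribed_cells`, and the sample notebook are hypothetical.

```python
PREFIX = "#papermill_description="


def cell_descriptions(nb):
    """Return {cell_index: description} for code cells whose first line is the comment."""
    found = {}
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # on-disk notebooks may store lines as a list
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip().startswith(PREFIX):
            found[index] = lines[0].strip()[len(PREFIX):]
    return found


def undescribed_cells(nb):
    """Indices of code cells lacking a leading description comment."""
    described = cell_descriptions(nb)
    return [i for i, c in enumerate(nb["cells"]) if c["cell_type"] == "code" and i not in described]


# Hypothetical notebook mirroring the two cells above, plus one unlabeled cell.
nb = {
    "cells": [
        {"cell_type": "code", "source": "#papermill_description=load_data\ndf = load_dataset()\n"},
        {"cell_type": "markdown", "source": "Training"},
        {"cell_type": "code", "source": ["#papermill_description=train_model\n", "model.fit(df)\n"]},
        {"cell_type": "code", "source": "print(df.shape)\n"},
    ]
}
print(cell_descriptions(nb))
print(undescribed_cells(nb))
```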
Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
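| Forget the `parameters` cell tag | Papermill inserts the injected cell at the top of the notebook, so the notebook's own assignments overwrite it and it silently runs with defaults | Tag the cell explicitly (Step 2) |
| Mix `-p` and `-r` types incorrectly | `-p` auto-types values: `-p version 1.10` becomes the float `1.1` and `-p zip 01234` becomes the int `1234` | Use `-r` for values that must stay strings (Step 4) |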
| Run papermill against side-effect notebooks (writes to prod DB) | Papermill is non-transactional; partial failures leave bad state | Use ephemeral workdirs / staging credentials in test runs |
| Ignore the output notebook (only check exit code) | Subtle errors visible only in cell outputs | Save + inspect output notebook (Step 5); upload as artifact (Step 6) |
| Skip seed parameterization | Tests flake on stochastic models | Expose a `seed` parameter, seed every RNG from it, and pass `-p seed N` for reproducible runs |
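The seed fix only works if the notebook routes every random draw through the injected `seed`. A minimal stdlib stand-in for such a notebook body is below; `simulate` is hypothetical, and a NumPy-based notebook would create `numpy.random.default_rng(seed)` the same way instead of relying on global random state.

```python
import random


def simulate(seed, n_samples):
    """Hypothetical stand-in for a notebook body driven by injected parameters."""
    rng = random.Random(seed)  # dedicated generator seeded from the parameter
    samples = [rng.random() for _ in range(n_samples)]
    return {"mean": sum(samples) / n_samples, "n": n_samples}


first = simulate(seed=42, n_samples=1000)
second = simulate(seed=42, n_samples=1000)
print(first == second, first["n"])
```

With a dedicated, seeded generator, two runs with the same `-p seed` produce identical results, so regression assertions can use tight tolerances.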