giskard-tests
Test ML models with Giskard's scan() vulnerability detector + test catalog (performance, robustness, fairness, data leakage, ethical issues) for tabular and NLP models. Wrap a prediction function in giskard.Model + a DataFrame in giskard.Dataset; emit test suites that pass/fail in CI.
Giskard wraps any prediction function and DataFrame, then runs a scan() that surfaces "performance biases, unrobustness, data leakage, stochasticity, underconfidence, ethical issues" per the Giskard tabular quickstart.
When to use
Step 1 - Install
pip install giskard --upgrade

This command is taken from the Giskard tabular quickstart.
Step 2 - Wrap the dataset
from giskard import Dataset
giskard_dataset = Dataset(
    df=raw_data,  # pandas DataFrame holding the features and the target
    target=TARGET_COLUMN,
    name="Titanic dataset",
    cat_columns=CATEGORICAL_COLUMNS,
)

cat_columns matters: Giskard treats categorical features differently for slicing and drift detection.
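
If CATEGORICAL_COLUMNS is not already maintained by hand, a cardinality heuristic can produce a first draft. The sketch below is an assumption for illustration and not part of Giskard: `guess_categorical_columns`, its `max_unique` cutoff, and the toy records are all hypothetical, and the result should be reviewed before it is passed as cat_columns.

```python
from typing import Any, Dict, List


def guess_categorical_columns(
    rows: List[Dict[str, Any]],
    max_unique: int = 10,
) -> List[str]:
    """Heuristic: treat a column as categorical if it holds any string or
    bool values, or if it has at most `max_unique` distinct values."""
    if not rows:
        return []
    categorical = []
    for column in rows[0]:
        # Ignore missing values when counting distinct entries.
        values = [row.get(column) for row in rows]
        values = [v for v in values if v is not None]
        has_text = any(isinstance(v, (str, bool)) for v in values)
        if has_text or len(set(values)) <= max_unique:
            categorical.append(column)
    return categorical


# Made-up records shaped like the Titanic example.
sample_rows = [
    {"sex": "female", "pclass": 1, "fare": 71.28},
    {"sex": "male", "pclass": 3, "fare": 7.25},
    {"sex": "male", "pclass": 3, "fare": 8.05},
    {"sex": "female", "pclass": 1, "fare": 53.1},
]
guessed = guess_categorical_columns(sample_rows, max_unique=3)
```

On real data the cutoff needs tuning: a low-cardinality numeric feature and an integer-coded category look identical to this heuristic, so the final list is a judgment call.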
Step 3 - Wrap the model
from giskard import Model
import numpy as np
import pandas as pd
def prediction_function(df: pd.DataFrame) -> np.ndarray:
    # Apply the same preprocessing used at training time, then return
    # one probability per class for every row.
    preprocessed_df = preprocessing_function(df)
    return classifier.predict_proba(preprocessed_df)

giskard_model = Model(
    model=prediction_function,
    model_type="classification",
    name="Titanic model",
    classification_labels=classifier.classes_,
    feature_names=FEATURE_NAMES,
)

For classification, prediction_function must return probabilities (not class labels); Giskard's calibration checks depend on them.
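
Before wrapping, it can help to confirm that the function really returns a probability table. The check below is a minimal sketch, not a Giskard API: `check_proba_output` is a hypothetical helper that accepts any nested sequence (lists or a 2-D NumPy array) and verifies the shape, the value range, and that each row sums to 1.

```python
import math
from typing import Sequence


def check_proba_output(
    probs: Sequence[Sequence[float]],
    n_rows: int,
    n_classes: int,
    tol: float = 1e-6,
) -> None:
    """Raise ValueError unless `probs` is an (n_rows, n_classes) table of
    probabilities whose rows each sum to 1 (within `tol`)."""
    if len(probs) != n_rows:
        raise ValueError(f"expected {n_rows} rows, got {len(probs)}")
    for i, row in enumerate(probs):
        try:
            width = len(row)
        except TypeError:
            # A scalar per row usually means predict() was wrapped
            # instead of predict_proba().
            raise ValueError(
                f"row {i}: got a scalar, expected {n_classes} probabilities"
            ) from None
        if width != n_classes:
            raise ValueError(f"row {i}: expected {n_classes} columns, got {width}")
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError(f"row {i}: values outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {i}: probabilities sum to {sum(row)}, not 1")


# Passes silently: two rows, two classes, each row sums to 1.
check_proba_output([[0.9, 0.1], [0.2, 0.8]], n_rows=2, n_classes=2)
```

In a real project the call would look something like `check_proba_output(prediction_function(raw_data.head()), n_rows=5, n_classes=len(classifier.classes_))`, assuming `raw_data` has at least five rows.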
Step 4 - Scan for vulnerabilities
from giskard import scan
results = scan(giskard_model, giskard_dataset)
results.to_html("scan_report.html")

Per the Giskard tabular quickstart, scan covers these categories: performance bias, unrobustness, data leakage, stochasticity, underconfidence, and ethical issues. The HTML report can be uploaded as a CI artifact.
Step 5 - Generate a test suite from scan
test_suite = results.generate_test_suite("My first test suite")
suite_results = test_suite.run()
if not suite_results.passed:
    raise SystemExit("Giskard test suite failed; see report")

Step 6 - Add specific tests from catalog
from giskard import testing
test_suite.add_test(
    testing.test_f1(
        model=giskard_model,
        dataset=giskard_dataset,
        threshold=0.7,
    )
)
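# Slicing test: F1 must also hold on a subset.
# row_level=False makes the callable receive the whole DataFrame;
# the default passes one row at a time and expects a boolean back.
female_slice = giskard_dataset.slice(lambda df: df[df.sex == "female"], row_level=False)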
test_suite.add_test(
    testing.test_f1(
        model=giskard_model,
        dataset=female_slice,
        threshold=0.65,
    )
)
test_suite.run()

The catalog includes test_f1, test_accuracy, test_recall, the test_drift_* family, and metamorphic tests. See the Giskard tabular quickstart for the current full list.
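
To see why the slice test carries its own threshold, the following sketch computes binary F1 by hand on toy data. It is plain Python rather than Giskard's implementation; the helper names are hypothetical and the labels and predictions are made up. The overall score clears the 0.7 threshold while the female slice falls below 0.65, which is the kind of gap a whole-dataset metric hides.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_on_slice(
    rows: List[Row],
    y_true: Sequence[int],
    y_pred: Sequence[int],
    keep: Callable[[Row], bool],
) -> float:
    """F1 restricted to the rows for which `keep(row)` is true."""
    idx = [i for i, row in enumerate(rows) if keep(row)]
    return f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx])


# Three female rows followed by five male rows (made-up data).
rows = [{"sex": "female"}] * 3 + [{"sex": "male"}] * 5
y_true = [1, 1, 0, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 1, 0, 0]

overall_f1 = f1_on_slice(rows, y_true, y_pred, lambda row: True)
female_f1 = f1_on_slice(rows, y_true, y_pred, lambda row: row["sex"] == "female")

overall_passes = overall_f1 >= 0.7   # threshold from the first test above
female_passes = female_f1 >= 0.65    # threshold from the slice test above
```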
Step 7 - CI integration
- name: Giskard scan
  run: |
    python ml/giskard_scan.py
    # Script raises SystemExit on failure
- name: Upload Giskard report
  if: always()  # upload the report even when the scan step fails
  uses: actions/upload-artifact@v4
  with:
    name: giskard-report
    path: scan_report.html

Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Skip cat_columns parameter | Categorical features treated as numeric; bogus drift | Always pass cat_columns (Step 2) |
| Wrap a predict() (classes) instead of predict_proba() (probs) | Calibration tests cannot run | Use predict_proba for classification (Step 3) |
| Run scan once, don't add to suite | One-off finding never re-checked | Generate suite from scan (Step 5); CI gates re-run |
| Block CI on every minor scan finding | Noise; team disables Giskard | Set per-test threshold; gate on critical+major only |
| Reuse training dataset for scan | False sense of robustness; scan needs unseen data | Use held-out test split |
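
The "gate on critical+major only" fix from the table can be sketched as a small severity filter that decides the CI exit code. This is an illustration under stated assumptions, not Giskard's API: `Finding` is a hypothetical stand-in for a scan issue, and the level names are placeholders that should be matched to whatever your Giskard version reports.

```python
import sys
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Finding:
    """Simplified stand-in for one scan issue (not a Giskard class)."""
    name: str
    level: str  # assumed level names, e.g. "critical", "major", "minor"


def blocking_findings(
    findings: Iterable[Finding],
    blocking_levels: FrozenSet[str] = frozenset({"critical", "major"}),
) -> List[Finding]:
    """Keep only the findings severe enough to fail the build."""
    return [f for f in findings if f.level in blocking_levels]


def exit_code(findings: Iterable[Finding]) -> int:
    """Return 1 if any blocking finding exists, else 0."""
    blockers = blocking_findings(findings)
    for f in blockers:
        print(f"BLOCKING [{f.level}] {f.name}", file=sys.stderr)
    return 1 if blockers else 0


# Made-up findings: one blocks the build, one is reported but tolerated.
example = [
    Finding("F1 drop on sex == 'female'", "major"),
    Finding("Slight underconfidence on fare < 10", "minor"),
]
code = exit_code(example)
```

A CI script would end with `sys.exit(exit_code(findings))`, so minor findings still show up in the report without breaking the build.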