deepchecks-tests
Run Deepchecks suites (data integrity, train-test validation, model evaluation) on tabular, NLP, and vision data and models. Gate CI on the outcome (`passed()` on a suite result, `passed_conditions()` on a single check result) to block regressions; the same checks can run during research, in CI, and in production monitoring, in line with Deepchecks' lifecycle-wide approach.
Deepchecks describes itself on its welcome page as "a holistic open-source solution for all of your AI & ML validation needs". It validates data integrity, train-test splits, and model performance across model development, from research through production.
When to use
Step 1 - Install
```bash
pip install deepchecks
```

This is the install command given on the Deepchecks welcome page.
Step 2 - Wrap data
For tabular models:
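```python
from deepchecks.tabular import Dataset

# train_df / test_df are pandas DataFrames; CATEGORICAL_COLUMNS is a list of column names.
train_ds = Dataset(
    train_df,
    label="target",
    cat_features=CATEGORICAL_COLUMNS,
)
test_ds = Dataset(
    test_df,
    label="target",
    cat_features=CATEGORICAL_COLUMNS,
)
```

`cat_features` matters for distribution checks, because drift on a categorical column is scored differently from drift on a numeric one. The Vision and NLP APIs differ; see the quickstarts for each data type linked from the Deepchecks welcome page.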
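Because the train and test `Dataset` objects must be built with the same `label` and `cat_features`, it can help to build both from one set of arguments. The following is a minimal sketch, not part of Deepchecks: `shared_dataset_kwargs` and `wrap_pair` are hypothetical helper names, and the column check is an extra safeguard added here for illustration.

```python
from typing import Dict, Iterable


def shared_dataset_kwargs(
    train_columns: Iterable[str],
    test_columns: Iterable[str],
    label: str,
    cat_features: Iterable[str],
) -> Dict[str, object]:
    """Validate that label and cat_features exist in both frames, then
    return one kwargs dict to reuse for the train and test Dataset."""
    train_cols, test_cols = set(train_columns), set(test_columns)
    cats = list(cat_features)
    missing = [c for c in [label, *cats] if c not in train_cols or c not in test_cols]
    if missing:
        raise ValueError(f"Columns missing from train or test: {missing}")
    return {"label": label, "cat_features": cats}


def wrap_pair(train_df, test_df, label: str, cat_features: Iterable[str]):
    """Hypothetical helper: wrap both DataFrames with identical settings."""
    # Imported lazily so the validation helper works without deepchecks installed.
    from deepchecks.tabular import Dataset

    kwargs = shared_dataset_kwargs(train_df.columns, test_df.columns, label, cat_features)
    return Dataset(train_df, **kwargs), Dataset(test_df, **kwargs)
```

With this in place, `train_ds, test_ds = wrap_pair(train_df, test_df, "target", CATEGORICAL_COLUMNS)` would replace the two separate constructor calls.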
Step 3 - Run the data integrity suite
```python
from deepchecks.tabular.suites import data_integrity

integrity = data_integrity()
result = integrity.run(train_ds)
result.save_as_html("data_integrity.html")
```

Catches: duplicate rows, missing values, mixed types, conflicting labels, single-value features, string mismatches.
Step 4 - Run the train-test validation suite
```python
from deepchecks.tabular.suites import train_test_validation

validation = train_test_validation()
result = validation.run(train_ds, test_ds)
result.save_as_html("train_test_validation.html")
```

Catches: target drift, feature drift, train-test data leakage, label imbalance, dataset size mismatch.
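To make the "train-test data leakage" item concrete, the sketch below measures how many test rows also appear verbatim in the training set. It illustrates the idea only and is not how Deepchecks implements its checks; the function name and the sample rows are made up for the example.

```python
from typing import Hashable, Iterable


def train_test_overlap(train_rows: Iterable[Hashable], test_rows: Iterable[Hashable]) -> float:
    """Return the fraction of test rows that also occur in the training rows."""
    train_seen = set(train_rows)
    test_list = list(test_rows)
    if not test_list:
        return 0.0
    leaked = sum(1 for row in test_list if row in train_seen)
    return leaked / len(test_list)


# Rows are represented as tuples so they can be hashed (illustrative data).
train = [(1, "a"), (2, "b"), (3, "c")]
test = [(2, "b"), (4, "d")]
overlap = train_test_overlap(train, test)
print(f"{overlap:.0%} of test rows also appear in train")
```

A non-zero overlap means the model is partly evaluated on rows it has already seen, which inflates test metrics.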
Step 5 - Run the model evaluation suite
```python
from deepchecks.tabular.suites import model_evaluation

evaluation = model_evaluation()
# model is a fitted estimator (for example, a scikit-learn classifier)
result = evaluation.run(train_ds, test_ds, model)
result.save_as_html("model_evaluation.html")

# A suite result is gated with passed(); single-check results use passed_conditions().
if not result.passed():
    raise SystemExit("Deepchecks model evaluation failed")
```

Catches: performance regression vs baseline, weak segments, calibration issues, prediction drift between train and test.
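The three suites above can be combined into a single script, such as the `ml/deepchecks_suite.py` invoked in Step 7, so that every HTML report is written before the process exits. This is a sketch under assumptions: `run_suites` and `exit_code` are hypothetical helper names, and it assumes the installed Deepchecks version exposes `passed(fail_if_warning=...)` on suite results.

```python
from pathlib import Path
from typing import Dict


def exit_code(outcomes: Dict[str, bool]) -> int:
    """Return 0 if every suite passed, 1 otherwise, naming each failure."""
    failed = [name for name, ok in outcomes.items() if not ok]
    for name in failed:
        print(f"Deepchecks suite failed: {name}")
    return 1 if failed else 0


def run_suites(train_ds, test_ds, model, out_dir: str = ".",
               fail_if_warning: bool = True) -> Dict[str, bool]:
    """Run the three tabular suites, save each report, collect pass/fail."""
    # Imported lazily so exit_code() can be used without deepchecks installed.
    from deepchecks.tabular.suites import (
        data_integrity,
        model_evaluation,
        train_test_validation,
    )

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    runs = {
        "data_integrity": lambda: data_integrity().run(train_ds),
        "train_test_validation": lambda: train_test_validation().run(train_ds, test_ds),
        "model_evaluation": lambda: model_evaluation().run(train_ds, test_ds, model),
    }
    outcomes: Dict[str, bool] = {}
    for name, run in runs.items():
        result = run()
        # Save the report first so it exists even when the gate fails.
        result.save_as_html(str(out / f"{name}.html"))
        # Assumption: suite results expose passed(fail_if_warning=...).
        outcomes[name] = result.passed(fail_if_warning=fail_if_warning)
    return outcomes
```

A script would end with `raise SystemExit(exit_code(run_suites(train_ds, test_ds, model)))`. Collecting the outcomes first, instead of exiting at the first failure, keeps all three reports available for review.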
Step 6 - Per-check thresholds
```python
from deepchecks.tabular.checks import FeatureDrift

check = FeatureDrift().add_condition_drift_score_less_than(
    max_allowed_categorical_score=0.2,
    max_allowed_numeric_score=0.1,
)
result = check.run(train_ds, test_ds)
if not result.passed_conditions():
    print(result.value)
    raise SystemExit("FeatureDrift failed threshold")
```

Each check has `add_condition_*` methods; chain them for per-check gating.
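Printing `result.value` in full can be noisy, so a small filter that lists only the features over their threshold may be easier to read in CI logs. The sketch below assumes `result.value` is a mapping from feature name to a dict containing a `"Drift score"` entry; that shape is an assumption to verify against your installed version, and `drifted_features` is a hypothetical helper. The sample values are invented for the example.

```python
from typing import Dict, Iterable, Mapping


def drifted_features(
    value: Mapping[str, Mapping[str, object]],
    cat_features: Iterable[str],
    max_categorical: float = 0.2,
    max_numeric: float = 0.1,
) -> Dict[str, float]:
    """Return {feature: score} for features at or above their threshold."""
    cats = set(cat_features)
    flagged: Dict[str, float] = {}
    for feature, info in value.items():
        score = info.get("Drift score")  # assumed key name
        if score is None:
            continue
        limit = max_categorical if feature in cats else max_numeric
        # The condition requires score < limit, so score >= limit is a violation.
        if score >= limit:
            flagged[feature] = float(score)
    return flagged


# Invented values in the assumed shape of result.value.
sample_value = {
    "age": {"Drift score": 0.15},
    "city": {"Drift score": 0.15},
    "plan": {"Drift score": 0.30},
}
print(drifted_features(sample_value, cat_features=["city", "plan"]))
```

In the check above, `print(drifted_features(result.value, CATEGORICAL_COLUMNS))` could stand in for `print(result.value)`.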
Step 7 - CI integration
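```yaml
- name: Deepchecks suite
  run: |
    python ml/deepchecks_suite.py
- name: Upload Deepchecks reports
  # Upload even when the suite step fails, so the reports are available for debugging
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: deepchecks-reports
    path: "*.html"
```

Anti-patterns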
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Skip `cat_features` | Deepchecks falls back to inferring categorical columns heuristically, so distribution checks can misfire | Always specify (Step 2) |
| Skip the `data_integrity` suite | Train on leaky / duplicate-heavy data | Run before `train_test_validation` (Step 3) |
| Block CI on every check | Hundreds of warning conditions; team disables the gate | Define per-check thresholds (Step 6); gate on failed conditions only, not warnings (`fail_if_warning=False`) |
| Re-run on the SAME test split each PR | Fixed split → fixed results; no drift detection | Use rolling/cross-validation splits |
| Reuse training data as "current" for production monitoring | Always passes drift; blind to real drift | Use real production samples |
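One way to act on the "rolling splits" fix in the table is to slide a train/test window over time-ordered rows, so each run compares a different pair of periods. The helper below is a hypothetical sketch using row positions only; it assumes the data is already sorted by time.

```python
from typing import Iterator, Tuple


def rolling_splits(
    n_rows: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_positions, test_positions) windows that move forward by step."""
    if min(train_size, test_size, step) <= 0:
        raise ValueError("train_size, test_size and step must be positive")
    start = 0
    while start + train_size + test_size <= n_rows:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


for train_idx, test_idx in rolling_splits(n_rows=10, train_size=4, test_size=2, step=2):
    print(list(train_idx), list(test_idx))
```

Each pair of ranges can be passed to `DataFrame.iloc` to build the frames that are then wrapped as in Step 2.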