data-quality-gate
Builds a release-readiness gate for a data pipeline by gathering check results from one or more engines (dbt, Great Expectations, Soda), applying severity-aware pass/fail thresholds, and emitting a single go / no-go decision with per-check rationale. Use when authoring a CI step that must fail the build when data quality drops below thresholds.
Overview
The skills in this plugin (dbt-testing, great-expectations, soda-checks) each produce their own per-check output: dbt writes run_results.json (dbt-run-results), Great Expectations returns a JSON result with a success flag and a results list (gx-run-validation-definition), and Soda emits a per-check pass/fail summary in stdout (and to Soda Cloud) (sodacl-overview).
This skill builds a single release-readiness gate that consumes those heterogeneous outputs, applies severity-aware thresholds, and emits one go / no-go decision the CI pipeline can act on. The skill is deliberately engine-agnostic at the surface so a project can mix engines (e.g. dbt for transformation tests + GX for ingestion validation + Soda for cross-team observability) and still ship a single gate.
When to use
If a project uses only one engine and does not need severity tiering, prefer that engine's native CI integration directly (see the relevant skill's "CI integration" section) - this gate adds machinery you do not need.
Step 1 - Identify your sources
Enumerate every check-emitting engine the gate must consume. For each:
| Engine | Result artifact | Schema |
|---|---|---|
| dbt | target/run_results.json | .results[] with unique_id, status, failures, message (dbt-run-results) |
| GX | Python object from validation_definition.run() or checkpoint.run() | success: bool plus results[] of per-expectation outcomes (gx-run-validation-definition) |
| Soda | stdout summary from soda scan | non-zero exit on any failure (sodacl-overview) |
| Other | custom | must be flattened into the unified shape below |
Persist each engine's raw artifact as a CI build artifact (matching the pattern from each skill's "CI integration" section) so the gate input is reproducible and triageable.
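One way to make this inventory explicit is a small manifest that the gate checks before it reads anything. The sketch below is only an illustration: `SOURCES` and `missing_artifacts` are hypothetical names, and `gx/result.json` and `scan.log` are the file conventions of the worked example later in this document, not engine defaults.

```python
from pathlib import Path

# Hypothetical manifest: engine name -> artifact path relative to the repo root.
SOURCES = {
    "dbt": "target/run_results.json",
    "gx": "gx/result.json",
    "soda": "scan.log",
}


def missing_artifacts(sources, root="."):
    """Return the sorted engine names whose artifact file is absent under root."""
    base = Path(root)
    return sorted(
        engine
        for engine, rel_path in sources.items()
        if not (base / rel_path).is_file()
    )
```

Calling `missing_artifacts(SOURCES)` at the start of the gate and refusing to continue on a non-empty result is one way to keep a crashed engine from being mistaken for a clean run.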
Step 2 - Define the unified check record
Flatten every engine's result into one record shape:
{
  "check_id": "dbt.test.orders.unique_order_id",
  "engine": "dbt",
  "subject": "orders.order_id",
  "status": "fail",
  "severity": "error",
  "failures": 12,
  "message": "12 unique-key violations",
  "ratchet": false,
  "owner": "@data-platform"
}

| Field | Description |
|---|---|
| check_id | engine-prefixed unique ID (dbt.<unique_id>, where dbt test IDs already start with test.; gx.<suite>.<expectation_type>; soda.<dataset>.<check_text>). |
| engine | dbt / gx / soda / custom. |
| subject | <dataset>.<column>, or <dataset> for table-level checks. |
| status | pass / fail / warn / error (engine-specific values normalized). |
| severity | error (gate-blocking) or warn (gate-tolerable). Source: dbt severity: config; GX meta block convention; Soda warn: / fail: blocks. |
| failures | row count for row-level checks; 0/1 for boolean checks. |
| message | human-readable failure message. |
| ratchet | optional - true if the failure existed before the ratchet date and is grandfathered. |
| owner | optional - team/handle responsible for the dataset. |
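The record shape above can be pinned down in code so that every adapter produces the same keys and value sets. The following is a minimal sketch, not part of any engine's API: `CheckRecord` and `make_record` are hypothetical names, and rejecting unknown status or severity values is an assumed policy.

```python
from typing import TypedDict

ALLOWED_STATUS = {"pass", "fail", "warn", "error"}
ALLOWED_SEVERITY = {"error", "warn"}


class CheckRecord(TypedDict, total=False):
    check_id: str
    engine: str
    subject: str
    status: str
    severity: str
    failures: int
    message: str
    ratchet: bool
    owner: str


def make_record(check_id, engine, subject, status, severity="error",
                failures=0, message="", **optional) -> CheckRecord:
    """Build a unified record, rejecting values outside the documented sets."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {status!r}")
    if severity not in ALLOWED_SEVERITY:
        raise ValueError(f"unknown severity: {severity!r}")
    record: CheckRecord = {
        "check_id": check_id, "engine": engine, "subject": subject,
        "status": status, "severity": severity,
        "failures": int(failures), "message": message,
    }
    # ratchet and owner are optional; include them only when supplied.
    for key in ("ratchet", "owner"):
        if key in optional:
            record[key] = optional[key]
    return record
```

Raising on an unknown status forces each adapter to decide how engine-specific values (for example a skipped dbt test) map onto the four normalized ones, rather than letting them pass through unnoticed.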
Step 3 - Apply the gate decision rule
Pseudocode:
def gate_decision(records, *, allow_warn_failures=True):
    blockers, warnings = [], []
    for r in records:
        failed = r["status"] in ("fail", "error")
        if failed and r["severity"] == "error" and not r.get("ratchet", False):
            # Non-ratcheted error-severity failure: always blocks.
            blockers.append(r)
        elif failed or r["status"] == "warn":
            # Warn-severity failures, ratcheted failures, and warn statuses
            # are reported; they block only in strict mode.
            warnings.append(r)
    no_go = bool(blockers) or (bool(warnings) and not allow_warn_failures)
    return {
        "verdict": "no-go" if no_go else "go",
        "blocker_count": len(blockers),
        "warning_count": len(warnings),
        "blockers": blockers,
        "warnings": warnings,
    }

The default is strict-but-warn-tolerant: any non-ratcheted error-severity failure blocks; warn-severity failures and ratcheted records show in the report but do not block.
For a stricter mode (no warn tolerance), set allow_warn_failures=False and treat warning_count > 0 as a blocker.
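The ratchet flag that the decision rule reads has to be populated before the rule runs. One possible approach, sketched here under assumptions, is to compare each failing record against a baseline of check IDs that were already failing at the ratchet date. `apply_ratchet` and the baseline list are hypothetical; how the baseline is stored is left to the project.

```python
def apply_ratchet(records, baseline_ids):
    """Return copies of records with ratchet=True for grandfathered failures.

    baseline_ids: check_id values that were already failing at the ratchet
    date (hypothetical input, e.g. loaded from a committed JSON file).
    """
    baseline = set(baseline_ids)
    marked = []
    for record in records:
        failing = record["status"] in ("fail", "error")
        # Only a currently failing check can be grandfathered.
        marked.append({**record, "ratchet": failing and record["check_id"] in baseline})
    return marked


records = [
    {"check_id": "dbt.test.orders.unique_order_id", "status": "fail", "severity": "error"},
    {"check_id": "gx.orders.ExpectColumnValuesToBeBetween", "status": "fail", "severity": "error"},
]
marked = apply_ratchet(records, ["dbt.test.orders.unique_order_id"])
```

Here only the first record is grandfathered; the second is a new failure and would still block.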
Step 4 - Emit the artifact
The gate produces a markdown summary suitable for both $GITHUB_STEP_SUMMARY and Soda Cloud / Slack pipelines:
# Data Quality Gate - verdict: NO-GO

**Blockers: 2**

| Engine | Subject             | Check                         | Failures | Owner          |
|--------|---------------------|-------------------------------|---------:|----------------|
| dbt    | orders.order_id     | unique                        |       12 | @data-platform |
| gx     | orders.discount_pct | ExpectColumnValuesToBeBetween |        4 | @analytics-eng |

**Warnings: 1**

| Engine | Subject   | Check         | Failures | Owner     |
|--------|-----------|---------------|---------:|-----------|
| soda   | customers | row_count > 0 |        0 | @platform |

Plus a JSON sibling for downstream consumers:

{
  "verdict": "no-go",
  "blocker_count": 2,
  "warning_count": 1,
  "blockers": [...],
  "warnings": [...]
}

A no-go verdict exits non-zero so the CI pipeline halts.
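Both outputs can be produced from the decision dict returned in Step 3. The sketch below is one way to do it, assuming records carry the unified fields from Step 2; `render_markdown`, `emit`, and the `quality_gate.json` file name are hypothetical. On GitHub Actions, `GITHUB_STEP_SUMMARY` holds the path of a file to which markdown can be appended.

```python
import json
import os


def render_markdown(decision):
    """Render the decision dict from Step 3 as a markdown summary."""
    lines = [f"# Data Quality Gate - verdict: {decision['verdict'].upper()}"]
    for title, key in (("Blockers", "blockers"), ("Warnings", "warnings")):
        rows = decision[key]
        lines += ["", f"**{title}: {len(rows)}**"]
        if not rows:
            continue
        lines += [
            "",
            "| Engine | Subject | Check | Failures | Owner |",
            "|---|---|---|---:|---|",
        ]
        for r in rows:
            lines.append(
                f"| {r['engine']} | {r['subject']} | {r['check_id']} "
                f"| {r.get('failures', 0)} | {r.get('owner', '')} |"
            )
    return "\n".join(lines) + "\n"


def emit(decision, json_path="quality_gate.json"):
    """Write the JSON sibling and append markdown to the step summary if set."""
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(decision, fh, indent=2)
    markdown = render_markdown(decision)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(markdown)
    return markdown
```

Outside GitHub Actions the environment variable is unset, so `emit` writes only the JSON file and returns the markdown for the caller to print or forward.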
Worked example: minimal Python implementation
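# scripts/run_quality_gate.py
import json
import sys
from pathlib import Path

records = []

# Source: dbt run_results.json
dbt_path = Path("target/run_results.json")
if dbt_path.exists():
    rr = json.loads(dbt_path.read_text())
    for r in rr.get("results", []):
        if not r["unique_id"].startswith("test."):
            continue  # skip models, seeds, snapshots
        # Keep dbt's pass/fail/warn/error; anything else (e.g. skipped)
        # is treated as error so it cannot silently pass.
        status = r["status"] if r["status"] in ("pass", "fail", "warn", "error") else "error"
        # unique_id looks like test.<project>.<test_name>[.<hash>]
        parts = r["unique_id"].split(".")
        records.append({
            "check_id": f"dbt.{r['unique_id']}",
            "engine": "dbt",
            "subject": parts[2] if len(parts) > 2 else r["unique_id"],
            "status": status,
            # run_results.json carries no severity field; a test configured
            # with severity: warn reports status "warn" instead of "fail".
            "severity": "warn" if status == "warn" else "error",
            "failures": r.get("failures") or 0,
            "message": r.get("message") or "",
        })

# Source: Great Expectations result (deserialized from JSON dump)
gx_path = Path("gx/result.json")
if gx_path.exists():
    gx = json.loads(gx_path.read_text())
    # Assumed: the dump carries a top-level suite_name; falls back otherwise.
    suite = gx.get("suite_name", "unknown")
    for r in gx.get("results", []):
        cfg = r.get("expectation_config", {})
        # Accept either key name for the expectation type.
        exp_type = cfg.get("type") or cfg.get("expectation_type") or "unknown"
        passed = bool(r.get("success"))
        unexpected = r.get("result", {}).get("unexpected_count")
        records.append({
            "check_id": f"gx.{suite}.{exp_type}",
            "engine": "gx",
            "subject": cfg.get("kwargs", {}).get("column", ""),
            "status": "pass" if passed else "fail",
            "severity": cfg.get("meta", {}).get("severity", "error"),
            # Table-level expectations have no unexpected_count: use 0/1.
            "failures": unexpected if unexpected is not None else (0 if passed else 1),
            "message": exp_type,
        })

# Source: Soda scan stdout (parsed line-by-line)
soda_path = Path("scan.log")
if soda_path.exists():
    for line in soda_path.read_text().splitlines():
        text = line.strip()
        # Assumption: a failed check line starts with "FAIL" or ends with
        # "[FAILED]"; adjust to the output format of your Soda version.
        if text.startswith("FAIL") or text.endswith("[FAILED]"):
            records.append({
                "check_id": f"soda.{text}",
                "engine": "soda",
                "subject": "",
                "status": "fail",
                "severity": "error",
                "failures": 1,
                "message": text,
            })

# Apply gate (same blocking rule as Step 3)
blockers = [
    r for r in records
    if r["status"] in ("fail", "error")
    and r["severity"] == "error"
    and not r.get("ratchet", False)
]
verdict = "no-go" if blockers else "go"
print(f"# Data Quality Gate - verdict: {verdict.upper()}")
print(f"\nBlockers: {len(blockers)}\n")
for r in blockers:
    print(f"- {r['engine']} :: {r['subject']} :: {r['check_id']} ({r['failures']} failures)")
sys.exit(0 if verdict == "go" else 1)

Wire into CI after every engine step has produced its artifact: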
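# .github/workflows/quality-gate.yml (excerpt)
- run: dbt build || true  # don't fail yet; let the gate decide
- run: python scripts/run_gx_gate.py || true
- run: soda scan -d warehouse -c configuration.yml checks.yml > scan.log || true
- run: python scripts/run_quality_gate.py

The || true lets each engine emit its artifact even on failure; the final gate is the single source of CI truth. Because the script above skips any artifact that does not exist, an engine that crashes before writing its output contributes no records, so confirm that each expected artifact is present before trusting a go verdict.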