Testland
Browse all skills & agents

sqlmesh-migrations

Authors and runs SQLMesh - data-transformation framework with version control, virtual data environments, automatic breaking-vs-non-breaking change classification, and downstream impact analysis; supports `sqlmesh init` / `plan` / `apply` / `run` / `audit` / `test` lifecycle; covers DuckDB, Postgres, Snowflake, BigQuery, Redshift, Databricks. Use when the user works with SQL data pipelines (warehouse + dbt-adjacent ELT), needs safer model evolution than dbt's deploy-and-pray, or wants the strongest impact-analysis story in the OSS data tooling space.

sqlmesh-migrations

Overview

Per sqlmesh.readthedocs.io/en/stable/quickstart/cli/:

SQLMesh is a data transformation tool that enables version control and testing for SQL pipelines. Its distinguishing features vs schema-only tools (Flyway / Liquibase / Atlas):

  • Virtual data environments: per sm-cli, "A SQLMesh environment is an isolated namespace containing models and the data they generated." Develop changes in a dev env without touching prod data.
  • Breaking-change classification: per sm-cli, SQLMesh categorizes changes as "breaking or non-breaking. Breaking changes invalidate existing data (e.g., adding a WHERE clause). Non-breaking changes preserve validity of existing data (e.g., adding a new column)."
  • Downstream impact analysis: per sm-cli, "SQLMesh automatically detects when modifications to upstream models affect downstream dependents." Both "Directly Modified" and "Indirectly Modified" models surfaced before deploy.

When to use

  • The repo has SQL/Python data models (warehouse-side ELT) and needs version control + impact analysis beyond dbt.
  • The team migrates from dbt and wants stronger schema-evolution semantics (vs dbt's "rebuild-everything" default).
  • The user works with DuckDB, Postgres, Snowflake, BigQuery, Redshift, or Databricks.
  • A staging-to-prod promotion needs explicit breaking-vs-non-breaking review before apply.

Step 1 - Install

Per sm-cli:

pip install sqlmesh

Optionally with extras for specific engines (e.g., pip install sqlmesh[bigquery]).

Step 2 - Initialize a project

Per sm-cli:

sqlmesh init <dialect>
# example:
sqlmesh init duckdb

Generates project skeleton: config.yaml, models/, macros/, tests/, audits/, seeds/.

Step 3 - Author a model

A SQLMesh model is a SQL (or Python) file in models/ with a MODEL directive header:

-- models/sales/orders_summary.sql
MODEL (
  name sales.orders_summary,
  kind FULL,
  cron '@daily',
  owner 'data-team@example.com',
  description 'Daily summary of orders by customer'
);

SELECT
  customer_id,
  COUNT(*) AS order_count,
  SUM(amount) AS total_amount
FROM sales.orders
GROUP BY customer_id;

Model kinds (per SQLMesh docs): FULL (rebuild every run), INCREMENTAL_BY_TIME_RANGE (process new time partitions), INCREMENTAL_BY_UNIQUE_KEY (upsert by key), VIEW (no persisted table), SEED (static data).

Step 4 - Plan + apply (the core workflow)

Per sm-cli:

sqlmesh plan dev

plan shows:

  • Modified models (Directly + Indirectly)
  • Each change classified as breaking or non-breaking
  • Backfill plan (how much data needs reprocessing)

The user reviews the plan, then confirms - plan is integrated with apply: confirming the plan applies it.

sqlmesh plan prod  # promote dev to prod

The promotion is virtual until apply: prod continues serving existing data until the new env is built.

Step 5 - Run scheduled execution

sqlmesh run

Runs models per their cron schedule. Typically scheduled in CI/CD (daily / hourly), sqlmesh run checks each model and executes if its cron is due.

Step 6 - Audits

Audits are SQL-based data-quality checks attached to models:

-- audits/no_null_amounts.sql
AUDIT (
  name no_null_amounts,
);
SELECT * FROM @this_model WHERE amount IS NULL;
sqlmesh audit

Returns failures if the audit query returns any rows.

Compare with great-expectations and soda-checks: SQLMesh audits are tightly coupled to SQLMesh models; GE/Soda are standalone data-quality frameworks. Choose audits when you're already in SQLMesh; GE/Soda for cross-framework data quality.

Step 7 - Tests (unit tests on models)

Unlike audits (which run on real data), tests run on synthetic input → synthetic output:

# tests/test_orders_summary.yaml
test_orders_summary:
  model: sales.orders_summary
  inputs:
    sales.orders:
      - { customer_id: 1, amount: 100.00 }
      - { customer_id: 1, amount: 200.00 }
      - { customer_id: 2, amount: 50.00 }
  outputs:
    query:
      - { customer_id: 1, order_count: 2, total_amount: 300.00 }
      - { customer_id: 2, order_count: 1, total_amount: 50.00 }
sqlmesh test

Step 8 - CI integration

- run: pip install sqlmesh
- run: sqlmesh test                                 # unit tests
- run: sqlmesh plan ci-${{ github.run_id }} --no-prompts  # build a per-PR env
- run: sqlmesh audit                                # data-quality checks
# Promotion to prod is a separate workflow with manual approval gate

The per-PR env approach gives full data-pipeline isolation: each PR materializes its own copy of the models, audits run against real PR data, and merge-then-promote is the production path.

Step 9 - Composition with sister tools

Pair with migration-blast-radius-reviewer for adversarial review of breaking changes - sqlmesh plan already classifies, but the reviewer adds estimation of downstream consumer impact (BI dashboards, downstream services) that SQLMesh's model graph alone doesn't capture.

For underlying schema migrations (DDL on the warehouse, separate from SQLMesh model changes), use flyway-migrations or atlas-migrations.

Anti-patterns

Anti-patternWhy it failsFix
sqlmesh apply directly to prod without dev planNo review of breaking-vs-non-breaking; broken pipelinesAlways plan dev first (Step 4)
Use kind FULL for everythingFull rebuilds expensive on large tablesPick INCREMENTAL_BY_TIME_RANGE / BY_UNIQUE_KEY per model semantics
Skip audits on critical modelsData drift goes undetectedAudit every transformation (Step 6)
Write tests against production data instead of syntheticTests pass-by-accident; brittleUse the inputs/outputs test format (Step 7)
Treat SQLMesh as schema-migration toolSQLMesh manages data models, not raw DDLPair with Flyway/Atlas for raw schema (Step 9)

Limitations

  • Newer than dbt; smaller community + ecosystem (extensions, adapters).
  • Steeper learning curve for the virtual environments concept (vs dbt's "single shared dev/prod" default).
  • Engine support is broader than dbt's (DuckDB-first), but vendor-specific feature coverage varies by engine.
  • Migration-from-dbt is non-trivial - model metadata maps roughly but tests / macros / sources / packages need translation.

References