Let me tell you about the moment I stopped trusting architecture diagrams.

I was three days into a new role, getting up to speed with the data team. Smart people. Modern stack. On paper, everything looked right. They walked me through a beautiful data platform diagram: clean lines, labelled layers, colour-coded domains. It looked like something you’d see in a data conference.

Then I asked a question that changed everything: “Can you rebuild your finance table from scratch right now?”

The room went quiet. One person started explaining that it was “mostly possible” but there were “a few manual steps” and “some seeds that someone uploads”. Someone else mentioned an incremental model that “hasn’t been full-refreshed in about eight months because last time it broke.”

That gap — between the diagram on the wall and the reality in the warehouse — is where I’ve spent most of my career. And over the years, working across team after team, I started noticing something. The teams that were struggling weren’t missing knowledge. They had smart engineers who knew about testing, CI/CD, documentation, data contracts. They’d read the blog posts. They’d watched the conference talks. What they didn’t have was a way to honestly assess where they stood and what to fix first.

I kept wishing someone would build a simple diagnostic. Maybe they just needed a handful of honest questions that would tell you, in ten minutes, whether your team was engineering or firefighting.

The moment that turned wishing into doing came about six months later, with a team I was leading. We implemented just two practices — CI testing on pull requests and mandatory code review — and I watched the compound effect over twelve weeks. Our incidents dropped. The analysts stopped asking “which table do I use?” A new hire shipped a model on day four. One of the senior engineers pulled me aside and said, “I wish we’d known how bad things were before you got here. We thought we were fine because nothing was on fire.”

Nothing was on fire. But everything was smouldering.

That’s why I built this test. Not because the world needs another framework — it doesn’t. But because the gap between thinking you’re fine and knowing where you stand is where data teams lose months of momentum. And the only way to close that gap is to ask questions honest enough that the answers sting a little.




The twelve questions


Before we get into the detail, here they are. Score yourself. One point for each “yes.”

  1. Can you rebuild any table from raw data in one command?
  2. Do you have a data catalog that people actually use?
  3. Can a new analyst find the data they need without asking you?
  4. Do you test transformations before deploying them?
  5. Do you fix data quality issues before building new pipelines?
  6. Do you have SLAs for your critical tables?
  7. Do you have a single source of truth for business definitions?
  8. Can you explain the lineage of any metric in under 2 minutes?
  9. Do data producers know when they break downstream consumers?
  10. Do you do code review on SQL and dbt models?
  11. Do new hires build a real pipeline in their first week?
  12. Do you regularly talk to the people who actually use your data?

A score of 12 is unicorn territory. I’ve never seen it in the wild. Most teams I’ve worked with score 3 or 4. Anything below 6 and you’re not doing data engineering — you’re doing data triage.

Now let’s break each one down. For every question, I’ll show you what good looks like and what amazing looks like — with concrete implementations using dbt, Snowflake, GitHub Actions, and AWS. Because “yes” isn’t binary in practice. There’s a massive gap between “sort of” and “absolutely.”




1. Can you rebuild any table from raw data in one command?


This is the foundation. If you can’t reproduce your outputs from your inputs, you don’t have a pipeline — you have a prayer.

The core principle here is idempotency: running the same operation twice produces identical results. dbt is designed around this idea, but achieving it requires deliberate choices, especially with incremental models. Too many teams build incremental models that silently drift from what a full refresh would produce, and they don’t discover the discrepancy until something breaks spectacularly.

What good looks like

Individual tables rebuild cleanly with dbt run --full-refresh --select model_name. Your incremental models are tested against full refreshes during development. You’ve configured on_schema_change: sync_all_columns so schema evolution doesn’t silently break things. And you’ve got a CI pipeline in GitHub Actions that runs dbt build on pull requests.

That’s good. That’s better than most. But it’s still table-by-table.

What amazing looks like

Your entire warehouse rebuilds from raw data using a Write-Audit-Publish (WAP) pattern. If you’re not familiar with WAP, I wrote a deep dive on implementing it with Airflow — the core idea is that new data gets written to a staging environment, audited against quality checks, and only published to production once it passes. It’s the data engineering equivalent of a preflight checklist: nothing reaches consumers until it’s been verified. And if you’re on Snowflake with Iceberg tables, the game has changed — WAP with Iceberg branching gives you git-like isolation at the table level, which means you can stage, audit, and publish without creating separate schemas or databases at all.

The practical implementation: dbt always targets a staging schema or database (or an Iceberg branch), runs all models and tests, and only on success does the data get promoted to production. Combined with Snowflake’s zero-copy cloning, this becomes practical even at scale — cloning a multi-terabyte database takes seconds and costs nothing in storage. You’re not duplicating data; you’re creating metadata pointers.
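Stitched into CI, the promote step can be tiny. A hedged sketch, assuming staging and production are separate schemas in one database and a snowsql CLI is available in the runner (schema names are illustrative; the SWAP is atomic):

```yaml
# Sketch of a WAP deploy job: write + audit into staging, publish only on success.
# Schema names and the snowsql invocation are illustrative assumptions.
steps:
  - name: Write and audit in staging
    run: dbt build --target staging   # all models and tests land in analytics.staging

  - name: Publish (atomic schema swap)
    if: success()
    run: snowsql -q "ALTER SCHEMA analytics.staging SWAP WITH analytics.prod;"
```

The swap exchanges the two schemas in a single metadata operation, so consumers never see a half-built state.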

And when things do go wrong — and they will — the question becomes how quickly you can recover. If your incremental models have drifted, or a source system silently changed its schema three weeks ago, you need a backfill strategy that doesn’t involve rebuilding everything from scratch. I wrote about the pitfalls of day-by-day backfills and how to heal tables properly — it’s the companion piece to idempotency, because reproducibility isn’t just about building forward. It’s about being able to go back.

The really mature teams take it further with Slim CI. Instead of rebuilding everything on every PR, they store the production manifest.json in S3 and compare it against the new build. Only modified models and their downstream dependencies get rebuilt:

# .github/workflows/dbt-ci.yml
name: dbt-ci
on:
  pull_request:
    types: [opened, reopened, synchronize]

jobs:
  slim-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download production manifest
        run: aws s3 cp s3://dbt-artifacts/manifest.json ./state/manifest.json

      - name: Run dbt build (modified models only)
        run: dbt build -s 'state:modified+' --defer --state ./state --target ci

One team I worked with reported a 30% reduction in compute costs just by splitting their monolithic CI job into targeted workflows.

Watch out for this: Protect massive tables with full_refresh = false in your dbt config. One accidental --full-refresh on a billion-row fact table can ruin your morning. Override it with a variable when you actually mean it: full_refresh = var("force_full_refresh", false).




2. Do you have a data catalog that people actually use?


The emphasis is on “actually use.” I’ve lost count of how many teams have shown me a data catalog, only for me to discover that nobody’s opened it in months. The data catalog is the gym membership of the data world — everyone has one, nobody goes.

The failure mode is almost always the same: a team selects a tool, loads metadata, declares victory, and moves on. Six months later it’s a ghost town of stale descriptions and orphaned tables.

What good looks like

dbt Docs generated and hosted somewhere accessible. Your major tables have descriptions in schema.yml. The lineage graph exists and gets pulled up occasionally during incident response or onboarding. It’s not perfect, but it’s there.

What amazing looks like

The catalog is the default entry point for data questions — not Microsoft Teams, not the data engineer sitting three desks over.

Spotify’s internal tool Lexikon pushed data scientist adoption from 75% to 95% by doing something clever: they added personalised dataset recommendations, people/team pages showing who uses each dataset, common joins, and popular fields. It became a top-five internal tool because it was genuinely faster than asking a colleague.

Airbnb’s Metis serves over 1,000 data users weekly. Their secret? A Google-like search interface that works for every skill level, from SQL-fluent engineers to product managers who’ve never written a query.

The pattern that actually works for smaller teams: author descriptions in whatever UI is easiest, auto-generate a PR back to dbt’s schema.yml daily, review in Git. This keeps your definitions version-controlled without making people learn Git just to document a table.

And here’s the leading indicator that your catalog is working: track the decline in Microsoft Teams questions about data. Productboard used exactly this metric. Fewer “which table do I use for revenue?” messages meant the catalog was earning its keep.




3. Can a new analyst find the data they need without asking you?


This is the catalog question’s evil twin. A catalog helps, but self-service discovery is really about the cumulative effect of naming conventions, documentation-as-code, semantic layers, and project structure.

dbt Labs published guidance that should be printed and taped to every data engineer’s monitor: “Assume your end-user will have no other context than the model name.” Model names persist across databases, BI tools, DAGs, and docs. Folder names and schemas don’t follow the data the same way.

What good looks like

You’re using the canonical dbt naming convention: stg_ for staging, int_ for intermediate, fct_ for facts, dim_ for dimensions. Your project follows the three-layer structure — staging (1:1 with sources, materialised as views), intermediate (composable business logic), and marts (final business-ready datasets organised by domain). A reasonably technical person can navigate the DAG and find what they need within ten minutes.
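Laid out on disk, that three-layer structure looks something like this (folder and model names are illustrative):

```text
models/
├── staging/                    # 1:1 with sources, materialised as views
│   └── stripe/
│       ├── stg_stripe__customers.sql
│       └── stg_stripe__payments.sql
├── intermediate/               # composable business logic
│   └── finance/
│       └── int_payments__pivoted_to_orders.sql
└── marts/                      # business-ready, organised by domain
    ├── finance/
    │   ├── fct_orders.sql
    │   └── dim_customers.sql
    └── marketing/
        └── fct_campaign_performance.sql
```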

What amazing looks like

A new analyst finds and understands any dataset within minutes, without messaging anyone.

Every model has both table-level and column-level descriptions. Schema field search works across all tables — “find every table with customer_id” returns results instantly. Personalised dataset recommendations surface relevant tables based on role.

And the crown jewel: a semantic layer that lets business users query metrics in plain language without writing SQL. The dbt Semantic Layer (powered by MetricFlow) centralises metric definitions in YAML alongside your models — entities, dimensions, and measures declared once, then consumed via API in Tableau, Power BI, Looker, Python notebooks, and increasingly, AI agents.

Bilt Rewards reported an 80% decrease in data costs after implementing it for embedded analytics. And dbt Labs’ testing showed 83% of natural-language questions answered correctly when routed through the semantic layer. That last number matters more than you’d think — it’s the difference between AI tooling that actually works and AI tooling that confidently gives the wrong answer.




4. Do you test transformations before deploying them?


This is probably the single highest-impact practice on the list. If I could only get a team to adopt one of these twelve, it would be this one.

But here’s what most teams get wrong: they think “testing” means slapping not_null on a few columns and calling it done. Real testing means understanding the dimensions of data quality — completeness, uniqueness, timeliness, validity, accuracy, consistency — and building checks that cover each one deliberately. I’ve written about why data quality is a deeper problem than most teams realise, and the short version is this: testing in CI is where quality becomes a habit rather than a hope. But only if your tests are actually measuring the things that matter.

The consensus across every practitioner I’ve studied is clear: CI for dbt is the most impactful thing you can do for data quality. Three complementary testing layers form a comprehensive strategy: generic dbt tests (not_null, unique, accepted_values, relationships), unit tests (introduced in dbt v1.8 for testing complex business logic with static inputs), and data diffing for value-level comparison between production and development.
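The middle layer deserves a concrete example. A dbt unit test (v1.8+) pins business logic against static inputs, entirely in YAML. A sketch with hypothetical model and column names:

```yaml
unit_tests:
  - name: test_is_cancelled_flag
    description: "Orders with a cancellation timestamp are flagged as cancelled."
    model: fct_orders            # hypothetical model
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, cancelled_at: null}
          - {order_id: 2, cancelled_at: "2024-05-01 09:00:00"}
    expect:
      rows:
        - {order_id: 1, is_cancelled: false}
        - {order_id: 2, is_cancelled: true}
```

Because the inputs are static, the test runs fast in CI and fails the moment someone breaks the case-when logic, no production data required.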

What good looks like

Basic dbt tests on key models. Tests run as part of scheduled production jobs using dbt build (which runs tests immediately after each model, not in a separate pass). Pull requests exist for dbt changes, and someone eyeballs them before merging.

What amazing looks like

Full Slim CI with per-PR isolated Snowflake schemas, data diffing integrated into PR comments, and automated linting that catches style issues before a human ever looks at the code.

The numbers from real teams are staggering. Thumbtack — 50-plus analysts, five data engineers, over 100 PRs per month — previously spent one to two hours manually validating each pull request with SQL queries and spreadsheets. After integrating data diffing into their GitHub CI pipeline, they saved over 200 hours per month. That’s not a rounding error. That’s a full-time engineer’s worth of capacity recovered.

Dutchie caught timezone corruption on created_at fields and case-when logic errors that were silently shifting 20% of data between columns. Nutrafol caught a transformation that would have shown net revenue plummeting — before it reached production and before the CFO’s Monday morning dashboard refreshed.

Zscaler went even further. They built PRISM, a multi-agent AI PR review system that reduced manual review time by 90% — auto-approving conformant PRs and posting targeted comments on complex logic changes.

Here’s what the mature pipeline looks like in practice:

# On every PR: lint, compile, build modified models, diff against prod
jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: SQLFluff lint
        run: sqlfluff lint models/ --dialect snowflake

      - name: dbt compile
        run: dbt compile

      - name: Slim CI build
        run: dbt build -s 'state:modified+' --defer --state ./state --target ci

      - name: Data Diff
        run: datafold ci submit --ci-config-id ${{ secrets.DATAFOLD_CI_CONFIG }}



5. Do you fix data quality issues before building new pipelines?


Here’s an uncomfortable stat: Monte Carlo’s 2023 State of Data Quality report found that 74% of organisations reported that business stakeholders identify data quality issues first — up from 47% the prior year. Most data teams learn about broken data from the people who are supposed to trust it. That’s not a technology problem. That’s a culture problem.

What good looks like

Basic dbt tests catch obvious issues. Tests run in production jobs. The team has an informal sense of which data matters most, usually earned through painful experience — someone got burned by bad revenue numbers in an executive review, and now those tables have tests.

What amazing looks like

Data quality SLAs enforced with error budgets, borrowed from the SRE playbook. A 99.5% data availability target allows approximately 3.6 hours of acceptable downtime per month. When the error budget is consumed, reliability work takes priority over feature delivery. Full stop. No new pipelines until you’ve earned back your quality margin.
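The arithmetic behind an error budget is worth making explicit. A minimal sketch:

```python
# Sketch: translate an availability SLO into a monthly downtime budget.
def downtime_budget_hours(slo: float, days_in_month: int = 30) -> float:
    """Hours of acceptable downtime per month for a given availability target."""
    return (1 - slo) * days_in_month * 24

# A 99.5% target leaves roughly 3.6 hours of slack in a 30-day month.
print(round(downtime_budget_hours(0.995), 1))  # 3.6
```

Once that number is written down, "are we inside our error budget this month?" becomes a yes/no question rather than a negotiation.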

Warner Bros. Discovery created something they call a Data Quality Forum — not a reactive incident-review meeting, but an operational bridge between teams. They used observability tooling to surface anomalies early, developed a priority matrix (P0/P1/P2), and made the forum’s patterns part of new hire onboarding. During Olympics livestreaming, custom SQL checks detected missing content metadata before it could break reporting.

Snowflake now offers native Data Metric Functions (FRESHNESS, NULL_COUNT, DUPLICATE_COUNT, UNIQUE_COUNT, ROW_COUNT) that can be scheduled to run on DML changes or time intervals. Results land in SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS. It’s not a replacement for dbt tests, but it provides a warehouse-level safety net that catches issues your transformation layer might miss.

The cultural shift matters more than the tooling. HelloFresh’s VP of Data drove a transformation through three stages: ad-hoc individual fixes, organised cleanup by a central team, and finally proactive quality at the source. The final stage required embedding data product owners within business domains and running data literacy programs. It’s not glamorous work. But it’s the work that changes outcomes.




6. Do you have SLAs for your critical tables?


“The dashboard is stale” is the Slack message that launches a thousand fire drills. SLAs turn that reactive scramble into a measured, prioritised response.

The breakthrough insight from practitioners: don’t aim for 100%. A 99.5% target gives you a realistic buffer for maintenance, edge cases, and the occasional Snowflake service hiccup. Different data products warrant different targets — which means you need a tiered classification system.

What good looks like

Your team knows which tables are critical, usually because they’ve been burned before. Some dbt source freshness checks are running. Basic Slack alerts fire when jobs fail. It’s reactive, but at least you know when things break.

What amazing looks like

A formal tiered classification published in the data catalog with specific, measurable commitments:

  • Tier 1 (Gold): ML systems, revenue reporting → PagerDuty on-call, must have unique + not_null tests + assigned owner, freshness SLA under 4 hours
  • Tier 2 (Silver): Executive dashboards, KPI reports → Slack team channel alerts, owner assigned, freshness SLA under 12 hours
  • Tier 3 (Bronze): Ad-hoc analytics, exploration tables → weekly digest, freshness SLA under 24 hours

The implementation lives in dbt meta config:

models:
  - name: fct_revenue
    meta:
      owner: "finance-data-team"
      criticality: "tier_1"
      sla_freshness: "4 hours"
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

SLA compliance dashboards track attainment percentage, breach counts, and trends over time. When the error budget is exhausted, feature development freezes — same principle as practice #5. Dedicated Snowflake warehouses isolate critical workloads from exploratory queries so that someone’s SELECT * on a billion-row table doesn’t cause your revenue report to miss its 8 AM deadline.




7. Do you have a single source of truth for business definitions?


Everyone’s had this conversation: “Which revenue number is right?” Two dashboards, two numbers, two teams who each think the other is wrong. This is what happens when metric definitions live inside BI tools, tribal knowledge, and the head of that one analyst who’s been here since 2019. If you’ve ever worked with dimensional models, you’ll recognise this as the conformed dimension problem — I covered why dimensional modeling still matters and how conformed dimensions are the integration backbone that prevents exactly this kind of mess.

What good looks like

Key metrics are defined in dbt docs or a wiki. The team knows which mart table is the canonical source for revenue, orders, or whatever your core entities are. When someone asks “which table do I use?”, there’s a consistent answer — even if it’s only communicated verbally.

What amazing looks like

A centralised semantic layer enforced as the only path to metrics. All definitions are version-controlled in dbt YAML, reviewed via PR, and validated in CI.

Here’s what a metric definition looks like in MetricFlow:

metrics:
  - name: cancellation_rate
    description: "Percentage of orders cancelled within 24 hours of placement"
    type: ratio
    type_params:
      numerator: cancellations
      denominator: order_total

That definition gets consumed identically whether someone queries it from Tableau, Power BI, a Python notebook, or an AI agent. One definition. One answer. Everywhere.

Whatnot — the livestream marketplace growing at breakneck speed — solved a different version of this problem. They used Protobuf schemas as a single source of truth for event definitions, consolidating hundreds of chaotic Snowflake tables down to two clean “exposure” tables: backend_events and frontend_events. Brutal simplification. But it worked because everyone could find the data.

dbt sl validate in CI catches breaking changes to semantic definitions before merge. That’s the key differentiator — your metric definitions aren’t just documented, they’re enforced.
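In CI, that enforcement is a single step. A sketch, assuming the dbt Cloud CLI is available in the runner:

```yaml
steps:
  - name: Validate semantic definitions
    run: dbt sl validate
```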




8. Can you explain the lineage of any metric in under 2 minutes?


When your CFO asks “where does this number come from?”, you need an answer faster than “let me check and get back to you.” Two minutes is generous. In most incident situations, you’ve got about thirty seconds before people start making assumptions.

What good looks like

The dbt DAG is viewable, hosted on S3 or an internal site. Your team can trace major mart tables back to their sources. During incident response, someone pulls up the lineage graph and walks through the chain. It takes some squinting, but the information is there.

What amazing looks like

Column-level lineage automated across the full stack — from source systems through dbt transformations to the BI dashboards consumers see.

dbt Cloud provides column-level lineage showing whether each column is “transformed” versus “passthrough/rename.” On the Snowflake side, the ACCESS_HISTORY view (Enterprise Edition) tracks both read and write operations with column-level mappings. Snowflake Labs published an open-source adapter that converts ACCESS_HISTORY data into OpenLineage JSON format, making cross-platform lineage possible.

But here’s where it gets practical. The open-source tool Recce compares two dbt environments and produces a “Lineage Diff” in CI. It categorises changes as breaking, partial-breaking, or non-breaking. PR reviewers get an instant risk assessment: “this change affects 47 downstream models, including three Tier 1 dashboards” versus “this change is isolated to a staging model with no downstream consumers.”

PRs get annotated automatically with impact analysis. Cross-project lineage via dbt Mesh connects multiple dbt projects. Lineage isn’t a pretty graph that nobody looks at — it’s used daily for onboarding, incident response, impact analysis, and cost optimisation.




9. Do data producers know when they break downstream consumers?


This is where data engineering starts borrowing seriously from software engineering. In a microservices world, you wouldn’t deploy an API change without knowing who calls your endpoint. But in data, teams routinely change source schemas, modify column semantics, or sunset tables with zero awareness of who’s consuming them downstream.

Data contracts change that dynamic.

What good looks like

dbt sources are defined with dbt source freshness running and alerting. The team is aware of major upstream dependencies. When Fivetran’s sync breaks or a source system changes its schema, someone notices within a few hours — usually because a test fails.

What amazing looks like

dbt model contracts (v1.5+) enforced on all gold-layer and public models:

models:
  - name: dim_users
    config:
      contract:
        enforced: true
    columns:
      - name: user_id
        data_type: int
        constraints:
          - type: primary_key
      - name: email
        data_type: varchar
        constraints:
          - type: not_null

This is a preflight check: dbt verifies the model’s compiled SQL returns columns matching the contract before building. Breaking changes — removed columns, changed data types, modified constraints — are caught by state:modified in CI. Model access levels (public, protected, private) control cross-project visibility.

Whatnot went further with Protobuf schemas enforced via Buf linter in CI. Every event producer runs against a common testing harness. Post-deployment monitoring catches semantic drift — when a field still exists but its meaning has changed.

Gotcha worth knowing: dbt contracts validate the compiled SQL, not the actual Snowflake table. If tools like Fivetran add columns directly to your raw layer, dbt won’t know until the next run. Monitor schema changes externally using Snowflake’s INFORMATION_SCHEMA or ACCESS_HISTORY to catch drift between runs.




10. Do you do code review on SQL and dbt models?


I’m constantly surprised by how many data teams still deploy SQL changes without review. In software engineering, unreviewed code going to production would be considered reckless. In data engineering, it’s Tuesday.

What good looks like

Pull requests are required for all dbt changes — branch protection is enabled on main. At least one human reviewer looks at each PR. There’s a basic PR description explaining what changed and why.

What amazing looks like

Automated CI running SQLFluff lint + dbt compile + dbt build + data diff on every PR. Pre-commit hooks catch issues before code is even committed. PR templates with structured sections: description, linked tickets, impact zone, testing evidence.

The recommended pre-commit stack combines SQLFluff for linting with dbt-checkpoint for structural validation:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.1.1   # pin to a released tag
    hooks:
      - id: sqlfluff-lint
        args: [--dialect, snowflake]

  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.1  # pin to a released tag
    hooks:
      - id: check-model-has-tests
      - id: check-model-columns-have-desc
      - id: check-source-has-freshness

That last hook — check-model-columns-have-desc — is quietly revolutionary. It means documentation isn’t optional. You literally cannot merge a model without column descriptions. The “we’ll document it later” excuse dies right there in the CI pipeline.

Surfline (700+ dbt models) reported that after integrating SQLFluff, SQL consistency improved dramatically and reviewer burden dropped. New engineers learned “good SQL” from day one because the linter enforced it automatically. Another team, Markerr, found that automated style enforcement freed up review time to focus on deeper logic questions rather than arguing about capitalisation and trailing commas.

The dbt_project_evaluator package is worth mentioning too — it audits your entire DAG structure against dbt Labs’ published best practices. Run it in CI and it’ll flag things like models that reference sources directly (bypassing staging), duplicate sources, or models with no tests.
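Running it takes nothing more than a selector (assuming the package is installed via packages.yml):

```yaml
steps:
  - name: Audit project structure
    run: dbt build --select package:dbt_project_evaluator
```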




11. Do new hires build a real pipeline in their first week?


The speed at which a new hire becomes productive tells you everything about the state of your documentation, tooling, and team culture. If it takes three weeks before someone can make a meaningful contribution, you don’t have an onboarding problem — you have a platform problem. I wrote a guide to navigating your first 90 days as a data engineer — and the teams that make those first 90 days count are almost always the ones who’ve invested in the infrastructure described here.

What good looks like

New hire has access to tools within a day or two. Some documentation exists. A buddy or mentor is assigned. They can run the dbt project locally by end of week one, even if they haven’t contributed anything yet.

What amazing looks like

Pre-arrival setup is complete before Day 1: machine provisioned, logins ready, Snowflake roles assigned, dbt Cloud account active. No engineer should spend their first morning installing things.

Each developer gets a personal Snowflake sandbox: a dbt_<username> schema in the dev database, with write access only to dev and read-only access to raw/staging. Snowflake’s zero-copy cloning makes production data available instantly without duplicating storage costs. Productboard open-sourced dbt-snowflake-sandbox — a set of dbt macros that create isolated sandboxes by cloning only the specific model dependencies a developer needs.

The first-week project is structured and progressive:

  1. Day 1-2: Run the existing dbt project locally. Explore the DAG. Read the key model documentation.
  2. Day 3: Add a new source or staging model for a real (but low-risk) dataset.
  3. Day 4: Build a staging model with tests and documentation.
  4. Day 5: Open a PR, go through the review process, and get it merged.

By Friday, they’ve shipped something real. They understand the workflow. They’ve experienced CI, code review, and deployment. And critically, they feel like a contributor rather than a tourist.

7shifts estimated cutting onboarding time by over a week per new hire just by having documentation and data questions accessible in their data catalog.




12. Do you regularly talk to the people who actually use your data?


This is the question that separates data teams who build for their portfolio from data teams who build for their business. You can nail every technical practice on this list and still fail if you’re building the wrong things. I’ve written before about how to maximise your data team’s impact — and the through-line is always the same: the teams that create real value are the ones who stay close to the people consuming their work.

What good looks like

A Slack channel exists for data questions. Communication happens when issues arise. There are occasional stakeholder meetings, usually prompted by something breaking or a new request coming in.

What amazing looks like

Data office hours. Weekly or bi-weekly open sessions where anyone in the organisation can bring data questions. Not a presentation. Not a status update. An open door.

Holistics published a detailed playbook for running what they call “data clinics.” The core principles: teach, don’t serve. Show business users how to self-serve rather than just answering their question and sending them on their way. Montreal Analytics reported that after implementing data clinics, the number of self-serve business users on their BI tool grew tenfold. Tenfold.

Pair that with a monthly Data NPS survey — a single question: “How likely are you to recommend our data team’s products to a colleague?” It sounds corporate, but without this metric, you have no way to quantify whether your consumers are satisfied or just silently building workarounds in Excel.

dbt exposures formalise the connection between data models and their consumers:

exposures:
  - name: weekly_revenue_dashboard
    type: dashboard
    owner:
      name: Finance Team
      email: finance@company.com
    depends_on:
      - ref('fct_revenue')
      - ref('dim_date')

When a model changes, the PR shows which exposures are affected. Stakeholder impact becomes visible in code review. You can proactively notify the finance team that their revenue dashboard’s upstream model is changing before they discover it themselves.

Frame your communication in business terms. Not “model precision is up 12%” but “the sales team can now identify high-value leads 40% faster.” Nobody outside your team cares about your DAG. They care about whether they can trust the numbers. If you’re not sure how to bridge that gap, I wrote a guide on breaking down business context — because the hardest part of talking to stakeholders isn’t the talking, it’s knowing what they actually need to hear.




Scoring your team


Here’s the scoring framework, and I want you to be ruthless with yourself:

  • 10-12: You’re in elite territory. Your data platform is a competitive advantage.
  • 7-9: You’ve got strong foundations with clear areas to improve.
  • 4-6: You’re functional but fragile. One bad incident away from a crisis of trust.
  • 1-3: You’re firefighting, not engineering. The business tolerates your team — it doesn’t trust it.

Most teams land at 3-4. That’s not a criticism — it’s where the industry is. The gap between knowing these practices exist and actually implementing them is where the real work lives.




Where to start


If you’re staring at a low score and feeling overwhelmed, here’s what I’d suggest: start with practices 4 and 10. CI testing and code review.

Here’s why. Once dbt changes go through pull requests with automated checks, you have the infrastructure to enforce everything else. Contracts? They’re a CI check. Quality gates? CI check. Documentation requirements? CI check. SLA validation? CI check. You’re not adopting twelve practices — you’re building one pipeline that gates on twelve things.

The canonical architecture appears across virtually every mature team I’ve studied: GitHub Actions triggering Slim CI with dbt build -s state:modified+ against per-PR Snowflake schemas, with production manifests stored in S3 for state comparison. Start there. Layer on the remaining practices as your team’s maturity grows.

Three patterns emerged from the research that I think are worth calling out explicitly:

“As code” wins everywhere. Documentation-as-code, style-guides-as-code, permissions-as-code, quality-checks-as-code, contracts-as-code. Every manual process that gets codified becomes version-controlled, reviewable, and enforceable. Every one that stays manual eventually drifts.

Snowflake’s zero-copy cloning is the force multiplier. Isolated PR environments, sandbox onboarding, rebuild testing — all of these become cheap and instant with cloning. If you’re on Snowflake and not using this feature aggressively, you’re leaving the most powerful tool in the shed.

Culture matters more than tooling. Catalog adoption fails without meeting users in their existing workflows. Data quality requires executive sponsorship and error budgets, not just more tests. Producer-consumer contracts are first and foremost a cultural change. You can buy every tool on this list and still score a 3 if the organisation doesn’t value the practices behind them.




The diagram and the warehouse


Remember that team — the one with the beautiful diagram and the revenue table nobody could rebuild?

I caught up with one of the engineers about four months after I’d moved on. They’d started with exactly what I’d pushed for: CI testing and code review. Then they added contracts on their gold-layer models. Then SLAs on their Tier 1 tables. Then a first-week onboarding project for new hires.

Their score when I first asked these twelve questions? Three. Four months later? Eight. Not perfect. But the difference wasn’t really the number. The difference was that when they pulled up that architecture diagram now, it matched what was actually running in Snowflake. The gap between the wall and the warehouse had closed.

That’s what this test is really measuring. Not whether you have the right tools — you probably do. Not whether you know the right practices — you’re reading this, so clearly you care. It’s measuring whether there’s a gap between what you think your data platform looks like and what it actually looks like.

These twelve questions just make you say it out loud.

So print it. Score yourself honestly. Share it with your team. And the next time someone asks you how your data platform is doing, give them a number between 1 and 12.

That number is worth more than any architecture diagram.