It was a Tuesday afternoon when the analyst pinged me on Microsoft Teams: “Hey, the Total Portfolio numbers just jumped 40% overnight. Did we land a whale?”

We hadn’t.

What actually happened was more mundane and significantly more painful. A schema change in the source system introduced a currency conversion bug. Our pipeline dutifully loaded the corrupted data into production at 3 AM, the dashboards updated by 6 AM, and the Department Head opened her morning report to numbers that looked like champagne-worthy growth.

Here’s the thing — we had data quality tests. They ran after the data hit production. A classic case of closing the gate after the horse has bolted, kicked over the fence, and trampled the neighbour’s garden.

That incident changed how I think about data pipelines entirely. It led me to the Write-Audit-Publish pattern — a concept championed heavily by Zach Wilson. Zach frames it simply: write to a staging area with an identical schema to production, run your data quality checks, and only publish to production if they pass. It’s deceptively simple. And it works.

I’ve written about WAP before using Airflow as the orchestration layer, with traditional staging tables and swap mechanics. That approach is solid, battle-tested, and it’ll serve most teams well. But now that Apache Iceberg tables in Snowflake have matured, there’s a more elegant way to implement WAP, one that eliminates the pattern’s biggest practical headache: the I/O overhead of the publish step.

In this tutorial, we’re going to build a complete WAP test harness from scratch on Snowflake Iceberg tables. We’ll create synthetic data, simulate a change that introduces bad data, catch it with quality gates, and publish only the clean data to production. Along the way, I’ll show you why the zero-copy mechanics of Iceberg’s architecture make the publish step insanely fast — and why that speed changes the calculus on when WAP is worth the effort.




Why the Publish Step Is Where Traditional WAP Hurts


If you’ve worked in data engineering for more than a few months, you’ve probably built some version of WAP without calling it that. You load data into a staging table, run some checks, then copy the good stuff into production. That works. Sort of.

The problem is what happens at the publish step. In a traditional staging-table WAP implementation, publishing means physically moving data. You’re running an INSERT INTO production SELECT * FROM staging, or you’re doing a table swap that involves Snowflake creating new micro-partitions. Either way, you’re generating I/O. You’re writing data that already exists in a perfectly good format, to another location, in another format, just so it lives under the right table name.

For a table with a few million rows, nobody notices. For a table with a few billion rows partitioned across hundreds of files? That publish step becomes a pipeline bottleneck. I’ve seen WAP implementations where the audit takes 30 seconds and the publish takes 45 minutes. The quality gate is fast. The bureaucratic data shuffle is slow.

This is where Iceberg changes the equation.

Iceberg tables don’t store data inside Snowflake’s proprietary micro-partition format. The data lives as Parquet files in your external storage — S3, GCS, or Azure Blob. Snowflake manages the metadata that tells it which Parquet files belong to which snapshot of the table. When you do a CLONE of an Iceberg table in Snowflake, it creates a new metadata pointer to the same underlying Parquet files. No data moves. No files are copied. No I/O is generated.

And when you SWAP a validated clone back into production? Again — it’s a metadata operation. Snowflake updates which metadata pointer the production table name resolves to. The Parquet files don’t move. The snapshots don’t change. The swap is essentially instant regardless of table size.

Let me put that in concrete terms. Say you have a 500 GB Iceberg table with three years of order history. In a traditional WAP setup, publishing validated staging data means writing hundreds of gigabytes to new micro-partitions — that’s real compute cost, real wall-clock time, and a real window where downstream consumers might see partial data. With Iceberg’s zero-copy clone and swap, the same publish step takes seconds. Not minutes. Seconds. Because you’re flipping a pointer, not copying bytes.
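The pointer-flip idea is easy to model. Here’s a toy sketch in Python — the `catalog` dict and its structure are purely illustrative, not Snowflake internals — showing why a copy-based publish scales with file count while a swap-based publish is a constant-cost pointer exchange:

```python
# Toy model of a table catalog: a table name resolves to a metadata
# pointer, which lists the data files belonging to the current snapshot.
# Names and structure are illustrative, not Snowflake internals.
catalog = {
    "fct_orders":         {"files": [f"part-{i}.parquet" for i in range(1000)]},
    "fct_orders_staging": {"files": [f"part-{i}.parquet" for i in range(1000)]
                           + ["part-new.parquet"]},
}

def publish_by_copy(catalog, src, dst):
    # Traditional publish: rewrite every file under the production name.
    # Cost grows with the number of files being copied.
    catalog[dst]["files"] = list(catalog[src]["files"])

def publish_by_swap(catalog, src, dst):
    # Iceberg-style publish: exchange the two metadata pointers.
    # Constant cost regardless of how many files each snapshot holds.
    catalog[src], catalog[dst] = catalog[dst], catalog[src]

publish_by_swap(catalog, "fct_orders_staging", "fct_orders")
print(len(catalog["fct_orders"]["files"]))  # 1001: production now sees the new file
```

Both functions leave production pointing at the validated data; only one of them touches every file to get there.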

This isn’t a marginal improvement. It fundamentally changes when WAP is worth implementing. When publish is expensive, you think twice about adding WAP to every pipeline. When publish is free, the question flips — why wouldn’t you gate every production write with quality checks?




The Conceptual Model: How WAP Works on Iceberg


Here’s the flow we’re going to build. Every step maps to actual SQL you can run in Snowflake today:

1. WRITE: zero-copy CLONE production (fct_orders) into a staging table, then load the fresh batch into the clone. Production stays untouched.
2. AUDIT: run the full battery of data quality checks (plain SQL) against the staging clone.
3. PUBLISH: if every check passes, SWAP the clone into production as an atomic metadata operation. If anything fails, quarantine or drop the clone — production never sees the bad data.

The beauty of this model is that production is never exposed to unvalidated data. The clone is a sandbox. Downstream consumers querying fct_orders see the same clean data they saw yesterday, right up until the moment we swap — and the swap is atomic. There’s no window of inconsistency. No partial writes. No “oops, the pipeline failed halfway through the publish.”
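The control logic of that flow fits in a few lines. A minimal Python sketch — the table objects and check functions here are illustrative stand-ins for the SQL in this tutorial:

```python
# Minimal sketch of the WAP control flow. Tables are modeled as lists of
# rows; the checks are stand-ins for the SQL audit battery.
def run_wap(production: list[dict], new_batch: list[dict], checks) -> list[dict]:
    # WRITE: "clone" production and load the new batch into the clone.
    staging = production + new_batch          # production itself is untouched
    # AUDIT: run every quality check against the staging copy.
    failures = [name for name, check in checks if not check(staging)]
    # PUBLISH or QUARANTINE: swap only if every check passed.
    if failures:
        return production                     # clone dropped, production unchanged
    return staging                            # atomic "swap": staging becomes production

checks = [
    ("no_null_amounts",     lambda rows: all(r["total_amount"] is not None for r in rows)),
    ("no_negative_amounts", lambda rows: all(r["total_amount"] >= 0 for r in rows)),
]
prod = [{"total_amount": 42.0}]
bad_batch = [{"total_amount": -5.0}]
print(run_wap(prod, bad_batch, checks) == prod)  # True: the bad batch was blocked
```

Note that the failure path simply returns the original production data; nothing downstream ever observes the rejected batch.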




Setting Up the Foundation


Let’s build this thing. You’ll need a Snowflake account with access to create Iceberg tables, which means an external volume pointing to your cloud storage and the appropriate privileges.

Prerequisites

-- Create our tutorial workspace
CREATE DATABASE IF NOT EXISTS wap_tutorial;
CREATE SCHEMA IF NOT EXISTS wap_tutorial.iceberg_wap;
USE DATABASE wap_tutorial;
USE SCHEMA iceberg_wap;

-- Create external volume (adjust for your cloud provider)
-- You'll need to configure the IAM trust relationship first
-- See Snowflake docs for the cloud-specific IAM setup
CREATE OR REPLACE EXTERNAL VOLUME wap_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'wap-iceberg-storage'
      STORAGE_BASE_URL = 's3://your-bucket/iceberg-wap/'
      STORAGE_PROVIDER = 'S3'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-iceberg-role'
    )
  );

-- Verify the volume is healthy
DESC EXTERNAL VOLUME wap_ext_vol;

If the external volume setup feels unfamiliar, don’t stress — the Snowflake documentation for this is thorough. The IAM configuration is the fiddly part, but it’s a one-time cost. Once the volume exists, everything downstream is straightforward SQL.




Step 1: WRITE — Creating the Iceberg Table and Synthetic Data


We’re going to simulate an e-commerce order pipeline. This is a scenario practically every data engineer has encountered — daily order data flowing from a transactional system into the warehouse.

First, create the Iceberg table using Snowflake as the catalog:

-- Create the Iceberg table
-- Using Snowflake as the catalog means Snowflake handles
-- all lifecycle maintenance (compaction, metadata updates)
CREATE OR REPLACE ICEBERG TABLE fct_orders (
    order_id        INT,
    customer_id     INT,
    order_date      DATE,
    product_id      INT,
    quantity        INT,
    unit_price      DECIMAL(10,2),
    currency_code   VARCHAR(3),
    total_amount    DECIMAL(12,2),
    order_status    VARCHAR(20),
    region          VARCHAR(50),
    loaded_at       TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'wap_ext_vol'
  BASE_LOCATION = 'fct_orders/';

Now let’s generate our initial “production” data. This represents the clean, trusted baseline that’s been accumulating for the past 30 days. Five thousand orders, all in USD, no nulls, no negative amounts — a pristine starting point:

-- Generate 30 days of clean, synthetic order data
-- This is our trusted production baseline
INSERT INTO fct_orders (
    order_id, customer_id, order_date, product_id,
    quantity, unit_price, currency_code, total_amount,
    order_status, region
)
SELECT
    ROW_NUMBER() OVER (ORDER BY seq) + 10000 AS order_id,
    UNIFORM(1, 500, RANDOM()) AS customer_id,
    DATEADD('day', -UNIFORM(0, 29, RANDOM()), CURRENT_DATE()) AS order_date,
    UNIFORM(1, 50, RANDOM()) AS product_id,
    quantity,
    unit_price,
    'USD' AS currency_code,
    -- total_amount = quantity * unit_price (clean, correct calculation,
    -- derived from the same values the row actually carries)
    ROUND(quantity * unit_price, 2) AS total_amount,
    CASE UNIFORM(1, 5, RANDOM())
        WHEN 1 THEN 'PENDING'
        WHEN 2 THEN 'CONFIRMED'
        WHEN 3 THEN 'SHIPPED'
        WHEN 4 THEN 'DELIVERED'
        ELSE 'CANCELLED'
    END AS order_status,
    CASE UNIFORM(1, 5, RANDOM())
        WHEN 1 THEN 'APAC'
        WHEN 2 THEN 'EMEA'
        WHEN 3 THEN 'AMERICAS'
        WHEN 4 THEN 'LATAM'
        ELSE 'ANZ'
    END AS region
FROM (
    -- Compute quantity and unit_price once per row so the total can be
    -- derived from them rather than from fresh random draws
    SELECT
        SEQ4() AS seq,
        UNIFORM(1, 10, RANDOM()) AS quantity,
        ROUND(UNIFORM(5.00, 200.00, RANDOM())::DECIMAL(10,2), 2) AS unit_price
    FROM TABLE(GENERATOR(ROWCOUNT => 5000))
);

Let’s verify our baseline is healthy before we mess it up:

-- Quick sanity check on our production data
SELECT
    COUNT(*) AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    MIN(order_date) AS earliest_order,
    MAX(order_date) AS latest_order,
    ROUND(AVG(total_amount), 2) AS avg_order_value,
    COUNT(CASE WHEN total_amount <= 0 THEN 1 END) AS negative_amounts,
    COUNT(CASE WHEN currency_code != 'USD' THEN 1 END) AS non_usd_orders
FROM fct_orders;

You should see 5,000 orders, zero negative amounts, and zero non-USD orders. This is clean production data. Remember these numbers — because we’re about to introduce some chaos.




Simulating the Change: When Bad Data Arrives


Now the realistic part. A new batch of 500 orders arrives from the source system. But this batch has problems — the kind of subtle, infuriating problems that slip past casual inspection but wreck downstream analytics.

First, we create the zero-copy clone. This is our staging sandbox:

-- Create a zero-copy clone of production
-- This is a METADATA operation — no Parquet files are copied
-- The clone shares the same underlying data files as production
CREATE OR REPLACE ICEBERG TABLE fct_orders_staging
  CLONE fct_orders;

That CLONE just happened in under a second regardless of whether your table is 5,000 rows or 500 million. No data moved. The staging clone points to the same Parquet files in your external storage. Snowflake just created a new metadata entry that says “this table currently looks like that table.”

Now let’s load the problematic batch into our staging clone:

-- Simulate incoming data with realistic bugs:
-- 1. Some NULL currency codes (source system migration issue)
-- 2. Some invalid currency codes (mapping error)
-- 3. Some negative total_amounts (conversion bug)
-- 4. Some EUR amounts that weren't actually converted
INSERT INTO fct_orders_staging (
    order_id, customer_id, order_date, product_id,
    quantity, unit_price, currency_code, total_amount,
    order_status, region
)
SELECT
    ROW_NUMBER() OVER (ORDER BY seq) + 20000 AS order_id,
    UNIFORM(1, 500, RANDOM()) AS customer_id,
    CURRENT_DATE() AS order_date,
    UNIFORM(1, 50, RANDOM()) AS product_id,
    quantity,
    unit_price,
    -- HERE'S THE BUG: some orders arrive with wrong currency codes.
    -- A single roll per row keeps the branch rates at 85/5/5/5
    CASE
        WHEN currency_roll <= 85 THEN 'USD'
        WHEN currency_roll <= 90 THEN NULL        -- Null currency!
        WHEN currency_roll <= 95 THEN 'INVALID'   -- Invalid code!
        ELSE 'EUR'  -- EUR amount but never converted to USD
    END AS currency_code,
    -- And ~10% of totals went negative due to a sign error
    ROUND(IFF(amount_roll <= 90, 1, -1) * quantity * unit_price, 2) AS total_amount,
    CASE UNIFORM(1, 5, RANDOM())
        WHEN 1 THEN 'PENDING'
        WHEN 2 THEN 'CONFIRMED'
        WHEN 3 THEN 'SHIPPED'
        WHEN 4 THEN 'DELIVERED'
        ELSE 'CANCELLED'
    END AS order_status,
    CASE UNIFORM(1, 5, RANDOM())
        WHEN 1 THEN 'APAC'
        WHEN 2 THEN 'EMEA'
        WHEN 3 THEN 'AMERICAS'
        WHEN 4 THEN 'LATAM'
        ELSE 'ANZ'
    END AS region
FROM (
    -- One roll per row per bug type so each CASE branch fires at the
    -- intended rate (a fresh draw in every WHEN would skew the odds)
    SELECT
        SEQ4() AS seq,
        UNIFORM(1, 10, RANDOM()) AS quantity,
        ROUND(UNIFORM(5.00, 200.00, RANDOM())::DECIMAL(10,2), 2) AS unit_price,
        UNIFORM(1, 100, RANDOM()) AS currency_roll,
        UNIFORM(1, 100, RANDOM()) AS amount_roll
    FROM TABLE(GENERATOR(ROWCOUNT => 500))
);

The write step is done. We have 500 new orders in our staging clone — some clean, some dirty. And critically, production is completely untouched. Anyone querying fct_orders right now sees exactly what they saw before. No Department Head is going to message you on Teams this morning about phantom Total Portfolio growth.




Step 2: AUDIT — The Quality Gate Battery


This is where WAP earns its keep. We run every check we care about against the staging clone. If anything fails, we either fix it or drop the clone and production never sees the bad data. As Zach puts it — downstream pipelines wait on production data to be ready. Not just present.

Let’s build a comprehensive audit battery. Each check targets a different failure mode:

Check 1: Null Detection on Critical Fields

Nulls in currency or amount fields are deal-breakers. A null currency means we can’t convert the amount. A null amount means we can’t aggregate Total Portfolio numbers. Either one poisons everything downstream.

SELECT
    'NULL_CHECK' AS check_name,
    COUNT(CASE WHEN currency_code IS NULL THEN 1 END) AS null_currency,
    COUNT(CASE WHEN total_amount IS NULL THEN 1 END) AS null_amount,
    COUNT(CASE WHEN order_id IS NULL THEN 1 END) AS null_order_id,
    CASE
        WHEN COUNT(CASE WHEN currency_code IS NULL THEN 1 END) > 0
          OR COUNT(CASE WHEN total_amount IS NULL THEN 1 END) > 0
          OR COUNT(CASE WHEN order_id IS NULL THEN 1 END) > 0
        THEN 'FAIL'
        ELSE 'PASS'
    END AS check_result
FROM
    fct_orders_staging
WHERE
    order_date = CURRENT_DATE();

Check 2: Valid Currency Codes

Only accepted ISO currency codes should appear. Anything else means a mapping went sideways upstream.

SELECT
    'CURRENCY_VALIDATION' AS check_name,
    COUNT(*) AS invalid_currency_count,
    LISTAGG(DISTINCT currency_code, ', ') AS invalid_values,
    CASE
        WHEN COUNT(*) > 0 THEN 'FAIL'
        ELSE 'PASS'
    END AS check_result
FROM
    fct_orders_staging
WHERE
    order_date = CURRENT_DATE()
    AND currency_code NOT IN ('USD', 'EUR', 'GBP', 'AUD', 'CAD', 'JPY')
    AND currency_code IS NOT NULL;

Check 3: No Negative Amounts

A negative total_amount on an order that isn’t a refund is a sign of a calculation bug. This is exactly the kind of error that inflated our Total Portfolio numbers that Tuesday morning.

SELECT
    'NEGATIVE_AMOUNT_CHECK' AS check_name,
    COUNT(*) AS negative_count,
    MIN(total_amount) AS worst_amount,
    CASE
        WHEN COUNT(*) > 0 THEN 'FAIL'
        ELSE 'PASS'
    END AS check_result
FROM
    fct_orders_staging
WHERE
    order_date = CURRENT_DATE()
    AND total_amount < 0;

Check 4: Duplicate Detection

Duplicate order IDs mean the source system sent the same batch twice, or our pipeline ran a retry without proper idempotency.

SELECT
    'DUPLICATE_CHECK' AS check_name,
    COUNT(*) - COUNT(DISTINCT order_id) AS duplicate_count,
    CASE
        WHEN COUNT(*) - COUNT(DISTINCT order_id) > 0 THEN 'FAIL'
        ELSE 'PASS'
    END AS check_result
FROM
    fct_orders_staging
WHERE
    order_date = CURRENT_DATE();

Check 5: Volume Anomaly Detection

If today’s batch is wildly larger or smaller than the historical daily average, something structural has changed. Maybe a filter broke, maybe a new source came online, maybe someone’s test data leaked into production.

WITH daily_volumes AS (
    SELECT
        order_date,
        COUNT(*) AS daily_count
    FROM
        fct_orders_staging
    WHERE
        order_date < CURRENT_DATE()
    GROUP BY
        order_date
),
today_volume AS (
    SELECT COUNT(*) AS today_count
    FROM
        fct_orders_staging
    WHERE
        order_date = CURRENT_DATE()
)
SELECT
    'VOLUME_ANOMALY' AS check_name,
    t.today_count,
    ROUND(AVG(d.daily_count), 0) AS avg_daily_count,
    CASE
        WHEN t.today_count > AVG(d.daily_count) * 3 THEN 'FAIL'
        WHEN t.today_count < AVG(d.daily_count) * 0.1 THEN 'FAIL'
        ELSE 'PASS'
    END AS check_result
FROM
    daily_volumes d
    CROSS JOIN today_volume t
GROUP BY
    t.today_count;

Aggregating the Verdict

In a real pipeline, you’d wrap these checks in a stored procedure or Snowflake Task. Here’s the aggregation logic that gives you a single pass/fail decision:

-- Aggregate all quality checks into a single verdict
CREATE OR REPLACE TEMPORARY TABLE audit_results AS
WITH checks AS (
    -- Null check
    SELECT 'NULL_CHECK' AS check_name,
        CASE WHEN COUNT(CASE WHEN currency_code IS NULL
                             OR total_amount IS NULL THEN 1 END) > 0
             THEN 'FAIL' ELSE 'PASS' END AS result
    FROM fct_orders_staging
    WHERE order_date = CURRENT_DATE()

    UNION ALL

    -- Negative amounts
    SELECT 'NEGATIVE_AMOUNTS',
        CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'PASS' END
    FROM fct_orders_staging
    WHERE order_date = CURRENT_DATE() AND total_amount < 0

    UNION ALL

    -- Invalid currency
    SELECT 'INVALID_CURRENCY',
        CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'PASS' END
    FROM fct_orders_staging
    WHERE order_date = CURRENT_DATE()
      AND (currency_code NOT IN ('USD','EUR','GBP','AUD','CAD','JPY')
           OR currency_code IS NULL)

    UNION ALL

    -- Duplicates
    SELECT 'DUPLICATES',
        CASE WHEN COUNT(*) - COUNT(DISTINCT order_id) > 0
             THEN 'FAIL' ELSE 'PASS' END
    FROM fct_orders_staging
    WHERE order_date = CURRENT_DATE()
)
SELECT * FROM checks;

-- See the full verdict
SELECT * FROM audit_results;

-- Check if we're clear to publish
SELECT
    COUNT(*) AS total_checks,
    SUM(CASE WHEN result = 'PASS' THEN 1 ELSE 0 END) AS passed,
    SUM(CASE WHEN result = 'FAIL' THEN 1 ELSE 0 END) AS failed,
    CASE
        WHEN SUM(CASE WHEN result = 'FAIL' THEN 1 ELSE 0 END) = 0
        THEN 'CLEAR TO PUBLISH'
        ELSE 'BLOCKED — FIX REQUIRED'
    END AS verdict
FROM audit_results;

With our deliberately corrupted data, you’ll see multiple failures here. Null currencies, invalid codes, negative amounts — exactly the problems we planted. The audit caught everything. Production is still clean.




Step 3: PUBLISH — The Happy Path (When Audits Pass)


Let’s first walk through what happens when everything is clean. This is the path your pipeline will take 95% of the time — and it’s where the Iceberg zero-copy advantage really shines.

-- PUBLISH: Swap the validated clone into production
-- This is a METADATA OPERATION — no Parquet files are copied or moved
-- The swap is ATOMIC — downstream sees old data or new data, never partial
ALTER ICEBERG TABLE fct_orders
  SWAP WITH fct_orders_staging;

-- Clean up — the old production reference is now the staging table name
DROP TABLE IF EXISTS fct_orders_staging;

That’s it. Two statements. The SWAP updated which metadata pointer the table name fct_orders resolves to. The Parquet files sitting in your S3 bucket didn’t budge. No new files were written. No compute was burned copying data. No warehouse credits were consumed on I/O.

Think about what this means for pipeline design. In a traditional WAP implementation with standard Snowflake tables, the publish step is the expensive one — it’s the INSERT INTO ... SELECT * or the CREATE TABLE AS SELECT that physically writes data. That cost scales linearly with data volume. Double the table size, double the publish time and cost.

With Iceberg’s zero-copy swap, publish cost is constant regardless of table size. A 10 MB table and a 10 TB table swap in the same amount of time. That’s not a rounding error. That’s a fundamental shift in the economics of data quality enforcement.

For context, I timed this on a table with ~200 million rows across several hundred Parquet files. The clone took about a second. The audit checks took about 40 seconds. The swap? Sub-second. The most expensive part of the entire WAP pipeline was the actual data quality logic — which is exactly where you want the time to be spent.




Step 3 (Alternate): QUARANTINE — When Audits Fail


Now the realistic scenario. Our audit found problems. We have options.

Option A: Drop everything, investigate, retry. This is the nuclear option. Production stays on yesterday’s data until the source is fixed and a clean batch arrives.

-- Nuclear option — reject the entire batch
DROP TABLE IF EXISTS fct_orders_staging;
-- Production is exactly as it was. Nothing changed.

Option B: Quarantine the bad records, publish the clean ones. This is more practical for most teams. You log the failures for investigation, surgically remove the bad data, re-audit, and publish what’s clean.

-- First, capture the failures for investigation
-- This gives your team evidence to fix the source system
CREATE OR REPLACE TABLE quarantine_log AS
SELECT
    *,
    CURRENT_TIMESTAMP() AS quarantined_at,
    CASE
        WHEN currency_code IS NULL THEN 'NULL_CURRENCY'
        WHEN currency_code NOT IN ('USD','EUR','GBP','AUD','CAD','JPY')
            THEN 'INVALID_CURRENCY'
        WHEN total_amount < 0 THEN 'NEGATIVE_AMOUNT'
    END AS failure_reason
FROM
    fct_orders_staging
WHERE
    order_date = CURRENT_DATE()
    AND (
        currency_code IS NULL
        OR currency_code NOT IN ('USD','EUR','GBP','AUD','CAD','JPY')
        OR total_amount < 0
        );

-- See what we caught
SELECT failure_reason, COUNT(*) AS record_count
FROM quarantine_log
GROUP BY failure_reason
ORDER BY record_count DESC;

Now remove the bad records from the staging clone and re-validate:

-- Surgically remove the bad records
DELETE FROM fct_orders_staging
WHERE order_date = CURRENT_DATE()
  AND (
    currency_code IS NULL
    OR currency_code NOT IN ('USD','EUR','GBP','AUD','CAD','JPY')
    OR total_amount < 0
  );

-- Re-run the audit against the cleaned staging data
-- (Repeat the audit_results query from Step 2)
-- If it passes now, publish the clean subset
ALTER ICEBERG TABLE fct_orders
  SWAP WITH fct_orders_staging;

DROP TABLE IF EXISTS fct_orders_staging;

The quarantine log is important. It’s not enough to just filter out bad data — you need to understand why it was bad so you can fix the source. Every quarantined record is a symptom. The disease is upstream.




Putting It All Together: The Orchestrated Pipeline


In practice, you’d wrap this entire flow in a Snowflake Task or an external orchestrator. Here’s a stored procedure that encapsulates the full WAP pattern. This is something you can adapt and parameterise for any Iceberg table in your warehouse:

CREATE OR REPLACE PROCEDURE run_wap_pipeline(
    target_table VARCHAR,
    staging_table VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
    fail_count INT;
    result_msg VARCHAR;
BEGIN
    -- WRITE: Clone production
    EXECUTE IMMEDIATE 'CREATE OR REPLACE ICEBERG TABLE '
        || staging_table || ' CLONE ' || target_table;

    -- (Your data loading logic goes here)
    -- INSERT INTO staging_table SELECT ... FROM source

    -- AUDIT: Count how many quality checks fail against the staging clone
    SELECT
        IFF((SELECT COUNT(*) FROM IDENTIFIER(:staging_table)
             WHERE order_date = CURRENT_DATE()
               AND (currency_code IS NULL OR total_amount IS NULL)) > 0,
            1, 0)  -- null check
      + IFF((SELECT COUNT(*) FROM IDENTIFIER(:staging_table)
             WHERE order_date = CURRENT_DATE()
               AND total_amount < 0) > 0,
            1, 0)  -- negative amount check
      + IFF((SELECT COUNT(*) FROM IDENTIFIER(:staging_table)
             WHERE order_date = CURRENT_DATE()
               AND currency_code NOT IN ('USD','EUR','GBP','AUD','CAD','JPY')) > 0,
            1, 0)  -- invalid currency check
    INTO :fail_count;

    IF (fail_count = 0) THEN
        -- PUBLISH: Swap (zero-copy, instant)
        EXECUTE IMMEDIATE 'ALTER ICEBERG TABLE '
            || target_table || ' SWAP WITH ' || staging_table;
        EXECUTE IMMEDIATE 'DROP TABLE IF EXISTS ' || staging_table;
        result_msg := 'PUBLISHED — All checks passed. Swap completed.';
    ELSE
        -- ROLLBACK: Drop the clone
        EXECUTE IMMEDIATE 'DROP TABLE IF EXISTS ' || staging_table;
        result_msg := 'BLOCKED — ' || fail_count
            || ' check(s) failed. Production unchanged.';
    END IF;

    RETURN result_msg;
END;
$$;

-- Run it
CALL run_wap_pipeline('fct_orders', 'fct_orders_staging');



A Note on Iceberg Branching — The Future Path


If you’ve been following the Iceberg ecosystem, you might be wondering about branching. Apache Iceberg’s spec includes a branching feature that’s purpose-built for WAP — you create a named branch within the same table, write to it in isolation, and fast-forward the main branch when audits pass. It’s like a Git pull request for your data.

The feature exists and it works beautifully in Spark and Flink. You can create branches with ALTER TABLE ... CREATE BRANCH, write to them via the spark.wap.branch session config, and publish with CALL system.fast_forward(). The semantics are cleaner than the clone-and-swap approach because the branch is the table — just a different lineage of snapshots within the same metadata.

As of this writing, Snowflake does not appear to natively support Iceberg branching through its SQL engine. Branching and tagging are Iceberg spec features implemented in the Java library, with engine integrations for Spark and Flink. Snowflake’s Iceberg support gives you full DML, Time Travel, and schema evolution, but not yet the CREATE BRANCH or fast_forward operations.

For teams already running Spark workloads alongside Snowflake, the branching approach is worth exploring now. You run the WAP pipeline in Spark with native Iceberg branching, and Snowflake reads the published data as an externally managed Iceberg table. It’s a composable architecture where you push quality enforcement to cheaper compute and keep the warehouse for analytics.

For teams that are Snowflake-native, the clone-and-swap approach we’ve built in this tutorial gives you the vast majority of the benefits today. When Snowflake adds native branching support — and given the competitive landscape with Databricks, I’d expect it — we can revisit this with a cleaner implementation. The quality checks and audit battery you build now will carry forward regardless of the underlying publish mechanism.




Best Practices That’ll Save You Pain


After running WAP patterns in various forms across multiple teams, these are the lessons that kept surfacing.

Start with hard gates, evolve to soft gates. Initially, any check failure should block publication. Full stop. As your quality checks mature and you understand your data better, you’ll learn which checks are hard blockers — null primary keys, negative amounts, schema violations — and which are warnings worth logging but not blocking on — volume slightly above the historical average, a handful of records with an unusual but valid status. Build your audit results as structured data with severity levels so the publish decision can be nuanced.
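One way to structure that decision, sketched in Python — the check names and the two-level `blocker`/`warn` taxonomy are illustrative assumptions, not a standard:

```python
# Hedged sketch: audit results as structured records with severity levels.
# Only "blocker" failures stop the publish; "warn" failures are surfaced
# for logging but let the swap proceed.
def publish_decision(audit_results: list[dict]) -> dict:
    blockers = [r for r in audit_results
                if r["result"] == "FAIL" and r["severity"] == "blocker"]
    warnings = [r for r in audit_results
                if r["result"] == "FAIL" and r["severity"] == "warn"]
    return {
        "publish": not blockers,                      # only hard gates block the swap
        "blockers": [r["check"] for r in blockers],
        "warnings": [r["check"] for r in warnings],   # logged, not blocking
    }

results = [
    {"check": "null_primary_key", "severity": "blocker", "result": "PASS"},
    {"check": "negative_amounts", "severity": "blocker", "result": "PASS"},
    {"check": "volume_above_avg", "severity": "warn",    "result": "FAIL"},
]
print(publish_decision(results)["publish"])  # True: only a warning failed
```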

Log quarantined records, always. Even when you drop a staging clone, capture the failing records somewhere for investigation. A quarantine table or a dead-letter S3 prefix gives your team the evidence they need to fix the source system — which is the real solution, not just filtering out bad data forever. Every quarantined record should answer three questions: what failed, which check caught it, and when it arrived.
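A dead-letter record can be as simple as the failing row plus those three answers. A hedged sketch, with illustrative field names:

```python
from datetime import datetime, timezone

# Hedged sketch of a quarantine record: the offending row plus the three
# answers — what failed, which check caught it, and when it arrived.
def quarantine_record(row: dict, check_name: str, reason: str) -> dict:
    return {
        **row,                                   # the offending data itself
        "failure_reason": reason,                # what failed
        "caught_by": check_name,                 # which check caught it
        "quarantined_at": datetime.now(timezone.utc).isoformat(),  # when
    }

rec = quarantine_record({"order_id": 20042, "currency_code": None},
                        "NULL_CHECK", "NULL_CURRENCY")
print(rec["failure_reason"])  # NULL_CURRENCY
```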

Make your quality checks idempotent. The audit battery should be safe to run multiple times without side effects. No state mutations, no “mark as audited” flags that prevent re-runs. If the pipeline fails halfway through auditing, you should be able to restart from the top of the audit phase without creating a new clone.

Monitor your audit pass rates over time. A pipeline that fails WAP checks once a month is healthy — the pattern is catching occasional source issues. A pipeline that fails WAP checks every other day is telling you something about your upstream data source that needs fixing. Track the pass/fail ratio, the specific checks that fail most often, and the volume of quarantined records. This data is ammunition for the conversation with the source system owners.
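Tracking that ratio is a small fold over your run history. An illustrative sketch, assuming each run record carries a pass flag, the names of its failed checks, and a quarantined-row count:

```python
from collections import Counter

# Hedged sketch: summarize WAP run history into the numbers worth watching.
def audit_health(runs: list[dict]) -> dict:
    failed_runs = [r for r in runs if not r["passed"]]
    return {
        "pass_rate": (len(runs) - len(failed_runs)) / len(runs),
        # Which checks fail most often (ammunition for the source-owner talk)
        "worst_checks": Counter(
            c for r in failed_runs for c in r["failed_checks"]
        ).most_common(3),
        "quarantined_total": sum(r.get("quarantined", 0) for r in runs),
    }

runs = [
    {"passed": True,  "failed_checks": [], "quarantined": 0},
    {"passed": False, "failed_checks": ["INVALID_CURRENCY"], "quarantined": 37},
    {"passed": True,  "failed_checks": [], "quarantined": 0},
    {"passed": False, "failed_checks": ["INVALID_CURRENCY", "NEGATIVE_AMOUNTS"],
     "quarantined": 12},
]
print(audit_health(runs)["pass_rate"])  # 0.5
```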

Don’t forget the cleanup. Snowflake’s CLONE creates metadata that references the same underlying Parquet files. If you create clones and forget to drop them, you’re not duplicating storage (the Parquet files are shared), but you are accumulating metadata that Snowflake tracks. Expired clones should be dropped promptly. Set up a cleanup task that sweeps for orphaned staging tables older than your retention threshold.
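The sweep itself is age-based filtering. A hedged sketch — the `_staging` suffix convention and the retention window are assumptions; adapt them to whatever naming scheme your clones follow:

```python
from datetime import datetime, timedelta, timezone

# Hedged sketch: pick orphaned staging tables older than a retention
# threshold. The "_staging" suffix convention is an assumption.
def tables_to_drop(tables: list[dict], retention: timedelta,
                   now: datetime) -> list[str]:
    return [
        t["name"] for t in tables
        if t["name"].endswith("_staging") and now - t["created"] > retention
    ]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
tables = [
    {"name": "fct_orders",         "created": now - timedelta(days=400)},
    {"name": "fct_orders_staging", "created": now - timedelta(days=3)},
    {"name": "dim_users_staging",  "created": now - timedelta(hours=2)},
]
print(tables_to_drop(tables, timedelta(days=1), now))  # ['fct_orders_staging']
```

In practice you would feed this from the information schema and issue the DROP statements from a scheduled task.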

Tag your successful publishes. Not Iceberg tags (Snowflake doesn’t support those natively yet), but a metadata log. Keep a table that records every successful WAP publish — the timestamp, the table name, the row count of the new batch, and the audit results. When something goes wrong downstream three weeks from now and someone asks “what changed?”, that log is your first stop.
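One possible shape for that log, sketched in Python with illustrative field names:

```python
from datetime import datetime, timezone

# Hedged sketch of a publish-log entry: enough detail to answer
# "what changed?" weeks later. Field names are illustrative.
publish_log: list[dict] = []

def log_publish(table: str, batch_rows: int, audit_results: list[dict]) -> None:
    publish_log.append({
        "published_at": datetime.now(timezone.utc).isoformat(),
        "table_name": table,
        "batch_row_count": batch_rows,
        "audit_results": audit_results,   # the full verdict, not just "passed"
    })

log_publish("fct_orders", 500,
            [{"check": "NULL_CHECK", "result": "PASS"}])
print(len(publish_log))  # 1
```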




The Bigger Picture: Data Quality as a Deploy Gate


Zach Wilson has a line that sticks with me: “Data engineering shouldn’t have any less engineering rigor than software engineering.” He’s right. We’ve spent years building CI/CD pipelines for application code — linting, unit tests, integration tests, staging environments, approval gates — and then we load data straight into production with a prayer and a Teams message that says “run looks good.”

WAP is the data equivalent of a deployment gate. You wouldn’t merge a pull request that fails tests. You shouldn’t publish data that fails quality checks. The mechanics are different — we’re swapping table metadata instead of deploying container images — but the principle is identical: validate before you publish, and make the publish step as cheap and reversible as possible.

Iceberg tables make the “as cheap as possible” part genuinely achievable. When publish is a zero-copy metadata operation that completes in under a second regardless of table size, the overhead of WAP effectively disappears. The only cost is the compute for your quality checks — and that’s compute you want to be spending.

That Tuesday afternoon with the phantom Total Portfolio growth? It cost us three days of investigation, two days of reprocessing historical data, and an unknowable amount of credibility with the Department Head and her team. Implementing WAP on that pipeline would have cost us nothing but a few lines of SQL and a staging clone that gets created and destroyed automatically every morning.

The pattern works. The tooling is ready. Start where you are, build the quality gates, and let the zero-copy publish mechanics handle the rest.

Your production data — and your Department Head — will thank you for it.