<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Data Freshness on Ghost in the data</title><link>https://ghostinthedata.info/tags/data-freshness/</link><description>Ghost in the data</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Ghost in the data</copyright><lastBuildDate>Sat, 18 Apr 2026 09:00:00 +1100</lastBuildDate><atom:link href="https://ghostinthedata.info/tags/data-freshness/index.xml" rel="self" type="application/rss+xml"/><item><title>Why Your Pipeline Finishes Later Every Month</title><link>https://ghostinthedata.info/posts/2026/2026-04-18-pipeline-optimization/</link><pubDate>Sat, 18 Apr 2026 09:00:00 +1100</pubDate><guid>https://ghostinthedata.info/posts/2026/2026-04-18-pipeline-optimization/</guid><author>Chris Hillman</author><description>A practical guide to diagnosing pipeline bottlenecks, fixing unnecessary dependencies, and getting data to consumers faster — with Snowflake and AWS patterns you can apply today.</description><content:encoded>&lt;p>Let me tell you about a graph that changed how I think about data engineering.&lt;/p>
&lt;p>A junior engineer on my team — let&amp;rsquo;s call her Priya — had been tracking something nobody asked her to track. Every morning for two months, she&amp;rsquo;d noted the timestamp when our main analytics pipeline completed. She wasn&amp;rsquo;t trying to make a point. She was just curious, because the finance team kept mentioning their dashboards weren&amp;rsquo;t ready when they arrived at 8 AM anymore.&lt;/p>
&lt;p>One afternoon she pulled me aside and showed me a scatter plot on her laptop. Pipeline completion time, plotted daily over those two months. The trend was unmistakable: a slow, steady drift to the right. What used to finish at 5:47 AM was now finishing at 7:23 AM. And the slope wasn&amp;rsquo;t flattening.&lt;/p>
&lt;p>&amp;ldquo;If this keeps going,&amp;rdquo; she said, &amp;ldquo;we&amp;rsquo;ll miss the 9 AM SLA in about six weeks.&amp;rdquo;&lt;/p>
&lt;p>She was right. And nobody else on the team — including me — had noticed. We were watching for failures. Green DAGs, clean logs, no alerts. But the pipeline wasn&amp;rsquo;t failing. It was &lt;em>slowing&lt;/em>. And slow is harder to see than broken, because slow doesn&amp;rsquo;t trigger an alert. Slow just quietly erodes trust until one day someone in finance builds their own spreadsheet and stops asking you for anything.&lt;/p>
&lt;br>
&lt;p>That&amp;rsquo;s the moment I understood that pipeline health isn&amp;rsquo;t about pass/fail. It&amp;rsquo;s about &lt;em>trajectory&lt;/em>. A pipeline that runs successfully but takes 5% longer every month is a ticking clock. And the people who notice first aren&amp;rsquo;t the engineers watching the DAG — they&amp;rsquo;re the consumers waiting for their data.&lt;/p>
&lt;p>I&amp;rsquo;m telling you this because pipeline optimisation sounds like a performance engineering problem, and it is. But underneath the technical work, it&amp;rsquo;s really about a commitment: the commitment to deliver data when you said you would, every single day. That&amp;rsquo;s what builds trust between data teams and the rest of the organisation. Not fancy architectures. Not real-time everything. Just showing up on time, reliably.&lt;/p>
&lt;p>This article is about diagnosing why your pipelines get slower, identifying the bottlenecks that actually matter, and fixing the patterns that cause data to arrive later than it should. Everything is grounded in Snowflake, AWS, Airflow, and dbt — with specific patterns you can apply immediately.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="measure-the-pipeline-in-stages-not-as-a-single-number">Measure the pipeline in stages, not as a single number&lt;/h3>
&lt;br>
&lt;p>When a pipeline is slow, the instinct is to look at the longest-running task. That&amp;rsquo;s often the wrong place to start.&lt;/p>
&lt;p>A pipeline is a chain of stages: extract, transform, load, test. The total runtime is the sum of all stages on the critical path — the longest chain of dependent tasks. But the &lt;em>bottleneck&lt;/em> might not be the longest task. It might be a 30-second task that blocks five parallel branches from starting.&lt;/p>
&lt;p>Before you optimise anything, instrument your pipeline to measure each stage independently. In Airflow, the task instance metadata already captures this.&lt;/p>
&lt;p>Two numbers matter here: &lt;code>duration_seconds&lt;/code> (how long the task actually ran) and &lt;code>queue_wait_seconds&lt;/code> (how long it waited before it could run). If queue wait is high, your problem isn&amp;rsquo;t the task — it&amp;rsquo;s resource contention. Too many tasks competing for too few Airflow worker slots, or too many queries competing for the same Snowflake warehouse.&lt;/p>
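&lt;p>To make that split concrete, here&amp;rsquo;s a minimal Python sketch. It assumes you can read Airflow&amp;rsquo;s metadata database directly; &lt;code>queued_dttm&lt;/code>, &lt;code>start_date&lt;/code> and &lt;code>end_date&lt;/code> are columns on the &lt;code>task_instance&lt;/code> table, and the example timestamps are made up:&lt;/p>

```python
from datetime import datetime

def task_timing(queued_dttm, start_date, end_date):
    """Split a task instance into queue wait vs. actual run time."""
    return {
        "queue_wait_seconds": (start_date - queued_dttm).total_seconds(),
        "duration_seconds": (end_date - start_date).total_seconds(),
    }

# Hypothetical task_instance row: queued at 4:00, started 4:07:30, done 4:09
row = task_timing(
    queued_dttm=datetime(2026, 4, 18, 4, 0, 0),
    start_date=datetime(2026, 4, 18, 4, 7, 30),
    end_date=datetime(2026, 4, 18, 4, 9, 0),
)
print(row)  # 450s queued vs 90s running: contention, not a slow task
```

&lt;p>A ratio like that, with the task queued several times longer than it actually ran, tells you to fix concurrency, not the query.&lt;/p>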
&lt;p>For dbt runs specifically, the &lt;code>run_results.json&lt;/code> that dbt generates after every invocation is a goldmine:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#39;target/run_results.json&amp;#39;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results &lt;span style="color:#f92672">=&lt;/span> json&lt;span style="color:#f92672">.&lt;/span>load(f)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Find your slowest models&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>models &lt;span style="color:#f92672">=&lt;/span> sorted(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [r &lt;span style="color:#66d9ef">for&lt;/span> r &lt;span style="color:#f92672">in&lt;/span> results[&lt;span style="color:#e6db74">&amp;#39;results&amp;#39;&lt;/span>] &lt;span style="color:#66d9ef">if&lt;/span> r[&lt;span style="color:#e6db74">&amp;#39;status&amp;#39;&lt;/span>] &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#39;success&amp;#39;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">lambda&lt;/span> x: x[&lt;span style="color:#e6db74">&amp;#39;execution_time&amp;#39;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> reverse&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> m &lt;span style="color:#f92672">in&lt;/span> models[:&lt;span style="color:#ae81ff">10&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>m[&lt;span style="color:#e6db74">&amp;#39;unique_id&amp;#39;&lt;/span>]&lt;span style="color:#e6db74">:&lt;/span>&lt;span style="color:#e6db74">60s&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>m[&lt;span style="color:#e6db74">&amp;#39;execution_time&amp;#39;&lt;/span>]&lt;span style="color:#e6db74">:&lt;/span>&lt;span style="color:#e6db74">8.1f&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">s&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Run that after your next &lt;code>dbt build&lt;/code>. You&amp;rsquo;ll immediately see which models dominate your pipeline runtime. In my experience, 3–5 models account for 60–80% of total execution time. Those are the only models worth optimising.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="your-dag-probably-has-dependencies-it-doesnt-need">Your DAG probably has dependencies it doesn&amp;rsquo;t need&lt;/h3>
&lt;br>
&lt;p>This is the most common structural problem in data pipelines, and it comes in two forms. One is obvious once you know to look for it. The other is sneaky, because it starts as a completely reasonable engineering decision.&lt;/p>
&lt;p>&lt;strong>The first form: phantom dependencies.&lt;/strong>&lt;/p>
&lt;p>An engineer builds Model C and declares it depends on Model B — because Model B produces a table that Model C reads. Fair enough. But six months later, someone refactors Model C and it no longer reads from Model B. The &lt;code>ref()&lt;/code> gets removed from the SQL. But the dependency in the Airflow DAG? That stays. Or in dbt, someone adds a &lt;code>ref()&lt;/code> to a model they don&amp;rsquo;t actually need, just to &amp;ldquo;make sure it runs first.&amp;rdquo;&lt;/p>
&lt;p>The result is a DAG with phantom dependencies — tasks that wait for other tasks to complete even though they don&amp;rsquo;t use those tasks&amp;rsquo; outputs. Every phantom dependency adds serial wait time to your pipeline.&lt;/p>
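&lt;p>You can audit for phantom edges mechanically by comparing what each task is &lt;em>declared&lt;/em> to depend on against what it actually reads. A minimal Python sketch (the model names and table sets are hypothetical; in practice the declared edges come from your DAG or dbt manifest, and the reads from compiled SQL or query history):&lt;/p>

```python
def phantom_edges(declared_deps, produces, reads):
    """Declared edges where the downstream never reads the upstream's output."""
    phantoms = []
    for downstream, upstream_list in declared_deps.items():
        for upstream in upstream_list:
            # Phantom: nothing the upstream produces appears in the downstream's reads
            if not produces[upstream].intersection(reads[downstream]):
                phantoms.append((upstream, downstream))
    return phantoms

declared_deps = {"model_c": ["model_b"]}        # the DAG says C waits on B
produces = {"model_b": {"analytics.model_b"}}   # B writes this table
reads = {"model_c": {"analytics.stg_orders"}}   # but C never touches it
print(phantom_edges(declared_deps, produces, reads))  # [('model_b', 'model_c')]
```

&lt;p>Every edge this surfaces is a candidate for deletion and a chance to let two branches run in parallel.&lt;/p>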
&lt;p>&lt;strong>The second form: dependency monsters.&lt;/strong>&lt;/p>
&lt;p>This one is trickier, because it starts with good intentions.&lt;/p>
&lt;p>The customer team needs three new enrichment attributes on &lt;code>dim_customers&lt;/code> — regional segment codes sourced from a CRM export, tenure tier derived from a subscription history table, and a propensity score from a data science model. All reasonable requests. Each one approved and added without much ceremony.&lt;/p>
&lt;p>But &lt;code>dim_customers&lt;/code> is upstream of 40+ models across your DAG. Adding those three attributes means pulling in the CRM extract, joining the subscription history table — which has its own upstream dependencies — and waiting on the propensity score model to complete before any of those 40 downstream models can start. Your pipeline used to finish at 9 AM. Eighteen months and a dozen enrichment requests later, it finishes at 3 PM.&lt;/p>
&lt;p>Each addition was individually justified. Nobody modelled what they&amp;rsquo;d cost collectively. That&amp;rsquo;s how you build a dependency monster.&lt;/p>
&lt;p>The fix isn&amp;rsquo;t to refuse enrichment requests — it&amp;rsquo;s to stop embedding enrichment into your core spine model. Keep &lt;code>dim_customers&lt;/code> lean: identifiers, names, status, the attributes that nearly every consumer genuinely needs. Build a separate &lt;code>dim_customers_extended&lt;/code> model that joins in the expensive enrichment for the consumers who actually need it. Most of your downstream models will never touch the propensity score. There&amp;rsquo;s no reason to make them wait for it.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- dim_customers.sql: lean spine — runs fast, unblocks the DAG
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> email,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> status,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> created_at
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;stg_customers&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- dim_customers_extended.sql: enrichment for consumers who need it
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- runs independently, doesn&amp;#39;t block the main pipeline
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">c&lt;/span>.customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">c&lt;/span>.customer_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">c&lt;/span>.status,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> crm.regional_segment,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sub.tenure_tier,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ps.propensity_score
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;dim_customers&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span> &lt;span style="color:#66d9ef">c&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">LEFT&lt;/span> &lt;span style="color:#66d9ef">JOIN&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;stg_crm_segments&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span> crm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">on&lt;/span> crm.customer_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">c&lt;/span>.customer_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">LEFT&lt;/span> &lt;span style="color:#66d9ef">JOIN&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;int_subscription_tenure&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span> sub
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">on&lt;/span> sub.customer_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">c&lt;/span>.customer_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">LEFT&lt;/span> &lt;span style="color:#66d9ef">JOIN&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;ml_propensity_scores&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span> ps
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">on&lt;/span> ps.customer_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">c&lt;/span>.customer_id
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now &lt;code>dim_customers_extended&lt;/code> and its expensive upstream dependencies sit on their own branch of the DAG. Consumers who need the enrichment reference the extended model. The rest of the pipeline is unblocked.&lt;/p>
&lt;p>A useful governance rule of thumb before adding a field to a core model: if fewer than half your consumers will ever query that attribute, it probably doesn&amp;rsquo;t belong there. The team who needs it should own the enrichment themselves, as a downstream model they maintain.&lt;/p>
&lt;br>
&lt;p>&lt;strong>Auditing for both problems&lt;/strong>&lt;/p>
&lt;p>In dbt, the &lt;code>dbt_project_evaluator&lt;/code> package surfaces structural issues systematically. Install it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># packages.yml&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">packages&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">package&lt;/span>: &lt;span style="color:#ae81ff">dbt-labs/dbt_project_evaluator&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">version&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;&amp;gt;=0.8.0 &amp;lt;1.0.0&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>dbt build --select package:dbt_project_evaluator
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It will flag models with fan-out issues (one model referenced by many downstream models), rejoined upstream concepts (a branch that splits and reconverges, adding avoidable wait states), and other structural smells. It won&amp;rsquo;t catch dependency monsters directly — those require a human asking &amp;ldquo;does every consumer of this model actually use every attribute in it?&amp;rdquo; — but it&amp;rsquo;s a good starting point for the structural audit.&lt;/p>
&lt;p>For Airflow DAGs, the visual DAG view tells you a lot at a glance. A healthy DAG looks like a tree with wide parallel branches. An unhealthy DAG looks like a chain — or worse, a funnel where everything converges through a single overloaded task before it can branch out again. That funnel shape is the visual signature of a dependency monster: many expensive upstreams flowing into one hub model, with dozens of downstream models queued behind it.&lt;/p>
&lt;p>If your DAG looks like a chain, ask this question for every dependency edge: &lt;strong>does downstream task B actually read data produced by upstream task A?&lt;/strong> If the answer is no, remove the dependency. Let them run in parallel.&lt;/p>
&lt;p>I once audited a DAG with 47 tasks running in a strict serial chain. After removing phantom dependencies and restructuring, 31 of those tasks could run in parallel. The pipeline went from 2 hours 15 minutes to 38 minutes. Same tasks, same compute, same data. Just fewer unnecessary wait states.&lt;/p>
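&lt;p>The restructuring behind that kind of speedup is easy to reason about with topological &amp;ldquo;waves&amp;rdquo;: group tasks so that each wave depends only on earlier waves, and everything within a wave can run in parallel. A small Python sketch with a hypothetical five-task DAG:&lt;/p>

```python
def waves(upstreams):
    """Group an acyclic DAG into waves; tasks within a wave can run in parallel."""
    remaining = dict(upstreams)
    done, schedule = set(), []
    while remaining:
        # A task is ready once all of its upstreams have completed
        ready = sorted(t for t, ups in remaining.items() if done.issuperset(ups))
        schedule.append(ready)
        done.update(ready)
        for t in ready:
            remaining.pop(t)
    return schedule

upstreams = {
    "extract": [],
    "load_a": ["extract"], "load_b": ["extract"], "load_c": ["extract"],
    "publish": ["load_a", "load_b", "load_c"],
}
print(waves(upstreams))
# [['extract'], ['load_a', 'load_b', 'load_c'], ['publish']]
```

&lt;p>The number of waves is the serial depth of your DAG; fewer waves means more parallelism and a shorter wall-clock runtime for the same compute.&lt;/p>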
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-critical-path-is-the-only-thing-worth-optimising">The critical path is the only thing worth optimising&lt;/h3>
&lt;br>
&lt;p>Here&amp;rsquo;s a concept from project management that applies directly to pipeline engineering: the &lt;strong>critical path&lt;/strong>.&lt;/p>
&lt;p>The critical path is the longest chain of dependent tasks through your DAG. It determines your pipeline&amp;rsquo;s minimum possible runtime. Every other path through the DAG has &amp;ldquo;float&amp;rdquo; — slack time where tasks can be delayed without affecting the overall completion time.&lt;/p>
&lt;p>This means: &lt;strong>optimising a task that isn&amp;rsquo;t on the critical path has zero impact on your pipeline&amp;rsquo;s end-to-end runtime.&lt;/strong> You could make a non-critical task 10x faster and your pipeline would finish at exactly the same time.&lt;/p>
&lt;p>Finding the critical path requires knowing each task&amp;rsquo;s duration and dependency structure. For a dbt project, you can approximate it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Using dbt&amp;#39;s run results stored in Snowflake (if you log them)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Or query your orchestrator&amp;#39;s task instance table
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">WITH&lt;/span> task_durations &lt;span style="color:#66d9ef">AS&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> task_id &lt;span style="color:#66d9ef">AS&lt;/span> model_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AVG&lt;/span>(&lt;span style="color:#66d9ef">EXTRACT&lt;/span>(EPOCH &lt;span style="color:#66d9ef">FROM&lt;/span> (end_date &lt;span style="color:#f92672">-&lt;/span> start_date))) &lt;span style="color:#66d9ef">AS&lt;/span> avg_duration
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span> task_instance
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHERE&lt;/span> dag_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;dbt_daily&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> &lt;span style="color:#66d9ef">state&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;success&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> execution_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span> &lt;span style="color:#f92672">-&lt;/span> INTERVAL &lt;span style="color:#e6db74">&amp;#39;7 days&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> task_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND(avg_duration, &lt;span style="color:#ae81ff">1&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> avg_seconds,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND(avg_duration &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#ae81ff">60&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> avg_minutes
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> task_durations
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> avg_duration &lt;span style="color:#66d9ef">DESC&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">20&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The longest tasks are &lt;em>candidates&lt;/em> for the critical path, but only if they&amp;rsquo;re on the dependency chain that determines total runtime. A 20-minute model that runs in parallel with a 45-minute model isn&amp;rsquo;t the bottleneck — the 45-minute model is.&lt;/p>
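&lt;p>Combining those per-task durations with the dependency graph gives you the critical path itself. A minimal Python sketch (hypothetical model names, durations in minutes) using a memoised longest-path walk:&lt;/p>

```python
from functools import lru_cache

# Hypothetical DAG: each task's direct upstreams, and durations in minutes
upstreams = {
    "stg_orders": [], "stg_customers": [],
    "dim_customers": ["stg_customers"],
    "fct_orders": ["stg_orders", "dim_customers"],
    "mart_revenue": ["fct_orders"],
}
duration = {"stg_orders": 20, "stg_customers": 5, "dim_customers": 10,
            "fct_orders": 45, "mart_revenue": 15}

@lru_cache(maxsize=None)
def finish_time(task):
    """Earliest completion: the longest upstream chain plus own duration."""
    return max((finish_time(u) for u in upstreams[task]), default=0) + duration[task]

# The critical path ends at the task with the latest finish time;
# walk back through the slowest upstream at each step to recover it
node = max(upstreams, key=finish_time)
path = [node]
while upstreams[node]:
    node = max(upstreams[node], key=finish_time)
    path.append(node)
print(list(reversed(path)), finish_time(path[0]))
# ['stg_orders', 'fct_orders', 'mart_revenue'] 80
```

&lt;p>Note that &lt;code>dim_customers&lt;/code> plus its staging model takes 15 minutes but isn&amp;rsquo;t on the path: it runs in parallel with the 20-minute &lt;code>stg_orders&lt;/code>, so speeding it up wouldn&amp;rsquo;t move the finish time at all.&lt;/p>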
&lt;p>Once you&amp;rsquo;ve identified the critical path, you have three options for shortening it: make the slow tasks faster (query optimisation, incremental models), reduce the number of tasks on the path (remove unnecessary dependencies), or parallelise sequential tasks (split a monolithic model into independent pieces).&lt;/p>
&lt;hr>
&lt;h3 id="shifting-right-diagnose-it-before-it-breaks-your-sla">Shifting right: diagnose it before it breaks your SLA&lt;/h3>
&lt;br>
&lt;p>Remember Priya&amp;rsquo;s scatter plot? That pattern — pipeline completion drifting later and later — has a name: &lt;strong>shifting right&lt;/strong>. And it has predictable causes.&lt;/p>
&lt;p>Track it with a simple query against your orchestrator&amp;rsquo;s metadata:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Airflow: track pipeline completion time drift
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> execution_date::DATE &lt;span style="color:#66d9ef">AS&lt;/span> run_date,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(end_date)::TIME &lt;span style="color:#66d9ef">AS&lt;/span> completion_time,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">EXTRACT&lt;/span>(EPOCH &lt;span style="color:#66d9ef">FROM&lt;/span> (&lt;span style="color:#66d9ef">MAX&lt;/span>(end_date) &lt;span style="color:#f92672">-&lt;/span> &lt;span style="color:#66d9ef">MIN&lt;/span>(start_date))) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#ae81ff">60&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> total_runtime_minutes
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> task_instance
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dag_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;daily_analytics&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> &lt;span style="color:#66d9ef">state&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;success&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> execution_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span> &lt;span style="color:#f92672">-&lt;/span> INTERVAL &lt;span style="color:#e6db74">&amp;#39;60 days&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run_date
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run_date;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Plot &lt;code>completion_time&lt;/code> over &lt;code>run_date&lt;/code>. If it trends upward, you&amp;rsquo;re shifting right. The four root causes, in order of how often I see them:&lt;/p>
&lt;p>&lt;strong>1. Data volume growth.&lt;/strong> Your table had 10 million rows when you built the model. Now it has 200 million. That JOIN that took 8 seconds now takes 3 minutes. This is the most common cause and the easiest to fix — switch to incremental materialisation and the growth stops mattering.&lt;/p>
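&lt;p>The switch is usually a few lines of config. A sketch of an incremental dbt model (the model and column names are hypothetical) where each run processes only rows newer than what&amp;rsquo;s already loaded:&lt;/p>

```sql
-- fct_events.sql: hypothetical incremental model
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT
    event_id,
    customer_id,
    event_ts,
    event_type
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
-- On incremental runs, scan only rows newer than the current high-water mark
WHERE event_ts > (SELECT MAX(event_ts) FROM {{ this }})
{% endif %}
```

&lt;p>The first run builds the table in full; every run after that touches only the new slice, so runtime tracks daily volume instead of total table size.&lt;/p>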
&lt;p>&lt;strong>2. Resource contention.&lt;/strong> You&amp;rsquo;ve added more pipelines and they all run in the same window, competing for the same Snowflake warehouse. In Snowflake, you can see this directly:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Find warehouse queueing (queries waiting for compute)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATE_TRUNC(&lt;span style="color:#e6db74">&amp;#39;hour&amp;#39;&lt;/span>, start_time) &lt;span style="color:#66d9ef">AS&lt;/span> hour,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">COUNT&lt;/span>(&lt;span style="color:#f92672">*&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> total_queries,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(&lt;span style="color:#66d9ef">CASE&lt;/span> &lt;span style="color:#66d9ef">WHEN&lt;/span> queued_overload_time &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#66d9ef">THEN&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#66d9ef">ELSE&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#66d9ef">END&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> queued_queries,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AVG&lt;/span>(queued_overload_time) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#ae81ff">1000&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> avg_queue_seconds
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.query_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">7&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> warehouse_name &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;TRANSFORM_WH&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name, hour
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">HAVING&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> queued_queries &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hour;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If &lt;code>queued_queries&lt;/code> is non-zero during your pipeline window, queries are waiting for warehouse compute. Either scale up the warehouse during that window, split workloads across dedicated warehouses, or stagger pipeline start times to reduce concurrency.&lt;/p>
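&lt;p>If you pull &lt;code>QUERY_HISTORY&lt;/code> rows into a notebook, the same check can be sketched in plain Python. The field names below mirror the view; the sample rows are made up for illustration:&lt;/p>

```python
# Sketch: the queueing check from the SQL above, run in Python over exported
# QUERY_HISTORY rows. Field names mirror the view; the sample rows are made up.
from collections import defaultdict

def queueing_by_hour(rows):
    """Per hour: how many queries queued, and the mean queue time in seconds."""
    stats = defaultdict(lambda: {"queued_queries": 0, "total_queue_ms": 0, "n": 0})
    for r in rows:
        s = stats[r["hour"]]
        s["n"] += 1
        s["total_queue_ms"] += r["queued_overload_time"]
        if r["queued_overload_time"] > 0:
            s["queued_queries"] += 1
    # Keep only hours with queueing, mirroring the HAVING clause.
    return {
        hour: {
            "queued_queries": s["queued_queries"],
            "avg_queue_seconds": s["total_queue_ms"] / s["n"] / 1000,
        }
        for hour, s in stats.items()
        if s["queued_queries"] > 0
    }

rows = [
    {"hour": 4, "queued_overload_time": 0},
    {"hour": 5, "queued_overload_time": 12000},  # milliseconds, as in the view
    {"hour": 5, "queued_overload_time": 8000},
]
print(queueing_by_hour(rows))  # only hour 5 shows up
```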
&lt;p>&lt;strong>3. Dependency chain lengthening.&lt;/strong> Someone added a new intermediate model between your staging and mart layers. That model takes 4 minutes. But because it&amp;rsquo;s on the critical path, the entire pipeline now finishes 4 minutes later. This is death by a thousand cuts — each addition is small, but they accumulate.&lt;/p>
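&lt;p>The cumulative effect is easy to quantify: pipeline finish time is the longest path through the dependency graph, so any runtime added on that path delays completion by its full duration. A minimal sketch, with made-up model names and runtimes in minutes:&lt;/p>

```python
# Sketch: pipeline completion as the longest path through the model DAG.
# Model names, dependencies, and runtimes (minutes) are made up for illustration.
durations = {"stg_orders": 10, "int_orders_enriched": 4, "fct_orders": 20}
upstream = {"int_orders_enriched": ["stg_orders"], "fct_orders": ["int_orders_enriched"]}

def critical_path(model):
    """Earliest finish time: own runtime plus the slowest upstream chain."""
    deps = upstream.get(model, [])
    return durations[model] + (max(critical_path(d) for d in deps) if deps else 0)

# Before the intermediate model the chain was stg -> fct: 30 minutes.
# On the critical path, the 4-minute model adds its full runtime.
print(critical_path("fct_orders"))  # 34
```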
&lt;p>&lt;strong>4. Upstream source delays.&lt;/strong> Your pipeline starts at 4 AM because the source system&amp;rsquo;s extract used to land in S3 by 3:45 AM. But the source system has grown too, and now the extract doesn&amp;rsquo;t land until 4:30 AM. Your pipeline sensors wait, and everything shifts right.&lt;/p>
&lt;p>For upstream delays in an AWS environment, replace time-based scheduling with event-driven triggers. Instead of scheduling your Airflow DAG at 4 AM and hoping the data is there, trigger it when the data actually arrives:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Airflow DAG: trigger on S3 file landing using AWS sensor&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> airflow.providers.amazon.aws.sensors.s3 &lt;span style="color:#f92672">import&lt;/span> S3KeySensor
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> airflow &lt;span style="color:#f92672">import&lt;/span> DAG
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> datetime &lt;span style="color:#f92672">import&lt;/span> datetime
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> DAG(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;daily_analytics&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_date&lt;span style="color:#f92672">=&lt;/span>datetime(&lt;span style="color:#ae81ff">2026&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> schedule_interval&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">None&lt;/span>, &lt;span style="color:#75715e"># triggered externally, not on a schedule&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> catchup&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) &lt;span style="color:#66d9ef">as&lt;/span> dag:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wait_for_source &lt;span style="color:#f92672">=&lt;/span> S3KeySensor(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> task_id&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;wait_for_orders_extract&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bucket_name&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;your-data-lake&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bucket_key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;raw/orders/dt={{ ds }}/orders.parquet&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> aws_conn_id&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;aws_default&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mode&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;reschedule&amp;#39;&lt;/span>, &lt;span style="color:#75715e"># frees the worker slot while waiting&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> poke_interval&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">300&lt;/span>, &lt;span style="color:#75715e"># check every 5 minutes&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> timeout&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">7200&lt;/span>, &lt;span style="color:#75715e"># fail after 2 hours if file never arrives&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>mode='reschedule'&lt;/code> is critical. Without it, the sensor occupies a worker slot for the entire time it&amp;rsquo;s waiting. With &lt;code>reschedule&lt;/code>, it checks, releases the slot, and checks again later. This prevents the classic deadlock where all your worker slots are consumed by sensors and no actual work can run.&lt;/p>
&lt;p>Even better: use S3 event notifications to trigger a Lambda function that kicks off the DAG via Airflow&amp;rsquo;s REST API. Zero polling, zero wasted slots:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Lambda function triggered by S3 PutObject event&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> boto3
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> os
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">handler&lt;/span>(event, context):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bucket &lt;span style="color:#f92672">=&lt;/span> event[&lt;span style="color:#e6db74">&amp;#39;Records&amp;#39;&lt;/span>][&lt;span style="color:#ae81ff">0&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;s3&amp;#39;&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;bucket&amp;#39;&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;name&amp;#39;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key &lt;span style="color:#f92672">=&lt;/span> event[&lt;span style="color:#e6db74">&amp;#39;Records&amp;#39;&lt;/span>][&lt;span style="color:#ae81ff">0&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;s3&amp;#39;&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;object&amp;#39;&lt;/span>][&lt;span style="color:#e6db74">&amp;#39;key&amp;#39;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Trigger Airflow DAG via REST API&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> airflow_url &lt;span style="color:#f92672">=&lt;/span> os&lt;span style="color:#f92672">.&lt;/span>environ[&lt;span style="color:#e6db74">&amp;#39;AIRFLOW_API_URL&amp;#39;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>post(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>airflow_url&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/api/v1/dags/daily_analytics/dagRuns&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> json&lt;span style="color:#f92672">=&lt;/span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;conf&amp;#34;&lt;/span>: {&lt;span style="color:#e6db74">&amp;#34;source_bucket&amp;#34;&lt;/span>: bucket, &lt;span style="color:#e6db74">&amp;#34;source_key&amp;#34;&lt;/span>: key}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> auth&lt;span style="color:#f92672">=&lt;/span>(os&lt;span style="color:#f92672">.&lt;/span>environ[&lt;span style="color:#e6db74">&amp;#39;AIRFLOW_USER&amp;#39;&lt;/span>], os&lt;span style="color:#f92672">.&lt;/span>environ[&lt;span style="color:#e6db74">&amp;#39;AIRFLOW_PASSWORD&amp;#39;&lt;/span>]),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> headers&lt;span style="color:#f92672">=&lt;/span>{&lt;span style="color:#e6db74">&amp;#34;Content-Type&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;application/json&amp;#34;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> {&lt;span style="color:#e6db74">&amp;#34;statusCode&amp;#34;&lt;/span>: response&lt;span style="color:#f92672">.&lt;/span>status_code}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This eliminates the &amp;ldquo;safety margin&amp;rdquo; scheduling pattern entirely. Your pipeline runs as soon as the data is available — not 30 minutes after you &lt;em>hope&lt;/em> it will be available.&lt;/p>
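&lt;p>The freshness gain is simple arithmetic: under a fixed schedule, data sits idle from the moment it lands until the cron fires. A small sketch with illustrative timestamps:&lt;/p>

```python
# Sketch: how much freshness event-driven triggering buys over a cron with a
# safety margin. Timestamps are illustrative.
from datetime import datetime, timedelta

data_ready = datetime(2026, 4, 18, 3, 52)       # extract lands in S3
scheduled_start = datetime(2026, 4, 18, 4, 30)  # cron time, padded "to be safe"

# Time-based schedule: the data sits idle until the cron fires.
idle = max(scheduled_start - data_ready, timedelta(0))
print(f"freshness gained by triggering on arrival: {idle.total_seconds() / 60:.0f} min")  # 38 min
```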
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="partition-and-cluster-to-eliminate-full-table-scans">Partition and cluster to eliminate full table scans&lt;/h3>
&lt;br>
&lt;p>The single most effective performance optimisation in Snowflake is making sure your queries only read the data they need. Snowflake&amp;rsquo;s micro-partition pruning does this automatically — but only if your data is physically organised in a way that aligns with your query patterns.&lt;/p>
&lt;p>Clustering keys tell Snowflake how to organise data within micro-partitions. If you consistently filter on &lt;code>order_date&lt;/code>, clustering on that column means queries with a &lt;code>WHERE order_date = '2026-03-01'&lt;/code> clause scan a tiny fraction of the table instead of the whole thing.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Add clustering to a large fact table
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">ALTER&lt;/span> &lt;span style="color:#66d9ef">TABLE&lt;/span> analytics.fct_orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CLUSTER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> (order_date, customer_segment);
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Choose clustering keys based on how the table is actually queried, not how it&amp;rsquo;s loaded. The best candidates are columns that appear in WHERE clauses, JOIN conditions, and range filters. Limit yourself to 2–3 keys — more than that and Snowflake can&amp;rsquo;t maintain effective clustering.&lt;/p>
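&lt;p>The pruning mechanism itself is easy to picture: Snowflake keeps min/max metadata per micro-partition and skips any partition whose range cannot contain the filter value. A toy sketch with illustrative partition metadata:&lt;/p>

```python
# Sketch: min/max metadata per micro-partition lets a date filter skip
# partitions entirely. Partition contents are illustrative.
partitions = [
    {"id": 1, "min_date": "2026-01-01", "max_date": "2026-01-31"},
    {"id": 2, "min_date": "2026-02-01", "max_date": "2026-02-28"},
    {"id": 3, "min_date": "2026-03-01", "max_date": "2026-03-31"},
]

def partitions_to_scan(target_date):
    """Keep only partitions whose [min, max] range could contain target_date."""
    return [p["id"] for p in partitions
            if p["min_date"] <= target_date <= p["max_date"]]

print(partitions_to_scan("2026-03-01"))  # [3]; two of three partitions pruned
```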
&lt;p>Check whether your existing tables benefit from clustering:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Check clustering depth (lower is better, 0 is perfect)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">table_name&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> clustering_key,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_constant_partition_count,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_partition_count,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> average_overlaps,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> average_depth
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.table_storage_metrics
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> table_catalog &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;YOUR_DATABASE&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> clustering_key &lt;span style="color:#66d9ef">IS&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">NULL&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> active_bytes &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> average_depth &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>An &lt;code>average_depth&lt;/code> above 5 means your clustering isn&amp;rsquo;t effective — queries are still scanning more partitions than they should. Either the clustering key doesn&amp;rsquo;t match query patterns, or the table has had so many small writes that the clustering has degraded. Enabling automatic clustering (which costs credits as Snowflake reclusters in the background) addresses the latter; manual &lt;code>ALTER TABLE ... RECLUSTER&lt;/code> is deprecated in Snowflake.&lt;/p>
&lt;p>On the AWS side, if you&amp;rsquo;re using S3 as a data lake with Parquet or Iceberg, the equivalent is &lt;strong>partition layout&lt;/strong>. Partition your S3 data by the columns you filter most:&lt;/p>



&lt;div class="goat svg-container ">
 
 &lt;svg
 xmlns="http://www.w3.org/2000/svg"
 font-family="Menlo,Lucida Console,monospace"
 
 viewBox="0 0 312 121"
 >
 &lt;g transform='translate(8,16)'>
&lt;polygon points='176.000000,80.000000 164.000000,74.400002 164.000000,85.599998' fill='currentColor' transform='rotate(90.000000, 168.000000, 80.000000)'>&lt;/polygon>
&lt;text text-anchor='middle' x='0' y='4' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='8' y='4' fill='currentColor' style='font-size:1em'>3&lt;/text>
&lt;text text-anchor='middle' x='16' y='4' fill='currentColor' style='font-size:1em'>:&lt;/text>
&lt;text text-anchor='middle' x='24' y='4' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='32' y='4' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='32' y='20' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='40' y='4' fill='currentColor' style='font-size:1em'>y&lt;/text>
&lt;text text-anchor='middle' x='40' y='20' fill='currentColor' style='font-size:1em'>v&lt;/text>
&lt;text text-anchor='middle' x='48' y='4' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='48' y='20' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='56' y='4' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='56' y='20' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='64' y='4' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='64' y='20' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='64' y='36' fill='currentColor' style='font-size:1em'>y&lt;/text>
&lt;text text-anchor='middle' x='72' y='4' fill='currentColor' style='font-size:1em'>-&lt;/text>
&lt;text text-anchor='middle' x='72' y='20' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='72' y='36' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='80' y='4' fill='currentColor' style='font-size:1em'>d&lt;/text>
&lt;text text-anchor='middle' x='80' y='20' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='80' y='36' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='88' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='88' y='36' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='96' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='96' y='36' fill='currentColor' style='font-size:1em'>=&lt;/text>
&lt;text text-anchor='middle' x='96' y='52' fill='currentColor' style='font-size:1em'>m&lt;/text>
&lt;text text-anchor='middle' x='104' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='104' y='36' fill='currentColor' style='font-size:1em'>2&lt;/text>
&lt;text text-anchor='middle' x='104' y='52' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='112' y='4' fill='currentColor' style='font-size:1em'>-&lt;/text>
&lt;text text-anchor='middle' x='112' y='36' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='112' y='52' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='120' y='4' fill='currentColor' style='font-size:1em'>l&lt;/text>
&lt;text text-anchor='middle' x='120' y='36' fill='currentColor' style='font-size:1em'>2&lt;/text>
&lt;text text-anchor='middle' x='120' y='52' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='128' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='128' y='36' fill='currentColor' style='font-size:1em'>6&lt;/text>
&lt;text text-anchor='middle' x='128' y='52' fill='currentColor' style='font-size:1em'>h&lt;/text>
&lt;text text-anchor='middle' x='128' y='68' fill='currentColor' style='font-size:1em'>d&lt;/text>
&lt;text text-anchor='middle' x='136' y='4' fill='currentColor' style='font-size:1em'>k&lt;/text>
&lt;text text-anchor='middle' x='136' y='36' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='136' y='52' fill='currentColor' style='font-size:1em'>=&lt;/text>
&lt;text text-anchor='middle' x='136' y='68' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='144' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='144' y='52' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='144' y='68' fill='currentColor' style='font-size:1em'>y&lt;/text>
&lt;text text-anchor='middle' x='152' y='4' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='152' y='52' fill='currentColor' style='font-size:1em'>3&lt;/text>
&lt;text text-anchor='middle' x='152' y='68' fill='currentColor' style='font-size:1em'>=&lt;/text>
&lt;text text-anchor='middle' x='160' y='52' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='160' y='68' fill='currentColor' style='font-size:1em'>1&lt;/text>
&lt;text text-anchor='middle' x='160' y='84' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='160' y='100' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='168' y='68' fill='currentColor' style='font-size:1em'>5&lt;/text>
&lt;text text-anchor='middle' x='168' y='100' fill='currentColor' style='font-size:1em'>v&lt;/text>
&lt;text text-anchor='middle' x='176' y='68' fill='currentColor' style='font-size:1em'>/&lt;/text>
&lt;text text-anchor='middle' x='176' y='84' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='176' y='100' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='184' y='84' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='184' y='100' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='192' y='84' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='192' y='100' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='200' y='84' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='200' y='100' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='208' y='84' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='208' y='100' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='216' y='84' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='216' y='100' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='224' y='84' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='224' y='100' fill='currentColor' style='font-size:1em'>0&lt;/text>
&lt;text text-anchor='middle' x='232' y='84' fill='currentColor' style='font-size:1em'>1&lt;/text>
&lt;text text-anchor='middle' x='232' y='100' fill='currentColor' style='font-size:1em'>2&lt;/text>
&lt;text text-anchor='middle' x='240' y='84' fill='currentColor' style='font-size:1em'>.&lt;/text>
&lt;text text-anchor='middle' x='240' y='100' fill='currentColor' style='font-size:1em'>.&lt;/text>
&lt;text text-anchor='middle' x='248' y='84' fill='currentColor' style='font-size:1em'>p&lt;/text>
&lt;text text-anchor='middle' x='248' y='100' fill='currentColor' style='font-size:1em'>p&lt;/text>
&lt;text text-anchor='middle' x='256' y='84' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='256' y='100' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='264' y='84' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='264' y='100' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='272' y='84' fill='currentColor' style='font-size:1em'>q&lt;/text>
&lt;text text-anchor='middle' x='272' y='100' fill='currentColor' style='font-size:1em'>q&lt;/text>
&lt;text text-anchor='middle' x='280' y='84' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='280' y='100' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='288' y='84' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='288' y='100' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='296' y='84' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='296' y='100' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;/g>

 &lt;/svg>
 
&lt;/div>
&lt;p>When Snowflake external tables or AWS Athena query this structure with a date filter, they skip entire directories. A query for March 15th reads two files instead of scanning the entire &lt;code>events/&lt;/code> prefix. The savings compound with data volume — at a billion rows, proper partitioning can reduce query times from minutes to seconds.&lt;/p>
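&lt;p>The skipping works because a partition filter becomes a key prefix, so the engine only lists objects under that prefix. A toy sketch with illustrative object keys:&lt;/p>

```python
# Sketch: a directory-style partition filter becomes a key prefix, so only
# matching objects are ever listed. Keys are illustrative.
keys = [
    "events/year=2026/month=03/day=14/events_001.parquet",
    "events/year=2026/month=03/day=15/events_001.parquet",
    "events/year=2026/month=03/day=15/events_002.parquet",
]

def files_for_day(year, month, day):
    """Objects a date-filtered query has to read under Hive-style partitioning."""
    prefix = f"events/year={year}/month={month:02d}/day={day:02d}/"
    return [k for k in keys if k.startswith(prefix)]

print(files_for_day(2026, 3, 15))  # two files; day=14 is never touched
```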
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="make-every-task-idempotent-or-debugging-becomes-a-nightmare">Make every task idempotent or debugging becomes a nightmare&lt;/h3>
&lt;br>
&lt;p>Here&amp;rsquo;s a pattern I see constantly: a pipeline fails midway through, the engineer reruns it, and now there are duplicate rows in the target table. Because the first run inserted half the data before failing, and the rerun inserted &lt;em>all&lt;/em> the data — including the half that already existed.&lt;/p>
&lt;p>Idempotency means a task produces the same result whether it runs once or ten times. This isn&amp;rsquo;t just a nice-to-have — it&amp;rsquo;s the foundation that makes everything else in pipeline engineering possible. Without it, you can&amp;rsquo;t safely retry. You can&amp;rsquo;t backfill. You can&amp;rsquo;t debug.&lt;/p>
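&lt;p>The property is easy to demonstrate outside any warehouse: an upsert keyed on a unique ID converges to the same state no matter how many times it runs. A minimal sketch with made-up events:&lt;/p>

```python
# Sketch: an upsert keyed on event_id is idempotent; a blind append is not.
# Events are made up for illustration.
target = {}

def load(events):
    for e in events:
        target[e["event_id"]] = e  # rerun overwrites the same key, never duplicates

batch = [{"event_id": "a1", "value": 10}, {"event_id": "a2", "value": 20}]
load(batch)
load(batch)          # retry after a partial failure
print(len(target))   # 2 rows, not 4
```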
&lt;p>In dbt, incremental models with a &lt;code>unique_key&lt;/code> are idempotent by default — the MERGE statement handles duplicates:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> config(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> materialized&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;incremental&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> unique_key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;event_id&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> incremental_strategy&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;merge&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> event_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> user_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> event_type,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> event_timestamp,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> properties
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;stg_events&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> &lt;span style="color:#66d9ef">if&lt;/span> is_incremental() &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> event_timestamp &lt;span style="color:#f92672">&amp;gt;=&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">3&lt;/span>, &lt;span style="color:#66d9ef">MAX&lt;/span>(event_timestamp))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> this &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> endif &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For non-dbt loads — say, a Lambda function loading data from an API into Snowflake — use MERGE instead of INSERT:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>MERGE &lt;span style="color:#66d9ef">INTO&lt;/span> raw.api_customers &lt;span style="color:#66d9ef">AS&lt;/span> target
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">USING&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">$&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>:id::STRING &lt;span style="color:#66d9ef">AS&lt;/span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">$&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>:name::STRING &lt;span style="color:#66d9ef">AS&lt;/span> customer_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">$&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>:email::STRING &lt;span style="color:#66d9ef">AS&lt;/span> email,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">$&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>:updated_at::TIMESTAMP_NTZ &lt;span style="color:#66d9ef">AS&lt;/span> updated_at
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#f92672">@&lt;/span>raw.s3_stage&lt;span style="color:#f92672">/&lt;/span>customers&lt;span style="color:#f92672">/&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (FILE_FORMAT &lt;span style="color:#f92672">=&amp;gt;&lt;/span> &lt;span style="color:#e6db74">&amp;#39;json_format&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) &lt;span style="color:#66d9ef">AS&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ON&lt;/span> target.customer_id &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>.customer_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHEN&lt;/span> MATCHED &lt;span style="color:#66d9ef">AND&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>.updated_at &lt;span style="color:#f92672">&amp;gt;&lt;/span> target.updated_at &lt;span style="color:#66d9ef">THEN&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">UPDATE&lt;/span> &lt;span style="color:#66d9ef">SET&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_name &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>.customer_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> email &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>.email,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> updated_at &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>.updated_at
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHEN&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> MATCHED &lt;span style="color:#66d9ef">THEN&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">INSERT&lt;/span> (customer_id, customer_name, email, updated_at)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">VALUES&lt;/span> (&lt;span style="color:#66d9ef">source&lt;/span>.customer_id, &lt;span style="color:#66d9ef">source&lt;/span>.customer_name, &lt;span style="color:#66d9ef">source&lt;/span>.email, &lt;span style="color:#66d9ef">source&lt;/span>.updated_at);
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>WHEN MATCHED AND source.updated_at &amp;gt; target.updated_at&lt;/code> condition prevents overwriting newer data with older data during reruns. This matters when you&amp;rsquo;re replaying historical loads or running overlapping backfills.&lt;/p>
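The guard is easiest to see outside SQL. Here is a minimal Python sketch of the same timestamp-guarded upsert, with entirely hypothetical rows, showing why a rerun carrying stale data cannot clobber a newer record:

```python
# Minimal sketch of the timestamp-guarded upsert: a rerun with older
# data must not overwrite a newer row. All rows here are hypothetical.
def merge_row(target: dict, source_row: dict) -> None:
    key = source_row["customer_id"]
    existing = target.get(key)
    # Mirrors WHEN MATCHED AND source.updated_at > target.updated_at
    if existing is None or source_row["updated_at"] > existing["updated_at"]:
        target[key] = source_row

target = {"c1": {"customer_id": "c1", "email": "new@x.com", "updated_at": 200}}

# Replaying an older batch: the stale row is ignored
merge_row(target, {"customer_id": "c1", "email": "old@x.com", "updated_at": 100})
assert target["c1"]["email"] == "new@x.com"

# A genuinely newer row still wins
merge_row(target, {"customer_id": "c1", "email": "newer@x.com", "updated_at": 300})
```

Run the same batch twice and the second pass is a no-op, which is exactly the property you want from a backfill.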
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="most-we-need-real-time-requests-actually-need-we-need-faster-batch">Most &amp;ldquo;we need real-time&amp;rdquo; requests actually need &amp;ldquo;we need faster batch&amp;rdquo;&lt;/h3>
&lt;br>
&lt;p>Before you reach for Kafka, Kinesis, or any streaming infrastructure, have this conversation with your stakeholders: &amp;ldquo;When you say real-time, what do you actually mean?&amp;rdquo;&lt;/p>
&lt;p>In my experience, the answers fall into three categories:&lt;/p>
&lt;p>&lt;strong>&amp;ldquo;I want data from today, not yesterday.&amp;rdquo;&lt;/strong> That&amp;rsquo;s daily batch with a morning refresh. You already have this. Maybe you need to move the refresh earlier.&lt;/p>
&lt;p>&lt;strong>&amp;ldquo;I want data that&amp;rsquo;s at most an hour old.&amp;rdquo;&lt;/strong> That&amp;rsquo;s micro-batch — running your existing pipeline every 15–60 minutes instead of once a day. No new infrastructure required. Your existing SQL, dbt, and Airflow tools work unchanged at shorter intervals.&lt;/p>
&lt;p>&lt;strong>&amp;ldquo;I need to see events within seconds of them happening.&amp;rdquo;&lt;/strong> &lt;em>This&lt;/em> is actual real-time. And it&amp;rsquo;s genuinely rare. Fraud detection, safety monitoring, real-time bidding — these need streaming. Your internal sales dashboard almost certainly does not.&lt;/p>
&lt;p>For the micro-batch pattern in Airflow, it&amp;rsquo;s a scheduling change:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> DAG(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;micro_batch_analytics&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> schedule_interval&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;*/15 * * * *&amp;#39;&lt;/span>, &lt;span style="color:#75715e"># every 15 minutes&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> catchup&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_active_runs&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#75715e"># prevent overlap&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) &lt;span style="color:#66d9ef">as&lt;/span> dag:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># ...tasks here&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>max_active_runs=1&lt;/code> is important. Without it, if a run takes longer than 15 minutes, the next run starts before the previous one finishes. They compete for the same warehouse, both run slower, and you&amp;rsquo;ve created the resource contention problem from the shifting-right section.&lt;/p>
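The failure mode is simple arithmetic, and worth checking whenever you shorten a schedule. A small sketch with a hypothetical 20-minute runtime on a 15-minute schedule shows every consecutive pair of runs overlapping, which is exactly what `max_active_runs=1` prevents:

```python
# Sketch: a 15-minute schedule with a runtime that has drifted to
# 20 minutes. Times are minutes from an arbitrary start; data is made up.
INTERVAL_MIN = 15
RUNTIME_MIN = 20

starts = [i * INTERVAL_MIN for i in range(4)]   # 0, 15, 30, 45
ends = [s + RUNTIME_MIN for s in starts]        # 20, 35, 50, 65

# A run overlaps its successor if the successor starts before it ends
overlaps = [starts[i + 1] < ends[i] for i in range(3)]
print(overlaps)  # every pair overlaps once runtime exceeds the interval
```

With the cap in place, Airflow skips or queues the next run instead of stacking concurrent runs onto the same warehouse.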
&lt;p>In Snowflake, pair this with Snowpipe for continuous S3 ingestion:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a pipe for continuous loading from S3
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">OR&lt;/span> &lt;span style="color:#66d9ef">REPLACE&lt;/span> PIPE raw.orders_pipe
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AUTO_INGEST &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AS&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">COPY&lt;/span> &lt;span style="color:#66d9ef">INTO&lt;/span> raw.orders_stream
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#f92672">@&lt;/span>raw.s3_orders_stage
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> FILE_FORMAT &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#66d9ef">TYPE&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;PARQUET&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> MATCH_BY_COLUMN_NAME &lt;span style="color:#f92672">=&lt;/span> CASE_INSENSITIVE;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Snowpipe loads files within minutes of them landing in S3. Your micro-batch dbt pipeline then transforms whatever arrived since the last run. The combination delivers data freshness measured in minutes — not seconds, but close enough for the vast majority of business use cases — without any streaming infrastructure.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="measure-time-to-consumer-not-just-pipeline-runtime">Measure time-to-consumer, not just pipeline runtime&lt;/h3>
&lt;br>
&lt;p>Pipeline runtime tells you how long your DAG takes. Time-to-consumer tells you how long a business event takes to reach a human decision-maker. These are different numbers, and the second one is what your stakeholders actually care about.&lt;/p>
&lt;p>Time-to-consumer decomposes into stages:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Source extraction delay&lt;/strong>: Time between the event occurring and the data landing in your lake&lt;/li>
&lt;li>&lt;strong>Pipeline processing time&lt;/strong>: Your DAG runtime (the part you control most directly)&lt;/li>
&lt;li>&lt;strong>Warehouse serving time&lt;/strong>: Query execution time when a dashboard or report reads the data&lt;/li>
&lt;li>&lt;strong>Cache/refresh delay&lt;/strong>: How often the BI tool refreshes its cache&lt;/li>
&lt;/ul>
&lt;p>Track it by stamping timestamps at each handoff:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Build a freshness tracking model in dbt
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- mart_data_freshness.sql
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;fct_orders&amp;#39;&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> model_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(order_timestamp) &lt;span style="color:#66d9ef">AS&lt;/span> latest_source_event,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(_loaded_at) &lt;span style="color:#66d9ef">AS&lt;/span> latest_load_time,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>() &lt;span style="color:#66d9ef">AS&lt;/span> measured_at,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> TIMESTAMPDIFF(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;minute&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(order_timestamp),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) &lt;span style="color:#66d9ef">AS&lt;/span> minutes_since_latest_event,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> TIMESTAMPDIFF(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;minute&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(_loaded_at),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) &lt;span style="color:#66d9ef">AS&lt;/span> minutes_since_latest_load
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;fct_orders&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Schedule this model to run after every pipeline completion. Over time, you&amp;rsquo;ll see whether &lt;code>minutes_since_latest_event&lt;/code> is stable, improving, or drifting. If it&amp;rsquo;s drifting, you can pinpoint which stage is responsible by comparing the event timestamp, load timestamp, and measurement timestamp.&lt;/p>
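You can automate the drift check Priya did by hand: fit a line through daily completion times and project when the trend crosses the SLA. A minimal sketch with made-up numbers (a pipeline drifting roughly 2 minutes per day from a 5:47 AM baseline):

```python
# Fit a least-squares slope to completion times (minutes after midnight)
# and project the SLA breach. Sample data is illustrative only.
def fit_slope(ys):
    """Least-squares slope of ys against day index 0..n-1, in minutes/day."""
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# 30 days of completion times drifting 2 min/day (5:47 AM = 347 minutes)
completions = [347 + 2 * d for d in range(30)]
slope = fit_slope(completions)

sla = 9 * 60  # 9:00 AM SLA, in minutes after midnight
days_left = (sla - completions[-1]) / slope
print(f"drift: {slope:.1f} min/day, SLA breach in ~{days_left:.0f} days")
```

Wire the slope into an alert and the trend line gets noticed by a machine instead of depending on a curious engineer.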
&lt;p>In dbt, you can also use source freshness tests to alert when upstream data is stale:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># models/staging/_sources.yml&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">sources&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">raw&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">database&lt;/span>: &lt;span style="color:#ae81ff">raw_db&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">schema&lt;/span>: &lt;span style="color:#ae81ff">public&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">loaded_at_field&lt;/span>: &lt;span style="color:#ae81ff">_loaded_at&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">freshness&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">warn_after&lt;/span>: { &lt;span style="color:#f92672">count: 2, period&lt;/span>: &lt;span style="color:#ae81ff">hour }&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">error_after&lt;/span>: { &lt;span style="color:#f92672">count: 6, period&lt;/span>: &lt;span style="color:#ae81ff">hour }&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">tables&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">orders&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">customers&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">events&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Running &lt;code>dbt source freshness&lt;/code> checks whether the upstream data is as fresh as you expect. If &lt;code>raw.orders&lt;/code> hasn&amp;rsquo;t received new rows in 6 hours, the test fails — and that&amp;rsquo;s a signal that the source system&amp;rsquo;s extract is delayed, not that your pipeline is broken.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-view-on-view-problem-the-compound-interest-of-bad-performance">The view-on-view problem: the compound interest of bad performance&lt;/h3>
&lt;br>
&lt;p>This is the architectural anti-pattern I see most often in Snowflake environments, and it&amp;rsquo;s one of those problems that starts small and compounds quietly.&lt;/p>
&lt;p>A view in Snowflake doesn&amp;rsquo;t store any data — it re-executes its SQL every time it&amp;rsquo;s queried. That&amp;rsquo;s fine for a simple view. But when View C references View B, which references View A, which scans a large table — every query against View C triggers the entire chain from scratch.&lt;/p>
&lt;p>I&amp;rsquo;ve seen environments where analysts querying a &amp;ldquo;simple&amp;rdquo; dashboard view were unknowingly triggering five layers of nested views, each with its own JOINs and aggregations. The query took 4 minutes. After materialising the two most expensive intermediate layers as tables (refreshed daily), the same query took 3 seconds.&lt;/p>
&lt;p>The fix is to audit your materialisation strategy:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Find views that reference other views (potential nesting)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> referencing_object_name &lt;span style="color:#66d9ef">AS&lt;/span> downstream_view,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> referenced_object_name &lt;span style="color:#66d9ef">AS&lt;/span> upstream_object,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> referenced_object_type
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.object_dependencies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> referencing_object_type &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;VIEW&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> referenced_object_type &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;VIEW&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> downstream_view;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If this query returns results, you have view-on-view nesting. Not all of it is bad — a simple renaming view on top of a complex view is fine. But if you find three or more layers, or if any of the intermediate views involve heavy JOINs or aggregations, materialise the expensive layers as tables.&lt;/p>
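Once you have the dependency pairs, measuring chain depth is a short recursion. A sketch over hypothetical edges shaped like the query's output (downstream view, upstream view):

```python
# Compute nesting depth per view from (downstream, upstream) edge pairs,
# like those returned by OBJECT_DEPENDENCIES. Edges here are hypothetical.
from functools import lru_cache

edges = [
    ("dash_v", "agg_v"),
    ("agg_v", "join_v"),
    ("join_v", "base_v"),
]
deps: dict[str, list[str]] = {}
for down, up in edges:
    deps.setdefault(down, []).append(up)

@lru_cache(maxsize=None)
def depth(view: str) -> int:
    """1 for a view on base tables, +1 per nested view layer."""
    return 1 + max((depth(u) for u in deps.get(view, [])), default=0)

worst = max(deps, key=depth)
print(worst, depth(worst))  # anything at 3+ is a materialisation candidate
```

Sorting views by depth gives you the audit list: the deepest chains are where materialising one intermediate layer pays off most.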
&lt;p>In dbt terms, the decision framework is:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Ephemeral&lt;/strong>: Simple CTEs, column renaming, type casting. Zero cost.&lt;/li>
&lt;li>&lt;strong>View&lt;/strong>: Light transformations queried infrequently by a few consumers.&lt;/li>
&lt;li>&lt;strong>Table&lt;/strong>: Complex transformations queried frequently. Rebuilt every run.&lt;/li>
&lt;li>&lt;strong>Incremental&lt;/strong>: Large tables where only a fraction of rows change. The right choice for most fact tables.&lt;/li>
&lt;/ul>
&lt;p>If a model takes more than 30 seconds to build and is queried more than once between builds, it should be a table or incremental — not a view.&lt;/p>
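That rule of thumb can be written down as a function, which makes it easy to apply mechanically across a project. The thresholds come from the text; the function name and the changed-fraction cutoff are illustrative assumptions, not dbt API:

```python
# Rule-of-thumb materialisation picker: build cost vs. read frequency.
# The 30s and read-count thresholds come from the article; the 10%
# changed-rows cutoff for incremental is an illustrative assumption.
def suggest_materialisation(build_seconds: float,
                            reads_between_builds: int,
                            changed_fraction: float) -> str:
    if build_seconds > 30 and reads_between_builds > 1:
        # Expensive and read often: stop re-running the SQL per query
        return "incremental" if changed_fraction < 0.1 else "table"
    return "view"

print(suggest_materialisation(240, 50, 0.01))  # large, mostly-stable fact table
print(suggest_materialisation(2, 3, 1.0))      # cheap staging transformation
```

It will not capture every nuance (freshness requirements, warehouse sizing), but it catches the common case: expensive views being re-executed on every dashboard load.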
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="coming-back-to-priyas-scatter-plot">Coming back to Priya&amp;rsquo;s scatter plot&lt;/h3>
&lt;br>
&lt;p>Six weeks after Priya showed me that graph, we&amp;rsquo;d fixed the three root causes: an intermediate model that had grown from 5 million to 180 million rows without being switched to incremental, a phantom dependency chain that added 12 minutes of serial wait time, and a warehouse that was being shared between transformation and BI queries during the morning refresh window.&lt;/p>
&lt;p>Pipeline completion moved from 7:23 AM back to 5:15 AM. The finance team got their dashboards before their morning coffee. And Priya? She built a monitoring dashboard that tracked completion time drift automatically — the thing I should have built from the start.&lt;/p>
&lt;p>Here&amp;rsquo;s what I learned from that experience, and what I hope you take from this article: &lt;strong>pipeline performance isn&amp;rsquo;t a problem you solve once.&lt;/strong> It&amp;rsquo;s a trajectory you manage. Data grows. Teams add models. Source systems change. The pipeline that&amp;rsquo;s fast today will be slow in six months unless someone is watching the trend line.&lt;/p>
&lt;p>The engineers who build reliable data platforms aren&amp;rsquo;t the ones who build the fastest pipeline on day one. They&amp;rsquo;re the ones who notice when completion time drifts by 3 minutes per week — and fix it before anyone else has to ask.&lt;/p>
&lt;p>Track your completion times. Audit your dependencies quarterly. Profile your critical path after every significant change. And teach your team that a green DAG doesn&amp;rsquo;t mean a healthy pipeline. Sometimes green just means it hasn&amp;rsquo;t failed &lt;em>yet&lt;/em>.&lt;/p>
&lt;p>&lt;br>&lt;br>&lt;/p></content:encoded><category>Data Engineering</category><category>Cloud Architecture</category><category>Snowflake</category><category>AWS</category><category>Airflow</category><category>Pipeline Optimization</category><category>dbt</category><category>Data Freshness</category></item></channel></rss>