<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Data Platform on Ghost in the data</title><link>https://ghostinthedata.info/tags/data-platform/</link><description>Ghost in the data</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Ghost in the data</copyright><lastBuildDate>Sat, 25 Apr 2026 09:00:00 +1100</lastBuildDate><atom:link href="https://ghostinthedata.info/tags/data-platform/index.xml" rel="self" type="application/rss+xml"/><item><title>Your Data Platform Costs More Than It Should</title><link>https://ghostinthedata.info/posts/2026/2026-04-25-cost-management/</link><pubDate>Sat, 25 Apr 2026 09:00:00 +1100</pubDate><guid>https://ghostinthedata.info/posts/2026/2026-04-25-cost-management/</guid><author>Chris Hillman</author><description>A practical guide to understanding, measuring, and reducing your Snowflake and AWS data platform costs — starting with the habits that actually move the needle.</description><content:encoded>&lt;p>Let me tell you about the moment I stopped treating cloud costs as someone else&amp;rsquo;s problem.&lt;/p>
&lt;p>We were three months into a Snowflake migration. Everything was humming. Pipelines were green, dashboards were fast, the analytics team was happier than I&amp;rsquo;d seen them before. I felt good about the work we&amp;rsquo;d done.&lt;/p>
&lt;p>Then finance forwarded me the invoice.&lt;/p>
&lt;p>The number wasn&amp;rsquo;t catastrophic. But it was significantly higher than what we&amp;rsquo;d budgeted, and when I started digging, I couldn&amp;rsquo;t explain where most of it was going. I knew we had warehouses running. I knew we had pipelines executing. But I couldn&amp;rsquo;t tell you which warehouse was responsible for what cost, which pipelines were the expensive ones, or whether the money was well spent. I had built a platform I was proud of — and I had no idea what it actually cost to operate.&lt;/p>
&lt;p>That&amp;rsquo;s the moment that changed how I think about data engineering. Not because of the dollar amount, but because of the realisation underneath it: &lt;strong>I had built something I couldn&amp;rsquo;t explain to the people paying for it.&lt;/strong> And if I couldn&amp;rsquo;t explain it, I couldn&amp;rsquo;t defend it. And if I couldn&amp;rsquo;t defend it, someone else would make the decisions for me — someone who didn&amp;rsquo;t understand why the platform mattered.&lt;/p>
&lt;br>
&lt;p>I&amp;rsquo;m telling you this because cost management is one of those things that sounds like a finance problem until you experience the consequences firsthand. It&amp;rsquo;s not about being cheap. It&amp;rsquo;s about being intentional. It&amp;rsquo;s about knowing that every credit you spend is buying something valuable — and being able to prove it when someone asks.&lt;/p>
&lt;p>The data engineers who understand their costs don&amp;rsquo;t just save money. They earn trust. They get budget for the projects that matter. They sleep better because they&amp;rsquo;ve eliminated the waste that eventually becomes someone else&amp;rsquo;s excuse to cut headcount or freeze hiring.&lt;/p>
&lt;p>This article is about building that understanding. Not with a vendor&amp;rsquo;s optimisation tool or a consultant&amp;rsquo;s audit — but with the habits, queries, and mental models that let you own your platform&amp;rsquo;s economics from the inside. Everything here is grounded in Snowflake and AWS, with specific code you can run today.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="you-cant-optimise-what-you-cant-see">You can&amp;rsquo;t optimise what you can&amp;rsquo;t see&lt;/h3>
&lt;br>
&lt;p>Before you touch a single warehouse configuration, you need to answer one question: &lt;strong>where is the money going?&lt;/strong>&lt;/p>
&lt;p>Most teams skip this step. They read a blog post about auto-suspend settings, change a few defaults, and call it optimisation. That&amp;rsquo;s like going on a diet by switching to diet soda while eating three pizzas a day. The soda wasn&amp;rsquo;t the problem.&lt;/p>
&lt;p>Here&amp;rsquo;s the query I run first on every Snowflake environment I touch. It tells you which warehouses are consuming the most credits over the last 30 days:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> total_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#ae81ff">3&lt;/span>.&lt;span style="color:#ae81ff">00&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> estimated_cost_usd, &lt;span style="color:#75715e">-- adjust your credit price
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">COUNT&lt;/span>(&lt;span style="color:#66d9ef">DISTINCT&lt;/span> DATE_TRUNC(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, start_time)) &lt;span style="color:#66d9ef">AS&lt;/span> active_days,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND(&lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#66d9ef">COUNT&lt;/span>(&lt;span style="color:#66d9ef">DISTINCT&lt;/span> DATE_TRUNC(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, start_time)), &lt;span style="color:#ae81ff">2&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> credits_per_day
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span> snowflake.account_usage.warehouse_metering_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">30&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> warehouse_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> total_credits &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Run that. Right now. I&amp;rsquo;ll wait.&lt;/p>
&lt;p>If you&amp;rsquo;re like most teams, 70–80% of your credits come from two or three warehouses. That&amp;rsquo;s your starting point. Not everything — just the expensive stuff.&lt;/p>
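&lt;p>To put a number on that concentration, compute each warehouse&amp;rsquo;s share of total credits. Same metering view as above, just with a window function over the grouped totals:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Share of the last 30 days of credits per warehouse
SELECT
    warehouse_name,
    SUM(credits_used) AS total_credits,
    ROUND(RATIO_TO_REPORT(SUM(credits_used)) OVER () * 100, 1) AS pct_of_total
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
&lt;/code>&lt;/pre>&lt;/div>&lt;p>If the top three rows add up to 70% or more of the total, that&amp;rsquo;s your optimisation surface for the next month.&lt;/p>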
&lt;br>
&lt;p>Now do the same thing for queries. This one finds your top 20 most expensive queries by bytes scanned:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query_text,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> user_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_elapsed_time &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#ae81ff">1000&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> elapsed_seconds,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bytes_scanned &lt;span style="color:#f92672">/&lt;/span> POWER(&lt;span style="color:#ae81ff">1024&lt;/span>, &lt;span style="color:#ae81ff">3&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> gb_scanned,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> partitions_scanned,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> partitions_total,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND(partitions_scanned &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#66d9ef">NULLIF&lt;/span>(partitions_total, &lt;span style="color:#ae81ff">0&lt;/span>) &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#ae81ff">100&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> pct_partitions_scanned
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.query_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">7&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> bytes_scanned &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bytes_scanned &lt;span style="color:#66d9ef">DESC&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">20&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Pay attention to that &lt;code>pct_partitions_scanned&lt;/code> column. If you see queries scanning 90–100% of a table&amp;rsquo;s partitions, those queries aren&amp;rsquo;t benefiting from clustering or partition pruning. That&amp;rsquo;s where the big wins hide.&lt;/p>
&lt;p>This 15-minute exercise — top warehouses, top queries — tells you more about your cost profile than any dashboard. It&amp;rsquo;s the equivalent of checking your bank statement before creating a budget. Obvious in hindsight. Almost nobody does it.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-idle-tax-is-your-single-biggest-waste">The idle tax is your single biggest waste&lt;/h3>
&lt;br>
&lt;p>Here&amp;rsquo;s what nobody tells you about Snowflake billing: you pay for compute in 60-second minimums. Every time a warehouse resumes from suspension, you&amp;rsquo;re billed for at least one minute — regardless of whether the query takes 2 seconds or 58 seconds.&lt;/p>
&lt;p>This matters more than it sounds. Picture a BI tool like Metabase or Tableau hitting your warehouse with 20 small metadata queries over 15 minutes. If your warehouse auto-suspends after 5 minutes (Snowflake&amp;rsquo;s default for many setups), it might suspend and resume multiple times during that window. Each resume triggers another 60-second charge.&lt;/p>
&lt;p>Twenty queries that take 3 seconds each? That&amp;rsquo;s 60 seconds of actual compute. But if the warehouse suspends and resumes 4 times, you&amp;rsquo;re billed for 240 seconds. A 4x overhead.&lt;/p>
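&lt;p>You can see how often this churn is happening from the warehouse events view. One caveat: event names and logging vary by account, and a single resume can log more than one row, so run &lt;code>SELECT DISTINCT event_name&lt;/code> against the view first and treat the counts as relative rather than exact:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Resumes per warehouse per day over the last week; each resume
-- from suspension triggers another 60-second minimum charge
SELECT
    warehouse_name,
    DATE_TRUNC(&amp;#39;day&amp;#39;, timestamp) AS day,
    COUNT(*) AS resumes
FROM snowflake.account_usage.warehouse_events_history
WHERE event_name = &amp;#39;RESUME_WAREHOUSE&amp;#39;
  AND timestamp &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, day
ORDER BY resumes DESC;
&lt;/code>&lt;/pre>&lt;/div>&lt;p>A warehouse showing dozens of resumes a day on a short auto-suspend is a candidate for consolidating its jobs into fewer windows; batching the work is often cheaper than any setting.&lt;/p>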
&lt;p>The fix is straightforward but requires thought:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- For transformation warehouses (predictable, bursty workloads)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">ALTER&lt;/span> WAREHOUSE transform_wh
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SET&lt;/span> AUTO_SUSPEND &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">60&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AUTO_RESUME &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> WAREHOUSE_SIZE &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;MEDIUM&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- For BI/dashboard warehouses (frequent small queries)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">ALTER&lt;/span> WAREHOUSE bi_wh
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SET&lt;/span> AUTO_SUSPEND &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">300&lt;/span> &lt;span style="color:#75715e">-- 5 min keeps it warm between dashboard interactions
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> AUTO_RESUME &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> WAREHOUSE_SIZE &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;SMALL&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- For ad-hoc/analyst warehouses (unpredictable, intermittent)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">ALTER&lt;/span> WAREHOUSE adhoc_wh
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SET&lt;/span> AUTO_SUSPEND &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">60&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AUTO_RESUME &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> WAREHOUSE_SIZE &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;XSMALL&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The principle: &lt;strong>transformation warehouses should suspend aggressively&lt;/strong> because they run in defined windows with gaps between runs. BI warehouses should stay warm a bit longer because dashboard users generate clusters of queries with short pauses in between. Ad-hoc warehouses should be small and aggressive — analysts can tolerate a 1–2 second resume delay.&lt;/p>
&lt;p>While you&amp;rsquo;re in the metering view, check one more ratio: how much of each warehouse&amp;rsquo;s bill is cloud services overhead rather than raw compute. Chatty BI tools and metadata-heavy clients show up here:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> total_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used_compute) &lt;span style="color:#66d9ef">AS&lt;/span> compute_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used_cloud_services) &lt;span style="color:#66d9ef">AS&lt;/span> cloud_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used_cloud_services)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#66d9ef">NULLIF&lt;/span>(&lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used), &lt;span style="color:#ae81ff">0&lt;/span>) &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#ae81ff">100&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) &lt;span style="color:#66d9ef">AS&lt;/span> pct_cloud_services
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.warehouse_metering_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">30&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">HAVING&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pct_cloud_services &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_credits &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>pct_cloud_services&lt;/code> is the share of each warehouse&amp;rsquo;s bill that went to coordination rather than compute. If it sits above 10%, something is hammering the metadata layer, usually a BI tool or orchestrator firing thousands of tiny queries. Tightening auto-suspend won&amp;rsquo;t fix this one; find the chatty client instead.&lt;/p>
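&lt;p>One more angle on the idle tax: compare how much time queries actually spent executing with the compute credits the warehouse metered over the same window. This is a rough sketch only; elapsed time includes queueing, concurrent queries overlap, and credits per second depend on warehouse size. Treat a warehouse with big credits and tiny busy time as a prompt to investigate, not a verdict:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Rough busy-time vs metered-credit comparison per warehouse.
-- busy_hours double-counts concurrent queries; directional only.
WITH busy AS (
    SELECT
        warehouse_name,
        ROUND(SUM(total_elapsed_time) / 1000 / 3600, 1) AS busy_hours
    FROM snowflake.account_usage.query_history
    WHERE start_time &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
)
SELECT
    m.warehouse_name,
    ROUND(SUM(m.credits_used_compute), 1) AS compute_credits,
    MAX(b.busy_hours) AS busy_hours
FROM snowflake.account_usage.warehouse_metering_history m
LEFT JOIN busy b
    ON m.warehouse_name = b.warehouse_name
WHERE m.start_time &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -30, CURRENT_TIMESTAMP())
GROUP BY m.warehouse_name
ORDER BY compute_credits DESC;
&lt;/code>&lt;/pre>&lt;/div>&lt;p>An XSMALL warehouse burns one credit per hour, so when &lt;code>compute_credits&lt;/code> dwarfs &lt;code>busy_hours&lt;/code> times the credit rate for the warehouse&amp;rsquo;s size, it&amp;rsquo;s mostly metering air.&lt;/p>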
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-most-expensive-query-is-the-one-that-scans-everything">The most expensive query is the one that scans everything&lt;/h3>
&lt;br>
&lt;p>Snowflake charges you for the compute time your queries consume, and nothing drives compute time like full table scans. The two most common culprits are queries that wrap filter columns in functions, and queries that select more columns than they need.&lt;/p>
&lt;p>Here&amp;rsquo;s what I mean. This query looks innocent:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Expensive: function on the filter column disables pruning
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_id, customer_id, total_amount
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> analytics.fct_orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATE(order_timestamp) &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;2026-03-01&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>But it&amp;rsquo;s quietly terrible. Wrapping &lt;code>order_timestamp&lt;/code> in &lt;code>DATE()&lt;/code> forces Snowflake to evaluate every row before filtering. The query planner can&amp;rsquo;t use micro-partition metadata to skip irrelevant partitions.&lt;/p>
&lt;p>This version does the same thing, but lets Snowflake prune:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Cheap: range filter on raw column enables pruning
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_id, customer_id, total_amount
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> analytics.fct_orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_timestamp &lt;span style="color:#f92672">&amp;gt;=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;2026-03-01&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> order_timestamp &lt;span style="color:#f92672">&amp;lt;&lt;/span> &lt;span style="color:#e6db74">&amp;#39;2026-03-02&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The difference can be enormous on large tables — I&amp;rsquo;ve seen this single change reduce bytes scanned by 95% on billion-row fact tables.&lt;/p>
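&lt;p>Don&amp;rsquo;t take the 95% on faith; measure it on your own tables. Run both versions, then compare what each one scanned. The query IDs below are placeholders for your two runs, and note that &lt;code>account_usage&lt;/code> views can lag by up to 45 minutes; the &lt;code>INFORMATION_SCHEMA.QUERY_HISTORY&lt;/code> table function shows recent queries immediately:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Compare scan stats for the two runs; replace the placeholder IDs
SELECT
    query_id,
    ROUND(bytes_scanned / POWER(1024, 3), 2) AS gb_scanned,
    partitions_scanned,
    partitions_total
FROM snowflake.account_usage.query_history
WHERE query_id IN (&amp;#39;&amp;lt;date_function_run_id&amp;gt;&amp;#39;, &amp;#39;&amp;lt;range_filter_run_id&amp;gt;&amp;#39;);
&lt;/code>&lt;/pre>&lt;/div>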
&lt;p>The other silent killer is &lt;code>SELECT *&lt;/code> in intermediate transformations. In a dbt project, I often find staging models that look like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- stg_orders.sql (before)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#f92672">*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;raw&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;orders&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _loaded_at &lt;span style="color:#f92672">&amp;gt;&lt;/span> (&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#66d9ef">MAX&lt;/span>(_loaded_at) &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> this &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That &lt;code>SELECT *&lt;/code> pulls every column from the source — including columns nobody downstream ever uses. In a columnar store like Snowflake, you only pay to scan the columns you reference. Trimming to the columns you actually need is free performance:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- stg_orders.sql (after)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_timestamp,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_amount,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> status,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _loaded_at
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">source&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;raw&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;orders&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _loaded_at &lt;span style="color:#f92672">&amp;gt;&lt;/span> (&lt;span style="color:#66d9ef">SELECT&lt;/span> &lt;span style="color:#66d9ef">MAX&lt;/span>(_loaded_at) &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> this &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This isn&amp;rsquo;t premature optimisation. It&amp;rsquo;s hygiene. Every &lt;code>SELECT *&lt;/code> in your transformation layer is a small tax you pay on every single run.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="full-refreshes-are-the-most-expensive-default-in-data-engineering">Full refreshes are the most expensive default in data engineering&lt;/h3>
&lt;br>
&lt;p>If you&amp;rsquo;re using dbt — and most teams on Snowflake are — the default materialisation is &lt;code>view&lt;/code>, and the big models usually get promoted to &lt;code>table&lt;/code>. Both have the same problem at scale: they rebuild everything, every time.&lt;/p>
&lt;p>A table materialisation on a 500-million-row fact table means Snowflake reads, transforms, and writes 500 million rows every run. Even if only 50,000 rows changed since yesterday. That&amp;rsquo;s a 10,000x overhead.&lt;/p>
&lt;p>Switching to incremental materialisation is the single highest-impact cost change most teams can make:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- fct_orders.sql
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> config(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> materialized&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;incremental&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> unique_key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;order_id&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> incremental_strategy&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;merge&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> on_schema_change&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;append_new_columns&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_timestamp,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_amount,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> status,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> updated_at
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> &lt;span style="color:#66d9ef">ref&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;stg_orders&amp;#39;&lt;/span>) &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> &lt;span style="color:#66d9ef">if&lt;/span> is_incremental() &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span> updated_at &lt;span style="color:#f92672">&amp;gt;&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;hour&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">3&lt;/span>, &lt;span style="color:#66d9ef">MAX&lt;/span>(updated_at))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{{&lt;/span> this &lt;span style="color:#960050;background-color:#1e0010">}}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> endif &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A few things worth noting here. The &lt;code>DATEADD('hour', -3, ...)&lt;/code> creates a 3-hour lookback window. This catches late-arriving data and handles clock skew between source systems. Without it, you&amp;rsquo;ll miss rows that arrive slightly out of order — and your numbers will silently drift.&lt;/p>
&lt;p>The &lt;code>unique_key&lt;/code> with &lt;code>incremental_strategy='merge'&lt;/code> means Snowflake will update existing rows and insert new ones. This is essential for tables where source records get modified after initial load (order status changes, for example).&lt;/p>
&lt;p>The rule of thumb: &lt;strong>if a table has more than a million rows and less than 20% changes per run, make it incremental.&lt;/strong> The cost savings are usually 80–95% on that model&amp;rsquo;s compute.&lt;/p>
&lt;p>But — and this is important — schedule a periodic full refresh to correct any drift. I typically set up a weekly &lt;code>dbt build --full-refresh --select fct_orders&lt;/code> via a separate Airflow DAG or GitHub Actions workflow. Belt and suspenders.&lt;/p>
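&lt;p>If you take the GitHub Actions route, the workflow is small. Here&amp;rsquo;s a sketch — the schedule, the install step, and the secret names are illustrative, so adapt them to your project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml"># .github/workflows/weekly-full-refresh.yml (illustrative)
name: weekly-full-refresh
on:
  schedule:
    - cron: &amp;#34;0 18 * * 6&amp;#34;   # Saturday evening UTC, outside peak hours
jobs:
  full-refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-snowflake
      - run: dbt build --full-refresh --select fct_orders
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
&lt;/code>&lt;/pre>&lt;/div>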
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-hidden-costs-your-snowflake-dashboard-wont-show-you">The hidden costs your Snowflake dashboard won&amp;rsquo;t show you&lt;/h3>
&lt;br>
&lt;p>Most teams monitor warehouse credits because that&amp;rsquo;s what the Snowflake UI makes visible. But there are cost vectors that don&amp;rsquo;t show up in the obvious places.&lt;/p>
&lt;p>&lt;strong>Cloud services credits&lt;/strong> accrue when queries use Snowflake&amp;rsquo;s coordination layer — query compilation, metadata operations, result set caching. Normally this is covered by a 10% &amp;ldquo;free&amp;rdquo; adjustment against your compute credits. But if you have BI tools making thousands of small metadata queries, cloud services can exceed that adjustment and start costing real money.&lt;/p>
&lt;p>Find them:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATE_TRUNC(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, usage_date) &lt;span style="color:#66d9ef">AS&lt;/span> &lt;span style="color:#66d9ef">day&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> total_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_adjustment_cloud_services) &lt;span style="color:#66d9ef">AS&lt;/span> cloud_services_adjustment,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">CASE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHEN&lt;/span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_adjustment_cloud_services) &lt;span style="color:#f92672">&amp;lt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">THEN&lt;/span> &lt;span style="color:#66d9ef">ABS&lt;/span>(&lt;span style="color:#66d9ef">SUM&lt;/span>(credits_adjustment_cloud_services))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ELSE&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">END&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> excess_cloud_services_cost
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.metering_daily_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> usage_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">30&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">day&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">day&lt;/span> &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Serverless feature credits&lt;/strong> — Snowpipe, automatic clustering, materialized view maintenance, search optimisation — all consume credits outside your warehouse billing. They don&amp;rsquo;t show up in &lt;code>warehouse_metering_history&lt;/code>. You need to check separately:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Snowpipe costs
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pipe_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> total_credits
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.pipe_usage_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">30&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pipe_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_credits &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Automatic clustering costs
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">table_name&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> total_credits
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.automatic_clustering_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">30&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">table_name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_credits &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On the AWS side, the sneaky costs tend to be data transfer. Moving data between S3 regions, or from S3 out to the internet, adds up quietly. If your Snowflake account is in &lt;code>us-east-1&lt;/code> but your S3 landing zone is in &lt;code>ap-southeast-2&lt;/code>, every byte of ingestion carries a cross-region transfer charge. Check your AWS Cost Explorer with the &amp;ldquo;Data Transfer&amp;rdquo; service filter — the numbers are often surprising.&lt;/p>
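&lt;p>You can pull the same breakdown from the CLI. A sketch against the Cost Explorer API — the usage-type group names here are assumptions that vary by account, so list yours first with &lt;code>aws ce get-dimension-values --dimension USAGE_TYPE_GROUP&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash"># Daily data-transfer spend for the last month, grouped by usage type
aws ce get-cost-and-usage \
  --time-period Start=2026-03-25,End=2026-04-25 \
  --granularity DAILY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --filter &amp;#39;{&amp;#34;Dimensions&amp;#34;: {&amp;#34;Key&amp;#34;: &amp;#34;USAGE_TYPE_GROUP&amp;#34;,
    &amp;#34;Values&amp;#34;: [&amp;#34;S3: Data Transfer - Region to Region (Out)&amp;#34;,
                &amp;#34;S3: Data Transfer - Internet (Out)&amp;#34;]}}&amp;#39;
&lt;/code>&lt;/pre>&lt;/div>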
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="tag-everything-or-youll-optimise-blind">Tag everything or you&amp;rsquo;ll optimise blind&lt;/h3>
&lt;br>
&lt;p>You cannot reduce costs you can&amp;rsquo;t attribute. The simplest and most underused tool in Snowflake is the query tag — a piece of metadata you attach to every query that tells you &lt;em>what&lt;/em> generated it.&lt;/p>
&lt;p>If you&amp;rsquo;re using dbt, this takes about five minutes to set up. Create a macro:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- macros/set_query_tag.sql
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> macro set_query_tag() &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> &lt;span style="color:#66d9ef">set&lt;/span> query_tag &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;dbt_model&amp;#34;&lt;/span>: model.name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;dbt_schema&amp;#34;&lt;/span>: model.&lt;span style="color:#66d9ef">schema&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;dbt_materialized&amp;#34;&lt;/span>: model.config.materialized,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;dbt_invocation_id&amp;#34;&lt;/span>: invocation_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;environment&amp;#34;&lt;/span>: target.name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">}&lt;/span> &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> &lt;span style="color:#66d9ef">do&lt;/span> run_query(&lt;span style="color:#e6db74">&amp;#34;ALTER SESSION SET QUERY_TAG = &amp;#39;{}&amp;#39;&amp;#34;&lt;/span>.format(query_tag &lt;span style="color:#f92672">|&lt;/span> tojson)) &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#960050;background-color:#1e0010">{&lt;/span>&lt;span style="color:#f92672">%&lt;/span> endmacro &lt;span style="color:#f92672">%&lt;/span>&lt;span style="color:#960050;background-color:#1e0010">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then add it as a pre-hook in your &lt;code>dbt_project.yml&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># dbt_project.yml&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">models&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">your_project&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">+pre-hook&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;{{ set_query_tag() }}&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now every query dbt runs is tagged with the model name, materialisation type, and environment. You can query cost by model:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PARSE_JSON(query_tag):dbt_model::STRING &lt;span style="color:#66d9ef">AS&lt;/span> model_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PARSE_JSON(query_tag):dbt_materialized::STRING &lt;span style="color:#66d9ef">AS&lt;/span> materialization,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">COUNT&lt;/span>(&lt;span style="color:#f92672">*&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> query_count,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(total_elapsed_time) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#ae81ff">1000&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> total_seconds,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(bytes_scanned) &lt;span style="color:#f92672">/&lt;/span> POWER(&lt;span style="color:#ae81ff">1024&lt;/span>, &lt;span style="color:#ae81ff">4&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> tb_scanned
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.query_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query_tag &lt;span style="color:#66d9ef">IS&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">NULL&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> TRY_PARSE_JSON(query_tag) &lt;span style="color:#66d9ef">IS&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">NULL&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">7&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model_name, materialization
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tb_scanned &lt;span style="color:#66d9ef">DESC&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">LIMIT&lt;/span> &lt;span style="color:#ae81ff">20&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is how you find the one dbt model that&amp;rsquo;s responsible for 40% of your bill. Without tags, it&amp;rsquo;s a guessing game.&lt;/p>
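&lt;p>Elapsed time and bytes scanned are proxies, not dollars. If you want something closer to spend, you can approximate credits from elapsed time and warehouse size. This is a rough sketch — it ignores idle time and queries sharing a warehouse concurrently, so use it to rank models, not to reconcile the invoice:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">SELECT
    PARSE_JSON(query_tag):dbt_model::STRING AS model_name,
    SUM(
        (total_elapsed_time / 1000 / 3600) *
        CASE warehouse_size      -- credits per hour by size
            WHEN &amp;#39;X-Small&amp;#39; THEN 1
            WHEN &amp;#39;Small&amp;#39;   THEN 2
            WHEN &amp;#39;Medium&amp;#39;  THEN 4
            WHEN &amp;#39;Large&amp;#39;   THEN 8
            WHEN &amp;#39;X-Large&amp;#39; THEN 16
            ELSE 1
        END
    ) AS approx_credits
FROM
    snowflake.account_usage.query_history
WHERE
    TRY_PARSE_JSON(query_tag) IS NOT NULL
    AND start_time &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -7, CURRENT_TIMESTAMP())
GROUP BY
    model_name
ORDER BY
    approx_credits DESC
LIMIT 20;
&lt;/code>&lt;/pre>&lt;/div>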
&lt;p>For non-dbt workloads — Airflow tasks, Lambda functions, BI tools — set query tags at the session level in your connection configuration. In Python:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> snowflake.connector
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>conn &lt;span style="color:#f92672">=&lt;/span> snowflake&lt;span style="color:#f92672">.&lt;/span>connector&lt;span style="color:#f92672">.&lt;/span>connect(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> account&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;your_account&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> user&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;your_user&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> password&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;your_password&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;transform_wh&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> session_parameters&lt;span style="color:#f92672">=&lt;/span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;QUERY_TAG&amp;#39;&lt;/span>: json&lt;span style="color:#f92672">.&lt;/span>dumps({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;pipeline&amp;#39;&lt;/span>: &lt;span style="color:#e6db74">&amp;#39;customer_ingestion&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;task&amp;#39;&lt;/span>: &lt;span style="color:#e6db74">&amp;#39;load_raw_customers&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;environment&amp;#39;&lt;/span>: &lt;span style="color:#e6db74">&amp;#39;production&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The goal is simple: &lt;strong>every query that runs on your platform should be attributable to a team, a pipeline, or a tool.&lt;/strong> Start with the big consumers and work outward.&lt;/p>
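&lt;p>To measure how close you are to that goal, rank the untagged spend directly — this tells you which connection to chase next:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Untagged activity by user over the last week
SELECT
    user_name,
    COUNT(*) AS query_count,
    SUM(total_elapsed_time) / 1000 AS total_seconds
FROM
    snowflake.account_usage.query_history
WHERE
    (query_tag IS NULL OR query_tag = &amp;#39;&amp;#39;)
    AND start_time &amp;gt;= DATEADD(&amp;#39;day&amp;#39;, -7, CURRENT_TIMESTAMP())
GROUP BY
    user_name
ORDER BY
    total_seconds DESC
LIMIT 20;
&lt;/code>&lt;/pre>&lt;/div>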
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="kill-the-zombies">Kill the zombies&lt;/h3>
&lt;br>
&lt;p>Every data platform accumulates dead weight. Tables nobody queries. Pipelines that run faithfully every morning, transforming data that no dashboard, no analyst, and no model has touched in months.&lt;/p>
&lt;p>These zombies cost you twice: once in compute (the pipeline that refreshes them) and once in storage (the data that sits there). Finding them is straightforward:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Tables with zero reads in the last 90 days
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.table_schema,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.&lt;span style="color:#66d9ef">table_name&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.&lt;span style="color:#66d9ef">row_count&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.bytes &lt;span style="color:#f92672">/&lt;/span> POWER(&lt;span style="color:#ae81ff">1024&lt;/span>, &lt;span style="color:#ae81ff">3&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> size_gb,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.last_altered,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MAX&lt;/span>(ah.query_start_time) &lt;span style="color:#66d9ef">AS&lt;/span> last_queried
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.tables t
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">LEFT&lt;/span> &lt;span style="color:#66d9ef">JOIN&lt;/span> snowflake.account_usage.access_history ah
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ON&lt;/span> ah.base_objects_accessed &lt;span style="color:#66d9ef">LIKE&lt;/span> &lt;span style="color:#e6db74">&amp;#39;%&amp;#39;&lt;/span> &lt;span style="color:#f92672">||&lt;/span> t.&lt;span style="color:#66d9ef">table_name&lt;/span> &lt;span style="color:#f92672">||&lt;/span> &lt;span style="color:#e6db74">&amp;#39;%&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> ah.query_start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">90&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.table_schema &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">IN&lt;/span> (&lt;span style="color:#e6db74">&amp;#39;INFORMATION_SCHEMA&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> t.deleted &lt;span style="color:#66d9ef">IS&lt;/span> &lt;span style="color:#66d9ef">NULL&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> t.&lt;span style="color:#66d9ef">row_count&lt;/span> &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.table_schema, t.&lt;span style="color:#66d9ef">table_name&lt;/span>, t.&lt;span style="color:#66d9ef">row_count&lt;/span>, t.bytes, t.last_altered
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">HAVING&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> last_queried &lt;span style="color:#66d9ef">IS&lt;/span> &lt;span style="color:#66d9ef">NULL&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size_gb &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On the AWS side, check your S3 storage for orphaned data. Landing zones accumulate raw files that were loaded months ago and never cleaned up. A simple lifecycle policy handles this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Rules&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;ID&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;Archive raw data after 90 days&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Status&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;Enabled&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Filter&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Prefix&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;raw-landing/&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Transitions&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Days&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">90&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;StorageClass&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;GLACIER_INSTANT_RETRIEVAL&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Expiration&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Days&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">365&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The cultural shift matters as much as the tooling. Make it a quarterly habit: pull the zombie table list, review it with the team, and deprecate what nobody uses. If someone screams, you can always restore from Time Travel. But in my experience, nobody screams. The data was already dead — you&amp;rsquo;re just acknowledging it.&lt;/p>
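&lt;p>The restore path is what makes the quarterly cull defensible. The table name below is hypothetical, but the pattern is always the same: drop, and if anyone does scream, undrop within your retention window:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Deprecate the zombie
DROP TABLE analytics.stale_dim_products;

-- Someone screamed? Bring it back from Time Travel
-- (default retention is 1 day; configurable up to 90 on Enterprise Edition)
UNDROP TABLE analytics.stale_dim_products;
&lt;/code>&lt;/pre>&lt;/div>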
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="cost-anomalies-will-find-you--or-you-can-find-them-first">Cost anomalies will find you — or you can find them first&lt;/h3>
&lt;br>
&lt;p>The scariest cost event isn&amp;rsquo;t the gradual creep. It&amp;rsquo;s the single bad query or misconfigured pipeline that doubles your weekly bill overnight. I&amp;rsquo;ve seen a single Snowflake query with a missing WHERE clause scan an entire 2TB table repeatedly inside a loop — burning through hundreds of credits in an hour.&lt;/p>
&lt;p>On Snowflake, set up resource monitors as a basic guardrail:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a resource monitor with alerts and hard stop
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">OR&lt;/span> &lt;span style="color:#66d9ef">REPLACE&lt;/span> RESOURCE MONITOR monthly_budget
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WITH&lt;/span> CREDIT_QUOTA &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">5000&lt;/span> &lt;span style="color:#75715e">-- adjust to your monthly budget
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> FREQUENCY &lt;span style="color:#f92672">=&lt;/span> MONTHLY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> START_TIMESTAMP &lt;span style="color:#f92672">=&lt;/span> IMMEDIATELY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> TRIGGERS
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#ae81ff">75&lt;/span> PERCENT &lt;span style="color:#66d9ef">DO&lt;/span> &lt;span style="color:#66d9ef">NOTIFY&lt;/span> &lt;span style="color:#75715e">-- email alert at 75%
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#ae81ff">90&lt;/span> PERCENT &lt;span style="color:#66d9ef">DO&lt;/span> &lt;span style="color:#66d9ef">NOTIFY&lt;/span> &lt;span style="color:#75715e">-- email alert at 90%
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#ae81ff">100&lt;/span> PERCENT &lt;span style="color:#66d9ef">DO&lt;/span> SUSPEND; &lt;span style="color:#75715e">-- hard stop at 100%
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Apply it to a warehouse
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">ALTER&lt;/span> WAREHOUSE transform_wh
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SET&lt;/span> RESOURCE_MONITOR &lt;span style="color:#f92672">=&lt;/span> monthly_budget;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On AWS, enable Cost Anomaly Detection in the AWS Cost Management console. It uses ML to detect unusual spending patterns and sends alerts via SNS. For Snowflake-specific monitoring, a lightweight approach is a scheduled task that checks daily credit consumption against a rolling average:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a simple anomaly detection view
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">OR&lt;/span> &lt;span style="color:#66d9ef">REPLACE&lt;/span> &lt;span style="color:#66d9ef">VIEW&lt;/span> monitoring.daily_cost_anomalies &lt;span style="color:#66d9ef">AS&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WITH&lt;/span> daily_usage &lt;span style="color:#66d9ef">AS&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATE_TRUNC(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, start_time) &lt;span style="color:#66d9ef">AS&lt;/span> usage_date,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(credits_used) &lt;span style="color:#66d9ef">AS&lt;/span> daily_credits
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> snowflake.account_usage.warehouse_metering_history
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start_time &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#e6db74">&amp;#39;day&amp;#39;&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">60&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_TIMESTAMP&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> usage_date, warehouse_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>averages &lt;span style="color:#66d9ef">AS&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AVG&lt;/span>(daily_credits) &lt;span style="color:#66d9ef">AS&lt;/span> avg_daily_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> STDDEV(daily_credits) &lt;span style="color:#66d9ef">AS&lt;/span> stddev_daily_credits
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> daily_usage
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> usage_date &lt;span style="color:#f92672">&amp;lt;&lt;/span> &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> warehouse_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> du.usage_date,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> du.warehouse_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> du.daily_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> a.avg_daily_credits,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ROUND((du.daily_credits &lt;span style="color:#f92672">-&lt;/span> a.avg_daily_credits) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#66d9ef">NULLIF&lt;/span>(a.stddev_daily_credits, &lt;span style="color:#ae81ff">0&lt;/span>), &lt;span style="color:#ae81ff">2&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> z_score
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> daily_usage du
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">JOIN&lt;/span> averages a
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ON&lt;/span> du.warehouse_name &lt;span style="color:#f92672">=&lt;/span> a.warehouse_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> du.usage_date &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> (du.daily_credits &lt;span style="color:#f92672">-&lt;/span> a.avg_daily_credits) &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#66d9ef">NULLIF&lt;/span>(a.stddev_daily_credits, &lt;span style="color:#ae81ff">0&lt;/span>) &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> z_score &lt;span style="color:#66d9ef">DESC&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A z-score above 2 means today&amp;rsquo;s spend is more than two standard deviations above the 60-day average. That&amp;rsquo;s worth investigating. Pipe this into a Teams or Slack alert via an AWS Lambda function and you&amp;rsquo;ve got same-day cost anomaly detection for the price of a few lines of SQL.&lt;/p>
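&lt;p>The Lambda alerting step can be sketched roughly as below. This is a sketch, not a production deployment: the webhook URL is a placeholder, and the anomaly rows are passed in via the event payload to keep the example self-contained — in practice you&amp;rsquo;d query &lt;code>monitoring.daily_cost_anomalies&lt;/code> from the handler, for example with the &lt;code>snowflake-connector-python&lt;/code> package.&lt;/p>

```python
import json
import urllib.request

# Placeholder -- substitute your own Slack or Teams incoming webhook URL.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE"


def format_alert(anomalies):
    """Build a Slack message payload from rows shaped like the output of
    monitoring.daily_cost_anomalies: warehouse_name, daily_credits,
    avg_daily_credits, z_score."""
    lines = ["Snowflake cost anomalies detected today:"]
    for row in anomalies:
        lines.append(
            "- {warehouse_name}: {daily_credits:.1f} credits "
            "(avg {avg_daily_credits:.1f}, z-score {z_score:.2f})".format(**row)
        )
    return {"text": "\n".join(lines)}


def lambda_handler(event, context):
    # In a real deployment, query the anomaly view here; in this sketch the
    # rows arrive in the event so the handler runs without a Snowflake session.
    anomalies = event.get("anomalies", [])
    if not anomalies:
        return {"statusCode": 200, "body": "no anomalies"}
    payload = json.dumps(format_alert(anomalies)).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; add retries in production
    return {"statusCode": 200, "body": f"alerted on {len(anomalies)} anomalies"}
```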
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="what-your-data-model-costs-you">What your data model costs you&lt;/h3>
&lt;br>
&lt;p>This one&amp;rsquo;s subtle but powerful. The way you model your data directly affects how much compute you burn on every query.&lt;/p>
&lt;p>In columnar warehouses like Snowflake, wide denormalised tables often query faster than star schemas with multiple JOINs: each JOIN has overhead, and Snowflake is optimised for scanning columns from flat structures. But wide tables cost more to &lt;em>maintain&lt;/em>, because updating a single dimension attribute means rewriting many rows in the fact table.&lt;/p>
&lt;p>The practical approach is a layered architecture:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Staging models&lt;/strong> (views or ephemeral): Minimal transformation, column selection, type casting. Cheap to run, no storage cost.&lt;/li>
&lt;li>&lt;strong>Intermediate models&lt;/strong> (tables or incremental): Business logic, deduplication, SCD handling. Materialised because they&amp;rsquo;re referenced by multiple downstream models.&lt;/li>
&lt;li>&lt;strong>Mart models&lt;/strong> (incremental or table): Wide, denormalised, optimised for consumer queries. These are what analysts and BI tools actually hit.&lt;/li>
&lt;/ul>
&lt;p>The cost trap is materialising too much too early. Every table materialisation means Snowflake stores and maintains that data. Every &lt;code>dbt run&lt;/code> rebuilds it. If an intermediate model is only referenced by one downstream model, make it ephemeral or a view — let Snowflake inline it at query time.&lt;/p>
&lt;p>Check your dbt DAG for this pattern: a staging model materialised as a table, referenced by a single intermediate model, which is also materialised as a table, referenced by a single mart. That&amp;rsquo;s three materialisations where one (the mart) would suffice. The staging and intermediate models can be views or ephemeral — the compute happens once when the mart builds, not three separate times.&lt;/p>
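&lt;p>In dbt, this layering can be declared once at the folder level rather than model by model. The snippet below is illustrative — the project and folder names are placeholders, so adjust them to your own layout:&lt;/p>

```yaml
# dbt_project.yml -- folder names are illustrative
models:
  my_project:
    staging:
      +materialized: view        # cheap: no storage, computed on demand
    intermediate:
      +materialized: ephemeral   # inlined as a CTE into downstream models
    marts:
      +materialized: incremental # consumer-facing; only new rows rebuilt
```

&lt;p>With this in place, the staging-to-mart chain compiles into a single query when the mart builds, and the compute happens once instead of three times.&lt;/p>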
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="coming-back-to-why-this-matters">Coming back to why this matters&lt;/h3>
&lt;br>
&lt;p>I started this article with a story about not being able to explain my own platform&amp;rsquo;s costs. That wasn&amp;rsquo;t a technical failure. It was a leadership failure. I&amp;rsquo;d built something valuable and then neglected the part that kept it funded.&lt;/p>
&lt;p>The data engineers I respect most aren&amp;rsquo;t the ones who build the most elegant pipelines. They&amp;rsquo;re the ones who can walk into a budget conversation and say: &amp;ldquo;Here&amp;rsquo;s what we spend. Here&amp;rsquo;s what we get for it. Here&amp;rsquo;s what I&amp;rsquo;d cut, and here&amp;rsquo;s what I&amp;rsquo;d invest more in.&amp;rdquo; That&amp;rsquo;s the kind of clarity that earns trust — and trust is what keeps data teams alive when the inevitable cost-cutting conversations happen.&lt;/p>
&lt;p>You don&amp;rsquo;t need a FinOps certification or an expensive monitoring tool to get there. You need the queries in this article, a quarterly habit of reviewing them, and the willingness to kill the zombies nobody wants to admit are dead.&lt;/p>
&lt;p>Start with the 15-minute audit. Tag your queries. Set up a resource monitor. Do those three things this week and you&amp;rsquo;ll know more about your platform&amp;rsquo;s economics than most data teams learn in a year.&lt;/p>
&lt;p>The cheapest query is the one you never need to run. But the most valuable skill is knowing which queries are worth paying for.&lt;/p>
&lt;p>&lt;br>&lt;br>&lt;/p></content:encoded><category>Data Engineering</category><category>Cloud Architecture</category><category>Snowflake</category><category>AWS</category><category>Cost Optimization</category><category>FinOps</category><category>dbt</category><category>Data Platform</category></item></channel></rss>