<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Code Comments on Ghost in the data</title><link>https://ghostinthedata.info/tags/code-comments/</link><description>Ghost in the data</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Ghost in the data</copyright><lastBuildDate>Sat, 06 Jun 2026 09:00:00 +1000</lastBuildDate><atom:link href="https://ghostinthedata.info/tags/code-comments/index.xml" rel="self" type="application/rss+xml"/><item><title>SQL Tells You What. Comments Tell You Why.</title><link>https://ghostinthedata.info/posts/2026/2026-06-06-code-tells-what-comments-why/</link><pubDate>Sat, 06 Jun 2026 09:00:00 +1000</pubDate><guid>https://ghostinthedata.info/posts/2026/2026-06-06-code-tells-what-comments-why/</guid><author>Chris Hillman</author><description>SQL is a declarative language — it tells you what the query does, never why. Here's why that distinction matters more in data engineering than anywhere else.</description><content:encoded>&lt;p>The best SQL doesn&amp;rsquo;t need comments. Write meaningful CTE names, descriptive aliases, clear column labels — and a skilled reader will follow your logic without a single annotation. That&amp;rsquo;s the right instinct.&lt;/p>
&lt;p>It&amp;rsquo;s also only half right.&lt;/p>
&lt;p>SQL is a declarative language. You&amp;rsquo;re not writing &lt;em>how&lt;/em> the database retrieves your data; you&amp;rsquo;re writing &lt;em>what&lt;/em> you want. That&amp;rsquo;s a useful distinction, because &amp;ldquo;what&amp;rdquo; and &amp;ldquo;why&amp;rdquo; are very different questions, and SQL can answer exactly one of them.&lt;/p>
&lt;p>A query can be perfectly, elegantly readable and still be completely opaque about the reason it exists. The name of your CTE, no matter how well chosen, cannot explain the business decision that gave birth to it.&lt;/p>
&lt;hr>
&lt;h3 id="the-self-documenting-ceiling">The self-documenting ceiling&lt;/h3>
&lt;p>Consider this query:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(order_total) &lt;span style="color:#66d9ef">AS&lt;/span> lifetime_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATEDIFF(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#66d9ef">MIN&lt;/span>(order_date), &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> customer_age_days
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">90&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> status_code &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">IN&lt;/span> (&lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#ae81ff">2&lt;/span>, &lt;span style="color:#ae81ff">9&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Clean SQL. No ambiguity about &lt;em>what&lt;/em> it does. But try answering these questions from the code alone:&lt;/p>
&lt;p>Why 90 days? Why not 60, or 180, or the full history? Is 90 the standard customer lifetime window, a finance reporting period, the SLA in a merchant agreement, or the number someone picked in a meeting four years ago and nobody has questioned since?&lt;/p>
&lt;p>What are status codes 1, 2, and 9? You might guess from context — pending, draft, cancelled — but you&amp;rsquo;re guessing. More importantly: why are they excluded? Is this a business rule about what counts as &amp;ldquo;real&amp;rdquo; revenue, or a data quality workaround because those status codes appear when an upstream webhook fires twice?&lt;/p>
&lt;p>These are not trivial questions. The 90-day cutoff defines what &amp;ldquo;active customer&amp;rdquo; means across your entire reporting layer. The excluded status codes determine what gets counted as revenue. Change either without understanding the original intent, and you&amp;rsquo;re not refactoring — you&amp;rsquo;re silently redefining business logic that stakeholders are relying on.&lt;/p>
&lt;p>Better SQL helps, but only so far:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(order_total) &lt;span style="color:#66d9ef">AS&lt;/span> lifetime_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATEDIFF(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#66d9ef">MIN&lt;/span>(order_date), &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> customer_age_days
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">90&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> status_code &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">IN&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#75715e">-- pending
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span>, &lt;span style="color:#75715e">-- draft
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#ae81ff">9&lt;/span> &lt;span style="color:#75715e">-- cancelled
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is better. You&amp;rsquo;ve replaced the mystery with labels. But inline labels aren&amp;rsquo;t explanations — they&amp;rsquo;re just naming the what more explicitly. You still don&amp;rsquo;t know why.&lt;/p>
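&lt;p>Named CTEs push the same idea one step further. Here is a minimal sketch, assuming the same &lt;code>orders&lt;/code> table; the CTE names &lt;code>excluded_statuses&lt;/code> and &lt;code>recent_countable_orders&lt;/code> are made up for illustration, and the status labels are the same guesses as above:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">WITH excluded_statuses AS (
    -- Hypothetical labels: only the owning team can confirm what these codes mean.
    SELECT 1 AS status_code, &amp;#39;pending&amp;#39; AS status_label
    UNION ALL SELECT 2, &amp;#39;draft&amp;#39;
    UNION ALL SELECT 9, &amp;#39;cancelled&amp;#39;
),
recent_countable_orders AS (
    -- Orders from the last 90 days whose status is not in the exclusion list.
    SELECT o.customer_id, o.order_total, o.order_date
    FROM orders AS o
    WHERE o.order_date &amp;gt;= DATEADD(day, -90, CURRENT_DATE)
      AND o.status_code NOT IN (SELECT status_code FROM excluded_statuses)
)
SELECT
    customer_id,
    SUM(order_total) AS lifetime_value,
    DATEDIFF(day, MIN(order_date), CURRENT_DATE) AS customer_age_days
FROM recent_countable_orders
GROUP BY customer_id
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Every name now describes what is being filtered. Nothing in the query says why the window is 90 days or why those three codes don&amp;rsquo;t count.&lt;/p>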
&lt;hr>
&lt;h3 id="what-sql-literally-cannot-tell-you">What SQL literally cannot tell you&lt;/h3>
&lt;p>SQL cannot explain why a query was written the way it was. It cannot record which alternative approaches were considered and why they were rejected. It cannot tell you that the business definition changed, that the upstream system is broken, or that this logic exists specifically to handle a problem that was supposed to be fixed in Q2 and wasn&amp;rsquo;t.&lt;/p>
&lt;p>Here&amp;rsquo;s the kind of context that belongs in a comment:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Customer lifetime window: 90-day lookback aligns with the SLA in the Merchant Agreement (v3.2).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Reviewed and confirmed with Finance, Dec 2024. Don&amp;#39;t change without checking with that team first.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- status_codes 1, 2, 9 excluded per the order lifecycle defined in the legacy Salesforce migration.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- These codes appear as artefacts when the Salesforce sync fires before the order is confirmed.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- The upstream fix was deprioritised (JIRA: DATA-3841). Removing this filter will cause double-counting.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">SELECT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">SUM&lt;/span>(order_total) &lt;span style="color:#66d9ef">AS&lt;/span> lifetime_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DATEDIFF(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#66d9ef">MIN&lt;/span>(order_date), &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>) &lt;span style="color:#66d9ef">AS&lt;/span> customer_age_days
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">FROM&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> orders
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">WHERE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> order_date &lt;span style="color:#f92672">&amp;gt;=&lt;/span> DATEADD(&lt;span style="color:#66d9ef">day&lt;/span>, &lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">90&lt;/span>, &lt;span style="color:#66d9ef">CURRENT_DATE&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">AND&lt;/span> status_code &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">IN&lt;/span> (&lt;span style="color:#ae81ff">1&lt;/span>, &lt;span style="color:#ae81ff">2&lt;/span>, &lt;span style="color:#ae81ff">9&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GROUP&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> customer_id
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That comment block is five lines. It contains: the business rationale, the stakeholder who owns the decision, the date it was confirmed, the upstream system causing the problem, the known issue reference, and the consequence of removing the filter. None of that information is recoverable from the SQL. All of it is load-bearing.&lt;/p>
&lt;hr>
&lt;h3 id="the-deduplication-that-nobody-remembers-adding">The deduplication that nobody remembers adding&lt;/h3>
&lt;p>Have you seen this line before?&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>QUALIFY ROW_NUMBER() OVER (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PARTITION &lt;span style="color:#66d9ef">BY&lt;/span> customer_id, order_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> ingested_at &lt;span style="color:#66d9ef">DESC&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The SQL is completely transparent about what it does: take the most recently ingested row for each customer/order combination. What it cannot tell you is why duplicates exist in the first place, whether the source of those duplicates has been fixed, whether this deduplication logic is still necessary, or whether &amp;ldquo;most recently ingested&amp;rdquo; is actually the right tiebreaker for your use case.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Deduplicating on customer_id + order_id to handle duplicate webhook events from Eftpos.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Eftpos retries failed webhooks, which can fire the same order_created event multiple times.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Taking the latest ingested_at gives us the most recent event state (status, amount, etc.)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Revisit if we adopt Eftpos&amp;#39;s idempotency keys at the integration layer — this dedup may become redundant.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Background: this logic was added after the October 2024 incident (post-mortem in Confluence).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>QUALIFY ROW_NUMBER() OVER (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PARTITION &lt;span style="color:#66d9ef">BY&lt;/span> customer_id, order_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ORDER&lt;/span> &lt;span style="color:#66d9ef">BY&lt;/span> ingested_at &lt;span style="color:#66d9ef">DESC&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The second version tells the reader everything they need to know to maintain this safely: the upstream cause, the logic behind the tiebreaker, when to reconsider it, and where to find the history. Without that comment, every future engineer who touches this code has to reverse-engineer context that no longer exists anywhere.&lt;/p>
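&lt;p>Two follow-up queries help answer the &amp;ldquo;is this still necessary&amp;rdquo; question. This is a sketch, assuming the deduplication reads from a table called &lt;code>raw_orders&lt;/code> (a hypothetical name, since the snippet above doesn&amp;rsquo;t show its source) and a dialect with &lt;code>DATEADD&lt;/code>. The first query counts keys that still arrive more than once; the second is an equivalent rewrite for engines that lack &lt;code>QUALIFY&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Are duplicates still arriving? No rows here suggests the dedup may now be redundant.
-- The 30-day lookback is an arbitrary value chosen for illustration.
SELECT customer_id, order_id, COUNT(*) AS copies
FROM raw_orders  -- hypothetical table name
WHERE ingested_at &amp;gt;= DATEADD(day, -30, CURRENT_DATE)
GROUP BY customer_id, order_id
HAVING COUNT(*) &amp;gt; 1;

-- Same dedup without QUALIFY: rank in a subquery, then filter on the rank.
-- Note that the result carries an extra row_num column.
SELECT *
FROM (
    SELECT
        r.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id, order_id
            ORDER BY ingested_at DESC
        ) AS row_num
    FROM raw_orders AS r
) AS ranked
WHERE row_num = 1;
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Neither query replaces the comment. They tell you whether duplicates exist today, not where they came from or whether the tiebreaker is the right one.&lt;/p>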
&lt;hr>
&lt;h3 id="when-two-models-define-the-same-word-differently">When two models define the same word differently&lt;/h3>
&lt;p>This one doesn&amp;rsquo;t generate an error. It generates confusion at the worst possible moment.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- IMPORTANT: &amp;#39;churned&amp;#39; here means &amp;gt;90 days since last order.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- This differs from mkt_customer_segments, which uses &amp;gt;60 days for winback campaign targeting.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Both definitions are intentional. Customer Success uses 90 days because cohort analysis
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- showed that 30% of &amp;#34;60-day churned&amp;#34; customers reorder organically within the next 30 days
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- and shouldn&amp;#39;t receive a discount. Marketing uses the shorter 60-day threshold to maximise winback opportunities.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- The discrepancy is documented and known. Do not &amp;#34;fix&amp;#34; one to match the other.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CASE&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHEN&lt;/span> days_since_last_order &lt;span style="color:#f92672">&amp;lt;=&lt;/span> &lt;span style="color:#ae81ff">30&lt;/span> &lt;span style="color:#66d9ef">THEN&lt;/span> &lt;span style="color:#e6db74">&amp;#39;active&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">WHEN&lt;/span> days_since_last_order &lt;span style="color:#f92672">&amp;lt;=&lt;/span> &lt;span style="color:#ae81ff">90&lt;/span> &lt;span style="color:#66d9ef">THEN&lt;/span> &lt;span style="color:#e6db74">&amp;#39;at_risk&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">ELSE&lt;/span> &lt;span style="color:#e6db74">&amp;#39;churned&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">END&lt;/span> &lt;span style="color:#66d9ef">AS&lt;/span> customer_segment
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Without that comment, someone eventually &amp;ldquo;cleans up&amp;rdquo; one of the two definitions because having different churn thresholds in two models looks like a bug. They&amp;rsquo;re wrong, and they&amp;rsquo;re about to break a winback campaign that&amp;rsquo;s been performing well for six months.&lt;/p>
&lt;p>The SQL cannot warn them. The comment can.&lt;/p>
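&lt;p>When someone does ask whether the two definitions really differ in practice, the gap is easy to measure. This is a sketch, assuming a hypothetical &lt;code>customer_activity&lt;/code> table with one row per customer and a &lt;code>days_since_last_order&lt;/code> column:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">-- Customers the Marketing model treats as churned (&amp;gt;60 days)
-- while this model still labels them at_risk (&amp;lt;=90 days).
SELECT COUNT(*) AS customers_where_definitions_differ
FROM customer_activity  -- hypothetical table name
WHERE days_since_last_order &amp;gt; 60
  AND days_since_last_order &amp;lt;= 90;
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The count shows how many customers sit in the band where the models disagree. It still takes the comment to say that the disagreement is deliberate.&lt;/p>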
&lt;hr>
&lt;h3 id="dbt-descriptions-arent-exempt">dbt descriptions aren&amp;rsquo;t exempt&lt;/h3>
&lt;p>dbt gives you a proper home for the &amp;ldquo;why&amp;rdquo;: model descriptions and column-level descriptions in your YAML.&lt;/p>
&lt;p>Bad:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">models&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">fct_orders&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">description&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;The orders fact table. Contains order data.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">columns&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">customer_segment&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">description&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;The customer segment based on days since last order.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Both descriptions restate what the model and column names already say. Each is a comment-shaped void.&lt;/p>
&lt;p>Good:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">models&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">fct_orders&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">description&lt;/span>: &amp;gt;&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> Order-level fact table used as the single source of truth for revenue reporting.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> Revenue excludes test accounts (prefixed &amp;#39;INTERNAL_&amp;#39;) and gift card redemptions,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> per the Finance revenue recognition policy agreed January 2025.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> See Confluence: Revenue Recognition Standards v3.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> Do not use this model for marketing attribution — use mkt_attributed_revenue instead,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> which applies different channel logic.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">columns&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">customer_segment&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">description&lt;/span>: &amp;gt;&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> Lifecycle segment defined by the Customer Success team, Q1 2025.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> Thresholds (30/90 days) were derived from cohort analysis showing a median
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> reorder window of 28 days. Intentionally differs from the Marketing segment
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> definition in mkt_customer_segments, which uses a 60-day churn threshold
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> for winback campaign purposes. Both are correct for their respective use cases.&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That description does something a column name never can: it tells you who owns the definition, when it was set, what analysis underpins it, and — critically — where the definition intentionally diverges from a similar field elsewhere in the warehouse. That last part is the thing that saves someone three hours of confused Teams messages.&lt;/p>
&lt;hr>
&lt;h3 id="the-archaeology-problem">The archaeology problem&lt;/h3>
&lt;p>Data teams inherit codebases. It happens constantly, and it will keep happening. Engineers leave. Consultants finish their contracts. Reorgs shuffle ownership. What gets left behind is the SQL, and whatever context wasn&amp;rsquo;t written down is gone.&lt;/p>
&lt;p>Good SQL survives the handover. It&amp;rsquo;s readable, consistent, and correct. But &amp;ldquo;readable&amp;rdquo; in the sense of mechanically parseable is not the same as &amp;ldquo;understandable&amp;rdquo; in the sense of knowing what business problem the code was solving, whose decision it was, and what you&amp;rsquo;d need to change if that decision changed.&lt;/p>
&lt;p>Comments are how you write for the engineer who inherits this in two years. They&amp;rsquo;re how you write for yourself in six months. They&amp;rsquo;re how you ensure that a working pipeline stays working through the context collapse that happens every time anyone leaves a team.&lt;/p>
&lt;p>SQL can only tell you what. Write SQL clear enough that the what needs no annotation, then use comments for the why, so the people reading it are shortchanged on neither.&lt;/p></content:encoded><category>Data Engineering</category><category>Data Quality</category><category>SQL</category><category>dbt</category><category>Documentation</category><category>Data Quality</category><category>Code Comments</category><category>Data Pipelines</category><category>Best Practices</category></item></channel></rss>