Ghost in the data
Write-Audit-Publish with Iceberg Tables in Snowflake

It was a Tuesday afternoon when the analyst pinged me on Microsoft Teams: “Hey, the Total Portfolio numbers just jumped 40% overnight. Did we land a whale?” We hadn’t. What actually happened was more mundane and significantly more painful. A schema change in the source system introduced a currency conversion bug. Our pipeline dutifully loaded the corrupted data into production at 3 AM, the dashboards updated by 6 AM, and the Department Head opened her morning report to numbers that looked like champagne-worthy growth.

  • Apache Iceberg
  • Snowflake
  • WAP Pattern
  • Data Quality
  • SQL
  • Lakehouse
  • Data Pipelines
  • Best Practices
Friday, February 27, 2026
Healing Tables: When Day-by-Day Backfills Become a Slow-Motion Disaster

It was 2 AM on a Saturday when I realized we’d been loading data wrong for six months. The situation: a customer dimension with three years of history needed to be backfilled after a source system migration. The previous team’s approach was straightforward—run the daily incremental process 1,095 times, once for each day of history. They estimated three weeks to complete. What they hadn’t accounted for was how errors compound. By the time I looked at the data, we had 47,000 records with overlapping date ranges, 12,000 timeline gaps where customers seemed to vanish and reappear, and an unknowable number of missed changes from when source systems updated the same record multiple times in a single day.

  • SCD
  • Historical Load
  • dbt
  • SQL
  • Data Quality
  • Dimensional Modeling
  • Delta Lake
  • Best Practices
Saturday, February 7, 2026
For Sooty

This one isn’t about data pipelines. There’s no framework, no architecture diagram, no code snippet at the end. Yesterday I said goodbye to my best friend. Sooty was a miniature schnauzer — nine kilograms of stubbornness, loyalty, and heart. Born October 2011. Gone February 2026. Fourteen years that changed the shape of everything. I don’t have the right words for this. I’m not sure anyone does. But I tried to write something that comes close, and I wanted to share it here — because this is my corner of the internet, and she deserves a place in it.

  • Personal
  • Life
  • Loss
  • Grief
Tuesday, February 3, 2026
What an NBA Coach Can Teach Data Leaders About Building Teams That Actually Work

I was three hours into a retrospective that had devolved into blame-shifting when the most senior engineer on the team finally spoke up. “Look,” he said, “we can keep pointing fingers at the data model, or we can admit we don’t actually trust each other enough to have an honest conversation about what went wrong.” The room went quiet. He was right. That moment stuck with me because it exposed something I’ve seen destroy more data teams than bad architecture ever could: the absence of genuine connection between people who spend forty-plus hours a week depending on each other.

  • Leadership
  • Team Building
  • Culture
  • Management
  • Data Teams
  • Remote Work
  • Psychological Safety
Monday, February 2, 2026
Context Engineering: The New Must-Have Skill for Data Engineers

Last year I watched a colleague ask AI to help write a dbt model. The AI spit out perfectly functional SQL—clean syntax, proper CTEs, the works. Looked great. Then I noticed the table would eventually hold 800 million rows. No partitioning. No clustering. Just a raw, unoptimised heap waiting to turn into a query performance nightmare (that would likely become my nightmare to fix). The engineer wasn’t at fault. The AI wasn’t at fault either, really. The AI simply didn’t know that our environment clusters large tables by date. It didn’t know our team’s conventions around incremental models. It couldn’t know, because nobody had told it.

  • AI
  • dbt
  • Data Quality
  • SQL
  • Productivity
  • VSCode
  • Claude
Saturday, January 31, 2026
The Duct Tape Data Engineer

The Engineer Who Ships

I want to tell you about a data engineer I worked with. Let’s call her Sarah. Sarah had a reputation. When business stakeholders had an urgent question, the kind that arrives at 4 PM on a Friday with the CEO’s name in the subject line, they went to Sarah. Not to the senior architect with the impeccable data model. Not to the platform team with their carefully orchestrated Airflow DAGs. They went to Sarah.

  • Data Engineering
  • DuckDB
  • Architecture
  • Pragmatism
  • Career Development
  • Technical Strategy
  • Data Platforms
  • Kimball
  • Data Modeling
Saturday, January 24, 2026
That Tuesday Morning When I Finally Fixed Our Ten-Minute Queries

The Ten-Minute Query

I’m sitting at my laptop on a Tuesday morning, waiting. The progress bar on my screen says ‘Query running… 4 minutes, 37 seconds.’ I lean back in my chair and let out this long sigh that probably says more than I intended. My manager walks past my desk. She glances at my screen, and I can see that look, the one that says she already knows what I’m about to tell her. I didn’t need to explain.

  • AWS Glue
  • Dimensional Modeling
  • Kimball Methodology
  • Data Quality
  • ETL
  • Write-Audit-Publish
  • Apache Iceberg
  • Step Functions
  • SCD Type 2
Friday, January 16, 2026
The 2026 Data Engineering Strategy Nobody's Writing (But Everyone Needs)

What if I told you the biggest threat to your data platform isn’t technology—it’s that we’ve stopped building the next generation of engineers who’ll run it? Not the latest database that promises to solve everything. Not whether you picked the right orchestrator. The real crisis is that we’ve systematically broken our talent pipeline. And in 2026, that decision is going to start costing us in ways that no amount of tooling can fix.

  • Strategy
  • Team Building
  • Cost Optimization
  • DuckDB
  • AI Tools
  • Career Planning
  • 2026 Trends
  • Future of Work
Thursday, January 15, 2026
The Guerrilla Guide to Data Engineering Interviews

The Scenario That Changes Everything

Picture this: You’re sitting in an interview room (or, more likely these days, staring at a Zoom window with your carefully curated bookshelf background) and the interviewer asks you about data quality. “Tell me about your experience with data quality,” they say. You have two choices. Choice A: “Data quality is really important in data engineering. It involves ensuring data is accurate, complete, consistent, and timely. I believe strongly in implementing data quality checks throughout the pipeline.”

  • Interviews
  • Career Growth
  • Technical Assessment
  • SQL
  • Data Modeling
  • Problem Solving
  • Delta Lake
  • dbt
  • Data Quality
Sunday, January 11, 2026
Why Your Ideas Die in Planning Meetings

The silence that kills good ideas

One morning, I sat in yet another meeting. We had just spent two weeks backfilling a table, only to find that the data was riddled with issues. Even if we resolved them, it would take another two weeks to backfill the data again. There had to be a better way. “So, what do we think? Give me your best ideas for tackling this.”

  • Team Culture
  • Collaboration
  • Psychological Safety
  • Innovation
  • Change Management
  • Technical Leadership
  • Data Teams
Wednesday, January 7, 2026
The Science of Conversation for people who hate small talk

One morning, I watched a data engineer spend thirty minutes struggling to get an AI to debug a dbt job. The problem wasn’t the LLM’s capabilities; it was how the engineer framed the question. No context about what they’d already tried. No explanation of the expected versus actual output. Just “fix this code” followed by a massive code dump. This same engineer had similar struggles with stakeholders. Presentations that assumed too much context. Emails that buried the ask. Meetings where they answered questions nobody asked.

  • Communication
  • Soft Skills
  • Leadership
  • Career Growth
  • Team Building
  • Stakeholder Management
  • Professional Development
Sunday, January 4, 2026