Ghost in the data
  • Posts
  • 2026
    • Talk
    • Brainstorming
    • Guerrilla Interview Guide
    • 2026 Strategy
    • Dimensional Modeling AWS
    • Duct Tape Data Engineer
    • AI Peer Reviewer
    • NBA Coach Lessons for Data Leaders
    • For Sooty
    • Healing Tables SCD2
    • WAP Iceberg Snowflake
    • The CSV Test Suite Nobody Writes
    • 12 Steps to Better Data Engineering
    • Your Data Model Isn't Broken Pt I
    • Your Friends Will Be There
    • Fix Your Data Without Permission
    • Your Data Model Isn't Broken Pt II
    • Stop Building Salesforce Integrations
  • 2025
    • UV Tools
    • Zsh Virtual Environments
    • Piracy Service Problem
    • 2025 Data Trends
    • Data Modeling Approaches
    • macOS Dev Setup
    • Windows Dev Setup
    • Business Context Guide
    • Data Impact
    • Data Engineering Interviews
    • First 90 Days as Data Engineer
    • Senior to Staff Engineer
    • LLMs for Business Part 1
    • LLMs for Business Part 2
    • Mastering 1:1 Meetings
    • Data Quality Test
    • AI Prompting Secret
    • Conceptual Data Modeling
    • WAP Pattern for Data Pipelines
    • AI Simplified
    • dbt Fusion: The Engine Upgrade
    • Continuous Integration for Data Teams
    • Claude Code AI Agents
    • Clear Communication Superpower
    • Compliance vs Commitment
    • D&D Leadership
    • Reflective Best Self
    • Financial Independence
    • Dimensional Modeling Lives
    • Balancing Data Accessibility & Privacy
    • Data Quality Crisis
    • Data Quality Framework
    • AWS Data Pipeline
    • Invisible PR
    • AI's Twin Crises
  • 2024
    • Delta Lake
    • Data Normalisation
    • Data Profiling
    • Defensive Engineering
    • CI/CD
    • Setup Docker and Airflow
    • Find and Attract Data Engineers
    • 17 Years of Insights
    • Relationship Building
    • Individual Contributor
  • 2023
    • Git Bash with SSH
    • Journalling
    • Minecraft Server in GCP
    • Onboarding a data team
    • File Format for Big Data
    • Incident Management
    • Data Vault
    • Books that are worth your time?
The Duct Tape Data Engineer

The Engineer Who Ships
I want to tell you about a data engineer I worked with. Let’s call her Sarah. Sarah had a reputation. When business stakeholders had an urgent question (the kind that arrives at 4 PM on a Friday with the CEO’s name in the subject line), they went to Sarah. Not to the senior architect with the impeccable data model. Not to the platform team with their carefully orchestrated Airflow DAGs. They went to Sarah.

  • Data Engineering
  • DuckDB
  • Architecture
  • Pragmatism
  • Career Development
  • Technical Strategy
  • Data Platforms
  • Kimball
  • Data Modeling
Saturday, January 24, 2026
That Tuesday Morning When I Finally Fixed Our Ten-Minute Queries

The Ten-Minute Query
I’m sitting at my laptop on a Tuesday morning, waiting. The progress bar on my screen says ‘Query running… 4 minutes, 37 seconds.’ I lean back in my chair and let out this long sigh that probably says more than I intended. My manager walks past my desk. She glances at my screen, and I can see that look, the one that says she already knows what I’m about to tell her. I don’t need to explain.

  • AWS Glue
  • Dimensional Modeling
  • Kimball Methodology
  • Data Quality
  • ETL
  • Write-Audit-Publish
  • Apache Iceberg
  • Step Functions
  • SCD Type 2
Friday, January 16, 2026
The 2026 Data Engineering Strategy Nobody's Writing (But Everyone Needs)

What if I told you the biggest threat to your data platform isn’t technology—it’s that we’ve stopped building the next generation of engineers who’ll run it? Not the latest database that promises to solve everything. Not whether you picked the right orchestrator. The real crisis is that we’ve systematically broken our talent pipeline. And in 2026, that decision is going to start costing us in ways that no amount of tooling can fix.

  • Strategy
  • Team Building
  • Cost Optimization
  • DuckDB
  • AI Tools
  • Career Planning
  • 2026 Trends
  • Future of Work
Thursday, January 15, 2026
The Guerrilla Guide to Data Engineering Interviews

The Scenario That Changes Everything
Picture this: You’re sitting in an interview room (or, more likely these days, staring at a Zoom window with your carefully curated bookshelf background), and the interviewer asks you about data quality. “Tell me about your experience with data quality,” they say. You have two choices. Choice A: “Data quality is really important in data engineering. It involves ensuring data is accurate, complete, consistent, and timely. I believe strongly in implementing data quality checks throughout the pipeline.”

  • Interviews
  • Career Growth
  • Technical Assessment
  • SQL
  • Data Modeling
  • Problem Solving
  • Delta Lake
  • dbt
  • Data Quality
Sunday, January 11, 2026
Why Your Ideas Die in Planning Meetings

The silence that kills good ideas
One morning, I sat in yet another meeting. We had just spent two weeks backfilling a table, only to find the data was riddled with issues. Even if we fixed them, backfilling again would take another two weeks. There had to be a better way. “So, what do we think? Give me your best ideas for tackling this.”

  • Team Culture
  • Collaboration
  • Psychological Safety
  • Innovation
  • Change Management
  • Technical Leadership
  • Data Teams
Wednesday, January 7, 2026
The Science of Conversation for People Who Hate Small Talk

One morning, I watched a data engineer spend thirty minutes struggling to debug a dbt job with an AI assistant. The problem wasn’t the LLM’s capabilities; it was how the engineer framed the question. No context about what they’d already tried. No explanation of the expected versus actual output. Just “fix this code” followed by a massive code dump. This same engineer had similar struggles with stakeholders. Presentations that assumed too much context. Emails that buried the ask. Meetings where they answered questions nobody asked.

  • Communication
  • Soft Skills
  • Leadership
  • Career Growth
  • Team Building
  • Stakeholder Management
  • Professional Development
Sunday, January 4, 2026
The Data Quality Test: 10 Questions That Predict Pipeline Disasters

I’ve been writing about data quality a lot lately. Enough that I notice myself doing it. Enough that a small voice says: haven’t you made this point already? Schema drift, NULL propagation, duplicate records, the whole catalogue of things that go wrong in the space between a source system and a warehouse. I keep circling back to it. And every time, I almost talk myself out of writing the piece. Then I reflect on the last few years of work: the postmortems I’ve read, the pipelines I’ve inherited. The same pattern shows up with depressing regularity. Not exotic failures. Not edge cases. The boring stuff. The questions nobody asked before the first row hit the warehouse.

  • Data Quality
  • Pipeline Design
  • Schema Drift
  • Idempotency
  • Data Contracts
  • Incident Response
  • Data Ownership
Friday, April 11, 2025