2026

You Can't Incentivise a Pipeline That Doesn't Break

I worked alongside a data engineer who was, by every formal measure, the best performer on the team. He was also quietly destroying the platform. Not maliciously. He was optimising for the thing being measured. His work shipped fast because he skipped the edge case analysis. He closed tickets at first resolution without ever checking whether the underlying pattern would recur. He didn’t review anyone else’s PRs (not his KPIs, so why would he?).

Saturday, July 4, 2026

Your Team Already Has Patterns. They Just Don't Know It.

When I started a new role, one of the first things I did was try to understand how data moved through the system. Not the dashboards, not the data models — the pipes. Where did things come from? How did they get in? What happened to them along the way? There were somewhere between twenty and thirty source systems feeding the platform. Not a massive number, but enough to tell a story when you looked at the ingestion layer all at once. What I found was that all the pipelines had originated from two base templates. A sensible starting point. The kind of thing a small team puts in place early to stop complete chaos.

Saturday, June 27, 2026

Keep Moving

Some days I open my laptop and by 5pm I genuinely cannot tell you what I did. Not because it was complicated. Not because there were emergencies. The stand-up happened. A few Teams messages were sent. A ticket was groomed. A document was “reviewed”. A meeting was attended where everyone agreed something was important and then the meeting ended and nothing changed. And then somehow it was evening and the pipeline I meant to fix was exactly as broken as it was in the morning.

Tuesday, June 23, 2026

Ghost Skills: Teaching AI Agents to Think Like Data Engineers

Another week, another skills repo on the GitHub trending page. I know. There are roughly seventeen of them now, all promising to turn your AI coding agent from a confident intern into a slightly-less-confident intern. Most of them are great. Most of them are also built by solo devs, for solo devs, on solo-dev codebases that fit comfortably in a context window. Which is fine, if that’s your world. Less fine if your world involves a Snowflake warehouse with four tables that could be the source of truth for “customer”, an SCD2 someone half-built in 2021 and quietly walked away from, and a dbt project where stg_users_final_v3_actually_use_this is, somehow, the one you’re meant to use. (Don’t laugh. You’ve seen worse.)

Sunday, June 14, 2026

The Competitive Moat That AI Can't Replicate

The Restaurant That Refused to Take Bookings Online

Let me tell you a story about a restaurant owner who became obsessed with human connection. He didn’t want people booking online. He wanted them to call. He wanted the ritual of a human voice, the small exchange about an anniversary or a first date, the warmth of being recognised. His team thought he was losing his mind. Online bookings were standard. Everyone did it. Why make customers work harder?

Saturday, June 13, 2026

SQL Tells You What. Comments Tell You Why.

The best SQL doesn’t need comments. Write meaningful CTE names, descriptive aliases, clear column labels — and a skilled reader will follow your logic without a single annotation. That’s the right instinct. It’s also only half right. SQL is a declarative language. You’re not writing how the database retrieves your data; you’re writing what you want. That’s a useful distinction, because “what” and “why” are very different questions, and SQL can answer exactly one of them.

Saturday, June 6, 2026

Don't Go Dark: Visibility Is a Data Engineering Skill

There’s a specific kind of silence in data engineering that I’ve learned to fear. Not the silence of a system that’s working well. Not the comfortable quiet of a team in flow. I mean the silence of a project that’s been running for three weeks and you still can’t point to a single visible thing it has produced. The kind of silence where, if your manager stopped you in the hallway and asked “how’s that migration going?”, you’d say “fine” because saying anything more accurate would require explaining things you haven’t fully articulated yet — even to yourself.

Saturday, May 23, 2026

The Broken Window in Your Data Pipeline

There’s a particular kind of data problem that doesn’t announce itself. It accumulates. We were receiving Salesforce data through delta extraction — sensible in theory, because full snapshots can run to hundreds of terabytes and less than 1% of records change on any given day. The problem is that deltas require someone to know what “changed” means. In Salesforce, that’s less obvious than it sounds. Watch a last_modified column and you’ll miss objects that get updated when a related object changes, without their own timestamp reflecting it.

Saturday, May 9, 2026

Five Worlds of Data Engineering

You watch a conference talk about implementing data contracts, and nobody mentions that the advice assumes you have multiple teams producing data — which you don’t. You read a post declaring “if you’re still using stored procedures in 2026, you’re doing it wrong,” and the comments erupt. Half the people are nodding along. Half are furious. Both sides are right. They’re just living in different worlds and don’t realise it.

Saturday, May 2, 2026

Your Data Platform Costs More Than It Should

Let me tell you about the moment I stopped treating cloud costs as someone else’s problem. We were three months into a Snowflake migration. Everything was humming. Pipelines were green, dashboards were fast, the analytics team was happier than I’d seen them before. I felt good about the work we’d done. Then finance forwarded me the invoice. The number wasn’t catastrophic. But it was significantly higher than what we’d budgeted, and when I started digging, I couldn’t explain where most of it was going. I knew we had warehouses running. I knew we had pipelines executing. But I couldn’t tell you which warehouse was responsible for what cost, which pipelines were the expensive ones, or whether the money was well spent. I had built a platform I was proud of — and I had no idea what it actually cost to operate.

Saturday, April 25, 2026

Why Your Pipeline Finishes Later Every Month

Let me tell you about a graph that changed how I think about data engineering. A junior engineer on my team — let’s call her Priya — had been tracking something nobody asked her to track. Every morning for two months, she’d noted the timestamp when our main analytics pipeline completed. She wasn’t trying to make a point. She was just curious, because the finance team kept mentioning their dashboards weren’t ready when they arrived at 8 AM anymore.

Saturday, April 18, 2026

Stop Building Salesforce Integrations From Scratch

Let me tell you about Marcus. Marcus was on a team I led a few years back. Sharp, motivated, the kind of engineer who actually read documentation before writing code. When the business asked us to get Salesforce data into our warehouse, Marcus volunteered. He’d done API work before. He figured a few weeks, tops. He scoped it carefully. Built a Python service that authenticated via OAuth, pulled Account, Contact, and Opportunity objects through the Bulk API, flattened the nested JSON into relational tables, handled pagination, managed rate limits. Wrote solid tests. Documented everything. The kind of work you’d point to in a code review and say this is how it’s done.

Saturday, April 4, 2026