In the early 2000s, Netscape’s decision to rewrite their browser from scratch was the single worst strategic mistake a software company could make.


At the time, Netscape was winning. They had the dominant browser. They had market share. They had momentum. And then they decided the codebase was too messy, too tangled, too hard to work with — so they threw it all away and started over. Navigator 4.0 was abandoned in favour of a ground-up rewrite that would eventually ship as version 6.0. There was no 5.0. Three years of development. No shipping product. And while Netscape’s engineers were busy building their beautiful new browser in a vacuum, Internet Explorer ate their lunch, their dinner, and most of their market share.

Joel Spolsky’s post-mortem of that decision has haunted me ever since: old code isn’t ugly because it’s bad. Old code is ugly because it works. Every strange condition, every seemingly redundant check, every patch that makes a new developer wince — those are battle scars. Each one represents a bug that took weeks to find in production, a customer workflow nobody anticipated, or an edge case that only surfaces on the third Tuesday of months ending in “R.”

I think about that every time I hear a data team say “let’s just rebuild it.”




The sentence that starts every failed data project


I’ve been on data teams long enough to recognise the pattern. It always starts the same way. Someone — usually someone new, often someone senior — opens a dimensional model they didn’t build, scrolls through a few hundred lines of transformation logic, and says the words that should make every data leader’s blood run cold:

“This is a mess. We need to start from scratch.”

And look, I get it. I’ve felt that impulse myself, and I’ve probably said those exact words. You open a fact table with 200 columns. You find bridge tables that reference other bridge tables. You discover a slowly-changing dimension nested inside another slowly-changing dimension, and you think: who built this? What were they thinking?

But here’s what I’ve learned the hard way, across multiple teams and more warehouse migrations than I care to count: they were thinking about the business. That bizarre WHERE clause filtering out both “Unknown” and “unknown”? That’s not sloppy code. That’s a case-sensitivity bug someone found in production data from a source system that nobody controlled. The seemingly redundant join that adds three seconds to your query? It handles a quarterly reconciliation edge case that cost the finance team two days of manual work before someone encoded the fix.
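To make that case-sensitivity fence concrete, here’s a minimal sketch using SQLite, whose default text comparison is binary and therefore case-sensitive. The table, rows, and status values are hypothetical — the point is how the “redundant” spellings in the filter earn their keep:

```python
import sqlite3

# Hypothetical source data: the same "unknown" status spelled three ways,
# because nobody controlled the source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Shipped"), (2, "Unknown"), (3, "unknown"), (4, "UNKNOWN")],
)

# The "obvious" filter only excludes one spelling and lets two bad rows through.
naive = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status != 'Unknown'"
).fetchone()[0]

# The battle-scarred version catches every casing the source has ever emitted.
defensive = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE status NOT IN ('Unknown', 'unknown', 'UNKNOWN')"
).fetchone()[0]

print(naive, defensive)  # naive keeps 3 rows; defensive keeps only the 1 good row
```

Strip two of those three spellings from the filter and the query silently starts counting garbage — which is exactly how the clause got written in the first place.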

That fact table with 200 columns isn’t a design failure. It’s an accurate representation of a business that has 200 things it needs to measure. The “clean” replacement model will eventually have 200 columns too — they’ll just have different names and it’ll take you eighteen months to figure out why you need them all.




What data teams keep forgetting


This isn’t about browsers, or even about code quality. It’s about knowledge. When you throw away a codebase and start fresh, you’re not just discarding syntax. You’re discarding years of accumulated understanding about how the real world actually behaves — understanding that was earned through production incidents, user complaints, and painful debugging sessions.

Every accumulated bug fix in that old code represents something learned. Each fix might be just one line, a couple of characters even, but a huge amount of work and time went into figuring out that those two characters were needed. And that knowledge — the why behind the fix — almost never makes it into documentation. It lives in the code itself, or it lives nowhere.

This is doubly true for data systems. A software application has unit tests, integration tests, user acceptance testing. A data model has… what exactly? Row counts? Spot checks? A business user who eyeballs the dashboard and says “yeah, that looks about right”? The knowledge embedded in a mature data warehouse is far more fragile than application code, because the testing infrastructure around it is almost always weaker. Throw it away and you’re not just rebuilding a codebase — you’re rebuilding an institutional memory that was never written down in the first place.
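For contrast, here is roughly what a data model’s “test suite” often amounts to in practice — a sketch in plain Python against a hypothetical fact table (dbt’s built-in `unique` and `not_null` tests express the same checks declaratively):

```python
import sqlite3

# Hypothetical warehouse table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO fct_orders VALUES (?, ?, ?)",
    [(1, 10, 99.0), (2, 11, 15.5), (3, 10, 42.0)],
)

def assert_scalar(sql, expected, message):
    """Run a single-value query and fail loudly if it disagrees."""
    actual = conn.execute(sql).fetchone()[0]
    assert actual == expected, f"{message}: expected {expected}, got {actual}"

# The checks most teams actually rely on: a row count, a null check,
# and a uniqueness check on the primary key. That's often the whole suite.
assert_scalar("SELECT COUNT(*) FROM fct_orders", 3, "row count drifted")
assert_scalar(
    "SELECT COUNT(*) FROM fct_orders WHERE order_id IS NULL", 0, "null primary keys"
)
assert_scalar(
    "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM fct_orders",
    0,
    "duplicate primary keys",
)
print("all checks passed")
```

Three assertions is a thin safety net compared to an application’s test pyramid — which is precisely why the knowledge in a mature warehouse is so fragile.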

Fred Brooks saw this coming fifty years ago. His “Second System Effect” from The Mythical Man-Month describes what happens when an engineer builds their second version of something: they over-design it. All the features they wisely deferred from the first version, all the architectural improvements they dreamed about, all the “if only we’d done it this way” ideas — they dump everything into the replacement. Brooks noted that a designer’s first system tends to be spare and clean because they know their limitations. The second system becomes a dumping ground for ambition.

Sound familiar? Every data warehouse rebuild I’ve witnessed follows this arc. The team doesn’t just want to replicate what exists — they want to add a semantic layer, implement a medallion architecture, introduce data contracts, add real-time streaming, switch to Data Vault, build a self-serve analytics platform, and migrate to a new cloud provider. All at once. In a single initiative.

And then they wonder why, two years later, they’ve delivered nothing and the business is still running reports off the “legacy” warehouse that was supposed to be decommissioned eighteen months ago.




The Teradata-to-Snowflake reality check


If you want to see the rebuild-vs-refactor debate play out in real time, watch a Teradata-to-Snowflake migration. The pattern is remarkably consistent.

The pitch is compelling: Snowflake is cheaper, scales elastically, separates compute from storage, and runs standard SQL. Moving from Teradata should be straightforward — convert the SQL, move the data, validate the results. Easy, right?

Roland Wenzlofsky, a Snowflake Solutions Architect, wrote about this in early 2026 and his observations line up perfectly with what I’ve seen. Most organisations approach these migrations as a translation exercise. The technical migration succeeds. Then the first quarterly cloud bill arrives at double the projected budget. Dashboards that loaded in seconds now queue for twenty minutes during batch windows. Pipelines that ran in 45 minutes on Teradata take five hours on Snowflake.

The problem isn’t Snowflake. The problem is that Teradata professionals carry assumptions into Snowflake that aren’t just incomplete — they’re actively counterproductive. Distribution keys, join strategies, indexing patterns — all the hard-won optimisation knowledge from Teradata becomes a liability in a platform built on fundamentally different architecture.

One of the largest Teradata-to-Snowflake migrations in North America involved 1.5 petabytes across 600 databases and roughly 45,000 objects. Before they could even begin, the team had to inventory every orphan object and document every behavioural difference between platforms. That’s not a “lift and shift.” That’s an archaeological dig.

But here’s what I find most telling: even when the migration succeeds technically, the data quality problems survive the move perfectly intact. William Flaiz documented a healthcare organisation that spent $1.8 million migrating from Siebel to Salesforce. Not a single record lost. Zero downtime. Perfect data mapping. Post-migration, the sales team still couldn’t run accurate pipeline reports. The “opportunity stage” field contained 89 different values for what should have been six standard stages — including 1,247 records with the typo “Closeing Soon.” That typo migrated perfectly to the new $2.3 million platform. As Flaiz put it: when performance is bad, we assume the technology is the limiting factor. Data quality problems are messier. They implicate people, processes, training gaps, and years of accumulated shortcuts.

New platform, same chaos. Because the platform was never the problem.




Chesterton’s Fence, or: why you shouldn’t delete that WHERE clause


There’s a principle in philosophy that every data engineer should have tattooed somewhere visible. G.K. Chesterton wrote it in 1929, and it goes roughly like this: if you come across a fence in the middle of a field and can’t see what purpose it serves, don’t tear it down. Go away and figure out why someone built it. Once you understand the reason, then you can decide whether it still needs to be there.

In data engineering, the fences are everywhere. That WHERE status NOT IN ('Unknown', 'unknown', 'UNKNOWN') clause. That filter excluding records from a specific date range in 2019. The join to a reference table that only has 12 rows and hasn’t been updated in three years. They all look pointless until you remove one and discover that the finance reconciliation breaks, or that a regulatory report starts including test transactions that were supposed to be filtered out, or that a dashboard starts showing a revenue spike from a data migration artifact that happened four years ago.

Hyrum Wright — formerly at Google, now at Adobe — formalised a related idea that’s become known as Hyrum’s Law: with enough users of an API, every observable behaviour will be depended on by somebody, regardless of what the documentation promises. For data systems, this is an absolute nightmare during rebuilds. Your downstream consumers don’t just depend on documented schemas. They depend on output ordering, null handling patterns, timestamp precision, and format quirks that were never specified anywhere. You rebuild the pipeline, and suddenly a report that’s worked for three years breaks — not because the data is wrong, but because the columns come back in a different order and someone’s Excel macro was hardcoded to column positions.
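One cheap defence is to write the invisible contract down and check it in CI. A sketch, with a hypothetical report table and column list — the point being that column *order*, not just column names, is part of what consumers depend on:

```python
import sqlite3

# Hypothetical pipeline output, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rpt_revenue (report_date TEXT, region TEXT, revenue REAL)"
)

# The contract your consumers actually depend on, written down at last:
# the column names AND their positions.
EXPECTED_COLUMNS = ["report_date", "region", "revenue"]

cursor = conn.execute("SELECT * FROM rpt_revenue")
actual_columns = [description[0] for description in cursor.description]

# Fail the build if a refactor silently reorders or renames the output —
# before someone's position-hardcoded Excel macro finds out for you.
assert actual_columns == EXPECTED_COLUMNS, (
    f"output contract broken: expected {EXPECTED_COLUMNS}, got {actual_columns}"
)
```

It isn’t glamorous, but it converts Hyrum’s Law from a post-release surprise into a failed build.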

I’ve seen teams spend more time debugging these “invisible contracts” after a rebuild than they would have spent refactoring the original system over two years.




The psychology that makes us do it anyway


So if the evidence against big-bang rewrites is this overwhelming — and it is, from Spolsky to Brooks to McKinsey studies showing that large IT projects run 45% over budget and deliver 56% less value than predicted — why do smart, experienced data teams keep proposing them?

Because the impulse to rebuild isn’t rational. It’s psychological. And once you understand the cognitive biases at play, you start seeing them everywhere.

The planning fallacy is the first culprit. Kahneman and Tversky showed that humans systematically underestimate how long tasks will take, even when they have direct experience with similar tasks that took longer than expected. In controlled studies, only 13% of subjects finished their project by the time they’d assigned a 50% probability of completion. The Sydney Opera House was estimated at AU$7 million and four years; it was delivered at AU$102 million and fourteen years. Every data warehouse rebuild proposal I’ve seen exhibits this same delusional optimism. “Six months, maybe nine.” It’s never six months.

And it gets worse, because of what one writer calls the Achilles Paradox of rewrites: while you’re building the new version, features keep getting added to the old one. The business doesn’t stop generating requirements just because you’ve decided to rebuild. So the target keeps moving. The new system is perpetually six months from matching the functionality of the old one.

Not Invented Here syndrome is the second bias, and it’s the one that nobody wants to admit. Katz and Allen’s research on R&D project groups found that teams with stable composition develop a decreasing ability to absorb ideas from outside the group over time. They start believing they possess a monopoly on knowledge in their domain. The diagnostic sign? What researchers call “thought-terminating clichés” — phrases like “we already tried that” or “our situation is different.”

In data teams, NIH manifests as building custom orchestration frameworks instead of using Airflow. Writing bespoke quality checks instead of adopting Great Expectations or dbt tests. Designing proprietary transformation layers instead of using tools that thousands of other teams have battle-tested. And most relevantly: insisting that the existing data model is fundamentally broken and needs to be replaced with something designed in-house from the ground up.

The new leader rewrite is the third pattern, and it might be the most destructive. A new head of data arrives. They look at the legacy warehouse they’ve inherited. They don’t understand the history behind any of the design decisions. They don’t know about the regulatory edge case that explains the weird date filter, or the source system quirk that necessitates the redundant join. All they see is complexity that they didn’t create — and complexity you didn’t create always looks worse than complexity you did.

So they propose a rebuild. It’s partly strategic — they want to put their stamp on the architecture. And it’s partly genuine — they really do think they can do better. But they’re falling prey to the same illusion Spolsky identified: reading code is harder than writing it, and unfamiliar code always looks worse than it is.

There’s a Stack Overflow survey finding that sits underneath all of this: “feeling unproductive” was the number one cause of developer unhappiness at 45%. Working with legacy systems is slow. It’s frustrating. Progress is incremental and often invisible. A greenfield rebuild, by contrast, feels amazing — for the first few months. You’re making decisions, building things, moving fast. The architecture is clean and the tests pass and the world makes sense. And then reality seeps in, the edge cases accumulate, and eighteen months later you’ve got a codebase that looks suspiciously like the one you replaced.




The complexity was always essential


Fred Brooks drew a distinction that I think about constantly. He separated essential complexity — complexity that’s inherent to the problem domain and can’t be removed by better engineering — from accidental complexity, which comes from poor implementation choices and can theoretically be eliminated.

The truth about mature data warehouses: most of the complexity is essential. It’s not there because the original engineers were incompetent. It’s there because the business is genuinely that complicated.

The transformation handling fifteen date formats? That’s because you have fifteen source systems and none of them agree on how to represent a date. The slowly-changing dimension with Type 2 tracking on attributes that seem trivial? Someone in compliance needed audit trails on those fields after a regulatory inquiry. The bridge table that connects customers to accounts through an intermediate entity that nobody can explain? It models a many-to-many relationship that emerged when the company acquired a subsidiary with a different customer hierarchy.

A rebuild won’t make this complexity disappear. It’ll just redistribute it. Instead of one tangled fact table, you’ll have twelve microservices each handling a piece of the logic. Instead of one confusing WHERE clause, you’ll have business rules scattered across a semantic layer, a transformation layer, and a data quality framework. The total complexity will be identical — or worse, because now it’s distributed across more systems with more failure modes.

Ward Cunningham — the person who coined the term “technical debt” — has said he wishes he’d used the word “opportunity” instead. His original metaphor wasn’t about sloppy code at all. It was about the gap between your code’s current model of the problem domain and your team’s evolved understanding of that domain. Technical debt, properly understood, is learning that hasn’t been applied yet. The legacy data model doesn’t represent failure. It represents the team’s best understanding at the time it was built, plus every correction that production reality demanded afterward.

Refactoring respects this. It asks: which parts of this complexity are essential (keep them, clarify them, test them) and which parts are accidental (remove them incrementally, one safe step at a time)? It preserves institutional knowledge while improving structure. It delivers value continuously instead of asking the business to wait years for a payoff that — statistically, based on everything we know about large-scale IT projects — probably won’t arrive.




So what do you do instead?


I’m going to save the detailed playbook for Part II of this series, but the short version is this: you refactor. Methodically. Incrementally. With tests.

Martin Fowler and Pramod Sadalage proved that database refactoring is possible back in 2006, when conventional wisdom said it couldn’t be done. Their principle was simple: make each change as small as possible, because the pain of integration increases exponentially with the size of the integration.

dbt has turned this into a practical workflow for analytics engineering teams. Bring your legacy SQL in unchanged. Wrap it in a model. Verify the output matches. Then — and only then — start decomposing. Extract CTEs. Introduce staging layers. Build tests. Refactor one model at a time, auditing each change against the original output. It’s not glamorous. It doesn’t let you redesign everything from first principles. But it works, and it works without asking the business to lose access to their reports for six months.
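The “verify the output matches” step doesn’t need heavy tooling. Here’s a sketch of the idea as a symmetric set difference between the legacy output and its refactored replacement — table names are hypothetical, and dbt users often reach for the audit_helper package to do the same comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Legacy model and its refactored replacement, both materialised as tables.
conn.execute("CREATE TABLE legacy_orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE refactored_orders (order_id INTEGER, amount REAL)")
rows = [(1, 10.0), (2, 25.5), (3, 7.0)]
conn.executemany("INSERT INTO legacy_orders VALUES (?, ?)", rows)
conn.executemany("INSERT INTO refactored_orders VALUES (?, ?)", rows)

# Symmetric difference: any row present in one output but not the other.
diff = conn.execute("""
    SELECT * FROM (SELECT * FROM legacy_orders
                   EXCEPT SELECT * FROM refactored_orders)
    UNION ALL
    SELECT * FROM (SELECT * FROM refactored_orders
                   EXCEPT SELECT * FROM legacy_orders)
""").fetchall()

# Only promote the refactored model once the diff is empty.
assert diff == [], f"outputs diverge: {diff}"
print("refactored model matches legacy output")
```

An empty diff is your licence to decompose further; a non-empty one is a fence you haven’t understood yet.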

The Strangler Fig pattern, the Write-Audit-Publish pattern, the Expand and Contract pattern — these are all variations on the same theme: replace incrementally, verify continuously, and never throw away working logic until you’ve proved the replacement handles every case the original did.
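Write-Audit-Publish, for instance, can be sketched in a few lines: build the new version off to the side, audit it while consumers still see the old table, and only then swap atomically. The table names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")

# 1. WRITE: build the new version to the side, never in place.
conn.execute("CREATE TABLE dim_customer__staging (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dim_customer__staging VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# 2. AUDIT: run checks against the staging copy; consumers still read the old table.
staged_rows = conn.execute(
    "SELECT COUNT(*) FROM dim_customer__staging"
).fetchone()[0]
null_keys = conn.execute(
    "SELECT COUNT(*) FROM dim_customer__staging WHERE customer_id IS NULL"
).fetchone()[0]
assert staged_rows >= 1 and null_keys == 0, "audit failed; old table stays live"

# 3. PUBLISH: swap atomically only after the audit passes.
with conn:
    conn.execute("ALTER TABLE dim_customer RENAME TO dim_customer__old")
    conn.execute("ALTER TABLE dim_customer__staging RENAME TO dim_customer")
    conn.execute("DROP TABLE dim_customer__old")

published = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```

If the audit fails, nothing downstream ever sees the bad data — the old table simply stays live, which is the whole point of the pattern.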

I’ll dig into each of these in Part II. For now, the point is simpler.




The scars are data


When I look at a legacy data model — really look at it, with patience and without ego — I don’t see a mess anymore. I see a record of everything a business has learned about itself. Every weird join is a relationship somebody fought to understand. Every cryptic transformation is a business rule that was discovered through painful experience. Every seemingly arbitrary filter is a production incident that someone fixed at some point, probably at an hour they’d rather not remember.

The Netscape rewrite remains one of the most studied failures in software history. And yet here we are, in 2026, watching data teams propose the same mistake — just with different technology names. Teradata becomes Snowflake. Informatica becomes dbt. The on-prem warehouse becomes the cloud lakehouse. The pitch changes, but the impulse doesn’t: tear it down, start over, do it right this time.

My view? Your data model isn’t broken. It’s battle-tested. It’s messy because reality is messy. And the right response to inherited complexity isn’t demolition — it’s archaeology. Understand what’s there. Understand why it’s there. Then improve it, one careful change at a time.

The scars in your data model aren’t flaws. They’re knowledge. Treat them accordingly.