The Duct Tape Data Engineer
The Engineer Who Ships
I want to tell you about a data engineer I worked with. Let’s call her Sarah.
Sarah had a reputation. When business stakeholders had an urgent question—the kind that arrives at 4 PM on a Friday with the CEO’s name in the subject line—they went to Sarah. Not to the senior architect with the impeccable data model. Not to the platform team with their carefully orchestrated Airflow DAGs. They went to Sarah.
Her code wasn’t pretty. She’d pull data from production databases with SQL queries that made the architecture team wince. She’d transform things in Python scripts that violated every design pattern in the book. She’d dump results into Excel because that’s where the CFO could actually use them.
And here’s the thing: Sarah shipped. Every time. On time. With answers that moved the business forward.
At the time, I thought Sarah was the best data engineer I’d ever worked with. I’m less certain now.
Sarah is what I would call a “duct tape programmer.” Someone who ships working solutions while others are still debating architectures. In the moment, there’s no one you’d rather have on your team.
But moments pass. And the code remains.
Three Types of Data Engineer
I’ve come to believe there are three distinct approaches to data engineering, and understanding the differences matters more than most of us realise.
The Architecture Astronaut builds for a future that never arrives. They spend months perfecting data contracts and tenant-level data mesh architecture. They have governance frameworks, schema registries, and architectural purity that would make Zhamak Dehghani proud. What they don’t have is a single stakeholder whose problem they’ve solved.
The Duct Tape Engineer ships at all costs. They’re Sarah at 4 PM on Friday—pulling data from production, hacking together Python scripts, dumping results into Excel. They solve problems. They move fast. They leave a trail of undocumented queries and mystery scripts in their wake.
The Foundational Pragmatist does something different. They invest in the foundations that make future speed possible—proper dimensional models, conformed dimensions, clean grain definitions—and then they duct tape on top of that prepared surface. They ship fast because they built well, not instead of building well.
The industry talks endlessly about the first two. The conferences are full of astronauts evangelising the latest architecture. The war stories celebrate the duct tape heroes who saved the day. But the engineers who actually deliver sustained value? They’re usually the third type, and they’re rarely on stage.
The Architecture Astronaut Visits Your Desk
You know this person. They show up at your desk, coffee in hand, ready to explain why your simple ETL pipeline is fundamentally wrong.
“What you really need,” they say, eyes lighting up, “is a proper lakehouse architecture. Bronze, silver, gold layers. Delta Lake for ACID compliance. Apache Kafka for real-time ingestion. And have you considered data mesh? Each domain team should own their own data products with standardized contracts and federated governance.”
You nod politely, but inside you’re doing the math. Your company has three data sources. Your biggest table has 50 million rows. Your most demanding stakeholder needs a weekly report. And this person wants you to implement an architecture designed for companies processing petabytes per day.
They’re the over-engineering colleague who “comes up to your desk, coffee mug in hand, and starts rattling on about how if you implement event-driven architecture with a proper schema registry, your pipeline will be 34% more scalable.” Scalable for what? You’re processing 100MB of data. On a good day.
The architecture astronaut isn’t stupid. They’re often brilliant. They’ve read every blog post, attended every conference, mastered every tool. They can diagram a system that would make Netflix jealous.
But they have a disease: they’re allergic to shipping.
They’d rather spend six months perfecting a medallion architecture than six hours writing the SQL query that answers the CEO’s question. They optimise for theoretical elegance at the expense of actual impact. And they leave a trail of half-finished platforms in their wake.
A Tale of Two Teams
Let me tell you about two data teams I’ve observed over the years.
The first was building the future. Data contracts. Federated governance. Tenant-level data mesh with domain ownership and self-serve capabilities. They had architecture review boards, schema evolution policies, and compatibility matrices. Every decision went through rigorous evaluation. Every interface was documented exhaustively.
Three years in, they were still perfecting their approach. The platform team had become a bureaucracy that produced governance documents, not insights. Architecture astronauts, every one of them.
The second team took a different path. They weren’t duct tape engineers—they didn’t hack things together and hope for the best. Instead, they invested in a proper dimensional model. Conformed dimensions. Clean fact tables. Well-defined grain. Nothing fancy. No distributed systems. No streaming. Just solid data modeling fundamentals executed well.
When the general manager asked a question, they answered it in minutes. Not because they had magical technology, but because the foundation was sound. The conformed customer dimension meant they didn’t have to reconcile six different definitions of “active customer.” The properly-grained fact tables meant they could slice and dice without fear of double-counting. The slowly changing dimensions meant they could answer “what did this look like last quarter?” without archaeological expeditions through backup tapes.
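To make that concrete, here is a minimal sketch of the pattern that team relied on, in Python with pandas. The table names, sample rows, and the active-customer flag are all hypothetical; the point is that the definition lives in one conformed dimension and the fact table declares a single explicit grain.

```python
import pandas as pd

# One conformed customer dimension: "active customer" is defined exactly
# once, so every report that joins to it inherits the same definition.
dim_customer = pd.DataFrame({
    "customer_key": [1, 2, 3],
    "customer_name": ["Acme", "Birch", "Cobalt"],
    "region": ["EMEA", "EMEA", "APAC"],
    "is_active": [True, True, False],
})

# A fact table with a stated grain: one row per order line. A known grain
# means you can aggregate without fear of double-counting.
fact_order_lines = pd.DataFrame({
    "customer_key": [1, 1, 2, 3],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-11", "2024-02-20", "2024-03-02"]
    ),
    "revenue": [1200.0, 800.0, 450.0, 300.0],
})

# The Friday-afternoon question: revenue from active customers, by region.
answer = (
    fact_order_lines
    .merge(dim_customer, on="customer_key")
    .query("is_active")
    .groupby("region", as_index=False)["revenue"]
    .sum()
)
print(answer)  # one join, one filter, one group-by; minutes, not hours
```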
One team had years of architectural innovation with nothing to show for it. The other had basic dimensional modeling and could answer almost anything.
The difference wasn’t intelligence or effort. It was understanding which investments actually compound. The second team wasn’t avoiding architecture—they were choosing the right architecture. The kind that creates a foundation for speed rather than a monument to complexity.
The 85% Failure Rate
But wait, you might say. Surely all these complex architectures exist for a reason. Surely the industry’s best minds have converged on data mesh and lakehouses because they work.
In 2015, Gartner estimated that most big data projects would fail. By 2017, analyst Nick Heudecker admitted they’d been “too conservative.” The actual failure rate was closer to 85%. Most data migrations, meanwhile, fail outright or blow past their budget and scope.
Eighty-five percent. In what other engineering discipline would we accept failure rates like that? In what other field would we keep advocating for the same approaches that fail more than four times out of five?
The UK’s National Health Service attempted to centralise patient records in one of the largest data projects ever undertaken. They flushed $15 billion down the drain. Fifteen billion dollars. And they got nothing.
Lyft spent years migrating to Hive. By the time they finished in 2020, the entire industry had moved on. They ended up stuck on an outdated system, having invested countless engineering hours chasing an architecture that was already obsolete.
These aren’t exceptions. They’re the pattern. Complex architectures fail. Simple foundations, built well, endure.
The Philosophy of Worse
This tension isn’t new. Richard Gabriel identified it in 1989 with his famous essay “Worse is Better.”
The core idea is counterintuitive: “Implementation simplicity has highest priority.” Not correctness. Not completeness. Not elegance. Simplicity.
Gabriel argued that “it is better to get half of the right thing available so that it spreads like a virus. Once people are hooked on it, take the time to improve it to 90% of the right thing.”
The four principles he outlined still apply:
- Simplicity: The system should be simple above all else.
- Correctness: It’s slightly better to be simple than correct.
- Consistency: Can be sacrificed for simplicity.
- Completeness: Can be sacrificed in favour of any other quality.
This feels wrong to engineers. We’re trained to build things correctly. We’re judged on technical sophistication. Our peers admire elegant architectures and comprehensive solutions.
But Gabriel was describing how Unix beat more sophisticated operating systems. How C beat more powerful languages. How TCP/IP beat more theoretically correct protocols. The simpler, “worse” solutions won because they shipped, spread, and evolved.
John Gall formalised the pattern in 1975: “A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
Don’t architect a lakehouse from scratch. Start with a simple pipeline that works. Evolve it as you understand your actual requirements. That’s not compromise—that’s how successful systems are built.
The Foundation That Enables Speed
Here’s where the duct tape philosophy goes wrong: it assumes all architecture is overhead.
I believe strongly in dimensional modeling. Not because it’s elegant—though it is. Not because it’s theoretically pure—though it can be. But because when a proper dimensional model exists, I can answer the general manager’s Friday afternoon question in fifteen minutes instead of four hours.
The conformed dimension that took a week to build properly? It’s saved me hundreds of hours of query writing since. The fact table with clean grain definitions? It means I’m not debugging data quality issues at 11 PM.
This is what separates the foundational pragmatist from the pure duct tape engineer. Sarah could answer Friday’s question. But she’d have to re-derive the customer definition from scratch every time. She’d spend hours reconciling data that a conformed dimension would have reconciled once. She could produce three reports by Friday, but each one might quietly define “customer” differently.
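Here is what “reconciled once” might look like in code. This is an illustrative sketch rather than Sarah’s actual scripts: the 90-day rule and the function name are assumptions. What matters is the shape: the rule is written down in one place, and every report imports it.

```python
from datetime import date, timedelta

import pandas as pd

# Assumed business rule: a customer is active if they ordered in the last
# 90 days. The number is hypothetical; having exactly one copy of the rule
# is the point.
ACTIVE_WINDOW_DAYS = 90

def flag_active(customers: pd.DataFrame, as_of: date) -> pd.DataFrame:
    """Apply the single, shared definition of 'active customer'."""
    cutoff = pd.Timestamp(as_of - timedelta(days=ACTIVE_WINDOW_DAYS))
    return customers.assign(is_active=customers["last_order_date"] >= cutoff)

# Three reports, one definition: the numbers cannot quietly drift apart.
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "last_order_date": pd.to_datetime(["2024-06-01", "2023-11-15"]),
})
print(flag_active(customers, as_of=date(2024, 6, 30)))
```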
The architecture astronaut spends six months building infrastructure nobody uses. The duct tape engineer hacks together solutions that become unmaintainable. The foundational pragmatist invests two weeks in dimensional modeling and creates the conditions for rapid, reliable delivery forever after.
The question isn’t whether to invest in architecture. It’s which architecture and when. A solid dimensional model? Almost always worth it. A lakehouse for your 50GB dataset? Probably not. Data contracts and federated governance for a team of three? You’re solving problems you don’t have.
That team I mentioned—the one answering questions in minutes—they weren’t magicians. They’d done the foundational work. They had clean dimensions. They had documented grain. They had conformed definitions.
Their duct tape stuck better because they’d prepared the surface.
The Maturity Gradient
There’s a progression that the conference talks never mention:
Monthly reports → Weekly reports → Daily pipelines → Micro-batches → Streaming
Each step rightward costs exponentially more in complexity. Each step leftward gives you exponentially more time to get the quality right.
When I build something new, I start as far left as the business can tolerate. Monthly reporting sounds unsexy, but it gives you thirty days to catch quality issues before anyone sees bad data. You have time to validate, to cross-check, to build confidence.
Once monthly is bulletproof, moving to weekly is straightforward. You’ve already solved the hard problems—the business logic, the edge cases, the data quality rules. Now you’re just increasing frequency. Daily follows. Micro-batch follows.
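In code, the gradient can collapse to a single window parameter. The sketch below is self-contained but entirely illustrative: the fake source rows, the quality gate, and every function name are assumptions. What it shows is why moving from monthly to weekly is “just increasing frequency” once the logic and checks are hardened.

```python
from datetime import date

# Hypothetical source data; one deliberately bad record for the gate to catch.
SOURCE = [
    {"day": date(2024, 1, 3), "revenue": 120.0},
    {"day": date(2024, 1, 17), "revenue": -5.0},
    {"day": date(2024, 1, 24), "revenue": 310.0},
]

def extract(start: date, end: date) -> list[dict]:
    return [r for r in SOURCE if start <= r["day"] < end]

def passes_quality_checks(rows: list[dict]) -> bool:
    # Checks hardened in the monthly phase travel unchanged to weekly and daily.
    return all(r["revenue"] >= 0 for r in rows)

def transform(rows: list[dict]) -> dict:
    return {"total_revenue": sum(r["revenue"] for r in rows), "rows": len(rows)}

def run_pipeline(start: date, end: date) -> dict:
    rows = extract(start, end)
    if not passes_quality_checks(rows):
        raise ValueError(f"Quality gate failed for window {start}..{end}")
    return transform(rows)

# Monthly first. The gate catches the bad record before anyone sees it.
try:
    print(run_pipeline(date(2024, 1, 1), date(2024, 2, 1)))
except ValueError as exc:
    print(exc)

# Once monthly is bulletproof, only the window changes:
# run_pipeline(date(2024, 1, 15), date(2024, 1, 22))  # weekly
# run_pipeline(date(2024, 1, 17), date(2024, 1, 18))  # daily
```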
Can you skip straight to real-time? Sure. I’ve seen it done. I’ve also seen the pain involved—debugging race conditions, explaining to stakeholders why the “real-time” dashboard showed impossible numbers, rebuilding pipelines from scratch because the streaming-first design couldn’t handle a schema change.
The streaming-first approach means you’re solving data quality, business logic, edge cases, AND distributed systems problems simultaneously. That’s a recipe for the 85% failure rate we talked about.
Ship the monthly report. Prove the logic. Harden the quality. Then earn your way to daily.
Duct Tape With a Sunset Date
Here’s what separates strategic pragmatism from technical negligence: the hardening plan.
Every quick solution I ship comes with an implicit contract. Yes, I’ll wrangle this data together to plug the compliance gap by Friday. Yes, I’ll build that dashboard for the legal team using queries that would make a data modeler weep. But written somewhere—in Jira, in a Confluence page, in my own notes—is the plan to do it properly.
The CFO’s urgent analysis gets a spreadsheet today. It gets a proper pipeline next quarter.
This isn’t compromise—it’s sequencing. The business gets value immediately. The team gets breathing room. And the quick solution becomes a specification for the permanent one. You’ve proven the logic works. You’ve validated the business need. Now you can invest in building it right, with far less risk than starting from scratch.
The failure mode isn’t duct tape itself. It’s duct tape without a sunset date. It’s the “temporary” query that’s still running five years later because nobody remembered to harden it. It’s technical debt accumulated without a repayment plan.
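One way to keep the sunset date from being forgotten is to put it in the script itself, so the duct tape complains before it quietly becomes load-bearing. A minimal sketch, with a hypothetical ticket number and date:

```python
import warnings
from datetime import date

SUNSET = date(2025, 3, 31)  # hypothetical: the date agreed in the ticket
TICKET = "DATA-1234"        # hypothetical: the hardening ticket

def check_sunset() -> None:
    """Refuse to run quietly past the agreed sunset date."""
    days_left = (SUNSET - date.today()).days
    if days_left < 0:
        raise RuntimeError(
            f"This duct-tape script expired on {SUNSET}. "
            f"Do the hardening work in {TICKET} instead of extending it."
        )
    if days_left <= 14:
        warnings.warn(f"{TICKET}: this quick fix sunsets in {days_left} days.")

check_sunset()
# ... the quick-and-dirty extraction goes below this line ...
```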
I’ve used this approach to protect my teams from massive pieces of work that would have consumed them for months. The compliance hole that needed plugging immediately? Duct tape with a ticket. The legal team’s urgent risk analysis? Duct tape with a documented plan. The high-value project that couldn’t wait for proper architecture? Duct tape with a sunset date.
The duct tape isn’t the problem. Forgetting to remove it is.
Sarah’s Legacy
Every organisation has Bob’s query.
You know the one. Bob wrote it three years ago. Bob left eighteen months ago. The query is still running every Tuesday morning, feeding a report that twelve people apparently depend on, documented nowhere, understood by no one.
Here’s the uncomfortable truth: Bob’s query is what Sarah’s Friday afternoon heroics look like three years later.
Sarah shipped. She solved the problem. The CEO got their answer. But she didn’t write the ticket. She didn’t document the logic. She didn’t create a sunset date. She moved on to the next fire, and the quick fix quietly became load-bearing infrastructure.
The person who inherited Sarah’s work doesn’t think she’s a hero. They think she’s the reason they’re debugging mystery scripts at 11 PM instead of building something new.
This is what duct tape looks like when the sunset date gets forgotten. It’s technical debt, pure and simple. A liability waiting to break at the worst possible moment.
But here’s the reframe: Sarah’s legacy is also an opportunity.
That query is a perfect project for a junior engineer. The risk is low—it’s already working, so they have a baseline to compare against. The scope is contained—one query, one output, one business purpose. They can pull it apart, understand the logic, document what they find, and rebuild it as a proper pipeline.
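The existing output is also an automated safety net. A sketch of the regression test the junior might write, where both functions are stand-ins for the real legacy query and its documented replacement:

```python
import pandas as pd
import pandas.testing as pdt

def bob_legacy_query() -> pd.DataFrame:
    # Stand-in for the undocumented Tuesday-morning query.
    return pd.DataFrame({"region": ["APAC", "EMEA"], "revenue": [300.0, 950.0]})

def rebuilt_pipeline() -> pd.DataFrame:
    # Stand-in for the documented, tested replacement.
    return pd.DataFrame({"region": ["APAC", "EMEA"], "revenue": [300.0, 950.0]})

def test_rebuild_matches_baseline() -> None:
    old = bob_legacy_query().sort_values("region").reset_index(drop=True)
    new = rebuilt_pipeline().sort_values("region").reset_index(drop=True)
    pdt.assert_frame_equal(old, new)  # any divergence fails loudly

test_rebuild_matches_baseline()
print("Rebuild matches Bob's baseline.")
```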
They learn data modeling. They learn testing. They learn documentation. They learn how to interview stakeholders about requirements that were never written down. They harden their skills in a safe environment while simultaneously paying down technical debt. This is how juniors grow into seniors.
Every piece of Sarah’s legacy is both a problem to solve and a training ground to offer. The mature data engineer sees both. They create the ticket, assign the junior, and turn a liability into a development opportunity.
The debt gets paid. The junior grows. The organisation gets a documented, tested, maintainable pipeline instead of a mystery script.
That’s pragmatism with a long-term view.
The Pragmatist’s Manifesto
Here’s what the foundational pragmatist understands:
Shipping is a feature. The most important feature. Your data pipeline doesn’t exist until it’s serving production queries. Your data model doesn’t matter until it’s powering real decisions. As Joel Spolsky put it: “Your product must have it.”
The pipeline that ships insights beats the pipeline that ships nothing. A “50%-good solution that people actually have solves more problems and survives longer than a 99% solution that nobody has because it’s in your lab where you’re endlessly polishing the damn thing.”
But foundations compound. A solid dimensional model isn’t architecture astronaut behaviour—it’s the investment that makes all future delivery faster and more reliable. Build the foundation once, benefit forever.
Directionally correct beats precisely wrong. The dashboard that’s 80% accurate and delivered today creates more value than the perfectly governed data product delivered never.
Start with CSV, graduate to Parquet, earn your way to streaming. You don’t need real-time until you do. You don’t need a lakehouse until you do. You don’t need Kafka until you do. And you probably don’t.
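For the first graduation, the change really can be one line. A minimal sketch, assuming pandas with pyarrow installed and hypothetical file paths:

```python
import pandas as pd

# Day one: CSV is fine. Nobody needs Parquet for a weekly report.
df = pd.read_csv("orders.csv")

# Later, when volume justifies it: columnar, typed, compressed.
df.to_parquet("orders.parquet")

# Readers change one line; the business logic changes not at all.
df = pd.read_parquet("orders.parquet")
```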
Every shortcut needs a sunset. Ship the hack. Document the plan. Pay down the debt. Quick fixes without a hardening plan aren’t pragmatism—they’re negligence with extra steps.
Jamie Zawinski said it best: “At the end of the day, ship the fucking thing! It’s great to rewrite your code and make it cleaner and by the third time it’ll actually be pretty. But that’s not the point—you’re not here to write code; you’re here to ship products.”
For data engineers, the adaptation is obvious: You’re not here to build architectures. You’re here to ship insights. But the insights you ship today shouldn’t sabotage the insights you need to ship tomorrow.
The Caveat
Here’s where I need to be careful.
Pragmatic shortcuts aren’t an excuse for incompetence. “Duct tape programmers must be talented enough to pull it off.”
Sarah wasn’t shipping ugly code because she didn’t know better. She was making conscious trade-offs. She understood the implications of pulling from production databases. She knew how to write cleaner code—she just knew when not to bother.
Her failure wasn’t the duct tape. It was stopping there.
There’s a difference between strategic simplicity and reckless shortcuts. The foundational pragmatist understands the trade-offs. They know what they’re sacrificing and why it’s worth it. They can explain their decisions and defend them. And they write the ticket for the hardening work.
Netflix actually needs Kafka. Meta actually needs petabyte-scale systems. Google actually needs distributed query engines.
The question is whether your company does.
Most don’t. The failure rates we saw earlier suggest as much. But knowing when you’re the exception requires understanding the fundamentals. You can’t break the rules effectively until you know what the rules are.
So yes, learn the proper patterns. Understand data modeling. Know why slowly changing dimensions exist. Study the medallion architecture. Read about data mesh.
Then make conscious decisions about when to use them. Not because a blog post said so. Not because a vendor promised ROI. Not because that’s what the architecture astronaut wants.
Because they solve a problem you actually have.
The Engineer Who Holds All Three
Here’s what I’ve learned after years of watching these patterns play out: the best data engineers understand all three modes and know when each applies.
I read every blog post about data mesh. I attend the conferences. I evaluate new tools. I think constantly about how we might evolve our architecture. That’s not being an astronaut—that’s staying informed.
When my team faces a Friday afternoon crisis, I can be Sarah. I can hack together the solution, ship the answer, buy us breathing room. That’s not recklessness—that’s responsiveness.
But unlike Sarah, I write the ticket. I document the sunset date. And when we have the space, I advocate for the dimensional modeling that will make the next crisis easier to handle.
The architecture astronaut’s failure is pursuing elegance without urgency. The duct tape engineer’s failure is urgency without investment in foundations. The foundational pragmatist ships fast on solid ground and pays down their debts.
I’ve watched teams spend years on data mesh with nothing to show for it. I’ve inherited Sarah’s legacy code and cursed her name at midnight. I’ve also seen teams with solid data models answer any question in minutes, year after year.
The difference wasn’t the sophistication of their approach—it was whether their investments actually compounded into capability, and whether their shortcuts came with expiration dates.
So yes, ship the dashboard. Answer the executive’s question. Plug the compliance hole today.
But write down the hardening plan. Build the dimensional model when you can. Start with monthly, earn your way to daily. Invest in the foundations that make future speed possible.
And when the junior engineer asks why Sarah’s query is still running after five years, hand them the ticket to fix it.
That’s the job.
