<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Engineering Leadership on Ghost in the data</title><link>https://ghostinthedata.info/tags/engineering-leadership/</link><description>Ghost in the data</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Ghost in the data</copyright><lastBuildDate>Sat, 27 Jun 2026 09:00:00 +1000</lastBuildDate><atom:link href="https://ghostinthedata.info/tags/engineering-leadership/index.xml" rel="self" type="application/rss+xml"/><item><title>Your Team Already Has Patterns. They Just Don't Know It.</title><link>https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/</link><pubDate>Sat, 27 Jun 2026 09:00:00 +1000</pubDate><guid>https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/</guid><author>Chris Hillman</author><description>A pattern bank gives your data engineering team a shared vocabulary for how work gets done — and the confidence to estimate it. Here's how to build one collaboratively, without turning it into a bureaucratic exercise.</description><content:encoded>&lt;p>When I started a new role, one of the first things I did was try to understand how data moved through the system. Not the dashboards, not the data models — the pipes. Where did things come from? How did they get in? What happened to them along the way?&lt;/p>
&lt;p>There were somewhere between twenty and thirty source systems feeding the platform. Not a massive number, but enough to tell a story when you looked at the ingestion layer all at once. What I found was that all the pipelines had originated from two base templates. A sensible starting point. The kind of thing a small team puts in place early to stop complete chaos.&lt;/p>
&lt;p>But over time, as new sources were onboarded by different engineers under different pressures, the implementations had drifted from those templates in ways that were hard to track. Some pipelines were applying data transformations before data even landed in Snowflake — so the raw layer wasn&amp;rsquo;t actually raw. Some were enforcing data contracts at the landing stage, before the data had been typed, which created problems downstream when contract violations turned up in places the pipeline wasn&amp;rsquo;t designed to handle them. And there was no consistent approach to manifest file checking — verifying that the file you received matched what the source said it sent — which is exactly the kind of gap that sits quietly until a source delivers a partial file on a bad day and nobody notices until a stakeholder does.&lt;/p>
&lt;p>None of this was catastrophic. I want to be clear about that. I&amp;rsquo;ve seen far worse — environments where the ingestion layer is a genuine free-for-all with no common ancestry at all. By comparison, this was a B+. The bones were solid. There was a lineage to the templates you could trace, and the engineers who&amp;rsquo;d built on them had generally been trying to do the right thing. But there was drift, and where there&amp;rsquo;s drift there are gaps, and gaps have a way of becoming expensive at the worst possible moments.&lt;/p>
&lt;p>The flip side was interesting. The transformation layer and the serving patterns were genuinely robust. Whoever had shaped those had been thinking in terms of reusable approaches from the beginning. SCD2 loads were SCD2 loads. Reporting tables followed a consistent structure. That part of the platform had a grammar to it, even if nobody had written it down.&lt;/p>
&lt;p>The contrast between the two layers was instructive. And it surfaced a question I keep coming back to: what would it look like if the whole team had a shared vocabulary for the work they were doing — not just in transformation, but across the whole pipeline lifecycle? Not a rigid rulebook. Not a standards document that nobody reads. Just a common language, agreed on together, that made the implicit explicit.&lt;/p>
&lt;p>That&amp;rsquo;s what a pattern bank is. And this article is about how to build one.&lt;/p>
&lt;hr>
&lt;h3 id="the-problem-that-nobody-names">The problem that nobody names&lt;/h3>
&lt;/br>
&lt;p>Before we get to the solution, it&amp;rsquo;s worth naming what actually breaks when there&amp;rsquo;s no shared pattern vocabulary on a data engineering team.&lt;/p>
&lt;p>The most obvious symptom is inconsistency. When every engineer approaches recurring problems from first principles, you end up with a codebase that looks like it was built by ten different people — because it was. Code reviews become debates about form rather than substance. A new team member joins and spends their first month just trying to understand why the SFTP ingestion from System A looks nothing like the SFTP ingestion from System B, even though they&amp;rsquo;re structurally the same thing. The answer — &amp;ldquo;that&amp;rsquo;s just how it grew&amp;rdquo; — isn&amp;rsquo;t satisfying, and it doesn&amp;rsquo;t help them build.&lt;/p>
&lt;p>But inconsistency isn&amp;rsquo;t the thing that hurts most. The thing that hurts most is estimation.&lt;/p>
&lt;p>One of the early challenges was quoting projects to internal customers: duration, cost, team size. The kind of thing stakeholders reasonably want to know before they commit. And without a clear picture of which parts of a solution were known quantities versus genuinely novel territory, those conversations were uncomfortable. You could make an educated guess, but you couldn&amp;rsquo;t anchor it to anything concrete. Each project felt like a bespoke exercise, even when large parts of it were things the team had done before.&lt;/p>
&lt;p>The problem has a name in project planning circles: unknown unknowns. The things you don&amp;rsquo;t know you don&amp;rsquo;t know. The hidden surprises that don&amp;rsquo;t appear in any upfront estimate because you haven&amp;rsquo;t recognised them yet. Pattern reuse directly attacks this. When you&amp;rsquo;ve built the same ingestion type before, you know roughly how long it takes, where the edge cases live, and what tends to go wrong. You&amp;rsquo;ve converted some of the unknown unknowns into known quantities — and that changes everything about how you plan.&lt;/p>
&lt;p>There was one more piece that crystallised why the pattern bank mattered. The technical architects were responsible for designing how source systems should serve data to the platform. They sat upstream of the data team, making decisions about formats, frequency, delivery mechanisms. But they didn&amp;rsquo;t have a clear picture of what happened once data arrived — what the ingestion layer expected, what would make the data engineer&amp;rsquo;s life easy versus hard, what patterns the team was already working with.&lt;/p>
&lt;p>The pattern bank, as it developed, became a communication artefact as much as an engineering one. It made the platform legible to people who sat outside it. Here is how we ingest data. Here is what we need from you when you&amp;rsquo;re designing a new source feed. Here is why the way you&amp;rsquo;re proposing to serve this data is going to create problems downstream.&lt;/p>
&lt;p>That transparency was worth as much as the internal consistency.&lt;/p>
&lt;hr>
&lt;h3 id="what-a-pattern-bank-actually-is">What a pattern bank actually is&lt;/h3>
&lt;/br>
&lt;p>The simplest definition: a pattern bank is a catalogue of reusable solution approaches that your team draws from when designing new work.&lt;/p>
&lt;p>It&amp;rsquo;s not a code library — though patterns might eventually generate template code. It&amp;rsquo;s not a runbook — those are step-by-step guides for specific incidents. It&amp;rsquo;s not a playbook — those describe how to handle particular scenarios. A pattern bank operates at a higher level of abstraction than all of those. It describes the &lt;em>shape&lt;/em> of a solution, not the &lt;em>substance&lt;/em> of a specific implementation.&lt;/p>
&lt;p>Think of it like a recipe catalogue. When you&amp;rsquo;re planning a dinner party, you choose recipes based on what you&amp;rsquo;re cooking for and what ingredients you have. You don&amp;rsquo;t invent a new cooking method each time — you draw from a repertoire. The pattern bank is the repertoire. Each new project is the dinner.&lt;/p>
&lt;p>For a data engineering team, the pattern bank tends to organise naturally into three layers that follow how data moves through a platform:&lt;/p>
&lt;p>&lt;strong>Ingestion patterns&lt;/strong> cover how data gets from a source into your environment. This might be a direct file transfer from SFTP, an API pull, an event stream, a database replication — the mechanics differ, but each is a recognisable type. At the ingestion layer, the goal is usually to get data as close to your transformation environment as possible in its raw form, before worrying about types or structure. Staging — the step where you impose data types and light transformations — often lives here too, and it&amp;rsquo;s worth treating it as a distinct sub-pattern.&lt;/p>
&lt;p>&lt;strong>Transformation patterns&lt;/strong> cover what happens to data once it&amp;rsquo;s in. SCD1 for current-state overwrite, SCD2 for full history with row versioning, transactional loads for append-only fact tables, reference table management, bi-temporal designs for systems that need to track both valid time and transaction time. These patterns are where most of the design complexity lives, and they&amp;rsquo;re often the most stable part of a mature pattern bank because the underlying approaches have been well-understood for decades.&lt;/p>
&lt;p>&lt;strong>Delivery patterns&lt;/strong> cover how processed data gets to consumers. A flat file export to a third-party system, a reporting aggregate table consumed by a BI tool, a real-time serving layer for an operational dashboard — each is a pattern with its own shape, its own testing considerations, its own failure modes.&lt;/p>
&lt;p>The power of this structure is in what it enables at design time. When a new project lands, you&amp;rsquo;re no longer starting with a blank page. You&amp;rsquo;re asking: which ingestion pattern fits this source? Which transformation pattern fits this business requirement? Which delivery pattern fits this consumer? You pick from the catalogue, combine them into a solution design, and — critically — you know what you&amp;rsquo;re working with because you&amp;rsquo;ve worked with it before.&lt;/p>
&lt;p>&lt;img src="https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/images/pattern-bank-stage-menu.svg" alt="A solution design is one pattern from each stage, with shared tails factored into sub-patterns">&lt;/p>
&lt;p>&lt;em>One recipe, one pattern from each stage. The amber path is a single solution design; the grey fan into the shared sub-pattern is the factoring at work.&lt;/em>&lt;/p>
&lt;p>One instance of an approach is a one-off solution. Two instances is a coincidence. Three instances is a pattern worth documenting. That&amp;rsquo;s the threshold I&amp;rsquo;d suggest for adding something to the bank — build it twice, recognise it on the third, name it and write it down.&lt;/p>
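&lt;p>If it helps to see that threshold as a tally, here is a minimal sketch. The pipeline names and shape labels are invented for illustration, and the threshold of three is simply the rule of thumb above rather than anything a tool enforces.&lt;/p>

```python
from collections import Counter

# Hypothetical audit notes: one (pipeline, shape) pair per pipeline reviewed.
# Every name here is made up for the example.
audit = [
    ("vendor_a_orders", "sftp file drop"),
    ("vendor_b_refunds", "sftp file drop"),
    ("crm_contacts", "api pull"),
    ("finance_ledger", "sftp file drop"),
    ("web_events", "event stream"),
    ("hr_roster", "api pull"),
]


def candidate_patterns(observations, threshold=3):
    """Return shapes seen at least `threshold` times, most frequent first."""
    counts = Counter(shape for _, shape in observations)
    return [shape for shape, n in counts.most_common() if n >= threshold]


print(candidate_patterns(audit))  # ['sftp file drop']
```

&lt;p>The two API pulls stay off the list for now. They are a coincidence until a third one turns up.&lt;/p>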
&lt;hr>
&lt;h3 id="what-it-should-not-be">What it should not be&lt;/h3>
&lt;/br>
&lt;p>There&amp;rsquo;s a counter-example that keeps me honest about what a pattern bank should &lt;em>not&lt;/em> be.&lt;/p>
&lt;p>I&amp;rsquo;ve seen pattern banks that went too far. Not too far in terms of coverage — too far in terms of specificity. Every combination of source type and transformation type and delivery type had its own entry. Every edge case was enumerated. Every variant had its own named pattern. The catalogue was exhaustive in the worst sense of the word.&lt;/p>
&lt;p>The result was a pattern bank that engineers couldn&amp;rsquo;t use without first spending an hour navigating it. The patterns were so granular that applying one felt like following a script rather than making a design decision. Worse — they were so specific that anything slightly different from the documented pattern triggered a governance conversation about whether to create a new entry or adapt an existing one. The bank had become bureaucracy wearing an engineering costume.&lt;/p>
&lt;p>A pattern bank should describe the shape of a solution clearly enough that a mid-level engineer can use it to make a design decision without asking five follow-up questions. If your patterns require a full implementation guide to be useful, they&amp;rsquo;re not patterns — they&amp;rsquo;re procedures. Keep them abstract enough to cover genuine variants. Keep them concrete enough to actually guide work. The test is whether someone can look at a pattern and understand &lt;em>what kind of problem it solves&lt;/em> within about thirty seconds.&lt;/p>
&lt;p>The three-layer structure helps with this. When you keep ingestion patterns, transformation patterns, and delivery patterns separate, each pattern is already scoped to a defined part of the problem. That&amp;rsquo;s enough structure. Within each pattern, describe what problem it solves, when to use it, when &lt;em>not&lt;/em> to use it, and roughly what it involves. That&amp;rsquo;s the minimum viable pattern documentation. A short note on known effort range doesn&amp;rsquo;t hurt either.&lt;/p>
&lt;p>Resist the urge to enumerate every combination. The recipe catalogue analogy holds: a good recipe book teaches you to cook, not to follow instructions. The patterns should develop your team&amp;rsquo;s design intuition, not replace it.&lt;/p>
&lt;p>There&amp;rsquo;s also a structural cure for the combinatorial urge, and it took me longer than it should have to see it: factor the convergence points out. When several patterns funnel into the same mechanism — say, five different ingestion types that all end in the same landing-and-staging steps — that shared tail isn&amp;rsquo;t a reason to write five longer entries, and it&amp;rsquo;s certainly not a reason to write twenty-five combined ones. It&amp;rsquo;s a sub-pattern. Give it one entry of its own and have the parent patterns reference it.&lt;/p>
&lt;p>On the platform I described at the start, every ingestion pattern — file drops, API pulls, database replication, direct pushes — converged on a single landing mechanism into the warehouse. Once we pulled that out as its own sub-pattern, each parent entry lost about a third of its length and gained a crisp ending: &amp;ldquo;hands over to the landing sub-pattern.&amp;rdquo; And when the landing mechanism later needed a change, it was one edit instead of five. Factoring is how you keep the bank small while the platform grows.&lt;/p>
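&lt;p>A rough sketch of that factoring, assuming the bank is held as plain data. The step wording, the pattern names and the &lt;code>hands_over_to&lt;/code> key are all hypothetical; the point is only that the shared tail exists once and parents point at it.&lt;/p>

```python
# Hypothetical catalogue: names and step wording are illustrative only.
SUB_PATTERNS = {
    "landing": ["copy raw file to landing zone", "apply types in staging"],
}

PATTERNS = {
    "file drop": {
        "steps": ["detect file arrival", "check manifest"],
        "hands_over_to": "landing",
    },
    "api pull": {
        "steps": ["call endpoint", "page through results"],
        "hands_over_to": "landing",
    },
    "database replication": {
        "steps": ["read change log"],
        "hands_over_to": "landing",
    },
}


def full_shape(name):
    """Expand a parent pattern into its own steps plus the shared tail."""
    entry = PATTERNS[name]
    return entry["steps"] + SUB_PATTERNS[entry["hands_over_to"]]


# One edit to the shared sub-pattern is picked up by every parent.
SUB_PATTERNS["landing"] = SUB_PATTERNS["landing"] + ["record audit columns"]

for name in PATTERNS:
    print(name, "->", full_shape(name))
```

&lt;p>Each parent entry only describes what is distinctive about it, and a change to landing never has to be repeated.&lt;/p>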
&lt;hr>
&lt;h3 id="you-cannot-build-this-alone">You cannot build this alone&lt;/h3>
&lt;/br>
&lt;p>Most writing about pattern banks covers what they should contain and then leaves you to figure out how to get your team to actually care about it.&lt;/p>
&lt;p>The truth is that a pattern bank built by one person — even a thoughtful, experienced person — will struggle to gain traction. Not because the patterns are wrong, but because nobody else owns them. When something doesn&amp;rsquo;t quite fit a pattern from the bank, the engineer&amp;rsquo;s instinct is to work around it rather than raise the question, because the bank feels like someone else&amp;rsquo;s thing. The patterns were handed down; they weren&amp;rsquo;t grown.&lt;/p>
&lt;p>The irony is that the teams who need a pattern bank most urgently — teams with inconsistent approaches, estimation problems, high bus-factor risk — are also the teams most likely to resist a top-down version of one. Engineers who&amp;rsquo;ve been doing the work their way for two years don&amp;rsquo;t want to be told their approach is being standardised out of existence.&lt;/p>
&lt;p>So you have to build it differently.&lt;/p>
&lt;p>The starting point is an audit, not a design. Before you introduce a pattern bank as a concept, spend time with your team&amp;rsquo;s existing work. Look at what&amp;rsquo;s been built over the last six to twelve months. Look for the shapes in it. Which approaches keep appearing? Where did engineers independently converge on similar solutions? Where did they diverge on problems that are structurally the same?&lt;/p>
&lt;p>This is the part where you&amp;rsquo;re listening, not speaking. You&amp;rsquo;re mapping the territory.&lt;/p>
&lt;p>When you bring the team together for the first conversation about patterns, the framing matters enormously. The worst version of that conversation starts with: &amp;ldquo;I&amp;rsquo;ve been looking at how we do things, and I think we should standardise on some approaches.&amp;rdquo; Even if that&amp;rsquo;s true and well-intentioned, it positions you as the author and everyone else as the audience.&lt;/p>
&lt;p>The better version starts with a question: &amp;ldquo;What do we keep building over and over?&amp;rdquo; Give people a chance to name the recurring shapes in their own work. You&amp;rsquo;ll find that engineers who&amp;rsquo;ve been on the team for a while can articulate these patterns clearly — they just haven&amp;rsquo;t had a reason to before. When they name a pattern, write it down. Give it back to them as documentation. That&amp;rsquo;s their knowledge, made visible.&lt;/p>
&lt;p>When you name what already exists rather than imposing what should exist, the dynamic shifts. You&amp;rsquo;re not standardising their work. You&amp;rsquo;re recognising it.&lt;/p>
&lt;p>This is the buy-in mechanism: the process of building the pattern bank &lt;em>together&lt;/em> is what drives adoption. When an engineer has contributed to the definition of a pattern, they&amp;rsquo;ll defend it, refine it, and reach for it naturally in design conversations. When they&amp;rsquo;ve received it from above, they&amp;rsquo;ll comply with it when they remember to, and quietly work around it when something doesn&amp;rsquo;t quite fit.&lt;/p>
&lt;hr>
&lt;h3 id="making-the-first-session-work">Making the first session work&lt;/h3>
&lt;/br>
&lt;p>The first time you sit down as a team to build the pattern bank, keep it light. You&amp;rsquo;re not trying to produce a finished document. You&amp;rsquo;re trying to start a conversation that will take months to reach maturity.&lt;/p>
&lt;p>A good opening question: &amp;ldquo;If you were explaining to a new team member what kinds of pipelines we build, how would you describe them?&amp;rdquo; This is deliberately informal. You&amp;rsquo;re not asking for a taxonomy — you&amp;rsquo;re asking for the words people already use.&lt;/p>
&lt;p>Let the conversation happen. Take notes. You&amp;rsquo;ll start to hear natural groupings emerge. Someone will say &amp;ldquo;well, most of our ingestion is basically file drops or API calls.&amp;rdquo; Someone else will add &amp;ldquo;plus the two database replications, but those work the same way as the file drops once they&amp;rsquo;re in landing.&amp;rdquo; That&amp;rsquo;s a pattern taking shape.&lt;/p>
&lt;p>When objections come up (&amp;ldquo;but the System A feed is completely different because&amp;hellip;&amp;rdquo;), treat them as contributions, not resistance. Either the System A feed is genuinely a different pattern (name it separately), or the difference is a variant within the existing pattern (document it as a variant). Both outcomes are useful. The engineer who raised the objection has just enriched the catalogue.&lt;/p>
&lt;p>There&amp;rsquo;s an important interpersonal dynamic to watch in these early sessions. Engineers who&amp;rsquo;ve been on the team a long time have often built strong opinions about how certain things should be done — and those opinions are usually right, because they&amp;rsquo;re grounded in hard experience. If you&amp;rsquo;re newer to the team, or coming in with a perspective shaped by a different environment, the temptation is to steer the conversation toward what you think the patterns should look like. Resist that. The first job is to surface what exists, not to redesign it.&lt;/p>
&lt;p>Even if you can see clearly that a particular approach has flaws — that two of the team&amp;rsquo;s existing ingestion patterns are more similar than they appear and probably belong under one umbrella — raise it as a question, not a conclusion. &amp;ldquo;I&amp;rsquo;m noticing these two look structurally similar — do you see them as the same pattern, or are there meaningful differences I&amp;rsquo;m missing?&amp;rdquo; That&amp;rsquo;s an invitation to examine, not an instruction to comply. And the answer might surprise you. The engineer who built both of them might have a very clear reason why they&amp;rsquo;re distinct that isn&amp;rsquo;t obvious from the outside.&lt;/p>
&lt;p>The sceptics in the room deserve more than a paragraph, because for a long time I misread what was actually happening for them. The standard advice — don&amp;rsquo;t argue them into participation, ask them what they&amp;rsquo;d add or change, give them an editing role rather than an audience role — is right as far as it goes. But it treats scepticism as a mood to be managed, and it&amp;rsquo;s usually something more legitimate than that.&lt;/p>
&lt;p>The engineers most resistant to a pattern bank are very often the ones whose expertise &lt;em>is&lt;/em> the current way of working. The person who can untangle the gnarliest legacy pipeline from memory, who knows every undocumented quirk of every source feed, who gets pulled into every incident because nobody else holds the map — that person has real, earned status, and it rests on knowledge that lives in their head. A pattern bank threatens to convert that hard-won knowledge into a page anyone can read. That&amp;rsquo;s a genuine loss, and you cannot argue someone out of feeling it.&lt;/p>
&lt;p>What works is role transfer, not persuasion. The new way of working needs heroes too: someone has to lead the audit of existing work, someone has to own the pattern standards, someone has to write the entries for the genuinely tricky patterns that nobody else fully understands. The people who hold the deepest knowledge of the current state should get first claim on those roles. You&amp;rsquo;re not asking them to surrender their expertise — you&amp;rsquo;re asking them to encode it, with their name on it.&lt;/p>
&lt;p>And there&amp;rsquo;s one question that does more work than any amount of advocacy: &amp;ldquo;what would you need to see before you&amp;rsquo;d trust this?&amp;rdquo; Ask it sincerely and write the answers down, because the answers &lt;em>are&lt;/em> the acceptance criteria for the pattern bank — co-authored by the people most likely to find its weaknesses. The sceptic who told you exactly what would convince them has just committed, in public, to being convincible. The person most likely to undermine a pattern bank built without them is the same person who becomes its most vocal defender when they&amp;rsquo;ve shaped it.&lt;/p>
&lt;p>At the end of the session, you should have a rough list of candidate patterns and a few volunteers to draft the first write-ups. Keep the documentation minimal — one page per pattern is plenty to start. The goal is a living document, not a finished artefact.&lt;/p>
&lt;p>The question of where to keep the pattern bank is worth getting right early. It should live somewhere the team already goes — a Teams wiki, a shared OneNote, a Confluence space, a structured markdown repo. Wherever your existing documentation lives is the right place. Don&amp;rsquo;t create a new system that requires a new habit. The pattern bank should be the path of least resistance in a design conversation, not a side trip.&lt;/p>
&lt;p>Don&amp;rsquo;t wait until the bank is &amp;ldquo;complete&amp;rdquo; to start using it. Start referencing patterns in design conversations almost immediately. When a new project comes in, run the question out loud: &amp;ldquo;Which ingestion pattern are we looking at here? Which transformation pattern?&amp;rdquo; Even before the bank is formally documented, the vocabulary starts to take hold. The naming matters as much as the documentation — maybe more. Once a team has words for what it&amp;rsquo;s doing, it starts to think in those words.&lt;/p>
&lt;hr>
&lt;h3 id="what-to-put-in-a-pattern-entry">What to put in a pattern entry&lt;/h3>
&lt;/br>
&lt;p>The minimum useful documentation for a pattern:&lt;/p>
&lt;p>&lt;strong>Name.&lt;/strong> Short, memorable, specific to your team&amp;rsquo;s context. &amp;ldquo;SFTP file ingestion&amp;rdquo; is fine. &amp;ldquo;Direct file transfer pattern&amp;rdquo; is fine. Resist the urge to make it sound architectural.&lt;/p>
&lt;p>&lt;strong>What problem it solves.&lt;/strong> One sentence. What kind of situation does this pattern address?&lt;/p>
&lt;p>&lt;strong>When to use it.&lt;/strong> What conditions make this the right choice?&lt;/p>
&lt;p>&lt;strong>When not to use it.&lt;/strong> This is the part most pattern documentation skips, and it&amp;rsquo;s often the most valuable. What situations look like they&amp;rsquo;d fit this pattern but actually don&amp;rsquo;t?&lt;/p>
&lt;p>&lt;strong>The basic shape.&lt;/strong> Three to six steps describing the high-level approach. Not a full implementation guide. Think &amp;ldquo;what are the stages?&amp;rdquo; not &amp;ldquo;what does the code look like?&amp;rdquo;&lt;/p>
&lt;p>&lt;strong>Known variants.&lt;/strong> What changes in different situations? An SFTP pull from an external system and an S3-to-S3 transfer might share a parent pattern but have meaningfully different considerations. Document the variants without creating a separate entry for each.&lt;/p>
&lt;p>&lt;strong>What it hands over, and what it needs to start.&lt;/strong> Patterns don&amp;rsquo;t run in isolation — each one ends where another begins, and the boundary is where work stalls. Two short lists fix most of that: the things this pattern produces that the next one depends on (the handover), and the things that must already exist before work on this pattern can sensibly start (the pickup criteria). If a transformation pattern needs the source data landed, typed, and described before it can begin, say so in the entry — because that single line is the difference between an engineer starting tomorrow and an engineer discovering, two days in, that they&amp;rsquo;re blocked.&lt;/p>
&lt;p>&lt;strong>How it&amp;rsquo;s validated.&lt;/strong> What kind of testing closes this pattern out, and with what kind of data? A useful rule of thumb that took us a while to articulate: fabricated rows belong in unit tests and nowhere else; real production history is the default for business validation; and if you genuinely need a synthetic scenario, create it through the source system&amp;rsquo;s own test environment so it arrives via the real pipeline — never hand-inserted into the warehouse. A pattern entry doesn&amp;rsquo;t need the whole standard. It just needs to say which of those applies, so validation is part of the design conversation rather than an afterthought at the end.&lt;/p>
&lt;p>&lt;strong>Effort signal.&lt;/strong> Is this a well-understood pattern with a reasonable effort history? Or does it have significant unknown elements? Even a simple &amp;ldquo;known / known with caveats / novel&amp;rdquo; classification is enough. It gives you something to point to when you&amp;rsquo;re estimating.&lt;/p>
&lt;p>To make this concrete, here&amp;rsquo;s roughly what a pattern entry might look like for a direct file ingestion pattern — pared back, readable, just enough to guide a design decision:&lt;/p>
&lt;hr>
&lt;p>&lt;strong>Pattern: Direct File Ingestion&lt;/strong>&lt;/p>
&lt;p>&lt;em>What it solves:&lt;/em> Landing structured or semi-structured files from an external source into the data platform&amp;rsquo;s landing zone.&lt;/p>
&lt;p>&lt;em>Use when:&lt;/em> Source delivers files on a schedule (push or pull), and transformation happens separately once the file is confirmed complete.&lt;/p>
&lt;p>&lt;em>Don&amp;rsquo;t use when:&lt;/em> Source delivers events in real-time that need immediate processing — that&amp;rsquo;s a streaming pattern. Also not appropriate when source volume is large enough that full file delivery creates latency problems.&lt;/p>
&lt;p>&lt;em>Basic shape:&lt;/em>&lt;/p>
&lt;ol>
&lt;li>File arrives in agreed drop location (SFTP, S3, or internal file transfer)&lt;/li>
&lt;li>Arrival trigger fires (event-driven or scheduled poll)&lt;/li>
&lt;li>File is validated for completeness (presence check, row count if available)&lt;/li>
&lt;li>File is copied to landing zone in raw form — no type conversion at this stage&lt;/li>
&lt;li>Staging job applies data types, basic cleansing, and audit columns&lt;/li>
&lt;li>Downstream transformation pattern picks up from staging&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Known variants:&lt;/em> SFTP pull (we poll the source), S3 push (source lands in our bucket), internal transfer from another system on the platform. Error handling and retry logic differ slightly for each, but the core shape is the same.&lt;/p>
&lt;p>&lt;em>Hands over:&lt;/em> A landed, typed staging table; a manifest reconciliation result; a note of any rejected or quarantined rows. &lt;em>Needs to start:&lt;/em> Agreed drop location and credentials; a sample file; a named contact at the source who can answer format questions.&lt;/p>
&lt;p>&lt;em>Validated by:&lt;/em> Completeness check against the manifest; row counts reconciled to source; staging types asserted against the agreed contract.&lt;/p>
&lt;p>&lt;em>Effort signal:&lt;/em> Well-known. Delivery estimate should be grounded in actuals from previous builds. Flag if source has no delivery receipt mechanism — that adds complexity and uncertainty.&lt;/p>
&lt;hr>
&lt;p>That&amp;rsquo;s one page. Maybe two if the variant notes are detailed. It&amp;rsquo;s not comprehensive documentation — it&amp;rsquo;s a design guide. The engineer using it brings the implementation knowledge; the pattern provides the frame.&lt;/p>
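&lt;p>If your bank lives in a repo rather than a wiki, the same entry can be held as structured data. What follows is a sketch under that assumption, not a prescribed schema: the class name, the field names and the &lt;code>problems&lt;/code> check are my own invention, mirroring the headings listed earlier.&lt;/p>

```python
from dataclasses import dataclass, field

EFFORT_SIGNALS = ("known", "known with caveats", "novel")


@dataclass
class PatternEntry:
    # Field names are hypothetical; they mirror the entry headings.
    name: str
    solves: str
    use_when: str
    avoid_when: str
    shape: list[str]
    variants: list[str] = field(default_factory=list)
    hands_over: list[str] = field(default_factory=list)
    needs_to_start: list[str] = field(default_factory=list)
    validated_by: list[str] = field(default_factory=list)
    effort_signal: str = "novel"

    def problems(self):
        """List signs that the entry is drifting from pattern to procedure."""
        found = []
        if len(self.shape) not in range(3, 7):
            found.append("basic shape should be three to six steps")
        if not self.avoid_when.strip():
            found.append("missing 'when not to use it'")
        if self.effort_signal not in EFFORT_SIGNALS:
            found.append("unknown effort signal")
        return found


direct_file_ingestion = PatternEntry(
    name="Direct File Ingestion",
    solves="Landing files from an external source into the landing zone.",
    use_when="Source delivers files on a schedule; transformation runs later.",
    avoid_when="Source delivers real-time events that need immediate processing.",
    shape=[
        "File arrives in agreed drop location",
        "Arrival trigger fires",
        "File is validated for completeness",
        "File is copied to landing zone in raw form",
        "Staging job applies types, cleansing and audit columns",
        "Downstream transformation pattern picks up from staging",
    ],
    variants=["SFTP pull", "S3 push", "internal transfer"],
    hands_over=["typed staging table", "manifest reconciliation result"],
    needs_to_start=["drop location and credentials", "sample file"],
    validated_by=["completeness check against the manifest"],
    effort_signal="known",
)

print(direct_file_ingestion.problems())  # []
```

&lt;p>A check like &lt;code>problems&lt;/code> is optional. Its only job is to nudge an entry back toward one page when it starts growing into an implementation guide.&lt;/p>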
&lt;p>The test I mentioned earlier is worth applying to every entry: can a mid-level engineer use this to make a design decision within about thirty seconds of reading it? If the answer is no, trim it or elevate the abstraction level. If someone needs a wall of text to understand what kind of problem the pattern addresses, the problem hasn&amp;rsquo;t been stated clearly enough.&lt;/p>
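&lt;p>The completeness step in that entry is the one most worth sketching, because it&amp;rsquo;s the one that catches the partial file on a bad day. The sketch below assumes a CSV with a header row and a JSON manifest carrying a &lt;code>row_count&lt;/code> and a &lt;code>sha256&lt;/code>. Both the manifest layout and the &lt;code>reconcile&lt;/code> function are assumptions for illustration; real sources will describe their deliveries differently, if at all.&lt;/p>

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def reconcile(data_path, manifest_path):
    """Compare a delivered CSV against its manifest and list any mismatches."""
    # Assumed manifest layout: {"row_count": int, "sha256": hex string}.
    manifest = json.loads(Path(manifest_path).read_text())
    raw = Path(data_path).read_bytes()
    with open(data_path, newline="") as handle:
        rows = sum(1 for _ in csv.reader(handle)) - 1  # minus the header row
    mismatches = []
    if rows != manifest["row_count"]:
        mismatches.append(f"rows: manifest {manifest['row_count']}, file {rows}")
    if hashlib.sha256(raw).hexdigest() != manifest["sha256"]:
        mismatches.append("checksum differs from manifest")
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "transactions.csv"
    manifest = Path(tmp) / "transactions.manifest.json"
    full = b"id,amount\n1,10.00\n2,12.50\n3,7.25\n"
    manifest.write_text(json.dumps({
        "row_count": 3,
        "sha256": hashlib.sha256(full).hexdigest(),
    }))

    data.write_bytes(full)
    complete = reconcile(data, manifest)

    data.write_bytes(b"id,amount\n1,10.00\n")  # simulate a partial delivery
    partial = reconcile(data, manifest)

print(complete)  # []
print(partial)   # ['rows: manifest 3, file 1', 'checksum differs from manifest']
```

&lt;p>The list of mismatches is what the pattern would hand over as its manifest reconciliation result. An empty list means the file is safe to land.&lt;/p>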
&lt;hr>
&lt;h3 id="patterns-in-use-from-blank-page-to-solution-design">Patterns in use: from blank page to solution design&lt;/h3>
&lt;/br>
&lt;p>The recipe analogy only works if you can actually see someone cooking. So here&amp;rsquo;s what using a pattern bank looks like in practice, when a project lands on your desk.&lt;/p>
&lt;p>Say your team is onboarding a new source system — a third-party vendor who delivers customer transaction data daily via SFTP. The commercial team wants it available in the reporting environment within forty-eight hours of the file landing. There are some business rules around transaction categorisation that need to be applied, and the output needs to land in an existing aggregate table that feeds a Power BI dashboard.&lt;/p>
&lt;p>Without a pattern bank, this is a design conversation that starts with &amp;ldquo;so how do we want to approach this?&amp;rdquo; With a pattern bank, it&amp;rsquo;s a pattern-matching exercise.&lt;/p>
&lt;p>&lt;strong>Ingestion:&lt;/strong> The source is delivering files via SFTP on a schedule. That&amp;rsquo;s the direct file ingestion pattern. You&amp;rsquo;ve done it before; you know the shape. The main variant question is whether you&amp;rsquo;re pulling from their server or they&amp;rsquo;re pushing to yours — that changes the authentication setup but not the core pattern. You note that their SFTP has had reliability issues in the past (the commercial team mentioned it), so you flag the retry handling variant in your notes.&lt;/p>
&lt;p>&lt;strong>Transformation:&lt;/strong> The business rules around transaction categorisation sound like a reference table lookup — you&amp;rsquo;re mapping raw values from the source to agreed internal categories. That&amp;rsquo;s a known pattern. The historical record requirement is the key question: does the business need to see how a transaction was categorised at the time it was processed, even if the categorisation rules change later? If yes, that&amp;rsquo;s a bi-temporal concern and it changes the transformation pattern significantly. If no — if current categorisation is all that matters — it&amp;rsquo;s a much simpler SCD1 or SCD2 load depending on whether you need row history.&lt;/p>
&lt;p>&lt;strong>Delivery:&lt;/strong> The output goes into an existing aggregate table feeding a Power BI dashboard. Depending on how that table is structured, you&amp;rsquo;re either appending new rows, overwriting a date partition, or recalculating an aggregate. Each is a recognisable delivery pattern with known characteristics.&lt;/p>
&lt;p>By the time that design conversation is twenty minutes in, you have a solution sketch: direct file ingestion (SFTP pull variant), SCD1 transformation with reference table lookup, aggregate table delivery. Three patterns, all known, all with effort history. The only real design question that needs time is the bi-temporal one — you need to clarify the business requirement before you can confirm the transformation pattern.&lt;/p>
&lt;p>And before the estimate goes anywhere, you walk the boundaries: what does ingestion hand to transformation, what does transformation hand to delivery, and who is waiting on whom at each point. Which brings me to the part of this picture I got wrong for the longest time.&lt;/p>
&lt;hr>
&lt;h3 id="the-seams-between-patterns">The seams between patterns&lt;/h3>
&lt;/br>
&lt;p>Everything I&amp;rsquo;ve described so far treats patterns as blocks: pick one per stage, snap them together, estimate from history. That&amp;rsquo;s true, and it&amp;rsquo;s also where the picture quietly lies to you — because in practice, the blocks are not where delivery goes wrong. The boundaries between them are.&lt;/p>
&lt;p>I learned this when we took our pattern bank a step further and turned it into a delivery plan: each pattern decomposed into roughly day-sized pieces of work, each piece with an exit criterion. The decomposition surfaced something the catalogue view had hidden. The estimates &lt;em>inside&lt;/em> patterns were fine: we had effort history, and the day-sized pieces held up. The overruns lived &lt;em>between&lt;/em> patterns. The ingestion work finished on a Tuesday, the transformation work started on the Thursday of the following week, and nobody could quite say where the week went.&lt;/p>
&lt;p>The week went into the seam. A review that sat in someone&amp;rsquo;s queue. A handover that turned out to be missing the one thing the next engineer needed, triggering a conversation, then a clarification, then a small rework. A wait for a batch window, or for another team, or for a sign-off from someone who didn&amp;rsquo;t know they were on the critical path.&lt;/p>
&lt;p>This gives you a distinction worth building into how you plan: effort versus elapsed time. Effort lives inside patterns — it&amp;rsquo;s the day-sized, estimable work the bank already describes. Elapsed time accumulates at seams, and it follows completely different rules. You can&amp;rsquo;t reduce a review queue by working harder, and you can&amp;rsquo;t estimate a sign-off from effort history. If your plans only count effort, the seams are invisible right up until the deadline isn&amp;rsquo;t.&lt;/p>
&lt;p>&lt;img src="https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/images/pattern-bank-seams.svg" alt="Patterns are blocks of estimable effort; elapsed time accumulates at the seams between them">&lt;/p>
&lt;p>&lt;em>The same work, on a calendar. The solid segments are effort — the patterns. The gaps are the seams.&lt;/em>&lt;/p>
&lt;p>So once the patterns are named, name the seams too. Every place where work passes between patterns — or between people — gets the same lightweight treatment a pattern gets: what crosses the boundary (the handover pack), and what the receiving side needs before they can start (the pickup criteria). The handover from ingestion to transformation, for instance, might be: the staging table, the agreed contract it conforms to, a note of known quirks, and confirmation the data is actually flowing. Write it down once, and every future handover at that seam stops being a negotiation.&lt;/p>
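&lt;p>As a rough illustration, a handover pack and its pickup criteria can be written down as a small data structure plus a check. This is a minimal Python sketch, not how any particular team stores its seam map: the &lt;code>HandoverPack&lt;/code> fields mirror the ingestion-to-transformation example above, and the function name, field names, table name and contract path are all hypothetical.&lt;/p>

```python
from dataclasses import dataclass, field


@dataclass
class HandoverPack:
    """What crosses the ingestion-to-transformation seam (hypothetical fields)."""
    staging_table: str = ""
    contract_ref: str = ""
    known_quirks: list[str] = field(default_factory=list)
    data_flowing_confirmed: bool = False


def missing_pickup_criteria(pack: HandoverPack) -> list[str]:
    """Return the pickup criteria this pack does not yet satisfy."""
    missing = []
    if not pack.staging_table:
        missing.append("staging table not named")
    if not pack.contract_ref:
        missing.append("no contract referenced")
    if not pack.data_flowing_confirmed:
        missing.append("data flow not confirmed")
    # known_quirks is deliberately not checked: an empty list is a valid answer.
    return missing


if __name__ == "__main__":
    # Illustrative values only.
    pack = HandoverPack(
        staging_table="stg.vendor_transactions",
        contract_ref="contracts/vendor_transactions.yaml",
    )
    print(missing_pickup_criteria(pack))
```

&lt;p>An empty list means the receiving engineer has what they need to start. Anything else names the gap before it turns into waiting time.&lt;/p>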
&lt;p>The quality bar for a handover is a test I now apply to everything: a piece of work is done when a different engineer could pick it up tomorrow with no conversation. Not &amp;ldquo;done pending a chat,&amp;rdquo; not &amp;ldquo;done but ask me about the weird bit.&amp;rdquo; No conversation. It sounds strict, and it is — but every conversation a handover requires is elapsed time hiding in plain sight, and elapsed time at seams is precisely the thing your effort-based estimates can&amp;rsquo;t see.&lt;/p>
&lt;p>The pattern bank describes the blocks. The seam map describes the gaps between the blocks. You need both, and the second one is the one nobody writes down.&lt;/p>
&lt;hr>
&lt;h3 id="the-estimation-conversation">The estimation conversation&lt;/h3>
&lt;/br>
&lt;p>There&amp;rsquo;s a distinction in planning that doesn&amp;rsquo;t get talked about enough: the difference between things you don&amp;rsquo;t know, and things you don&amp;rsquo;t know you don&amp;rsquo;t know.&lt;/p>
&lt;p>The first kind — known unknowns — you can account for in estimates. You know they&amp;rsquo;re there, you can add buffer, you can plan a spike to resolve them early.&lt;/p>
&lt;p>The second kind — unknown unknowns — are the ones that blow up timelines. They surface mid-project, when you&amp;rsquo;re already committed to a delivery date. They&amp;rsquo;re the integration behaviour you didn&amp;rsquo;t anticipate, the edge case in the source system nobody mentioned, the schema change that came through without notice.&lt;/p>
&lt;p>Pattern reuse doesn&amp;rsquo;t eliminate unknown unknowns. But it converts some of them into known unknowns, and some known unknowns into known quantities. When you&amp;rsquo;ve built the same ingestion type four times, you know where the surprises usually come from. You&amp;rsquo;ve already met most of the ways that type of pipeline can misbehave.&lt;/p>
&lt;p>&lt;img src="https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/images/pattern-bank-estimation-shift.svg" alt="Pattern reuse converts unknown unknowns into known unknowns, and known unknowns into known quantities">&lt;/p>
&lt;p>&lt;em>Each repetition of a pattern moves surprises to the right — out of the category that blows up timelines and into the category you can plan.&lt;/em>&lt;/p>
&lt;p>That conversion is where a pattern bank earns its keep. It&amp;rsquo;s not a tidiness exercise. It changes the quality of your estimates, and it changes how you talk about risk with internal customers.&lt;/p>
&lt;p>At the organisation I mentioned earlier, once we had a clearer picture of which project components were known patterns versus genuinely novel territory, the project conversations changed character. Instead of quoting a number and hoping for the best, we could say: &amp;ldquo;This solution uses three patterns from our standard set — the effort estimate for those is grounded in what we&amp;rsquo;ve delivered before. This fourth component is new for us — we&amp;rsquo;re treating it as discovery work and we&amp;rsquo;ll re-estimate once we&amp;rsquo;ve built a proof of concept.&amp;rdquo; That&amp;rsquo;s a different kind of conversation. It&amp;rsquo;s a more honest one, and paradoxically, it inspires more confidence.&lt;/p>
&lt;p>There&amp;rsquo;s a maturation point worth knowing about in advance. Once a pattern has genuine effort history — three or four builds behind it — you can decompose it into day-sized pieces of work, each with its own exit criterion. That&amp;rsquo;s the moment the bank stops being a catalogue and starts being a delivery playbook: a new project isn&amp;rsquo;t just &amp;ldquo;three known patterns,&amp;rdquo; it&amp;rsquo;s a sequence of named, day-sized stories you can lay against a calendar. And when you do lay it against a calendar, estimate the seams separately, in elapsed time rather than effort — because the question at a seam isn&amp;rsquo;t &amp;ldquo;how long will this take to do&amp;rdquo; but &amp;ldquo;how long will this take to happen.&amp;rdquo;&lt;/p>
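&lt;p>One way to keep the two kinds of estimate apart is to total them separately. The following is a minimal Python sketch, assuming the work runs strictly in sequence. The class names and every number in it are made up for illustration, not taken from a real project.&lt;/p>

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    effort_days: float  # grounded in effort history for the pattern


@dataclass
class Seam:
    name: str
    elapsed_days: float  # waiting time at the boundary, not work


def plan_totals(patterns: list[Pattern], seams: list[Seam]) -> tuple[float, float]:
    """Return (effort days, calendar days) for strictly sequential work."""
    effort = sum(p.effort_days for p in patterns)
    waiting = sum(s.elapsed_days for s in seams)
    return effort, effort + waiting


if __name__ == "__main__":
    # Illustrative figures only.
    patterns = [
        Pattern("direct file ingestion", 3.0),
        Pattern("reference table lookup", 4.0),
        Pattern("aggregate table delivery", 2.0),
    ]
    seams = [
        Seam("ingestion to transformation review", 2.0),
        Seam("delivery sign-off", 3.0),
    ]
    effort, calendar = plan_totals(patterns, seams)
    print(f"effort: {effort} days, calendar: {calendar} days")
```

&lt;p>Keeping the two totals side by side makes the gap between effort and calendar time visible in the plan itself, rather than at the deadline.&lt;/p>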
&lt;p>Stakeholders aren&amp;rsquo;t usually uncomfortable with uncertainty. They&amp;rsquo;re uncomfortable with surprises. Pattern banking reduces surprises by making the known quantities explicit — which in turn makes the uncertain parts easier to name, plan around, and manage.&lt;/p>
&lt;hr>
&lt;h3 id="from-documentation-to-machinery">From documentation to machinery&lt;/h3>
&lt;/br>
&lt;p>Remember the opening of this article: a team with two sensible base templates, and twenty-odd implementations that had drifted from them in ways nobody could track. Here&amp;rsquo;s the uncomfortable implication for everything I&amp;rsquo;ve said so far: a documented pattern bank is still just documentation, and documentation drifts. The patterns describe what the team agreed to do; nothing stops the codebase from quietly doing something else, one reasonable-seeming exception at a time. The bank I&amp;rsquo;ve described prevents the team from &lt;em>forgetting&lt;/em> the patterns. It doesn&amp;rsquo;t prevent them from &lt;em>departing&lt;/em> from them.&lt;/p>
&lt;p>So it&amp;rsquo;s worth knowing that a pattern bank has a maturity path, and documentation is the middle of it, not the end.&lt;/p>
&lt;p>&lt;img src="https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/images/pattern-bank-maturity.svg" alt="A pattern bank matures from vocabulary to documentation to machinery">&lt;/p>
&lt;p>The first stage is vocabulary: the team has names for the work, spoken but not written. This is further than most teams ever get, and it&amp;rsquo;s where the adoption battle is won. The second stage is documentation: the one-page entries, the seam map, the effort signals. The third stage is machinery: the pattern&amp;rsquo;s invariants are enforced by your delivery tooling, so that violating them doesn&amp;rsquo;t produce a governance conversation — it produces a failed build.&lt;/p>
&lt;p>Not everything in a pattern can or should be mechanised. But some pattern elements are invariants — things that must hold for the platform to stay trustworthy — and invariants are exactly the things that drift erodes first. A sequencing rule is a good example. We had one that said, in effect, data ships before code: the ingestion side of a new source goes to production first, and transformation work doesn&amp;rsquo;t start until the data it depends on actually exists where production builds can see it. As documentation, that rule held about as well as documented rules ever do. Then we wired it into CI — the transformation build simply fails if the upstream data isn&amp;rsquo;t there — and the rule stopped being a rule. It became a property of the system. The pattern cannot drift on that invariant, because the machinery won&amp;rsquo;t let it.&lt;/p>
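&lt;p>A gate of this kind can be very small. The sketch below is a hypothetical stand-in written in Python, using the standard-library &lt;code>sqlite3&lt;/code> module in place of a real warehouse connection. The table name, the function names and the exit-code convention are assumptions for illustration, not the setup described above.&lt;/p>

```python
import sqlite3


def upstream_ready(conn: sqlite3.Connection, table: str) -> bool:
    """True only if the upstream table exists and holds at least one row."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return False
    # The name was validated above, so interpolating it here is acceptable.
    has_rows = conn.execute(f'SELECT 1 FROM "{table}" LIMIT 1').fetchone()
    return has_rows is not None


def gate(conn: sqlite3.Connection, table: str) -> int:
    """Exit code for CI: 0 lets the transformation build proceed, 1 fails it."""
    if upstream_ready(conn, table):
        return 0
    print(f"Upstream data missing: {table}. Ship the ingestion first.")
    return 1


if __name__ == "__main__":
    # In-memory database standing in for production; illustrative only.
    conn = sqlite3.connect(":memory:")
    raise SystemExit(gate(conn, "raw_vendor_transactions"))
```

&lt;p>A CI step would run a check like this before the transformation build and stop on a non-zero exit code. The point of the sketch is the shape: the rule is a precondition the build evaluates, not a sentence someone has to remember.&lt;/p>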
&lt;p>Encoding invariants has a sharp corollary that I didn&amp;rsquo;t anticipate: optional elements and mechanical gates don&amp;rsquo;t mix. We&amp;rsquo;d had data contracts in our patterns as &amp;ldquo;preferred but optional&amp;rdquo; — encouraged, written up, mostly adopted. The moment we wanted CI to validate new sources against their contract, &amp;ldquo;optional&amp;rdquo; stopped being coherent. A gate can&amp;rsquo;t check a thing that may or may not exist. So the gate forced the decision we&amp;rsquo;d been politely deferring: contracts became required for new sources, with existing ones grandfathered. That&amp;rsquo;s the general lesson — every time you automate the enforcement of a pattern element, you&amp;rsquo;ll be forced to decide whether the things it depends on are truly required. The machinery is honest in a way the documentation never has to be.&lt;/p>
&lt;p>Don&amp;rsquo;t read this as &amp;ldquo;mechanise everything.&amp;rdquo; Pick the invariants whose violation is expensive — the raw layer staying raw, the sequencing between stages, the contract a source promised — and let the rest stay as guidance. The thirty-second test still applies to the documentation; the machinery is just there to hold the lines that matter most while humans exercise judgement on everything else.&lt;/p>
&lt;hr>
&lt;h3 id="a-fourth-job-scoping-change">A fourth job: scoping change&lt;/h3>
&lt;/br>
&lt;p>I&amp;rsquo;ve described three jobs for a pattern bank so far: a shared design vocabulary, grounded estimation, and legibility to the people around the team. There&amp;rsquo;s a fourth, and I only discovered it later, when the bank was already in place and we needed it for something it was never designed to do.&lt;/p>
&lt;p>We were proposing a significant change to how the team delivered — the kind of change that touches process people have used for years, with passionate, technically senior advocates on multiple sides. The instinctive reaction to a proposal like that is that &lt;em>everything&lt;/em> is changing: the way I build, the way I test, the way my work gets to production, the way I&amp;rsquo;m on the hook when it breaks. When change feels total, people defend totally.&lt;/p>
&lt;p>The pattern bank changed the shape of that conversation entirely. Because the team&amp;rsquo;s whole way of working was laid out as a catalogue of named patterns, we could put the proposal against it and show precisely where the change landed: of the entire bank, exactly one pattern&amp;rsquo;s promotion steps were being rewritten, plus one new gate at a seam between two stages. Every ingestion pattern: untouched. Every delivery pattern: untouched. The transformation logic itself: untouched. We could literally point at the diagram and say — the workshop is deciding the contents of one box.&lt;/p>
&lt;p>&lt;img src="https://ghostinthedata.info/posts/2026/2026-06-27-pattern-bank/images/pattern-bank-change-scope.svg" alt="Scoping a process change against the pattern bank: one box changes, everything else is untouched">&lt;/p>
&lt;p>&lt;em>The most useful slide in the whole proposal wasn&amp;rsquo;t about the change. It was about everything that wasn&amp;rsquo;t changing.&lt;/em>&lt;/p>
&lt;p>That framing did more to lower the temperature than any argument about the merits of the change itself. Not because it dodged the hard conversation — the contents of that one box still had to be debated, properly and at length — but because it bounded the conversation. People could engage with the actual proposal instead of defending against an imagined one.&lt;/p>
&lt;p>This is, I think, a structural property rather than a one-off trick. Without a pattern bank, a process change has no edges: nobody can say with confidence what it touches and what it doesn&amp;rsquo;t, so everyone reasonably assumes it touches them. With a pattern bank, a change is a diff. You can enumerate exactly which entries are modified, which seams gain or lose a gate, and which patterns are provably unaffected. The same catalogue that scopes new work also scopes changes to how the work gets done.&lt;/p>
&lt;p>If you&amp;rsquo;re a lead who expects to steer your team through a significant shift in the next year or two — a new deployment model, a platform migration, a rework of how releases happen — that alone might justify building the bank now. It&amp;rsquo;s much easier to show people that only one box is changing if the boxes already exist.&lt;/p>
&lt;hr>
&lt;h3 id="keeping-it-alive-without-making-it-a-burden">Keeping it alive without making it a burden&lt;/h3>
&lt;/br>
&lt;p>The biggest risk to a pattern bank isn&amp;rsquo;t that it will be rejected. It&amp;rsquo;s that it will be adopted once and then quietly ignored as the team moves on to delivery pressures.&lt;/p>
&lt;p>The solution is governance that&amp;rsquo;s light enough to actually happen.&lt;/p>
&lt;p>Pattern review doesn&amp;rsquo;t need its own meeting. Fold it into your existing team rhythms: a retrospective, a sprint review, a regular team catch-up. Set aside fifteen minutes every month or two to ask two questions. Have any of our recently built solutions introduced a new pattern we should document? Has anything we&amp;rsquo;ve built over the past quarter shown that an existing pattern needs refinement?&lt;/p>
&lt;p>The trigger for adding a new pattern remains the same: you&amp;rsquo;ve built the same thing three times. Before that, document it as a one-off or a variant. After that, give it its own entry. The three-instance rule keeps the bank from growing with speculative patterns that haven&amp;rsquo;t proven their recurring value.&lt;/p>
&lt;p>Retiring patterns is just as important as adding them. A pattern that hasn&amp;rsquo;t been used in twelve months is probably no longer part of how your team works. Don&amp;rsquo;t delete it — move it to a legacy section. Old pipelines built on deprecated patterns still exist, and future engineers will want context when they encounter them.&lt;/p>
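&lt;p>The two thresholds above, three builds to earn an entry and twelve months unused to move to legacy, are simple enough to express as a check. This is a small Python sketch under assumptions: the function name and status labels are hypothetical, and twelve months is approximated as 365 days.&lt;/p>

```python
from datetime import date, timedelta


def bank_status(build_dates: list[date], today: date) -> str:
    """Classify a candidate pattern from the dates it was built."""
    if not build_dates:
        return "not built yet"
    if 3 > len(build_dates):
        # Fewer than three builds: record it, but not as its own entry.
        return "one-off or variant"
    if today - max(build_dates) > timedelta(days=365):
        # Unused for roughly twelve months: keep it, but in the legacy section.
        return "legacy"
    return "active pattern"


if __name__ == "__main__":
    # Illustrative dates only.
    builds = [date(2025, 1, 1), date(2025, 3, 1), date(2025, 6, 1)]
    print(bank_status(builds, today=date(2026, 1, 1)))
```

&lt;p>In practice the inputs would come from wherever the team already records effort notes. The check is only a prompt for the fifteen-minute review, not a replacement for it.&lt;/p>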
&lt;p>The most common way a pattern bank dies is through drift: the documented patterns diverge from what the team actually does, and nobody updates the documentation because the overhead isn&amp;rsquo;t worth it. The way to avoid this is to make update rituals small. One sentence changed, one variant added, one effort note updated after a project completes. It takes five minutes if it&amp;rsquo;s part of the project close-out conversation. It takes months if it&amp;rsquo;s treated as a separate documentation effort.&lt;/p>
&lt;p>One small artefact worth keeping alongside the bank: a glossary. A dozen or so terms of art, one line each — what &amp;ldquo;landing&amp;rdquo; means here, what &amp;ldquo;staging&amp;rdquo; means here, what &amp;ldquo;the contract&amp;rdquo; refers to. It sounds trivial until you&amp;rsquo;ve watched two documents written three months apart quietly disagree about what a word means, or a design review burn twenty minutes discovering that two people were using &amp;ldquo;raw&amp;rdquo; differently. The patterns give the team nouns for solutions; the glossary keeps the rest of the vocabulary honest. It costs a page, and it makes everything else the team writes — pattern entries, design docs, handover notes — interoperable.&lt;/p>
&lt;p>One practical approach: put the pattern bank in a place the team already touches regularly. A Teams wiki, a Confluence space, a shared document with a clear structure — wherever the team already goes for reference material. Don&amp;rsquo;t create a dedicated system that requires a new habit to use. The path of least resistance should lead to the pattern bank, not away from it.&lt;/p>
&lt;hr>
&lt;h3 id="what-it-looks-like-when-its-working">What it looks like when it&amp;rsquo;s working&lt;/h3>
&lt;/br>
&lt;p>The sign that a pattern bank has taken hold isn&amp;rsquo;t that engineers consult it before starting every project. It&amp;rsquo;s that the vocabulary from it enters the team&amp;rsquo;s natural speech.&lt;/p>
&lt;p>You&amp;rsquo;ll hear it in design conversations: &amp;ldquo;this looks like an SCD2 load with a late-arriving records wrinkle&amp;rdquo; rather than &amp;ldquo;so this is kind of like what we did for System B but slightly different.&amp;rdquo; You&amp;rsquo;ll hear it in planning: &amp;ldquo;the ingestion here is a known pattern, it&amp;rsquo;s the delivery piece that&amp;rsquo;s new territory.&amp;rdquo; You&amp;rsquo;ll hear it at the seams: &amp;ldquo;what&amp;rsquo;s in the handover pack for this one — could someone pick it up tomorrow without talking to you?&amp;rdquo; You&amp;rsquo;ll hear it when a new team member asks how things work, and an experienced engineer can answer them in fifteen minutes using pattern language rather than two weeks of reading old code.&lt;/p>
&lt;p>You&amp;rsquo;ll also hear it in conversations with people outside the data team — architects, project managers, internal customers — who start to understand the platform not as a black box but as a system with legible parts. That transparency matters. When the architects designing how source systems should serve data understand what an ingestion pattern expects, they make better design decisions. When an internal customer understands the difference between a project that uses known patterns and one that introduces a new one, they understand why the effort estimates are different.&lt;/p>
&lt;p>The pattern bank doesn&amp;rsquo;t make the work easier. It makes the work legible. And when the work is legible — to the engineers doing it, to the people managing it, and to the people upstream of it — everything else gets a little simpler.&lt;/p>
&lt;p>Going back to that organisation where I started: the ingestion layer that had twenty different approaches to twenty similar problems didn&amp;rsquo;t get rebuilt. That wasn&amp;rsquo;t the point. But over time, as new source systems were onboarded, they were designed against the patterns the team had agreed on. The sprawl stopped sprawling. The conversations about effort estimates became more grounded. The invariants that mattered most stopped relying on memory and started failing builds instead. And when the day came that we needed to change how the team delivered — not just what it delivered — the bank turned a frightening proposal into a bounded one.&lt;/p>
&lt;p>When a new engineer joined the team, there was something to hand them that explained not just what the platform did, but why things were built the way they were.&lt;/p>
&lt;p>That&amp;rsquo;s what the pattern bank was, at its core. Not a governance document. Not a bureaucratic exercise. A way of saying: here is what we know, here is how we think about this, and here is how we&amp;rsquo;ve agreed to do it together.&lt;/p>
&lt;p>Start there. Build it with your team. Refine it every time you learn something new.&lt;/p>
&lt;p>&lt;/br>&lt;/br>&lt;/p></content:encoded><category>Data Engineering</category><category>Leadership</category><category>Career Development</category><category>Pattern Bank</category><category>Team Culture</category><category>Project Management</category><category>Solution Design</category><category>Engineering Leadership</category><category>Estimation</category></item></channel></rss>