Ghost in the data
Continuous Integration for Data Teams: Beyond the Buzzwords

The Day Everything Broke (And How CI Could Have Saved Us)

Picture this: It’s 9 AM on a Monday, and your Slack is exploding. The executive dashboard is showing impossible numbers. Customer support is fielding complaints about incorrect billing amounts. The marketing team is questioning why their conversion metrics suddenly dropped to zero. You trace it back to a seemingly innocent change you merged Friday afternoon: a simple column rename that seemed harmless enough. But that “harmless” change cascaded through your entire data pipeline, breaking downstream models, dashboards, and automated reports.
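To make the column-rename scenario concrete, here is a minimal sketch of the kind of check a CI job could run before a merge. It is not the post's implementation: the function name, the example models, and the column names are all hypothetical, and a real pipeline would more likely rely on a SQL parser or the project's lineage metadata than on a regex.

```python
import re


def find_broken_references(old_columns, new_columns, downstream_sql):
    """Return {model_name: [removed columns the model still references]}.

    Hypothetical helper: inputs are plain lists and strings, not a real
    catalog or manifest.
    """
    removed = set(old_columns) - set(new_columns)
    broken = {}
    for model, sql in downstream_sql.items():
        # Whole-word match so "amount" does not match "billing_amount".
        hits = sorted(
            col for col in removed
            if re.search(rf"\b{re.escape(col)}\b", sql)
        )
        if hits:
            broken[model] = hits
    return broken


# Assumed example: "amount" was renamed to "billing_amount" upstream.
old = ["order_id", "amount"]
new = ["order_id", "billing_amount"]
models = {
    "revenue_dashboard": "SELECT SUM(amount) FROM orders",
    "order_counts": "SELECT COUNT(order_id) FROM orders",
}
print(find_broken_references(old, new, models))
# → {'revenue_dashboard': ['amount']}
```

A CI step could fail the build whenever the returned dictionary is non-empty, so the rename is caught on the pull request rather than on Monday morning.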

  • ContinuousIntegration
  • DataQuality
  • dbt
  • DevOps
  • DataEngineering
  • GitHub
  • Datafold
  • DataValidation
Saturday, June 28, 2025
dbt Fusion: The Engine Upgrade That's Got Everyone Talking

When Your Favorite Tool Gets a Makeover

You know that feeling when your favorite app suddenly changes its interface? That mix of excitement and anxiety about whether the changes will actually improve your workflow or just mess with muscle memory you’ve spent years building. That’s exactly what happened when dbt Labs dropped dbt Fusion on the analytics engineering community. The reactions were… let’s call them passionate. Some folks were celebrating like they’d just discovered fire, while others were questioning whether this marked the beginning of the end for open-source dbt.

  • dbt
  • DataEngineering
  • AnalyticsEngineering
  • OpenSource
  • DataTools
  • SQL
  • DataModeling
Saturday, June 21, 2025
AI Simplified: Understanding LLMs, Workflows, and Agents

AI Buzzwords Demystified

If you’ve been following AI developments lately, you’ve probably encountered terms like LLMs, RAG, ReAct, and AI Agents. While these technologies are transforming how we interact with AI, the terminology can be overwhelming. In this post, I’ll break down these concepts into digestible explanations with practical examples. Let’s start with the foundation and progressively build up to more complex systems.

Large Language Models (LLMs): The Foundation

At the core of today’s AI revolution are Large Language Models (LLMs). Popular applications such as ChatGPT and Claude are built on top of these powerful models. They excel at generating and manipulating text based on the prompts we provide.

  • AI Concepts
  • ChatGPT
  • Claude
  • LLM Interaction
  • AI Workflows
  • Language Models
  • RAG
  • AI Agents
Saturday, May 24, 2025
Streamlining Data Pipeline Reliability: The Write-Audit-Publish Pattern

Introduction: Why Safe Data Pipelines Matter

In the world of data engineering, there’s a constant challenge we all face: how do we ensure our production data remains reliable and error-free when deploying updates? Anyone who’s experienced the cold sweat of a bad deployment affecting critical business data knows this pain all too well. Enter the Write-Audit-Publish pattern, a robust approach that can significantly reduce the risk of data pipeline failures. This pattern, which shares DNA with the well-known Blue-Green deployment strategy from software engineering, creates a safety net that can save your team countless hours of troubleshooting and emergency fixes.
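As a rough illustration of the three steps the pattern's name describes, the sketch below writes new rows to a staging table, audits them, and only then publishes them by swapping the staging table into place. It uses an in-memory SQLite database purely for demonstration; the table names, columns, and audit rules are assumptions, not the post's Airflow implementation.

```python
import sqlite3


def write_audit_publish(conn, rows):
    """Stage rows, audit them, and publish only if every check passes."""
    cur = conn.cursor()

    # Write: land new data in a staging table, never in production directly.
    cur.execute("DROP TABLE IF EXISTS orders_staging")
    cur.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)

    # Audit: run data quality checks against the staged data only.
    total = cur.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
    null_ids = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE order_id IS NULL"
    ).fetchone()[0]
    negative = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"
    ).fetchone()[0]
    if total == 0 or null_ids or negative:
        conn.rollback()  # discard the staged rows; production is untouched
        return False

    # Publish: replace the production table with the audited staging table.
    cur.execute("DROP TABLE IF EXISTS orders")
    cur.execute("ALTER TABLE orders_staging RENAME TO orders")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
print(write_audit_publish(conn, [(1, 10.0), (2, 5.5)]))  # good batch
print(write_audit_publish(conn, [(3, -1.0)]))            # fails the audit
```

The second batch fails the audit, so the rows published by the first batch stay exactly as they were. This is the same idea as Blue-Green deployment: consumers only ever see data that has already passed its checks.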

  • Write-Audit-Publish
  • WAP Pattern
  • Airflow
  • Data Reliability
  • Blue-Green Deployment
  • Data Quality
  • Python
Sunday, May 18, 2025
The Art and Science of Conceptual Data Modeling: Building Pipelines That Last

Introduction: Why Conceptual Data Modeling Makes or Breaks Your Pipeline

Ever found yourself staring at a faulty data pipeline, wondering where it all went wrong? Join the club. I’ve been there too many times to count. The hard truth? Most pipeline failures aren’t technical issues; they’re conceptual ones. We get so caught up in the how (tools, languages, frameworks) that we completely miss the what and why of our data needs.

  • ConceptualDataModeling
  • DataEngineering
  • StakeholderManagement
  • EmpatheticDesign
  • DataPipelines
  • RequirementGathering
Saturday, May 17, 2025
The One Simple Secret to Effective AI Prompting

Forget Everything You’ve Learned About AI Prompting

There’s a sea of articles out there about “how to talk to AI” or “the perfect prompt structure.” Frameworks, formulas, special keywords: it can get overwhelming. But what if I told you that you could forget all of it? That’s right. All those complicated prompting techniques might be unnecessary, because there’s one fundamental principle that works better than anything else: AI excels at roleplaying.

  • AI Prompting
  • ChatGPT
  • Claude
  • LLM Interaction
  • AI Communication
  • Language Models
  • Roleplaying
  • AI Productivity
Sunday, May 11, 2025
Mastering One-on-One Meetings: Building Trust and Driving Growth

Introduction

Have you ever felt that slight relief when your manager cancels your 1:1 meeting? Early in my career as a data professional, I viewed 1:1s as just another checkbox on my calendar, often treating them like mini-standups where I’d rattle off project updates before awkwardly waiting for the meeting to end. Looking back, I realize how much potential growth I left on the table. As I progressed from an individual contributor to leading a team, I’ve learned that 1:1 meetings aren’t administrative burdens; they’re golden opportunities for trust-building, relationship development, and strategic alignment that many of us simply don’t know how to leverage.

  • One-on-One Meetings
  • Management
  • Emotional Intelligence
  • Trust Building
  • Workplace Communication
  • Professional Development
  • Feedback
  • Mentorship
Saturday, March 22, 2025
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction

In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee and one (or a couple) of your favorite biscuits. One use case for LLMs is creating an agent: a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nitpicky Data Engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025
Leveraging LLMs for Business Impact: Part 1 - Theory and Foundations

Introduction

In today’s rapidly evolving technological landscape, Large Language Models (LLMs) have emerged as transformative tools with the potential to revolutionize business operations across industries. While the hype around these technologies is intense, understanding their practical applications and underlying mechanisms is crucial for organizations seeking to leverage them effectively. This two-part series aims to demystify LLMs and their associated technologies, starting with the theoretical foundations in Part 1, followed by a hands-on implementation guide using AWS services in Part 2.

  • LLM
  • RAG
  • Vector Databases
  • AI Business Applications
  • Data Architecture
Friday, March 7, 2025
From Senior to Staff: Navigating the Data Engineering Leadership Path

Introduction: The Critical Inflection Point

The transition from Senior to Staff Engineer represents a pivotal moment in any technical career path. It’s the point where your impact extends beyond your code and transforms into something much more profound: true technical leadership. While this shift can feel daunting, it also opens doors to some of the most rewarding work of your career. The beautiful thing about the engineering career ladder is that it uniquely allows for advancement without stepping away from the technical work that many of us love.

  • Staff Engineer
  • Career Growth
  • Technical Leadership
  • Chapter Lead
  • Data Leadership
  • Engineering Career
  • Promotion
Sunday, March 2, 2025
Your First 90 Days as a Data Engineer: A Strategic Guide

Introduction

Landing your first data engineering role, or starting at a new company, is both exhilarating and daunting. After navigating multiple interviews and accepting an offer, you’ve finally arrived at your desk with a new laptop and company swag (if you’re lucky). Even now, after solving countless problems ranging from minor bugs to enterprise-scale data challenges, I still occasionally feel that flutter of uncertainty in my stomach when starting a new role. What if I don’t know what I’m doing? What if I make a mistake?

  • Onboarding
  • Professional Growth
  • Team Collaboration
  • Career Advice
  • Data Culture
Sunday, February 23, 2025
Mastering Data Interviews: A Comprehensive Guide

Introduction

After nearly two decades in the data engineering field, I’ve sat on both sides of the interview table countless times. Whether you’re a seasoned professional looking to change roles or a newcomer trying to break into the field, the interview process for data engineering positions can be both challenging and mysterious. There’s often uncertainty about what questions you’ll face, what skills you need to demonstrate, and what interviewers are really looking for beneath the surface.

  • Interviews
  • Technical Assessment
  • Career Growth
  • SQL
  • Data Modeling
  • Problem Solving
Saturday, February 22, 2025