Ghost in the data
Continuous Integration for Data Teams: Beyond the Buzzwords

The Day Everything Broke (And How CI Could Have Saved Us)

Picture this: It’s 9 AM on a Monday, and your Slack is exploding. The executive dashboard is showing impossible numbers. Customer support is fielding complaints about incorrect billing amounts. The marketing team is questioning why their conversion metrics suddenly dropped to zero. You trace it back to a seemingly innocent change you merged Friday afternoon: a simple column rename that seemed harmless enough. But that “harmless” change cascaded through your entire data pipeline, breaking downstream models, dashboards, and automated reports.

  • ContinuousIntegration
  • DataQuality
  • dbt
  • DevOps
  • DataEngineering
  • GitHub
  • Datafold
  • DataValidation
Saturday, June 28, 2025
dbt Fusion: The Engine Upgrade That's Got Everyone Talking

When Your Favorite Tool Gets a Makeover

You know that feeling when your favorite app suddenly changes its interface? That mix of excitement and anxiety about whether the changes will actually improve your workflow or just mess with muscle memory you’ve spent years building. That’s exactly what happened when dbt Labs dropped dbt Fusion on the analytics engineering community. The reactions were… let’s call them passionate. Some folks were celebrating like they’d just discovered fire, while others were questioning whether this marked the beginning of the end for open-source dbt.

  • dbt
  • DataEngineering
  • AnalyticsEngineering
  • OpenSource
  • DataTools
  • SQL
  • DataModeling
Saturday, June 21, 2025
Streamlining Data Pipeline Reliability: The Write-Audit-Publish Pattern

Introduction: Why Safe Data Pipelines Matter

In the world of data engineering, there’s a constant challenge we all face: how do we ensure our production data remains reliable and error-free when deploying updates? Anyone who’s experienced the cold sweat of a bad deployment affecting critical business data knows this pain all too well. Enter the Write-Audit-Publish pattern, a robust approach that can significantly reduce the risk of data pipeline failures. This pattern, which shares DNA with the well-known Blue-Green deployment strategy from software engineering, creates a safety net that can save your team countless hours of troubleshooting and emergency fixes.

  • Write-Audit-Publish
  • WAP Pattern
  • Airflow
  • Data Reliability
  • Blue-Green Deployment
  • Data Quality
  • Python
Sunday, May 18, 2025
The Art and Science of Conceptual Data Modeling: Building Pipelines That Last

Introduction: Why Conceptual Data Modeling Makes or Breaks Your Pipeline

Ever found yourself staring at a faulty data pipeline, wondering where it all went wrong? Join the club. I’ve been there too many times to count. The hard truth? Most pipeline failures aren’t technical issues; they’re conceptual ones. We get so caught up in the how (tools, languages, frameworks) that we completely miss the what and why of our data needs.

  • ConceptualDataModeling
  • DataEngineering
  • StakeholderManagement
  • EmpatheticDesign
  • DataPipelines
  • RequirementGathering
Saturday, May 17, 2025
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction

In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee and one (or a couple) of your favorite biscuits. One use case for LLMs is creating an agent: a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nitpicky Data Engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025
Leveraging LLMs for Business Impact: Part 1 - Theory and Foundations

Introduction

In today’s rapidly evolving technological landscape, Large Language Models (LLMs) have emerged as transformative tools with the potential to revolutionize business operations across industries. While the hype around these technologies is intense, understanding their practical applications and underlying mechanisms is crucial for organizations seeking to leverage them effectively. This two-part series aims to demystify LLMs and their associated technologies, starting with the theoretical foundations in Part 1, followed by a hands-on implementation guide using AWS services in Part 2.

  • LLM
  • RAG
  • Vector Databases
  • AI Business Applications
  • Data Architecture
Friday, March 7, 2025
From Senior to Staff: Navigating the Data Engineering Leadership Path

Introduction: The Critical Inflection Point

The transition from Senior to Staff Engineer represents a pivotal moment in any technical career path. It’s the point where your impact extends beyond your code and transforms into something much more profound: true technical leadership. While this shift can feel daunting, it also opens doors to some of the most rewarding work of your career. The beautiful thing about the engineering career ladder is that it uniquely allows for advancement without stepping away from the technical work that many of us love.

  • Staff Engineer
  • Career Growth
  • Technical Leadership
  • Chapter Lead
  • Data Leadership
  • Engineering Career
  • Promotion
Sunday, March 2, 2025
Your First 90 Days as a Data Engineer: A Strategic Guide

Introduction

Landing your first data engineering role, or starting at a new company, is both exhilarating and daunting. After navigating multiple interviews and accepting an offer, you’ve finally arrived at your desk with a new laptop and company swag (if you’re lucky). Even now, after solving countless problems ranging from minor bugs to enterprise-scale data challenges, I still occasionally feel that flutter of uncertainty in my stomach when starting a new role. What if I don’t know what I’m doing? What if I make a mistake?

  • Onboarding
  • Professional Growth
  • Team Collaboration
  • Career Advice
  • Data Culture
Sunday, February 23, 2025
Mastering Data Interviews: A Comprehensive Guide

Introduction

After nearly two decades in the data engineering field, I’ve sat on both sides of the interview table countless times. Whether you’re a seasoned professional looking to change roles or a newcomer trying to break into the field, the interview process for data engineering positions can be both challenging and mysterious. There’s often uncertainty about what questions you’ll face, what skills you need to demonstrate, and what interviewers are really looking for beneath the surface.

  • Interviews
  • Technical Assessment
  • Career Growth
  • SQL
  • Data Modeling
  • Problem Solving
Saturday, February 22, 2025
Maximizing Data Impact: A Guide to Effective Data Engineering

Introduction

Creating impact goes far beyond writing efficient code or building robust pipelines. It’s about understanding how your work translates into tangible value for stakeholders across the organization.

Types of Impact

Our work forms the backbone of data-driven decision making in organizations. However, measuring and communicating this impact isn’t always straightforward. If you feel your work isn’t making a meaningful difference, it might be time to pivot your focus or approach. Understanding the various ways we create value helps guide these decisions and ensures we’re contributing in ways that matter.

  • Data Impact
  • Visualization
  • Stakeholder Management
  • Team Enablement
  • Data Quality
Saturday, February 15, 2025
Data Modeling Showdown: Kimball vs One Big Table vs Relational

Introduction

When architecting a data warehouse, one of the most crucial decisions is choosing the right data modeling approach. Like selecting the right tool for a job, each modeling methodology has its strengths and ideal use cases. Today, we’ll explore three popular approaches: Kimball’s dimensional modeling (star schema), the one big table approach, and traditional relational modeling.

The Dataset: Understanding Our Example

To illustrate these approaches, let’s consider a retail sales system with these core components:

  • Data Warehouse
  • SQL
  • Star Schema
  • Database Design
  • Performance Optimization
Saturday, January 25, 2025
Data Industry Trends: What to Expect in 2025

Introduction

The data industry has kicked off 2025 with transformative developments that are fundamentally reshaping our approach to data management and analytics. The landscape is witnessing seismic shifts - from Databricks’ historic funding round to Boomi’s strategic acquisition of Rivery, and the industry-shaking Iceberg buyout. Yet amid this technological evolution, a critical question emerges: how will these advancements translate into tangible value for organizations? As we navigate through this dynamic environment, the focus extends beyond identifying dominant technologies to understanding their practical impact on business outcomes. Let’s explore the key trends that are defining the data world in 2025, and more importantly, how they’re reshaping the way organizations leverage their data assets.

  • Industry Trends
  • Apache Iceberg
  • AI
  • Data Solutions
  • SQL
  • Data Governance
Saturday, January 18, 2025