Ghost in the data
AI Simplified: Understanding LLMs, Workflows, and Agents

AI Buzzwords Demystified
If you’ve been following AI developments lately, you’ve probably encountered terms like LLMs, RAG, ReAct, and AI Agents. While these technologies are transforming how we interact with AI, the terminology can be overwhelming. In this post, I’ll break down these concepts into digestible explanations with practical examples. Let’s start with the foundation and progressively build up to more complex systems.

Large Language Models (LLMs): The Foundation
At the core of today’s AI revolution are Large Language Models (LLMs). Popular applications such as ChatGPT and Claude are built on top of these powerful models. They excel at generating and manipulating text based on the prompts we provide.

  • AI Concepts
  • ChatGPT
  • Claude
  • LLM Interaction
  • AI Workflows
  • Language Models
  • RAG
  • AI Agents
Saturday, May 24, 2025
Streamlining Data Pipeline Reliability: The Write-Audit-Publish Pattern

Introduction: Why Safe Data Pipelines Matter
In the world of data engineering, there’s a constant challenge we all face: how do we ensure our production data remains reliable and error-free when deploying updates? Anyone who’s experienced the cold sweat of a bad deployment affecting critical business data knows this pain all too well. Enter the Write-Audit-Publish pattern, a robust approach that can significantly reduce the risk of data pipeline failures. This pattern, which shares DNA with the well-known Blue-Green deployment strategy from software engineering, creates a safety net that can save your team countless hours of troubleshooting and emergency fixes.

  • Write-Audit-Publish
  • WAP Pattern
  • Airflow
  • Data Reliability
  • Blue-Green Deployment
  • Data Quality
  • Python
Sunday, May 18, 2025
The Art and Science of Conceptual Data Modeling: Building Pipelines That Last

Introduction: Why Conceptual Data Modeling Makes or Breaks Your Pipeline
Ever found yourself staring at a faulty data pipeline, wondering where it all went wrong? Join the club. I’ve been there too many times to count. The hard truth? Most pipeline failures aren’t technical issues; they’re conceptual ones. We get so caught up in the how (tools, languages, frameworks) that we completely miss the what and why of our data needs.

  • ConceptualDataModeling
  • DataEngineering
  • StakeholderManagement
  • EmpatheticDesign
  • DataPipelines
  • RequirementGathering
Saturday, May 17, 2025
The One Simple Secret to Effective AI Prompting

Forget Everything You’ve Learned About AI Prompting
There’s a sea of articles out there about “how to talk to AI” or “the perfect prompt structure.” Frameworks, formulas, special keywords: it can get overwhelming. But what if I told you that you could forget all of it? That’s right. All those complicated prompting techniques might be unnecessary, because there’s one fundamental principle that works better than anything else: AI excels at roleplaying.

  • AI Prompting
  • ChatGPT
  • Claude
  • LLM Interaction
  • AI Communication
  • Language Models
  • Roleplaying
  • AI Productivity
Sunday, May 11, 2025
Mastering One-on-One Meetings: Building Trust and Driving Growth

Introduction
Have you ever felt that slight relief when your manager cancels your 1:1 meeting? Early in my career as a data professional, I viewed 1:1s as just another checkbox on my calendar, often treating them like mini-standups where I’d rattle off project updates before awkwardly waiting for the meeting to end. Looking back, I realize how much potential growth I left on the table. As I progressed from an individual contributor to leading a team, I learned that 1:1 meetings aren’t administrative burdens; they’re golden opportunities for trust-building, relationship development, and strategic alignment that many of us simply don’t know how to leverage.

  • One-on-One Meetings
  • Management
  • Emotional Intelligence
  • Trust Building
  • Workplace Communication
  • Professional Development
  • Feedback
  • Mentorship
Saturday, March 22, 2025
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction
In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee and one (or a couple) of your favorite biscuits. One use case for LLMs is creating an Agent: a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nitpicky Data Engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025
Leveraging LLMs for Business Impact: Part 1 - Theory and Foundations

Introduction
In today’s rapidly evolving technological landscape, Large Language Models (LLMs) have emerged as transformative tools with the potential to revolutionize business operations across industries. While the hype around these technologies is intense, understanding their practical applications and underlying mechanisms is crucial for organizations seeking to leverage them effectively. This two-part series aims to demystify LLMs and their associated technologies, starting with the theoretical foundations in Part 1, followed by a hands-on implementation guide using AWS services in Part 2.

  • LLM
  • RAG
  • Vector Databases
  • AI Business Applications
  • Data Architecture
Friday, March 7, 2025
From Senior to Staff: Navigating the Data Engineering Leadership Path

Introduction: The Critical Inflection Point
The transition from Senior to Staff Engineer represents a pivotal moment in any technical career path. It’s the point where your impact extends beyond your code and transforms into something much more profound – true technical leadership. While this shift can feel daunting, it also opens doors to some of the most rewarding work of your career. The beautiful thing about the engineering career ladder is that it uniquely allows for advancement without stepping away from the technical work that many of us love.

  • Staff Engineer
  • Career Growth
  • Technical Leadership
  • Chapter Lead
  • Data Leadership
  • Engineering Career
  • Promotion
Sunday, March 2, 2025
Your First 90 Days as a Data Engineer: A Strategic Guide

Introduction
Landing your first data engineering role, or starting at a new company, is both exhilarating and daunting. After navigating multiple interviews and accepting an offer, you’ve finally arrived at your desk with a new laptop and company swag (if you’re lucky). Even now, after solving countless problems ranging from minor bugs to enterprise-scale data challenges, I still occasionally feel that flutter of uncertainty in my stomach when starting a new role. What if I don’t know what I’m doing? What if I make a mistake?

  • Onboarding
  • Professional Growth
  • Team Collaboration
  • Career Advice
  • Data Culture
Sunday, February 23, 2025
Mastering Data Interviews: A Comprehensive Guide

Introduction
After nearly two decades in the data engineering field, I’ve sat on both sides of the interview table countless times. Whether you’re a seasoned professional looking to change roles or a newcomer trying to break into the field, the interview process for data engineering positions can be both challenging and mysterious. There’s often uncertainty about what questions you’ll face, what skills you need to demonstrate, and what interviewers are really looking for beneath the surface.

  • Interviews
  • Technical Assessment
  • Career Growth
  • SQL
  • Data Modeling
  • Problem Solving
Saturday, February 22, 2025
Maximizing Data Impact: A Guide to Effective Data Engineering

Introduction
Creating impact goes far beyond writing efficient code or building robust pipelines. It’s about understanding how your work translates into tangible value for stakeholders across the organization.

Types of Impact
Our work forms the backbone of data-driven decision making in organizations. However, measuring and communicating this impact isn’t always straightforward. If you feel your work isn’t making a meaningful difference, it might be time to pivot your focus or approach. Understanding the various ways we create value helps guide these decisions and ensures we’re contributing in ways that matter.

  • Data Impact
  • Visualization
  • Stakeholder Management
  • Team Enablement
  • Data Quality
Saturday, February 15, 2025
Breaking Down Business Context

Breaking Down the Business Context Barrier: A Value-Driven Approach to Stakeholder Conversations
“I don’t have the business knowledge or context.” I don’t know about you, but this is a barrier I encounter with colleagues a lot. Ironically, it’s the mindset that often prevents us from having the very conversations that would help us gain that context.

Starting with Curiosity
The journey to understanding business value doesn’t start with technical knowledge – it starts with curiosity about people and their job or role. In my years, I’ve learned that the most valuable conversations often begin with simple questions about the person across the table:

  • Communication
  • Business Value
  • Stakeholder Engagement
  • Technical Leadership
  • Value Creation
Saturday, February 8, 2025