Ghost in the data
Building Your First AWS Data Pipeline: A Guide for Data Professionals Who've Never Touched Cloud Infrastructure

The spreadsheet that changed everything

Here’s a story that might sound familiar. You’re pulling data from an API: maybe daily sales numbers, maybe customer interactions, maybe something else entirely. Every morning, you open your laptop, run a Python script, save the CSV somewhere, and get on with your actual work. It takes maybe five minutes, but it’s five minutes you can’t forget about. Miss a day and you’ve got a gap in your data. Go on vacation? Better hope someone remembers to run your script.

  • AWS
  • Data Pipelines
  • Lambda
  • S3
  • Athena
  • Cloud Computing
  • Data Ingestion
Wednesday, November 26, 2025
When Your Data Quality Fails at 9 PM on a Friday

When everything goes wrong at once

It’s 9 PM on a Friday. You’re halfway through your second beer, finally relaxing after a brutal week. Your phone buzzes. Then it buzzes again. And again. The support team’s in full panic mode, your manager’s calling, and somewhere in Melbourne, two very angry guests are standing outside the same Airbnb property, both holding confirmation emails that say the place is theirs for the weekend.

  • Data Quality
  • SQL
  • Database Design
  • Data Validation
  • Testing
  • Data Engineering
  • Production Issues
Saturday, November 22, 2025
Balancing Data Accessibility and Privacy in Financial Services

The Data Tightrope: Where Accessibility Meets Privacy

Let’s face it: in today’s data landscape, data is simultaneously your most valuable asset and your biggest potential liability. The challenge is finding that sweet spot where data remains accessible enough to drive business decisions while being locked down enough to satisfy privacy regulations. It’s not just about ticking compliance boxes; it’s about maintaining customer trust while still extracting every bit of analytical value from your data assets.

  • DataPrivacy
  • Anonymization
  • RetentionPolicies
  • BankingData
  • DataMinimization
  • GDPR
  • DataGovernance
Friday, November 21, 2025
Why Dimensional Modeling Isn't Dead—It's Just Getting Started

The Great Data Modeling Debate Nobody Asked For

Another meeting where someone confidently declared, “We don’t need data modeling anymore, just dump everything in the data lake and let analysts figure it out.” I’ve heard variations of this statement for years now, in meetings or at conferences. The pitch is always the same: traditional data warehousing is dead, dimensional modeling is a relic from the 90s, and modern big data tools have made structured modeling obsolete. Schema-on-read is the future. Agility over architecture.

  • DimensionalModeling
  • DataWarehouse
  • DataModeling
  • DataQuality
  • Analytics
  • Kimball
  • BigData
Friday, November 7, 2025
Financial Independence: Your Shield Against Job Loss Fear

The Fear That Follows You Home

One evening, after pushing another commit past midnight, I couldn’t bring myself to get up. Not because I was tired, though I was. Not because the commit had issues; it went smoothly and tested fine. I couldn’t get up because I’d spent the entire day with a knot in my stomach, wondering if our team would survive the next round of “organizational restructuring.” Here’s what made it worse: I had no idea if my fear was rational. Were we really at risk? Or was I just catastrophizing? The uncertainty was eating me alive.

  • financial independence
  • job security
  • emergency fund
  • career development
  • mental health
  • workplace stress
  • budgeting
  • redundancy
Sunday, November 2, 2025
Building AI Agents with Claude Code

Introduction

Imagine you’re reviewing a pull request with dozens of SQL files, each containing complex queries for your data pipeline. You spot inconsistent formatting, or syntax that doesn’t work with your infrastructure. Sound familiar? Data professionals often struggle to maintain consistent SQL standards across their projects, especially when working with specialized platforms, and checking these details during peer review can be time-consuming. That time would be better spent on the hard thinking, like the logic. Yet these small syntax and style issues can be distracting. Well, at least they are for me.

  • claude-code
  • sql-agents
  • starburst
  • delta-lake
  • trino
  • sql-validation
  • dbt
  • data-engineering
  • ai-tools
  • vscode
Saturday, September 13, 2025
Continuous Integration for Data Teams: Beyond the Buzzwords

The Day Everything Broke (And How CI Could Have Saved Us)

Picture this: It’s 9 AM on a Monday, and your Slack is exploding. The executive dashboard is showing impossible numbers. Customer support is fielding complaints about incorrect billing amounts. The marketing team is questioning why their conversion metrics suddenly dropped to zero. You trace it back to a seemingly innocent change you merged Friday afternoon: a simple column rename that seemed harmless enough. But that “harmless” change cascaded through your entire data pipeline, breaking downstream models, dashboards, and automated reports.

  • ContinuousIntegration
  • DataQuality
  • dbt
  • DevOps
  • DataEngineering
  • GitHub
  • Datafold
  • DataValidation
Saturday, June 28, 2025
dbt Fusion: The Engine Upgrade That's Got Everyone Talking

When Your Favorite Tool Gets a Makeover

You know that feeling when your favorite app suddenly changes its interface? That mix of excitement and anxiety about whether the changes will actually improve your workflow or just mess with muscle memory you’ve spent years building. That’s exactly what happened when dbt Labs dropped dbt Fusion on the analytics engineering community. The reactions were… let’s call them passionate. Some folks were celebrating like they’d just discovered fire, while others were questioning whether this marked the beginning of the end for open-source dbt.

  • dbt
  • DataEngineering
  • AnalyticsEngineering
  • OpenSource
  • DataTools
  • SQL
  • DataModeling
Saturday, June 21, 2025
Streamlining Data Pipeline Reliability: The Write-Audit-Publish Pattern

Introduction: Why Safe Data Pipelines Matter

In the world of data engineering, there’s a constant challenge we all face: how do we ensure our production data remains reliable and error-free when deploying updates? Anyone who’s experienced the cold sweat of a bad deployment affecting critical business data knows this pain all too well. Enter the Write-Audit-Publish pattern, a robust approach that can significantly reduce the risk of data pipeline failures. This pattern, which shares DNA with the well-known Blue-Green deployment strategy from software engineering, creates a safety net that can save your team countless hours of troubleshooting and emergency fixes.

  • Write-Audit-Publish
  • WAP Pattern
  • Airflow
  • Data Reliability
  • Blue-Green Deployment
  • Data Quality
  • Python
Sunday, May 18, 2025
The Art and Science of Conceptual Data Modeling: Building Pipelines That Last

Introduction: Why Conceptual Data Modeling Makes or Breaks Your Pipeline

Ever found yourself staring at a faulty data pipeline, wondering where it all went wrong? Join the club. I’ve been there too many times to count. The hard truth? Most pipeline failures aren’t technical issues; they’re conceptual ones. We get so caught up in the how (tools, languages, frameworks) that we completely miss the what and why of our data needs.

  • ConceptualDataModeling
  • DataEngineering
  • StakeholderManagement
  • EmpatheticDesign
  • DataPipelines
  • RequirementGathering
Saturday, May 17, 2025
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction

In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee and one (or a couple) of your favorite biscuits. One use case for LLMs is creating an agent: a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nitpicky Data Engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025
Leveraging LLMs for Business Impact: Part 1 - Theory and Foundations

Introduction

In today’s rapidly evolving technological landscape, Large Language Models (LLMs) have emerged as transformative tools with the potential to revolutionize business operations across industries. While the hype around these technologies is intense, understanding their practical applications and underlying mechanisms is crucial for organizations seeking to leverage them effectively. This two-part series aims to demystify LLMs and their associated technologies, starting with the theoretical foundations in Part 1, followed by a hands-on implementation guide using AWS services in Part 2.

  • LLM
  • RAG
  • Vector Databases
  • AI Business Applications
  • Data Architecture
Friday, March 7, 2025