Ghost in the data
The Four Stages of Data Quality: From Hidden Costs to Measurable Value

This is the fundamental problem with data quality. You know it matters. Everyone knows it matters. But until you can quantify the impact, connect it to business outcomes, and build a credible business case, it remains this abstract thing that’s important but never urgent enough to properly fund. I wrote a practical guide to data quality last week that walks through hands-on implementation—the SQL queries, the profiling techniques, the actual mechanics of finding and fixing data issues. Think of that as the “how to use the tools” guide. This article is different. This is the “why these tools matter and how to convince your organization to actually use them” guide.

  • Data Quality
  • ROI
  • Business Case
  • Data Governance
  • Strategy
  • Frameworks
Monday, November 24, 2025
When Your Data Quality Fails at 9 PM on a Friday

When everything goes wrong at once
It’s 9 PM on a Friday. You’re halfway through your second beer, finally relaxing after a brutal week. Your phone buzzes. Then it buzzes again. And again. The support team’s in full panic mode, your manager’s calling, and somewhere in Melbourne, two very angry guests are standing outside the same Airbnb property—both holding confirmation emails that say the place is theirs for the weekend.

  • Data Quality
  • SQL
  • Database Design
  • Data Validation
  • Testing
  • Data Engineering
  • Production Issues
Saturday, November 22, 2025
Streamlining Data Pipeline Reliability: The Write-Audit-Publish Pattern

Introduction: Why Safe Data Pipelines Matter
In the world of data engineering, there’s a constant challenge we all face: how do we ensure our production data remains reliable and error-free when deploying updates? Anyone who’s experienced the cold sweat of a bad deployment affecting critical business data knows this pain all too well. Enter the Write-Audit-Publish pattern—a robust approach that can significantly reduce the risk of data pipeline failures. This pattern, which shares DNA with the well-known Blue-Green deployment strategy from software engineering, creates a safety net that can save your team countless hours of troubleshooting and emergency fixes.
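The full post walks through the pattern with Airflow; as a quick taste, here is a minimal, illustrative sketch of the three steps against a throwaway SQLite database. The table names (raw_orders, orders_staging, orders) and the audit checks are hypothetical stand-ins, not the post's actual implementation.

```python
# Minimal Write-Audit-Publish sketch (illustrative only).
# Table names and checks are hypothetical; the post orchestrates this with Airflow.
import sqlite3  # stand-in for your real warehouse connection


def write(conn):
    """Write new results into a staging table, never into production."""
    conn.executescript("""
        DROP TABLE IF EXISTS orders_staging;
        CREATE TABLE orders_staging AS
        SELECT order_id, customer_id, amount
        FROM raw_orders
        WHERE amount IS NOT NULL;
    """)


def audit(conn):
    """Run data quality checks against the staging table only."""
    dupes = conn.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders_staging"
    ).fetchone()[0]
    negatives = conn.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"
    ).fetchone()[0]
    if dupes or negatives:
        raise ValueError(
            f"Audit failed: {dupes} duplicate ids, {negatives} negative amounts"
        )


def publish(conn):
    """Swap the audited staging table into production."""
    conn.executescript("""
        DROP TABLE IF EXISTS orders;
        ALTER TABLE orders_staging RENAME TO orders;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    # Toy source table so the sketch runs end to end.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    write(conn)
    audit(conn)    # a failed audit stops the run before production is touched
    publish(conn)
    conn.commit()
```

The property that matters is that the audit runs against the staging table, so a failing check halts the pipeline before anything reaches the production table.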

  • Write-Audit-Publish
  • WAP Pattern
  • Airflow
  • Data Reliability
  • Blue-Green Deployment
  • Data Quality
  • Python
Sunday, May 18, 2025
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction
In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now it’s time to put theory into practice. This is going to be a long read, so grab some coffee and one (or a couple) of your favorite biscuits. One use case for leveraging LLMs is building an agent: a Senior Data Engineer AI that automatically reviews pull requests in your data engineering projects. This agent is that nit-picky data engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating it into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.
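Before the full walkthrough, here is a rough sketch of the agent's core loop under a few assumptions: the pull request diff is fetched over the GitHub REST API, handed to an LLM with review instructions, and the result is posted back as a PR comment. The prompt, the PR_NUMBER variable, and the ask_llm placeholder are hypothetical; the post's actual implementation wires this into GitHub Actions with its own prompts and model provider.

```python
# Sketch of the review-agent loop (illustrative; the post's GitHub Actions
# workflow and prompts will differ).
import os
import requests  # assumes the requests package is installed

GITHUB_API = "https://api.github.com"

REVIEW_PROMPT = (
    "You are a senior data engineer. Review this SQL/dbt diff for formatting, "
    "naming consistency, data types, and missing data quality tests:\n\n{diff}"
)


def get_pr_diff(repo: str, pr_number: int, token: str) -> str:
    """Fetch the raw diff of a pull request from the GitHub REST API."""
    resp = requests.get(
        f"{GITHUB_API}/repos/{repo}/pulls/{pr_number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github.v3.diff",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text


def ask_llm(prompt: str) -> str:
    """Placeholder: call whichever LLM provider you use (OpenAI, Bedrock, ...)."""
    raise NotImplementedError("wire up your model provider here")


def post_review_comment(repo: str, pr_number: int, token: str, body: str) -> None:
    """Post the agent's review as a comment on the pull request."""
    resp = requests.post(
        f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"Bearer {token}"},
        json={"body": body},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    repo = os.environ["GITHUB_REPOSITORY"]    # e.g. "org/repo", set by Actions
    pr_number = int(os.environ["PR_NUMBER"])  # hypothetical: passed in by the workflow
    token = os.environ["GITHUB_TOKEN"]
    diff = get_pr_diff(repo, pr_number, token)
    review = ask_llm(REVIEW_PROMPT.format(diff=diff))
    post_review_comment(repo, pr_number, token, review)
```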

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025
Maximizing Data Impact: A Guide to Effective Data Engineering

Introduction
Creating impact goes far beyond writing efficient code or building robust pipelines. It’s about understanding how your work translates into tangible value for stakeholders across the organization.
Types of Impact
Our work forms the backbone of data-driven decision making in organizations. However, measuring and communicating this impact isn’t always straightforward. If you feel your work isn’t making a meaningful difference, it might be time to pivot your focus or approach. Understanding the various ways we create value helps guide these decisions and ensures we’re contributing in ways that matter.

  • Data Impact
  • Visualization
  • Stakeholder Management
  • Team Enablement
  • Data Quality
Saturday, February 15, 2025
Mastering Data Engineering: Insights and Best Practices

Introduction
I have been working with data for a bit over 17 years now, and I have seen it evolve from its nascent stages to a cornerstone of the tech industry. The journey has been nothing short of revolutionary, impacting businesses and society at large. The role of the data engineer has evolved and expanded along the way, requiring not just technical skills but a deep understanding of business, security, and the human element within technology.

  • Culture
  • Continuous Learning
  • Data Quality
  • Professional Growth
  • Data Pipeline
  • Data System Resilience
  • Team Collaboration
Saturday, March 30, 2024