Ghost in the data
  • Home
  • About
  • Posts
  • Topics
  • Resources
  • RSS
  • Tags
  • 2026 Trends
  • AI
  • AI Agents
  • AI Bubble
  • AI Business Applications
  • AI Communication
  • AI Concepts
  • AI Ethics
  • AI Productivity
  • AI Prompting
  • AI Tools
  • AI Workflows
  • Airflow
  • Analytics
  • AnalyticsEngineering
  • Anonymization
  • Apache Airflow
  • Apache Iceberg
  • API Integration
  • Architecture
  • Athena
  • Automation
  • AVRO
  • AWS
  • AWS Glue
  • BankingData
  • Bedrock Edition
  • Best Practices
  • BigData
  • Blue-Green Deployment
  • Budgeting
  • Burnout
  • Business Case
  • Business Value
  • Business-Communication
  • Career Advice
  • Career Development
  • Career Growth
  • Career Planning
  • Career Strategy
  • Change Management
  • Chapter Lead
  • ChatGPT
  • CI/CD
  • Claude
  • Claude Code
  • Cloud Computing
  • Cloud Gaming
  • Code Comments
  • Code Review
  • Collaboration
  • Communication
  • ConceptualDataModeling
  • Continuous Learning
  • ContinuousIntegration
  • Cost Optimization
  • CSV
  • Culture
  • Customer Experience
  • Data Architecture
  • Data Contracts
  • Data Culture
  • Data Engineering
  • Data Ethics
  • Data Freshness
  • Data Governance
  • Data Impact
  • Data Ingestion
  • Data Leadership
  • Data Modeling
  • Data Modelling
  • Data Observability
  • Data Ownership
  • Data Pipeline
  • Data Pipelines
  • Data Platform
  • Data Platforms
  • Data Quality
  • Data Reliability
  • Data Solutions
  • Data System Resilience
  • Data Teams
  • Data Testing
  • Data Transformation
  • Data Validation
  • Data Vault
  • Data Warehouse
  • Data Warehouse Architecture
  • Data Warehousing
  • Database Design
  • DataDemocratization
  • DataEngineering
  • Datafold
  • DataGovernance
  • DataMinimization
  • DataModeling
  • DataPipelines
  • DataPrivacy
  • DataQuality
  • DataTools
  • DataValidation
  • DataWarehouse
  • Dbt
  • Decision Making
  • Delta Lake
  • Development
  • Development Tools
  • DevOps
  • Dimensional Modeling
  • DimensionalModeling
  • Documentation
  • DuckDB
  • Emergency Fund
  • Emotional Intelligence
  • EmpatheticDesign
  • Employee Engagement
  • Employee Experience
  • Employee Productivity
  • Engineering Career
  • Engineering Culture
  • Enterprise
  • ETL
  • ETL Pipeline
  • Family Gaming
  • Feedback
  • File Formats
  • Financial Crisis
  • Financial Independence
  • FinOps
  • Fivetran
  • Frameworks
  • Friendship
  • Future of Work
  • GCP
  • GDPR
  • Git
  • GitBash
  • GitHub
  • GitHub Actions
  • Grief
  • Hiring Strategies
  • Historical Load
  • Human Connection
  • Idempotency
  • Incident Response
  • Industry Trends
  • Innovation
  • Inspirational Quote
  • Intergroup Conflict
  • Interviews
  • Job Security
  • Journal
  • Journaling Techniques
  • JSON
  • Junior Engineer
  • Kimball
  • Kimball Methodology
  • Lakehouse
  • Lambda
  • Language Models
  • Leadership
  • Legacy Systems
  • Life
  • LLM
  • LLM Interaction
  • Loss
  • MacOS
  • Management
  • Mental Health
  • Mentorship
  • Mindfulness Practices
  • Minecraft
  • Modern Data Stack
  • Moral Development
  • Onboarding
  • One-on-One Meetings
  • Open Source
  • OpenFlow
  • OpenSource
  • ORC
  • Organisational Culture
  • Organizational Culture
  • Parquet
  • Performance Optimization
  • Personal
  • Personal Growth
  • Pipeline
  • Pipeline Architecture
  • Pipeline Design
  • Pipeline Optimization
  • Platform Strategy
  • PostegreSQL
  • Pragmatism
  • Presentation-Skills
  • Problem Solving
  • Production Issues
  • Productivity
  • Professional Development
  • Professional Growth
  • Professional Relationships
  • Professional-Skills
  • Promotion
  • Psychological Safety
  • Public-Speaking
  • Python
  • RAG
  • Recruitment
  • Redundancy
  • Refactoring
  • Remote Work
  • Reputation
  • RequirementGathering
  • RetentionPolicies
  • RFC 4180
  • Risk Management
  • Robbers Cave Experiment
  • ROI
  • Roleplaying
  • S3
  • Salesforce
  • SCD
  • SCD Type 2
  • Schema Drift
  • Schema Evolution
  • Self-Awareness
  • Self-Reflection
  • Server Setup
  • ServiceDesign
  • ShadowIT
  • Snowflake
  • Soft Skills
  • SQL
  • SQL Standards
  • Sql-Agents
  • Sql-Validation
  • SSH
  • SSH Keys
  • Staff Engineer
  • Stakeholder Engagement
  • Stakeholder Management
  • StakeholderManagement
  • Star Schema
  • Starburst
  • Step Functions
  • Strangler Fig
  • Strategy
  • Strengths
  • Success Habits
  • Talent Acquisition
  • Team Building
  • Team Collaboration
  • Team Culture
  • Team Enablement
  • Team Leadership
  • Team-Management
  • Technical Assessment
  • Technical Debt
  • Technical Leadership
  • Technical Strategy
  • Testing
  • Tools and Access
  • Trino
  • Trust
  • Trust Building
  • Trust Crisis
  • UserExperience
  • UV
  • UV Package Manager
  • Value Creation
  • Vector Databases
  • Virtual Environments
  • Visualization
  • Vocal-Techniques
  • VSCode
  • WAP Pattern
  • Wellbeing
  • Windows
  • Work-Life Balance
  • Workplace Communication
  • Workplace Relationships
  • Workplace Stress
  • Write-Audit-Publish
  • Zsh
Hero Image
The Guerrilla Guide to Data Engineering Interviews

The Scenario That Changes Everything Picture this: You’re sitting in an interview room—or more likely these days, staring at a Zoom window with your carefully curated bookshelf background—and the interviewer asks you about data quality. “Tell me about your experience with data quality,” they say. You have two choices. Choice A: “Data quality is really important in data engineering. It involves ensuring data is accurate, complete, consistent, and timely. I believe strongly in implementing data quality checks throughout the pipeline.”

  • Interviews
  • Career Growth
  • Technical Assessment
  • SQL
  • Data Modeling
  • Problem Solving
  • Delta Lake
  • dbt
  • Data Quality
Sunday, January 11, 2026 Read
Hero Image
Building AI Agents with Claude Code

Introduction Imagine you’re reviewing a pull request with dozens of SQL files, each containing complex queries for your data pipeline. You spot inconsistent formatting, or syntax which doesn’t work with your infrastructure. Sound familiar? It’s common for data professionals to struggle with maintaining consistent SQL standards across their projects, especially when working with specialized platforms and it can be time consuming to review these elements within a peer review. It would be better use of time to focus on the hard thinking elements, like logic etc. However these small syntax or style issues, can be distracting. Well at least they are for me.

  • claude-code
  • sql-agents
  • starburst
  • delta-lake
  • trino
  • sql-validation
  • dbt
  • data-engineering
  • ai-tools
  • vscode
Saturday, September 13, 2025 Read
Hero Image
Continuous Integration for Data Teams: Beyond the Buzzwords

The Day Everything Broke (And How CI Could Have Saved Us) Picture this: It’s 9 AM on a Monday, and your Slack is exploding. The executive dashboard is showing impossible numbers. Customer support is fielding complaints about incorrect billing amounts. The marketing team is questioning why their conversion metrics suddenly dropped to zero. You trace it back to a seemingly innocent change you merged Friday afternoon—a simple column rename that seemed harmless enough. But that “harmless” change cascaded through your entire data pipeline, breaking downstream models, dashboards, and automated reports.

  • ContinuousIntegration
  • DataQuality
  • dbt
  • DevOps
  • DataEngineering
  • GitHub
  • Datafold
  • DataValidation
Saturday, June 28, 2025 Read
Hero Image
dbt Fusion: The Engine Upgrade That's Got Everyone Talking

When Your Favorite Tool Gets a Makeover You know that feeling when your favorite app suddenly changes its interface? That mix of excitement and anxiety about whether the changes will actually improve your workflow or just mess with muscle memory you’ve spent years building. That’s exactly what happened when dbt Labs dropped dbt Fusion on the analytics engineering community. The reactions were… let’s call them passionate. Some folks were celebrating like they’d just discovered fire, while others were questioning whether this marked the beginning of the end for open-source dbt.

  • dbt
  • DataEngineering
  • AnalyticsEngineering
  • OpenSource
  • DataTools
  • SQL
  • DataModeling
Saturday, June 21, 2025 Read
Hero Image
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee, and one (couple) of your favorite biscuits. One use case for leveraging LLM’s, is creating of a Agent - a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nit picky Data Engineer that enforces SQL formatting standards, ensure naming and data type consistency, validate data quality checks, and suggest improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025 Read
Hero Image
Setting Up Your Data Engineering Environment on Windows

Introduction Setting up a development environment for data engineering on Windows requires some specific considerations that differ from Unix-based systems. This guide will walk you through creating a robust Python development environment on Windows, with detailed explanations of each component and why it’s important. Clean Slate: Removing Existing Python Installations Before starting, it’s important to remove any existing Python installations to avoid conflicts: Open Windows Settings > Apps > Apps & Features Search for “Python” Uninstall any Python versions listed Also check and remove Python from these locations:

  • Python
  • DBT
  • Windows
  • UV Package Manager
  • VSCode
Monday, February 3, 2025 Read
Hero Image
Setting Up Your Data Engineering Environment on MacOS

Introduction Setting up a development environment for data engineering on MacOS requires careful consideration of package management, Python version control, and tool configuration. This guide will walk you through the process, explaining not just how to set up these tools, but why each component is important. Clean Slate: Removing Existing Python Installations Before we begin, it’s important to ensure we’re starting with a clean slate. Multiple Python installations can cause confusion and conflicts. Let’s remove any existing Python installations:

  • Python
  • DBT
  • MacOS
  • UV Package Manager
  • VSCode
Sunday, February 2, 2025 Read
Hero Image
Data Vault Data Modeling with Python and dbt

Introduction Data Vault is a data modeling technique that is specifically designed for use in Data Warehouses. It is a hybrid approach that combines the best elements of 3rd Normal Form (3NF) and Star Schema to provide a flexible and scalable data modeling solution. Hubs, Links, Satellites A Data Vault consists of three main components: Hubs, Links, and Satellites. Hubs are the backbone of the Data Vault architecture and represent the entities within the data model. They are the core data elements and contain the primary key information.

  • Data Vault
  • Python
  • DBT
  • ETL
  • Data Warehouse Architecture
Sunday, February 26, 2023 Read
  • ««
  • «
  • 1
  • 2
  • »
  • »»