Ghost in the data
SQL Tells You What. Comments Tell You Why.

The best SQL doesn’t need comments. Write meaningful CTE names, descriptive aliases, and clear column labels, and a skilled reader will follow your logic without a single annotation. That’s the right instinct, but it’s only half right. SQL is a declarative language: you don’t write how the database retrieves your data; you write what you want. That distinction matters, because “what” and “why” are very different questions, and SQL can answer only one of them.

  • SQL
  • dbt
  • Documentation
  • Data Quality
  • Code Comments
  • Data Pipelines
  • Best Practices
Saturday, June 6, 2026
Write-Audit-Publish with Iceberg Tables in Snowflake

It was a Tuesday afternoon when the analyst pinged me on Microsoft Teams: “Hey, the Total Portfolio numbers just jumped 40% overnight. Did we land a whale?” We hadn’t. What actually happened was more mundane and significantly more painful. A schema change in the source system introduced a currency conversion bug. Our pipeline dutifully loaded the corrupted data into production at 3 AM, the dashboards updated by 6 AM, and the Department Head opened her morning report to numbers that looked like champagne-worthy growth.

  • Apache Iceberg
  • Snowflake
  • WAP Pattern
  • Data Quality
  • SQL
  • Lakehouse
  • Data Pipelines
  • Best Practices
Friday, February 27, 2026
Healing Tables: When Day-by-Day Backfills Become a Slow-Motion Disaster

It was 2 AM on a Saturday when I realized we’d been loading data wrong for six months. The situation: a customer dimension with three years of history needed to be backfilled after a source system migration. The previous team’s approach was straightforward: run the daily incremental process 1,095 times, once for each day of history. They estimated it would take three weeks. What they hadn’t accounted for was how errors compound. By the time I looked at the data, we had 47,000 records with overlapping date ranges, 12,000 timeline gaps where customers seemed to vanish and reappear, and an unknowable number of missed changes wherever source systems had updated the same record several times in a single day.

  • SCD
  • Historical Load
  • dbt
  • SQL
  • Data Quality
  • Dimensional Modeling
  • Delta Lake
  • Best Practices
Saturday, February 7, 2026