Ghost in the data
  • Home
  • About
  • Posts
  • Topics
  • Resources
  • RSS
  • Tags
  • 2026 Trends
  • AI
  • AI Agents
  • AI Bubble
  • AI Business Applications
  • AI Communication
  • AI Concepts
  • AI Ethics
  • AI Productivity
  • AI Prompting
  • AI Tools
  • AI Workflows
  • Airflow
  • Analytics
  • AnalyticsEngineering
  • Anonymization
  • Apache Airflow
  • Apache Iceberg
  • Architecture
  • Athena
  • Automation
  • AVRO
  • AWS
  • AWS Glue
  • BankingData
  • Bedrock Edition
  • Best Practices
  • BigData
  • Blue-Green Deployment
  • Budgeting
  • Burnout
  • Business Case
  • Business Value
  • Business-Communication
  • Career Advice
  • Career Development
  • Career Growth
  • Career Planning
  • Career Strategy
  • Change Management
  • Chapter Lead
  • ChatGPT
  • CI/CD
  • Claude
  • Claude-Code
  • Cloud Computing
  • Cloud Gaming
  • Code Review
  • Collaboration
  • Communication
  • ConceptualDataModeling
  • Continuous Learning
  • ContinuousIntegration
  • Cost Optimization
  • CSV
  • Culture
  • Data Architecture
  • Data Contracts
  • Data Culture
  • Data Engineering
  • Data Ethics
  • Data Governance
  • Data Impact
  • Data Ingestion
  • Data Leadership
  • Data Modeling
  • Data Modelling
  • Data Ownership
  • Data Pipeline
  • Data Pipelines
  • Data Platforms
  • Data Quality
  • Data Reliability
  • Data Solutions
  • Data System Resilience
  • Data Teams
  • Data Testing
  • Data Transformation
  • Data Validation
  • Data Vault
  • Data Warehouse
  • Data Warehouse Architecture
  • Data Warehousing
  • Database Design
  • DataDemocratization
  • DataEngineering
  • Datafold
  • DataGovernance
  • DataMinimization
  • DataModeling
  • DataPipelines
  • DataPrivacy
  • DataQuality
  • DataTools
  • DataValidation
  • DataWarehouse
  • Dbt
  • Decision Making
  • Delta Lake
  • Development
  • Development Tools
  • DevOps
  • Dimensional Modeling
  • DimensionalModeling
  • Documentation
  • DuckDB
  • Emergency Fund
  • Emotional Intelligence
  • EmpatheticDesign
  • Employee Engagement
  • Employee Productivity
  • Engineering Career
  • ETL
  • ETL Pipeline
  • Family Gaming
  • Feedback
  • File Formats
  • Financial Crisis
  • Financial Independence
  • Frameworks
  • Friendship
  • Future of Work
  • GCP
  • GDPR
  • Git
  • GitBash
  • GitHub
  • GitHub Actions
  • Grief
  • Hiring Strategies
  • Historical Load
  • Idempotency
  • Incident Response
  • Industry Trends
  • Innovation
  • Inspirational Quote
  • Intergroup Conflict
  • Interviews
  • Job Security
  • Journal
  • Journaling Techniques
  • JSON
  • Junior Engineer
  • Kimball
  • Kimball Methodology
  • Lakehouse
  • Lambda
  • Language Models
  • Leadership
  • Legacy Systems
  • Life
  • LLM
  • LLM Interaction
  • Loss
  • MacOS
  • Management
  • Mental Health
  • Mentorship
  • Mindfulness Practices
  • Minecraft
  • Moral Development
  • Onboarding
  • One-on-One Meetings
  • OpenSource
  • ORC
  • Organizational Culture
  • Parquet
  • Performance Optimization
  • Personal
  • Personal Growth
  • Pipeline
  • Pipeline Design
  • PostegreSQL
  • Pragmatism
  • Presentation-Skills
  • Problem Solving
  • Production Issues
  • Productivity
  • Professional Development
  • Professional Growth
  • Professional Relationships
  • Professional-Skills
  • Promotion
  • Psychological Safety
  • Public-Speaking
  • Python
  • RAG
  • Recruitment
  • Redundancy
  • Refactoring
  • Remote Work
  • Reputation
  • RequirementGathering
  • RetentionPolicies
  • RFC 4180
  • Risk Management
  • Robbers Cave Experiment
  • ROI
  • Roleplaying
  • S3
  • SCD
  • SCD Type 2
  • Schema Drift
  • Schema Evolution
  • Self-Awareness
  • Self-Reflection
  • Server Setup
  • ServiceDesign
  • ShadowIT
  • Snowflake
  • Soft Skills
  • SQL
  • SQL Standards
  • Sql-Agents
  • Sql-Validation
  • SSH
  • SSH Keys
  • Staff Engineer
  • Stakeholder Engagement
  • Stakeholder Management
  • StakeholderManagement
  • Star Schema
  • Starburst
  • Step Functions
  • Strategy
  • Strengths
  • Success Habits
  • Talent Acquisition
  • Team Building
  • Team Collaboration
  • Team Culture
  • Team Enablement
  • Team-Management
  • Technical Assessment
  • Technical Debt
  • Technical Leadership
  • Technical Strategy
  • Testing
  • Tools and Access
  • Trino
  • Trust
  • Trust Building
  • Trust Crisis
  • UserExperience
  • UV
  • UV Package Manager
  • Value Creation
  • Vector Databases
  • Virtual Environments
  • Visualization
  • Vocal-Techniques
  • VSCode
  • WAP Pattern
  • Wellbeing
  • Windows
  • Work-Life Balance
  • Workplace Communication
  • Workplace Relationships
  • Workplace Stress
  • Write-Audit-Publish
  • Zsh
Hero Image
The Data Quality Test: 10 Questions That Predict Pipeline Disasters

I’ve been writing about data quality a lot lately. Enough that I notice myself doing it. Enough that a small voice says: haven’t you made this point already? Schema drift, NULL propagation, duplicate records, the whole catalogue of things that go wrong in the space between a source system and a warehouse. I keep circling back to it. And every time, I almost talk myself out of writing the piece. Then I reflect on what’s happened in the last few years of work. The postmortems I’ve read, the pipelines I’ve inherited — and the same pattern shows up with depressing regularity. Not exotic failures. Not edge cases. The boring stuff. The questions nobody asked before the first row hit the warehouse.

  • Data Quality
  • Pipeline Design
  • Schema Drift
  • Idempotency
  • Data Contracts
  • Incident Response
  • Data Ownership
Friday, April 11, 2025 Read
Hero Image
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee, and one (couple) of your favorite biscuits. One use case for leveraging LLM’s, is creating of a Agent - a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nit picky Data Engineer that enforces SQL formatting standards, ensure naming and data type consistency, validate data quality checks, and suggest improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025 Read
Hero Image
Maximizing Data Impact: A Guide to Effective Data Engineering

Introduction Creating impact goes far beyond writing efficient code or building robust pipelines. It’s about understanding how your work translates into tangible value for stakeholders across the organization. Types of Impact Our work forms the backbone of data-driven decision making in organizations. However, measuring and communicating this impact isn’t always straightforward. If you feel your work isn’t making a meaningful difference, it might be time to pivot your focus or approach. Understanding the various ways we create value helps guide these decisions and ensures we’re contributing in ways that matter.

  • Data Impact
  • Visualization
  • Stakeholder Management
  • Team Enablement
  • Data Quality
Saturday, February 15, 2025 Read
Hero Image
Mastering Data Engineering: Insights and Best Practices

Introduction I have been working with Data for a bit over 17 years now, I have seen it evolve from its nascent stages to a cornerstone of the tech industry. The journey has been nothing short of revolutionary, impacting businesses and society at large. The evolution and the role of a data engineer have expanded, requiring not just technical skills, but a deep understanding of business, security, and the human element within technology.

  • Culture
  • Continuous Learning
  • Data Quality
  • Professional Growth
  • Data Pipeline
  • Data System Resilience
  • Team Collaboration
Saturday, March 30, 2024 Read
  • ««
  • «
  • 1
  • 2
  • »
  • »»