Ghost in the data
Embracing Defensive Engineering: A Proactive Approach to Data Pipeline Integrity

Introduction
Have you ever had a data pipeline fall apart due to unexpected errors? In the ever-evolving landscape of data, surprises lurk around every corner. Defensive engineering, a methodology focused on preempting and mitigating data anomalies, plays a crucial role in building reliable data pipelines. It’s not just about fixing problems as they arise; it’s about anticipating potential issues and addressing them before they wreak havoc. Below I’ll explore the various facets of defensive engineering, from the basics of handling nulls and type mismatches to the more complex challenges of ensuring data integrity and handling late-arriving data. Whether you’re a seasoned data engineer or just starting out, understanding these principles is key to creating data pipelines that are not just functional, but also robust and secure in the face of unpredictable data challenges.
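
To make the idea concrete, here is a minimal, illustrative sketch of one defensive ingestion step; the column names, file layout, and quarantine approach are assumptions made for the example, not taken from the post.

```python
import pandas as pd

def load_orders(path: str) -> pd.DataFrame:
    """Load a CSV defensively: reject missing required values, coerce types
    explicitly, and quarantine bad rows instead of letting them flow downstream."""
    df = pd.read_csv(path)

    # Fail fast if columns the downstream model depends on contain nulls.
    required = ["order_id", "order_ts", "amount"]  # hypothetical column names
    null_cols = [c for c in required if df[c].isna().any()]
    if null_cols:
        raise ValueError(f"Nulls found in required columns: {null_cols}")

    # Coerce types rather than trusting inferred dtypes; errors="coerce"
    # turns malformed values into NaT/NaN so they can be isolated.
    df["order_ts"] = pd.to_datetime(df["order_ts"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

    # Quarantine rows that failed coercion instead of dropping them silently.
    bad = df["order_ts"].isna() | df["amount"].isna()
    df[bad].to_csv("quarantined_orders.csv", index=False)
    return df[~bad]
```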

  • Data Modelling
Sunday, February 11, 2024
Navigating the Data Labyrinth: The Art of Data Profiling

Introduction
Imagine navigating a sprawling network of interconnected threads, each strand holding a vital clue. That’s the world of data for us, and profiling is our key to unlocking its secrets. It’s like deciphering a cryptic message, each character a piece of information waiting to be understood. But why is this so important? Ever encountered an error in your analysis, or drawn a misleading conclusion from faulty data? Data profiling helps us avoid these pitfalls by ensuring the data we work with is accurate, consistent, and ready to yield valuable insights. It’s like building a sturdy foundation before constructing a skyscraper.
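
As a rough illustration of what a first profiling pass can look like (a generic sketch, not the article’s own approach), the snippet below summarises each column’s type, null rate, distinct count, and value range.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return a quick per-column profile: dtype, null rate, distinct count, range."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_pct": df.isna().mean().round(3),
        "distinct": df.nunique(),
    })
    # Ranges only make sense for numeric and datetime columns; others stay NaN.
    measurable = df.select_dtypes(include=["number", "datetime"])
    summary["min"] = measurable.min()
    summary["max"] = measurable.max()
    return summary

# Toy example: the null id, the duplicate id, and the negative amount
# all surface in the resulting profile.
orders = pd.DataFrame({"id": [1, 2, 2, None], "amount": [10.0, -5.0, 3.2, 8.1]})
print(profile(orders))
```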

  • Data Modelling
Sunday, January 28, 2024
Taming the Chaos: Your Guide to Data Normalisation

Introduction
Have you ever felt like you were drowning in a sea of data, where every byte seemed to play a game of hide and seek? In the digital world, where data reigns supreme, it’s not uncommon to find oneself navigating through a labyrinth of disorganised, redundant, and inconsistent information. But fear not, brave data navigators! There exists a beacon of order in this chaos: data normalisation. Data normalisation isn’t just a set of rules to follow; it’s the art of bringing structure and clarity to your data universe. It’s about transforming a jumbled jigsaw puzzle into a masterpiece of organisation, where every piece fits perfectly. Let’s embark on a journey to demystify this hero of the database world and discover how it can turn your data nightmares into a dream of efficiency and accuracy.
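
As a toy illustration of the redundancy that normalisation removes (my own example, not from the post), the sketch below splits a flat orders table so customer attributes are stored once rather than repeated on every order.

```python
import pandas as pd

# A flat, denormalised table: customer details are repeated on every order.
flat = pd.DataFrame({
    "order_id":       [1, 2, 3],
    "customer_email": ["ada@example.com", "ada@example.com", "bob@example.com"],
    "customer_name":  ["Ada", "Ada", "Bob"],  # redundant copies invite inconsistency
    "amount":         [10.0, 25.0, 7.5],
})

# Customer attributes live in exactly one row per customer...
customers = (
    flat[["customer_email", "customer_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# ...and orders keep only a key back to the customer, not copies of their fields.
orders = flat[["order_id", "customer_email", "amount"]]

print(customers)
print(orders)
```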

  • Data Modelling
Sunday, January 21, 2024