Ghost in the data
Individual Contributor to Senior Manager of Data

Introduction

Starting a new role at any organization, whether it’s a school, a workplace, or another setting, typically begins with a focus on individual contribution. Your success is directly tied to your personal efforts. You control the pace and quality of your work, and ultimately you alone are accountable for your outcomes. This phase allows you to develop the skills and discipline necessary to excel in more complex roles.

The Path to Success as an Individual Contributor

I likely spent longer in this phase than most. I always had the mindset of making my manager, and by extension my team, look good. This meant not only delivering quality work but also taking full accountability for my tasks.

    Saturday, August 10, 2024
    Enhance Workplace Relationships

    Introduction: A Tale of Two Tribes and the Modern Workplace

    Imagine a serene summer camp in the rugged heart of Robbers Cave State Park, Oklahoma, 1954. Two groups of boys, unaware of each other’s existence, are about to embark on an adventure that mirrors the timeless tale of rivalry and reconciliation, a story that still resonates in the corridors of contemporary workplaces. The Robbers Cave Experiment, conducted by social psychologist Muzafer Sherif, is not just a fascinating study on group dynamics; it’s a blueprint for understanding and enhancing cooperation in any setting where diverse minds meet. This experiment beautifully illustrates how perceived differences can dissolve into unity, given the right conditions and shared objectives.

    • Robbers Cave Experiment
    • Workplace Relationships
    • Team Collaboration
    • Intergroup Conflict
    • Employee Productivity
    Saturday, April 6, 2024
    Mastering Data Engineering: Insights and Best Practices

    Introduction

    I have been working with data for a bit over 17 years now, and I have seen it evolve from its nascent stages to a cornerstone of the tech industry. The journey has been nothing short of revolutionary, impacting businesses and society at large. The role of a data engineer has expanded along the way, requiring not just technical skills but a deep understanding of business, security, and the human element within technology.

    • Culture
    • Continuous Learning
    • Data Quality
    • Professional Growth
    • Data Pipeline
    • Data System Resilience
    • Team Collaboration
    Saturday, March 30, 2024
    How to Find and Attract Top Data Engineers

    Introduction

    Whenever I fill open positions, I tend to get inundated with resumes. As I sift through applications, my reaction varies from “this might work” to a straightforward “no”. Rarely do I encounter a resume that makes me exclaim, “This person is exceptional! We need them on our team.” Despite reviewing thousands of job applications, I still find the search for a standout Data Engineer challenging. I believe there’s a reason for this rarity. The truth is that the most talented Data Engineers, along with top professionals in any field, are seldom actively seeking employment.

    • Culture
    • Employee Engagement
    • Hiring Strategies
    • Talent Acquisition
    • Recruitment
    Thursday, March 14, 2024
    Docker and Airflow: A Comprehensive Setup Guide

    Introduction

    Docker and Airflow are like peanut butter and jelly for data engineers; they just work perfectly together. Docker simplifies deployment by wrapping your applications in containers, ensuring consistency across environments. It’s like having a genie that makes sure your software behaves the same, no matter where you deploy it. On the flip side, Airflow is the maestro of orchestrating complex workflows, making it a go-to tool for managing data pipelines in various organizations.

    • Apache Airflow
    • ETL Pipeline
    • Data Engineering
    • PostgreSQL
    Saturday, March 9, 2024
    Optimizing CI/CD with SlimCi DBT for Efficient Data Engineering

    Introduction

    In the rapidly evolving landscape of software development and data engineering, the ability to adapt and respond to changes quickly is not just an advantage; it’s a necessity. One of the core practices enabling this agility is Continuous Integration (CI), a methodology that encourages developers to integrate their work into a shared repository early and often. At its heart, CI embodies the “fail fast” principle, a philosophy that values early detection of errors and inconsistencies, allowing teams to address issues before they escalate into more significant problems.

    • Pipeline
    Saturday, February 17, 2024
    Embracing Defensive Engineering: A Proactive Approach to Data Pipeline Integrity

    Introduction

    Have you ever had a data pipeline fall apart due to unexpected errors? In the ever-evolving landscape of data, surprises lurk around every corner. Defensive engineering, a methodology focused on preempting and mitigating data anomalies, plays a crucial role in building reliable data pipelines. It’s not just about fixing problems as they arise; it’s about anticipating potential issues and addressing them before they wreak havoc. Below I’ll explore the various facets of defensive engineering, from the basics of handling nulls and type mismatches to the more complex challenges of ensuring data integrity and handling late-arriving data. Whether you’re a seasoned data engineer or just starting out, understanding these principles is key to creating data pipelines that are not just functional, but also robust and secure in the face of unpredictable data challenges.

    • Data Modelling
    Sunday, February 11, 2024
    Navigating the Data Labyrinth: The Art of Data Profiling

    Introduction

    Imagine navigating a sprawling network of interconnected threads, each strand holding a vital clue. That’s the world of data for us, and profiling is our key to unlocking its secrets. It’s like deciphering a cryptic message, each character a piece of information waiting to be understood. But why is this so important? Ever encountered an error in your analysis, or a misleading conclusion based on faulty data? Data profiling helps us avoid these pitfalls by ensuring the data we work with is accurate, consistent, and ready to yield valuable insights. It’s like building a sturdy foundation before constructing a skyscraper.

    • Data Modelling
    Sunday, January 28, 2024
    Taming the Chaos: Your Guide to Data Normalisation

    Introduction

    Have you ever felt like you were drowning in a sea of data, where every byte seemed to play a game of hide and seek? In the digital world, where data reigns supreme, it’s not uncommon to find oneself navigating through a labyrinth of disorganised, redundant, and inconsistent information. But fear not, brave data navigators! There exists a beacon of order in this chaos: data normalisation. Data normalisation isn’t just a set of rules to follow; it’s the art of bringing structure and clarity to your data universe. It’s about transforming a jumbled jigsaw puzzle into a masterpiece of organisation, where every piece fits perfectly. Let’s embark on a journey to demystify this hero of the database world and discover how it can turn your data nightmares into a dream of efficiency and accuracy.

    • Data Modelling
    Sunday, January 21, 2024
    Delta-lake - Z-Ordering, Z-Cube, Liquid Clustering and Partitions

    Introduction

    Ever feel like your data lake is more of a data swamp, swallowing queries whole and spitting out eternity? You’re not alone. Managing massive datasets can be a Herculean task, especially when it comes to squeezing out those precious milliseconds of query performance. But fear not, data warriors, for Delta Lake has hidden treasures waiting to be unearthed: Z-ordering, Z-cube, and liquid clustering.

    Partition Pruning: The OG Hero

    Before we dive into these exotic beasts, let’s pay homage to the OG hero of data organization: partition pruning. Imagine your data lake as a meticulously organized library, with each book (partition) shelved by a specific topic (partition column). When a query saunters in, it doesn’t have to wander through every aisle. It simply heads straight for the relevant section, drastically reducing the time it takes to find what it needs. That’s the magic of partition pruning!
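
The library analogy can be made concrete with a toy sketch in plain Python. This is not how Delta Lake implements pruning; it only illustrates the idea that a filter on the partition column lets a query skip every other partition. The function names (`build_partitions`, `query`), the `event_date` column, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def build_partitions(rows, partition_column):
    """Group rows by the value of the partition column (one 'shelf' per value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_column]].append(row)
    return partitions


def query(partitions, partition_value, predicate=lambda row: True):
    """Scan only the partition matching the filter; all others are skipped."""
    scanned = partitions.get(partition_value, [])
    matches = [row for row in scanned if predicate(row)]
    return matches, len(scanned)


# Hypothetical sample data, partitioned by event_date.
rows = [
    {"event_date": "2024-01-13", "user": "a", "amount": 10},
    {"event_date": "2024-01-13", "user": "b", "amount": 25},
    {"event_date": "2024-01-14", "user": "a", "amount": 7},
    {"event_date": "2024-01-14", "user": "c", "amount": 40},
    {"event_date": "2024-01-15", "user": "b", "amount": 3},
]

partitions = build_partitions(rows, "event_date")
matches, rows_scanned = query(
    partitions, "2024-01-14", lambda row: row["amount"] > 5
)
print(len(matches), "matches after scanning", rows_scanned, "of", len(rows), "rows")
```

In this sketch the query reads two of the five rows. In a real table the same effect comes from skipping whole directories or files whose partition value cannot match the filter.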

    • Delta-lake
    Sunday, January 14, 2024