Ghost in the data
2023 - Books that are worth your time?

Introduction

As a Data Engineer, it’s crucial to constantly improve your skills and knowledge to stay ahead of the curve. Whether it’s working with large data sets, building efficient data pipelines, or collaborating with a team, there are many different aspects to consider. To help you succeed, I’ve put together a list of books that cover a range of topics, from culture and team building to Python and SQL. Each of the books I’ve selected offers valuable insights and practical advice to help you become a better Data Engineer. Whether you’re looking to strengthen your coding skills, learn how to effectively communicate with your team, or improve your organization’s data processes, there’s something here for everyone. So, without further ado, let’s dive into the books that can help you take your skills to the next level.

  • Development
Sunday, March 5, 2023

Data Vault Data Modeling with Python and dbt

Introduction

Data Vault is a data modeling technique that is specifically designed for use in Data Warehouses. It is a hybrid approach that combines the best elements of 3rd Normal Form (3NF) and Star Schema to provide a flexible and scalable data modeling solution.

Hubs, Links, Satellites

A Data Vault consists of three main components: Hubs, Links, and Satellites. Hubs are the backbone of the Data Vault architecture and represent the entities within the data model. They are the core data elements and contain the primary key information.
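To make the three components a little more concrete, here is a minimal Python sketch. It is an illustration only, not the approach from the full post: the class fields, the `hash_key` helper, the use of MD5 hashes as keys, and the sample customer and order values are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hypothetical helper: derive a deterministic key from business keys."""
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


@dataclass
class Hub:
    """One row per entity, identified by its business key."""
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass
class Link:
    """A relationship between two or more hubs."""
    link_key: str
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass
class Satellite:
    """Descriptive attributes attached to a hub (or link) at a point in time."""
    parent_key: str
    attributes: dict
    load_date: datetime
    record_source: str


now = datetime.now(timezone.utc)

# Sample values below are made up for illustration.
customer = Hub(hash_key("C-1001"), "C-1001", now, "crm")
order = Hub(hash_key("O-42"), "O-42", now, "shop")
placed = Link(
    hash_key("C-1001", "O-42"),
    (customer.hub_key, order.hub_key),
    now,
    "shop",
)
details = Satellite(customer.hub_key, {"name": "Ada"}, now, "crm")
```

Because the key is derived only from the normalised business key, loading the same customer from two sources yields the same `hub_key`, which is one reason hashed keys are a common (though not mandatory) choice.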

  • Data Vault
  • Python
  • DBT
  • ETL
  • Data Warehouse Architecture
Sunday, February 26, 2023

Navigating Incident Response Management with DevOps

Introduction

Incident response management (IRM) is a critical aspect of any organization’s overall security and risk management strategy. In today’s fast-paced, technology-driven world, IT incidents can occur at any time, and it’s important to have a plan in place to manage them effectively and minimize their impact on your organization. The IRM lifecycle is a structured approach to managing incidents, from identification to resolution, and it involves a range of activities, including communication, coordination, and control. In this post, I’ll explore the IRM lifecycle in detail and discuss the roles and responsibilities of different individuals during each stage. I’ll also compare traditional incident management with DevOps incident management, and discuss the advantages of adopting a DevOps approach.

  • Incident Response
  • Risk Management
Sunday, February 19, 2023

Choosing the Right File Format for Big Data: A Comparison of Parquet, JSON, ORC, Avro, and CSV

Introduction

How you store your data is a critical component of data engineering, as it determines the speed, efficiency, and compatibility of data storage and retrieval. Let’s have a look at some of the popular file formats: Parquet, JSON, ORC, Avro, and CSV. We’ll compare their pros and cons, performance differences between reading and writing, and the importance of predicate pushdown and projection pushdown.

What are predicate pushdown and projection pushdown?

Predicate pushdown and projection pushdown are two performance optimization techniques used in big data processing. They allow query engines to reduce the amount of data that needs to be processed by pushing down filter conditions and column projections to the storage layer.
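The idea can be sketched with a toy in-memory store in plain Python. This is a simplified illustration, not how any particular engine or file format implements it: the `RowGroup` class, the `scan` function, and the sample data are all hypothetical. Columnar formats such as Parquet and ORC keep min/max statistics in file metadata, whereas this sketch computes them on the fly to stay short.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of rows stored column by column (toy model)."""
    columns: dict  # column name -> list of values

    def stats(self, column):
        # Real formats read these from metadata instead of scanning values.
        values = self.columns[column]
        return min(values), max(values)


def scan(row_groups, wanted_columns, filter_column, min_value):
    """Return rows where filter_column >= min_value, plus groups actually read."""
    rows = []
    groups_read = 0
    for group in row_groups:
        _, highest = group.stats(filter_column)
        # Predicate pushdown: skip the whole group if no row can match.
        if highest < min_value:
            continue
        groups_read += 1
        filter_values = group.columns[filter_column]
        # Projection pushdown: touch only the requested columns.
        projected = {name: group.columns[name] for name in wanted_columns}
        for i, value in enumerate(filter_values):
            if value >= min_value:
                rows.append({name: projected[name][i] for name in wanted_columns})
    return rows, groups_read


# Made-up sample data: two row groups of two rows each.
groups = [
    RowGroup({"year": [2019, 2020], "city": ["Oslo", "Lima"], "sales": [10, 20]}),
    RowGroup({"year": [2022, 2023], "city": ["Rome", "Perth"], "sales": [30, 40]}),
]

rows, groups_read = scan(groups, ["city", "sales"], "year", 2022)
print(rows)         # [{'city': 'Rome', 'sales': 30}, {'city': 'Perth', 'sales': 40}]
print(groups_read)  # 1
```

Only one of the two row groups is read, and the `year` column never appears in the result rows, which is the saving the two techniques aim for at much larger scale.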

  • File Formats
  • ORC
  • AVRO
  • CSV
  • JSON
  • Parquet
  • Schema Evolution
Sunday, February 12, 2023

Onboarding a data team

Introduction

Onboarding matters: it creates a first impression and sets up the scaffolding for what a new employee expects the company culture to be. Unfortunately, it is rarely a great experience. It usually starts with spending the first two weeks getting access to systems and tools. Once you have access, especially in a remote working environment, you sometimes get only a brief introduction from your manager and a small glimpse of what you will be working on. This common scenario leaves people feeling less connected to the team, with no excitement or passion for the work. They feel undervalued, which isn’t a great start on day one or two.

  • Development
  • Onboarding
  • Employee Engagement
  • Culture
  • Tools and Access
  • Remote Work
Sunday, January 29, 2023

Create a Minecraft (PE) Bedrock Server in GCP

Introduction

My eldest daughter plays Minecraft Pocket Edition (PE) with her friends a lot. The PE edition really limits them to Wii U, Xbox, or iPad. It is the most convenient version for them, as they all have iPads. Occasionally the world they are all playing on gets corrupted and she loses all the work they have done. There are ways to recover it, but it’s rather time-consuming.

  • GCP
  • Minecraft
  • Cloud Gaming
  • Server Setup
  • Family Gaming
  • Bedrock Edition
Sunday, January 15, 2023

Journalling

Introduction

Journalling is a habit that helps us reflect on the year and coach ourselves to be better. There is an element of getting it down in writing that allows us to “slow” down and focus on what we are writing. There is something about journalling in physical form: when you reflect on a past journal, there is a bond with your own handwriting, and you can go back to that moment in time and remember when you wrote it.

  • Development
  • Journal
  • Journaling Techniques
  • Success Habits
  • Self-Reflection
  • Mindfulness Practices
  • Inspirational Quote
  • Personal Growth
Sunday, January 8, 2023

Setting up GitBash with SSH on Windows

Introduction

Git is an everyday tool for me. I remember reading about distributed version control from Joel Spolsky years ago, when we decided to switch from SVN to Git. Although I use it daily, SSH keys for repositories are mostly set and forget. So whenever I get a new laptop or need to re-initialize a repo, I have to re-teach myself the steps to get it working. This is a bit of a guide to help me navigate and get back on track the next time I’m lost.

  • Github
  • SSH
  • Git
  • GitBash
  • SSH Keys
Sunday, January 1, 2023