Ghost in the data
  • Home
  • About
  • Posts
  • Topics
  • Resources
  • Categories
  • AI Development
  • Analytics Engineering
  • Artificial Intelligence
  • AWS
  • Banking
  • Best Practices
  • Big Data
  • Business Technology
  • Career Development
  • Career Growth
  • Cloud Computing
  • Cloud Infrastructure
  • Communication
  • Conflict Resolution
  • Data Architecture
  • Data Culture
  • Data Engineering
  • Data Ethics
  • Data Governance
  • Data Modeling
  • Data Modelling
  • Data Pipelines
  • Data Privacy
  • Data Quality
  • Data Storage
  • Data Warehousing
  • Database Design
  • Dbt
  • Delta-Lake
  • Development
  • Development Tools
  • DevOps
  • Employee Engagement
  • Gaming Servers
  • Google Cloud Platform
  • Hiring
  • Industry Analysis
  • Interviews
  • IT Management
  • Leadership
  • Life Hacks
  • Mindfulness
  • Minecraft
  • Personal Development
  • Personal Finance
  • Pipeline
  • Pipeline Design
  • Productivity
  • Professional Development
  • Professional Growth
  • Promotion
  • Psychology
  • Python
  • Python Tools
  • Setup Guide
  • SQL
  • Stakeholder Management
  • Team Building
  • Team Culture
  • Team Management
  • Technical Architecture
  • Technology Trends
  • Tutorial
  • User Experience
  • Version Control
  • Workplace Dynamics
Hero Image
How to Find and Attract Top Data Engineers

Introduction In my journey of filling open positions, I tend to get inundated with a multitude of resumes. Sifting through applications, your reaction varies from “this might work,” to a straightforward “no”. Rarely do I encounter a resume that makes me exclaim, “This person is exceptional! We need them on our team.” Despite reviewing thousands of job applications, the quest to find a standout Data Engineer often feels challenging. I believe there’s a reason for this rarity. The truth is, that the most talented Data Engineers, along with top professionals in any field, are seldom actively seeking employment.

  • Culture
  • Employee Engagement
  • Hiring Strategies
  • Talent Acquisition
  • Recruitment
Thursday, March 14, 2024 Read
Hero Image
Data Vault Data Modeling with Python and dbt

Introduction Data Vault is a data modeling technique that is specifically designed for use in Data Warehouses. It is a hybrid approach that combines the best elements of 3rd Normal Form (3NF) and Star Schema to provide a flexible and scalable data modeling solution. Hubs, Links, Satellites A Data Vault consists of three main components: Hubs, Links, and Satellites. Hubs are the backbone of the Data Vault architecture and represent the entities within the data model. They are the core data elements and contain the primary key information.

  • Data Vault
  • Python
  • DBT
  • ETL
  • Data Warehouse Architecture
Sunday, February 26, 2023 Read
Hero Image
Choosing the Right File Format for Big Data: A Comparison of Parquet, JSON, ORC, Avro, and CSV

Introduction How you store your data is a critical component of data engineering, as they determine the speed, efficiency, and compatibility of data storage and retrieval. Lets have a look at some of the popular file formats: Parquet, JSON, ORC, Avro, and CSV. We’ll compare their pros and cons, performance differences between reading and writing, and the importance of predicate pushdown and projection pushdown. What is Predicate pushdown and Projection pushdown? Predicate pushdown and projection pushdown are two performance optimization techniques used in big data processing. They allow query engines to reduce the amount of data that needs to be processed by pushing down filter conditions and column projections to the storage layer.

  • File Formats
  • ORC
  • AVRO
  • CSV
  • JSON
  • Parquet
  • Schema Evolution
Sunday, February 12, 2023 Read
  • ««
  • «
  • 1
  • 2
  • 3
  • »
  • »»