Ghost in the data
Data Vault Data Modeling with Python and dbt

Introduction

Data Vault is a data modeling technique designed specifically for use in data warehouses. It is a hybrid approach that combines the best elements of Third Normal Form (3NF) and Star Schema to provide a flexible and scalable data modeling solution.

Hubs, Links, Satellites

A Data Vault consists of three main components: Hubs, Links, and Satellites. Hubs are the backbone of the Data Vault architecture and represent the entities within the data model. They are the core data elements and contain the primary key information.
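The post pairs this design with Python and dbt. As a minimal sketch of the Hub idea, assuming the common Data Vault convention of deriving a surrogate hash key from normalized business keys (the entity, keys, and source names here are hypothetical):

```python
import hashlib
from datetime import date

def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate hash key from one or more business keys."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# A Hub row for a hypothetical customer entity: the business key,
# its hash-based surrogate key, and standard load metadata.
hub_customer = {
    "customer_hk": hash_key("C-1001"),   # surrogate hash key
    "customer_bk": "C-1001",             # business key from the source system
    "load_date": date(2023, 2, 26).isoformat(),
    "record_source": "crm",
}
```

Links and Satellites follow the same pattern: a Link row stores the hash keys of the Hubs it connects, and a Satellite row stores descriptive attributes keyed by a Hub or Link hash key plus the load date.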

  • Data Vault
  • Python
  • DBT
  • ETL
  • Data Warehouse Architecture
Sunday, February 26, 2023
Choosing the Right File Format for Big Data: A Comparison of Parquet, JSON, ORC, Avro, and CSV

Introduction

How you store your data is a critical component of data engineering: the file format determines the speed, efficiency, and compatibility of data storage and retrieval. Let's look at some of the popular file formats: Parquet, JSON, ORC, Avro, and CSV. We'll compare their pros and cons, the performance differences between reading and writing, and the importance of predicate pushdown and projection pushdown.

What are Predicate Pushdown and Projection Pushdown?

Predicate pushdown and projection pushdown are two performance optimization techniques used in big data processing. They allow query engines to reduce the amount of data that needs to be processed by pushing filter conditions and column projections down to the storage layer.
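As a minimal sketch of both pushdowns using PyArrow's Parquet reader (the file and column names are hypothetical):

```python
import pyarrow.parquet as pq

# Projection pushdown: read only the columns the query needs.
# Predicate pushdown: skip row groups whose min/max statistics
# prove no row can match the filter.
table = pq.read_table(
    "events.parquet",                  # hypothetical input file
    columns=["user_id", "amount"],     # projection pushdown
    filters=[("amount", ">", 100)],    # predicate pushdown
)
print(table.num_rows)
```

Columnar formats such as Parquet and ORC can honor both pushdowns because they store per-column chunks with min/max statistics; row-oriented text formats like CSV and JSON must be read in full and filtered afterwards.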

  • File Formats
  • ORC
  • AVRO
  • CSV
  • JSON
  • Parquet
  • Schema Evolution
Sunday, February 12, 2023