Ghost in the data
Delta-lake - Z-Ordering, Z-Cube, Liquid Clustering and Partitions

Introduction

Ever feel like your data lake is more of a data swamp, swallowing queries whole and taking an eternity to return results? You’re not alone. Managing massive datasets can be a Herculean task, especially when it comes to squeezing out those precious milliseconds of query performance. But fear not, data warriors, for Delta Lake has hidden treasures waiting to be unearthed: Z-ordering, Z-cube, and liquid clustering.

Partition Pruning: The OG Hero

Before we dive into these exotic beasts, let’s pay homage to the OG hero of data organization: partition pruning. Imagine your data lake as a meticulously organized library, with the books shelved into sections (partitions) by topic (the partition column). When a query that filters on that column saunters in, it doesn’t have to wander through every aisle. It simply heads straight for the relevant section and skips the rest, drastically reducing the time it takes to find what it needs. That’s the magic of partition pruning!
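To make the library analogy concrete, here is a minimal sketch in plain Python. It is a toy model, not how Delta Lake is implemented: the rows, the `country` partition column, and the helper names `write_partitioned` and `query` are all hypothetical, chosen only for illustration. The point it shows is that a filter on the partition column lets the reader open one bucket and never touch the others.

```python
from collections import defaultdict

# Hypothetical rows; "country" plays the role of the partition column.
rows = [
    {"country": "AU", "order_id": 1, "amount": 20.0},
    {"country": "NZ", "order_id": 2, "amount": 35.5},
    {"country": "AU", "order_id": 3, "amount": 12.5},
    {"country": "US", "order_id": 4, "amount": 99.0},
]


def write_partitioned(rows, partition_col):
    """Group rows by partition value, like shelving books by topic."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)
    return dict(partitions)


def query(partitions, partition_value, predicate=lambda row: True):
    """Scan only the partition matching the filter; skip every other one.

    Returns the matching rows and how many rows were actually scanned.
    """
    scanned = partitions.get(partition_value, [])  # pruning happens here
    matches = [row for row in scanned if predicate(row)]
    return matches, len(scanned)


table = write_partitioned(rows, "country")
matches, rows_scanned = query(table, "AU", lambda r: r["amount"] > 15)
print(matches)       # rows from the AU partition with amount > 15
print(rows_scanned)  # only the AU rows were read, not the whole table
```

In a real partitioned table the buckets are typically directories on storage rather than dictionary keys, and the engine decides which ones to skip from the query's filter, but the principle is the same: the fewer partitions a query has to open, the less data it reads.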

  • Delta-lake
Sunday, January 14, 2024