Data Ownership

The Data Quality Test: 10 Questions That Predict Pipeline Disasters

I’ve been writing about data quality a lot lately. Enough that I notice myself doing it. Enough that a small voice says: haven’t you made this point already? Schema drift, NULL propagation, duplicate records, the whole catalogue of things that go wrong in the space between a source system and a warehouse. I keep circling back to it. And every time, I almost talk myself out of writing the piece. Then I think back over the last few years of work, the postmortems I’ve read and the pipelines I’ve inherited, and the same pattern shows up with depressing regularity. Not exotic failures. Not edge cases. The boring stuff. The questions nobody asked before the first row hit the warehouse.

Friday, April 11, 2025