The Broken Window in Your Data Pipeline
There’s a particular kind of data problem that doesn’t announce itself. It accumulates.
We were receiving Salesforce data through delta extraction, which is sensible in theory: full snapshots can run to hundreds of terabytes, and less than 1% of records change on any given day. The problem is that deltas require someone to know what “changed” means. In Salesforce, that’s less obvious than it sounds. Watch a last_modified column and you’ll miss objects whose values change when a related object is updated, because their own timestamp never moves.
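To make that failure mode concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the record shape, the account_tier field (standing in for any value derived from a related object), and the two helper functions are assumptions, not our actual pipeline or any Salesforce API.

```python
from datetime import datetime


def delta_by_timestamp(rows, watermark):
    """Return ids of rows whose last_modified is newer than the watermark."""
    return {r["id"] for r in rows if r["last_modified"] > watermark}


def delta_by_content(previous, current):
    """Return ids of rows whose non-timestamp values differ between two snapshots."""
    def payload(row):
        return {k: v for k, v in row.items() if k != "last_modified"}

    prev_by_id = {r["id"]: payload(r) for r in previous}
    return {r["id"] for r in current if prev_by_id.get(r["id"]) != payload(r)}


# Last successful extraction ran at this point in time (hypothetical).
watermark = datetime(2024, 1, 2)

# Hypothetical child record: account_tier is derived from its parent.
yesterday = [
    {"id": "c1", "account_tier": "silver", "last_modified": datetime(2024, 1, 1)},
]
# The parent was upgraded, so the derived value changed,
# but the child's own timestamp did not move.
today = [
    {"id": "c1", "account_tier": "gold", "last_modified": datetime(2024, 1, 1)},
]

# Rows that really changed but that the timestamp-based delta never sees.
missed = delta_by_content(yesterday, today) - delta_by_timestamp(today, watermark)
print(missed)  # {'c1'}
```

The content comparison only works here because both snapshots fit in memory. At the scale described above it is exactly the expensive operation that deltas were meant to avoid, which is why the gap tends to go unnoticed.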