<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>OpenFlow on Ghost in the data</title><link>https://ghostinthedata.info/tags/openflow/</link><description>Ghost in the data</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Ghost in the data</copyright><lastBuildDate>Sat, 04 Apr 2026 09:00:00 +1100</lastBuildDate><atom:link href="https://ghostinthedata.info/tags/openflow/index.xml" rel="self" type="application/rss+xml"/><item><title>Stop Building Salesforce Integrations From Scratch</title><link>https://ghostinthedata.info/posts/2026/2026-04-04-snowflake-openflow/</link><pubDate>Sat, 04 Apr 2026 09:00:00 +1100</pubDate><guid>https://ghostinthedata.info/posts/2026/2026-04-04-snowflake-openflow/</guid><author>Chris Hillman</author><description>A hands-on guide to Snowflake's OpenFlow Salesforce connector — why managed connectors beat custom code, how to set one up step by step, and the schema evolution feature that makes it all worth it.</description><content:encoded>&lt;p>Let me tell you about Marcus.&lt;/p>
&lt;br>
&lt;p>Marcus was on a team I led a few years back. Sharp, motivated, the kind of engineer who actually read documentation before writing code. When the business asked us to get Salesforce data into our warehouse, Marcus volunteered. He&amp;rsquo;d done API work before. He figured a few weeks, tops.&lt;/p>
&lt;p>He scoped it carefully. Built a Python service that authenticated via OAuth, pulled Account, Contact, and Opportunity objects through the Bulk API, flattened the nested JSON into relational tables, handled pagination, managed rate limits. Wrote solid tests. Documented everything. The kind of work you&amp;rsquo;d point to in a code review and say &lt;em>this is how it&amp;rsquo;s done&lt;/em>.&lt;/p>
&lt;p>It worked beautifully. For about three months.&lt;/p>
&lt;br>
&lt;p>Then our Salesforce admin added a custom field to the Opportunity object — &lt;code>Renewal_Likelihood__c&lt;/code> — and nobody told the data team. The pipeline didn&amp;rsquo;t fail. That&amp;rsquo;s the insidious part. It kept running, kept landing data. It just quietly dropped the new field on the floor.&lt;/p>
&lt;p>When Marcus tracked it down, he added the field. Then he realised there were &lt;em>eleven&lt;/em> other custom fields that had been added since go-live that the pipeline was silently ignoring. And the compound address fields — &lt;code>BillingAddress&lt;/code>, &lt;code>ShippingAddress&lt;/code> — had never worked with the Bulk API in the first place. He&amp;rsquo;d been extracting the component parts (&lt;code>BillingStreet&lt;/code>, &lt;code>BillingCity&lt;/code>) as a workaround, but the workaround had a bug that truncated postal codes for international addresses.&lt;/p>
&lt;p>Marcus spent three weeks patching all of it. Three weeks of a talented engineer doing work that a managed connector handles automatically.&lt;/p>
&lt;br>
&lt;p>That&amp;rsquo;s when it sank in for me: I had watched a good engineer spend the best part of a month on a problem that shouldn&amp;rsquo;t exist. Marcus wasn&amp;rsquo;t learning anything. He wasn&amp;rsquo;t growing. He was hand-coding schema detection logic for the fourth time because Salesforce ships three major releases a year and our sales ops team adds custom fields like they&amp;rsquo;re decorating a Christmas tree.&lt;/p>
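&lt;p>The drift check Marcus kept re-writing boils down to a set difference. A minimal Python sketch (the function and sample values are mine, for illustration, not from any real pipeline):&lt;/p>

```python
# Compare the fields Salesforce reports for an object (via a describe
# call) against the columns the pipeline actually lands, and flag
# anything the extract is silently dropping.

def find_missing_fields(describe_fields, table_columns):
    """Return Salesforce fields with no matching warehouse column."""
    landed = {c.upper() for c in table_columns}
    return [f for f in describe_fields if f.upper() not in landed]

# The admin added Renewal_Likelihood__c; the pipeline only knows Id/Name.
missing = find_missing_fields(
    ["Id", "Name", "Renewal_Likelihood__c"],
    ["ID", "NAME"],
)
```

&lt;p>Run on a schedule, a check like this turns a silent drop into a loud alert; it is also exactly the plumbing a managed connector makes unnecessary.&lt;/p>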
&lt;p>I&amp;rsquo;m telling you this because &lt;strong>Snowflake OpenFlow&lt;/strong> is one of those things that sounds like &amp;ldquo;just another integration tool&amp;rdquo; until you&amp;rsquo;ve lived through the alternative. What I want to show you today is how to set it up with Salesforce — step by step, with enough detail that you could do it tomorrow — and more importantly, &lt;em>why&lt;/em> the schema evolution feature alone justifies the switch from custom code.&lt;/p>
&lt;p>If you&amp;rsquo;ve ever maintained a custom Salesforce integration, you already know why this matters.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="the-problem-nobody-warns-you-about">The Problem Nobody Warns You About&lt;/h3>
&lt;br>
&lt;p>Here&amp;rsquo;s what the &amp;ldquo;just build it yourself&amp;rdquo; crowd doesn&amp;rsquo;t tell you: the Salesforce API isn&amp;rsquo;t one API. It&amp;rsquo;s an ecosystem of overlapping interfaces, each with its own quirks, limits, and failure modes.&lt;/p>
&lt;p>The REST API handles small, synchronous requests well — but try pulling a million Opportunity records through it and you&amp;rsquo;ll burn through your rate limits before lunch. The Bulk API 2.0 is designed for volume — it can handle up to 150 million records in a 24-hour window — but it doesn&amp;rsquo;t support compound fields like &lt;code>BillingAddress&lt;/code> or &lt;code>MailingAddress&lt;/code>. Those silently return nothing. Not an error. Nothing.&lt;/p>
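&lt;p>The standard workaround is to never ask the Bulk API for a compound field at all and request its components instead. A sketch of the idea in Python (the field names are real Salesforce fields; the helper itself is illustrative):&lt;/p>

```python
# Map each compound address field to the component fields the
# Bulk API actually returns.
COMPOUND_COMPONENTS = {
    "BillingAddress": ["BillingStreet", "BillingCity", "BillingState",
                       "BillingPostalCode", "BillingCountry"],
    "ShippingAddress": ["ShippingStreet", "ShippingCity", "ShippingState",
                        "ShippingPostalCode", "ShippingCountry"],
}

def expand_compound_fields(fields):
    """Swap compound fields for their Bulk-API-safe components."""
    out = []
    for f in fields:
        out.extend(COMPOUND_COMPONENTS.get(f, [f]))
    return out

soql = ("SELECT " + ", ".join(expand_compound_fields(
    ["Id", "Name", "BillingAddress"])) + " FROM Account")
```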
&lt;p>Then there&amp;rsquo;s SOQL, Salesforce&amp;rsquo;s proprietary query language, which looks enough like SQL to trick you into thinking you understand it. Until you hit the 2,000-record offset limit, or try to resolve a polymorphic relationship where a &lt;code>WhoId&lt;/code> field on a Task could reference either a Contact or a Lead depending on the record. That&amp;rsquo;s not a data modelling problem — that&amp;rsquo;s a &amp;ldquo;your pipeline logic needs to branch based on the referenced object type&amp;rdquo; problem.&lt;/p>
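&lt;p>What that branching looks like in practice: Salesforce record IDs carry a three-character key prefix identifying the object (&lt;code>003&lt;/code> for Contact, &lt;code>00Q&lt;/code> for Lead), so pipeline code ends up routing on it. A simplified Python sketch, not production logic:&lt;/p>

```python
# Route a Task's polymorphic WhoId by its record-ID key prefix.
KEY_PREFIXES = {"003": "Contact", "00Q": "Lead"}

def resolve_who_type(who_id):
    """Identify which object a WhoId references."""
    return KEY_PREFIXES.get(who_id[:3], "Unknown")

def route_task(task):
    """Branch downstream handling on the referenced object type."""
    kind = resolve_who_type(task["WhoId"])
    if kind == "Contact":
        return ("join_to_contacts", task["WhoId"])
    if kind == "Lead":
        return ("join_to_leads", task["WhoId"])
    return ("quarantine", task["WhoId"])
```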
&lt;p>And rate limits. Enterprise Edition gives you 100,000 API calls per 24 hours, plus 1,000 per user licence. Sounds generous until you remember that your BI tool, marketing automation platform, customer support system, and your data pipeline are all drinking from the same well.&lt;/p>
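&lt;p>The arithmetic is worth doing once, explicitly. A back-of-envelope in Python using the limits above; the licence count and consumer figures are invented for illustration:&lt;/p>

```python
# Enterprise Edition: 100,000 calls/day plus 1,000 per user licence.
def daily_api_budget(user_licences, base=100_000, per_licence=1_000):
    return base + per_licence * user_licences

budget = daily_api_budget(50)  # hypothetical 50-licence org

# Everything else drinking from the same well (made-up figures):
other_consumers = {
    "bi_tool_refreshes": 40_000,
    "marketing_automation": 55_000,
    "support_system_sync": 30_000,
}
left_for_your_pipeline = budget - sum(other_consumers.values())
```

&lt;p>Twenty-five thousand calls sounds like plenty until a backfill or a retry storm eats it in an afternoon.&lt;/p>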
&lt;br>
&lt;p>But schema evolution is the silent killer.&lt;/p>
&lt;p>Every Salesforce org is different because every business is different. Your Salesforce admin creates custom objects and fields to match how your company actually works. That&amp;rsquo;s the point — it&amp;rsquo;s a customisation platform. But every custom field added to Salesforce is a potential break point for any integration that doesn&amp;rsquo;t handle schema changes automatically.&lt;/p>
&lt;p>And those changes happen constantly. Research suggests schemas in enterprise SaaS tools change roughly every three days on average. In Salesforce specifically, between admin-driven customisation and Salesforce&amp;rsquo;s own three-times-a-year release cycle (Spring, Summer, Winter), the schema you built against last quarter may not be the schema you&amp;rsquo;re ingesting from today.&lt;/p>
&lt;p>This is the trap. Building the initial integration isn&amp;rsquo;t hard. Any competent engineer can get Salesforce data into a warehouse. &lt;strong>Keeping it working&lt;/strong> — across schema changes, API version deprecations, authentication token rotations, and rate limit adjustments — is where it eats your life.&lt;/p>
&lt;p>I&amp;rsquo;ve seen teams eliminate two full days of monthly engineering maintenance by migrating off custom connectors. Two days, every month, spent on work that a managed tool does automatically. That&amp;rsquo;s not engineering — it&amp;rsquo;s janitorial work wearing an engineering costume.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="what-is-snowflake-openflow">What Is Snowflake OpenFlow?&lt;/h3>
&lt;br>
&lt;p>OpenFlow is Snowflake&amp;rsquo;s native, fully managed data integration service. It&amp;rsquo;s not a partnership or a marketplace app — it&amp;rsquo;s a first-party feature built into the platform. If you&amp;rsquo;re already running Snowflake, OpenFlow runs inside your existing environment.&lt;/p>
&lt;p>The backstory matters. In November 2024, Snowflake acquired a company called &lt;strong>Datavolo&lt;/strong> for roughly $110 million. Datavolo was founded by Joe Witt, who co-created Apache NiFi back when it was an NSA project. That NiFi heritage is the foundation of OpenFlow — curated, versioned connector definitions powered by NiFi&amp;rsquo;s flow engine, wrapped in a managed Snowflake experience.&lt;/p>
&lt;p>Architecturally, it splits into two pieces. The &lt;strong>control plane&lt;/strong> lives inside Snowflake and gives you pipeline management, scheduling, and monitoring through the Snowsight UI. The &lt;strong>data plane&lt;/strong> handles the actual work and can run in two modes: &lt;strong>Snowflake Deployments&lt;/strong> (running on Snowpark Container Services within Snowflake&amp;rsquo;s own infrastructure) or &lt;strong>BYOC&lt;/strong> (Bring Your Own Cloud, running as a Kubernetes cluster in your VPC). For most teams starting out, the SPCS option is simpler — it went generally available in November 2025 on AWS and Azure.&lt;/p>
&lt;p>Data flows into Snowflake primarily through Snowpipe Streaming for the initial load, with merge queries handling incremental updates. The result is structured tables in your Snowflake database — not staged JSON files, not semi-structured VARIANT columns. Actual, queryable tables with proper column types.&lt;/p>
&lt;br>
&lt;p>The connector catalogue currently covers about 20 curated sources: databases (PostgreSQL, MySQL, SQL Server, Oracle), SaaS platforms (Salesforce, Workday, Jira, LinkedIn Ads, Meta Ads, Google Ads), cloud storage (Google Drive, SharePoint, Box), and streaming sources (Kafka, Kinesis). It&amp;rsquo;s not Fivetran&amp;rsquo;s 500+ connector library, and nobody&amp;rsquo;s pretending it is. But the sources it does cover are the ones that account for the vast majority of enterprise data movement.&lt;/p>
&lt;p>One important caveat up front: the &lt;strong>Salesforce Bulk API connector is still in preview&lt;/strong> as of March 2026. It works, but &amp;ldquo;preview&amp;rdquo; in Snowflake terms means the feature is functional and not yet covered by production support SLAs. Keep that in your mental model as we walk through the setup.&lt;/p>
&lt;hr>
&lt;br>
&lt;br>
&lt;h3 id="setting-up-the-salesforce-connector-a-walkthrough">Setting Up the Salesforce Connector: A Walkthrough&lt;/h3>
&lt;br>
&lt;p>This section is the step-by-step. I&amp;rsquo;m going to walk through the full setup — Salesforce configuration, Snowflake preparation, and connector deployment — with enough detail that you could follow along on your own org.&lt;/p>
&lt;br>
&lt;h4 id="phase-1-salesforce-configuration">Phase 1: Salesforce Configuration&lt;/h4>
&lt;p>The connector uses JWT Bearer Flow for authentication, which means you need an RSA key pair and an External Client App (the successor to Connected Apps) configured in Salesforce.&lt;/p>
&lt;p>&lt;strong>Generate your RSA key pair.&lt;/strong> Open a terminal and run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Generate 2048-bit RSA private key&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>openssl genrsa -out salesforce_private_key.pem &lt;span style="color:#ae81ff">2048&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Generate the corresponding public certificate (valid 1 year)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>openssl req -new -x509 -key salesforce_private_key.pem &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -out salesforce_certificate.pem -days &lt;span style="color:#ae81ff">365&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Keep that private key safe. You&amp;rsquo;ll need it when configuring the connector in Snowflake.&lt;/p>
&lt;p>&lt;strong>Create the External Client App in Salesforce.&lt;/strong> Navigate to Setup → Apps → App Manager → New External Client App. Enable OAuth and add two scopes: &lt;code>api&lt;/code> and &lt;code>refresh_token&lt;/code> (sometimes labelled &lt;code>offline_access&lt;/code>). Enable the JWT Bearer Flow toggle and upload your &lt;code>salesforce_certificate.pem&lt;/code> public certificate. Save the app and record the &lt;strong>Consumer Key&lt;/strong> and &lt;strong>Consumer Secret&lt;/strong> that Salesforce generates.&lt;/p>
&lt;p>&lt;strong>Configure access.&lt;/strong> On the app&amp;rsquo;s Manage settings, set the Permitted Users policy to &amp;ldquo;Admin approved users are pre-authorized.&amp;rdquo; Then assign the appropriate user profiles or permission sets. The user account the connector will authenticate as needs read access to every object you intend to sync.&lt;/p>
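&lt;p>For the curious, the JWT Bearer Flow the connector performs with these pieces is easy to sketch. The claims and grant type below are the documented Salesforce values; the Consumer Key and username are placeholders, and the signing step (RS256 with &lt;code>salesforce_private_key.pem&lt;/code>, e.g. via PyJWT) is left as a comment:&lt;/p>

```python
import time

def jwt_claims(consumer_key, username,
               audience="https://login.salesforce.com",
               lifetime_seconds=300):
    """Claims for the JWT assertion; Salesforce caps exp at 5 minutes."""
    return {
        "iss": consumer_key,   # the External Client App's Consumer Key
        "sub": username,       # the user the connector runs as
        "aud": audience,
        "exp": int(time.time()) + lifetime_seconds,
    }

def token_request_body(signed_assertion):
    """Form body POSTed to {audience}/services/oauth2/token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_assertion,
    }

# Sign the claims RS256 (e.g. jwt.encode(claims, key, algorithm="RS256")),
# POST the body to the token endpoint, and read access_token from the
# JSON response.
claims = jwt_claims("YOUR_CONSUMER_KEY", "integration.user@example.com")
```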
&lt;br>
&lt;h4 id="phase-2-snowflake-preparation">Phase 2: Snowflake Preparation&lt;/h4>
&lt;p>Before deploying the connector, you need a database, schema, role, and warehouse ready for it.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a dedicated database for Salesforce data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">DATABASE&lt;/span> &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> SALESFORCE_RAW;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">SCHEMA&lt;/span> &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> SALESFORCE_RAW.OPENFLOW;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a connector-specific role
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Grant the role permissions to write to the destination
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">GRANT&lt;/span> &lt;span style="color:#66d9ef">USAGE&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#66d9ef">DATABASE&lt;/span> SALESFORCE_RAW &lt;span style="color:#66d9ef">TO&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GRANT&lt;/span> &lt;span style="color:#66d9ef">USAGE&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#66d9ef">SCHEMA&lt;/span> SALESFORCE_RAW.OPENFLOW &lt;span style="color:#66d9ef">TO&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GRANT&lt;/span> &lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">TABLE&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> &lt;span style="color:#66d9ef">SCHEMA&lt;/span> SALESFORCE_RAW.OPENFLOW &lt;span style="color:#66d9ef">TO&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Create a warehouse for merge operations
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> WAREHOUSE &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> OPENFLOW_WH
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> WAREHOUSE_SIZE &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;SMALL&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AUTO_SUSPEND &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">60&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AUTO_RESUME &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GRANT&lt;/span> &lt;span style="color:#66d9ef">USAGE&lt;/span> &lt;span style="color:#66d9ef">ON&lt;/span> WAREHOUSE OPENFLOW_WH &lt;span style="color:#66d9ef">TO&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">GRANT&lt;/span> OPERATE &lt;span style="color:#66d9ef">ON&lt;/span> WAREHOUSE OPENFLOW_WH &lt;span style="color:#66d9ef">TO&lt;/span> &lt;span style="color:#66d9ef">ROLE&lt;/span> OPENFLOW_SF_CONNECTOR;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For SPCS deployments, you also need a &lt;strong>network rule&lt;/strong> allowing the connector to reach your Salesforce instance. OpenFlow runs inside Snowflake&amp;rsquo;s container environment, and by default it can&amp;rsquo;t reach external endpoints without explicit permission:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Allow egress to your Salesforce instance
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> NETWORK &lt;span style="color:#66d9ef">RULE&lt;/span> &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> OPENFLOW_SF_EGRESS
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">TYPE&lt;/span> &lt;span style="color:#f92672">=&lt;/span> HOST_PORT
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">MODE&lt;/span> &lt;span style="color:#f92672">=&lt;/span> EGRESS
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> VALUE_LIST &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#e6db74">&amp;#39;yourinstance.salesforce.com:443&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">-- Wrap it in an external access integration
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">CREATE&lt;/span> &lt;span style="color:#66d9ef">EXTERNAL&lt;/span> &lt;span style="color:#66d9ef">ACCESS&lt;/span> INTEGRATION &lt;span style="color:#66d9ef">IF&lt;/span> &lt;span style="color:#66d9ef">NOT&lt;/span> &lt;span style="color:#66d9ef">EXISTS&lt;/span> OPENFLOW_SF_ACCESS
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ALLOWED_NETWORK_RULES &lt;span style="color:#f92672">=&lt;/span> (OPENFLOW_SF_EGRESS)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ENABLED &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">TRUE&lt;/span>;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Replace &lt;code>yourinstance.salesforce.com&lt;/code> with your actual Salesforce instance hostname. If you&amp;rsquo;re not sure what that is, log in to Salesforce and look at the URL bar — it&amp;rsquo;s the domain before &lt;code>.salesforce.com&lt;/code>.&lt;/p>
&lt;br>
&lt;h4 id="phase-3-deploying-the-connector">Phase 3: Deploying the Connector&lt;/h4>
&lt;p>Now the fun part. Navigate to &lt;strong>Snowsight → Data → Openflow&lt;/strong> in the left sidebar. Find the &amp;ldquo;Openflow connector for Salesforce Bulk API&amp;rdquo; tile and add it to your runtime.&lt;/p>
&lt;p>This opens the NiFi canvas — a visual flow editor where the connector&amp;rsquo;s pre-built process groups appear. Right-click the Salesforce process group and select &lt;strong>Configure Parameters&lt;/strong>. Here&amp;rsquo;s where you&amp;rsquo;ll enter:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Salesforce Instance URL&lt;/strong> — e.g., &lt;code>https://yourinstance.salesforce.com&lt;/code>&lt;/li>
&lt;li>&lt;strong>Consumer Key&lt;/strong> — from the External Client App you created&lt;/li>
&lt;li>&lt;strong>Consumer Secret&lt;/strong> — same source&lt;/li>
&lt;li>&lt;strong>Private Key&lt;/strong> — the contents of &lt;code>salesforce_private_key.pem&lt;/code> (the full PEM text, headers included)&lt;/li>
&lt;li>&lt;strong>Username&lt;/strong> — the Salesforce user the connector authenticates as&lt;/li>
&lt;li>&lt;strong>Filter&lt;/strong> — the objects to sync, by API name&lt;/li>
&lt;/ul>
&lt;p>The &lt;strong>Filter&lt;/strong> parameter is where you define what to pull. List standard objects by name (&lt;code>Account&lt;/code>, &lt;code>Contact&lt;/code>, &lt;code>Opportunity&lt;/code>, &lt;code>Lead&lt;/code>, &lt;code>Task&lt;/code>, &lt;code>Event&lt;/code>) and custom objects using their API name with the &lt;code>__c&lt;/code> suffix:&lt;/p>



&lt;div class="goat svg-container ">
 
 &lt;svg
 xmlns="http://www.w3.org/2000/svg"
 font-family="Menlo,Lucida Console,monospace"
 
 viewBox="0 0 720 25"
 >
 &lt;g transform='translate(8,16)'>
&lt;text text-anchor='middle' x='0' y='4' fill='currentColor' style='font-size:1em'>A&lt;/text>
&lt;text text-anchor='middle' x='8' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;text text-anchor='middle' x='16' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;text text-anchor='middle' x='24' y='4' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='32' y='4' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='40' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='48' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='56' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='72' y='4' fill='currentColor' style='font-size:1em'>C&lt;/text>
&lt;text text-anchor='middle' x='80' y='4' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='88' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='96' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='104' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='112' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;text text-anchor='middle' x='120' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='128' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='144' y='4' fill='currentColor' style='font-size:1em'>O&lt;/text>
&lt;text text-anchor='middle' x='152' y='4' fill='currentColor' style='font-size:1em'>p&lt;/text>
&lt;text text-anchor='middle' x='160' y='4' fill='currentColor' style='font-size:1em'>p&lt;/text>
&lt;text text-anchor='middle' x='168' y='4' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='176' y='4' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='184' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='192' y='4' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='200' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='208' y='4' fill='currentColor' style='font-size:1em'>i&lt;/text>
&lt;text text-anchor='middle' x='216' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='224' y='4' fill='currentColor' style='font-size:1em'>y&lt;/text>
&lt;text text-anchor='middle' x='232' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='248' y='4' fill='currentColor' style='font-size:1em'>L&lt;/text>
&lt;text text-anchor='middle' x='256' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='264' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='272' y='4' fill='currentColor' style='font-size:1em'>d&lt;/text>
&lt;text text-anchor='middle' x='280' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='296' y='4' fill='currentColor' style='font-size:1em'>T&lt;/text>
&lt;text text-anchor='middle' x='304' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='312' y='4' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='320' y='4' fill='currentColor' style='font-size:1em'>k&lt;/text>
&lt;text text-anchor='middle' x='328' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='344' y='4' fill='currentColor' style='font-size:1em'>E&lt;/text>
&lt;text text-anchor='middle' x='352' y='4' fill='currentColor' style='font-size:1em'>v&lt;/text>
&lt;text text-anchor='middle' x='360' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='368' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='376' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='384' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='400' y='4' fill='currentColor' style='font-size:1em'>C&lt;/text>
&lt;text text-anchor='middle' x='408' y='4' fill='currentColor' style='font-size:1em'>u&lt;/text>
&lt;text text-anchor='middle' x='416' y='4' fill='currentColor' style='font-size:1em'>s&lt;/text>
&lt;text text-anchor='middle' x='424' y='4' fill='currentColor' style='font-size:1em'>t&lt;/text>
&lt;text text-anchor='middle' x='432' y='4' fill='currentColor' style='font-size:1em'>o&lt;/text>
&lt;text text-anchor='middle' x='440' y='4' fill='currentColor' style='font-size:1em'>m&lt;/text>
&lt;text text-anchor='middle' x='448' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='456' y='4' fill='currentColor' style='font-size:1em'>P&lt;/text>
&lt;text text-anchor='middle' x='464' y='4' fill='currentColor' style='font-size:1em'>i&lt;/text>
&lt;text text-anchor='middle' x='472' y='4' fill='currentColor' style='font-size:1em'>p&lt;/text>
&lt;text text-anchor='middle' x='480' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='488' y='4' fill='currentColor' style='font-size:1em'>l&lt;/text>
&lt;text text-anchor='middle' x='496' y='4' fill='currentColor' style='font-size:1em'>i&lt;/text>
&lt;text text-anchor='middle' x='504' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='512' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='520' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='528' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='536' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;text text-anchor='middle' x='544' y='4' fill='currentColor' style='font-size:1em'>,&lt;/text>
&lt;text text-anchor='middle' x='560' y='4' fill='currentColor' style='font-size:1em'>R&lt;/text>
&lt;text text-anchor='middle' x='568' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='576' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='584' y='4' fill='currentColor' style='font-size:1em'>e&lt;/text>
&lt;text text-anchor='middle' x='592' y='4' fill='currentColor' style='font-size:1em'>w&lt;/text>
&lt;text text-anchor='middle' x='600' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='608' y='4' fill='currentColor' style='font-size:1em'>l&lt;/text>
&lt;text text-anchor='middle' x='616' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='624' y='4' fill='currentColor' style='font-size:1em'>T&lt;/text>
&lt;text text-anchor='middle' x='632' y='4' fill='currentColor' style='font-size:1em'>r&lt;/text>
&lt;text text-anchor='middle' x='640' y='4' fill='currentColor' style='font-size:1em'>a&lt;/text>
&lt;text text-anchor='middle' x='648' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;text text-anchor='middle' x='656' y='4' fill='currentColor' style='font-size:1em'>k&lt;/text>
&lt;text text-anchor='middle' x='664' y='4' fill='currentColor' style='font-size:1em'>i&lt;/text>
&lt;text text-anchor='middle' x='672' y='4' fill='currentColor' style='font-size:1em'>n&lt;/text>
&lt;text text-anchor='middle' x='680' y='4' fill='currentColor' style='font-size:1em'>g&lt;/text>
&lt;text text-anchor='middle' x='688' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='696' y='4' fill='currentColor' style='font-size:1em'>_&lt;/text>
&lt;text text-anchor='middle' x='704' y='4' fill='currentColor' style='font-size:1em'>c&lt;/text>
&lt;/g>

 &lt;/svg>
 
&lt;/div>
&lt;p>There&amp;rsquo;s also a &lt;strong>Special Objects Filter&lt;/strong> for objects that require different handling (like &lt;code>User&lt;/code> or &lt;code>RecordType&lt;/code>).&lt;/p>
&lt;p>Once the parameters are configured, enable all controller services in the process group, then start the connector. The initial sync will pull a full snapshot of every object in your filter list. Subsequent runs perform incremental syncs using Salesforce&amp;rsquo;s &lt;code>SystemModstamp&lt;/code> field to identify changed records.&lt;/p>
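&lt;p>Conceptually, each incremental pull is a watermarked query. A toy version of the pattern in Python (&lt;code>SystemModstamp&lt;/code> is the real Salesforce field; the helper is mine):&lt;/p>

```python
def incremental_soql(sobject, fields, last_watermark=None):
    """Build the SOQL for one sync: full snapshot on the first run,
    changed-records-only afterwards."""
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if last_watermark:
        # SOQL datetime literals are unquoted ISO-8601
        soql += f" WHERE SystemModstamp > {last_watermark}"
    return soql + " ORDER BY SystemModstamp"

first_run = incremental_soql("Opportunity",
                             ["Id", "Amount", "SystemModstamp"])
later_run = incremental_soql("Opportunity",
                             ["Id", "Amount", "SystemModstamp"],
                             last_watermark="2026-03-01T00:00:00Z")
```

&lt;p>After each run, the new high-water mark is simply the maximum &lt;code>SystemModstamp&lt;/code> seen, ready for the next cycle.&lt;/p>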
&lt;br>
&lt;h4 id="what-you-get">What You Get&lt;/h4>
&lt;p>Once the connector runs, you&amp;rsquo;ll find one table per Salesforce object in &lt;code>SALESFORCE_RAW.OPENFLOW&lt;/code>. The tables are &lt;strong>flattened and structured&lt;/strong> — every Salesforce field becomes a column with an appropriate Snowflake data type. No VARIANT columns to parse. No nested JSON to unpack.&lt;/p>
&lt;p>An &lt;code>Account&lt;/code> table will have columns like &lt;code>ID&lt;/code>, &lt;code>NAME&lt;/code>, &lt;code>INDUSTRY&lt;/code>, &lt;code>ANNUALREVENUE&lt;/code>, &lt;code>BILLINGSTREET&lt;/code>, &lt;code>BILLINGCITY&lt;/code>, &lt;code>BILLINGSTATE&lt;/code>, &lt;code>BILLINGPOSTALCODE&lt;/code>, &lt;code>CREATEDDATE&lt;/code>, &lt;code>LASTMODIFIEDDATE&lt;/code>, and so on. Custom fields like &lt;code>Renewal_Likelihood__c&lt;/code> appear as &lt;code>RENEWAL_LIKELIHOOD__C&lt;/code>.&lt;/p>
&lt;p>Metadata columns are added automatically, including &lt;code>ISDELETED&lt;/code> (tracking Salesforce soft deletes) alongside the system timestamp fields. This is important — &lt;strong>hard deletes in Salesforce are not reflected&lt;/strong>. If a record is permanently deleted in Salesforce, it won&amp;rsquo;t disappear from your Snowflake table. You&amp;rsquo;ll need to handle that in your downstream dbt models if it matters for your use case.&lt;/p>
&lt;/br>
&lt;h4 id="a-note-on-multiple-sync-frequencies">A Note on Multiple Sync Frequencies&lt;/h4>
&lt;p>Not every Salesforce object changes at the same pace. Your &lt;code>Opportunity&lt;/code> table might need hourly syncs while &lt;code>Account&lt;/code> is fine with daily. OpenFlow handles this by letting you deploy &lt;strong>multiple connector instances in the same runtime&lt;/strong> — each with its own filter list and schedule — at no additional compute cost beyond the shared runtime.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="schema-evolution-where-openflow-earns-its-keep">Schema Evolution: Where OpenFlow Earns Its Keep&lt;/h3>
&lt;/br>
&lt;p>This is the feature that matters most, and the one that would have saved Marcus three weeks.&lt;/p>
&lt;p>When someone adds a new custom field to a Salesforce object — say the sales ops team creates &lt;code>Deal_Confidence_Score__c&lt;/code> on the Opportunity object — the OpenFlow connector &lt;strong>automatically detects the new field and adds a corresponding column&lt;/strong> to the Snowflake destination table on the next sync. No configuration change. No redeployment. No Teams message at 7am asking the data team to &amp;ldquo;add that new field we made yesterday.&amp;rdquo;&lt;/p>
&lt;p>The column appears, data starts flowing into it, and your dbt models can reference it whenever you&amp;rsquo;re ready.&lt;/p>
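&lt;p>Conceptually, this kind of schema evolution is a diff between the field list the source API reports and the columns the destination table already has, resolved with additive DDL. A simplified sketch of that detect-and-add behaviour (not the connector&amp;rsquo;s actual implementation, and with type mapping reduced to a lookup):&lt;/p>

```python
def plan_schema_changes(source_fields, dest_columns, table):
    """Compare the source API's field list against the destination
    table's columns and plan additive DDL for anything missing."""
    statements = []
    for name, dtype in source_fields.items():
        if name.upper() not in dest_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {name.upper()} {dtype}"
            )
    return statements

# Sales ops just added Deal_Confidence_Score__c in Salesforce.
source = {"Id": "VARCHAR", "Amount": "NUMBER",
          "Deal_Confidence_Score__c": "NUMBER"}
existing = {"ID", "AMOUNT"}
ddl = plan_schema_changes(source, existing, "OPPORTUNITY")
```

&lt;p>Because the changes are purely additive, existing columns and downstream queries are untouched; the new column simply starts filling on the next sync.&lt;/p>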
&lt;/br>
&lt;p>For field renames, OpenFlow takes a conservative approach: the old column stays in place (with stale data from the last sync before the rename), and a new column is created under the new field name. This means you&amp;rsquo;ll have both &lt;code>Old_Field_Name__c&lt;/code> and &lt;code>New_Field_Name__c&lt;/code> in your table for a period. That&amp;rsquo;s actually useful for audit purposes — you can see exactly when the rename happened by comparing timestamps — but it does mean your downstream queries need updating.&lt;/p>
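&lt;p>If you do want to pin down when a rename happened, the stale old column and the freshly populated new one bracket it. A small sketch, assuming each synced row carries a &lt;code>LASTMODIFIEDDATE&lt;/code> (the field names here are illustrative):&lt;/p>

```python
def rename_window(rows, old_col, new_col):
    """Bracket a field rename: the last row that populated the old
    column and the first that populated the new one. rows is a list
    of dicts keyed by column name."""
    last_old = max(
        (r["LASTMODIFIEDDATE"] for r in rows if r.get(old_col) is not None),
        default=None,
    )
    first_new = min(
        (r["LASTMODIFIEDDATE"] for r in rows if r.get(new_col) is not None),
        default=None,
    )
    return last_old, first_new

rows = [
    {"LASTMODIFIEDDATE": "2026-03-01", "Old_Field__c": "a", "New_Field__c": None},
    {"LASTMODIFIEDDATE": "2026-03-05", "Old_Field__c": None, "New_Field__c": "a"},
]
window = rename_window(rows, "Old_Field__c", "New_Field__c")
```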
&lt;p>Here&amp;rsquo;s the honest gap: &lt;strong>new custom objects are not auto-discovered&lt;/strong>. If someone creates an entirely new custom object in Salesforce, you have to manually add it to the connector&amp;rsquo;s Filter parameter. Schema evolution handles field-level changes automatically, but object-level discovery is still a manual step.&lt;/p>
&lt;p>And field type changes — say someone changes a text field to a picklist, or a number to a formula — aren&amp;rsquo;t comprehensively documented in the current OpenFlow materials. In practice, this is a rare enough occurrence that it hasn&amp;rsquo;t been a problem in my experience, but it&amp;rsquo;s worth knowing the edge case exists.&lt;/p>
&lt;/br>
&lt;p>Compare this to what Marcus had to maintain. His custom integration used a hardcoded list of fields per object. Every time a field was added, renamed, or retyped, someone had to update the Python code, test it, deploy it, and verify the data. Usually that &amp;ldquo;someone&amp;rdquo; was Marcus, usually at an inconvenient time, and usually because nobody told him the change was coming.&lt;/p>
&lt;p>The value of automatic schema evolution isn&amp;rsquo;t technical elegance. It&amp;rsquo;s that &lt;strong>your engineers stop spending time on schema babysitting and start spending it on work that actually matters&lt;/strong> — building models, improving data quality, answering the questions the business is actually asking.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="a-word-about-fivetran">A Word About Fivetran&lt;/h3>
&lt;/br>
&lt;p>I&amp;rsquo;d be doing you a disservice if I wrote about managed Salesforce connectors without mentioning Fivetran. For most of the last decade, Fivetran has been the gold standard here, and their Salesforce connector is genuinely excellent.&lt;/p>
&lt;p>Fivetran&amp;rsquo;s schema evolution handling is more mature and more configurable than OpenFlow&amp;rsquo;s current offering. They offer three schema change policies: &lt;code>ALLOW_ALL&lt;/code> (sync everything automatically, including new tables), &lt;code>ALLOW_COLUMNS&lt;/code> (auto-add new columns but not new tables), and &lt;code>BLOCK_ALL&lt;/code> (manual opt-in for everything). They also generate a &lt;code>LOG&lt;/code> table that records every schema change event — &lt;code>create_table&lt;/code>, &lt;code>alter_table&lt;/code>, &lt;code>drop_table&lt;/code> — giving you an audit trail that&amp;rsquo;s useful for debugging and compliance. Field type widening (like &lt;code>INT&lt;/code> to &lt;code>LONG&lt;/code>) happens automatically; narrowing changes are blocked to prevent data loss.&lt;/p>
&lt;p>Their pricing model is fundamentally different from OpenFlow. Fivetran charges per Monthly Active Row (MAR) — the count of distinct rows synced per connection per month. A unique primary key counts once regardless of how many times it&amp;rsquo;s updated. Approximate pricing clusters around $500 per million MAR on the Standard plan, with a $12,000 annual minimum commitment. Since early 2025, MAR is calculated per connection rather than account-wide, which eliminated the bulk discount that previously benefited multi-connector setups.&lt;/p>
&lt;/br>
&lt;p>OpenFlow, by contrast, charges for compute — SPCS credits for the container runtime, Snowpipe Streaming costs for ingestion, and warehouse costs for merge operations. There&amp;rsquo;s no per-row fee and no separate licence. It&amp;rsquo;s included with Snowflake. The trade-off is that your management compute pool runs continuously while a deployment exists, which means you&amp;rsquo;re paying a base cost even when no data is flowing. One estimate I&amp;rsquo;ve seen puts idle costs at roughly $10 per day, though this varies with configuration.&lt;/p>
&lt;p>For a single large Salesforce org, OpenFlow&amp;rsquo;s compute-based model may end up cheaper than Fivetran&amp;rsquo;s per-row pricing — especially if your Salesforce data doesn&amp;rsquo;t churn heavily. For environments where you need 30+ connectors across different source types, Fivetran&amp;rsquo;s breadth and maturity are hard to argue with. They have 500+ connectors. OpenFlow has about 20.&lt;/p>
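&lt;p>A back-of-envelope comparison makes the trade-off concrete. This uses the approximate figures above plus a hypothetical monthly sync-compute number, since real OpenFlow costs depend entirely on your configuration:&lt;/p>

```python
def fivetran_monthly_cost(monthly_active_rows, rate_per_million=500.0):
    """Fivetran-style MAR pricing: a flat rate per million distinct
    rows synced per month (figures approximate, Standard plan)."""
    return monthly_active_rows / 1_000_000 * rate_per_million

def openflow_monthly_cost(idle_per_day=10.0, sync_compute=150.0, days=30):
    """OpenFlow-style compute pricing: a continuously running base
    cost plus whatever the syncs themselves consume. sync_compute
    here is a made-up monthly figure for illustration."""
    return idle_per_day * days + sync_compute

# Hypothetical org: 3 million active rows a month.
mar_cost = fivetran_monthly_cost(3_000_000)
compute_cost = openflow_monthly_cost()
```

&lt;p>At this hypothetical volume the compute model wins; the crossover point depends entirely on how much your data churns, so run the numbers for your own org before deciding.&lt;/p>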
&lt;p>The honest framing: &lt;strong>OpenFlow is the better choice if you&amp;rsquo;re already deeply invested in Snowflake, value platform consolidation, and your primary sources are in the current connector catalogue.&lt;/strong> Fivetran is the safer choice if you need broad connector coverage, battle-tested reliability, and you&amp;rsquo;d rather pay a premium for someone else to handle the ops.&lt;/p>
&lt;p>They&amp;rsquo;re not mutually exclusive, either. I&amp;rsquo;ve seen teams run Fivetran for the long tail of small connectors while using OpenFlow for their highest-volume sources where compute-based pricing wins.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="the-zero-copy-alternative">The Zero-Copy Alternative&lt;/h3>
&lt;/br>
&lt;p>There&amp;rsquo;s a third path I haven&amp;rsquo;t mentioned yet, and it&amp;rsquo;s worth understanding even if it&amp;rsquo;s not right for every team: &lt;strong>zero-copy data federation&lt;/strong>.&lt;/p>
&lt;p>Salesforce Data Cloud offers a zero-copy integration with Snowflake that flips the entire model on its head. Instead of extracting data from Salesforce and loading it into your warehouse, zero-copy gives you direct read access to Salesforce data &lt;em>without moving it at all&lt;/em>. Salesforce&amp;rsquo;s query pushdown engine handles the work — when you query the federated data, Salesforce pushes the query to the source, filters and aggregates there, and returns only the results you need. No pipelines to build. No schemas to manage. No sync schedules to configure.&lt;/p>
&lt;p>It works bidirectionally, too. You can share data from Snowflake back into Salesforce Data Cloud, letting sales reps see warehouse-enriched signals directly on Account and Opportunity pages without anyone exporting a CSV or building a reverse ETL pipeline. The integration runs on Apache Iceberg under the hood, which at least means the underlying format is open.&lt;/p>
&lt;/br>
&lt;p>For handling new custom objects — the gap I flagged with OpenFlow — zero-copy sidesteps the problem entirely. There&amp;rsquo;s nothing to discover because there&amp;rsquo;s nothing to sync. The data stays where it is, and you access it in place. As your Salesforce org evolves, the federated view evolves with it.&lt;/p>
&lt;p>That&amp;rsquo;s genuinely appealing. For proof-of-concept work, rapid prototyping, or use cases where you need real-time access to Salesforce data without building infrastructure, zero-copy is fast and effective.&lt;/p>
&lt;/br>
&lt;p>But here&amp;rsquo;s where I get cautious.&lt;/p>
&lt;p>&lt;strong>It comes at a premium.&lt;/strong> Salesforce Data Cloud runs on a consumption credit model — credits are purchased in bundles (roughly $500 per 100,000 credits at list price), and every action burns them. Zero-copy federation queries consume about 70 credits per million records, which sounds reasonable until you&amp;rsquo;re running it at scale across multiple business units. The pricing model has simplified since late 2025 — Salesforce consolidated multiple credit types into a single fungible credit and made ingestion from core Salesforce products free — but it&amp;rsquo;s still a consumption model with real costs that compound quickly if you&amp;rsquo;re not watching.&lt;/p>
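&lt;p>It&amp;rsquo;s worth doing the arithmetic on those list figures before committing. A rough sketch — rates are approximate and change, so plug in your own contract numbers:&lt;/p>

```python
def zero_copy_query_cost(records, credits_per_million=70,
                         dollars_per_credit=500 / 100_000):
    """Rough cost of one zero-copy federation query using the list
    figures quoted above: ~70 credits per million records, credits
    at roughly $500 per 100,000."""
    credits = records / 1_000_000 * credits_per_million
    return credits * dollars_per_credit

# One query over 10 million records looks cheap in isolation...
single = zero_copy_query_cost(10_000_000)
# ...but run hourly, every day, for a month, it compounds.
monthly = single * 24 * 30
```

&lt;p>A few dollars per query becomes thousands per month once dashboards refresh on a schedule — which is exactly the &amp;ldquo;compounds quickly if you&amp;rsquo;re not watching&amp;rdquo; failure mode.&lt;/p>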
&lt;p>More importantly, &lt;strong>zero-copy creates a deep platform dependency&lt;/strong>. Your data access is mediated entirely through Salesforce and Snowflake&amp;rsquo;s partnership. If Salesforce changes their terms, adjusts their pricing (which they&amp;rsquo;ve done repeatedly), or if you decide to move off Snowflake to Databricks or BigQuery, that zero-copy integration doesn&amp;rsquo;t come with you. You&amp;rsquo;d need to build actual data movement infrastructure — the thing you avoided — under time pressure and with no existing pipeline to fall back on.&lt;/p>
&lt;/br>
&lt;p>I value keeping infrastructure portable. Not because I&amp;rsquo;m paranoid about vendor lock-in — I&amp;rsquo;m pragmatic about it. I&amp;rsquo;ve been in this industry long enough to see what happens when a new CEO arrives and wants to renegotiate every vendor contract, or when a cloud provider changes their pricing structure overnight, or when the company you depend on gets acquired and the roadmap shifts. Having your data physically in your own warehouse, in formats you control, with pipelines you can redirect to a different destination — that&amp;rsquo;s insurance. Not glamorous insurance, but the kind you&amp;rsquo;re grateful for when you need it.&lt;/p>
&lt;p>The balance is real, though. You always have to weigh portability against the time you spend building and maintaining pipelines. If your team is drowning in pipeline maintenance, zero-copy might buy you breathing room while you stand up proper infrastructure. Just go in with your eyes open about what you&amp;rsquo;re trading for that convenience.&lt;/p>
&lt;/br>
&lt;p>There&amp;rsquo;s a more fundamental concern, though, and it&amp;rsquo;s the one that won&amp;rsquo;t appear in any vendor comparison chart.&lt;/p>
&lt;p>&lt;strong>Zero-copy makes it dangerously easy to skip the data warehouse entirely.&lt;/strong> When you give analysts and report builders direct read access to raw Salesforce data, the temptation to build dashboards straight from it is enormous. And for a quick proof of concept or a one-off analysis, fine. But the moment those PoC dashboards become &amp;ldquo;the dashboard the VP checks every Monday,&amp;rdquo; you&amp;rsquo;ve got a problem.&lt;/p>
&lt;p>Raw source data doesn&amp;rsquo;t tell you which &lt;code>CustomerNo&lt;/code> field to use — the one from the legacy migration, the one from the 2023 CRM consolidation, or the one from the current integration. It doesn&amp;rsquo;t encode the business rule that says &amp;ldquo;Closed Won&amp;rdquo; opportunities in APAC exclude training deals under $5,000 because those are tracked separately. It doesn&amp;rsquo;t flag that the &lt;code>Last_Activity_Date&lt;/code> field is unreliable for accounts owned by the partner team because they log activities in a different system.&lt;/p>
&lt;/br>
&lt;p>That knowledge lives in your data warehouse layer. It lives in the dbt models that your team built over months of conversations with stakeholders, through change management cycles, through debugging sessions where someone finally said &amp;ldquo;oh, we stopped using that field in Q3 because&amp;hellip;&amp;rdquo; That&amp;rsquo;s not transformation logic — it&amp;rsquo;s &lt;em>institutional memory encoded as code&lt;/em>. It connects raw data across domains and gives the business a full landscape picture instead of a narrow, single-source view. It catches the behavioural anomalies unique to your organisation — the sales rep who bulk-updates 500 records every Friday afternoon, the integration that occasionally double-fires during DST transitions, the custom object that gets repurposed every time the business restructures.&lt;/p>
&lt;p>So by all means, evaluate zero-copy for the use cases where it shines. Real-time signals on Account pages for sales reps? Great fit. Federated access for a specific analytics team that knows the source data intimately? Reasonable. But don&amp;rsquo;t let it replace the warehouse layer. That layer is where the hard-won understanding of your data lives, and it&amp;rsquo;s harder to recreate than any pipeline.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="the-honest-trade-offs">The Honest Trade-Offs&lt;/h3>
&lt;/br>
&lt;p>I&amp;rsquo;ve been positive about OpenFlow because I think it solves a real problem well. But I&amp;rsquo;d be violating my own writing principles if I didn&amp;rsquo;t lay out the limitations clearly.&lt;/p>
&lt;p>&lt;strong>The Salesforce connector is still in preview.&lt;/strong> It works, but it&amp;rsquo;s not covered by production support SLAs. If you&amp;rsquo;re running a mission-critical pipeline that your CFO stares at every Monday morning, that matters.&lt;/p>
&lt;p>&lt;strong>Custom Salesforce domains aren&amp;rsquo;t supported.&lt;/strong> If your org uses a vanity URL (like &lt;code>mycompany.my.salesforce.com&lt;/code> with a custom domain), check the documentation carefully before committing.&lt;/p>
&lt;p>&lt;strong>Hard deletes aren&amp;rsquo;t tracked.&lt;/strong> The &lt;code>ISDELETED&lt;/code> column catches soft deletes, but records that are permanently purged from Salesforce will persist in your Snowflake tables indefinitely. You&amp;rsquo;ll need a reconciliation process if that matters for your use case.&lt;/p>
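&lt;p>That reconciliation pass can be as simple as comparing ID sets between the source and the warehouse. A hedged sketch of the idea — &lt;code>Id&lt;/code> is a small field, so even a periodic full ID pull from Salesforce is cheap:&lt;/p>

```python
def find_hard_deleted(warehouse_ids, source_ids):
    """Records present in the warehouse but no longer returned by the
    source (and not merely soft-deleted there) were hard-deleted
    upstream; flag or remove them in the warehouse."""
    return sorted(set(warehouse_ids) - set(source_ids))

warehouse = ["001A", "001B", "001C"]
source = ["001A", "001C"]          # 001B was purged in Salesforce
to_flag = find_hard_deleted(warehouse, source)
```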
&lt;p>&lt;strong>Certain field types are silently dropped.&lt;/strong> &lt;code>location&lt;/code>, &lt;code>address&lt;/code> (compound), and &lt;code>base64&lt;/code> fields are not synced. The connector doesn&amp;rsquo;t error on these — it just ignores them. This is the kind of thing that bites you three months in when someone asks why geographic data isn&amp;rsquo;t in the warehouse.&lt;/p>
&lt;p>&lt;strong>Formula fields require full refresh.&lt;/strong> Because formula field values are computed server-side and don&amp;rsquo;t update &lt;code>SystemModstamp&lt;/code>, they can&amp;rsquo;t be synced incrementally. You need a separate connector instance running full refreshes to capture formula field changes. That&amp;rsquo;s not a bug — it&amp;rsquo;s how Salesforce works — but it&amp;rsquo;s a gotcha if you&amp;rsquo;re not expecting it.&lt;/p>
&lt;p>&lt;strong>No relationship traversal.&lt;/strong> You can&amp;rsquo;t configure the connector to follow lookups and pull related objects automatically. Each object is pulled independently. Joining them is your job in the transformation layer, which is where it belongs anyway if you&amp;rsquo;re running dbt.&lt;/p>
&lt;/br>
&lt;p>None of these are dealbreakers. But every one of them is the kind of thing that bites an engineer at 4pm on a Friday if they weren&amp;rsquo;t expecting it. Now you&amp;rsquo;re expecting it.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="getting-started">Getting Started&lt;/h3>
&lt;/br>
&lt;p>If you want to try this on your own Salesforce org, here&amp;rsquo;s my recommended order of operations:&lt;/p>
&lt;p>Start with a &lt;strong>non-production Salesforce sandbox&lt;/strong> and a Snowflake trial account if you don&amp;rsquo;t have a dev environment handy. Pick two or three standard objects — &lt;code>Account&lt;/code>, &lt;code>Contact&lt;/code>, &lt;code>Opportunity&lt;/code> — and one custom object if your org has them. Run through the three-phase setup I described above. Get data flowing, verify the table structures, and then add a custom field to one of the objects in Salesforce. Wait for the next sync. Watch the column appear automatically in Snowflake.&lt;/p>
&lt;p>That&amp;rsquo;s the moment it clicks. That&amp;rsquo;s the moment you stop thinking about managed connectors as a luxury and start thinking about custom API code as technical debt you&amp;rsquo;re choosing to carry.&lt;/p>
&lt;p>Once you&amp;rsquo;ve validated the core flow, layer on your dbt models. OpenFlow lands raw data — it doesn&amp;rsquo;t transform it. Your staging models handle naming conventions, type casting, soft delete filtering, and the join logic that stitches related objects together. This is the same pattern you&amp;rsquo;d use with Fivetran or any other EL tool: land it raw, transform it in the warehouse.&lt;/p>
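&lt;p>To make that staging step concrete, here&amp;rsquo;s the shape of the logic sketched in plain Python (in practice it would live in a dbt SQL model, and the column names are illustrative):&lt;/p>

```python
def stage_account(raw_rows):
    """Staging-layer sketch: filter soft deletes, rename columns to
    warehouse conventions, and cast types. raw_rows mimics what the
    connector lands in the raw schema."""
    staged = []
    for row in raw_rows:
        if row["ISDELETED"]:
            continue          # drop Salesforce soft deletes
        staged.append({
            "account_id": row["ID"],
            "account_name": row["NAME"],
            "annual_revenue": float(row["ANNUALREVENUE"] or 0),
        })
    return staged

raw = [
    {"ID": "001A", "NAME": "Acme", "ANNUALREVENUE": "1200000", "ISDELETED": False},
    {"ID": "001B", "NAME": "Gone", "ANNUALREVENUE": None, "ISDELETED": True},
]
clean = stage_account(raw)
```

&lt;p>Keeping this logic in the transformation layer, rather than in the pipeline, is what lets you swap the EL tool later without touching your business rules.&lt;/p>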
&lt;/br>
&lt;p>For teams already running Fivetran for Salesforce, I&amp;rsquo;m not suggesting you rip it out tomorrow. If it&amp;rsquo;s working and the cost is acceptable, that&amp;rsquo;s a solved problem. But the next time you&amp;rsquo;re adding a new high-volume source that&amp;rsquo;s in OpenFlow&amp;rsquo;s catalogue — especially if you&amp;rsquo;re already paying for Snowflake — it&amp;rsquo;s worth running the numbers. You might find that native integration with compute-based pricing is the better deal.&lt;/p>
&lt;hr>
&lt;/br>
&lt;/br>
&lt;h3 id="the-real-why">The Real Why&lt;/h3>
&lt;/br>
&lt;p>I started this article with Marcus&amp;rsquo;s story because it illustrates something I feel strongly about: &lt;strong>your engineers&amp;rsquo; time is the most expensive resource in your data organisation, and spending it on solved problems is a leadership failure&lt;/strong>.&lt;/p>
&lt;p>Building a custom Salesforce integration isn&amp;rsquo;t impressive. It was impressive in 2016, when the tooling didn&amp;rsquo;t exist. Today, it&amp;rsquo;s a choice to take on maintenance burden that a managed service handles automatically. It&amp;rsquo;s choosing to babysit schema changes instead of building the models and analyses that actually move the business forward.&lt;/p>
&lt;p>Marcus is a senior engineer now. He builds dimensional models and designs data products that directly influence how the sales team allocates resources. He doesn&amp;rsquo;t write API integration code anymore. Not because he can&amp;rsquo;t — because his time is worth more than that.&lt;/p>
&lt;/br>
&lt;p>But here&amp;rsquo;s what I keep coming back to. The tools keep getting better — OpenFlow, Fivetran, zero-copy, and yes, AI-assisted pipeline generation. Every year there&amp;rsquo;s a new thing that promises to automate another piece of the data engineering workflow. And many of them genuinely do.&lt;/p>
&lt;p>What they don&amp;rsquo;t automate is the part that actually matters.&lt;/p>
&lt;p>No tool — not OpenFlow, not Fivetran, not an AI agent — is going to sit in a room with your head of sales and ask &amp;ldquo;what keeps you up at night?&amp;rdquo; No connector is going to notice that the way your company defines &amp;ldquo;active customer&amp;rdquo; has quietly drifted from what the CRM tracks. No pipeline is going to push back and say &amp;ldquo;we could build that dashboard, but the metric you&amp;rsquo;re asking for doesn&amp;rsquo;t answer the question you actually have.&amp;rdquo;&lt;/p>
&lt;/br>
&lt;p>That&amp;rsquo;s the connective tissue between raw data and business value. It&amp;rsquo;s a person who understands the domain, who&amp;rsquo;s been present through the change management cycles, who&amp;rsquo;s built relationships with the stakeholders and knows which &lt;code>CustomerNo&lt;/code> field to use because they were in the room when the decision was made three years ago. It&amp;rsquo;s someone who asks clarifying questions instead of presuming they already know the answer.&lt;/p>
&lt;p>AI will keep getting better at the mechanical parts of data engineering. Schema detection, pipeline code generation, anomaly detection — those are well-defined problems that automation is suited for. But the assumption that any tool can skip the human step — the clarification, the context, the judgment about what&amp;rsquo;s actually worth building — that&amp;rsquo;s where I&amp;rsquo;ve seen projects go sideways. Not because the technology failed, but because nobody asked the right questions before building.&lt;/p>
&lt;/br>
&lt;p>OpenFlow isn&amp;rsquo;t perfect. The connector catalogue is young, the Salesforce connector is in preview, and Fivetran still wins on breadth and battle-hardened maturity. But OpenFlow represents something important: the data platform taking responsibility for data movement, not just data storage and compute. It&amp;rsquo;s Snowflake saying &amp;ldquo;we&amp;rsquo;ll handle getting the data in — you focus on making it useful.&amp;rdquo;&lt;/p>
&lt;p>That last part — &lt;em>making it useful&lt;/em> — is still your job. And it&amp;rsquo;s the part that no tool can do for you. The best thing a managed connector gives you isn&amp;rsquo;t fewer lines of code. It&amp;rsquo;s time back. Time to spend on the work that actually requires a human: understanding the business, building the right models, and making sure the data tells an accurate story.&lt;/p>
&lt;p>The next time someone on your team says &amp;ldquo;I&amp;rsquo;ll just build a quick Salesforce integration&amp;rdquo; — send them this article. And the next time someone says &amp;ldquo;AI can just handle the data pipeline&amp;rdquo; — ask them who&amp;rsquo;s going to sit with the stakeholders and figure out what the pipeline should actually deliver.&lt;/p>
&lt;p>That&amp;rsquo;s still you. Make sure you have the time for it.&lt;/p>
&lt;p>&lt;br>&lt;br>&lt;/p></content:encoded><category>Data Engineering</category><category>Cloud Architecture</category><category>Data Engineering</category><category>Snowflake</category><category>OpenFlow</category><category>Salesforce</category><category>API Integration</category><category>Schema Evolution</category><category>Fivetran</category><category>Data Pipelines</category></item></channel></rss>