A marketer builds an audience in the CDP: customers likely to churn in the next 30 days, high value, not contacted in the last week. The segment populates, the journey fires, the report comes back. It looks like the system worked.
It often did not. The churn score that audience depends on was very likely calculated somewhere else, in the data warehouse, by a model the marketing team does not own. Whether it reached the CDP, and how stale it was when it did, decides whether that segment is real or fiction. The marketer cannot see any of that from inside the tool. The CDP shows a clean, confident profile. It does not show what is missing from it.
This is the quiet problem with customer data platforms. The pitch, going back to when the category got its name around 2013, is a single unified profile: every interaction, one customer, one record any downstream tool can use. Buyers hear "complete picture." What they get is a complete picture of a specific slice of customer data, and a partial picture of everything else. The gap is not a bug in a particular vendor's product. It is structural, and it shapes every segment and journey built on top.
Where the gap comes from
The first CDPs were built to solve one painful problem: anonymous and known behavior was scattered across an email tool, a web analytics tool, a mobile app, and a CRM, with nowhere to unify it. So the CDP shipped with JavaScript and mobile SDKs to capture events at the source, an identity graph to stitch those events into profiles, and a database to hold the result.
That origin still defines what a CDP is naturally good at. It is excellent at digital behavioral data: page views, clicks, product views, add-to-carts, app screens, email opens. This data is generated by code the CDP controls, in real time, in a shape the CDP designed. It flows in cleanly because the CDP was built for exactly it.
Everything else is an import. And imports are where the picture thins out.
Consider what a typical CDP does not generate itself and therefore has to be fed:
- Predicted lifetime value. A model output, almost always computed in the warehouse by a data science team using Python, R, or SQL.
- Churn and propensity scores. Same story. These are model predictions, not events. They are produced where the modeling tools and the historical training data live, which is the warehouse.
- Product usage data. For a software company, the depth of usage that actually predicts renewal often sits in a product analytics system or the application's own database, not the marketing CDP.
- Support and customer success interactions. Ticket history, contact center notes, satisfaction scores. These live in the support platform.
- Finance and returns data. Refunds, chargebacks, payment failures, net revenue after returns. This sits in finance systems and the warehouse. A customer who bought 600 dollars of product and returned 550 looks like a great customer to a CDP counting purchase events.
- Offline events. In-store purchases, point-of-sale activity, branch visits, call center orders. These reach the CDP late, in batches, if at all.
None of this is exotic. It is the data that decides most real marketing questions. And most of it does not originate in a place the CDP can stream from.
Why the import is so leaky
There are two honest reasons the non-behavioral data arrives incomplete, late, or not at all.
The first is latency. A CDP can collect digital events in real time because it owns the collection code. It cannot do the same for a finance system or an on-premises order database. One practitioner analysis estimates that for most large enterprises, roughly 80 to 90 percent of the data sources you want in a CDP cannot deliver data in real time, so you end up loading them in batches. Batch means a schedule. A schedule means drift. The warehouse recalculates a churn score overnight; the CDP does not see it until the next sync; the marketer acts on yesterday's number believing it is today's.
The second reason is the data model. Most CDPs organize everything under two object types: users and events. That shape fits clickstream behavior well. It fits the rest of a business badly. A retailer has orders, returns, subscriptions, stores, products, and inventory, with relationships between them. A B2B company has accounts, contracts, and seats. Hightouch, which argues this point directly, puts it plainly: a real business does not fit into the narrow boxes of users and events. Adobe's Real-Time CDP requires data to conform to its Experience Data Model, and data that does not fit is simply not accepted for profile unification. So when returns data or an account hierarchy reaches the CDP, it gets flattened into a crude event, mashed onto the user record as an attribute, or left out. The warehouse, by contrast, models arbitrary relational data natively. That is what relational databases do.
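To make the flattening concrete, here is a minimal sketch of one way it can go. The field names, the `flatten_to_events` helper, and the choice to drop `order_id` are illustrative assumptions, not any vendor's actual schema. It reuses the 600 and 550 dollar figures from the returns example above.

```python
# Relational shape: a return points at the order it belongs to.
orders = [{"order_id": "o1", "customer_id": "c1", "amount": 600.0}]
returns = [{"return_id": "r1", "order_id": "o1", "amount": 550.0}]


def net_by_order(orders, returns):
    """Net revenue per order, possible because returns link to orders."""
    refunded = {}
    for r in returns:
        refunded[r["order_id"]] = refunded.get(r["order_id"], 0.0) + r["amount"]
    return {o["order_id"]: o["amount"] - refunded.get(o["order_id"], 0.0) for o in orders}


def flatten_to_events(orders, returns):
    """Hypothetical crude flattening into a users-and-events shape.

    Each row becomes a generic event on the user; the order-to-return
    relationship is not carried over (an assumption for illustration).
    """
    order_customer = {o["order_id"]: o["customer_id"] for o in orders}
    events = [
        {"user_id": o["customer_id"], "event": "purchase", "value": o["amount"]}
        for o in orders
    ]
    events += [
        {"user_id": order_customer[r["order_id"]], "event": "return", "value": r["amount"]}
        for r in returns
    ]
    return events


relational_net = net_by_order(orders, returns)
events = flatten_to_events(orders, returns)
```

In the relational shape, net revenue per order is a simple lookup. In the flattened shape, the events still exist, but nothing says which purchase a given return belongs to.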
Put latency and the data model together and you get the result practitioners keep running into. The CDP holds a fast, rich record of what customers clicked. It holds a slow, lossy, or empty record of what they are worth, what they returned, what they asked support, and what they did offline.
Why a partial profile quietly breaks things
A CDP with most of the behavioral data and little of the rest does not fail loudly. It fails by being confidently wrong, which is harder to catch.
Segments act on whatever is present. Build a "high value customer" audience and, if net-of-returns revenue never made it into the CDP, you are segmenting on gross purchases. The serial returner sits in your best tier. Build a churn-risk journey on a score that syncs nightly and you are intervening on a 14-hour-old prediction, after the customer has already had the bad week the model was meant to catch.
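A small worked example shows how the same audience rule gives different answers depending on which revenue figure is present. The customer records and the 500 dollar threshold are made up for illustration; only the 600 and 550 dollar returner comes from the example earlier in this post.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    gross_purchases: float  # what a CDP counting purchase events sees
    returns: float          # often held only in finance systems or the warehouse

    @property
    def net_revenue(self) -> float:
        return self.gross_purchases - self.returns


HIGH_VALUE_THRESHOLD = 500.0  # hypothetical cutoff for the "high value" tier

customers = [
    Customer("serial_returner", gross_purchases=600.0, returns=550.0),
    Customer("steady_buyer", gross_purchases=520.0, returns=0.0),
    Customer("occasional", gross_purchases=120.0, returns=0.0),
]


def high_value_by_gross(cs):
    """Segment as built when returns never reached the profile."""
    return [c.customer_id for c in cs if c.gross_purchases >= HIGH_VALUE_THRESHOLD]


def high_value_by_net(cs):
    """Segment as built when net-of-returns revenue is available."""
    return [c.customer_id for c in cs if c.net_revenue >= HIGH_VALUE_THRESHOLD]


gross_segment = high_value_by_gross(customers)
net_segment = high_value_by_net(customers)
```

On gross purchases the serial returner lands in the top tier next to the steady buyer. On net revenue only the steady buyer qualifies.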
Measurement bends the same way. If the CDP cannot see in-store purchases, an email campaign that drove a customer into a shop to buy looks like it failed. The org then optimizes against a number that is missing its best outcomes. Suppression breaks too: you cannot stop emailing a furious customer about a product issue if the support ticket describing that issue never reached the profile the journey reads.
This is part of why CDP satisfaction runs low. eMarketer, citing the CDP Institute's 2024 member survey, reported that only 64 percent of deployed CDPs deliver significant value, a figure that has fallen over time. Hightouch points to research that only about 10 percent of CDP owners say their CDP actually meets their needs. There are several reasons for that disappointment, but a central one is expectation: teams bought a complete customer view and got a complete view of one slice. The tool is not broken. It was scoped smaller than the brief in the buyer's head.
It compounds with a related mistake. Buyers often expect the CDP to clean their data. It does not. As one CDP Institute piece on data quality puts it, a CDP exposes flaws rather than fixing them: duplicates, inconsistent formats, and missing fields surface mid-implementation. A partial profile and a dirty profile are different problems, and a CDP solves neither on its own.
The warehouse-as-source-of-truth pattern
The structural fix the market has converged on is to stop treating the CDP as the source of truth and treat the cloud data warehouse as the source of truth instead.
The logic is straightforward once you see where the missing data already sits. Predicted lifetime value, churn scores, and propensity models are computed in Snowflake, BigQuery, or Databricks. Finance and returns data is loaded there. Offline and point-of-sale data is loaded there. Product usage can be loaded there. The warehouse is already the one place where behavioral data and all the decisive non-behavioral data can sit together, modeled relationally, with consistent definitions. Fivetran's argument for this is blunt: the warehouse already holds the data, and a separate CDP database does not unify the picture so much as fragment it again.
In this pattern, the unified, decision-ready customer record, along with the segment logic and the scores, gets built in the warehouse, where nothing is missing. The CDP's job, or a reverse ETL tool's job, is to take that record and activate it: push the audience and its attributes out to the email platform, the ad networks, the support desk, the channels that act. Reverse ETL is the standard term for moving modeled warehouse data back out to operational tools. It runs traditional analytics ETL in reverse: instead of pulling data into the warehouse for analysis, it pushes finished data out for action.
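As a rough sketch of that flow, the example below uses an in-memory SQLite database as a stand-in for a cloud warehouse. The table names, thresholds, and the `push_to_destination` function are assumptions for illustration; a real reverse ETL tool would call the destination's API instead of appending to a list.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """In-memory SQLite standing in for Snowflake, BigQuery, or Databricks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT);
        CREATE TABLE churn_scores (customer_id TEXT, score REAL);
        CREATE TABLE net_revenue (customer_id TEXT, amount REAL);
        INSERT INTO customers VALUES
            ('c1', 'a@example.com'), ('c2', 'b@example.com'), ('c3', 'c@example.com');
        INSERT INTO churn_scores VALUES ('c1', 0.82), ('c2', 0.15), ('c3', 0.91);
        INSERT INTO net_revenue VALUES ('c1', 1200.0), ('c2', 900.0), ('c3', 50.0);
    """)
    return conn


# Segment logic lives in the warehouse, where scores and net revenue sit together.
AUDIENCE_SQL = """
SELECT c.customer_id, c.email, s.score AS churn_score, r.amount AS net_revenue
FROM customers c
JOIN churn_scores s ON s.customer_id = c.customer_id
JOIN net_revenue r ON r.customer_id = c.customer_id
WHERE s.score >= 0.7 AND r.amount >= 500
ORDER BY c.customer_id
"""


def extract_audience(conn: sqlite3.Connection) -> list:
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(AUDIENCE_SQL)]


def push_to_destination(records: list, outbox: list) -> int:
    """Hypothetical stand-in for an API call to an email or ad platform."""
    outbox.extend(records)
    return len(records)


warehouse = build_warehouse()
audience = extract_audience(warehouse)
outbox = []
sent = push_to_destination(audience, outbox)
```

Customer c3 has the highest churn score but only 50 dollars of net revenue, so the audience built in the warehouse leaves them out. A segment built where only the score had synced could not make that call.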
This is the engine behind the composable, or warehouse-native, CDP, and it is now mainstream rather than fringe. Hightouch, a composable vendor built on reverse ETL, reached the Leader quadrant in the 2026 Gartner Magic Quadrant for Customer Data Platforms. The packaged vendors have moved the same way: Salesforce Data 360 and others can now query and activate warehouse data without persisting their own copy. Both camps increasingly agree the warehouse should hold the truth. They mainly differ on how much of the CDP's logic and interface sits in the vendor's platform versus in your warehouse.
How to close the gap
You do not need to rip out a working CDP to fix this. You need to be deliberate about what the CDP is for and where the complete picture lives.
Inventory what is actually missing. List the data that decides your important segments and journeys: lifetime value, churn and propensity scores, returns and net revenue, support history, product usage, offline purchases. For each, write down where it originates, whether it reaches the CDP, and on what schedule. Most teams have never done this and are surprised by the answer.
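The inventory can be as simple as a small table. Here is a minimal sketch of one as a data structure, with a helper that separates what never arrives from what arrives on a delay. Every entry below is a made-up example, not a claim about any particular stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: str
    reaches_cdp: bool
    sync_schedule: Optional[str]  # None when the data never syncs


# Hypothetical inventory; replace with your own sources and schedules.
inventory = [
    DataSource("web and app events", "CDP SDKs", True, "real time"),
    DataSource("churn score", "warehouse model", True, "nightly batch"),
    DataSource("net revenue after returns", "finance system", False, None),
    DataSource("support tickets", "support platform", False, None),
    DataSource("in-store purchases", "point of sale", True, "weekly batch"),
]


def gaps(sources):
    """Return (missing, batched): sources absent from the CDP, and sources that arrive late."""
    missing = [s.name for s in sources if not s.reaches_cdp]
    batched = [
        s.name for s in sources
        if s.reaches_cdp and s.sync_schedule != "real time"
    ]
    return missing, batched


missing, batched = gaps(inventory)
```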
Make the warehouse the place the customer record is assembled. Build scores and segment definitions where all the inputs exist, rather than trying to recreate them inside the CDP from whatever data happened to sync. This also fixes definition drift. As Fivetran notes, what counts as an "active" customer should be defined once, in SQL, not redefined differently across every downstream tool.
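One way to define a term once is a shared SQL view that every downstream consumer reads. The sketch below again uses in-memory SQLite as a stand-in warehouse. The 90-day window, the fixed reference date, and the two consumer functions are assumptions chosen to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        ('c1', '2024-06-20'),
        ('c2', '2024-01-05'),
        ('c3', '2024-06-01');

    -- The single shared definition of "active": an order in the 90 days
    -- before a fixed reference date (hypothetical rule for illustration).
    CREATE VIEW active_customers AS
        SELECT DISTINCT customer_id
        FROM orders
        WHERE order_date >= date('2024-06-30', '-90 days');
""")


def email_audience(conn: sqlite3.Connection) -> list:
    """Email tool's audience: reads the shared view, adds no logic of its own."""
    return sorted(r[0] for r in conn.execute("SELECT customer_id FROM active_customers"))


def ads_audience(conn: sqlite3.Connection) -> list:
    """Ad platform's audience: same view, so the same definition."""
    return sorted(r[0] for r in conn.execute("SELECT customer_id FROM active_customers"))
```

Because both consumers read the same view, changing the definition means changing one statement, and the two audiences cannot drift apart.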
Be honest about latency per use case. A nightly churn score is fine for a weekly retention campaign and useless for an in-session intervention. Match the sync schedule to the decision. Do not let "real-time CDP" on the box convince you that a batch-loaded attribute is fresh.
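Matching the sync schedule to the decision can be written down as an explicit freshness check. This is a minimal sketch; the use case names and the tolerances in `MAX_AGE` are hypothetical and should come from your own requirements.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance per use case: how old a score may be when acted on.
MAX_AGE = {
    "weekly_retention_campaign": timedelta(days=2),
    "in_session_intervention": timedelta(minutes=5),
}


def is_fresh_enough(use_case: str, computed_at: datetime, now: datetime) -> bool:
    """True if the score's age is within the tolerance for this use case."""
    return now - computed_at <= MAX_AGE[use_case]


# A score recalculated overnight and acted on 14 hours later.
computed_at = datetime(2024, 6, 30, 2, 0)
now = datetime(2024, 6, 30, 16, 0)

ok_for_campaign = is_fresh_enough("weekly_retention_campaign", computed_at, now)
ok_for_in_session = is_fresh_enough("in_session_intervention", computed_at, now)
```

The same 14-hour-old score passes for the weekly campaign and fails for the in-session intervention, which is the distinction the label on the box hides.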
Use the CDP for what it is genuinely good at. It is strong at real-time digital identity resolution, stitching anonymous behavior to known profiles, and at giving marketers a self-service interface to build audiences and activate them without filing a ticket to the data team for every segment. As the CDP-versus-warehouse comparisons consistently put it, a warehouse is the source of truth for understanding what happened; a CDP is built to act on it. Let each do its half.
Where this is heading
The gap matters more, not less, as AI agents enter the picture. Both Gartner and Forrester now describe the CDP's future around agentic AI, with the stack reframed as warehouse plus CDP plus agents. An agent that reads a customer profile and acts on it, with no human reviewing the decision, is only as good as the data in that profile. A human marketer might sense that a "high value" customer is actually a serial returner and hesitate. An agent will not. It acts on exactly what the record says, fast, which turns a missing returns feed or a stale churn score from a soft reporting problem into a wrong action executed at speed.
That raises the value of getting the foundation right: a complete, current, consistently defined customer record in the warehouse, with the CDP or activation layer as the path from that record to the channels. This is the unglamorous work underneath the agentic pitch, and it is where Perform Digital tends to start with clients, because an agent on a partial profile is not an upgrade. It is a faster way to be wrong.
The realistic takeaway is not that the CDP is a bad tool. It is that the CDP was never the whole picture, and treating it as one is the mistake. It holds the behavioral slice well. The data that decides what a customer is worth, whether they are leaving, and what they have already told you lives elsewhere, mostly in the warehouse. Know which is which, assemble the truth where all the inputs are, and use the CDP for the job it was actually built to do.
Council summary
This post argues that a CDP holds the behavioral slice of customer data well and holds the decisive non-behavioral slice (lifetime value, churn scores, returns, support history, offline purchases) poorly or not at all, because that data neither originates in code the CDP controls nor fits a users-and-events data model. The fix it lays out is the warehouse-as-source-of-truth pattern: assemble the customer record where all the inputs already sit, then use the CDP or a reverse ETL tool to activate it. The council verified every named figure against its source: the 64 percent significant-value number from the CDP Institute's 2024 member survey via eMarketer, the 10 percent satisfaction figure and the users-and-events critique from Hightouch, the 80 to 90 percent batch estimate from David Chan's practitioner analysis, Adobe's XDM conformance requirement, and Hightouch's Leader placement in the Gartner Magic Quadrant for Customer Data Platforms. No claim was invented or misattributed, so the edits were limited to tightening. The reader takeaway is concrete: inventory what the CDP is actually missing, build the customer record in the warehouse, match sync latency to each decision, and let the CDP do the identity resolution and activation it was built for.