The implementation plan had a line item for it. Identity resolution: configure the match rules, validate the unified profiles, sign off. Six weeks, maybe eight. Then the project closed and the box stayed ticked.
That tick is the problem. Identity resolution is not a thing you finish. It is a thing you run. The rules you signed off at go-live were tuned against the data, sources, and channels that existed that month, and none of those stay still. The match quality you validated is a snapshot, and it ages from the day after launch. A year later the system is quietly worse, and because nothing alarms when identity degrades, nobody notices until a campaign goes to the wrong people or a report stops adding up.
This piece skips the definitions and the deterministic-versus-probabilistic trade-off, both of which deserve their own treatment. The focus is narrower: why a correctly configured identity system degrades on its own, what failure looks like when you treat it as a project, and what running it as an operation means in practice.
Where the problem comes from
Resolving whether two records describe the same person is an old problem, much older than the CDP. Statisticians have called it record linkage since the 1940s, and the field got its mathematical spine in 1969 when Ivan Fellegi and Alan Sunter published a formal theory in the Journal of the American Statistical Association, proving that a probabilistic rule weighing the evidence in each field was optimal under stated conditions. Public health, census work, and genealogy used record linkage for decades before marketing did. The discipline understood from the start that linkage is a judgment under uncertainty, not a lookup.
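To make "weighing the evidence in each field" concrete, here is a minimal sketch of the per-field log-likelihood weights usually associated with the Fellegi and Sunter framing. The field names and the m and u probabilities are invented for illustration; in practice they would be estimated from your own data.

```python
import math

# Hypothetical parameters, invented for illustration only.
# m = P(field agrees | the two records are the same person)
# u = P(field agrees | the two records are different people)
FIELD_PARAMS = {
    "email": (0.95, 0.001),
    "surname": (0.90, 0.05),
    "postcode": (0.85, 0.10),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum the log2 likelihood-ratio weight of each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)  # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement counts against
    return total


# Same surname and postcode, with and without an email agreement.
strong = match_weight({"email": True, "surname": True, "postcode": True})
weak = match_weight({"email": False, "surname": True, "postcode": True})
```

A pair is then linked, rejected, or sent for review depending on where its total weight falls relative to two thresholds. Choosing those thresholds is the judgment call: there is no setting that removes the uncertainty.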
Marketing inherited the problem without inheriting that mindset. When Customer Data Platforms made identity resolution a headline feature in the 2010s, they packaged it as something a buyer configures once. You map your sources, pick which fields count as identifiers, choose how strictly they must match, and the platform builds an identity graph: the web of links connecting a person's emails, phone numbers, device IDs, loyalty numbers, and cookies into one profile. This often runs as a scheduled job. Salesforce Data Cloud, for example, processes rule changes on a daily batch after the initial build.
The configure-once framing is where the trouble starts. It is technically accurate (you really do set the rules up at the start), yet it implies a finish line that does not exist. The CDP era made record linkage look like a finish-line problem. It never was one.
The present: why a working system drifts
Assume the implementation was good: sensible rules, honest validation, unified profiles that looked right. Here is why that system still degrades.
New sources arrive. No customer data stack stays frozen. A new loyalty platform, a support tool, an events system, an acquired company's database: each shows up after go-live carrying identifiers in formats the original rules never saw. A graph tuned for the old set does not automatically absorb the new one. Every source added after launch is one the resolution logic was not tested against.
The data itself decays. People change jobs, emails, phone numbers, and addresses constantly. Estimates vary by method and segment, but the Dun and Bradstreet benchmark cited across the data-quality industry puts B2B contact decay near 22.5 percent a year, roughly 2 percent a month, with work email among the fastest-rotting fields. Every stale identifier is a thread in the graph that no longer connects what it used to. The matches do not break loudly. They stop being true.
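The monthly and annual figures line up if the monthly rate compounds. A quick check, assuming a monthly decay of 2.1 percent (an assumed value; the benchmark as quoted here says only "roughly 2 percent") and assuming the rate is constant from month to month:

```python
# Assumed monthly decay rate; not a figure taken from the benchmark itself.
MONTHLY_DECAY = 0.021


def share_still_valid(months: int, monthly_decay: float = MONTHLY_DECAY) -> float:
    """Fraction of contact records still accurate after the given number of months."""
    return (1 - monthly_decay) ** months


# Share of records that have gone stale after one year of compounding.
annual_decay = 1 - share_still_valid(12)
```

Under those assumptions the annual figure lands between 22 and 23 percent. The same function can be pointed at longer horizons to estimate how much of a launch-day validation sample is still describing real, current identifiers.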
New channels change the identifier mix. Add a mobile app and you add device IDs and push tokens. Add a CTV or retail-media channel and you add hashed-email and partner identifiers. Each changes the shape of what you are matching, and a rule set weighted for a web-and-email world was not built for it.
Links decay on their own. Cookies get cleared and capped, device identifiers reset, logged-out sessions pile up. The anonymous-to-known links that looked solid at launch thin out as the underlying signals expire. The graph loses resolution the way a photo loses sharpness, gradually, from every edge.
Merge and split errors accumulate. Every identity system makes two kinds of mistake. A false merge joins two different people into one profile. A false split, also called fragmentation, scatters one person across several profiles. Both compound. Set the rules slightly too loose and false merges build up: two people who share a household email or a recycled phone number collapse into one profile. Set them slightly too tight and the same person fragments because their work and personal emails never got linked. Neither error announces itself. They accumulate between audits.
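A toy example shows why both errors can come from the same data. The sketch below is not how any particular CDP works: it assumes a simple deterministic resolver that links records sharing any value on the chosen fields, transitively, and the records themselves are hypothetical.

```python
from collections import defaultdict

# Hypothetical records. Ana has two emails; Ben shares her household phone.
RECORDS = {
    "r1": {"person": "ana", "email": "ana@home.example", "phone": "555-0101"},
    "r2": {"person": "ana", "email": "ana@work.example", "phone": "555-0101"},
    "r3": {"person": "ben", "email": "ben@home.example", "phone": "555-0101"},
}


def resolve(records, match_fields):
    """Group records that share any value on match_fields, transitively."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    first_seen = {}
    for rid, rec in records.items():
        for field in match_fields:
            key = (field, rec[field])
            if key in first_seen:
                parent[find(rid)] = find(first_seen[key])  # link the two groups
            else:
                first_seen[key] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


tight = resolve(RECORDS, ["email"])           # email only
loose = resolve(RECORDS, ["email", "phone"])  # email or phone
```

With email only, Ana's two records never link and she fragments into two profiles. Add phone and her records unify, but Ben is pulled into the same profile. Neither rule set is right for this data, which is why the choice has to be revisited as the data changes.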
Household and individual blur. Families share devices, IP addresses, surnames, and addresses. A graph has to decide, continuously, whether two records are the same person or two people in the same home. Get it wrong toward the household and a spouse receives the other spouse's recommendations. Get it wrong toward the individual and you fragment a household you meant to treat as one. The right answer shifts as your data and your use cases change.
Put together, the picture is plain. An identity graph is not a structure you build and own. It is a balance you hold against constant pressure. Stop pushing and it drifts.
The failure mode of the project mindset
Run identity resolution as a project and the failure is not a crash. It is silent. There is no error log for a degrading identity graph. The CDP keeps running. Profiles keep getting built, segments keep getting sent, nothing turns red. The degradation shows up weeks or months later, somewhere downstream, and usually gets blamed on the wrong thing.
It surfaces as fragmentation. The same customer exists three times because their identifiers never linked, so each profile holds a third of their history. Lifetime value is understated, frequency caps fail because the channel sees three people, the cart-abandonment flow fires for someone who already bought on another profile, a returning customer gets a new-customer offer. Marketing blames the channel tool. The channel tool is fine. The graph fragmented and nobody was watching.
Or it surfaces as over-merging, which is worse: a privacy and trust failure rather than an efficiency one. Two people merged into one profile means one person can see another's purchase history, order details, or recommendations. In a regulated industry, an over-merged profile is a data subject access request that returns the wrong person's data. Teams walk into this honestly: they loosen matching to chase personalization, push the rules too far, and merge people who should never have been joined.
The trap underneath both is that the CDP made a hard call look easy. Many traditional CDPs let you set the rules but not safely change them later. There is often no unmerge button, and reconfiguring identity logic after live data has flowed through can mean rebuilding the instance. So the rules get treated as permanent because changing them is painful, which is exactly backward: the thing that needs continuous tuning is the thing the architecture discourages you from touching.
Because identity sits under everything, a quiet identity problem reads as a dozen unrelated ones. Attribution looks off. Personalization misfires. Suppression leaks. Reports disagree. Each gets investigated alone. The common cause is one drifting graph that no one owns.
Running it as an operation
The fix is not a better algorithm. It is treating identity resolution like any production system: monitored, owned, reviewed, fed by a loop. Four things make the difference.
Monitor match quality, not just match rate. Match rate, the share of records that resolved to a known person, is the number vendors lead with, and on its own it is misleading: a high match rate can hide a pile of false merges. Borrow the language record linkage has used for decades: precision, the share of merges that are correct, and recall, the share of true matches the graph actually found. Low precision means over-merging; low recall means fragmentation. Track an over-merge signal and an under-merge signal separately, because loosening or tightening the rules trades one for the other and one global number hides both. Watch the trend, not the level. Profile count rising faster than known customers suggests fragmentation; profile count falling oddly fast suggests aggressive merging. Set thresholds and alert on them, so the graph tells you it is drifting instead of waiting for a campaign to tell you.
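One common way to put numbers on this is pairwise precision and recall, computed on a sample of records that a reviewer has labeled by hand. The sketch below assumes such a sample exists; the record IDs, profile IDs, labels, and alert floors are all hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(assignment: dict[str, str]) -> set[frozenset[str]]:
    """Every pair of record IDs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }


def pairwise_quality(predicted: dict[str, str], truth: dict[str, str]):
    """Pairwise precision and recall of predicted clusters against labels."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical audit sample: the graph's profile IDs vs. a reviewer's labels.
predicted = {"r1": "p1", "r2": "p1", "r3": "p1", "r4": "p2", "r5": "p3"}
truth = {"r1": "ana", "r2": "ana", "r3": "ben", "r4": "cy", "r5": "cy"}

precision, recall = pairwise_quality(predicted, truth)

# Hypothetical alert floors; real values depend on your use cases.
THRESHOLDS = {"precision": 0.98, "recall": 0.90}
scores = {"precision": precision, "recall": recall}
alerts = [name for name, floor in THRESHOLDS.items() if scores[name] < floor]
```

In this sample the graph merged Ben into Ana's profile, which lowers precision, and split one customer across two profiles, which lowers recall. Keeping the two numbers separate is what lets each problem trip its own alert.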
Run periodic audits. Monitoring catches the trend. Audits catch what monitoring cannot. On a monthly or quarterly cadence, pull a sample of merged profiles and inspect the merge explanations: why did the system link these records, and does the reason hold up. Pull a sample of suspected duplicates and ask why they failed to link. This is slow, human, unglamorous work, and the only way to see the specific bad joins a dashboard averages away. A black-box graph you cannot interrogate is one you cannot govern, so explainable merge decisions are themselves a requirement.
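Pulling the audit sample is easy to script so that every cycle is reproducible and no matching rule escapes review. This is a minimal sketch that assumes you can export a merge log in which each entry records the rule that caused the merge. The log shape, the rule names, and the `audit_sample` helper are hypothetical.

```python
import random
from collections import defaultdict


def audit_sample(merge_log, per_rule, seed):
    """Pick up to per_rule merges for each matching rule, reproducibly."""
    by_rule = defaultdict(list)
    for merge in merge_log:
        by_rule[merge["rule"]].append(merge)

    rng = random.Random(seed)  # fixed seed so the same sample can be re-pulled
    sample = []
    for rule in sorted(by_rule):
        merges = by_rule[rule]
        sample.extend(rng.sample(merges, min(per_rule, len(merges))))
    return sample


# Hypothetical merge log: 20 merges from one rule, 10 from another.
merge_log = [
    {"merge_id": i, "rule": "phone_plus_surname" if i % 3 == 0 else "exact_email"}
    for i in range(30)
]

to_review = audit_sample(merge_log, per_rule=5, seed=2024)
```

Sampling per rule rather than across the whole log is a design choice: a rarely firing rule, such as a phone-based match, would otherwise be drowned out by the high-volume rules, and those rare rules are often where the bad joins hide.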
Build a feedback loop. Findings have to change something. An audit that surfaces a recycled-phone-number false merge should lead to a rule adjustment. A new source should trigger a review of how its identifiers are weighted before it goes live, not after it has polluted the graph. Without that loop, audits are just a record of decline.
Govern it. Identity decisions are consent and privacy decisions. Which identifiers may be used for matching depends on what the customer agreed to, and merge and unmerge actions need audit trails. Governance here is a rhythm, not a one-time policy document: a standing review where identity metrics, source changes, and audit findings get seen by someone empowered to act.
Staffing and ownership
The reason identity resolution drifts is usually not technical. It is that nobody owns it. It falls in a gap. Marketing treats it as data plumbing and assumes IT has it. IT treats it as a marketing configuration and assumes marketing has it. The CDP vendor configured it during implementation and has gone. So the graph becomes a nobody-touch-it asset: critical, fragile, and unowned.
Give it a name on an org chart. It does not need a large team. It needs one accountable owner, often in marketing operations or a data function, whose job explicitly includes identity health, plus a defined cadence and escalation path. The owner watches the monitors, schedules the audits, reviews new sources before they connect, and decides when rules change. Around that owner sits a light governance group (marketing, data engineering, and privacy or legal) that meets on a fixed rhythm to review identity metrics and approve changes that carry consent or compliance weight.
The skill matters. Tuning identity rules is a judgment call about thresholds and trade-offs, the same precision-versus-recall balance record linkage has always involved. It sits closer to data engineering than campaign management, and assuming a busy campaign manager will absorb it in spare time is how drift sets in.
This is a small ongoing cost, and the case for it is concrete. Amperity, an enterprise CDP built around identity resolution, analyzed a set of consumer retail and hospitality brands and found roughly 23 percent of their best customers misidentified, with those buyers driving over half of revenue. The reason is intuitive: high spenders shop across more channels over more years, so they carry the messiest data and are the hardest to unify. The people your graph is most likely to fragment or over-merge are the people who matter most. An operation that keeps the graph honest is not overhead. It is protection on the revenue most exposed to getting identity wrong.
Where this is heading
Two forces are about to make this discipline non-negotiable.
The first is agents. As AI agents start reading customer profiles and acting on them with no human in between, the cost of a wrong profile rises sharply. A human marketer glancing at a fragmented profile often senses something is off. An agent does not. It treats the profile as ground truth and acts on whatever the graph says. Feed an agent an over-merged profile and it acts on a person who does not exist. Feed it a fragmented one and it acts on a third of someone. Agentic marketing only works on an identity layer that is actively kept honest, which makes monitoring and audits a precondition, not a nice-to-have.
The second is the tooling, which is finally moving toward continuous resolution. Vendors are adding AI-assisted matching that handles messy real-world data better than fixed rules, plus configurable unmerge and rule adjustment, directly addressing the old no-undo trap. Amperity's machine-learning matching can decompose records that were incorrectly merged. Hightouch ships configurable identity logic in a warehouse-native model where the resolved graph stays queryable in your own warehouse rather than locked in a black box, so you can monitor and audit it with your normal data tools. The direction is clear: identity resolution is becoming something built to be tuned continuously, because the vendors have accepted what record linkage always knew, that it is never finished.
None of that removes the need for an owner and a loop. Better tools make the operation easier to run. They do not run it. Identity resolution is not a feature you switch on. It is a system you keep alive. Treat it as a project and it degrades in silence until something downstream breaks and gets misdiagnosed. Treat it as an operation (monitored, owned, audited, governed) and the layer everything else depends on stays trustworthy. Most of the disappointment with CDPs traces back to which choice a team made.
Council summary
This post argues that identity resolution is a production system to run, not a project to close, because a correctly configured identity graph drifts on its own as sources, data, channels, and merge errors shift after go-live. The fix it lays out is operational: monitor match quality rather than match rate, audit merged profiles on a cadence, feed findings back into the rules, govern the consent angle, and name one accountable owner. The council verified the load-bearing claims and tightened one: the Fellegi and Sunter 1969 theory is correctly credited to the Journal of the American Statistical Association, the Dun and Bradstreet decay benchmark near 22.5 percent a year holds, and the Amperity figure was corrected to its real number, about 23 percent of best customers misidentified. The takeaway is to fund an identity owner and a review rhythm now, because agentic marketing acts on whatever the graph says with no human to catch a wrong profile.