Data Clean Rooms: How Rivals Share Data Without Sharing Data

A brand and a retailer both hold records on the same shopper. Data clean rooms let them combine those records without either side ever seeing the other's files.

A consumer packaged goods brand and a large grocery retailer both have detailed records about the same shopper. The brand knows she saw three of its ads last month. The retailer knows she walked in and bought the product on the fourth visit. Put those two facts in the same table and you would finally know whether the ads worked. You would have closed the loop.

They cannot put them in the same table. Handing the retailer a file of ad exposures, or handing the brand a file of named purchases, would expose customer records that privacy law protects and that neither company will surrender to the other. The retailer will not give a supplier its shopper list. The brand will not give a retailer its customer file. The single most useful join in retail marketing is blocked by a wall that exists for good reasons.

A data clean room is the workaround that respects the wall. It is a neutral, governed environment where both parties load their data, the records are matched on a shared key, and only aggregated, privacy-protected answers come back out. Neither side ever sees the other's raw rows. The brand learns its ads drove a measurable lift in sales. The retailer never learns who the brand advertised to. The question gets answered and the files stay home.

Origin: a fix the walled gardens built for themselves

The term comes from semiconductor manufacturing, where a clean room is a sealed space kept free of dust so chips can be built without contamination. Advertising borrowed the word for a sealed space kept free of one specific contaminant: anyone seeing data they should not.

The first clean rooms were not built for fairness. They were built by the largest platforms to solve their own problem. Google launched Ads Data Hub in beta in 2017, and it became the replacement for the DoubleClick log-level data export once the General Data Protection Regulation took effect in 2018. As the Wikipedia entry on data clean rooms notes, European advertisers who wanted to analyze Google ad IDs had to do it inside the hub, because Google would no longer ship user-level logs. Facebook followed with its own version. Amazon began developing what became Amazon Marketing Cloud, a service that AdExchanger first reported on in August 2019 and that came out of beta in October 2021.

Read that history honestly and the clean room starts as a defensive move. A platform holds enormously valuable log-level data. Advertisers want to analyze it. Regulators will not allow raw user records to leave. A clean room lets the platform say yes to analysis and no to export at the same time. It protects the platform's data asset as much as it protects the consumer. That dual purpose still shapes the technology and the argument around it.

Present: how a clean room actually works

Strip away the jargon and a clean room runs four steps.

First, both parties load data into a neutral environment rather than emailing files. The data is encrypted, and in the better setups it is queried where it sits in each party's cloud, never copied into one shared pile.

Second, the records are matched on a common key. Usually that key is a hashed email address: the email is run through a one-way function that turns it into a fixed string of characters, so the systems can tell two rows refer to the same person without either system reading the actual address. Other identifiers and privacy-safe protocols can serve the same role.
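The matching step can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: it assumes SHA-256 as the one-way function and a simple trim-and-lowercase normalization, and the names hash_email, brand_exposures and retailer_purchases are hypothetical. A plain unsalted hash of an email can be guessed by hashing candidate addresses, which is one reason production rooms layer further protocols on top.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest."""
    # Both parties must normalize identically, or the same person
    # produces two different hashes and the join silently misses.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical inputs: each party hashes its own file before loading it.
brand_exposures = {"Ana@example.com": 3, "bo@example.com": 1}
retailer_purchases = {" ana@example.com": True, "cy@example.com": True}

brand_hashed = {hash_email(e): n for e, n in brand_exposures.items()}
retailer_hashed = {hash_email(e): p for e, p in retailer_purchases.items()}

# The join runs on the hashes only; no raw address is compared.
matched_keys = brand_hashed.keys() & retailer_hashed.keys()
print(len(matched_keys), "record(s) matched")
```

The two spellings of the same address match only because both sides normalized before hashing. Skip that step on either side and the shared shopper disappears from the join.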

Third, analysis runs inside the room. Each project defines in advance what data is in scope, which queries are allowed, and what may leave. As LiveRamp describes the model, roles and permissions govern who can run what and who can see which result. Common jobs are audience overlap, reach and frequency, and attribution that links an ad exposure to a later purchase.
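One way to picture the advance agreement is as a policy object that every query is checked against before it runs. The sketch below is illustrative only: ProjectPolicy, its fields and the role names are hypothetical, and real clean rooms express these rules in their own configuration formats.

```python
from dataclasses import dataclass


@dataclass
class ProjectPolicy:
    """Hypothetical project agreement: what is in scope and who may run it."""
    allowed_analyses: frozenset   # e.g. overlap, reach, attribution
    allowed_columns: frozenset    # columns that queries may touch
    role_permissions: dict        # role name -> frozenset of analyses

    def authorize(self, role: str, analysis: str, columns: list) -> bool:
        # The analysis type must be part of the agreed project scope.
        if analysis not in self.allowed_analyses:
            return False
        # The caller's role must be permitted to run that analysis.
        if analysis not in self.role_permissions.get(role, frozenset()):
            return False
        # Every referenced column must be in scope.
        return set(columns) <= self.allowed_columns


policy = ProjectPolicy(
    allowed_analyses=frozenset({"audience_overlap", "attribution"}),
    allowed_columns=frozenset({"hashed_email", "exposures", "purchased"}),
    role_permissions={"brand_analyst": frozenset({"attribution"})},
)

print(policy.authorize("brand_analyst", "attribution", ["hashed_email", "exposures"]))
print(policy.authorize("brand_analyst", "attribution", ["raw_email"]))
```

The design point is that the check happens before execution, so a query outside the agreed scope is refused rather than run and filtered afterwards.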

Fourth, only protected outputs come out. The controls here are the heart of the thing. A minimum aggregation threshold means a result is suppressed unless it covers enough distinct people. AWS Clean Rooms enforces this as a rule that each output row must represent at least a configured number of distinct users, and Google's Ads Data Hub aggregates exported results to a floor of 50 users. On top of that, some rooms apply differential privacy, which adds a calibrated amount of statistical noise so that no single person's presence changes the answer enough to be detected. The IAB Tech Lab, whose clean room guidance is the closest thing the industry has to a standard, treats noise injection as the only method that gives a measurable guarantee against output attacks. Query limits and audit logs sit alongside these so nobody can run a thousand slightly different queries to triangulate an individual back out.
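The two output controls described above, a minimum aggregation threshold and added noise, can be sketched together. This is a simplified illustration under stated assumptions, not how any named product implements them: the default floor of 50 borrows the Ads Data Hub figure quoted above, the epsilon value is an arbitrary placeholder, each user is assumed to fall into exactly one group, and the function names are hypothetical. Suppressing on the true count before adding noise is also a shortcut that a rigorous differential privacy design would handle more carefully.

```python
import random
from collections import defaultdict

MIN_USERS = 50   # assumed floor, taken from the Ads Data Hub figure above
EPSILON = 1.0    # hypothetical privacy parameter; smaller means more noise


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def protected_counts(rows, min_users=MIN_USERS, epsilon=EPSILON, rng=None):
    """Return noisy distinct-user counts per group, suppressing small groups.

    rows is an iterable of (group, user_key) pairs.
    """
    rng = rng or random.Random()
    users_by_group = defaultdict(set)
    for group, user_key in rows:
        users_by_group[group].add(user_key)  # count distinct users, not rows

    released = {}
    for group, users in users_by_group.items():
        if len(users) < min_users:
            continue  # threshold: too few people, so the row never leaves
        # One user changes a count by at most 1, so the noise scale is 1/epsilon.
        noisy = len(users) + laplace_noise(1.0 / epsilon, rng)
        released[group] = max(0, round(noisy))
    return released


rows = [("exposed_buyers", f"user{i}") for i in range(60)]
rows += [("rare_segment", f"vip{i}") for i in range(10)]
print(protected_counts(rows))
```

The small segment is dropped entirely and the large one comes back as an approximate count, which is the trade the paragraph above describes: a slightly less exact answer in exchange for not exposing any one person.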

There are two broad families. The walled-garden clean rooms are run by the big platforms and answer questions about that platform. Amazon Marketing Cloud and Google Ads Data Hub are the leading two, joined by retailer and broadcaster rooms from the likes of Instacart, Disney and NBCUniversal. The neutral or platform-agnostic clean rooms are built to sit between parties rather than inside one: Snowflake, AWS and Google Cloud offer them as data-warehouse features, and independent specialists include LiveRamp, InfoSum and Decentriq. The independent layer is consolidating fast. LiveRamp bought the clean room vendor Habu in 2024, WPP folded InfoSum into GroupM in 2025, and in May 2026 LiveRamp itself agreed to be acquired by Publicis for roughly 2.5 billion dollars. The truly unaligned clean room vendor is becoming a rare thing. LiveRamp was named a leader in the 2025 IDC MarketScape for clean room technology, one sign the category matured past its experimental phase before the buyers moved in.

Retail media is what moved clean rooms from niche to mainstream. The logic is tight. Retailers own purchase data, the most valuable signal left now that third-party cookies are unreliable. Brands spending on retail media want proof those ads moved real sales. A clean room is the only structure that connects an ad impression to a basket without the retailer handing over its shopper file. It is the closed-loop bridge for an ad format growing fast. US retail media ad spend is approaching 70 billion dollars in 2026 by eMarketer's forecast, and that money wants measurement. Adoption has followed: eMarketer, citing the 2025 State of Retail Media research, reports that 66 percent of organizations now use clean rooms in some form. It is no coincidence that this is the same closed-loop problem that sits at the center of retail and commerce media. The clean room is the measurement layer that makes retail media sellable.

The most visible signal of how mainstream this has become arrived in September 2025. Amazon Marketing Cloud had previously sat behind a demand-side platform contract with a reported minimum near 60,000 dollars a year. On 18 September 2025, as Marketing Dive reported, Amazon opened AMC directly to every advertiser running Sponsored Products, Sponsored Brands, Sponsored Display or Sponsored TV campaigns, with no contract and no minimum. A clean room went from an enterprise purchase to a checkbox in the ads console.

The honest limits

A clean room is a real tool, not a magic privacy loophole, and the gap between the pitch and the reality is wide enough to plan around.

Match rates are imperfect. The join only works for people who appear in both datasets under a key that lines up, and email addresses go stale, vary, and multiply. Independent clean rooms commonly see match rates well below what brands expect from Google or Meta, because those platforms hold a near-complete logged-in view of the population and an independent room does not. A measurement built on a partial match is a measurement with a blind spot, and the blind spot is rarely disclosed cleanly. As identity specialists have argued, a clean room does not solve identity resolution by itself; it still needs a reliable key, and that key is the hard part.
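The blind spot is easy to quantify once both sides' keys are in the room. A minimal sketch, assuming each party's keys are already hashed and deduplicated; the function name and the sample keys are hypothetical.

```python
def match_rates(brand_keys: set, retailer_keys: set) -> dict:
    """Report how much of each party's file survives the join."""
    matched = brand_keys & retailer_keys
    return {
        "matched": len(matched),
        # Share of the brand's audience the measurement can actually see.
        "brand_match_rate": len(matched) / len(brand_keys) if brand_keys else 0.0,
        # Share of the retailer's shoppers that appear in the brand's file.
        "retailer_match_rate": len(matched) / len(retailer_keys) if retailer_keys else 0.0,
    }


brand_keys = {"h1", "h2", "h3", "h4"}
retailer_keys = {"h3", "h4", "h5"}
print(match_rates(brand_keys, retailer_keys))
```

The rate differs by side because the denominators differ, so a single quoted "match rate" is ambiguous until you know whose file it is measured against. Whatever falls outside the matched set is the part of the audience the result says nothing about.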

The walled-garden rooms still favor the platform. This is the structural catch. Amazon's own clean room is plugged into Amazon's ad business and identity graph, and as AdExchanger noted on launch, AMC is not a neutral cross-channel measurement product; it is built to support Amazon's media empire. A platform-run clean room verifies that platform's own performance. It is a better window than a self-reported dashboard, but it is still a window the platform built, and it will not grade a rival's ads. That is the same conflict at the heart of walled gardens and self-attribution, and the clean room narrows it without removing it.

They are complex and skill-intensive. A clean room is data engineering, not a dashboard. eMarketer, citing retail media research, reports that 39 percent of organizations struggle to drive actionable insights from clean room data, and Skai's State of Retail Media work finds integration into existing workflows, scaling, and a shortage of in-house expertise are the top hurdles practitioners name. Setting up a project, agreeing the legal terms, mapping identifiers and writing queries takes weeks and specialist staff. The legal step alone is often the slowest.

And a clean room is not automatic privacy. In a November 2024 note bluntly titled "Data Clean Rooms: Separating Fact From Fiction", the US Federal Trade Commission pointed out that clean rooms are not rooms, do not clean data, and can be used for privacy washing, where the label implies a protection the configuration does not deliver. Worse, the agency noted, granting another system access to a dataset enlarges the surface that has to be defended. The protections are real only if the thresholds, query limits and noise are set and enforced properly. A badly governed clean room is just a shared database with a reassuring name.

Future and impact: the plumbing disappears

The direction of travel is that clean rooms become invisible. AdExchanger argued in 2025 that the era of the data clean room as a distinct product is ending, not because the technology failed but because it succeeded enough to become background infrastructure. The industry is folding the term into broader language about data collaboration and interoperability. Marketers increasingly describe the outcome they want, such as closed-loop measurement or an audience overlap, rather than the box that delivers it.

The contested question is interoperability. Today a clean room mostly answers questions inside one platform or one cloud, and stitching across several remains manual and partial. The independent providers, now absorbed into holding companies and the major clouds, are pushing toward rooms that work across environments, and the IAB Tech Lab is standardizing how matching and measurement should behave. If that succeeds, a brand could measure across several retailers and publishers without a separate project for each. If it stalls, the walled-garden rooms keep their structural advantage and clean rooms remain a set of disconnected windows. There is a tension worth watching here: the same agency groups now pitching neutral cross-platform measurement also buy media, which is the conflict clean rooms were meant to escape.

The honest near-term posture for a marketing or data decision-maker is unglamorous. A clean room is the right tool for closed-loop measurement and privacy-safe audience work, and for retail media it is close to mandatory. Treat a walled-garden room as a better but still partial view of that platform, not as a neutral referee. Budget for the engineering and the legal time, not just the license. Ask hard questions about match rates before you trust a result. The clean room genuinely lets two rivals answer a shared question without surrendering their data. It does not make the answer complete, and it does not make either side disinterested. Used with those limits in mind, it is one of the few honest measurement tools the privacy era has produced.

Council summary

This post argues that a data clean room is a genuine but narrow tool: it lets two parties who will not exchange files still answer a shared question, by matching records on a hashed key inside a governed environment and releasing only aggregated, threshold-protected outputs. It is honest about the catch. The technology began as a defensive move by Google, Facebook and Amazon to permit analysis without export, and the walled-garden rooms still grade their own homework, while the independent layer is being bought up, most recently by Publicis taking LiveRamp. The reader's takeaway is practical. Use a clean room where it fits, above all for retail media measurement, but budget for the engineering and legal time, interrogate match rates before trusting any number, and treat a platform-run room as a better window, not a neutral referee.
