Open any affiliate dashboard and it will tell you, to two decimal places, how much revenue the channel drove last month. Every conversion has a name attached: a coupon site, a cashback portal, a review article, a creator. The numbers add up. The report looks finished.

It is also, in a specific and expensive way, lying to you.

The dashboard is answering one question: which affiliate cookie was on the buyer's device when the order completed. That is a real question, and for paying commissions it is the one the contract usually settles on. But it tells you almost nothing about the question a marketer actually cares about, which is whether that affiliate caused the sale. A shopper who already knew the brand, already had the product in their cart, and stopped to search for a discount code on the way to checkout will still get tagged to whichever coupon site they found. The commission gets paid. The customer was never won.

Incrementality testing is the discipline of separating those two things. It asks, of every partner and every publisher type, a question last-click reporting cannot: how many of these sales would have happened anyway?

Where the question came from

The idea is older than affiliate marketing and older than the web. Direct mail companies in the twentieth century already understood it. If you mail a catalog to a million households and 30,000 of them buy, the catalog did not necessarily produce 30,000 sales. Some of those households were going to order regardless. The way to find out was to hold a slice of the list back, mail them nothing, and compare. The gap between the mailed group and the held-back group was the real effect of the catalog.

That logic, a treated group against an untreated control, is the entire foundation of incrementality testing. It is just an experiment. For most of affiliate marketing's history, nobody ran it.

They did not need to. Through the 2000s and 2010s, last-click attribution was good enough because it was cheap and unambiguous, and nobody was seriously challenging it. A merchant read one cookie at checkout, paid one party, and moved on. The affiliate channel reported strong return on ad spend because it was measuring itself with a ruler it had designed, and the ruler was generous. Coupon and cashback affiliates, which intercept buyers at the moment of purchase, looked like some of the best performers in the program precisely because they were always last.

Two things broke that quiet. The first was the Honey scandal in late 2024, when the browser extension was accused of overwriting affiliate cookies at checkout and collecting commissions on sales it had not originated. It made a mainstream audience understand that being last in the funnel and being responsible for the sale are not the same thing. The second was budget pressure. As acquisition costs rose and finance teams started asking marketing to prove its numbers, "the dashboard says so" stopped being an acceptable answer.

Incrementality testing was the answer, sitting in the direct-mail playbook the whole time.

How the testing actually works

There is no single incrementality test. There is a family of designs, and the right one depends on how much control you have over your audience, how much traffic you have to work with, and which question you are trying to settle. Four designs do most of the real work in affiliate programs.

The publisher on/off test is the bluntest and often the most convincing. You take a partner, or a whole partner type, and you switch it off. No commissions, no placements, no activity, for a defined window, usually four to seven weeks. Then you look at total orders and revenue. If the channel was creating customers, turning it off should cost you sales. If revenue holds flat, those sales were not the channel's to claim. The catch is that you need something to compare the switched-off period against, because demand moves on its own for a hundred reasons.

That is where the geo test comes in, and it is the workhorse design for a reason. You split your markets into two comparable groups of regions. In one group, the partner stays on. In the other, it goes off. Because the two groups face the same season, the same weather, the same general economy, the difference in their results is a clean read on what the partner did. Geo tests have a property that matters more every year: they need no individual-level tracking, so they are unaffected by third-party cookie loss, browser privacy changes, and Apple's tracking restrictions. They tend to run four to six weeks.
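The comparison a geo test makes can be sketched as a difference-in-differences calculation. This is a minimal illustration, not a full analysis: the function name and the revenue figures are made up, and it assumes two region groups of similar size so that absolute changes are comparable.

```python
from statistics import mean


def geo_test_effect(on_pre, on_test, off_pre, off_test):
    """Difference-in-differences on weekly revenue.

    on_*   : weekly revenue in regions where the partner stayed on
    off_*  : weekly revenue in regions where the partner was switched off
    *_pre  : weeks before the test started
    *_test : weeks during the test
    Returns the estimated weekly revenue the partner adds.
    """
    on_change = mean(on_test) - mean(on_pre)     # partner + season + everything else
    off_change = mean(off_test) - mean(off_pre)  # season + everything else, no partner
    return on_change - off_change


# Made-up weekly revenue (in thousands), purely illustrative.
on_pre, on_test = [100, 102, 98, 100], [110, 108, 112, 110]
off_pre, off_test = [100, 98, 102, 100], [104, 106, 102, 104]

effect = geo_test_effect(on_pre, on_test, off_pre, off_test)
print(f"Estimated partner effect: {effect} per week")
```

Both groups grew during the test window, so a naive before-and-after read on the "on" regions would credit the partner with the full rise. Subtracting the change in the "off" regions removes the part that demand delivered on its own. When the groups differ in size, comparing percentage changes is the safer variant.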

The matched-market test is a refinement of the geo design for brands that do not have neat, balanced regions. Instead of hoping two halves of the country are comparable, you build a synthetic control: a weighted blend of untreated markets, engineered to mirror the treated market's recent sales history as closely as possible. You then run the treated market against its synthetic twin. This is how the most rigorous affiliate incrementality studies are now done.
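To make the "weighted blend" concrete, here is a toy sketch of fitting synthetic control weights. It uses a brute-force grid search, which is only practical for a handful of control markets; real studies typically use a constrained optimizer. The function names, the grid resolution, and every sales figure are assumptions for illustration.

```python
from itertools import product


def fit_synthetic_weights(treated_pre, donors_pre, steps=20):
    """Find non-negative weights summing to 1 that make the blend of
    untreated (donor) markets track the treated market's pre-test sales."""
    best_weights, best_error = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(donors_pre)):
        if sum(combo) != steps:  # keep only weight sets that sum to 1
            continue
        weights = [c / steps for c in combo]
        error = sum(
            (actual - sum(w * d[t] for w, d in zip(weights, donors_pre))) ** 2
            for t, actual in enumerate(treated_pre)
        )
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights


def synthetic_series(weights, donors):
    """Blend donor markets into the synthetic twin's sales series."""
    periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(periods)]


# Made-up pre-test weekly sales for three untreated markets and one treated market.
donors_pre = [[100, 110, 120, 130], [200, 190, 210, 220], [50, 60, 55, 65]]
treated_pre = [150, 150, 165, 175]

weights = fit_synthetic_weights(treated_pre, donors_pre)

# During the test, the synthetic twin predicts what the treated market
# would have sold had nothing changed.
donors_test = [[140], [230], [70]]
expected = synthetic_series(weights, donors_test)
print(weights, expected)
```

The gap between the treated market's actual sales in the test window and `expected` is the estimated effect of switching the partner off. A gap near zero means the sales were not the partner's to claim.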

The holdout test, also called a conversion lift study, works at the level of individual users rather than regions. You randomly withhold affiliate exposure from a slice of your audience, commonly 20 to 30 percent, and compare their conversion rate to everyone else's. It is the most precise design when you can execute it. Industry analyses of how far attribution overstates true incremental return tend to land in the 30 to 50 percent range, and a holdout is the cleanest way to measure that gap for your own program. The cost is that it needs platform support for audience exclusions and a healthy volume of conversions, hundreds to thousands per group, to reach statistical significance.
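The holdout comparison reduces to a few lines of arithmetic. The sketch below computes relative lift and a two-proportion z-test using only the standard library. The function name and the user and order counts are hypothetical, and a production analysis would also check sample-size requirements before the test starts.

```python
from math import erf, sqrt


def conversion_lift(exposed_users, exposed_orders, holdout_users, holdout_orders):
    """Compare conversion rates between users who could see affiliate
    placements (exposed) and users who were randomly withheld (holdout)."""
    rate_exposed = exposed_orders / exposed_users
    rate_holdout = holdout_orders / holdout_users
    diff = rate_exposed - rate_holdout

    # Two-proportion z-test with a pooled standard error.
    pooled = (exposed_orders + holdout_orders) / (exposed_users + holdout_users)
    std_err = sqrt(pooled * (1 - pooled) * (1 / exposed_users + 1 / holdout_users))
    z = diff / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    return {
        "relative_lift": diff / rate_holdout,
        "incremental_orders": diff * exposed_users,
        "z": z,
        "p_value": p_value,
    }


# Made-up example: a 20 percent holdout out of 100,000 users.
result = conversion_lift(
    exposed_users=80_000, exposed_orders=2_400,
    holdout_users=20_000, holdout_orders=500,
)
print(result)
```

In this invented example the exposed group converts at 3.0 percent against 2.5 percent in the holdout, so only about 400 of the 2,400 orders are incremental, even though a last-click report could credit affiliates with far more.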

Sitting above all of these is marketing mix modeling, a statistical approach that estimates each channel's contribution from time-series data across the whole media plan. It is useful for the strategic picture and needs no experiments, but it is less granular than a real test and depends heavily on how the model is built. The strongest programs treat modeling and experiments as partners: the model points to where the questionable spend is, and a test settles it.

Whichever design you pick, the metrics you watch should not be attributed conversions. They should be total orders, total revenue, new customers specifically, customer acquisition cost, and contribution margin per order. Attributed volume is the number you are trying to get away from.

What a realistic result looks like

Here is the part that makes incrementality testing uncomfortable, and useful. The results are not uniform. Different publisher types produce genuinely different incremental value, and the pattern is consistent enough that you can predict it before you test, though you should still test.

Content and editorial publishers tend to score well. Review sites, comparison articles, and genuine recommendation content do their work early, when a buyer is still deciding what to buy and whether to trust the brand at all. A shopper who reads a comparison piece and then buys was, in many cases, genuinely moved. CJ's incrementality study, built on 21 million retail consumers and 5.5 million transactions, found that shoppers who touched the affiliate channel converted at a 46 percent higher rate and produced 88 percent more revenue per shopper, with a positive effect on both new and returning customers. The study looked at the channel in aggregate rather than splitting it by publisher type, but the upper-funnel logic is why content partners are usually the ones holding up that average.

Creators and influencers generally land in similar territory, because their value is also discovery and trust. Someone who buys after a creator's recommendation often did not know the product existed an hour earlier. That is close to the definition of incremental.

Coupon and cashback publishers are where the testing tends to hurt. Their mechanism activates at or near the moment of purchase, which means a large share of the conversions they get credited for involve buyers who had already decided. The honest finding across many programs is that this category shows mixed-to-low incrementality. Mixed, because a coupon placement on a deal site's homepage can genuinely introduce a brand to a bargain hunter. Low, because the same coupon, surfaced by a code-finding extension at the checkout of a buyer who came in through brand search, is closer to a last-click toll than a marketing channel.

Loyalty and cashback portals warrant the strictest testing of all, and the clearest published evidence of why comes from an on/off test run by the measurement firm Haus for a multinational direct-to-consumer brand operating in eight countries. The brand switched off all of its affiliate loyalty partners, the cashback and reward sites, in the United States for seven weeks. The other seven countries, where the partners stayed on, were blended into a synthetic control. The result: no change in revenue between the US and its comparable control. The brand cut its US loyalty investment, saw no material revenue drop, and moved the savings into channels that actually moved the number, improving its overall acquisition cost.

That is the shape of a real incrementality finding. Not "the channel is worthless," but "this slice of the channel was being paid for work it was not doing, and the money is better spent elsewhere."

One practical note keeps coming up from practitioners: a partner's incrementality depends heavily on how and where they promote, not just what category they fall in. To measure coupon partners cleanly, give them exclusive, trackable codes rather than generic discounts, so you can see which placements introduce buyers and which simply harvest them.

Acting on the findings

A test that does not change a commission rate is a slide, not a decision. The point is to make the program pay for outcomes rather than for proximity to checkout.

The first move is to stop running one commission rate for everyone. A flat rate means content publishers who create customers and cashback portals who capture them are paid identically, which quietly subsidizes the low-incrementality end of the program with margin the high-incrementality end earned. Tiered structures fix this. Many direct-to-consumer programs now pay content and review partners materially more than coupon and cashback sites, with some setting coupon commissions in the low single digits specifically because the testing says those sales are mostly not incremental.

The second move is to pay for the thing you actually want, which is new customers. A new-to-brand bonus, a higher rate or a flat payment that triggers only on a first-time buyer, points commission money straight at acquisition and away from rewarding a partner for being present when an existing customer reorders. Programs that take incrementality seriously now separate first-order economics from repeat-order economics.
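The first two moves, tiered rates and a new-to-brand bonus, can be expressed as one small commission rule. Every rate below is a placeholder chosen for illustration, not a recommendation, and the `Order` fields are hypothetical; a real program would take its tiers from its own test results.

```python
from dataclasses import dataclass

# Hypothetical commission rates by publisher type.
TIER_RATES = {
    "content": 0.10,
    "creator": 0.10,
    "coupon": 0.02,
    "cashback": 0.02,
}
# Hypothetical extra rate paid only on a customer's first order.
NEW_TO_BRAND_BONUS = 0.05


@dataclass
class Order:
    partner_type: str
    order_value: float
    is_first_purchase: bool


def commission(order: Order) -> float:
    """Tiered base rate, plus a bonus that triggers only for first-time buyers."""
    rate = TIER_RATES[order.partner_type]
    if order.is_first_purchase:
        rate += NEW_TO_BRAND_BONUS
    return round(order.order_value * rate, 2)


print(commission(Order("content", 100.0, True)))   # new customer via a review site
print(commission(Order("coupon", 100.0, False)))   # repeat customer via a coupon site
```

Separating the base rate from the bonus keeps first-order and repeat-order economics visible in the payout itself, so a partner that mostly captures existing customers earns only the lower base rate.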

The third move is restraint when a partner tests poorly. The instinct is to cut the partner. The better first step is to test a commission decrease. Reduce the rate by 10 or 20 percent and watch. If the partner was genuinely driving incremental conversions, a lower payout will usually make them pull back their promotional effort, and you will see traffic or revenue fall. If you cut the rate and nothing moves, you have confirmed the partner was not creating demand, without the disruption of a full removal. Some programs go further and use platform rules directly: suppressing commission when an affiliate enters the click path in the final few minutes before purchase, or paying more when an affiliate is the only touch in the journey and less when the touch is shared.
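The click-path rules mentioned above might look something like the following sketch. The five-minute window and the multipliers are assumptions, as are the function name and the shape of the touch data. Affiliate platforms expose rules like these through their own settings, which this code does not attempt to reproduce.

```python
from datetime import datetime, timedelta

# Hypothetical policy values.
LATE_WINDOW = timedelta(minutes=5)
SOLE_TOUCH_MULTIPLIER = 1.25
SHARED_TOUCH_MULTIPLIER = 0.75


def commission_multiplier(touches, affiliate_id, order_time):
    """touches: list of (channel_id, timestamp) pairs in the buyer's journey.
    Returns the factor to apply to the affiliate's base commission."""
    affiliate_times = [ts for channel, ts in touches if channel == affiliate_id]
    if not affiliate_times:
        return 0.0  # affiliate was never in the path

    # Suppress commission if the affiliate first appeared only in the
    # final minutes before purchase.
    if order_time - min(affiliate_times) <= LATE_WINDOW:
        return 0.0

    other_channels = {channel for channel, _ in touches if channel != affiliate_id}
    return SHARED_TOUCH_MULTIPLIER if other_channels else SOLE_TOUCH_MULTIPLIER


order_time = datetime(2025, 1, 1, 12, 0)
late_path = [
    ("brand_search", datetime(2025, 1, 1, 11, 30)),
    ("coupon_ext", datetime(2025, 1, 1, 11, 58)),
]
sole_path = [("review_site", datetime(2025, 1, 1, 10, 0))]

print(commission_multiplier(late_path, "coupon_ext", order_time))
print(commission_multiplier(sole_path, "review_site", order_time))
```

A rule like this pays for position in the journey, which is still only a proxy for cause. It is a reasonable stopgap between tests, not a substitute for them.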

The last move is to treat this as maintenance, not a one-time audit. Incrementality is not a fixed property of a partner. A coupon site can become more incremental if it shifts to upper-funnel placements; a content site can become less incremental if its traffic decays. A full review at least quarterly keeps the commission structure honest as the program changes underneath it.

Why this is becoming non-optional

Incrementality testing used to be a sophistication, something the most advanced programs did. It is rapidly becoming standard practice. In a July 2025 survey from eMarketer and TransUnion, 52 percent of US brand and agency marketers said they already use incrementality experiments to measure campaigns, and 36 percent said they planned to invest in the method over the following year. The reason is not fashion. It is that the alternative is failing.

Last-click attribution depends on a clean, orderable sequence of clicks. That sequence is dissolving. Third-party cookies are unreliable, browser privacy controls keep tightening, and a growing share of product research now happens inside AI assistants that compare options and surface deals without ever setting a cookie. When the click happens late, or never, a model built on ordering clicks has nothing solid to stand on. Incrementality testing does not care about cookies or click order. It cares about one thing: with the channel, and without it, what changed. That question survives the loss of tracking that is breaking everything else.

The affiliate channel is not in trouble here. Plenty of it creates real customers, and the testing proves it where it is true. What is in trouble is the comfortable habit of reading a last-click dashboard and calling it performance. The marketers who will defend their affiliate budgets are the ones who can say, with an experiment behind them, exactly which partners grow the business and which ones were standing at the door collecting a fee. The dashboard will never tell you that. A holdout will.

Council summary

This post argues that a last-click affiliate dashboard measures cookie position, not cause, and that incrementality testing is the only honest way to learn which partners actually create customers. It walks through four test designs (publisher on/off, geo, matched-market, and holdout), places marketing mix modeling above them as a complement, and explains why content and creator partners usually test as incremental while coupon and loyalty publishers often do not. The council verified every named figure: the CJ study numbers (46 percent higher conversion, 88 percent more revenue per shopper, drawn from 21 million consumers), the Haus loyalty-site on/off test, the July 2025 eMarketer and TransUnion survey, and the late-2024 Honey scandal. We corrected an unsupported "19.1 million non-affiliate shoppers" claim that the CJ source never states, and tightened the survey wording to match what TransUnion actually reported. The takeaway for a marketer: run the test, then move commission money from partners that sit near checkout to partners that bring new buyers in.

Which Affiliates Create Customers? Incrementality Testing
