In summer, ice cream sales and a brand's email open rates both climb. Nobody concludes that ice cream drives email engagement, because the hidden cause is obvious: it is summer, and summer lifts both. The logic is so clear it is almost funny.
Now move the same logic one room over. A team sees that conversions are highest among the people who received a retargeting ad, so retargeting gets the credit and a bigger budget next quarter. This time nobody laughs. It looks like measurement. It is the same mistake as the ice cream, wearing a suit.
That mistake has a price, and for most marketing budgets it is the single most expensive one being made. Not fraud, not a tracking bug, just a quiet confusion between two things moving together and one thing making the other happen. This post explains how that confusion works, why marketing metrics are built to commit it, and what it costs.
Origin: a distinction older than marketing
The difference itself is not subtle. Correlation means two things move together: when one goes up, the other tends to move alongside it. Causation means one of them makes the other happen. Remove the cause and the effect changes. Remove one half of a mere correlation and the other half may not move at all.
The two get confused so easily because of the confounder, sometimes called a lurking variable. A confounder is a hidden third factor that drives both of the things you are watching, so they rise and fall in step without either one touching the other. Summer is the confounder behind ice cream sales and drowning deaths, the textbook example of a spurious relationship. Statisticians keep a long catalogue of these, from stork populations that tracked Dutch birth rates to a football result that called 16 straight US presidential elections by pure coincidence. The relationships are real in the data and empty in the world.
Science settled on one tool to cut through this: the randomized controlled trial. Split a population at random into two groups, treat one, leave the other alone, and because the split was random the two groups are alike, on average, in every respect except the treatment. Any systematic gap in outcomes has only one place to come from. The control group is not a nicety. It is the mechanism that turns a correlation into a causal claim, and a confounder cannot survive it, because random assignment scatters the hidden factor evenly across both sides.
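To make the mechanism concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the 30 percent loyalty share, the purchase probabilities, and the exposure rates are assumptions, not figures from any study. The ad is given zero true effect, and the only difference between the two worlds is how exposure gets decided.

```python
import random


def simulate_zero_effect_ad(n=100_000, seed=0):
    """Return (observational_gap, randomized_gap) in conversion rate
    for an ad that, by construction, changes nobody's behavior."""
    rng = random.Random(seed)
    # Each table maps "saw the ad?" -> [buyers, people].
    observational = {True: [0, 0], False: [0, 0]}
    randomized = {True: [0, 0], False: [0, 0]}

    for _ in range(n):
        loyal = rng.random() < 0.30  # the hidden confounder (assumed share)
        # Loyalty drives the purchase; the ad never enters this line.
        bought = rng.random() < (0.20 if loyal else 0.02)
        # Observational world: loyal customers are far more likely to be shown the ad.
        saw_ad_observed = rng.random() < (0.80 if loyal else 0.10)
        # Randomized world: a coin flip decides exposure, blind to loyalty.
        saw_ad_randomized = rng.random() < 0.50

        for table, saw_ad in (
            (observational, saw_ad_observed),
            (randomized, saw_ad_randomized),
        ):
            table[saw_ad][0] += int(bought)
            table[saw_ad][1] += 1

    def gap(table):
        exposed = table[True][0] / table[True][1]
        unexposed = table[False][0] / table[False][1]
        return exposed - unexposed

    return gap(observational), gap(randomized)


observational_gap, randomized_gap = simulate_zero_effect_ad()
print(f"observational gap: {observational_gap:+.3f}")
print(f"randomized gap:    {randomized_gap:+.3f}")
```

Under these assumptions the observational comparison shows a gap on the order of ten percentage points for an ad that does nothing, while the randomized comparison lands close to zero. The purchases are identical in both worlds. Only the assignment rule changed.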
Marketing measurement grew up without that tool. The early web could log a click, tie it to a cookie, and draw a line from ad to purchase. That line felt like proof. It was a correlation with good production values, and the industry has been paying for the difference ever since.
Present: why almost every marketing metric is correlational
Here is the uncomfortable part. This is not a flaw in one bad model; it is the default state of nearly every standard marketing number.
Start with attribution, the system that decides which channel gets credit for a sale. Last-click, multi-touch, data-driven attribution: under the hood they all do the same thing. They look at the conversions that happened, inspect the touchpoints present along the way, and divide the credit among them. Notice what that procedure can and cannot see. It records which channels were in the room when the purchase occurred. It never tests whether the purchase needed them. As one plain-spoken account puts it, attribution models assume that because touchpoints appear alongside conversions they caused them. There is no control group anywhere in the method. Attribution is a correlational instrument by construction, no matter how advanced the math on top.
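A stripped-down sketch shows how little the procedure looks at. The `attribute` function, the channel names, and the four conversion paths below are hypothetical, and real data-driven models are far more elaborate, but the shape of the input is the point: it is a list of journeys that ended in a purchase, and nothing else.

```python
from collections import defaultdict


def attribute(paths, model="last_click"):
    """Divide one unit of credit per conversion among the touchpoints seen.

    `paths` holds only journeys that converted. There is no holdout and no
    record of people who saw the same ads and did not buy.
    """
    credit = defaultdict(float)
    for touchpoints in paths:
        if not touchpoints:
            continue
        if model == "last_click":
            # All credit goes to whichever channel was present last.
            credit[touchpoints[-1]] += 1.0
        elif model == "linear":
            # Credit is split evenly across every channel present.
            share = 1.0 / len(touchpoints)
            for channel in touchpoints:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Hypothetical converting journeys, ordered first touch to last touch.
paths = [
    ["display", "branded_search"],
    ["social", "retargeting", "branded_search"],
    ["retargeting"],
    ["branded_search"],
]

for model in ("last_click", "linear"):
    print(model, attribute(paths, model))
```

Switching from last-click to linear changes how the four conversions are shared out, but both rules hand out exactly four conversions' worth of credit. Neither can return the answer "some of these sales would have happened with no ads at all," because nothing in the input could support it.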
That has a predictable consequence. The channels closest to the moment of purchase look strongest, because they are the most reliably present when conversions land. Branded search is the clearest case: someone already decided to buy from you, types your name into Google, clicks the ad above the organic link, and converts. The ad was present, so it earned the credit, and it changed nothing, because that customer was arriving regardless. Retargeting works the same way, since it shows almost exclusively to people who already visited your site. A channel does not have to cause demand to look powerful in a report. It only has to show up near demand that already exists.
Then there is the confounder, doing in marketing exactly what summer does to ice cream. The holiday season lifts both ad spend and sales, so December ads look brilliant. A brand with loyal customers gets repeat purchases and runs ads to those same customers, so the ads look brilliant again. In each case a hidden factor, the season or the loyalty, drives the exposure and the sale at once, and the ad is a bystander collecting a reward it did not earn. The ad and the conversion share a cause. Neither caused the other.
There is one more layer, and it is the most subtle, because it survives even when nothing is broken. It is selection bias. Modern ad platforms are optimization engines: they study who clicks and converts, then steer impressions toward the people most likely to do both. That is the platform working as designed. But think about what it does to your measurement. The algorithm has deliberately filled your exposed group with the people who were already the best bets, so that group converts at a higher rate than the unexposed group, and it would have done so with the ads switched off. As Florian Zettelmeyer of Kellogg described the trap, even if the ad did nothing, the person who saw it will look like they bought more than the person who did not. Targeting that good is a feature for performance and a poison for measurement.
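The targeting trap can be sketched in a few lines. This is a toy model under loud assumptions: purchase propensities are drawn uniformly between 0 and 20 percent, the platform is assumed to rank people perfectly and target the top fifth, and the ad has no effect at all. None of these numbers come from a real platform.

```python
import random


def targeting_demo(n=200_000, seed=1):
    """Return (naive_gap, experimental_gap) for an ad with zero true effect.

    naive_gap:        targeted audience vs. everyone else (observational view)
    experimental_gap: within the targeted audience, coin-flip test vs. holdout
    """
    rng = random.Random(seed)
    # Baseline purchase probability per person (assumed distribution).
    propensity = [rng.uniform(0.0, 0.2) for _ in range(n)]
    # The platform targets the 20 percent of people most likely to buy.
    cutoff = sorted(propensity)[int(0.8 * n)]

    counts = {name: [0, 0] for name in ("targeted", "untargeted", "test", "holdout")}
    for p in propensity:
        bought = int(rng.random() < p)  # the ad plays no role in the purchase
        targeted = p >= cutoff
        groups = ["targeted" if targeted else "untargeted"]
        if targeted:
            # Experimental view: randomly withhold the ad from half the audience.
            groups.append("holdout" if rng.random() < 0.5 else "test")
        for group in groups:
            counts[group][0] += bought
            counts[group][1] += 1

    rate = {group: buyers / people for group, (buyers, people) in counts.items()}
    naive_gap = rate["targeted"] - rate["untargeted"]
    experimental_gap = rate["test"] - rate["holdout"]
    return naive_gap, experimental_gap


naive_gap, experimental_gap = targeting_demo()
print(f"targeted vs. untargeted: {naive_gap:+.3f}")
print(f"test vs. holdout:        {experimental_gap:+.3f}")
```

The targeted group out-converts everyone else by a wide margin because of who was selected, not because of what they were shown. Split that same audience by coin flip and the gap shrinks to noise.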
The studies that put a number on it
This is not theory. Researchers have measured the gap directly.
The most cited example is eBay. Economists Thomas Blake, Chris Nosko, and Steven Tadelis ran a large-scale field experiment that switched off eBay's paid search ads across a set of US markets. For branded keywords the short-term effect was not measurably different from zero: when the brand ads went dark, 99.5 percent of the traffic those ads would have delivered arrived through organic results anyway. Attribution had credited that spend for years.
The Facebook study went wider. Brett Gordon, Florian Zettelmeyer, and colleagues ran 15 large advertising experiments on Facebook and compared each randomized result against the observational methods marketers normally rely on: demographic matching, propensity scoring, before-and-after comparisons. The observational methods overestimated ad effectiveness, and not by a sliver. The estimates were often off by a factor of three or more. In one campaign, observational analysis reported a 416 percent lift where the controlled experiment put the true figure near 77 percent. Not a single observational method performed reliably across the board.
There is even a name for the everyday version of this. Randall Lewis, Justin Rao, and David Reiley documented activity bias: a person who is more active online on a given day is more likely both to see your ad and to search for your brand, for reasons that have nothing to do with the ad. Being online is the confounder.
Why this is the most expensive mistake
A measurement error is harmless until money moves on it. This one moves a lot of money, in a consistent and costly direction.
Run a correlational report as a scoreboard and the logic seems sound: feed the channels that win, cut the channels that lose. But the winners on that scoreboard are the demand-harvesting channels, branded search and retargeting, the ones that look strong because they sit near purchases that were always going to happen. The losers are the demand-creating channels, the prospecting and upper-funnel work whose payoff lands weeks later and far from the final click, where attribution cannot see it. The rational-looking move is to fund the channels that would have delivered anyway and starve the ones that generate new demand.
This is why the efficiency trap is so convincing. Concentrate spend into harvesting and your blended return on ad spend climbs, because you are increasingly paying to reach people already close to buying. The dashboard looks better every quarter while new-customer growth flattens, because you have stopped filling the top of the funnel and are recirculating the same audience until everyone who knows you has bought or decided not to. By the time the slowdown is undeniable it looks like a market problem, and the measurement that caused it is still running, still reporting nothing wrong.
The hard evidence that this is real money: when Procter & Gamble cut roughly $200 million of digital ad spend it had identified as ineffective, it reported no negative effect on its business and saw reach rise. That spend had sat inside correlational reports looking productive. For retargeting specifically, practitioner analyses estimate the incremental return is often 40 to 70 percent lower than the reported figure. The gap between credited and caused is not a rounding error. It is a budget line.
The cure: build the counterfactual
The fix follows from the diagnosis. If the disease is mistaking presence for cause, the cure is the one thing every correlational metric lacks: a genuine counterfactual. What would these same people, in this same period, have done with the ad switched off?
You cannot find that in a dashboard. You have to build it, and that means an experiment. An incrementality test does what the drug trial does. It holds out a randomized group from your ads, runs the campaign to everyone else, and measures the difference. Because the holdout was chosen at random, selection bias is gone: the two groups are alike, so the platform's targeting cannot tilt the result. Because they live through the same calendar, confounders are gone: the holiday season hits both equally. The gap that remains is incremental, the share of sales the ad genuinely caused. A companion piece on incrementality testing walks through how to run one. It is the only standard method that builds the comparison group into its design, which is why marketers increasingly treat it as the causal benchmark for the rest. In Haus's January 2026 survey of 500 US decision-makers, incrementality testing was the most-trusted method at 60 percent, 20 points clear of marketing mix modeling.
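The arithmetic behind reading such a test is short. The sketch below is one common way to do it, a difference in conversion rates with a normal-approximation confidence interval. The function name and the example counts are hypothetical, and a real test would also need a sample size planned in advance.

```python
import math


def incrementality(test_conversions, test_n, holdout_conversions, holdout_n, z=1.96):
    """Summarize a randomized holdout test.

    The holdout's conversion rate stands in for the counterfactual:
    what the test group would have done with the ads switched off.
    """
    rate_test = test_conversions / test_n
    rate_holdout = holdout_conversions / holdout_n
    diff = rate_test - rate_holdout

    # Standard error of a difference between two independent proportions.
    se = math.sqrt(
        rate_test * (1 - rate_test) / test_n
        + rate_holdout * (1 - rate_holdout) / holdout_n
    )
    incremental = diff * test_n  # conversions the ads added in the test group

    return {
        "absolute_lift": diff,
        "relative_lift": diff / rate_holdout,
        "ci_95": (diff - z * se, diff + z * se),
        "incremental_conversions": incremental,
        "incremental_share": incremental / test_conversions,
    }


# Hypothetical counts: 100,000 people per arm.
result = incrementality(
    test_conversions=2_300, test_n=100_000,
    holdout_conversions=2_000, holdout_n=100_000,
)
for name, value in result.items():
    print(name, value)
```

In this made-up example the exposed group converted 2,300 times, but the holdout converted 2,000 times with no ads, so about 300 conversions, roughly 13 percent of the total, are incremental. An attribution report fed the same campaign would have been free to credit all 2,300.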
Marketing mix modeling sits in between. MMM uses regression to relate aggregate spend to aggregate sales while statistically controlling for known confounders like seasonality, price, and promotions. That control is real and valuable, far better than raw attribution. But controlling for the confounders you can name is not the same as randomizing away the ones you cannot. MMM observes; it does not run an experiment. It is correlational evidence, carefully adjusted, which is why the strongest modern practice calibrates the model with experimental results rather than trusting it alone. Attribution tells you what was nearby. MMM tells you what was nearby, adjusted for the obvious confounders. Only an experiment tells you what the ad caused.
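A toy example shows what "statistically controlling" buys you, and where it stops. The eight weeks of data below are fabricated so that each unit of spend truly adds 2 units of sales and a holiday week adds 80 regardless of spend. Real MMMs add carryover, saturation, and many more controls. This sketch only adjusts for one named confounder by comparing weeks within the same season, which for a single yes/no control gives the same spend coefficient as a regression that includes it.

```python
def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean, leaving only within-group variation."""
    totals = {}
    for value, group in zip(values, groups):
        running_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (running_sum + value, count + 1)
    return [
        value - totals[group][0] / totals[group][1]
        for value, group in zip(values, groups)
    ]


# Fabricated weekly data: spend is higher in holiday weeks, and so are sales.
holiday = [0, 0, 0, 0, 1, 1, 1, 1]
spend = [10, 12, 14, 16, 30, 32, 34, 36]
sales = [100 + 2 * s + 80 * h for s, h in zip(spend, holiday)]

naive = slope(spend, sales)
adjusted = slope(demean_by_group(spend, holiday), demean_by_group(sales, holiday))

print(f"naive return per unit of spend:    {naive:.2f}")
print(f"holiday-adjusted return per unit:  {adjusted:.2f}")
```

The naive slope comes out near 5.8, almost three times the true value of 2, because the holiday lifts spend and sales together. Adjusting for the holiday recovers 2. The adjustment worked only because the confounder was named and sitting in a column. A confounder nobody recorded would stay inside the estimate, which is the gap an experiment closes.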
Future and impact: one question for every number
Causal measurement is getting cheaper and more automated, and AI agents are starting to design and read incrementality tests rather than just report dashboards. That helps. But the durable fix is not a tool. It is a habit, and it costs nothing to adopt today.
Before you move budget on any marketing number, ask one question: could this be explained by something other than the ad? Run it through the three traps. Is the channel just present near demand that already existed, the way branded search sits in front of customers who already chose you? Is a confounder driving both the exposure and the sale, the way the holiday season or plain customer loyalty does? Did the platform's targeting hand-pick an audience that was always going to convert, so the ad looks effective by selection? If any of those could explain the number, you do not yet know what the channel is worth. You only know it was in the room. This is the same reasoning that makes last-click attribution a comfortable lie: it reports the final touch as a cause when it is usually just the closest correlation.
That one question separates the marketers who are measuring from the marketers flattering themselves with a chart. The most expensive mistake in marketing analytics is not a bad model or a broken pixel. It is the quiet, comfortable assumption that a metric showing two things moving together has told you which one is paying the bills. It has not. Only a counterfactual can, and a counterfactual has to be built.
Council summary
The post argues that nearly every standard marketing metric, attribution above all, measures correlation and quietly sells it as causation, and that this confusion is the most expensive error in marketing analytics because it rewards demand-harvesting channels that would have delivered anyway while starving the work that creates new demand. It earns the claim with hard evidence: the eBay branded-search experiment, the 15-experiment Facebook study where observational methods overestimated lift by a factor of three or more, and P&G's $200 million cut that left its business unharmed. The confounder and selection-bias sections carry the teaching value and are concrete and correct. The takeaway is a discipline, not a tool: before moving budget on any number, ask whether something other than the ad could explain it, and trust only a built counterfactual to tell you what a channel truly caused.