1. Marginal contribution
The lift attributable to the campaign versus a control group. Not engagement rate, not click-through rate, not conversion rate: marginal contribution. If your digital performance platform cannot run a holdout (a randomly selected cohort that does not receive the campaign, so you can measure the counterfactual), it is not really a performance platform; it is a reporting tool. The r/AskStatistics and r/marketing subreddits run good threads on holdout design; the consensus at enterprise scale is that 5 to 10 percent holdouts usually give enough statistical power for the lift sizes that matter, and that you should refresh the control cohort quarterly so survivorship bias does not accumulate. For deeper reading, Recursive Partitioning for Heterogeneous Causal Effects (Athey and Imbens, 2016) is a widely cited paper on estimating heterogeneous treatment effects, the academic problem behind uplift modelling, which is what serious performance platforms now do automatically. Geo-experiments (the Google MMM team has a useful 2017 paper) are the alternative when individual-level holdouts are not feasible.
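As a rough illustration of the holdout arithmetic, here is a minimal sketch that compares a treated group against a holdout and puts a normal-approximation confidence interval around the absolute lift. The function name, the cohort sizes, and the conversion counts are all made up for the example; a production analysis would likely need more care (sequential looks, unequal exposure, and so on).

```python
import math

def holdout_lift(treated_n, treated_conv, control_n, control_conv, z=1.96):
    """Absolute and relative lift of treated vs. holdout.

    Uses a normal approximation for the difference of two proportions;
    z=1.96 corresponds to a roughly 95 percent interval.
    """
    p_t = treated_conv / treated_n
    p_c = control_conv / control_n
    abs_lift = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / treated_n + p_c * (1 - p_c) / control_n)
    return {
        "treated_rate": p_t,
        "control_rate": p_c,
        "absolute_lift": abs_lift,
        "relative_lift": abs_lift / p_c if p_c > 0 else float("nan"),
        "ci_low": abs_lift - z * se,
        "ci_high": abs_lift + z * se,
    }

# Hypothetical numbers: a 10 percent holdout on a one-million-user audience.
result = holdout_lift(
    treated_n=900_000, treated_conv=29_700,
    control_n=100_000, control_conv=3_000,
)
for key, value in result.items():
    print(f"{key}: {value:.5f}")
```

If the interval excludes zero, the campaign's marginal contribution is distinguishable from noise at this holdout size; if it straddles zero, the holdout is too small for the lift you are trying to detect.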
2. Decision velocity
Hours from "we have new data" to "the system is acting on it". A digital performance platform with a seven-day decision cycle is competing on the wrong axis, because the competitor with a 12-hour cycle is iterating fourteen times faster on the same data points. Sub-24-hour is the bar; sub-2-hour is the differentiator for the highest-performing operations we have audited. The slower the cycle, the more your competitors learn from the data points you generate, because the same impression you bought yesterday becomes a shared market signal everyone sees today. This is the metric most teams skip because it requires instrumenting your own platform, not just your campaigns: you have to measure the wall-clock latency from event ingestion to bid update or audience refresh. The Google SRE four golden signals (latency, traffic, errors, saturation) generalise here; latency is the one that matters most for performance marketing platforms.
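One way to instrument this, sketched under assumptions: suppose you can log a pair of timestamps for each decision, the moment the triggering event was ingested and the moment the resulting bid update or audience refresh went live. The record shape, function names, and sample delays below are hypothetical.

```python
import math
from datetime import datetime, timedelta
from statistics import median

def decision_latencies_hours(pairs):
    """pairs: iterable of (ingested_at, acted_at) datetimes -> latencies in hours."""
    return [
        (acted_at - ingested_at).total_seconds() / 3600
        for ingested_at, acted_at in pairs
    ]

def percentile(values, pct):
    """Nearest-rank percentile (no interpolation)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical log: five decisions with delays of 1, 3, 5, 20 and 30 hours.
start = datetime(2024, 1, 1)
delays_hours = [1, 3, 5, 20, 30]
pairs = [(start, start + timedelta(hours=h)) for h in delays_hours]

latencies = decision_latencies_hours(pairs)
median_hours = median(latencies)
p95_hours = percentile(latencies, 95)
share_under_24h = sum(1 for h in latencies if h < 24) / len(latencies)

print(f"median: {median_hours:.1f}h, p95: {p95_hours:.1f}h, "
      f"under 24h: {share_under_24h:.0%}")
```

Tracking the tail (p95) alongside the median matters here, because a platform can look fast on a typical decision and still miss the sub-24-hour bar on the slow ones.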
3. Channel-mix marginal returns
Not "what does each channel return on average", but "what does the next dollar in each channel return". This is the only number that should drive a reallocation decision, and most dashboards report the wrong one. The r/marketing subreddit and the Marketing Mix Modelling academic literature both make this point at length: the average is a vanity number, the marginal is the operating number, and the two diverge as soon as a channel starts to saturate. Meta's Robyn and Google's LightweightMMM frameworks (both open source, both 2022 to 2024 vintage) make the marginal return curve the central output of the model, not a footnote. If your digital performance platform reports the average and not the marginal, push back and ask why. The honest answer is usually that the marginal requires saturation-curve modelling and the platform does straight-line attribution.
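To make the average-versus-marginal gap concrete, here is a small sketch using a simple saturating response curve, response = beta * spend / (k + spend). The curve shape and every parameter value are assumptions chosen for illustration; they are not taken from Robyn, LightweightMMM, or any fitted model.

```python
def response(spend, beta, k):
    """Saturating response: approaches beta as spend grows; k is the half-saturation spend."""
    return beta * spend / (k + spend)

def average_return(spend, beta, k):
    """Return per dollar across all spend so far."""
    return response(spend, beta, k) / spend

def marginal_return(spend, beta, k):
    """Return on the next dollar: the derivative of response with respect to spend."""
    return beta * k / (k + spend) ** 2

# Hypothetical channels: A is heavily funded, B is barely funded.
channels = {
    "channel_a": {"spend": 150_000, "beta": 200_000, "k": 50_000},
    "channel_b": {"spend": 10_000, "beta": 80_000, "k": 90_000},
}

report = {
    name: {
        "average": average_return(**params),
        "marginal": marginal_return(**params),
    }
    for name, params in channels.items()
}

for name, row in report.items():
    print(f"{name}: average={row['average']:.2f}, marginal={row['marginal']:.2f}")
```

In this made-up case channel A wins on average return (1.00 versus 0.80) while channel B wins on marginal return (0.72 versus 0.25), so a dashboard that ranks by average would push the next dollar to the wrong channel.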
4. Brand consideration delta (long-window)
Most digital performance platforms ignore brand entirely. That is a mistake, and no real data supports it. Brand consideration delta over a 90 to 180 day window correlates with future performance more reliably than any short-window metric, including the conversion rates the dashboard celebrates today. The Ehrenberg-Bass Institute's body of work (How Brands Grow, by Byron Sharp, summarises it) is the standard reference; Les Binet and Peter Field's work for the IPA shows the same pattern from a different angle, with the 60:40 brand-to-activation split. Run a brand-lift study every quarter, not every year, with Kantar or YouGov or a panel provider you trust, and put the result on the same dashboard as the short-window performance metrics. The CMOs we have worked with who run quarterly brand-lift have produced more durable growth than the ones who do not, and we have a long enough sample now to treat that as a strong observation, not a hunch.
5. Cohort retention at 90 days
90-day retention by acquisition cohort tells you the truth about acquisition quality, which the topline blended numbers obscure for months. Cheap acquisitions that churn at 70 percent in 90 days are not cheap; they are expensive acquisitions you are not yet paying for. Build the cohort view: rows are acquisition month, columns are months since acquisition, cells are percentage still active. Look at it monthly. The r/dataengineering subreddit and the Stack Overflow [cohort-analysis] tag have working SQL patterns for the windowed cohort calculation; the canonical pattern is a self-join with a DATEDIFF and a COUNT(DISTINCT ...), and most modern warehouses have window functions that make the query cleaner. The classic textbook reference is Daniel McCarthy's work on customer-base analysis at Wharton, which is the academic basis for most modern LTV models.
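The cohort view described above can be sketched in a few lines. This version is plain Python, not the SQL self-join, and it assumes two hypothetical inputs: a mapping from customer to acquisition month and a list of (customer, active month) records. The data is invented.

```python
from collections import defaultdict

def month_index(ym):
    """Convert "YYYY-MM" to a running month count so months can be subtracted."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)

def cohort_retention(acquisitions, activity):
    """Rows: acquisition month. Columns: months since acquisition. Cells: % still active."""
    cohort_sizes = defaultdict(int)
    for cohort in acquisitions.values():
        cohort_sizes[cohort] += 1

    # (cohort, months_since_acquisition) -> distinct active customers
    active = defaultdict(set)
    for customer_id, ym in activity:
        cohort = acquisitions.get(customer_id)
        if cohort is None:
            continue  # activity from a customer with no recorded acquisition
        offset = month_index(ym) - month_index(cohort)
        if offset >= 0:
            active[(cohort, offset)].add(customer_id)

    table = defaultdict(dict)
    for (cohort, offset), ids in active.items():
        table[cohort][offset] = 100.0 * len(ids) / cohort_sizes[cohort]
    return {c: dict(sorted(row.items())) for c, row in sorted(table.items())}

# Hypothetical data: four January customers, two February customers.
acquisitions = {"a": "2024-01", "b": "2024-01", "c": "2024-01",
                "d": "2024-01", "e": "2024-02", "f": "2024-02"}
activity = [
    ("a", "2024-01"), ("a", "2024-02"), ("a", "2024-02"),  # duplicate counted once
    ("a", "2024-03"), ("a", "2024-04"),
    ("b", "2024-01"), ("b", "2024-02"),
    ("c", "2024-01"),
    ("d", "2024-01"), ("d", "2024-04"),
    ("e", "2024-02"), ("e", "2024-03"),
    ("f", "2024-02"),
]

table = cohort_retention(acquisitions, activity)
for cohort, row in table.items():
    print(cohort, row)
```

Column 3 is the roughly-90-day cell. Using a set per cell plays the same role as COUNT(DISTINCT ...) in the SQL version: a customer with several events in one month is counted once.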
6. Customer dispute rate
A sleeper metric for paid acquisition quality, almost never reported on a marketing dashboard, almost always available from the payments team. If your paid-acquisition cohort has twice the dispute rate of your organic cohort (chargebacks, refund requests, bank-initiated reversals), you have an acquisition-quality problem that the topline numbers will not show for two to six months until the disputes settle. This is a finance metric that belongs on the marketing platform too. The Stripe and Adyen dashboards both expose dispute rate by acquisition source if you tag the source on the transaction. The pattern that catches the problem early: weekly review of dispute rate by cohort, side-by-side with paid acquisition CAC.
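A minimal sketch of the comparison, assuming you can export transactions tagged with an acquisition source and a disputed flag. The field names, source labels, counts, and the 2x threshold are illustrative assumptions, not the schema of Stripe, Adyen, or any other payments provider.

```python
from collections import Counter

def dispute_rates(transactions):
    """transactions: iterable of dicts with 'source' and 'disputed' (bool) keys."""
    totals, disputes = Counter(), Counter()
    for txn in transactions:
        totals[txn["source"]] += 1
        disputes[txn["source"]] += int(txn["disputed"])
    return {source: disputes[source] / totals[source] for source in totals}

def flag_sources(rates, baseline="organic", ratio_threshold=2.0):
    """Sources whose dispute rate is at least ratio_threshold times the baseline rate."""
    base = rates[baseline]
    if base <= 0:
        return []  # no baseline disputes, so the ratio is undefined
    return sorted(
        source for source, rate in rates.items()
        if source != baseline and rate >= ratio_threshold * base
    )

def make_transactions(source, total, disputed):
    """Helper that fabricates sample rows for the example."""
    return [{"source": source, "disputed": i < disputed} for i in range(total)]

# Hypothetical week of transactions.
transactions = (
    make_transactions("organic", 1000, 5)        # 0.50% disputed
    + make_transactions("paid_social", 500, 6)   # 1.20% disputed
    + make_transactions("paid_search", 400, 3)   # 0.75% disputed
)

rates = dispute_rates(transactions)
flagged = flag_sources(rates)
print(rates)
print("flagged:", flagged)
```

Run weekly per cohort, the flagged list is what goes next to paid-acquisition CAC in the review.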
7. Agent eval score
If you are running at least one AI agent in the marketing loop (which most performance operations are by now, whether for variant generation, audience design, send-time decisioning, or post-campaign analysis), the agent's eval score is a first-class platform metric. Score against a written test set with ground-truth labels, refreshed quarterly so the test set does not go stale. The Anthropic Building with Claude documentation and the OpenAI Evals open-source framework both make the same point from different angles: the moat is the eval, not the model. r/MachineLearning has a long-running discussion on eval-set design that has fully crossed over from ML into enterprise marketing platforms in the last 18 months. A score with no trendline is not useful; the value is the regression detection. Watch the score weekly and treat a drop the way you would treat a drop in delivery rate.
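The scoring and the weekly regression check can be sketched as follows. This is an illustrative exact-match scorer with a simple trailing-average rule, not the method of any particular eval framework; the four-week window, the 0.03 tolerance, and the sample scores are assumptions.

```python
from statistics import mean

def eval_score(predictions, labels):
    """Fraction of test cases where the agent's output matches the ground-truth label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    matches = sum(1 for p, l in zip(predictions, labels) if p == l)
    return matches / len(labels)

def detect_regression(weekly_scores, window=4, max_drop=0.03):
    """True if the latest score sits more than max_drop below the trailing-window mean."""
    if len(weekly_scores) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(weekly_scores[-window - 1:-1])
    return baseline - weekly_scores[-1] > max_drop

# Hypothetical single run: three of four test cases match.
score = eval_score(["a", "b", "c", "d"], ["a", "b", "x", "d"])

# Hypothetical weekly trendlines.
stable = [0.90, 0.91, 0.89, 0.90, 0.89]
dropped = [0.90, 0.91, 0.89, 0.90, 0.84]

stable_flag = detect_regression(stable)
dropped_flag = detect_regression(dropped)
print(score, stable_flag, dropped_flag)
```

Exact match only suits tasks with a single right answer; for generated copy or audience definitions the scorer would need a rubric or a grader, but the trendline check on top stays the same.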
Further reading
Real, named sources the editor can swap in for specific URLs. We do not auto-link these because the right link changes over time. If you find a great primary source, write us and we will update the note.
- r/AskStatistics, r/marketing. Practitioner threads on holdout design and control-group hygiene at enterprise scale.
- Ehrenberg-Bass Institute publications. The standard reference on brand vs activation. Cited in most serious marketing-mix conversations.
- r/dataengineering and Stack Overflow [cohort-analysis]. Working SQL patterns for windowed cohort retention and dispute-rate calculations.
- r/MachineLearning. Eval-set design threads. The discipline that crossed over from ML into marketing-platform measurement.
- Anthropic's and OpenAI's eval guidance. Both published the same operating principle from different angles: the moat is the eval, not the model.
- Mark Ritson columns (Marketing Week). The most-quoted modern source on brand-vs-performance trade-offs.