Open the conversion report in any Google Ads account and you will see numbers with decimal points. A Search campaign credited with 4.3 conversions. A YouTube line at 1.8. Display sitting at 0.6. Nothing in that report ever clicked a checkout button 0.6 times. Those fractions are the output of a model called data-driven attribution, which Google made the default for almost everyone. Most marketers accept the fractions, build budget decisions on them, and present them to a finance team that assumes a real method sits underneath.

A real method does sit underneath. It is just one that almost nobody who uses it could explain. Ask why YouTube got 1.8 and not 2.4 and the honest answer is usually a shrug. That gap, between a number a team reports every week and a mechanism the team cannot describe, is worth closing. The mechanism is not magic and it is not nonsense. It also is not as trustworthy as the decimal points make it look.

Origin: how this became the only real choice

For most of the digital marketing era, attribution meant picking a rule. Last-click gave everything to the final touch. First-click gave everything to the first. Linear split credit evenly, time decay leaned toward recent touches, position-based loaded the ends of the path. Every one of these was a guess dressed as a policy. None of them looked at your data to decide; they applied a fixed shape to every conversion regardless of what actually happened.
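The fixed shape of each rule is easy to see in code. The following is a minimal sketch, not Google's implementation: the function name `rule_based_credit`, the 40/20/40 split used for position-based, and the 7-day half-life used for time decay are illustrative assumptions. Note that nothing in it looks at outcomes; the weights depend only on position or timing.

```python
from collections import defaultdict


def rule_based_credit(path, model, days_before=None, half_life_days=7.0):
    """Split one conversion across an ordered path with a fixed rule.

    path: ordered channel names, first touch first.
    days_before: days between each touch and the conversion (time_decay only).
    """
    n = len(path)
    if n == 0:
        raise ValueError("path needs at least one touchpoint")

    if model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "position_based":
        # Assumed 40/20/40 split; paths of one or two touches are split evenly.
        if n <= 2:
            weights = [1.0 / n] * n
        else:
            weights = [0.4] + [0.2 / (n - 2)] * (n - 2) + [0.4]
    elif model == "time_decay":
        if days_before is None or len(days_before) != n:
            raise ValueError("time_decay needs one days_before value per touch")
        # Assumed exponential decay: a touch loses half its weight per half-life.
        raw = [0.5 ** (d / half_life_days) for d in days_before]
        total = sum(raw)
        weights = [w / total for w in raw]
    else:
        raise ValueError(f"unknown model: {model}")

    # A channel that appears twice in the path accumulates both weights.
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight
    return dict(credit)


path = ["Organic Search", "Display", "Email"]
for model in ("last_click", "first_click", "linear", "position_based"):
    print(model, rule_based_credit(path, model))
print("time_decay", rule_based_credit(path, "time_decay", days_before=[14, 7, 0]))
```

Every conversion on this path gets the same split under a given rule, whether or not Display ever changed anyone's mind.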

Data-driven attribution was the alternative, and the idea is older than GA4. Google built it into Universal Analytics and Google Ads years before, offering it as a premium, opt-in upgrade for accounts with enough volume to support it. For a long time it sat alongside the rule-based models as one option among several. A marketer could choose.

That choice ended. In April 2023 Google announced that first-click, linear, time decay and position-based attribution were going away in Google Ads and GA4. The ability to select them was removed for new conversion actions in June 2023, and by September any conversion action still using one was switched to data-driven attribution automatically. The stated reason, in Google's own help documentation, was that rule-based models do not adapt to changing customer journeys, and that less than 3 percent of Google Ads web conversions still used those four models anyway. After the cull, two options remained: data-driven attribution, or last-click. The middle ground was gone.

So data-driven attribution did not win an argument so much as inherit the field. It is the default not because every marketer was persuaded, but because the alternatives were removed and the one fallback left is the comfortable lie of last-click. When a tool gives you DDA or a model everyone already knows is biased, DDA is what you run.

Present: what the model is actually doing

Here is the part most explanations skip or fudge. Data-driven attribution does not score touchpoints by gut feel or by a hidden Google preference. It does something specific, and the logic is followable.

Start with the raw material. The model looks at conversion paths, the ordered sequences of channels a user touched. Crucially, it looks at two kinds: the paths of people who converted, and the paths of people who did not. That second group is what separates DDA from every rule-based model. A rule never sees the failures. DDA learns from them. Google's methodology documentation describes the model as using path data from both converting and non-converting users to estimate how the presence of a given touchpoint changes the probability of a conversion.

That word, probability, is the engine. The model builds an estimate of conversion likelihood for a path, then asks a counterfactual question about each touchpoint: how much does the probability change if this touchpoint is removed? Google's own example is concrete. Take a path of Organic Search, then Display, then Email, with a modeled conversion probability of 3 percent. Remove Display and the comparable probability drops to 2 percent. That drop of one percentage point is Display's measured contribution on that path. Credit follows the contribution. The channel is rewarded for the lift it is associated with, not for where it happened to sit in the sequence.
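The removal arithmetic can be written out in a few lines. This is a sketch under stated assumptions: only the 3 percent and 2 percent figures come from the example above, the "without" probabilities for Organic Search and Email are hypothetical, and proportional normalization is one simple way to turn contributions into fractional credit, not a documented Google step.

```python
def removal_contributions(p_full, p_without):
    """Drop in modeled conversion probability when each touchpoint is removed."""
    return {channel: p_full - p for channel, p in p_without.items()}


def normalize(contributions):
    """Turn contributions into fractional credit that sums to 1.

    Negative contributions are clamped to zero (an assumption of this sketch).
    """
    positive = {c: max(v, 0.0) for c, v in contributions.items()}
    total = sum(positive.values())
    if total == 0:
        raise ValueError("no touchpoint shows a positive contribution")
    return {c: v / total for c, v in positive.items()}


p_full = 0.03  # modeled probability for Organic Search > Display > Email
p_without = {
    "Display": 0.02,          # from the example in the text
    "Organic Search": 0.015,  # hypothetical
    "Email": 0.025,           # hypothetical
}

contributions = removal_contributions(p_full, p_without)
credit = normalize(contributions)
for channel in p_without:
    print(f"{channel}: drop={contributions[channel]:.3f}, credit={credit[channel]:.2f}")
```

With these made-up inputs, one conversion on this path would be split into three fractions, which is where the decimals in the report come from.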

Behind that comparison is a piece of mathematics worth naming, because it is the conceptual backbone and it is not new. It is the Shapley value, from cooperative game theory, published by Lloyd Shapley in 1953. The Shapley value answers a clean question: when several players cooperate to produce a payoff, what is each player's fair share? Its answer is each player's average marginal contribution across every order in which the players could have joined. Treasure Data has a readable walkthrough of the idea applied to marketing.

Translate that out of the math. Imagine three salespeople closed a deal together. You could credit whoever spoke last, which is last-click. Or you could ask a fairer question: across every possible order those three could have worked the prospect, how much did each one add on average when they joined? The teammate who consistently moved the deal forward, whether they went first or last, earns the larger share. The smec team uses almost exactly this analogy for DDA. A channel's fair share is its average marginal contribution across the orders it could appear in. That is the whole idea. The rest is engineering.
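For three channels the averaging can be done exactly. The sketch below computes Shapley values by walking every ordering and recording what each channel adds when it joins. The probability table `VALUE` is hypothetical, chosen only so that the full set gives 3 percent and the set without Display gives 2 percent, matching the earlier example; a real model would estimate these values from path data.

```python
from itertools import permutations

# Hypothetical modeled conversion probability for each set of channels.
VALUE = {
    frozenset(): 0.000,
    frozenset({"Search"}): 0.010,
    frozenset({"Display"}): 0.004,
    frozenset({"Email"}): 0.006,
    frozenset({"Search", "Display"}): 0.016,
    frozenset({"Search", "Email"}): 0.020,
    frozenset({"Display", "Email"}): 0.012,
    frozenset({"Search", "Display", "Email"}): 0.030,
}


def value(coalition):
    return VALUE[frozenset(coalition)]


def shapley_values(players, value):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            joined = coalition | {player}
            totals[player] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orderings) for p, total in totals.items()}


shares = shapley_values(["Search", "Display", "Email"], value)
for channel, share in shares.items():
    print(f"{channel}: {share:.4f}")
print(f"sum of shares: {sum(shares.values()):.4f}")
```

The shares add up to the probability of the full path, which is the property that makes the Shapley value a way of dividing one payoff rather than a set of unrelated scores.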

That engineering is not trivial. Computing an exact Shapley value means evaluating every possible coalition of channels, and that count doubles with each channel added, so at production scale Google samples orderings and approximates rather than computing it exactly. The principle holds; the precision is traded for the ability to run it on billions of paths.
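A common way to approximate Shapley values is to sample random orderings instead of enumerating all of them. The sketch below shows that general Monte Carlo technique; it is not a description of Google's production system, whose details are not public. The three-channel probability table is hypothetical, and `n_samples` and `seed` are illustrative parameters.

```python
import random

# Hypothetical modeled conversion probability for each set of channels.
VALUE = {
    frozenset(): 0.000,
    frozenset({"Search"}): 0.010,
    frozenset({"Display"}): 0.004,
    frozenset({"Email"}): 0.006,
    frozenset({"Search", "Display"}): 0.016,
    frozenset({"Search", "Email"}): 0.020,
    frozenset({"Display", "Email"}): 0.012,
    frozenset({"Search", "Display", "Email"}): 0.030,
}


def value(coalition):
    return VALUE[frozenset(coalition)]


def sampled_shapley(players, value, n_samples=5000, seed=0):
    """Estimate Shapley values from randomly sampled join orders."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)  # one random ordering per sample
        coalition = frozenset()
        previous = value(coalition)
        for player in order:
            coalition = coalition | {player}
            current = value(coalition)
            totals[player] += current - previous
            previous = current
    return {p: total / n_samples for p, total in totals.items()}


estimate = sampled_shapley(["Search", "Display", "Email"], value)
for channel, share in estimate.items():
    print(f"{channel}: {share:.4f}")
```

With three channels the sampling is unnecessary, since there are only six orderings. The point is that the per-sample cost stays linear in the number of channels, while the estimate carries sampling noise that shrinks as `n_samples` grows.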

There is a hard practical gate on all of this, and it is the single most important fact a GA4 user should know. The model only runs if it has enough data. For a GA4 property, Google's requirement is roughly 400 conversions per conversion type with a path length of at least two interactions, plus 10,000 paths in the reporting view, all within the past 28 days. Miss that threshold and DDA cannot generate a model. The behavior below the line is the trap: GA4 does not stop and warn you in plain terms across every report. Practitioner guides note it effectively falls back to last-click for properties that do not qualify, so a team can believe it is reading data-driven numbers while looking at last-click in a different coat. On the Google Ads side the recommendation is lower, around 200 conversions and 2,000 ad interactions in 30 days, and Ads will run DDA below that, but the same warning applies: thin data makes the model jumpy.
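A team can encode the GA4 gate as a quick self-check before trusting a report. The sketch below uses the thresholds quoted above as defaults; `PropertyStats` and `dda_eligibility` are hypothetical names, not part of any Google API, and the counts would have to come from your own exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyStats:
    """Counts for one conversion type over the trailing 28 days."""
    conversions_with_2plus_touches: int
    paths: int


def dda_eligibility(stats, min_conversions=400, min_paths=10_000):
    """Return (eligible, reasons) against the thresholds cited in the text."""
    reasons = []
    if stats.conversions_with_2plus_touches < min_conversions:
        reasons.append(
            f"{stats.conversions_with_2plus_touches} conversions with path "
            f"length >= 2, need {min_conversions}"
        )
    if stats.paths < min_paths:
        reasons.append(f"{stats.paths} paths, need {min_paths}")
    return (not reasons, reasons)


eligible, reasons = dda_eligibility(PropertyStats(310, 12_500))
if eligible:
    print("Thresholds cleared: DDA figures are plausibly model-driven.")
else:
    print("Below threshold, treat the report as last-click:")
    for reason in reasons:
        print(" -", reason)
```

Running the check per conversion type matters, because a property can clear the line for purchases and miss it for lead forms.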

Why it still feels like a black box

DDA is more honest than last-click. It learns from real paths, it accounts for non-converters, it has a respectable theory underneath. And yet marketers still call it a black box, and they are not being lazy when they do. Three things earn the label.

First, there is no rule a person can state. Last-click is wrong, but you can explain it to a stakeholder in one sentence: the final touch gets the credit. DDA has no such sentence. The answer to "why did this channel get this number" is "the model's estimate of its marginal contribution across sampled orderings of observed paths," which is true and also unusable in a budget meeting. A measurement nobody in the room can articulate is hard to defend and harder to challenge.

Second, the numbers move. Because DDA is retrained on recent data, the credit split shifts as the data shifts. Launch a new channel, change a campaign mix, hit a seasonal swing, and the allocation can move noticeably. Below the volume threshold it gets worse, and analyst write-ups warn that results grow unstable on sparse data, because correlated channels and thin path segments make the credit split lurch on small changes. A model that gives a different answer this month than last, for reasons a marketer cannot pin down, feels less like a measurement and more like weather.

Third, and most important, it is correlational, not causal. This is the deep limit, and it is easy to miss because the counterfactual language sounds causal. When DDA "removes" Display and watches probability fall, it is not running an experiment. It is comparing groups in historical data, and historical data is full of confounding. A channel that shows up late in many converting paths gets credited for lift it may not have caused, because users near the end of a journey were already likely to buy. The critique is consistent: the model assumes the path data carries a genuine cause-and-effect signal, and often it does not. Pricing changes, a competitor outage, a PR moment, an offline conversation, none appear in the path, yet all move conversions. DDA hands that movement to whichever trackable channel happened to be nearby.

There is a narrower blind spot on top of the deep one. DDA can only weigh touchpoints Google can observe. Cross-device journeys break when the cookie does not carry across, ad blockers and consent refusals remove paths entirely, an in-store purchase after an online ad is invisible, and any channel without a trackable click barely registers. As privacy changes erode signal, more of the input is modeled rather than observed, and accuracy degrades with it. And because Google both builds the model and sells the ad inventory it scores, GA4 troubleshooting guides note the standing concern that the algorithm cannot be audited for neutrality toward Google's own channels. There is no public proof of bias. There is also no way for an advertiser to check.

Future and impact: when to trust it, and when not to

The useful verdict is not "DDA is good" or "DDA is bad." It is a question of fit.

Trust data-driven attribution for what it is genuinely better at than last-click: tactical, within-channel signal where you have real volume. Which keyword, which creative, which audience is pulling its weight across a multi-touch path. If the property clears the conversion threshold comfortably and the channel mix is stable, DDA gives a more credible read than last-click and a far more credible one than the multi-touch rule-based models it replaced. For day-to-day optimization inside Google's ecosystem, it is a reasonable instrument. Used that way, it earns its place.

Do not trust it as causal proof for big budget moves. The question "what happens if I shift a large sum from this channel to that one" is causal, and DDA cannot answer causal questions, because it observes correlations in past paths rather than testing a counterfactual. For that, the field has turned to incrementality testing, where a real holdout group gives a true counterfactual: show the channel to one group, withhold it from a comparable one, measure the difference. The momentum is clear. eMarketer reports that 52 percent of US brand and agency marketers already run incrementality experiments. Measured's decision framework puts it simply: use attribution for tactical signal, marketing mix modeling for strategy, and incrementality experiments for causal validation, rather than asking any one of them to do all three.
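The holdout comparison reduces to simple arithmetic. The sketch below computes lift from a two-group test and adds a standard two-proportion z-test as a rough significance check. The function name and the user and conversion counts are hypothetical, and a real experiment also needs proper randomization and a pre-planned sample size, which this does not cover.

```python
import math


def incrementality(treated_users, treated_conv, holdout_users, holdout_conv):
    """Lift of a treated group over a holdout, with a two-sided z-test."""
    if treated_users <= 0 or holdout_users <= 0:
        raise ValueError("both groups need at least one user")
    rate_treated = treated_conv / treated_users
    rate_holdout = holdout_conv / holdout_users
    absolute_lift = rate_treated - rate_holdout
    relative_lift = absolute_lift / rate_holdout if rate_holdout > 0 else math.inf

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (treated_conv + holdout_conv) / (treated_users + holdout_users)
    se = math.sqrt(pooled * (1 - pooled) * (1 / treated_users + 1 / holdout_users))
    z = absolute_lift / se if se > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    return {
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "incremental_conversions": absolute_lift * treated_users,
        "z": z,
        "p_value": p_value,
    }


# Hypothetical test: 100,000 users per group, 3% vs 2% conversion rate.
result = incrementality(100_000, 3_000, 100_000, 2_000)
for key, val in result.items():
    print(f"{key}: {val:.4g}")
```

The `incremental_conversions` figure is the number to set beside the channel's data-driven credit: it counts conversions that would not have happened without the channel, which attribution alone cannot establish.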

The honest near-term picture is triangulation. No single model is the source of truth. DDA gives a fast, granular tactical read; incrementality tests check whether the credited channels add net-new outcomes; marketing mix modeling sets the strategic allocation without user-level tracking at all. AI agents are starting to make that loop cheaper, drafting holdout tests and flagging when a channel's data-driven credit and its measured incremental lift drift apart. At Perform Digital, that drift, the moment the black box and the experiment disagree, is exactly the signal worth building an agent to watch, because it is where budget quietly leaks.
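The drift check itself can be very small. The sketch below compares conversions credited by DDA with conversions measured as incremental in an experiment, and flags channels where the two disagree by more than a tolerance. The function name, the 25 percent tolerance, and all counts are hypothetical placeholders, not a recommended standard.

```python
def credit_vs_lift(dda_credited, incremental, tolerance=0.25):
    """Flag channels whose DDA credit and measured incremental lift diverge.

    gap = (credited - measured) / credited, so a positive gap means DDA
    credits the channel with more than the experiment could confirm.
    """
    report = {}
    for channel, credited in dda_credited.items():
        measured = incremental.get(channel)
        if measured is None:
            report[channel] = {"status": "untested"}
        elif credited <= 0:
            report[channel] = {"status": "no_credit"}
        else:
            gap = (credited - measured) / credited
            if abs(gap) <= tolerance:
                status = "ok"
            elif gap > 0:
                status = "over_credited"
            else:
                status = "under_credited"
            report[channel] = {"gap": gap, "status": status}
    return report


# Hypothetical monthly figures for the same period and conversion type.
dda_credited = {"Search": 430.0, "YouTube": 180.0, "Display": 60.0}
incremental = {"Search": 400.0, "YouTube": 90.0}

for channel, row in credit_vs_lift(dda_credited, incremental).items():
    print(channel, row)
```

An "untested" status is itself a finding: it marks credit that rests on the model alone.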

So keep reading the fractions. Just hold them at the right weight. Data-driven attribution is a real model doing a defensible thing: estimating each channel's average marginal contribution across the orders it could appear in. It is also opaque, restless, and correlational, and it sees only what Google can see. Treat it as a sharp tactical instrument and a poor courtroom witness. The 0.6 next to Display is an estimate, produced by game theory and a lot of sampling, of an effect the model can never actually prove.

Council summary

This post argues that GA4's data-driven attribution is neither magic nor nonsense: it is a real model that estimates each channel's average marginal contribution using Shapley-value logic from cooperative game theory, learning from both converting and non-converting paths. It earns the "black box" label honestly, because no plain rule describes it, its credit splits shift with every retrain, and it silently falls back to last-click below the 400-conversion, 10,000-path threshold most small properties never clear. The reader's takeaway is a question of fit: trust DDA for tactical, within-channel optimization where volume is real, but never treat it as causal proof for large budget moves, since it reads correlations in past paths rather than testing a counterfactual. For that, the post points to incrementality testing and marketing mix modeling, and to the gap between credited and measured lift as the place where budget quietly leaks.

Inside Data-Driven Attribution: What GA4's Black Box Does
