Here is a small test. A retailer has a budget to send a 20 percent discount code to 50,000 of its customers, and someone has to decide which 50,000. A marketer asks for a lookalike audience. An analyst suggests a propensity model. A data scientist mentions uplift. Everyone nods, the meeting moves on, and three people have just agreed to three different campaigns while believing they agreed to one.
The three terms get used as if they were synonyms for "smart targeting." They are not. Each is a real, well-defined technique, and each answers a question the other two cannot. Pick the wrong one and the campaign still runs, the dashboard still fills with numbers, and the spend still looks productive. It just quietly goes to the wrong people. The confusion is expensive precisely because it never shows up as an error.
What follows is a plain-language separation of the three, built around that one discount decision, and a rule for which to reach for when.
Origin: three problems, three decades, three answers
The three models were not designed as a set. They came from different fields solving different problems, which is part of why they fit together so badly in conversation.
The oldest idea is the causal one. In 1983 the statisticians Paul Rosenbaum and Donald Rubin published a paper in Biometrika called The Central Role of the Propensity Score in Observational Studies for Causal Effects. Their concern was medical and social science research: when you cannot run a clean randomized trial, how do you estimate whether a treatment actually caused an outcome? Their propensity score, confusingly for marketers, is the probability that a unit receives the treatment given its observed characteristics, not the probability that it buys. It is a device for making treated and untreated groups comparable. That paper seeded much of the modern apparatus of causal inference, and uplift modeling is its direct descendant.
Uplift itself arrived later, from direct marketing. Nicholas Radcliffe and Patrick Surry described differential response analysis in 1999, and Victor Lo published The True Lift Model in 2002. The motivating insight was uncomfortable: a mailing with a good response rate is not necessarily producing extra sales, because some responders were going to buy anyway. A turning point for practitioners came in 2018, when Criteo released a large-scale uplift benchmark dataset of 25 million rows assembled from real incrementality tests.
Propensity modeling as marketers use it, scoring how likely a person is to do something, grew out of database marketing and credit scoring, the same statistical machinery pointed at "will this customer buy" instead of "will this borrower default." Lookalike modeling is the youngest of the three and the most commercial. It became mainstream when ad platforms productized it. Meta's Lookalike Audiences made the idea famous, and Google, TikTok, LinkedIn, Pinterest and Snapchat all now sell an equivalent under their own names. The platform did the modeling. The marketer just uploaded a list.
Three origins, three instincts. A statistician thinks about causes, a direct marketer about incremental sales, a media buyer about reach. When all three say "targeting model," they are not picturing the same thing.
Present: what each one actually answers
Strip away the jargon and each model answers one question. The questions are different enough that confusing them should be hard. It is not, so here they are side by side, the discount decision running through all three.
A lookalike model answers: who resembles my existing customers? You give it a seed audience, a list of people you already value, and it scans a much larger pool to find the people whose attributes and behavior most closely match the seed. The output is an audience, not a score you would inspect customer by customer. It is built for one job, finding new prospects you have no data on, which is why it lives on ad platforms rather than in your own database. The scikit-uplift documentation frames the mechanics cleanly: a lookalike model is trained on known positive cases and a sample of random unconfirmed cases, so it learns the shape of "good customer" rather than a verified yes or no. For the discount decision, a lookalike model is the wrong tool, because the 50,000 are already your customers. There is nothing to find. Lookalike is for the prospecting campaign, not the retention one.
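A minimal sketch of the similarity idea, assuming each person is described by a short numeric feature vector. The ad platforms' actual methods are proprietary, and this is not the positive-versus-unconfirmed training setup the scikit-uplift documentation describes. It swaps in a simpler stand-in: rank a prospect pool by cosine similarity to the average seed customer. The names `centroid`, `cosine`, `lookalike_audience` and all the data are hypothetical.

```python
import math

def centroid(rows):
    """Average feature vector of the seed audience."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookalike_audience(seed, pool, size):
    """Return the ids of the `size` prospects most similar to the seed."""
    center = centroid(seed)
    ranked = sorted(pool, key=lambda pid: cosine(pool[pid], center), reverse=True)
    return ranked[:size]

# Hypothetical features, e.g. visit frequency, basket size, discount sensitivity.
seed = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2]]   # verified high-value customers
pool = {
    "p1": [0.85, 0.85, 0.15],               # strangers we have no outcome data on
    "p2": [0.10, 0.20, 0.90],
    "p3": [0.70, 0.60, 0.30],
}
audience = lookalike_audience(seed, pool, size=2)
print(audience)
```

The output is a list of people, not a per-customer prediction, and nothing in it says whether any of them would respond to an offer.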
A propensity model answers: how likely is this specific person to take an action? The action is something concrete (buy, churn, convert, open, upgrade), and the output is a probability from 0 to 1 for each individual. A score near 1 means very likely, near 0 means very unlikely. CleverTap's overview lists the usual family: purchase propensity, churn propensity, engagement propensity, upsell propensity, each one the same machinery aimed at a different action. The model learns from history (past purchases, visit frequency, email behavior) and produces a ranked list. For the discount decision, a propensity model gives you your customers ranked by how likely they are to buy, and you take the top 50,000. That sounds exactly right. Hold that thought, because it is the trap.
An uplift model answers a question that sounds almost identical and is not: how much will the discount change this person's likelihood of buying? Not their likelihood, the change in it. It estimates two probabilities for each person, buying if they get the code and buying if they do not, and reports the difference. Statisticians call that difference the conditional average treatment effect. In plain terms it is the causal effect of the discount on that one person, and it can be positive, near zero, or negative. To estimate it, the model has to learn from an experiment: a treated group and a control group, some people who got the offer and some comparable people who did not. Propensity needs only outcomes. Uplift needs a counterfactual.
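The "two probabilities and their difference" idea can be sketched at its crudest level, assuming a randomized test has already been run. Real uplift models estimate the effect per person from features. This sketch only compares conversion rates between the treated and control arms within a segment. The function names, segment labels, and counts are all hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Estimate uplift per segment from a randomized test.

    records: iterable of (segment, treated, bought) tuples.
    Returns {segment: P(buy | treated) - P(buy | control)}.
    """
    # counts[segment][treated] = [buyers, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, bought in records:
        cell = counts[segment][treated]
        cell[0] += int(bought)
        cell[1] += 1

    uplift = {}
    for segment, arms in counts.items():
        treated_buyers, treated_total = arms[True]
        control_buyers, control_total = arms[False]
        if treated_total == 0 or control_total == 0:
            continue  # no counterfactual for this segment, so no estimate
        uplift[segment] = (treated_buyers / treated_total
                           - control_buyers / control_total)
    return uplift

def arm(segment, treated, buyers, total):
    """Build `total` hypothetical records, the first `buyers` of which bought."""
    return [(segment, treated, i < buyers) for i in range(total)]

records = (
    arm("lapsed", True, 6, 10) + arm("lapsed", False, 2, 10)
    + arm("loyal", True, 9, 10) + arm("loyal", False, 9, 10)
)
estimates = uplift_by_segment(records)
for segment, value in estimates.items():
    print(f"{segment}: uplift {value:+.2f}")
```

In this made-up data the loyal segment converts at a high rate in both arms, so its uplift is zero. The lapsed segment converts less often overall but is the one the offer actually moves. A segment with no control arm is skipped, which is the counterfactual requirement in miniature.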
The shortest version: lookalike measures similarity, propensity measures likelihood, uplift measures the effect of an intervention. Similar, likely, movable.
The expensive mistake hiding inside the propensity model
Now the part worth the read. The ranked propensity list looks like the obvious answer to the discount question, and reaching for it is the single most common and most costly modeling error in marketing.
Walk through what that list contains. Sort your customers by purchase propensity and the top is full of people scoring 0.9 and above. They are very likely to buy, so you send them the discount. Many of them buy, the campaign reports a strong conversion rate, and everyone is satisfied. But ask the question the report does not: how many of those high-propensity buyers would have bought without the code? A person at 0.92 was already, by the model's own estimate, almost certain to purchase. The discount did not persuade them. It handed them a price cut on a sale you already had.
This is the gap between propensity and uplift, and uplift modeling has a clean vocabulary for it. It divides every customer into four groups. The sure things buy whether or not you treat them. The lost causes do not buy either way. The sleeping dogs, sometimes called do-not-disturbs, actually become less likely to buy when contacted: an offer reminds them of a subscription they meant to cancel, or a discount makes them suspicious. And the persuadables buy only if treated. They are the one group where the discount creates a sale that otherwise would not exist.
A propensity model cannot tell these groups apart, because likelihood does not separate them. A sure thing and a persuadable can both score 0.8. One is 0.8 with the code and 0.8 without it. The other is 0.8 with the code and 0.3 without it. Same propensity, completely different value to the campaign, and the model sees them as twins. As Customer Science puts it, targeting high propensity can look successful while adding little net revenue once you account for what the control group did anyway.
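The twins example can be written out, assuming we somehow knew both probabilities for each person. The four groups are an idealization, so the cut-offs below (`high` and `margin`) are arbitrary assumptions chosen only to make the labels concrete, and `classify` is a hypothetical helper.

```python
def classify(p_treated, p_control, high=0.5, margin=0.05):
    """Label a customer by how the offer changes their purchase probability.

    p_treated: probability of buying with the code.
    p_control: probability of buying without it.
    high and margin are illustrative thresholds, not standard values.
    """
    uplift = p_treated - p_control
    if uplift < -margin:
        return "sleeping dog"   # the offer makes a purchase less likely
    if uplift > margin:
        return "persuadable"    # the offer creates most of the chance of a sale
    # The offer barely changes anything: split by baseline behavior.
    return "sure thing" if p_control >= high else "lost cause"

# The two "twins": both look like 0.8 to a model that only sees likelihood.
twin_a = classify(p_treated=0.8, p_control=0.8)
twin_b = classify(p_treated=0.8, p_control=0.3)
print(twin_a, "|", twin_b)

# The remaining two groups, with made-up probabilities.
print(classify(0.10, 0.10), "|", classify(0.20, 0.50))
```

The labels only come apart once the second probability, the one without the code, is in the picture.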
A propensity-ranked discount campaign therefore spends most heavily on the sure things, because they dominate the top of a propensity list almost by definition. The budget flows to the people who needed it least. Worse, any sleeping dogs in that high-propensity group mean the campaign is paying to reduce sales. Uplift modeling targets the opposite. It ranks people by the change the treatment produces, so it sends the discount to persuadables, skips the sure things to protect margin, skips the lost causes to save spend, and avoids the sleeping dogs. The reframing is the whole point: you are not looking for people likely to buy, you are looking for people your action moves.
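A toy comparison of the two rankings, under loud assumptions: five made-up customers, both probabilities known exactly, a budget of two discount codes, and "propensity" taken to mean the likelihood of buying with no offer. None of these numbers come from a real campaign.

```python
# name: (probability of buying with the code, probability without it)
customers = {
    "ana": (0.95, 0.93),   # sure thing
    "ben": (0.90, 0.92),   # mild sleeping dog
    "cy":  (0.60, 0.20),   # persuadable
    "dee": (0.45, 0.15),   # persuadable
    "eli": (0.05, 0.04),   # lost cause
}

def propensity(name):
    """Baseline likelihood of buying (assumption: no-offer probability)."""
    return customers[name][1]

def uplift(name):
    """Change in purchase probability caused by the code."""
    with_code, without_code = customers[name]
    return with_code - without_code

def top_k(score, k):
    """The k customers with the highest score."""
    return sorted(customers, key=score, reverse=True)[:k]

def incremental_sales(selected):
    """Expected extra purchases created by treating the selected customers."""
    return sum(uplift(name) for name in selected)

by_propensity = top_k(propensity, 2)
by_uplift = top_k(uplift, 2)

print("propensity picks", by_propensity,
      f"-> {incremental_sales(by_propensity):+.2f} extra sales")
print("uplift picks    ", by_uplift,
      f"-> {incremental_sales(by_uplift):+.2f} extra sales")
```

In this toy population the propensity ranking spends both codes on people whose behavior barely changes, while the uplift ranking spends them where the expected number of purchases actually rises.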
The effect is not subtle. One case summarized in the California Management Review cut the number of customers contacted by 80 percent, taking campaign cost from 400,000 dollars to 80,000, while improving renewals, because most of the dropped customers were sure things who would have renewed anyway. The same article makes a sharper point about how to judge a campaign. In one example the plain treated group converted at 18.04 percent and the uplift-optimized group at a lower 17.32 percent, yet the uplift group produced more profit per customer, 5.46 dollars against 5.18. The lower conversion rate was the better outcome, because the conversions it bought were incremental rather than free giveaways. Reported campaigns in banking and telecom have credited causal targeting with revenue increases of 29 to 59 percent over prior methods, the figure depending heavily on the use case.
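The arithmetic behind those figures can be checked in a few lines. The only inputs are the numbers quoted above. The one added assumption is that campaign cost scales in proportion to the number of customers contacted.

```python
# Cost example: contacting 80 percent fewer customers.
original_cost = 400_000          # dollars, as quoted
contacts_cut_percent = 80
new_cost = original_cost * (100 - contacts_cut_percent) // 100

# Conversion versus profit example, figures as quoted.
conversion_plain, conversion_uplift = 18.04, 17.32   # percent
profit_plain, profit_uplift = 5.18, 5.46             # dollars per customer

conversion_gap = round(conversion_plain - conversion_uplift, 2)   # percentage points
profit_gain = round(profit_uplift - profit_plain, 2)              # dollars per customer
profit_gain_percent = round(100 * profit_gain / profit_plain, 1)

print(f"new cost: {new_cost} dollars")
print(f"conversion is {conversion_gap} points lower, "
      f"profit is {profit_gain} dollars ({profit_gain_percent}%) higher per customer")
```

A conversion rate about three quarters of a point lower sits alongside roughly five percent more profit per customer, which is why conversion rate alone is a poor scorecard.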
This is also where the two models relate. Uplift is the causal cousin of propensity. Propensity asks what someone will probably do. Uplift asks what your action does to that probability. Same statistical family, one extra and demanding requirement: a control group to reveal the counterfactual. You cannot bolt uplift onto a campaign after the fact. The holdout has to be designed in from the start.
The decision rule
The three models are not ranked. Uplift is not "better" than propensity, and propensity is not "better" than lookalike. They answer different questions, so the only real question is which one you are actually asking.
Use a lookalike model when you have no data on the people you want to reach. Prospecting, audience expansion on an ad platform, finding new customers who resemble your best existing ones. The honest limits: a lookalike is only as good as its seed, so feed it verified high-value customers rather than anyone who clicked, and remember that similarity is not the same as persuadability. People who resemble your buyers are a sensible place to start, not a guarantee. Meta advises a seed of 1,000 to 50,000 people and lets you trade similarity against reach, with a tighter percentage staying closer to the seed.
Use a propensity model when you need to rank known people by how likely they are to do something, and either the treatment is unavoidable or its cost is low. Lead scoring so sales calls the warmest contacts first. Forecasting churn to size a retention budget. Prioritizing a newsletter that costs almost nothing to send. When contacting everyone is cheap and harmless, knowing who is likely to act is enough, and propensity is the simpler, faster model because it needs only historical outcomes. The same test settles the choice between propensity and uplift targeting in any retention program: ask what a wasted or unwelcome contact costs.
Use an uplift model when the treatment is expensive, capacity is limited, or contact can backfire, and you need to spend on the people your action actually moves. Discounts and promotions, where every offer to a sure thing is lost margin. Retention saves, where a clumsy outreach can wake a sleeping dog. Costly channels like direct mail or outbound calls, where the contact list has to earn its keep. The price of uplift is real: you must run a proper treatment-and-control experiment for the training data, and the modeling (meta-learners, uplift trees, causal forests) is harder than fitting a propensity score. The California Management Review piece is candid that uplift is not always worth it. If treatments are nearly free, sleeping dogs are rare, and nobody minds spending a little on sure things, a propensity model will do the job at lower cost.
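The three rules above can be condensed into a small function. This is a mnemonic, not a substitute for judgment. The function name and flags are hypothetical, and real decisions involve degrees rather than booleans.

```python
def choose_model(known_customers, costly_treatment=False,
                 limited_capacity=False, contact_can_backfire=False):
    """Map a campaign's circumstances to the model family the rules suggest."""
    if not known_customers:
        # No data on the people you want to reach: prospecting.
        return "lookalike"
    if costly_treatment or limited_capacity or contact_can_backfire:
        # Spend only where the action changes behavior.
        # Prerequisite: a treatment-and-control experiment for training data.
        return "uplift"
    # Cheap, harmless contact: ranking by likelihood is enough.
    return "propensity"

discount_campaign = choose_model(known_customers=True, costly_treatment=True,
                                 limited_capacity=True)
newsletter = choose_model(known_customers=True)
prospecting = choose_model(known_customers=False)
print(discount_campaign, newsletter, prospecting)
```

The order of the checks carries the logic: whether you know the people at all comes first, and the cost or risk of the treatment comes second.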
A compact way to hold all three: lookalike finds strangers who look like your customers, propensity ranks your customers by what they will probably do, and uplift finds the customers your money can actually change. If you remember one line from this, make it that one.
The discount decision from the opening has a clean answer now. Not a lookalike, because the 50,000 are already customers. Not a straight propensity ranking, because that pours budget into sure things who would buy at full price. An uplift model, targeting the persuadables, because the discount only earns its cost on the people it actually persuades. Three models, three questions, one that fits. The expensive mistake is not failing to use a model. It is answering the wrong question with great precision.
Council summary
This post argues that propensity, lookalike, and uplift are not three names for smart targeting but three answers to genuinely different questions: who resembles my customers, how likely is this person to act, and how much does my action move them. Its central lesson is the trap inside the propensity model: a ranked list of likely buyers pours budget into sure things who would have bought anyway, while only uplift isolates the persuadables whose purchase the treatment actually creates. The reader should leave able to map any campaign to the right model: lookalike for prospecting strangers, propensity when contact is cheap, uplift when the treatment is expensive or can backfire. The figures, including the California Management Review case that cut campaign cost from 400,000 dollars to 80,000 while improving renewals, hold up to scrutiny and make the cost of the wrong choice concrete. The takeaway is blunt: the expensive mistake is rarely failing to use a model, it is answering the wrong question with great precision.