A vendor demo is a controlled environment. The data is clean, the use case is hand-picked, the agent does exactly what it was rehearsed to do, and twenty minutes later you have watched something that looked autonomous. None of that tells you whether the platform holds up against your data, your definitions and your edge cases.
Part 1 of this series argued that the agentic CDP is one genuine change under a great deal of rebranding: the customer profile stops being something a person reads and becomes something software acts on. This post is the practical follow-up, the script you take into the vendor meeting. Twenty questions, grouped by theme, each with a note on what a strong answer sounds like and what a hand-wave sounds like. Work through them and you run a real evaluation, not a guided tour.
A note before the list. Gartner has a term, "agent washing," for the rebranding of chatbots, assistants and automation scripts as agents without the substance. In a May 2026 assessment of the supply chain planning market, its analysts pressed buyers to scrutinise agentic claims, because many products do not actually re-sequence their objectives, weigh trade-offs or adapt how they execute. The same caution applies to customer data platforms. The questions below make a vendor either demonstrate that substance or run out of room.
Theme one: what the agent actually does
The first job is to find the line between an agent that acts and an assistant that suggests. Vendors blur it on purpose, because "agentic" sells and "suggests things a human then does" does not.
1. Walk me through one task your agent completes end to end, with no human in the middle. A strong answer is specific and unglamorous. The vendor names a real task, such as a churn-risk intervention or a stalled-checkout nudge, and walks the whole loop: the agent reads the profile, decides, acts on a live channel, sees the result, adjusts. A hand-wave stays at the level of "the agent helps your team work faster" and never names a task that finishes without a person.
2. Where exactly does a human have to approve something, and where does the agent proceed alone? You want a clear map, not a mood. A good vendor can draw the line: these actions run autonomously inside guardrails, these always wait for sign-off. A weak vendor calls everything "autonomous" and then, under questioning, reveals a human approves every send. That is assisted work. It can be the right choice, but it should not carry the price the word promises.
3. Does the agent change its own plan when something fails, or just retry the same step? This is the Gartner test in plain language. A real agent, told a channel is unavailable or a segment is empty, re-routes: picks another channel, revises the approach, escalates. A scripted automation retries the identical step or stops. Ask for a live example of the agent hitting a wall and taking a different path.
Theme two: the semantic layer and data definitions
An agent has no tribal knowledge. A human analyst knows "active customer" means one thing in finance and another in marketing, and silently adjusts. An agent does not know, does not ask, and acts with full confidence on whichever definition it was handed. This is the most underrated part of an agentic CDP evaluation.
4. Where do the definitions of "revenue," "active customer" and "churned" live, and is that the single source the agent reads from? The answer you want: there is one governed semantic layer, the definitions are versioned and owned, and the agent reads from it rather than inferring meaning from raw column names. A hand-wave is "the agent understands your data" with no artefact behind it. An agent that infers meaning will infer wrong.
5. Do you support an open semantic standard, or is the semantic layer proprietary to you? This matters for the long run. In January 2026 the Open Semantic Interchange initiative, co-led by Snowflake, dbt Labs, Salesforce and BlackRock, published a version 1.0 specification for sharing metric definitions across tools. A vendor moving toward that standard is letting your definitions stay portable. A vendor whose semantic layer only works inside its own walls is quietly building lock-in into the most important layer you have.
6. What happens when a definition is ambiguous or two teams disagree? A serious platform has an answer: the agent flags the ambiguity and stops rather than guessing, and there is a workflow to resolve the conflict. A platform that lets the agent pick a definition and proceed will act on the wrong population and tell you it succeeded.
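The stop-rather-than-guess behaviour in question 6 can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the registry layout, the `resolve` function and the example definitions are all hypothetical.

```python
class AmbiguousDefinition(Exception):
    """Raised when a term has no definition or several competing ones."""


# Hypothetical registry: term -> list of (owning team, definition) pairs.
REGISTRY = {
    "churned": [("finance", "no paid invoice in 90 days")],
    "active customer": [
        ("finance", "paid invoice in last 30 days"),
        ("marketing", "opened an email in last 30 days"),
    ],
}


def resolve(term, registry=REGISTRY):
    """Return the single governed definition, or stop instead of guessing."""
    candidates = registry.get(term, [])
    if len(candidates) != 1:
        owners = [owner for owner, _ in candidates]
        raise AmbiguousDefinition(
            f"{term!r} has {len(candidates)} definitions (owners: {owners})"
        )
    return candidates[0][1]


print(resolve("churned"))
try:
    resolve("active customer")
except AmbiguousDefinition as err:
    # In a real platform this would open a resolution workflow.
    print("halted:", err)
```

The point to look for in a demo is the second branch: a disputed term produces a halt and a flag, not a silent choice.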
Theme three: identity and data trustworthiness
An agent acts on whatever profile it is given. If identity resolution has merged two people into one profile, a human reading it might catch the contradiction. An agent will act on the merged ghost without hesitation: message the wrong person, suppress the wrong audience, personalise to a stranger's history.
7. How confident is each identity match, and can the agent see that confidence? Strong vendors expose a match-confidence score and let agents treat a shaky match differently from a certain one. The weak version returns every profile as equally true, with deterministic matches and probabilistic guesses flattened into one object, so the agent cannot tell a near-certain identity from a coin-flip.
8. Can I set a confidence threshold below which the agent will not act autonomously? This is the practical safety valve: act freely on high-confidence profiles, hold low-confidence ones for review. If the vendor cannot offer that, every probabilistic match becomes an autonomous action waiting to misfire.
9. How does the agent handle a profile with missing or stale data? Ask what the agent does with an incomplete profile: does it proceed on partial data, pause, or fall back to a safe default? "It just works" is not an answer. You want to know the failure behaviour before it happens to a customer.
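The safety valve from questions 8 and 9 amounts to a routing rule applied before any action. The sketch below assumes the platform exposes a match-confidence score and a last-updated timestamp on each profile; the field names, the 0.90 threshold and the 30-day freshness window are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

AUTONOMY_THRESHOLD = 0.90              # hypothetical; set from your own risk review
MAX_PROFILE_AGE = timedelta(days=30)   # hypothetical freshness window


@dataclass
class Profile:
    profile_id: str
    match_confidence: float  # 0.0 to 1.0, assumed to be exposed by the platform
    last_updated: datetime


def route(profile, now, threshold=AUTONOMY_THRESHOLD, max_age=MAX_PROFILE_AGE):
    """Decide whether the agent may act alone on this profile."""
    if now - profile.last_updated > max_age:
        return "hold_stale"   # fall back to a safe default on stale data
    if profile.match_confidence < threshold:
        return "review"       # shaky identity match waits for a human
    return "act"


now = datetime(2026, 6, 1)
print(route(Profile("p-001", 0.99, datetime(2026, 5, 25)), now))
print(route(Profile("p-002", 0.55, datetime(2026, 5, 25)), now))
print(route(Profile("p-003", 0.99, datetime(2026, 1, 1)), now))
```

If a vendor cannot show where an equivalent rule is configured, the threshold does not exist in practice.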
Theme four: guardrails and human approval
Guardrails are the rules an autonomous system runs inside: budget ceilings, frequency caps, brand limits, compliance boundaries. An agent acting wrong at machine speed can do real damage before anyone looks.
10. Show me where I configure guardrails, and show me the agent being stopped by one. A real platform has a guardrail console: spend limits, contact-frequency caps, audience exclusions, content rules. And it can demonstrate, live, an agent trying to exceed a limit and being blocked. A hand-wave describes guardrails as a roadmap item or a slide. If you cannot see one stop an agent in the demo, assume it does not exist yet.
11. Are guardrails enforced for every action, or only checked when the agent starts? An agent's behaviour shifts as it reasons through a task, so a guardrail checked only at the beginning may no longer hold by the end. The strong answer is enforcement at the moment of each action. The weak answer is a one-time check at kickoff.
12. Can I match the approval requirement to the risk of the action? Not every action deserves the same gate. A mature platform lets you tier it: low-risk actions run free, an action touching sensitive data waits for a fast human check, a high-stakes action gets a firmer gate. Ask, too, what happens when an approval request times out. The safe default is to deny and log, not proceed because nobody answered.
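Questions 11 and 12 describe two checks that both run at the moment of each action: a guardrail and a risk-tiered approval gate. The sketch below is one way to picture that, under stated assumptions: the three tier names, the approval counts, the `FrequencyCap` class and the return strings are all hypothetical, and a real system would also write an audit entry for every denial.

```python
# Hypothetical tiers: how many human approvals each risk level needs.
REQUIRED_APPROVALS = {"low": 0, "sensitive": 1, "high": 2}


class FrequencyCap:
    """Contact-frequency guardrail, consulted on every single send."""

    def __init__(self, max_contacts):
        self.max_contacts = max_contacts
        self.sent = {}

    def allows(self, customer_id):
        return self.sent.get(customer_id, 0) < self.max_contacts

    def record(self, customer_id):
        self.sent[customer_id] = self.sent.get(customer_id, 0) + 1


def approval_gate(risk_tier, approvals_received, timed_out):
    """Proceed only with enough approvals; a timeout denies, never allows."""
    if approvals_received >= REQUIRED_APPROVALS[risk_tier]:
        return "proceed"
    if timed_out:
        return "denied: approval timed out"
    return "waiting for approval"


def execute(action, cap, approvals_received=0, timed_out=False):
    # The guardrail is checked here, per action, not once at kickoff.
    if not cap.allows(action["customer_id"]):
        return "blocked: frequency cap"
    gate = approval_gate(action["risk"], approvals_received, timed_out)
    if gate != "proceed":
        return gate
    cap.record(action["customer_id"])
    return "sent"


cap = FrequencyCap(max_contacts=2)
low = {"customer_id": "c1", "risk": "low"}
print([execute(low, cap) for _ in range(3)])
print(execute({"customer_id": "c2", "risk": "high"}, cap,
              approvals_received=1, timed_out=True))
```

The third low-risk send is blocked even though the first two passed, and the high-risk action is denied when its approval window closes without enough sign-offs.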
Theme five: observability and audit
If you cannot see what an agent did and why, you cannot govern it, debug it, or defend it to a regulator. Observability is not optional once software is acting on customers at volume.
13. After the agent acts, can I see the full chain: what it read, what it decided, why, and what it did? A strong platform treats this as a first-class feature: a readable trace linking the action to the data consulted, the reasoning, the guardrails applied and the outcome. A weak platform shows a log of sends with no decision context, telling you what happened but never why.
14. When an agent does something wrong, how do I find the root cause? Agentic failures are often quiet, and small per-step errors compound across a multi-step task, so a process that looks reliable in a one-shot demo degrades once it runs ten steps in sequence. Ask the vendor to walk a real misfire back to its cause. If the honest answer is that the agent's reasoning is opaque, you are buying something you cannot supervise.
15. Is the audit record complete enough to show a regulator? This is now concrete. Under the EU AI Act, obligations for high-risk systems take effect on 2 August 2026, and they require human oversight to be technically built into the system, including a stop control that halts it in a safe state, not merely described in a document. Ask whether the platform's audit trail and oversight controls were designed against that bar.
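The compounding effect in question 14 is easy to put numbers on. The sketch below assumes every step succeeds independently with the same probability, which is a simplification, and the 95 percent per-step figure is purely illustrative.

```python
def end_to_end_success(per_step, steps):
    """Chance that every step in a sequence succeeds, assuming independence."""
    return per_step ** steps


for steps in (1, 5, 10):
    print(steps, round(end_to_end_success(0.95, steps), 3))
```

Under those assumptions a step that works 95 percent of the time yields a ten-step task that finishes cleanly only about 60 percent of the time (0.95 to the tenth power is roughly 0.599), which is why a one-shot demo says little about a long chain.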
Theme six: where the agent runs and what it can touch
An agent is software running somewhere, with credentials, reaching into systems. Treat it as an abstract helper rather than a process with real access, and access surprises follow.
16. Where does the agent execute, and what credentials does it carry? You want specifics: where the agent runs, what identity it uses, and whether its permissions are scoped to the task in front of it. The strong pattern emerging across the industry is short-lived, narrowly scoped credentials minted per task and expired after use. The weak pattern is a single broad service account reused for everything, which makes one compromised agent a wide-open door.
17. If the agent reaches data through an MCP server or API, what can it see through that doorway, and what is walled off? The Model Context Protocol is becoming a standard way agents, including agents you did not build, reach customer data. A good vendor can show you exactly which profiles, fields and actions are reachable and which are not, and confirms that permission is checked at request time rather than only at connection time. "It connects to everything" is a warning, not a feature.
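The strong patterns in questions 16 and 17, a short-lived scoped credential and a permission check on every request, fit together naturally. The sketch below is illustrative only: `TaskCredential`, `mint`, `read_field` and the `read:<field>` scope format are hypothetical names, not part of the Model Context Protocol or any vendor API.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    scopes: frozenset
    expires_at: float  # epoch seconds


def mint(scopes, ttl_seconds, now):
    """Issue a credential limited to one task's scopes and a short lifetime."""
    return TaskCredential(secrets.token_hex(16), frozenset(scopes), now + ttl_seconds)


def read_field(cred, profile, field, now):
    """Evaluate permission on every request, not once at connection time."""
    if now >= cred.expires_at:
        raise PermissionError("credential expired")
    if f"read:{field}" not in cred.scopes:
        raise PermissionError(f"scope read:{field} not granted")
    return profile[field]


profile = {"email": "a@example.com", "lifetime_value": 1200}
cred = mint({"read:email"}, ttl_seconds=300, now=1000.0)

print(read_field(cred, profile, "email", now=1100.0))
for field, when in (("lifetime_value", 1100.0), ("email", 1400.0)):
    try:
        read_field(cred, profile, field, now=when)
    except PermissionError as err:
        print("refused:", err)
```

A compromised agent holding this credential can read one field for five minutes, which is the contrast with a broad service account reused for everything.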
Theme seven: governance and permissions
Governing a dashboard means controlling who can see what. Governing an agent means controlling what it may do, to whom, how often, and with whose authority. Most organisations are not ready: Deloitte research found most companies still lack mature governance for autonomous agents, including clear boundaries on which decisions an agent may make alone.
18. Does an agent have its own identity and permission set, separate from the human who launched it? The strong answer is yes: an agent is a governed entity with its own role, its own permissions and its own audit identity. The weak answer is that the agent inherits the launching user's access, which means it can do anything that person can do, at machine speed, with no separate accountability. As you add agents, that separation becomes the core of your governance model.
Theme eight: the cost model
Agentic pricing is genuinely unsettled, and that uncertainty is a real budget risk. Agents consume far more compute than dashboards because they run reasoning loops, and in agentic workloads the input side of that consumption can outweigh the output many times over. Costs can climb quietly.
19. Walk me through three years of total cost, and tell me which line scales with agent activity. A trustworthy vendor walks the full picture: licence, implementation, the consumption component, and which costs grow as agents do more work. You want to know whether you pay per agent, per action, per outcome, or per token of compute, and what happens when an agent becomes busier than planned. A vendor who deflects with "we will scope pricing after the demo" is signalling an unpredictable bill. Ask for a worked example at your real volume.
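The worked example in question 19 can be as simple as the sketch below. Every number in it is hypothetical, chosen only to show the shape of the calculation, and the per-action pricing model is an assumption; substitute the vendor's real unit (per agent, per outcome, per token) and your own volumes.

```python
def three_year_cost(licence_per_year, implementation_once,
                    thousand_actions_per_year, price_per_thousand_actions):
    """Split each year's bill into a fixed part and an activity-driven part."""
    rows = []
    for year, volume in enumerate(thousand_actions_per_year, start=1):
        fixed = licence_per_year + (implementation_once if year == 1 else 0)
        variable = volume * price_per_thousand_actions
        rows.append({"year": year, "fixed": fixed,
                     "variable": variable, "total": fixed + variable})
    return rows


# Hypothetical inputs: agent activity doubles each year.
rows = three_year_cost(
    licence_per_year=100_000,
    implementation_once=50_000,
    thousand_actions_per_year=[1_000, 2_000, 4_000],
    price_per_thousand_actions=20,
)
for row in rows:
    share = row["variable"] / row["total"]
    print(row["year"], row["total"], f"{share:.0%} driven by agent activity")
```

With these made-up inputs the activity-driven line grows from 20,000 to 80,000 while the licence stays flat, so the share of the bill tied to agent activity rises each year. That is the line to ask the vendor to point at.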
Theme nine: failure handling and lock-in
The last two questions are about what happens when things go wrong, and what happens when you want to leave. Both are easy to skip in a hopeful buying mood and expensive to skip in hindsight.
20. When an agent fails mid-task, what happens, and how hard is it to get our data and definitions out? Two questions in one, because they share a root. On failure, you want a clear behaviour: the agent stops safely, partial work is recorded, a human is alerted, nothing is left half-done in a customer's experience. On exit, you want your raw customer data, your identity graph and your semantic definitions to be portable in a usable form. A platform that activates data in your own warehouse and uses an open semantic standard is structurally easier to leave. A platform that holds your profiles, your identity logic and your definitions in proprietary formats has built an exit cost that compounds every year you stay. Ask the exit question while you still have negotiating power, which is now, before you sign.
How to use the script
Do not read these out like a checklist. Pick the themes that match your real risk. A regulated enterprise should lean hardest on guardrails, observability and audit. A team without data engineers should press on the semantic layer and identity, because that is where an under-resourced agentic project quietly stalls. Everyone should ask question 20.
Watch for two demo tactics. The first is the vendor who shows the agent only against perfect synthetic data; ask to see it run against a messy, realistic sample, because your data is messy and that is the test that counts. The second is the vendor who answers a capability question with a roadmap. Mark anything that exists only in a future release as absent, because for your evaluation it is.
Gartner expects more than 40 percent of agentic AI projects to be cancelled by the end of 2027, undone by cost, unclear value or weak controls. Working through a script like this one is no guarantee of landing in the surviving share, but skipping it means making the most consequential martech decision of the next five years on the strength of a rehearsed twenty-minute demo. The agentic CDP is real. The interrogation is how you find out whether the one in front of you is.
Council summary
This post argues that an agentic CDP cannot be evaluated from a rehearsed demo, and it hands the reader a working script: twenty questions across nine themes, each pairing a strong answer with the hand-wave it should replace. The council verified every external claim. Gartner's May 2026 warning on agent washing, its forecast that more than 40 percent of agentic AI projects will be cancelled by the end of 2027, the Open Semantic Interchange version 1.0 specification from January 2026, and the EU AI Act high-risk obligations taking effect on 2 August 2026 all check out as cited. One correction was made: the OSI backers were named as Snowflake, dbt Labs, Salesforce and BlackRock, since BlackRock is a founding co-lead and Databricks joined later. The takeaway for a buyer is to treat any roadmap-only capability as absent, insist on a run against messy data, and always ask question 20 about failure behaviour and exit cost while negotiating leverage still exists.