A shopper opens ChatGPT and types a real question. Best waterproof hiking boots for wide feet under 150 dollars. The assistant thinks for a moment and returns three product names with a short reason for each. The shopper picks one and buys it, maybe inside the chat, maybe after a quick search for the brand.

Now ask the merchant whose boot was recommended a simple question. Did that happen? In almost every case the honest answer is that they have no idea. Their web analytics shows nothing. Their search rank reports show nothing. The single most important commercial event of the week, a machine choosing their product over a competitor's in front of a buyer, left no trace they can see.

That blind spot is the reason a new analytics category exists. Call it answer-engine analytics, or AI-visibility analytics. Its entire job is to answer one question for a commerce team: when AI assistants talk about products like yours, do they recommend you, mention you, or leave you out?

Why traditional analytics went blind

To see why this needs new tooling, you have to see exactly where the old tooling fails.

Web analytics, the Google Analytics family, measures what happens on your site. It is built on a clickstream: a visitor arrives, the browser passes a referrer header that says where they came from, and every page and action gets logged. The whole model assumes a click that lands somewhere you control.

An AI recommendation breaks that model in two places. First, much of it never produces a click at all. The assistant reads product information, forms a recommendation, and delivers it inside the conversation. The shopper may act on it without ever touching your site, and increasingly the purchase itself can complete inside the chat. There is no session to record because nothing happened on your property.

Second, even when a click does happen, it usually arrives stripped of identity. Users frequently copy a brand name out of a chat and search for it separately, or paste a URL directly. AI apps and in-app browsers often drop the referrer header for privacy reasons. The result is that AI-driven visits pile up in the bucket analytics platforms label "direct." One analysis of around 446,000 visits found that roughly 70 percent of AI-referred traffic was misclassified as direct in GA4. From the dashboard, a shopper sent by a glowing ChatGPT recommendation looks identical to someone who already knew you and typed your address.
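To make the misclassification concrete, here is a minimal sketch of the kind of rule an analytics pipeline applies to each landing-page hit. The host list and the function name are illustrative assumptions, not any platform's actual logic, and the utm_source check assumes the assistant tags its outbound links, which not every assistant does.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative host list (an assumption): extend it for the assistants you track.
AI_HOSTS = {"chatgpt.com", "chat.openai.com", "perplexity.ai", "gemini.google.com"}


def classify_visit(referrer: Optional[str], utm_source: Optional[str] = None) -> str:
    """Label one landing-page hit as 'ai', 'direct', or 'other'."""
    # A campaign tag survives even when the referrer header is dropped.
    if utm_source and utm_source.lower() in AI_HOSTS:
        return "ai"
    # No referrer and no tag: the visit is indistinguishable from a typed URL.
    if not referrer:
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return "ai" if host in AI_HOSTS else "other"
```

The second branch is the blind spot: a shopper who copied your brand name out of a chat arrives with no referrer and no tag, and lands in "direct" alongside your returning customers.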

SEO analytics fails for a different reason. It is built around a fixed, public artifact: the search results page. Rank trackers can load that page on a schedule, find your link, and report its position, because the page is the same for everyone and stays put long enough to measure. An AI answer has none of those properties. It is generated fresh each time, it is different for different users, and there is no public page to load. There is no "position 3" to track. The thing you want to measure does not sit still and is not the same twice.

So the gap is real and structural. The old instruments were built for a web of pages and clicks. The recommendation now happens inside a generated answer, and that answer is invisible to web analytics and rank trackers alike.

What answer-engine analytics actually measures

Since there is no clickstream to read, these tools cannot passively observe. They have to go and ask. The core method across the category is the same: send a large, structured set of prompts to the AI assistants, capture the answers, and analyze what comes back at scale.

A commerce team starts by defining the questions a real shopper would ask in its category. Best budget office chair. Sustainable running shoes for marathon training. Gift ideas for a coffee lover under 50 dollars. A tool then runs those prompts, often many times each, across the major assistants, and parses every response. From that raw material it produces a few specific measures.

Presence and recommendation rate. Out of the prompts where a product like yours could appear, how often does your brand actually show up, and how often is it a genuine recommendation rather than a passing mention. This is the headline number, the AI-era equivalent of asking whether you are on the shelf.

Share of voice. The same question framed competitively. Across the answer set, what percentage of product mentions are yours versus each named rival. This is where a merchant learns it owns 35 percent of mentions for one query type and almost none for another.
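As a rough illustration of how the first two measures fall out of parsed answers, the sketch below assumes each captured answer has already been reduced to a mapping of brand name to "recommended" or "mentioned". That data shape, the function name, and the brand names in the usage note are hypothetical; real tools do the parsing step with their own classifiers.

```python
from collections import Counter


def visibility_metrics(answers, brand):
    """Compute presence rate, recommendation rate, and share of voice.

    `answers` is a list of dicts, one per captured answer, mapping each
    brand found in that answer to "recommended" or "mentioned".
    """
    total = len(answers)
    present = sum(1 for a in answers if brand in a)
    recommended = sum(1 for a in answers if a.get(brand) == "recommended")

    # Share of voice: each brand's fraction of all brand mentions in the set.
    mentions = Counter(b for a in answers for b in a)
    all_mentions = sum(mentions.values())

    return {
        "presence_rate": present / total if total else 0.0,
        "recommendation_rate": recommended / total if total else 0.0,
        "share_of_voice": (
            {b: n / all_mentions for b, n in mentions.items()} if all_mentions else {}
        ),
    }
```

Calling visibility_metrics(answers, "Acme") on a set of parsed answers returns the two rates for Acme plus the mention share for every brand seen, so the same pass yields the competitive view.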

Citation tracking. Which sources the assistant leaned on to build its answer. AI shopping advice draws heavily on review sites, retailer pages, and forum discussion, and the tools record which URLs and domains keep getting cited. That tells a merchant where the assistant is actually getting its opinion of them.
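A citation tally can be sketched in a few lines, assuming the tool has already extracted the list of cited URLs from each answer. The function name and input shape are assumptions for illustration. Counting a domain once per answer, rather than once per link, keeps a single link-heavy answer from dominating the ranking.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(citations_per_answer, top=5):
    """Rank domains by the number of answers that cite them.

    `citations_per_answer` is a list of URL lists, one list per answer.
    """
    counts = Counter()
    for urls in citations_per_answer:
        domains = set()
        for url in urls:
            host = (urlparse(url).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host:  # skip strings that did not parse as absolute URLs
                domains.add(host)
        counts.update(domains)  # each domain counted once per answer
    return counts.most_common(top)
```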

Sentiment and attributes. Not just whether you appear, but how you are described. The tools classify mentions as positive, neutral, or negative, and extract the labels the assistant attaches to a product. Profound's shopping product, for example, surfaces the attributes an answer engine assigns, the difference between being framed as "best budget" and "premium performance." For a brand, the adjective the machine chose can matter as much as the placement.

Profound is the most visible name here. The company raised a 35 million dollar Series B in August 2025 led by Sequoia, bringing total funding to 58.5 million dollars, and in November 2025 launched a dedicated Shopping Analysis product aimed squarely at retailers. It is not alone. Alhena, Otterly, Peec AI, Evertune, Authoritas, and SE Ranking all sell versions of AI-visibility tracking, and the incumbents have moved in: Adobe shipped an LLM Optimizer and in April 2026 completed its acquisition of Semrush, folding AI-visibility data into its enterprise stack. A category that did not exist two years ago now has a crowded vendor list and real venture money behind it.

How they sample, and why the method matters

Here is the part a commerce team has to understand before trusting any number on these dashboards. Because there is no clickstream, the data is generated by the tool, not observed from real users. Every figure is the result of a sampling design, and the design has consequences.

The first consequence is variability. Run the same shopping prompt through the same assistant twice and you can get different products, different wording, and different cited sources, sometimes minutes apart. Research on AI search visibility, including an arXiv paper titled "Don't Measure Once", found that single readings are close to meaningless. Reliable tracking needs the same prompt run many times, and aggregating five to ten observations per query gives a far steadier estimate of whether you genuinely appear or just appeared once by chance. A tool that runs each prompt a single time is reporting a coin flip.
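One way to see why a single run says so little is to put an uncertainty range around the appearance rate. The sketch below uses the Wilson score interval for a proportion, a standard statistical technique chosen here for illustration and not something the cited research prescribes. It also treats repeated runs as independent, which is only approximately true.

```python
import math


def wilson_interval(hits, runs, z=1.96):
    """Approximate 95% Wilson score interval for an appearance rate.

    `hits` is how many of `runs` repeated answers included the brand.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

With one run and one appearance, the interval spans roughly 0.21 to 1.0, which is barely a measurement. At 7 appearances in 10 runs it narrows noticeably, and it keeps tightening as runs are added.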

The second consequence is sample design. A visibility score is only a score against the specific list of prompts the tool was given. Pick the wrong questions, or too few, and the number measures performance on that test set rather than anything about the real market. Prompt selection is the methodology, and a vague or lazy prompt list quietly produces a confident but hollow figure.

The third consequence is the gap between the test and the real world. Most of these tools query the assistants through APIs or by automating the chat interface. Neither is identical to what your actual customers experience. The consumer apps wrap the model in extra instructions, personalization, memory of past chats, and location signals that a clean automated prompt does not carry. So the tool measures what an assistant tends to say to a generic questioner, which is a useful proxy, not a recording of what it said to your buyer.

That is the honest line between measurable and estimated. What is genuinely measurable: whether your brand tends to appear for a defined set of questions, roughly how often, who shows up alongside you, what sources get cited, and the tone of the description, all as patterns across many runs. What is estimated or modeled: any single "visibility score" as a precise number, and especially any claim to convert that into traffic or revenue. Some practitioners are sharp about this, calling current visibility scores closer to lottery tickets than measurements. The defensible use is directional. The category measures tendencies and trends well, and exact figures badly.

How a commerce team should use it

Treat answer-engine analytics as a compass, not a speedometer. It is good at telling you which direction to walk and bad at telling you your exact speed. Used that way, it earns its place.

Start with a baseline and a competitive set. Run a solid, representative list of category prompts across the assistants, repeated enough times to be stable, and read the pattern. You will usually find you are strong on some question types and absent on others, and that the picture differs by assistant. That uneven map is the actual finding. It tells you where the problem is.

Then move to diagnosis, which is the genuinely valuable output. The citation data shows which sources the assistant trusts for your category. If a particular review site or forum thread keeps getting cited and you are absent from it, that is a concrete, fixable gap. If competitors are described with attributes you would want, that tells you how the machine is positioning the category.

Most of the fixes then land on the product feed and the catalog, not on the dashboard. AI shopping recommendations lean heavily on structured product data: clean titles, accurate pricing, in-stock status, complete attributes, valid identifiers like GTINs, and review and rating fields. There is also strong overlap with existing commerce plumbing. A March 2026 analysis of around 43,000 carousel products, reported in Search Engine Land, found that roughly 83 percent of the products ChatGPT recommended in its shopping carousels matched Google Shopping's top 40 organic listings. Read that carefully and the practical message is encouraging: for shopping queries, getting your feed clean and your Google Merchant Center data healthy does a large part of the AI-visibility job at the same time. The exotic new channel is, for now, partly fed by infrastructure many merchants already run.
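A basic feed check along these lines can be sketched as follows. The required-field list is an illustrative assumption and should be adapted to your own feed specification. The GTIN check uses the standard GS1 check-digit rule: weights of 3 and 1 alternate starting from the digit next to the check digit.

```python
# Illustrative field names (an assumption); align them with your own feed spec.
REQUIRED_FIELDS = ("title", "price", "availability", "gtin")


def gtin_is_valid(gtin: str) -> bool:
    """Validate a GTIN-8/12/13/14 using the GS1 check-digit rule."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in gtin]
    body, check = digits[:-1], digits[-1]
    # Working right to left over the body, weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def feed_issues(item: dict) -> list:
    """Return human-readable problems for one product record."""
    # Empty strings and None both count as missing.
    issues = [f"missing {field}" for field in REQUIRED_FIELDS if not item.get(field)]
    gtin = item.get("gtin")
    if gtin and not gtin_is_valid(str(gtin)):
        issues.append("invalid gtin check digit")
    return issues
```

Running feed_issues over every record in the catalog gives a worklist of the gaps that most directly affect whether a product can be matched and recommended.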

So the workflow is a loop. Measure the pattern, diagnose the gaps from citations and competitive share, fix the feed and the catalog and the content the assistants cite, then measure again to see if the pattern moved. Judge it over weeks, on trend lines, never on a single reading. And keep the spending honest: a brand testing the water can use entry tools in the low hundreds of dollars a month, while enterprise tiers run into the low thousands. Match the cost to how much of your discovery is genuinely shifting into AI before signing an enterprise contract on hype.
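For the "measure again" step, a simple check on whether the pattern moved is a pooled two-proportion comparison between the period before a fix and the period after. This is a sketch under assumptions: the function name is hypothetical, the test treats runs as independent samples, and it is a sanity check on noise, not a substitute for watching the trend over several weeks.

```python
import math


def rate_change(hits_before, runs_before, hits_after, runs_after, z=1.96):
    """Return (change in appearance rate, whether it exceeds sampling noise)."""
    if runs_before <= 0 or runs_after <= 0:
        raise ValueError("both periods need at least one run")
    p_before = hits_before / runs_before
    p_after = hits_after / runs_after
    # Pooled rate under the assumption that nothing actually changed.
    pooled = (hits_before + hits_after) / (runs_before + runs_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / runs_before + 1 / runs_after))
    delta = p_after - p_before
    beyond_noise = se > 0 and abs(delta) > z * se
    return delta, beyond_noise
```

A move from 30 to 50 appearances per 100 runs clears the noise threshold. The same 20-point move measured on 10 runs per period does not, which is the practical reason to judge on accumulated runs and not on a single reading.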

Where this is heading

Two forces will shape the next couple of years for this category.

The first is consolidation. A field with this many similar vendors does not stay fragmented. Adobe's purchase of Semrush is the early signal, and AI-visibility measurement is on a path to becoming a standard module inside analytics and marketing suites rather than a fleet of standalone dashboards. For buyers, that argues against deep, long lock-in to any single specialist tool while the ground is still moving.

The second is the measurement problem itself, which gets harder before it gets easier. As assistants personalize more aggressively and as buying completes inside the chat through agentic checkout, the gap widens between what a generic sampled prompt sees and what a specific shopper experiences. The most useful future signal would tie AI visibility to actual outcomes, and a few vendors are reaching for closed-loop attribution, but honest revenue attribution from a channel with no clickstream remains genuinely unsolved.

The realistic destination is not a perfect dashboard. It is a recognized, permanently approximate discipline, run alongside web and search analytics rather than replacing them, treated like brand or survey research: directionally true, never precise to the decimal. The merchants who get value from it will be the ones who hold both ideas at once. The question, do AI assistants recommend my products, is now too commercially important to leave unmeasured. And the answer, however good the tool, will always be an estimate. Use it for direction, fix the catalog and the feed underneath it, and do not mistake a modeled score for the truth.

Council summary

This post argues that AI product recommendations have opened a real measurement gap, that a new category of answer-engine analytics exists to close it, and that the category is useful only if a commerce team understands its method and its limits. The council verified the load-bearing facts: Profound's 35 million dollar Series B led by Sequoia in August 2025, total funding of 58.5 million dollars, and the November 2025 launch of its Shopping Analysis product; Adobe's completed Semrush acquisition in April 2026; the roughly 70 percent of AI traffic misclassified as direct in a study of about 446,000 visits; the arXiv paper "Don't Measure Once"; and the Search Engine Land finding that around 83 percent of ChatGPT carousel products matched Google Shopping's top 40 organic listings. Three sources were added to support claims already in the text. The reader takeaway holds: treat these tools as a compass, judge them on trend lines rather than single readings, and fix the product feed and the cited content underneath the score.

Answer-Engine Analytics: Does AI Recommend Your Products?
