Type "warm jacket for a winter trip to Norway" into most storefront search boxes and watch what happens. The box does not understand any of it. It splits your sentence into tokens, throws away the small words, and looks for products whose text contains "warm," "jacket," "winter," "trip," and "Norway." Nothing in the catalog says Norway. So you get a thin set of results, or a "0 results" page, or a wall of items that happen to share a word. The shopper knew exactly what they wanted. The search box could not hear it.

That box is not being redesigned. It is being replaced by something that reads the sentence as a sentence, works out what the person is trying to do, retrieves products that fit the intent rather than the keywords, and increasingly answers back in plain language with a short recommendation. The industry has a name for the destination: a discovery agent. This post explains how the old search box worked and why it failed so often, what semantic and vector search changed, what a retrieval-grounded assistant does differently, and the one thing that decides whether the new system helps a shopper or confidently lies to them.

Origin: a box that matched strings, not meaning

On-site search began as a database problem. Early ecommerce platforms treated the catalog as rows in a database and the search box as a way to filter them. You typed a word, the system ran the equivalent of a text match, and you got the rows that contained it. As catalogs grew, retailers moved to dedicated search engines. Endeca, later acquired by Oracle, was the enterprise standard for years, and then open-source engines took over most of the market because they were free, fast, and scalable: the Apache projects Lucene and Solr, and later Elasticsearch, which is built on Lucene.

These engines were a real improvement, but they shared one foundation: they matched strings. Under the hood sits an inverted index, a giant lookup table from each word to the products that contain it. Search "running shoes" and the engine intersects the list of products containing "running" with the list containing "shoes." It is fast and it scales to millions of items. It also has no idea what a running shoe is.
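To make the inverted index concrete, here is a minimal sketch in Python, assuming a four-product toy catalog and plain whitespace tokenisation (real engines such as Lucene also handle stemming, stop words, and scoring). It builds the word-to-products table and answers a query by intersecting the lists, which is also why a word the catalog never uses returns nothing at all.

```python
from collections import defaultdict

# Hypothetical toy catalog: product ID -> the text the merchant wrote for it.
CATALOG = {
    "p1": "lightweight running shoes with breathable mesh",
    "p2": "leather dress shoes",
    "p3": "trail running jacket",
    "p4": "white canvas sneakers",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; production analyzers do far more."""
    return text.lower().split()


def build_inverted_index(catalog: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of product IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for product_id, text in catalog.items():
        for token in tokenize(text):
            index[token].add(product_id)
    return index


def keyword_search(index, query: str) -> set[str]:
    """AND semantics: intersect the posting list of every query word."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


index = build_inverted_index(CATALOG)
print(keyword_search(index, "running shoes"))
print(keyword_search(index, "trainers"))  # no product text contains "trainers"
```

The lookup is fast because it never reads product text at query time, only the precomputed lists. It is also purely literal: "trainers" and "sneakers" are unrelated keys in the table.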

That single limitation produced a long list of everyday failures. If the catalog called an item a "sneaker" and the shopper searched "trainers," the lists did not intersect and the shopper got nothing. Misspell one character and the match broke. Search by a feature the copy did not mention, by a use case, by a symptom the product solves, and the index had no row to return. Teams patched around this for two decades. They built synonym dictionaries by hand, added spell-correction and stemming so "running" would also match "run," and tuned relevance weights so a word in the title counted more than the same word in a review. Every patch helped. None changed the fact that the engine was matching the shopper's vocabulary against the merchant's vocabulary and praying the two lined up.
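The patches described above can be sketched the same way. This is an illustration, not any engine's actual implementation: the synonym dictionary, the crude suffix-stripping stemmer, and the 3-to-1 title weighting are all hypothetical values chosen for the example.

```python
# Hypothetical hand-maintained synonym dictionary of the kind teams curated for years.
SYNONYMS = {
    "trainers": ["sneakers"],
    "sneakers": ["trainers"],
}

# Assumed relevance weights: a word in the title counts three times a description word.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}

PRODUCTS = {
    "p1": {"title": "Running shoes", "description": "Lightweight mesh upper for daily runs"},
    "p2": {"title": "Canvas sneakers", "description": "Classic everyday pair"},
    "p3": {"title": "Rain jacket", "description": "Packable shell for running in the wet"},
}


def naive_stem(token: str) -> str:
    """Crude suffix stripping so 'running' and 'runs' both reduce to 'run'.
    Real engines use proper stemmers such as Porter or Snowball."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if token[-1] == token[-2]:  # "runn" -> "run"
            token = token[:-1]
    elif token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        token = token[:-1]
    return token


def expand_query(query: str) -> set[str]:
    """Stem each query word and add stemmed synonyms from the dictionary."""
    terms = set()
    for word in query.lower().split():
        terms.add(naive_stem(word))
        for synonym in SYNONYMS.get(word, []):
            terms.add(naive_stem(synonym))
    return terms


def field_score(product: dict, terms: set[str]) -> float:
    """Each query term found in a field adds that field's weight."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = {naive_stem(w) for w in product[field].lower().split()}
        total += weight * len(terms & field_terms)
    return total


def search(query: str) -> list[str]:
    terms = expand_query(query)
    scored = [(field_score(p, terms), pid) for pid, p in PRODUCTS.items()]
    # Highest score first; ties broken by product ID for a stable order.
    return [pid for score, pid in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


print(search("trainers"))
print(search("running"))
print(search("norway"))
```

Every rule here had to be written by someone, and a query like "norway" still returns nothing, because no patch teaches the engine what a word means.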

The praying mostly failed. Baymard Institute, which has run e-commerce search usability testing for years, found in its benchmark that 41 percent of sites cannot fully support the query types shoppers actually use, and that among the 50 largest e-commerce sites only 34 percent return useful results for feature, symptom, or relational queries. Around 70 percent could not handle product-type synonyms. In Baymard's testing, 31 percent of product-finding tasks ended in failure when the subject tried to use search. This is not a story about small or careless retailers. It is the default behaviour of the technology.

The cost of that default is large, because search users are the buyers. Research commissioned by Google Cloud found that 69 percent of shoppers go straight to the search bar on a retail site, and that around 8 in 10 say they are more likely to leave and buy elsewhere after an unsuccessful search. A survey by the search vendor Nosto put the same point in revenue terms: ecommerce leaders attributed roughly 39 percent of their bounce rate to inadequate search. Industry analysis has long held that searchers can account for around 40 percent of revenue on some sites. A keyword box that fails a third of the time is failing the highest-intent traffic on the site.

Present: from matching strings to matching meaning

The fix had to come from outside the inverted index: stop matching words and start matching meaning. The mechanism is the embedding. An embedding model is a neural network that reads a piece of text (a product title, a description, a shopper's query) and outputs a vector, a long list of numbers. Training arranges those numbers so that text with similar meaning lands in a similar place. "Sneakers" and "trainers" end up close together. "Warm jacket" sits near a parka the copy describes only as "insulated." The vector is a coordinate for meaning rather than a string of characters: meaning becomes geometry.
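A minimal sketch of "meaning becomes geometry", assuming hand-picked four-dimensional vectors in place of a real embedding model (production models output hundreds or thousands of dimensions). Cosine similarity, a common closeness measure for embeddings, scores how nearly two vectors point in the same direction: close to 1 means similar meaning, close to 0 means unrelated.

```python
import math

# Hypothetical vectors, hand-picked for illustration; not output from any real model.
EMBEDDINGS = {
    "sneakers": [0.90, 0.10, 0.05, 0.00],
    "trainers": [0.85, 0.15, 0.10, 0.00],
    "parka":    [0.05, 0.90, 0.30, 0.10],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity(EMBEDDINGS["sneakers"], EMBEDDINGS["trainers"]), 3))
print(round(cosine_similarity(EMBEDDINGS["sneakers"], EMBEDDINGS["parka"]), 3))
```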

Once products are embeddings, search becomes a geometry problem. The shopper's query is turned into a vector by the same model, and the engine looks for the product vectors sitting nearest to it. This is vector search, and it does something the inverted index structurally cannot. It returns a relevant parka for "warm jacket for Norway" even though the word "Norway" appears nowhere in the catalog, because the query's vector still lands in the right neighbourhood. The synonym dictionary that a team maintained by hand for years is replaced by a model that has learned that synonyms sit close together. Semantic search, the broader term, is this idea applied through natural language understanding so the system reads intent rather than tokens.
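Vector search over those coordinates can be sketched as an exact nearest-neighbour lookup. The product vectors and the query vector standing in for "warm jacket for Norway" are hypothetical; a real system would embed both with the same model and, at catalog scale, would use an approximate nearest-neighbour index (HNSW is a common choice) rather than scoring every product.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-picked for illustration only.
PRODUCT_VECTORS = {
    "insulated parka":    [0.10, 0.92, 0.35],
    "down-filled coat":   [0.05, 0.88, 0.40],
    "linen summer shirt": [0.20, 0.05, 0.95],
    "canvas sneakers":    [0.95, 0.05, 0.10],
}

# Stand-in for embedding the query "warm jacket for Norway" with the same model.
QUERY_VECTOR = [0.08, 0.90, 0.30]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbours(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact (brute-force) k-nearest-neighbour search by cosine similarity."""
    ranked = sorted(vectors, key=lambda pid: cosine(query, vectors[pid]), reverse=True)
    return ranked[:k]


print(nearest_neighbours(QUERY_VECTOR, PRODUCT_VECTORS))
```

No product name contains "Norway" or even "warm"; the two outerwear items come back purely because their vectors sit closest to the query's.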

Vector search has its own weak spot, the mirror image of the keyword box. It is excellent at concepts and bad at exact strings. Search a precise model number, a SKU, a part code, and pure vector search may smear it into a fuzzy neighbourhood and miss the exact item. So the production answer is hybrid search: run the keyword index and vector search together, then merge the two ranked lists. Keyword search nails the exact codes. Vector search catches the concepts and the phrasing. Most serious ecommerce search now runs this way. Algolia, a search vendor named a Leader in the 2025 Gartner Magic Quadrant for Search and Product Discovery, processes more than 1.75 trillion queries a year. It describes its own progression in exactly these stages: keyword search, then semantic search, then vector search, then hybrid, and now retrieval-augmented generation. That last stage is the real subject of this post.
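How a given vendor merges the two lists is not specified here. One widely used method is reciprocal rank fusion (RRF), sketched below: every list contributes 1 / (k + rank) for each product it returns, with k = 60 as the conventional constant. The result lists and the SKU are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, product_id in enumerate(ranking, start=1):
            scores[product_id] = scores.get(product_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical results for a query that contains an exact model number.
keyword_hits = ["SKU-4471", "p9"]       # the inverted index nails the exact code
vector_hits = ["p2", "SKU-4471", "p7"]  # the vector index smears it down to rank 2
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF looks only at rank positions, which sidesteps the fact that keyword relevance scores and cosine similarities live on different scales. A product that both methods find rises to the top.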

Present: the search box starts to talk back

Hybrid search fixed retrieval. It still hands the shopper a grid of products and leaves the thinking to them. The next move adds a language model on top, and that is where the search box stops being a box.

The pattern is retrieval-augmented generation, RAG, the same architecture used to ground a chatbot in a company's documents. The mechanics matter, because the precision is the whole point. The shopper asks a question in natural language. The system retrieves the genuinely relevant products and content from the catalog using hybrid search, then passes those items to a language model along with the question, and the model writes an answer grounded in what was retrieved. "Grounded" is the load-bearing word. The model is not answering from its training. It is answering from the merchant's live catalog, handed to it a moment ago.
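The hand-off step can be sketched as prompt assembly. This assumes retrieval has already run and returned two hypothetical product records; the field names, the instruction wording, and the bracketed-ID citation convention are illustrative choices, and the actual language-model call is left out rather than tied to any particular API.

```python
from dataclasses import dataclass


@dataclass
class Product:
    product_id: str
    title: str
    warmth_rating: str
    price: float
    in_stock: bool


def build_grounded_prompt(question: str, retrieved: list[Product]) -> str:
    """Assemble the model's input: instructions, the retrieved catalog
    records, then the shopper's question."""
    context = "\n".join(
        f"[{p.product_id}] {p.title} | warmth: {p.warmth_rating} | "
        f"price: {p.price:.2f} | in stock: {'yes' if p.in_stock else 'no'}"
        for p in retrieved
    )
    return (
        "Answer the shopper using ONLY the products listed below. "
        "Cite product IDs in square brackets. If the products do not answer "
        "the question, say so rather than guessing.\n\n"
        f"Products:\n{context}\n\n"
        f"Shopper question: {question}"
    )


# Hypothetical output of the hybrid retrieval step.
retrieved = [
    Product("J1", "Expedition down parka", "-30C", 349.00, True),
    Product("J2", "Insulated hooded jacket", "-15C", 189.00, False),
]
prompt = build_grounded_prompt("Warm jacket for a winter trip to Norway?", retrieved)
print(prompt)
# `prompt` would then go to whichever LLM client the stack uses (call omitted).
```

The stock flag travels with each record, so the model is told what is actually available rather than recalling it from training.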

That changes the output. Instead of a grid, the shopper can get a short, written response: these three jackets suit a Norwegian winter, here is how they differ on warmth rating and price, this one is in stock in your size. It can compare, explain a trade-off, or generate a buyer's guide on the spot for a shopper who does not yet know the vocabulary of the category. The interaction becomes a conversation, where a follow-up like "something lighter" is understood in the context of what came before, not a fresh keyword query starting from nothing.
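Carrying context across turns can be sketched with a small session object. This is a deliberately rule-based illustration: the phrase-to-rule table and the `weight_g` field are hypothetical, and a production assistant would more likely let the language model interpret the follow-up. What it shows is the structural point: "something lighter" is resolved against the previous results, not run as a new keyword query.

```python
from dataclasses import dataclass


@dataclass
class Item:
    product_id: str
    title: str
    weight_g: int


# Hypothetical rules comparing each candidate to the product currently shown first.
REFINEMENTS = {
    "something lighter": lambda item, ref: item.weight_g < ref.weight_g,
    "something heavier": lambda item, ref: item.weight_g > ref.weight_g,
}


class Session:
    """Keeps the previous turn's candidates so a follow-up refines them."""

    def __init__(self, candidates: list[Item], page_size: int = 3):
        self.candidates = list(candidates)  # everything the first query retrieved
        self.page_size = page_size
        self.shown = self.candidates[:page_size]

    def follow_up(self, phrase: str) -> list[Item]:
        rule = REFINEMENTS.get(phrase.lower().strip())
        if rule is None or not self.shown:
            return self.shown  # unrecognised follow-up: keep the current results
        reference = self.shown[0]
        refined = [item for item in self.candidates if rule(item, reference)]
        # Lightest first, then cap at one page of results.
        self.shown = sorted(refined, key=lambda item: item.weight_g)[: self.page_size]
        return self.shown


session = Session([
    Item("J1", "Expedition parka", 1400),
    Item("J2", "Down jacket", 650),
    Item("J3", "Synthetic insulated jacket", 480),
    Item("J4", "Heavy wool coat", 1800),
])
lighter = session.follow_up("Something lighter")
print([item.title for item in lighter])
```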

The newest step adds reasoning, and this is what vendors mean by a discovery agent. It does not run one retrieval. It plans. Faced with "sustainable running shoes under 150 dollars for flat feet," it decomposes the request into separate searches (sustainability, price, running category, support for flat feet), runs them in parallel, and synthesises the results into one grounded recommendation. Algolia frames the shift as moving search from "find matching products" to understanding intent and planning the right steps. It is the difference between a librarian who points at a shelf and one who asks two questions and hands you the right book.
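The plan-then-search loop can be sketched as follows. The four sub-searches are hard-coded predicates over a hypothetical four-shoe catalog, and the tag names standing in for "sustainable" and "flat-foot support" are assumptions; in a real agent a language model would produce the plan from the shopper's sentence, and each sub-search would be a hybrid retrieval call rather than a filter. Python's `concurrent.futures.ThreadPoolExecutor` runs them in parallel, and the synthesis step here is a plain intersection.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog; prices in dollars, tags are assumed attribute values.
CATALOG = [
    {"id": "s1", "category": "running", "price": 129.0, "tags": {"recycled", "stability"}},
    {"id": "s2", "category": "running", "price": 165.0, "tags": {"recycled", "stability"}},
    {"id": "s3", "category": "running", "price": 110.0, "tags": {"neutral"}},
    {"id": "s4", "category": "hiking", "price": 140.0, "tags": {"recycled", "stability"}},
]

# Hypothetical plan for "sustainable running shoes under 150 dollars for flat feet".
PLAN = {
    "running category": lambda p: p["category"] == "running",
    "under 150": lambda p: p["price"] < 150,
    "sustainability": lambda p: "recycled" in p["tags"],
    "flat-foot support": lambda p: "stability" in p["tags"],
}


def run_subsearch(name, predicate):
    """One independent retrieval; returns its name and the matching IDs."""
    return name, {p["id"] for p in CATALOG if predicate(p)}


def discovery_agent(plan):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_subsearch, name, pred) for name, pred in plan.items()]
        results = dict(f.result() for f in futures)
    # Synthesis: keep only products that satisfy every sub-search.
    matches = set.intersection(*results.values())
    return results, matches


results, matches = discovery_agent(PLAN)
print(matches)
```

A real agent would rank the survivors and have the model write the recommendation around them; the intersection only shows how separate retrievals combine into one answer.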

This is not a roadmap promise. It is shipping. Coveo launched Conversational Product Discovery in March 2026, built on what it calls an agentic orchestration architecture, with its commerce lead making the blunt observation that shoppers do not think in keywords. Constructor, another Leader-tier vendor, ships consumer-facing AI shopping agents and reported 82 percent customer growth in its 2026 fiscal year. Bloomreach has made its shopping agent Clarity generally available and extended it across the journey, including conversations triggered straight from the search bar. Lucidworks shipped a conversational question-answering agent in May 2026 aimed at technical products, drawing answers strictly from spec sheets and manuals. The keyword box, in other words, is being deprecated by the companies that built keyword boxes.

Future and impact: grounding is the entire game

Here is the part most coverage skips. A discovery agent is a language model wired to your catalog, and a language model, left alone, makes things up. It will describe a feature a product does not have, invent a certification, state that a discontinued item is in stock, or fabricate a glowing customer sentiment. In a chatbot that is embarrassing. On a product page it is a returns problem, a trust problem, and in regulated categories a legal problem.

RAG is the defence, because grounding the model in retrieved catalog data pulls the answer back to fact. But grounding only works if the thing it grounds against is correct and complete. The new system is only as good as the product data underneath it. As Algolia puts it, agentic search does not compensate for poor indexing, missing content, or misaligned business logic. Thin descriptions, inconsistent taxonomy, missing attributes, stale inventory and pricing, all of it degrades retrieval, and when retrieval is weak the model fills the gap with invention. A discovery agent does not fix bad product data. It amplifies it, and broadcasts the result in fluent, confident prose.

That reframes what the work actually is. For two decades, "improving site search" meant tuning the engine: better synonyms, better relevance weights. In the discovery-agent era the highest-leverage work moves upstream, into the catalog itself. The structured, attribute-rich product information a product information management system holds becomes the raw material the agent reasons over. Descriptions need real detail. Attributes need to be consistent and complete. Inventory and price need to be fresh, because an agent confidently recommending a sold-out item is worse than no answer. The merchant question shifts from "which search vendor" to "is our product data good enough for a machine to reason over and speak from." For most catalogs the honest answer today is not yet.
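A first pass at that upstream work can be as simple as auditing each product record before an agent is allowed to speak from it. The sketch assumes a hypothetical record shape, a hypothetical required-attribute list for jackets, and arbitrary thresholds (40 words of description, one hour of inventory freshness); each catalog would set its own.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema and thresholds; each catalog would choose its own.
REQUIRED_ATTRIBUTES = ("material", "warmth_rating", "size_range")
MIN_DESCRIPTION_WORDS = 40
MAX_INVENTORY_AGE = timedelta(hours=1)


def audit_product(record: dict, now: datetime) -> list[str]:
    """Return the reasons a record is not yet safe for an agent to speak from."""
    problems = []
    if len(record.get("description", "").split()) < MIN_DESCRIPTION_WORDS:
        problems.append("thin description")
    attributes = record.get("attributes", {})
    missing = [name for name in REQUIRED_ATTRIBUTES if not attributes.get(name)]
    if missing:
        problems.append("missing attributes: " + ", ".join(missing))
    synced_at = record.get("inventory_synced_at")
    if synced_at is None or now - synced_at > MAX_INVENTORY_AGE:
        problems.append("stale inventory and price")
    return problems


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
record = {
    "id": "J2",
    "description": "Warm jacket.",
    "attributes": {"material": "polyester", "warmth_rating": "-15C"},
    "inventory_synced_at": now - timedelta(hours=3),
}
print(audit_product(record, now))
```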

Be clear-eyed about the limits too. Grounding reduces hallucination; it does not eliminate it. A model can still misread an ambiguous query or over-summarise a nuanced spec. Retrieval can still miss. The mature build keeps guardrails: constrain the model to retrieved content, cite the products an answer is based on so a shopper and a merchant can check it, and keep merchandising rules in force so the agent respects margin and promotion logic. The vendors shipping this know it, which is why their marketing leans so hard on the word grounded.
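The citation guardrail can be checked mechanically after generation. This sketch assumes the model was told to cite product IDs in square brackets, and that the merchant supplies the retrieved IDs, current stock, and any IDs a merchandising rule excludes; all names and data are hypothetical.

```python
import re

# Assumes the model was instructed to cite product IDs as [ID].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_answer(answer: str, retrieved_ids: set[str], in_stock: dict[str, bool],
                 blocked_ids: frozenset[str] = frozenset()) -> list[str]:
    """Return guardrail violations; an empty list means the answer may be shown."""
    cited = list(dict.fromkeys(CITATION.findall(answer)))  # de-duplicate, keep order
    issues = []
    if not cited:
        issues.append("answer cites no products")
    for product_id in cited:
        if product_id not in retrieved_ids:
            issues.append(f"{product_id} was not in the retrieved set")
        elif not in_stock.get(product_id, False):
            issues.append(f"{product_id} is out of stock")
        elif product_id in blocked_ids:
            issues.append(f"{product_id} is excluded by a merchandising rule")
    return issues


retrieved = {"J1", "J2"}
stock = {"J1": True, "J2": False}
print(check_answer("The [J1] parka is warmest; [J9] is cheaper.", retrieved, stock))
```

An answer that fails the check can be regenerated, or replaced with a plain product grid, instead of being shown to the shopper.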

There is a strategic reason to get this right now, beyond on-site conversion. The same shift is happening off the storefront entirely, as shoppers research inside ChatGPT, Gemini, Perplexity, and Amazon's Rufus. Those external assistants are discovery agents too, and they read the same product feed. A catalog clean enough to ground your own on-site agent is clean enough for an external one to understand and recommend. The data work is not two projects. It is one foundation, and it is the part a company cannot buy off a vendor's shelf.

The search box is changing job, from a string matcher that failed a third of the time into a discovery agent that reasons, recommends, and talks. That is a clear gain for the shopper. Whether it is a gain for a given merchant comes down to one unglamorous question: when the agent speaks, is it grounded in product data worth speaking from? Getting the catalog agent-ready before the agent goes live is exactly the kind of work an AI implementation partner like Perform Digital is built for.

Council summary

This post argues that on-site search is shifting from keyword matching to retrieval-grounded discovery agents, and that the deciding factor is product data quality, not vendor choice. Review confirmed every figure against primary sources: Baymard's benchmark numbers, the Google Cloud and Nosto search-abandonment research, Algolia's 1.75 trillion annual queries and 2025 Gartner Leader placement, and the 2026 launches from Coveo, Constructor, Bloomreach, and Lucidworks. We corrected the Bloomreach line, since Clarity launched before 2025 and the cited announcement only extended it. The reader takeaway is concrete: a discovery agent amplifies whatever catalog data it stands on, so the highest-leverage work moves upstream into descriptions, attributes, and live inventory before the agent ships.

Your Site Search Box Is Quietly Becoming a Discovery Agent
