PIM: The Unglamorous System AI Needs to Sell Your Catalog

Product information management was back-office plumbing nobody discussed. Then the customer became a machine that reads your data instead of your photos.

Walk into any ecommerce replatforming meeting and listen for what gets the attention. The storefront design. The checkout flow. The headless front end. The new search experience. What almost never gets a slide of its own is the system that holds the actual product data: the titles, the descriptions, the dimensions, the materials, the compatibility notes, the hundred small attributes that describe what a thing is.

That system is the product information management platform, the PIM, and for two decades it was treated as plumbing. Useful, necessary, deeply boring, owned by the operations team and ignored by everyone else.

That era is ending, and not gently. For most of ecommerce history the reader of your product data was a human who could fill in the gaps. The reader is increasingly a machine that cannot. When an AI shopping assistant decides whether to recommend your product, it does not look at your photography or feel the pull of your brand. It reads your structured data, and if that data is thin, inconsistent, or wrong, the machine quietly moves on to a competitor whose data is better. The PIM just became the system that decides whether AI can sell your catalog at all.

What a PIM actually is

Strip away the vendor language and a PIM is a single, governed source of truth for product information.

A typical retailer or brand has product data scattered everywhere. Some lives in the ERP, which knows the SKU, the cost, and the case pack. Some lives in spreadsheets a merchandiser maintains by hand. Some lives in the heads of suppliers who email updated specs when they remember to. Some lives in the ecommerce platform, entered once and never reconciled. Marketing has its own version. None of these fully agree.

A PIM collects all of it into one place, cleans it, structures it, and becomes the authoritative copy. Inside the PIM, product data from suppliers and internal systems gets enriched, validated against rules, translated, and organized into a consistent model. Then it flows out to every channel that needs it: the website, marketplaces like Amazon and eBay, retail partners, the mobile app, and now AI assistants. Change a dimension once and the corrected value propagates everywhere, instead of being fixed in six places by six people, three of whom forget.

The key word is structured. A PIM does not just store a blob of marketing copy. It holds product information as discrete, defined attributes: a field for material, a field for voltage, a field for sleeve length, a field for whether the item is dishwasher safe. Each attribute has a type and rules. Modern PIM platforms support dozens of data types and effectively unlimited attributes per product, which lets them describe a complex catalog precisely rather than vaguely.
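
To make "a type and rules" concrete, here is a minimal sketch of an attribute model with validation. It is not how any particular PIM product implements this; the `AttributeDef` class, the `MODEL` list, and the allowed voltage values are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class AttributeDef:
    """One attribute in the product model: a name, a type, and an optional rule."""
    name: str
    type_: type
    required: bool = False
    rule: Optional[Callable[[Any], bool]] = None  # extra check beyond the type


# Hypothetical attribute model for a small-appliance category.
MODEL = [
    AttributeDef("material", str, required=True),
    AttributeDef("voltage", int, rule=lambda v: v in (110, 120, 220, 230, 240)),
    AttributeDef("dishwasher_safe", bool, required=True),
]


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for attr in MODEL:
        value = record.get(attr.name)
        if value is None:
            if attr.required:
                problems.append(f"{attr.name}: missing required value")
            continue
        if not isinstance(value, attr.type_):
            problems.append(f"{attr.name}: expected {attr.type_.__name__}")
            continue
        if attr.rule is not None and not attr.rule(value):
            problems.append(f"{attr.name}: value {value!r} breaks rule")
    return problems
```

A record that stores voltage as the string "230" fails here even though a human would read it correctly, which is the difference between a typed field and a blob of copy.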

Where PIM came from

PIM is older than most of the systems around it, which makes its long stretch in the background easier to understand.

The need first appeared in the print catalog era. Through the 1980s, large retailers ran their businesses on thick mailed catalogs. The Sears "Big Book" was the emblem of the format, and it circulated for the last time in 1993, the same year the World Wide Web opened to the public. Producing a catalog of that size meant wrangling thousands of products, prices, and descriptions into one consistent document, with spreadsheets, early databases, and a lot of manual labor. That coordination problem, keeping one accurate description of every product, is what PIM was eventually built to solve.

When ecommerce arrived in the late 1990s, the problem multiplied. A retailer now had to publish product data to a website, a catalog, and a growing list of channels at once, each with its own format. The first true PIM systems formed in the late 1990s and early 2000s to do exactly that: store product data centrally and syndicate it out. Akeneo, one of the most widely used PIM products today, was founded by a team out of the open source world and released its first public beta in September 2013. Salsify, its main rival, started a year earlier in Boston. For most of their existence these tools sold on a back-office promise: fewer errors, faster updates, less manual copy-paste. Worthy, unglamorous, and easy to defer.

Why messy data was survivable, and now is not

Here is the part that has genuinely changed, and the precise mechanism matters.

For a human shopper, imperfect product data was annoying but rarely fatal. Suppose a product page lists the title and price but leaves the material blank, or describes the color as "blue" in one place and "navy" in another, or buries the weight in a paragraph instead of a field. A person copes. They look at the photo, read the reviews, infer that a wool coat is probably warm. Human intelligence papers over the gaps. Thin catalogs cost some conversions and drove some returns, but the business survived them, which is exactly why so many catalogs stayed thin.

An AI agent does none of that papering over. When it evaluates products, it reads structured fields and reasons over them. If the material field is empty, the agent does not guess from the photo. It treats the attribute as unknown. If the same product is "blue" in the feed and "navy" on the page, the agent sees a conflict and trusts the source less. If a shopper asks for a machine washable jacket under a certain weight and your weight field is blank, your jacket is simply not in the answer. It was never rejected in a comparison. It was invisible to the comparison.
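
The invisibility mechanism is easy to show in miniature. The sketch below is a toy filter, not a description of how any real shopping assistant works; the catalog, the field names `machine_washable` and `weight_g`, and the function names are hypothetical. The assumption it encodes is the one described above: an unknown attribute is treated as "cannot confirm", so the product never enters the comparison.

```python
# Hypothetical catalog: None means the field was never filled in.
CATALOG = [
    {"sku": "JKT-1", "machine_washable": True, "weight_g": 340},
    {"sku": "JKT-2", "machine_washable": True, "weight_g": None},  # weight blank
    {"sku": "JKT-3", "machine_washable": None, "weight_g": 300},   # care field blank
    {"sku": "JKT-4", "machine_washable": True, "weight_g": 900},
]


def matches(product: dict, max_weight_g: int) -> bool:
    """True only if every queried attribute is known and satisfies the query."""
    washable = product.get("machine_washable")
    weight = product.get("weight_g")
    # A literal reader does not guess: unknown values cannot satisfy a constraint.
    if washable is None or weight is None:
        return False
    return washable is True and weight <= max_weight_g


def shortlist(catalog: list[dict], max_weight_g: int) -> list[str]:
    """SKUs that can be confirmed as machine washable and under the weight limit."""
    return [p["sku"] for p in catalog if matches(p, max_weight_g)]
```

With a 500 gram limit, only JKT-1 comes back. JKT-4 was compared and lost on weight. JKT-2 and JKT-3 might have qualified, but they were never compared at all.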

Merkle put the shift directly. Catalog readiness has stopped being about whether the fields are filled and become about whether an agent can reason over a product, compare it against rivals, swap in a substitute, bundle it, and recommend it with no human in the loop to interpret anything. Crystallize framed the same point as a bottleneck: the hard part of agentic commerce is not the AI, it is the data, and bad inputs produce hallucinated features and wrong recommendations on a garbage-in, garbage-out basis.

The tolerance that human shoppers extended to messy catalogs is gone. The machine reader is literal, and literal readers expose every gap.

What "structured and rich" actually means

If thin data is the disease, the cure is not just more words. It is structure and completeness, and both are specific things.

Structure means the information lives in defined, machine-readable attributes rather than in prose. "Made from 100% merino wool, weighs 340 grams, machine washable at 30 degrees" is fine for a human reading a paragraph. A machine wants material equals merino wool, weight equals 340 grams, wash temperature equals 30 degrees, machine washable equals true, each in its own field with a known type and unit. Same facts, completely different usefulness to an agent that has to filter and compare against a query.

Rich means the attributes go beyond the bare minimum. An agent answering real shopping questions needs more than title, price, and a hero image. It needs compatibility relationships, so it knows this cartridge fits that printer. It needs use-case context, so it knows a tent is rated for three seasons. It needs variant-level specification, so the medium in black has its own accurate data, not the catalog average. Merkle's list of agent-facing requirements reads like a description of a well-modeled PIM record, because that is what it is.

Consistent means the same product says the same thing everywhere. The agent that finds your product through a feed and then cross-checks the page should see one coherent set of facts, not two. Inconsistency is not a cosmetic problem to a machine. It is a trust signal that lowers your ranking.
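
A consistency check of this kind can be sketched in a few lines. The function names and the normalization rule (trim, lowercase, collapse whitespace) are assumptions made for illustration; a real pipeline would likely need unit handling and a controlled vocabulary for values like colors.

```python
def normalize(value) -> str:
    """Reduce formatting noise so only real disagreements are reported."""
    return " ".join(str(value).strip().lower().split())


def find_conflicts(feed_record: dict, page_record: dict) -> dict:
    """Map each shared field whose values disagree to its (feed, page) pair."""
    conflicts = {}
    for field in sorted(feed_record.keys() & page_record.keys()):
        feed_value, page_value = feed_record[field], page_record[field]
        if normalize(feed_value) != normalize(page_value):
            conflicts[field] = (feed_value, page_value)
    return conflicts
```

Run against a feed that says "Blue" and a page that says "navy", this reports the color field as a conflict, while a stray trailing space or a capital letter in the material field is ignored.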

This is the work a PIM exists to do. The discipline of getting product data structured, rich, and consistent stopped being housekeeping and became the thing that determines machine visibility.

How the PIM feeds the machines: feeds and schema

A PIM does not talk to AI assistants directly. It feeds two layers that do, and understanding them shows why the PIM sits upstream of everything.

The first is the product feed. AI shopping platforms ingest catalogs as structured files, typically a flavor of CSV or JSON, with a defined schema. OpenAI's product feed for ChatGPT shopping sets required fields, and a product will not be listed at all without a price, an image, and an availability status. The feed can be refreshed as often as every fifteen minutes, which turns price and stock accuracy into a near-live obligation rather than a daily batch job. Google's Universal Commerce Protocol, launched at the National Retail Federation show on 11 January 2026 and co-developed with Shopify, extends the same pattern. A feed is only ever as good as its source. If the PIM holds clean, complete, attribute-rich records, the feed is strong. If the PIM is thin, no feed configuration rescues it.
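
A pre-flight check for that kind of gate might look like the sketch below. The field names `price`, `image_url`, and `availability` are placeholders, not the names any real feed specification uses; the only thing taken from the text above is the rule that a record missing a price, an image, or an availability status does not get listed.

```python
# Placeholder field names; substitute whatever the target feed spec requires.
REQUIRED_FIELDS = ("price", "image_url", "availability")


def split_feed(records: list[dict]) -> tuple[list[dict], list[tuple]]:
    """Separate listable records from rejected ones, noting what each reject lacks."""
    listable, rejected = [], []
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            rejected.append((record.get("sku"), missing))
        else:
            listable.append(record)
    return listable, rejected
```

Running a check like this against the PIM export before each feed refresh turns "a product will not be listed" from a silent outcome into a report someone can act on.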

The second is structured markup on the page itself, principally Schema.org Product data expressed as JSON-LD. This machine-readable layer lets crawlers and assistants extract accurate facts directly from a product page: name, description, price, availability, brand, SKU, ratings, variants. It tells a machine what a page contains without the machine interpreting the visual design. And schema, like the feed, is populated from product data. A page can only mark up the attributes that exist in the system behind it.
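
As an illustration of markup being populated from product data, here is a small sketch that builds Schema.org Product JSON-LD from a record. The input keys (`name`, `sku`, `brand`, `price`, `currency`, `in_stock`) and the function name are hypothetical; the output uses standard Schema.org types and properties. Attributes the record does not hold are simply left out, which is the point.

```python
import json


def to_json_ld(record: dict) -> str:
    """Build Schema.org Product markup from whatever the record actually holds."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record["name"],
        "sku": record["sku"],
    }
    if record.get("description"):
        doc["description"] = record["description"]
    if record.get("brand"):
        doc["brand"] = {"@type": "Brand", "name": record["brand"]}
    # An offer is only emitted when both price and currency are known.
    if record.get("price") is not None and record.get("currency"):
        offer = {
            "@type": "Offer",
            "price": f"{record['price']:.2f}",
            "priceCurrency": record["currency"],
        }
        if record.get("in_stock") is not None:
            offer["availability"] = (
                "https://schema.org/InStock"
                if record["in_stock"]
                else "https://schema.org/OutOfStock"
            )
        doc["offers"] = offer
    return json.dumps(doc, indent=2)
```

The resulting string would typically be placed in a script tag of type application/ld+json on the product page.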

So the chain runs in one direction. The PIM holds the structured product knowledge. The feed and the schema carry it to the machines. The AI assistant reads them. Get the PIM right and both downstream layers have something real to carry. Get it wrong and you are optimizing the delivery of data that was never good enough. This is why practitioners increasingly describe the catalog, not the homepage, as the asset that matters most.

Why PIM is becoming strategic, not back-office

The clearest evidence that the PIM has moved up the stack is what the vendors are now doing and saying.

In February 2026, Akeneo announced a partnership with Stripe, connecting its PIM directly to Stripe's agentic commerce tooling so catalogs become discoverable by AI agents and can share product, price, and availability in real time. Akeneo's chief executive framed the rationale plainly: AI agents can only deliver reliable shopping experiences when they have access to centralized, enriched, well-governed product information. That is a PIM company describing its product as the precondition for agentic commerce, not as a data warehouse.

The same repositioning is happening across the category. In May 2026 Salsify launched SalsifyIQ, an intelligence layer aimed at agentic commerce, including a Model Context Protocol layer so a brand's own AI agents and third-party systems can securely reach its approved, authoritative product attributes. Inriver, in its spring 2026 release, added AI enrichment agents and explicit support for agentic commerce. Informatica has been pushing the idea of an agentic PIM where AI helps extract, classify, enrich, and validate product information. The whole vendor field has stopped selling efficiency and started selling readiness.

The market backdrop supports the shift. Grand View Research valued the global PIM market at around 11.49 billion dollars in 2023 and projects it to reach roughly 32.84 billion dollars by 2030, a compound annual growth rate near 16.7 percent. That is healthy growth for what used to be treated as infrastructure.

There is also a demand-side reason this matters now rather than later. Merkle's analysis, drawing on industry projections, expects roughly a quarter of US shoppers, and a larger share of B2B buyers, to use agents to complete online transactions by 2030. Yet readiness is thin. Surveys cited across the commerce industry in early 2026 suggest roughly a third of ecommerce businesses have not started any agent-readiness work and around another 40 percent are still standardizing their product pages. The gap between where buying behavior is heading and where most catalogs sit today runs straight through the PIM.

The PIM also has neighbors worth placing. A PIM governs product data. A customer data platform governs customer data. They are different systems with different jobs, and the most capable commerce operations run both, because an agent recommending the right product to the right person needs clean data on both sides of that sentence. The PIM is one half of the data foundation that agent-mediated commerce stands on.

What this means for a real catalog

The practical takeaway is not that every merchant must buy a PIM tomorrow. A small brand with a few dozen well-described SKUs can stay agent-ready with discipline alone. The point is about where the advantage now sits.

For a catalog of any real size, product data quality has quietly become a top-tier competitive variable, on the level of price and delivery speed, because it is now a precondition for being considered by the machines that mediate purchases. The work is unglamorous: auditing which attributes are missing, deciding on a consistent taxonomy, filling in compatibility and use-case data, enforcing that a color or a material means one thing across every channel, and keeping price and stock fresh enough for a fifteen-minute feed cycle. None of it produces a screenshot an executive enjoys.
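
The first item on that list, auditing which attributes are missing, can start as something as plain as a fill-rate report. This is a minimal sketch under the assumption that the catalog can be exported as a list of records; the function name and the attribute names used with it are illustrative.

```python
def fill_rates(catalog: list[dict], attributes: list[str]) -> dict[str, float]:
    """Fraction of products with a non-empty value for each attribute."""
    total = len(catalog)
    rates = {}
    for attr in attributes:
        filled = sum(1 for p in catalog if p.get(attr) not in (None, ""))
        rates[attr] = filled / total if total else 0.0
    return rates
```

Sorting the result from lowest to highest gives a rough work queue: the attributes most often blank are the ones most likely to be keeping products out of agent answers.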

But it is the difference between a catalog an AI agent can confidently recommend and one it cannot read. The storefront was the thing customers looked at. The PIM is the thing the machine reads, and as discovery moves toward agents, the system that holds your product data has stopped being plumbing and become the foundation.

Council summary

This post argues that the PIM, long dismissed as back-office plumbing, has become the system that decides whether AI shopping agents can find, compare, and recommend a catalog at all, because machine readers are literal where human shoppers were forgiving. The council verified every named figure: the Grand View Research market sizing of 11.49 billion dollars in 2023 rising to 32.84 billion dollars by 2030 at a 16.7 percent CAGR, Akeneo's September 2013 beta and February 2026 Stripe partnership, Salsify's 2012 Boston founding and May 2026 SalsifyIQ launch, Inriver's spring 2026 release, Google's Universal Commerce Protocol launched at NRF with Shopify, and the Merkle adoption projections. One accuracy fix was made: the OpenAI feed format list was softened to CSV and JSON because sources disagree on whether TSV and XML remain supported, while the verified required fields and fifteen-minute refresh were kept. The takeaway is concrete: for any catalog of real size, structured, rich, consistent product data is now a competitive variable on the level of price and delivery, and the unglamorous work of getting it right separates a recommendable catalog from an invisible one.
