For about two years, the deal between AI companies and the people who make content was simple and one-sided. The AI companies took the content. They crawled it, trained on it, summarized it back to users, and paid nothing. Content was a free input, the way air is, and nobody writes a check for air.

That arrangement is ending, not because the AI companies had a change of heart but because the supply got organized. A market is forming around content that machines need, with prices, intermediaries, contracts, and usage meters. Original content is becoming something it has never quite been: an asset that earns a fee every time a machine uses it. This post is about how that market took shape, what it looks like in 2026, and what a content team should do about it.

How content became free in the first place

To see why a market is forming now, it helps to remember why there was no market for so long.

The web was built on an implicit bargain. A search engine crawled your pages for free, and in exchange it sent you readers. You gave up the content; you got the traffic. Both sides understood the trade, and for twenty years it held.

Generative AI broke the bargain by keeping the first half and dropping the second. An AI answer engine still crawls your content, but instead of sending a reader to your page, it digests your content into an answer and keeps the reader on its own platform. Training is even more lopsided: a model reads your entire archive once, learns from it permanently, and never crawls it again or sends anyone your way. Cloudflare put hard numbers on the imbalance for July 2025. Its crawl-to-refer data showed Google's crawlers visiting websites about 5 times for every referral sent back, OpenAI's crawler about 1,100 times per referral, and Anthropic's about 38,000 times per referral. The content was flowing one direction and almost nothing was coming back.
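To make those ratios concrete, here is a minimal sketch that inverts them into referrals per million crawls. The three ratios are the figures quoted above; the function name and the one-million baseline are illustrative choices, not anything Cloudflare publishes in this form.

```python
# Crawl-to-refer ratios quoted above: crawls made per one referral sent back.
CRAWLS_PER_REFERRAL = {"Google": 5, "OpenAI": 1_100, "Anthropic": 38_000}


def referrals_per_million_crawls(crawls_per_referral: float) -> float:
    """Invert the ratio: readers sent back for every 1,000,000 crawls."""
    if crawls_per_referral <= 0:
        raise ValueError("crawls_per_referral must be positive")
    return 1_000_000 / crawls_per_referral


for name, ratio in CRAWLS_PER_REFERRAL.items():
    returned = referrals_per_million_crawls(ratio)
    print(f"{name}: about {returned:,.0f} referrals per million crawls")
```

Read this way, the gap is not a matter of degree. One crawler returns readers by the hundred thousand, another by the dozen.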

The first crack was a single deal. In July 2023, the Associated Press became the first major publisher to license its archive to a large AI company, granting OpenAI access to its text going back to 1985. Terms were not disclosed, but one clause is worth remembering: the AP secured a first-mover safeguard, the right to reset to better terms if OpenAI later paid another publisher more. The AP understood, even then, that it was pricing a thing nobody had priced before.

The bilateral deal era, and why it stalled

What followed was two years of one-off agreements. Axel Springer signed with OpenAI in December 2023, a roughly three-year deal covering Politico, Business Insider, and its German titles Bild and Welt. In May 2024 News Corp signed what was reported as a five-year deal with OpenAI worth up to 250 million dollars in cash and credits, the largest content licensing agreement in publishing to date. Reddit licensed its forum posts to Google for a reported 60 million dollars a year, then signed a deal with OpenAI too. Dotdash Meredith, Vox Media, the Financial Times, Condé Nast, and many others lined up behind them. By 2025 the tally had reached dozens of named agreements.

This was real money, and for large publishers it mattered. But the bilateral model had a structural ceiling that became obvious fast.

It only worked if you were big. An AI company will negotiate a custom contract with The Atlantic. It will not negotiate one with a regional newspaper or an independent newsletter, because the legal and relationship cost of a bespoke deal swamps the value of any single mid-sized archive. The result was a market open to a few hundred publishers and closed to the millions of sites and creators below them.

It was also opaque and lumpy. Most deals were flat fees or fixed-term arrangements. A publisher got a number, signed, and had little visibility into how much its content was actually used or whether the price was fair. Running underneath all of it was an unresolved legal question. The New York Times sued OpenAI and Microsoft in December 2023, and in March 2025 a federal judge allowed the core copyright claims to proceed toward trial. Until courts decide whether training on unlicensed content is fair use, every bilateral deal is partly a bet on a verdict nobody has.

A few hundred custom contracts and a pending lawsuit are not a market. They are a negotiation between giants. The real development of late 2025 and 2026 is the machinery being built so everyone else can sell too.

The shift to marketplace infrastructure

A real market needs three things the bilateral era lacked: a way for small sellers to participate without bespoke contracts, a standard way to express what is and is not for sale, and a meter so payment tracks actual use. All three arrived in roughly eighteen months.

Start with the meter and the gate. In July 2025, Cloudflare, which sits in front of a large share of the web, did two things at once. It began blocking AI crawlers by default for new domains, flipping the web from opt-out to opt-in, and it launched Pay Per Crawl, which lets a site charge a crawler a set price per page and have Cloudflare collect and remit the money. A site no longer faced a binary of allow or block. It could name a price.
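The move from a binary to a price can be sketched as a small decision function. This is an illustration under stated assumptions, not Cloudflare's implementation: the names `gate_crawl` and `CrawlDecision` are hypothetical, and it assumes the gate answers an unpaid request with the standard HTTP 402 Payment Required status.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class CrawlDecision:
    status: int       # HTTP-style status code returned to the crawler
    charged: Decimal  # amount billed for this single request


def gate_crawl(
    site_price: Optional[Decimal],
    max_price_offered: Optional[Decimal],
    blocked: bool = False,
) -> CrawlDecision:
    """Hypothetical per-request gate: block, allow for free, or charge."""
    if blocked:
        # The old binary, block side: refuse outright.
        return CrawlDecision(status=403, charged=Decimal("0"))
    if site_price is None:
        # The old binary, allow side: no price set, so the crawl is free.
        return CrawlDecision(status=200, charged=Decimal("0"))
    if max_price_offered is not None and max_price_offered >= site_price:
        # The crawler agreed to pay at least the asking price.
        return CrawlDecision(status=200, charged=site_price)
    # A price is set and the crawler has not agreed to it: ask for payment.
    return CrawlDecision(status=402, charged=Decimal("0"))


print(gate_crawl(Decimal("0.01"), Decimal("0.02")))
print(gate_crawl(Decimal("0.01"), None))
```

The third branch is the new one. The first two are what every site already had.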

Then came the standard. In September 2025, a group led by RSS co-creator Eckart Walther and former Ask.com chief Doug Leeds introduced Really Simple Licensing, or RSL, an open standard that lets any publisher attach machine-readable licensing terms to a site. RSL can specify free use, subscription, pay-per-crawl, or pay-per-inference. Under pay-per-inference, the publisher gets paid whenever an AI actually uses the content in an answer rather than merely fetching it. The 1.0 spec, finalized in December 2025, added usage categories that let a publisher allow ordinary search indexing while charging for AI use. Reddit, Yahoo, Medium, O'Reilly Media, and Ziff Davis were among the early backers. The honest caveat: as of early 2026, no major model developer had publicly committed to honor RSL, so the standard is, for now, a stake in the ground rather than an enforced contract.

The third piece is the intermediary, the broker that aggregates supply and matches it to demand. This is where the marketplace takes form, and three names matter.

ProRata, founded by advertising veteran Bill Gross, built its model around attribution. It runs an answer engine, Gist, trained only on licensed content, and its patented systems measure how much each source contributed to a given answer so compensation can be split proportionally. The economics are blunt: ProRata keeps 50 percent of the revenue an answer generates and distributes the other 50 percent across the publishers whose content was used, in proportion to use. By mid-2025 it had signed more than 500 publications, and it raised a 40 million dollar Series B in September 2025 to launch Gist Answers, a widget that lets any publisher put a licensed AI search box on its own site and earn from it. The pitch: a small publisher joins one platform instead of negotiating with every AI company alive.
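The split described above is simple arithmetic, and a short sketch makes it tangible. Only the 50 percent share and the proportional rule come from the description above. The function name, the revenue figure, the publisher names, and the contribution weights are invented for illustration, and how ProRata actually measures contribution is not modeled here.

```python
def split_answer_revenue(
    revenue: float,
    contributions: dict[str, float],
    publisher_share: float = 0.50,
) -> dict[str, float]:
    """Divide the publisher share of one answer's revenue in proportion to use."""
    if not 0 <= publisher_share <= 1:
        raise ValueError("publisher_share must be between 0 and 1")
    total = sum(contributions.values())
    if total <= 0:
        return {}  # no measured contribution, nothing to distribute
    pool = revenue * publisher_share  # the platform keeps the remainder
    return {source: pool * weight / total for source, weight in contributions.items()}


# Hypothetical answer that earned 10 cents, drawing on three sources.
payouts = split_answer_revenue(
    0.10,
    {"regional-daily": 6, "trade-journal": 3, "newsletter": 1},
)
for source, amount in payouts.items():
    print(f"{source}: ${amount:.4f}")
```

The amounts per answer are tiny by design. The model only adds up if the answers are numerous.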

Perplexity took a subscriber-revenue route. In August 2025 it launched Comet Plus, a 5-dollar-a-month tier, and committed to paying publishers 80 percent of the revenue, keeping 20 percent for compute. It seeded the pool with 42.5 million dollars and pays out across three tracked categories: human visits to publisher content, citations in answers, and actions its assistant takes using publisher material. That third category is a quiet signal of where this is heading. It is payment for an agent's use, not a person's.

Then the largest platforms moved. On February 3, 2026, Microsoft Advertising launched the Publisher Content Marketplace, built with the Associated Press, Business Insider, Condé Nast, Hearst, USA Today, and Vox Media. Publishers list premium content and set their own usage terms; AI builders license it to ground specific answers, with usage-based reporting so a publisher sees how its content performs and is paid by delivered value rather than a flat fee. Within a week, reports surfaced that Amazon was circulating its own plans for a content marketplace tied to AWS. When Microsoft and Amazon both build the same infrastructure in the same month, the category has stopped being speculative.

Per-use and the RAG revenue model

The most important shift inside all of this is not who is building marketplaces. It is the change in what publishers are paid for.

The bilateral era paid for training. An AI company wanted your archive to make its model smarter, paid once, and trained. That payment is inherently one-time, because a model only needs to learn a fact once.

The marketplace era is increasingly paying for retrieval, and retrieval is recurring. Modern AI answers do not rely only on what a model memorized in training. They use retrieval-augmented generation, or RAG: when you ask a question, the system fetches relevant, current source material and grounds its answer in that content. Microsoft's marketplace exists precisely because grounding an answer in licensed premium content beats relying on training memory alone.

The economic consequence is large. Every RAG answer that pulls in your content is a fresh use. If a publisher is paid per use, a single well-written explainer can be retrieved and cited thousands of times, each retrieval carrying a small payment. Training money is a sale: paid once, gone. Per-use money is a license: paid every time, indefinitely. RSL's pay-per-inference category, Perplexity's citation and agent-action tracking, and Microsoft's usage-based reporting are all the same idea in different clothes. Content stops being a one-time asset you sell and becomes a metered asset you rent out, answer by answer.
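The sale-versus-license distinction comes down to a break-even count: how many paid retrievals it takes for per-use income to match a one-time fee. The sketch below works that out. Every number in it is hypothetical, chosen only to show the arithmetic, and the function names are illustrative.

```python
from decimal import Decimal, ROUND_CEILING


def break_even_uses(one_time_fee: Decimal, fee_per_use: Decimal) -> int:
    """Smallest number of paid uses whose total matches a one-time fee."""
    if fee_per_use <= 0:
        raise ValueError("fee_per_use must be positive")
    uses = (one_time_fee / fee_per_use).to_integral_value(rounding=ROUND_CEILING)
    return int(uses)


def cumulative_per_use_revenue(
    fee_per_use: Decimal, uses_per_month: int, months: int
) -> Decimal:
    """Total earned if the content keeps being retrieved at a steady rate."""
    return fee_per_use * uses_per_month * months


# Hypothetical: a 500 dollar training payment versus 0.2 cents per retrieval.
one_time = Decimal("500")
per_use = Decimal("0.002")

print("Uses to match the one-time fee:", break_even_uses(one_time, per_use))
print("Per-use revenue after 12 months:", cumulative_per_use_revenue(per_use, 10_000, 12))
```

The one-time fee is fixed on the day it is signed. The per-use total has no ceiling, but it also has no floor: content nobody retrieves earns nothing.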

This also reshapes what content commands a price. Training rewarded sheer volume, a big archive to bulk up a model. Retrieval rewards freshness, because a question about today needs today's content and a model's training data is months stale. It rewards specialized depth, because an AI answering a medical, legal, or financial question needs authoritative, accurate sourcing it cannot safely improvise. It rewards proprietary data that exists nowhere else, so it cannot be substituted. A general explainer of a well-known topic, the kind the open web already has ten thousand versions of, has little retrieval value, because a model can reproduce it unaided. The content that earns here is the content a machine cannot generate on its own and needs to fetch.

What this means for content strategy, honestly

It is tempting to read all of this as rescue, a new revenue stream arriving just as search traffic collapses. Be careful with that reading. The honest picture has hard edges.

The money is real but, for most publishers, modest. By early 2026 some publishers were reporting AI licensing as a notable line in earnings, and USA Today's parent reported meaningful revenue from such deals, helped by agreements with Meta and Microsoft. But analysts are openly split, and a credible skeptical case holds that licensing income will stay small for the typical publisher, swamped by the traffic it is losing. Publishers themselves are unimpressed. In a late 2025 Digiday scorecard that rated platforms on willingness to pay, transparency, traffic impact, and crawler behavior, no AI company earned a great grade. Microsoft led at 8 out of 10, praised for collaboration. Google sat near the bottom at 2 out of 10, marked down for the traffic its AI Overviews siphon away and opaque economics. Perplexity tied it at 2 out of 10 over aggressive scraping and thin payouts. Marketplace payouts also depend on platform adoption that is still early. A 50 or 80 percent share of a small revenue pool is still small.

There is also a real tension worth naming. A marketplace is run by a platform, and the platform sets the terms, the rates, and the reporting you are allowed to see. Publishers spent twenty years dependent on Google for traffic and learned what that dependence costs. A content economy intermediated by Microsoft, Amazon, Cloudflare, and ProRata is a different set of gatekeepers, not the absence of gatekeepers. Independence here means using open standards like RSL and avoiding a single channel.

What a content team should take from this is concrete. First, treat your content as licensable inventory and decide its terms deliberately rather than by default. An unconfigured site is still free to crawl. Setting an AI usage policy, through robots directives, RSL tags, or an infrastructure provider, is now basic business hygiene, not an advanced move. Second, recognize that the value has moved to what is hard to retrieve elsewhere. The same proprietary data, first-hand testing, and genuine expertise that earn citations in AI answers are exactly what command a price in a licensing market. Third, watch the agent line. Perplexity already pays for an assistant's use of content, distinct from a human's. As autonomous agents do more of the consuming, content will increasingly be priced for machine readers, and structuring it to be cleanly retrievable, well-labeled, current, and chunked becomes a revenue decision rather than only a craft one.
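For the first step, the simplest lever is robots directives, and it helps to check what a policy actually says before shipping it. The sketch below uses Python's standard `urllib.robotparser` to test a sample policy. The robots.txt content, the URL, and the helper name `crawler_access` are illustrative assumptions. GPTBot and ClaudeBot are used as example AI crawler tokens. Keep in mind that robots.txt is advisory: it states a policy, it does not enforce one.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: refuse two AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report which user agents the given robots.txt permits to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://example.com/guides/pricing",
)
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Running a check like this against the live file is a quick way to confirm the site's terms are the ones the team chose, rather than the defaults it inherited.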

The larger shift is a change in what content fundamentally is. For two decades it was bait: you published it to attract a human who might see an ad or buy a product, and the content itself was a cost. In the market now forming, the content is the product. A machine needs it, a meter counts it, and a fee follows. That does not make the old model disappear, but it adds a second way content earns, and for the first time the value of a well-made article is no longer entirely dependent on whether a human ever clicks it.

Council summary

This post argues that content licensing has moved from a handful of bespoke deals between giants to a real market with meters, brokers, and standards, and that the decisive shift is from one-time training payments to recurring per-use payments for retrieval. Every named deal, marketplace, and figure was checked against primary sources: the AP and OpenAI agreement of July 2023, the News Corp deal worth up to 250 million dollars over five years, ProRata's 500-plus publications and 40 million dollar Series B, Perplexity's Comet Plus and its 42.5 million dollar pool, the RSL standard, and Microsoft's marketplace launched February 3, 2026. The review corrected the Digiday scorecard, which rates platforms on an aggregate of willingness to pay, transparency, traffic impact, and crawler behavior rather than transparency alone, and fixed the stated reasons behind Google's and Perplexity's low scores. The reader takeaway is concrete: set an AI usage policy deliberately, because an unconfigured site is still free, and invest in the proprietary, fresh, specialized content a model cannot generate on its own, since that is what both earns citations and commands a licensing price.

Content Licensing Marketplace: Your Words Are Now Inventory
