Ask a marketing team which AI image generator they use and the answer used to be one word. Midjourney. Or DALL-E. Or Firefly. One subscription, one prompt box, one place to go.
Sit with a team that ships real campaign work in 2026 and the answer is different. They will name three tools, sometimes four, and they will tell you exactly which job each one does. One generates the hero shot. A second one is the only tool allowed near anything with a deadline and a legal sign-off. A third gets called in the moment the image needs words on it, because the others cannot spell.
This is not indecision and it is not tool-hoarding. It is the rational response to a market that broke into pieces. The thing people still call "AI image generation" is actually four separate problems that no single model solves at once. This post is about what those four problems are, why the tradeoffs between them are real and unlikely to collapse soon, and how a marketing team should structure an image workflow that uses the right tool for each one.
How the field split
For the first stretch of generative imaging, picking a model really was a single decision, because the models were close enough that quality was the only axis that mattered.
The modern era starts in 2022. OpenAI's DALL-E 2 arrived that April, Midjourney opened its Discord beta in July, and Stable Diffusion shipped as an open model in August. For about two years after that, the public conversation was a straight quality race. Which one makes the most convincing photo. Which one renders hands without extra fingers. People compared output grids and ranked them, and the ranking was the whole story.
Two things ended that. The first was lawsuits. In January 2023 Getty Images sued Stability AI, alleging its copyrighted photos had been scraped to train Stable Diffusion. In June 2025 Disney, NBCUniversal, and DreamWorks jointly sued Midjourney, calling it a "bottomless pit of plagiarism" and pointing at generated images of Darth Vader, Shrek, and Homer Simpson. Suddenly the training data behind a model was not a technical detail. It was a question a brand's legal team would ask before approving a campaign.
The second was specialization. As more models entered the market, they stopped converging and started diverging. One lab optimized hard for photorealism. Another built its whole pitch around legible text inside the image. A third trained only on content it had licensed, and sold safety rather than flash. By 2026 the models are not racing down one track anymore. They are running different races. That is why the single-tool answer stopped being correct.
The four problems that pull in different directions
A marketing image has to clear four bars at once, and the awkward fact is that each bar has a different best-in-class model. Understanding why they conflict is the whole point.
Commercial safety. Most image models were trained on enormous sets of images scraped from the open web, copyrighted material included. When you generate from one of those models and put the result in a paid campaign, the legal exposure if an output resembles someone's protected work sits with you, the user. There is one structural exception. Adobe Firefly is trained on Adobe Stock content, openly licensed material, and public-domain imagery, and Adobe pairs that with contractual IP indemnification: on qualifying paid plans, Adobe says it will defend you against infringement claims arising from eligible Firefly output and cover damages. That promise is not unlimited. It only applies to content generated inside Adobe's own apps, not through third-party API wrappers, and the per-asset cover is capped, with reporting putting it at up to 10,000 dollars per asset on volume plans and far higher on enterprise agreements. Firefly's record is not spotless either: MarTech reported in 2024 that a slice of its training set included AI-generated images from other models. But the basic split holds. Firefly is the model built to be safe to ship. The others ask you to carry the risk yourself.
Raw quality. This is the original axis and it has its own winner. Midjourney, whose V7 model was released in April 2025 by the independent lab founded by David Holz, is still the model practitioners reach for when the image has to look genuinely beautiful: lighting, composition, texture, a sense of aesthetic taste rather than competent rendering. Black Forest Labs, the startup formed by the researchers behind the original Stable Diffusion, competes hard here too. Its Flux models lead on photorealistic detail and skin texture, and the company raised 300 million dollars in December 2025 at a 3.25 billion dollar valuation, which tells you how much money believes image quality is still a distinct prize. Neither Midjourney nor Flux offers Firefly-style indemnification. The best-looking image and the safest-to-ship image come from different companies.
Text inside the image. For years, putting words into an AI image meant accepting garbled, melting, almost-letters. Then the autoregressive models changed it. OpenAI's GPT image generation, built into ChatGPT and exposed to developers as the API model gpt-image-1 in April 2025, was a clear jump in rendering readable text. Google then pushed past it. Nano Banana Pro, the image model built on Gemini 3 Pro and released in November 2025, is widely judged the best available tool for generating correct, legible, stylized text directly inside an image, including across multiple languages. If your asset is a social post with a headline, a promo graphic with a price, or an infographic, this is a hard requirement, and the model that meets it is not the model that wins on artistic quality.
Editing control and consistency. Generating one striking image is the easy part. A campaign needs the same character, the same product, and the same style across twenty images, plus the ability to change one element without regenerating the whole frame. This is an editing problem, not a generation problem. Google's Nano Banana built its reputation on holding a person's or product's identity steady across edits. Tools like Recraft, the design-focused platform from a team led by former Yandex machine-learning scientist Anna Veronika Dorogush, add brand-color conditioning, layout control, and editable vector output, which raster generators do not give you. And the most important editor in the workflow is still the oldest one: Adobe Photoshop, where a human composites, corrects, and finishes. Consistency lives in a different part of the toolchain from the first generation.
Line those four up and the conflict is obvious. The safe model is not the prettiest. The prettiest cannot spell. The one that can spell is not built for brand-consistent editing. No single subscription clears all four bars, and that is not a temporary gap waiting for one model to close it.
Why one model will not win them all soon
The natural hope is that this fragmentation is just an awkward phase, and that some 2027 model will be best at everything so teams can go back to one tool. It is worth being honest about why that is unlikely.
The deepest reason is the training data, and it is a business decision, not an engineering one. Firefly is commercially safe because Adobe chose to train only on content it had the rights to use. That choice has a cost: a fully licensed dataset is smaller and narrower than the entire scraped web, which puts a ceiling on how varied and how cutting-edge the output can be. Midjourney and Flux push raw quality precisely because they trained on far more, with the legal exposure that comes with it. A model cannot have the safety of a small licensed set and the quality ceiling of the entire internet at the same time. The Disney and Getty cases make the licensed path more attractive over time, but they do not erase the tradeoff. They sharpen it.
The other reasons are structural. These models come from companies with different DNA. Adobe sells to enterprises and cares about indemnification and governance. OpenAI and Google are building general systems where images are one capability among many. Midjourney is an independent lab optimizing for a creative community. Recraft is built for designers. Each is genuinely good at what its business is built around, and a tool shaped by an enterprise legal mindset will not out-design a tool shaped by an artistic one. On top of that the field moves fast enough that today's text-rendering leader is a release or two from being overtaken, which is itself an argument against betting everything on one name. Fragmentation is the steady state, not the glitch.
How a marketing team should actually run this
If three or four tools is the rational setup, the job is to make a multi-tool workflow that is organized rather than chaotic. A few principles do most of the work.
Route by job, not by habit. Decide in advance which tool owns which task and write it down. A workable default: Firefly for anything client-facing or legally sensitive where indemnification matters; Midjourney or Flux for hero images and concept work where look is everything and the asset will be reviewed; a text-strong model like Nano Banana Pro for any asset with words baked into it; Photoshop, plus an editing-focused tool, for compositing, consistency, and the final pass. The point is that the decision is made once, by policy, not re-argued every time.
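One way to make "decided once, by policy" concrete is to write the routing default down as a small function instead of leaving it in people's heads. The sketch below is only an illustration: the `ImageJob` fields, the tool labels, and especially the priority order when a job matches more than one rule (legal sensitivity first, then text, then consistency) are assumptions, not something any vendor prescribes. A team would swap in its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageJob:
    """Hypothetical description of one image request."""
    name: str
    legally_sensitive: bool = False   # client-facing, paid media, regulated
    has_text: bool = False            # words baked into the image
    needs_consistency: bool = False   # same product/character across assets


def route(job: ImageJob) -> str:
    """Return the tool that owns this job under the written policy.

    The order of the checks is an assumed tie-break: a job that is both
    legally sensitive and text-heavy goes to the indemnified path first.
    """
    if job.legally_sensitive:
        return "firefly"
    if job.has_text:
        return "nano-banana-pro"
    if job.needs_consistency:
        return "photoshop+editor"
    return "midjourney-or-flux"


if __name__ == "__main__":
    jobs = [
        ImageJob("concept hero shot"),
        ImageJob("promo graphic with price", has_text=True),
        ImageJob("paid social ad", legally_sensitive=True, has_text=True),
    ]
    for job in jobs:
        print(f"{job.name}: {route(job)}")
```

The value is less in the code than in the fact that the tie-breaks are explicit and reviewable, so the argument happens once when the policy changes rather than on every asset.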
Make commercial safety a checkpoint, not an afterthought. The most expensive mistake is generating a campaign in whichever tool was open, then discovering at sign-off that nothing carries indemnification. Treat the safety question the way you treat brand and legal review: a gate the work passes through. For paid media and anything with regulatory exposure, the licensed-and-indemnified path is the default, and a more permissive tool needs a reason to be used instead.
Know what indemnification does not cover. Adobe's protection applies to output generated inside Adobe's apps on qualifying plans, not to the free tier, not to beta features, and not through third-party API integrations. If your team reaches Firefly through some other platform, the legal shield may not travel with it. Read the terms of the path you actually use, not the headline.
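The conditions above can be turned into a sign-off checklist that runs before an asset goes to paid media. This is a minimal sketch built only from the four conditions named in this post. The `GenerationPath` fields and the function name are hypothetical, and the real test is always the text of the plan terms your team is actually on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationPath:
    """Hypothetical record of how an asset was generated."""
    inside_adobe_app: bool          # made in Adobe's own apps
    qualifying_paid_plan: bool      # not the free tier
    beta_feature: bool              # produced with a beta feature
    third_party_integration: bool   # reached via another platform's wrapper


def indemnification_gaps(path: GenerationPath) -> list[str]:
    """List the reasons the indemnification may not apply (empty if none found)."""
    gaps = []
    if not path.inside_adobe_app:
        gaps.append("not generated inside Adobe's own apps")
    if not path.qualifying_paid_plan:
        gaps.append("plan is not a qualifying paid plan")
    if path.beta_feature:
        gaps.append("generated with a beta feature")
    if path.third_party_integration:
        gaps.append("generated through a third-party integration")
    return gaps


if __name__ == "__main__":
    path = GenerationPath(
        inside_adobe_app=False,
        qualifying_paid_plan=True,
        beta_feature=False,
        third_party_integration=True,
    )
    for gap in indemnification_gaps(path):
        print("Check before sign-off:", gap)
```

An empty list does not prove coverage. It only means none of the known exclusions was tripped, which is the point at which legal review takes over.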
Keep the workflow tool-agnostic. Your brand rules, your style references, your prompt intent, and your review steps should live in your process and your documentation, not inside one vendor's app. The models will keep changing. A team that has written down what a good image needs can swap whichever model leads a category this quarter without rebuilding anything. A team whose entire method lives inside one subscription has to start over every time the market moves.
Budget for a stack, not a seat. Several specialist subscriptions cost more than one all-purpose tool, and credit-based pricing makes the real spend hard to see from the sticker price. But the alternative is shipping work that is weaker than a competitor's or legally exposed. The honest line item is a small portfolio of tools, priced by the output you actually finish, not by the cheapest monthly plan.
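"Priced by the output you actually finish" is simple arithmetic once the numbers are in one place: add the subscription fee and the credit spend for a tool, then divide by the assets from that tool that actually shipped. The sketch below assumes that definition. The tool names and every figure in it are made-up placeholders, not real prices.

```python
def cost_per_finished_asset(monthly_fee: float,
                            credit_spend: float,
                            finished_assets: int) -> float:
    """Total monthly spend on one tool divided by assets that shipped."""
    if finished_assets <= 0:
        raise ValueError("finished_assets must be positive")
    return (monthly_fee + credit_spend) / finished_assets


if __name__ == "__main__":
    # Placeholder figures for illustration only: (fee, credits, shipped assets).
    stack = {
        "tool_a": (60.0, 40.0, 25),
        "tool_b": (30.0, 0.0, 4),
    }
    for tool, (fee, credits, shipped) in stack.items():
        cost = cost_per_finished_asset(fee, credits, shipped)
        print(f"{tool}: {cost:.2f} per finished asset")
```

In this invented example the cheaper subscription is the more expensive tool per shipped asset, which is exactly the effect the sticker price hides.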
The shift this really reflects
Step back and the three-tool workflow is a sign of something larger. AI image generation has stopped being a single product category and become a supply chain. The generation model is one supplier. The text-rendering model is another. The editor is a third. The value a marketing team adds is no longer knowing the prompt tricks for one tool. It is knowing which supplier to send each job to, where the commercial risk sits, and how to assemble the pieces into work that looks deliberate.
That is the durable skill, because it survives the next model release and the one after that. The teams that look slow in two years will be the ones still hunting for the single best generator. The teams that look sharp will be the ones who accepted early that there is no such thing, built a routed workflow around the real tradeoffs, and treated safety, quality, text, and control as four jobs for four tools rather than one wish for one.
Council summary
This post argues that AI image generation has fractured into four jobs that no single model wins: commercial safety, raw quality, in-image text, and editing consistency. The council verified every load-bearing claim: Midjourney V7 (April 2025), the June 2025 Disney, NBCUniversal, and DreamWorks suit against Midjourney, Black Forest Labs raising 300 million dollars at a 3.25 billion dollar valuation in December 2025, gpt-image-1 (April 2025), Nano Banana Pro on Gemini 3 Pro (November 2025), the November 2025 Getty v Stability AI UK ruling, and Adobe Firefly's training data and indemnification terms, including the per-asset cap and its limits. The key correction kept the indemnification language precise: Adobe's defense applies only to output made inside Adobe's own apps on qualifying paid plans, not free tiers or third-party API wrappers. The reader takeaway is concrete: route each image job to the tool built for it, gate commercial safety at sign-off, and keep your brand rules outside any one vendor's app.