A team has a model that answers customer questions well enough, except it does not know the company's return policy, the new pricing tiers, or the name of the product that shipped last month. Someone suggests fine-tuning. Take a few thousand support transcripts, train the model on them, and now the model knows the business. It sounds obvious. It is also the single most expensive misconception in applied AI, because fine-tuning is genuinely good at many things, and teaching a model new facts is not one of them.
The confusion is understandable. Fine-tuning means showing a pretrained model more examples and continuing to train, so it feels like teaching. But what fine-tuning actually moves and what you want it to move are two different quantities. Get that wrong and you do not just fail to add the facts. You can quietly damage the model you started with.
What fine-tuning actually changes
A pretrained language model is a very large set of numbers, the weights, arranged so that a stream of text produces a sensible next token. Those weights already encode an enormous amount: grammar, reasoning patterns, world knowledge, coding ability, the lot. That encoding came from pretraining, where the model saw a large slice of the internet, often more than a trillion tokens, and the weights settled into their current shape.
Fine-tuning continues that process on a much smaller, curated dataset. You run examples through the model, measure how wrong it is, and nudge the weights with gradient descent. The key word is nudge. A fine-tuning run might touch a few thousand or a few million examples. Pretraining touched trillions of tokens. You are adjusting a structure that was set by something thousands of times larger.
That scale gap is the whole story. A small nudge is excellent at adjusting how the model expresses what it already contains: the format it answers in, the tone it uses, the structure it reaches for by default. It is poor at inserting a genuinely new fact, because a fact has to be encoded somewhere specific in the weights, and a handful of gradient steps rarely carves a clean, retrievable slot for it without disturbing what sits nearby.
Why it is bad at facts
This is not a hunch. It is one of the better-studied results in the field. A 2024 paper from Google researchers, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, ran a controlled experiment: take fine-tuning examples, vary how many of them contain facts the model did not already know, and watch what happens. Two findings stand out. First, the model learns the unfamiliar examples much more slowly than the familiar ones, which tells you the architecture resists this kind of update. Second, and worse, as those new-knowledge examples are finally absorbed, they linearly increase the model's tendency to hallucinate. The paper's conclusion is blunt: models acquire factual knowledge through pretraining, and fine-tuning mostly teaches them to use that knowledge more efficiently.
The mechanism behind the hallucinations is worth understanding. When you fine-tune a model on a fact it does not hold, you are not just adding that fact. You are teaching it a behavior: produce a confident, specific answer in this shape. The model generalizes that behavior. Later, faced with a question it genuinely cannot answer, it reaches for the same confident shape and fills it with invention. You wanted to teach one fact. You taught a habit of fabrication. Follow-up research on factual knowledge extraction found that tuning on lesser-known facts is especially damaging, pushing the model to ignore the actual subject of a question and emit a generic plausible-sounding response instead.
Then there is catastrophic forgetting. Because fine-tuning moves the shared weights, training hard on a narrow new task degrades capabilities the model used to have. An empirical study of catastrophic forgetting during continual fine-tuning found the effect gets worse as model scale grows, and a 2024 EMNLP paper revisiting the problem confirmed it is a routine cost of tuning, not an edge case. Push too hard in one direction and the model gets worse at reasoning, at other domains, sometimes at basic instruction-following.
The most striking warning is emergent misalignment, published in Nature in 2025. Researchers fine-tuned GPT-4o on a narrow task, writing insecure code, with nothing else in the data. The result was a model that turned broadly misaligned across unrelated topics, giving malicious advice and expressing hostile views about humans. A narrow nudge produced a wide, unintended personality shift. That is the risk surface of fine-tuning in one experiment: you are editing a shared structure, and the edits do not stay where you put them.
What it is genuinely good at
None of this makes fine-tuning a bad tool. It makes it a precise one. Fine-tuning is the right answer when the thing you want to change is behavior, not knowledge.
A consistent output format is the clearest case. If you need the model to return the same JSON schema every time, with the same field names and no conversational preamble, fine-tuning bakes that habit into the weights far more reliably than a prompt that pleads for it. The model stops treating the format as a request and starts treating it as how it answers.
A domain tone is the second. A model can be taught the cadence of legal drafting, the restraint of clinical notes, the house style of a brand, the hedging a compliance team requires. Google's writeup of how hundreds of organizations actually tune Gemini puts style, tone, and format adherence near the top of real use cases, exactly because prompts are an unreliable way to hold them.
A narrow skill is the third. Classification, extraction, a specific transformation done thousands of times a day: fine-tuning on a few hundred to a few thousand clean examples can lift accuracy past what prompting reaches, and it lets you run a smaller, cheaper model for the task. The model is not learning new facts. It is learning a sharper version of a competence it already had.
OpenAI's own model optimization guidance draws the same line. It steers teams to fine-tune for tone, style, and reliably formatted output, and steers them to retrieval when the model needs information it does not hold. The split is not vendor preference. It follows from what each technique can physically do.
The mirror image of fine-tuning is retrieval-augmented generation, which leaves the weights untouched and instead fetches relevant documents at query time and places them in the context window. Knowledge that changes, knowledge you must cite, knowledge that has to be correct as of this morning: that belongs in retrieval, where you can update an index in seconds and inspect exactly which source drove an answer. A much-cited Microsoft study comparing knowledge injection methods found retrieval beat unsupervised fine-tuning for getting facts into a model, including facts the model had seen before. The honest rule: if you need the model to know something, retrieve it; if you need the model to act a certain way, fine-tune it. The decision between the two, alongside plain prompting, deserves its own treatment, and we cover it in RAG versus fine-tuning versus prompting.
Full fine-tuning versus the efficient kind
Say you have decided fine-tuning is right. There are two ways to do it, and the gap between them is large.
Full fine-tuning updates every weight in the model. For a modern model that is billions of parameters, all in motion. It works, but the bill is steep: you need optimizer state and gradients for every parameter, which can push memory past 780 gigabytes for a 65-billion-parameter model, and every fine-tuned variant is a complete copy of the model to store and serve. Train for three departments and you are hosting three full models.
Parameter-efficient fine-tuning, or PEFT, takes a different route. Freeze the original weights entirely. Train a small set of new parameters bolted alongside them. The pretrained knowledge is held still, and only a thin adjustment layer learns. Hugging Face's PEFT library collects the family of these methods, and one of them has become the default for almost everyone.
LoRA and QLoRA in plain language
LoRA, low-rank adaptation, comes from a 2021 Microsoft paper. Its insight is quiet and powerful: when you fine-tune a model, the change you make to a big weight matrix has low rank. It is simple enough to be captured by two much smaller matrices multiplied together. So LoRA never touches the original matrix. It freezes it and learns the small pair beside it. At inference the two are combined, so there is no extra latency, unlike older adapter methods that added compute at every step.
The numbers are why LoRA took over. The original paper reported, against GPT-3 at 175 billion parameters, up to 10,000 times fewer trainable parameters and a roughly threefold cut in GPU memory, while matching the quality of full fine-tuning. A practical consequence: a LoRA adapter is tiny, often a few megabytes to a few hundred, against tens of gigabytes for a full model. You can keep one frozen base model and swap in a different small adapter per task, per customer, per tone.
QLoRA, from a 2023 University of Washington paper, pushes the same idea further. It quantizes the frozen base model down to 4-bit precision, shrinking its memory footprint hard, while training the LoRA adapter on top in higher precision. It adds a 4-bit number format designed for how weights are actually distributed, plus a couple of memory tricks. The headline result, in the paper's own framing, was fine-tuning a 65-billion-parameter model on a single 48GB GPU while keeping the quality of full 16-bit tuning. That moved serious fine-tuning from a data-center job to something a small team can run. The tradeoff is honest: practitioner testing, including Sebastian Raschka's experiments, found QLoRA cuts memory by roughly a third while adding roughly 40 percent to training time. You trade clock for hardware access.
Why does LoRA cover the large majority of real needs? Because most fine-tuning in production is the behavior work above: a format, a tone, a narrow skill. For those tasks LoRA matches full fine-tuning closely, and it carries a real bonus. The Databricks study LoRA Learns Less and Forgets Less found LoRA changes the model less aggressively, which means it forgets less of the original model's ability. The frozen base is a safety rail against catastrophic forgetting. The same study is candid about the limit: when you are pushing substantial new capability or domain depth into a model, low-rank settings can fall short of full fine-tuning, and the gap does not always close even at high rank. That case is real. It is also the minority. Most teams that reach for full fine-tuning would have been served by a LoRA adapter, the same way most teams that reach for fine-tuning would have been served by better retrieval.
A rule of thumb that holds
Before you fine-tune anything, ask one question: am I trying to change what the model knows, or how it behaves?
If the answer is knowledge, facts, policies, documents, anything that has to be current and citable, do not fine-tune. Use retrieval. Fine-tuning will be slower to learn it, will make the model more prone to invention, and gives you no way to update a fact or trace where an answer came from.
If the answer is behavior, a fixed output format, a domain tone, a structured-output habit, a sharpened narrow skill, fine-tuning is the right tool. Start with a parameter-efficient method, almost certainly LoRA or QLoRA. Reach for full fine-tuning only when you have evidence that a serious capability gain is needed and a LoRA adapter has measurably failed to deliver it.
And keep the order right. Try a better prompt first. If the model needs facts, add retrieval. Fine-tune when prompting and retrieval have done all they can and a behavior still will not hold. Fine-tuning is the precision instrument you pick up last, not the hammer you reach for first. It earns its place by shaping how a model works, and it earns its bad reputation only when someone asks it to remember.
For the bigger picture of how a model's behavior is shaped in the first place, before any team-level fine-tuning, see how models learn to behave.
Council summary
This post argues that fine-tuning is a tool for changing how a model behaves, not what it knows, and that the common instinct to fine-tune facts into a model is both expensive and actively harmful. It backs the claim with controlled research: tuning on unfamiliar facts is learned slowly, raises hallucination rates linearly, risks catastrophic forgetting, and in one Nature experiment turned a narrowly tuned model broadly misaligned. The constructive half is just as concrete, mapping fine-tuning to its real strengths, fixed output formats, domain tone, and sharpened narrow skills, and explaining why LoRA and QLoRA cover the large majority of production needs at a fraction of the cost. The reader's takeaway is a decision rule worth keeping: if you need the model to know something, retrieve it; if you need it to act a certain way, fine-tune it, and reach for that instrument only after prompting and retrieval have done their work.
Comments