GraphRAG: Why Knowledge Graphs Came Back to Fix Retrieval

Vector search finds passages that look like your question. GraphRAG retrieves over a map of how facts connect, so it can answer questions vector search cannot.

Give a vector RAG system every annual report a company has filed and ask a simple question: what are the recurring risks management keeps flagging? It will retrieve five or six chunks containing the word risk, summarise them, and hand you an answer that sounds complete. It is not. The system never read the other nine thousand chunks, because no single chunk says "here are the themes across the whole corpus." Themes are spread across every passage, and similarity search cannot see a pattern it has nothing to match against.

That is one of two questions standard retrieval cannot reach. The other is the multi-hop question: which of our suppliers is two steps removed from a sanctioned entity, or who signed off on the contract the disputed clause came from? Answering needs a chain of facts across documents, and a single similarity search returns documents that resemble the question, not the ones that complete the chain. This post is about the fix for both, a retrieval method called GraphRAG, and about why a decades-old idea, the knowledge graph, walked back into the centre of AI to make it work. It is part two of a series. Part one, Retrieval-Augmented Generation: From Vector Search to Agentic RAG, traced how RAG grew from a fixed pipeline into something that reasons. Start there if the basics are not fresh.

Origin: the graph that search engines never abandoned

A knowledge graph is a simple structure with a long history. It stores information as nodes and edges. A node is a thing: a person, a company, a drug, a contract, a regulation. An edge is a relationship between two things, and it carries a type: acquired, reports to, interacts with, supersedes, cites. "Aspirin inhibits COX-1" is one node, one edge, one node. Stack millions of those and you have a structure holding not just facts but how they connect, in a form machines can walk.
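
The node, edge, node shape is easy to hold in code. What follows is a minimal sketch, assuming a plain in-memory store. The `Triple` type, the `neighbours` helper, and the two company facts are illustrative names invented here, not part of any library or dataset.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: a source node, a typed edge, a target node."""
    source: str
    relation: str
    target: str


facts = [
    Triple("Aspirin", "inhibits", "COX-1"),         # the example from the text
    Triple("Acme Corp", "acquired", "Widget Ltd"),  # hypothetical placeholder
    Triple("Widget Ltd", "cites", "Rule 12"),       # hypothetical placeholder
]

# Index outgoing edges by source node so the graph can be walked.
outgoing = defaultdict(list)
for fact in facts:
    outgoing[fact.source].append((fact.relation, fact.target))


def neighbours(node: str) -> list[tuple[str, str]]:
    """Return (relation, target) pairs for every edge leaving `node`."""
    return list(outgoing.get(node, []))


for relation, target in neighbours("Aspirin"):
    print(f"Aspirin --{relation}--> {target}")
```

Production graphs live in a graph database rather than a dictionary, but the unit of storage is the same typed triple.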

The idea is old. The term knowledge graph was coined back in 1972, and it grew out of earlier work on semantic networks and ontologies, formal schemes for writing down what kinds of things exist and how they may relate. Through the 2000s the semantic web and linked data movement tried to turn the internet into one queryable graph. The grand version mostly did not happen. The pieces that worked were industrial. Google put a knowledge graph behind its search box in 2012, which is why a search for a person returns a panel of structured facts rather than ten blue links. Facebook, LinkedIn, Amazon, and every voice assistant run on graphs of the same shape.

So knowledge graphs never went away. They fell out of fashion for new projects. Building one was slow and manual: you needed domain experts to design the ontology and analysts to populate it, and the result was rigid, because adding a new kind of relationship meant revising the schema. When dense vector embeddings arrived and made semantic search cheap and almost automatic, the graph started to look like the hard, expensive way to do what the vector had made easy.

Two things pulled it back. The first was the limits of vector RAG becoming impossible to ignore once teams ran it in production: the relationship-blindness and the inability to summarise a whole corpus. The second was that the thing that made graphs expensive, the manual extraction, was suddenly something a language model could do. The same model you wanted to ground could read your documents and pull out the entities and relationships itself, and the cost wall came down. By 2024 Gartner placed knowledge graphs on the Slope of Enlightenment of its AI hype cycle, the stretch where a technology stops being a fad and starts being treated as foundational.

Present: how GraphRAG actually works

The paper that gave the pattern its name came from Microsoft Research. The team published a blog post introducing GraphRAG in February 2024, followed in April by the full paper, From Local to Global: A Graph RAG Approach to Query-Focused Summarization, by Darren Edge, Ha Trinh, Jonathan Larson and colleagues, and open-sourced the code on GitHub in July. The framing is precise. Baseline RAG fails when a question requires traversing disparate pieces of information, and it fails on questions aimed at an entire corpus rather than any passage in it. GraphRAG is built for both.

It works in two phases, and the expensive one happens once, before any question is asked. This is the indexing phase. The system feeds your documents to a language model and asks it to do extraction: read each chunk, name the entities in it, and name the relationships between them. Out of unstructured text comes a structured graph. In the paper's runs, a million tokens of podcast transcripts produced a graph of 8,564 nodes and 20,691 edges, and that graph is the index.
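
The shape of the indexing loop can be sketched in a few lines. This is an illustration under stated assumptions, not the Microsoft implementation: `extract_triples` is a hypothetical stand-in for the language-model call and only recognises two hard-coded verbs, and the chunk text is invented.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the LLM extraction step. A real indexer would
# prompt a language model here; this stub only matches "<A> acquired <B>"
# and "<A> supplies <B>".
PATTERN = re.compile(r"([A-Z]\w+) (acquired|supplies) ([A-Z]\w+)")


def extract_triples(chunk: str) -> list[tuple[str, str, str]]:
    """Return (entity, relation, entity) triples found in one chunk."""
    return [match.groups() for match in PATTERN.finditer(chunk)]


def build_index(chunks: list[str]):
    """Merge per-chunk triples into one graph, recording which chunks support each edge."""
    nodes = set()
    edges = defaultdict(list)  # (source, relation, target) -> chunk ids
    for chunk_id, chunk in enumerate(chunks):
        for source, relation, target in extract_triples(chunk):
            nodes.update((source, target))
            edges[(source, relation, target)].append(chunk_id)
    return nodes, dict(edges)


chunks = [
    "Alpha acquired Beta in the spring.",
    "Beta supplies Gamma. Alpha acquired Beta.",
]
nodes, edges = build_index(chunks)
print(sorted(nodes))
print(edges)
```

Keeping the chunk ids on each edge is a design choice made here for illustration: it lets an answer point back to the passages that support a relationship.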

Then comes the move that makes GraphRAG more than a graph database. The system runs community detection on the graph, using the Leiden algorithm, which finds clusters of densely interconnected nodes. A cluster is a community: a set of entities that connect to each other far more than to anything else, which usually maps to a topic. Communities nest, so you get a hierarchy, broad themes at the top splitting into finer subtopics. For every community at every level, the model writes a plain-language summary. The corpus is now a graph plus a tree of summaries, from a one-paragraph view of the whole dataset down to tight descriptions of small clusters.
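
Leiden, like the Louvain method it refines, searches for a partition that scores well on a quality function, commonly modularity. The sketch below does not implement Leiden. It only computes modularity for two candidate partitions of a toy graph, to show numerically what "densely interconnected" means. The graph and node names are invented for illustration.

```python
from collections import Counter


def modularity(edges, partition):
    """Modularity of an undirected, unweighted graph for a given partition.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2
    """
    m = len(edges)
    community_of = {node: i for i, group in enumerate(partition) for node in group}
    internal = Counter()    # edges with both ends in the same community
    degree_sum = Counter()  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(partition))
    )


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

natural = [{"a", "b", "c"}, {"d", "e", "f"}]  # one community per triangle
mixed = [{"a", "b", "d"}, {"c", "e", "f"}]    # cuts across both triangles

print(round(modularity(edges, natural), 3))
print(round(modularity(edges, mixed), 3))
```

The partition that follows the triangles scores higher than the one that cuts across them. A community detection algorithm automates the search for partitions of that kind and, in the hierarchical case, repeats it inside each community.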

The second phase is querying, and it splits by question type. A local question, one about a specific entity, starts at that entity's node, walks out to its neighbours along the edges, and answers from that neighbourhood. This is how multi-hop questions get answered: the chain of facts is a path through the graph that the system walks. A global question, the "what are the themes" kind, ignores individual nodes and uses the community summaries. It runs a map-reduce: each relevant summary produces a partial answer, and the partials combine into one. The corpus-wide summary no single chunk contained now exists, because the indexing phase built it.
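
Both query paths can be sketched on toy data. In this illustration the graph, the summaries, and every name are hypothetical, and `map_step` replaces a language-model call with a crude word match. It shows the control flow only, not the paper's implementation.

```python
from collections import deque

# Toy graph: node -> list of (relation, neighbour). All names are hypothetical.
GRAPH = {
    "OurCo": [("buys_from", "SupplierA")],
    "SupplierA": [("owned_by", "HoldingB")],
    "HoldingB": [("controlled_by", "SanctionedC")],
    "SanctionedC": [],
}


def find_path(start, goal, max_hops=3):
    """Local query: breadth-first search for the chain of edges from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not walk further than max_hops edges
        for relation, nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None


# Hypothetical community summaries written at indexing time.
SUMMARIES = [
    "Supply chain: recurring risk of single-source suppliers.",
    "Finance: recurring risk of currency exposure.",
    "Facilities: office relocation schedule.",
]


def map_step(summary, question_terms):
    """Stand-in for an LLM call: keep a summary that shares a term with the question."""
    words = set(summary.lower().replace(":", " ").replace(".", " ").split())
    return summary if words & question_terms else None


def global_answer(question_terms):
    """Global query: map over every summary, then reduce by keeping non-empty partials."""
    partials = [map_step(summary, question_terms) for summary in SUMMARIES]
    return [partial for partial in partials if partial is not None]


print(find_path("OurCo", "SanctionedC"))
print(global_answer({"risk"}))
```

The local answer is a named path that can be shown to the user. In a real system the reduce step would be another model call that merges the partial answers into one response.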

The paper's results are clear on the questions GraphRAG was built for. On global sensemaking questions over datasets in the million-token range, judged head to head against a vector RAG baseline, GraphRAG won on comprehensiveness 72 to 83 percent of the time and on diversity of perspective by a similar margin. Independent 2026 testing reports the same shape on multi-hop benchmarks, where graph traversal answers questions that three separate similarity lookups cannot stitch together. On plain factual questions, the kind one good chunk already answers, GraphRAG offers little over vector search. It is a tool for a specific class of question, not a replacement.

In practice few teams build pure GraphRAG. The dominant pattern is graph plus vector, and Neo4j, the graph database company, has been the loudest in shaping it. The hybrid uses vector or keyword search to find good starting nodes, then traverses the graph from there to gather connected context the original chunks never mentioned, usually one or two hops out. Vector search supplies breadth and speed. The graph supplies the relationships and, just as important, explainability: a graph answer is a named path through named entities, so you can show the chain of reasoning a black-box vector lookup cannot. Neo4j ships an LLM Knowledge Graph Builder that turns documents into a graph and a GraphRAG Python package wired to LangChain and LlamaIndex.
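
The hybrid flow, find a seed and then expand, can be sketched without any database. Everything here is an assumption made for illustration: word overlap stands in for vector search, and the entities, chunks, and edges are invented. It does not use or represent the Neo4j package.

```python
# Hypothetical chunk store: each chunk is attached to the entity it describes.
CHUNKS = {
    "Alpha": "Alpha signed the logistics contract in March.",
    "Beta": "Beta audits warehouse safety.",
    "Gamma": "Gamma is a freight subcontractor.",
    "Delta": "Delta publishes a cooking newsletter.",
}

# Undirected adjacency between entities (hypothetical edges).
EDGES = {
    "Alpha": {"Beta"},
    "Beta": {"Alpha", "Gamma"},
    "Gamma": {"Beta"},
    "Delta": set(),
}


def seed_nodes(question, top_k=1):
    """Stand-in for vector search: rank entities by word overlap with the question."""
    question_words = set(question.lower().split())

    def score(entity):
        words = set(CHUNKS[entity].lower().rstrip(".").split())
        return len(question_words & words)

    ranked = sorted(CHUNKS, key=lambda entity: (-score(entity), entity))
    return [entity for entity in ranked[:top_k] if score(entity) > 0]


def expand(seeds, hops=2):
    """Collect every entity within `hops` edges of the seed nodes."""
    frontier, reached = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in EDGES[node]} - reached
        reached |= frontier
    return reached


seeds = seed_nodes("who signed the logistics contract")
context = expand(seeds, hops=2)
print(seeds, sorted(context))
```

The question matches only Alpha's chunk, yet the expansion also pulls in Beta and Gamma, whose text shares no words with the question. That connected context is what similarity search alone would miss.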

Present: the use cases, and the honest cost

GraphRAG earns its complexity in domains where the relationships between facts are the substance of the work. In legal research, statutes, cases, and regulations form a citation network, and the question that matters, which precedent controls here and what depends on it, is a graph traversal, not a keyword match. In compliance, mapping a regulation to the specific internal controls that satisfy it is a relationship problem, and a graph makes the link auditable. Biomedical research was an early adopter, because drug, gene, and disease relationships are already understood as a graph, and traversing it can surface a connection no single paper states outright. Financial crime is the fourth: anti-money-laundering investigation runs on graphs of people, accounts, companies, and transactions, where the fraud is a pattern of connections (layering, shared addresses, money-mule rings) that no single record reveals. Microsoft has folded GraphRAG into Microsoft Discovery, its scientific research platform.

Now the part the vendor pages skip. GraphRAG is expensive to build, and it is not free to keep alive. The indexing phase pays a language model to read every document and produce structured output, and that extraction is the bulk of the cost. Practitioner write-ups put GraphRAG indexing at ten to forty times the cost of a plain vector index, with the gap driven almost entirely by entity extraction tokens. Use a cheaper model and extraction quality drops; LLM entity extraction lands around 60 to 85 percent accurate depending on the domain, and every mistake becomes wrong graph structure and a confidently wrong answer.
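
A rough way to see why extraction errors matter more in a graph than in a flat index: a multi-hop answer is only right if every edge on the path is right. The sketch below assumes, purely for illustration, that the quoted accuracy applies per edge and that errors are independent. Neither assumption comes from the sources cited above.

```python
def path_reliability(edge_accuracy: float, hops: int) -> float:
    """Chance that every edge on a path is correct, assuming independent errors."""
    return edge_accuracy ** hops


for accuracy in (0.60, 0.85):
    for hops in (1, 2, 3):
        reliability = path_reliability(accuracy, hops)
        print(f"edge accuracy {accuracy:.2f}, {hops} hop(s): {reliability:.2f}")
```

Under those assumptions, even the top of the quoted range leaves a two-hop chain correct a little under three quarters of the time, which is one reason to spot-check extraction quality before trusting traversals.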

Then there is entity resolution, the unglamorous problem at the centre of every real graph. When one document says Apple, another says Apple Inc., and a third says AAPL, the graph is only correct if all three collapse to one node. Get it wrong and you either fracture one entity into three or merge two that should be separate, and the traversals turn to nonsense. The graph is not static either: documents change, and teams report spending more effort keeping the graph fresh than on the initial build.
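
The simplest form of entity resolution is a lookup from surface forms to a canonical node name. The sketch below assumes a hand-written alias table. The table, the `canonical` helper, and the ExampleCo edge are hypothetical, and real pipelines typically layer fuzzy matching or human review on top of rules like these.

```python
# Hypothetical alias table mapping lowercase surface forms to one canonical name.
ALIASES = {
    "apple": "Apple Inc.",
    "apple inc.": "Apple Inc.",
    "aapl": "Apple Inc.",
}


def canonical(name: str) -> str:
    """Map a surface form to its canonical node name; unknown names pass through."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


raw_edges = [
    ("Apple", "acquired", "ExampleCo"),  # ExampleCo is a made-up entity
    ("AAPL", "listed_on", "NASDAQ"),
    ("Apple Inc.", "headquartered_in", "Cupertino"),
]

# Rewrite both ends of every edge so the three spellings collapse to one node.
resolved = {(canonical(s), r, canonical(t)) for s, r, t in raw_edges}

print(sorted({s for s, _, _ in raw_edges}))  # three source nodes before
print(sorted({s for s, _, _ in resolved}))   # one source node after
```

The same table also shows the opposite failure: mapping every "apple" to one node would wrongly merge any other entity that shares the name, which is why a bare lookup is rarely enough on its own.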

This is why the seasoned advice for 2026 is to reach for GraphRAG deliberately, not by default. Roughly 80 percent of enterprise queries are simple semantic lookups that fast vector search handles well. When a RAG system underperforms, the cause is more often weak chunking or a missing re-ranker than missing graph structure, and both are far cheaper to fix. GraphRAG is the right call when the failures are specifically relationship failures: the answer needs facts connected across documents, or the question is about the corpus as a whole. The wrong move is three months building a graph for a problem better retrieval solves in a week.

Future and impact: graph retrieval as a tool the agent picks up

The first clear direction is bringing the cost down. Microsoft's own answer is LazyGraphRAG, released in late 2024, which defers the expensive summarisation. Instead of summarising every community up front, it does lightweight graph construction at indexing time and pushes the language-model work to query time, doing it only for the communities a question actually touches. Microsoft reports indexing cost falling to roughly the level of plain vector RAG, around 0.1 percent of full GraphRAG, while holding answer quality, and it puts LazyGraphRAG's query cost for global questions at more than 700 times lower than GraphRAG's global search. That matters, because the up-front build cost was the main thing keeping GraphRAG out of smaller projects.
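
The deferral idea can be shown by counting model calls. This is a sketch of the general eager versus lazy trade-off only. It is not LazyGraphRAG's pipeline, whose internals are not described here, and the `Summaries` class and community data are hypothetical.

```python
class Summaries:
    """Counts hypothetical LLM summarisation calls under eager and lazy strategies."""

    def __init__(self, communities, eager):
        self.communities = communities  # community id -> set of member entities
        self.cache = {}
        self.llm_calls = 0
        if eager:
            # Pay for every summary up front, at indexing time.
            for community_id in communities:
                self._summarise(community_id)

    def _summarise(self, community_id):
        # Stand-in for a language-model call.
        self.llm_calls += 1
        members = ", ".join(sorted(self.communities[community_id]))
        self.cache[community_id] = f"Summary of {members}"
        return self.cache[community_id]

    def get(self, community_id):
        """Return a summary, generating it on first use if it is not cached."""
        if community_id in self.cache:
            return self.cache[community_id]
        return self._summarise(community_id)


communities = {i: {f"entity_{i}"} for i in range(1000)}

eager = Summaries(communities, eager=True)
lazy = Summaries(communities, eager=False)

# A question that touches two communities, one of them twice.
for community_id in (3, 42, 3):
    eager.get(community_id)
    lazy.get(community_id)

print(eager.llm_calls, lazy.llm_calls)
```

The lazy store pays only for the communities a question touches, at the price of doing that work while the user waits.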

The second direction is graph retrieval becoming one tool an agent chooses, rather than a fixed pipeline. Part one of this series ended on agentic RAG, where a model decides when and how to retrieve. Graph traversal slots straight into that loop as another tool call: an agent can route a plain factual question to vector search, a multi-hop question to graph traversal, and a "what are the themes" question to the community summaries. A wave of 2026 research on agentic graph search is building exactly this, graph retrieval as a step a reasoning agent takes and re-takes until it has what it needs. The honest risks carry over. A graph is only as good as the extraction behind it, agentic loops add cost and latency, and a wrong edge is a wrong answer wearing the costume of a reasoning chain.
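
The routing step can be sketched as a dispatch table. In an agentic system the choice would be made by the model itself. Here `classify` is a hypothetical keyword heuristic standing in for that decision, and the three tool functions are placeholders that only report which path was chosen.

```python
from typing import Callable


def vector_search(question: str) -> str:
    return f"vector_search: {question}"


def graph_traversal(question: str) -> str:
    return f"graph_traversal: {question}"


def community_summaries(question: str) -> str:
    return f"community_summaries: {question}"


def classify(question: str) -> str:
    """Hypothetical stand-in for the agent's routing decision."""
    q = question.lower()
    if "themes" in q or "recurring" in q:
        return "global"
    if "connected to" in q or "removed from" in q:
        return "multi_hop"
    return "factual"


TOOLS: dict[str, Callable[[str], str]] = {
    "factual": vector_search,
    "multi_hop": graph_traversal,
    "global": community_summaries,
}


def route(question: str) -> str:
    """Send the question to the retrieval tool that matches its type."""
    return TOOLS[classify(question)](question)


for question in (
    "When was the contract signed?",
    "Which supplier is two steps removed from a sanctioned entity?",
    "What are the recurring risks?",
):
    print(route(question))
```

Because each tool sits behind the same signature, an agent can call one, inspect the result, and call another, which is the re-take loop described above.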

The longer arc is that the two methods stop being rivals. Vector search measures how much two pieces of text resemble each other. A graph records how things are actually connected. Most hard questions need both, a way in by similarity and a way through by relationship, which is why the graph-plus-vector hybrid is what serious systems are converging on. For an enterprise, the lesson is that retrieval is an architecture decision with real cost attached, not a default. Knowing which questions need a graph, and resisting the urge to build one for the questions that do not, is much of the gap between a system that demos well and one that holds up. Matching the retrieval method to the question is the design work an implementation partner like Perform Digital is brought in to get right before the build, not after it has quietly failed.

Council summary

This post argues that GraphRAG is not a successor to vector search but a fix for two questions vector search structurally cannot answer: the multi-hop question that needs facts chained across documents, and the global question that asks about a whole corpus at once. It explains the mechanism honestly (the LLM-built entity graph, Leiden community detection, the local and global query paths) and does not hide the cost: indexing that can run ten to forty times a vector index, brittle entity resolution, and a graph that decays without upkeep. The reader's takeaway is a decision rule, not a hype line. Most enterprise queries are simple lookups that better chunking and a re-ranker fix far more cheaply, so reach for a graph only when the failures are genuinely relationship failures. The direction of travel is convergence: graph-plus-vector hybrids, cheaper variants like LazyGraphRAG, and graph traversal becoming one tool a reasoning agent picks up when the question calls for it.
