
LangGraph vs CrewAI vs AutoGen: How Agent Frameworks Differ

Three frameworks, three answers to the same question: who controls the agent's state. Pick the one whose worldview fits your task, or write the loop yourself.

Every comparison of agent frameworks you have read is probably a table. Rows for features, columns for frameworks, a scattering of checkmarks. The table is the wrong artifact. It implies the frameworks are doing the same job with different feature sets and that you should pick the one with the most ticks. They are not doing the same job. Each one starts from a different belief about what an agent fundamentally is, and that belief decides how your code is shaped, how you debug it at 2am, and how much it hurts when you outgrow it.

So this is not a table. It explains the three design philosophies behind LangGraph, CrewAI, and Microsoft AutoGen, plus the newer entrants from OpenAI and Google, well enough that you can tell which worldview matches the system you are building. And it ends with the question a lot of teams skip: whether to adopt any of them at all.

Origin: what an orchestration framework is for

Strip an agent down and you find a loop. The model receives a goal and the current state. It picks a tool to call or decides it is finished. The tool runs, its result gets appended to the context, and the model is called again. Plan, act, observe, repeat, until the goal is met or a limit trips. Braintrust calls this the canonical agent architecture, and the striking thing is that the loop itself is about five lines of code. Claude Code runs one. The OpenAI Agents SDK runs one. Every framework in this article is built on top of one.
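
A minimal sketch of that loop in Python, to show how little of it there is. It assumes a stubbed `fake_model` standing in for a real LLM call and a one-entry `TOOLS` table; both are hypothetical and are not any framework's API.

```python
from typing import Callable

# Hypothetical tool table: names the model may choose, mapped to functions.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
}

def fake_model(goal: str, context: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call (an assumption, not a real API).
    Returns (action, argument), where action is a tool name or "finish"."""
    if not any(line.startswith("search:") for line in context):
        return "search", goal
    return "finish", context[-1]

def run_agent(goal: str, max_steps: int = 10) -> str:
    context: list[str] = []
    for _ in range(max_steps):                    # the limit that can trip
        action, arg = fake_model(goal, context)   # plan
        if action == "finish":
            return arg
        result = TOOLS[action](arg)               # act
        context.append(f"{action}: {result}")     # observe
    raise RuntimeError("step limit reached without finishing")

print(run_agent("agent frameworks"))  # search: results for 'agent frameworks'
```

Everything a framework adds sits around `run_agent`, not inside it.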

If the loop is five lines, what is a framework for? It is for everything that surrounds the loop in production. State: the structured object that carries information between steps, so the agent knows what it has already done. Control flow: the rules deciding what runs next, including branches, loops, and parallel paths. Persistence: writing that state to durable storage so a crashed run can resume instead of starting over. Human-in-the-loop: pausing to surface a decision to a person, then continuing from that point. And tool wiring: a consistent way to expose functions, APIs, and other agents to the model.
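
Of those five concerns, tool wiring is the easiest to show in isolation. A minimal sketch, assuming a hypothetical `tool` decorator and `REGISTRY` dict (not any particular framework's API), that derives a uniform description of each function from its docstring and signature and routes every call through one dispatch point:

```python
import inspect
from typing import Any, Callable

# Hypothetical registry: one uniform description per tool, readable by a model.
REGISTRY: dict[str, dict[str, Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn, deriving its description from the docstring and signature."""
    params = inspect.signature(fn).parameters
    REGISTRY[fn.__name__] = {
        "fn": fn,
        "description": inspect.getdoc(fn) or "",
        "params": {name: getattr(p.annotation, "__name__", str(p.annotation))
                   for name, p in params.items()},
    }
    return fn

def call_tool(name: str, **kwargs: Any) -> Any:
    """Single dispatch point, so an unknown tool name fails loudly."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return REGISTRY[name]["fn"](**kwargs)

@tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"sunny in {city}"  # stubbed result for illustration

print(REGISTRY["get_weather"]["params"])     # {'city': 'str'}
print(call_tool("get_weather", city="Oslo"))
```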

Write a serious agent by hand and you will build all five eventually. That is the real pitch of a framework: not the trivial loop, but the unglamorous infrastructure around it, plus a set of opinions about how to organize it. The frameworks differ because the opinions differ.

Present: three philosophies, named plainly

LangGraph: an agent is a stateful graph

LangGraph, from the LangChain team, takes the most literal view of the problem. It says an agent is a directed graph. You define nodes, which are functions that do work, and edges, which decide which node runs next. A shared state object flows through the graph, and every node reads it and returns updates to it. Edges can be conditional, so the graph branches on what the state contains, and they can loop back, so an agent can revisit a step. The control flow is not hidden inside model reasoning. You draw it.
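
The graph idea can be sketched without the library. The following is a hand-rolled illustration of nodes, a shared state, and conditional edges that loop back; it is not LangGraph's API, and the `draft` and `review` nodes are hypothetical stand-ins for model calls.

```python
from typing import Callable

State = dict  # the shared state object every node reads

def draft(state: State) -> State:
    n = state.get("revisions", 0) + 1
    return {"draft": f"v{n}", "revisions": n}  # a partial update, not a copy

def review(state: State) -> State:
    return {"approved": state["revisions"] >= 2}  # stand-in for a model judgment

NODES: dict[str, Callable[[State], State]] = {"draft": draft, "review": review}

# Each edge maps the current state to the name of the next node.
EDGES: dict[str, Callable[[State], str]] = {
    "draft": lambda s: "review",                              # fixed edge
    "review": lambda s: "END" if s["approved"] else "draft",  # conditional, loops back
}

def run_graph(start: str, state: State) -> State:
    node = start
    while node != "END":
        state = {**state, **NODES[node](state)}  # merge the node's update
        node = EDGES[node](state)
    return state

print(run_graph("draft", {}))  # {'draft': 'v2', 'revisions': 2, 'approved': True}
```

The control flow lives in `EDGES`, where you can read it, rather than inside a prompt.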

That explicitness is the point. Because the graph is a real data structure, LangGraph can do things the others struggle with. Its persistence layer checkpoints the full state at every node, into Postgres, Redis, or MongoDB. A run that dies halfway resumes from its last checkpoint with no lost work. The same checkpoints give you human-in-the-loop for free: pause the graph at a node, wait for a person, resume. They also give you time-travel debugging, where you rewind a non-deterministic run to an earlier state and replay it. For an agent that touches money or production data, that audit trail is not a nicety.
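
Checkpointing, pausing, and replay can be sketched in the same hand-rolled style. This is an illustration of the idea, not LangGraph's persistence layer: snapshots are kept in an in-memory list (a real system would write them to durable storage), and the `CheckpointedRunner` class and its `stop_before` parameter are hypothetical.

```python
import copy
from typing import Callable, Optional

State = dict

class CheckpointedRunner:
    """Runs a node/edge graph and snapshots state after every node.
    Snapshots live in memory here; durable storage would replace the list."""

    def __init__(self, nodes: dict[str, Callable[[State], State]],
                 edges: dict[str, Callable[[State], str]]) -> None:
        self.nodes, self.edges = nodes, edges
        self.history: list[tuple[str, State]] = []  # (next node, state)

    def run(self, node: str, state: State,
            stop_before: Optional[str] = None) -> State:
        while node != "END":
            if node == stop_before:      # human-in-the-loop: pause here
                return state
            state = {**state, **self.nodes[node](state)}
            node = self.edges[node](state)
            self.history.append((node, copy.deepcopy(state)))  # checkpoint
        return state

    def resume(self, checkpoint: int = -1) -> State:
        """Continue from a saved checkpoint; an earlier index replays
        from that point (a crude form of time travel)."""
        node, state = self.history[checkpoint]
        return self.run(node, copy.deepcopy(state))

nodes = {
    "plan": lambda s: {"plan": "refund $40"},
    "execute": lambda s: {"done": True},
}
edges = {"plan": lambda s: "execute", "execute": lambda s: "END"}

runner = CheckpointedRunner(nodes, edges)
paused = runner.run("plan", {}, stop_before="execute")  # wait for approval
final = runner.resume()                                 # approved: continue
print(paused, final)
```

Because `resume` needs only the last snapshot, a process that crashed after a checkpoint restarts the same way a paused one does.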

The cost is a steep learning curve. You model your problem as a state machine before you write much logic, and practitioner comparisons consistently put LangGraph as the hardest of the three to learn, with the longest road to a first demo. LangGraph 1.0, released in October 2025, shipped with no breaking changes, a deliberate signal of stability after years of LangChain churn. LangChain names production users including Uber, LinkedIn, and Klarna, whose support assistant runs on LangGraph. Reach for it when the system is long-running, needs to survive a crash, and someone will eventually have to explain to an auditor exactly what it did.

CrewAI: an agent is a role on a team

CrewAI starts somewhere completely different. It does not ask you to think about state or graphs. It asks you to think about a team. You define agents the way you would write a job description: each gets a role, a goal, and a backstory. The role is the job title, the goal is what the agent is trying to achieve, and the backstory is a chunk of system prompt that shapes how it reasons. "Senior security researcher focused on web vulnerabilities" produces visibly different behavior than "researcher." You then write tasks, assign them to agents, and group the agents into a crew. The CrewAI docs make the metaphor the whole interface.
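
The role abstraction boils down to turning a job description into a system prompt. A plain-dataclass sketch, assuming a hypothetical prompt template; the field names mirror the role, goal, and backstory trio described above, but this is not CrewAI's API.

```python
from dataclasses import dataclass

@dataclass
class RoleAgent:
    role: str       # the job title
    goal: str       # what the agent is trying to achieve
    backstory: str  # framing that shapes how it reasons

    def system_prompt(self) -> str:
        # Hypothetical template: the three fields become the system prompt.
        return (f"Role: {self.role}\n"
                f"Goal: {self.goal}\n"
                f"Background: {self.backstory}")

@dataclass
class Task:
    description: str
    agent: RoleAgent  # each task is assigned to exactly one agent

security = RoleAgent(
    role="Senior security researcher focused on web vulnerabilities",
    goal="Find exploitable flaws in the login flow",
    backstory="You think like an attacker and cite evidence for every claim.",
)
audit = Task("Review the password-reset endpoint", security)
print(audit.agent.system_prompt())
```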

This is a genuine design philosophy, not just friendlier naming. It bets that the natural way to decompose an agentic problem is by role, the way you would staff a project with people. The payoff is speed. A developer new to the framework can describe a crew of a researcher, a writer, and an editor, and have something running in an afternoon, because the mental model is one everyone already owns. Comparisons routinely name CrewAI the fastest of the three to a working prototype. A crew runs sequentially, or hierarchically with a manager agent that delegates, and the newer Flows layer adds event-driven orchestration for chaining crews with real conditional logic.
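
The two process types differ only in who decides what runs. A minimal sketch, assuming agents stubbed as text-in, text-out functions and a plain function standing in for the manager agent (all hypothetical, not CrewAI's API):

```python
from typing import Callable

# Agents stubbed as text-in, text-out functions; a real agent would call
# a model with its role-based system prompt.
Agent = Callable[[str], str]

def researcher(text: str) -> str:
    return f"notes on {text}"

def writer(text: str) -> str:
    return f"draft from {text}"

def editor(text: str) -> str:
    return f"edited {text}"

def run_sequential(crew: list[Agent], topic: str) -> str:
    """Sequential process: each agent's output is the next agent's input."""
    result = topic
    for agent in crew:
        result = agent(result)
    return result

def run_hierarchical(manager: Callable[[str], list[str]],
                     crew: dict[str, Agent], topic: str) -> dict[str, str]:
    """Hierarchical process: a manager picks which members get the work."""
    return {name: crew[name](topic) for name in manager(topic)}

print(run_sequential([researcher, writer, editor], "LangGraph"))
# edited draft from notes on LangGraph
```

In the sequential version the order is fixed by the list; in the hierarchical one it is whatever the manager returns, which is where the steering the next paragraph describes comes from.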

The bet has limits. The role abstraction is comfortable until you need fine control over what happens between steps, and then it can get in the way, because the framework is steering and you are describing intent rather than wiring flow. Persistence and checkpointing are less developed than LangGraph's. CrewAI suits internal tools, content pipelines, and any problem that genuinely decomposes into a few clear roles. It is a weaker fit when the hard part of your system is the control flow rather than the division of labor.

AutoGen: an agent is a participant in a conversation

AutoGen, born in Microsoft Research, took the third stance: multi-agent work is a conversation. You create agents and put them in a chat. They take turns, message each other, and the work emerges from the dialogue. A coding agent proposes code, a critic replies with problems, the coder revises, and a human can sit in the same conversation through a user-proxy agent. Where LangGraph makes you draw the control flow and CrewAI makes you staff a team, AutoGen says: define who is in the room and let them talk.
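
The conversational model can be sketched as a shared message history plus a rule for who speaks next. This assumes hypothetical `coder` and `critic` stubs in place of model-backed agents, round-robin turn-taking, and an "APPROVED" reply as the termination condition; it is not AutoGen's API.

```python
from typing import Callable

Message = tuple[str, str]  # (speaker, text)

def coder(history: list[Message]) -> str:
    critiques = sum(1 for who, _ in history if who == "critic")
    return f"def add(a, b): return a + b  # revision {critiques}"

def critic(history: list[Message]) -> str:
    revisions = sum(1 for who, _ in history if who == "coder")
    return "APPROVED" if revisions >= 2 else "add a docstring"

def group_chat(agents: dict[str, Callable[[list[Message]], str]],
               opening: str, max_turns: int = 10) -> list[Message]:
    history: list[Message] = [("user", opening)]
    order = list(agents)
    for turn in range(max_turns):
        name = order[turn % len(order)]  # round-robin speaker selection
        reply = agents[name](history)    # every agent sees the whole chat
        history.append((name, reply))
        if reply == "APPROVED":          # termination condition
            break
    return history

for speaker, text in group_chat({"coder": coder, "critic": critic}, "write add()"):
    print(f"{speaker}: {text}")
```

Nothing here says which step comes next except the speaker order and the stop phrase, which is exactly why the model is expressive and also hard to pin down.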

For open-ended problems, especially ones with a code-execute-critique rhythm, the conversational model is expressive and natural, which is why AutoGen became a favorite for research and exploration.

But AutoGen is also the clearest illustration of framework churn in this whole field. The project was rewritten around an event-driven architecture for its 0.4 release. A community fork, AG2, split off to keep the older style alive. Then the destination changed: Microsoft folded AutoGen and its enterprise sibling Semantic Kernel into a single new product, the Microsoft Agent Framework, which reached 1.0 general availability in April 2026. Microsoft's own migration guidance now points new projects at the Agent Framework and has placed AutoGen in maintenance mode. The new product keeps AutoGen's conversational agents but adds Semantic Kernel's enterprise plumbing and, tellingly, graph-based workflows for explicit orchestration. The conversation model alone was not enough for production. If you are starting fresh in the Microsoft world today, build on the Agent Framework, not AutoGen.

Two more worth knowing: OpenAI Agents SDK and Google ADK

The OpenAI Agents SDK, the production successor to the experimental Swarm, is deliberately minimal. Four primitives: agents, tools, handoffs, and guardrails. A handoff, a clean transfer of control from one agent to another, is the SDK's sharpest idea. Its stated design principle is to have enough features to be worth using but few enough primitives to learn quickly. It is model-agnostic in principle but most natural with OpenAI models. In April 2026 OpenAI extended it with native sandboxed execution and a model-native harness for long-horizon tasks, explicitly because too many teams were rebuilding the same infrastructure by hand.
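
The handoff and guardrail ideas can be sketched in plain Python. This is not the SDK's API: the `Agent` dataclass, the `input_guardrail` check, and the keyword routing are all hypothetical, and in a real system a model, not a keyword match, would decide when to hand off.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    respond: Callable[[str], str]  # stand-in for a model-backed agent
    handoffs: dict[str, "Agent"] = field(default_factory=dict)

def input_guardrail(text: str) -> None:
    """Runs before any agent does; a tripped guardrail stops the run."""
    if "password" in text.lower():
        raise ValueError("guardrail tripped: request asks for credentials")

def run(agent: Agent, text: str) -> tuple[str, str]:
    input_guardrail(text)
    # Hypothetical keyword routing in place of a model's handoff decision.
    for keyword, target in agent.handoffs.items():
        if keyword in text.lower():
            agent = target  # handoff: the target now owns the request
            break
    return agent.name, agent.respond(text)

billing = Agent("billing", lambda t: "refund issued")
triage = Agent("triage", lambda t: "how can I help?", {"refund": billing})
print(run(triage, "I want a refund"))  # ('billing', 'refund issued')
```

The point of a handoff is that control moves completely: after the switch, the triage agent plays no further part in the answer.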

Google's Agent Development Kit (ADK) powers Google's own Gemini Enterprise agent products and is open source for Python, Java, Go, and TypeScript. It is code-first and event-driven, with a graph-based execution engine supporting routing, fan-out and fan-in, loops, retries, and human-in-the-loop, plus one-command deployment to Google Cloud. If LangGraph is the graph framework for the open ecosystem, ADK is the graph framework for teams standardized on Google Cloud.
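
Fan-out, fan-in, and retries are easy to name and easy to get subtly wrong. A minimal sketch using only the standard library's `concurrent.futures`, not ADK's API; the `flaky_web_search` and `docs_search` branches are hypothetical stubs, and the retry wrapper retries on any exception without backoff to keep it short.

```python
import concurrent.futures as cf
from typing import Callable, Sequence

Branch = Callable[[str], str]

def with_retries(fn: Branch, attempts: int = 3) -> Branch:
    """Wrap a branch so transient failures are retried before giving up."""
    def wrapped(arg: str) -> str:
        for _ in range(attempts - 1):
            try:
                return fn(arg)
            except Exception:
                continue  # a real system would log and back off here
        return fn(arg)  # final attempt: let the error propagate
    return wrapped

def fan_out_fan_in(branches: Sequence[Branch], query: str,
                   combine: Callable[[list[str]], str]) -> str:
    with cf.ThreadPoolExecutor() as pool:
        # fan-out: every branch runs concurrently on the same input
        futures = [pool.submit(with_retries(b), query) for b in branches]
        # fan-in: wait for all branches, keeping their original order
        results = [f.result() for f in futures]
    return combine(results)

calls = {"web": 0}

def flaky_web_search(q: str) -> str:
    calls["web"] += 1
    if calls["web"] == 1:
        raise TimeoutError("transient failure")  # fails once, then succeeds
    return f"web:{q}"

def docs_search(q: str) -> str:
    return f"docs:{q}"

merged = fan_out_fan_in([flaky_web_search, docs_search], "ADK", " | ".join)
print(merged)  # web:ADK | docs:ADK
```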

Notice the pattern. Both newcomers, and the rebuilt Microsoft offering, converged toward explicit control flow and away from pure free-form conversation. The field learned that production agents need control flow you can see and debug.

The question underneath: adopt a framework, or write the loop

Here is what the comparison tables never print. The honest first question is not which framework. It is whether to use one at all.

The case against is real. Frameworks in this space change fast, and the churn has a cost. LangChain alone restructured its API across v0.1, v0.2, and v0.3, and developers describe a steady upgrade anxiety, the fear that bumping a dependency triggers a cascade of refactors. AutoGen's path, from rewrite to fork to absorption into a different product, is the same story. Abstractions hide things, and a hidden control flow is hard to debug when an agent misbehaves. If your agent calls two or three tools in a straight line, a framework is mostly overhead. You can write that loop, with retries and a token cap, in an afternoon, and own every line.
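
Here is roughly what that afternoon produces for a straight-line pipeline. A minimal sketch, assuming each step is a hypothetical wrapper around one model or tool call that returns `(output_text, tokens_used)`; real SDKs report usage in their own formats, and only `ConnectionError` is treated as retryable here.

```python
import time
from typing import Callable

# Hypothetical step: one model or tool call reporting (text, tokens_used).
Step = Callable[[], tuple[str, int]]

class BudgetExceeded(RuntimeError):
    pass

def call_with_retries(step: Step, attempts: int = 3,
                      base_delay: float = 0.01) -> tuple[str, int]:
    for attempt in range(attempts - 1):
        try:
            return step()
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return step()  # final attempt: let any error propagate

def run_pipeline(steps: list[Step], token_cap: int) -> list[str]:
    used, outputs = 0, []
    for step in steps:
        text, tokens = call_with_retries(step)
        used += tokens
        # Checked after each call, so the step that crosses the cap has
        # already spent its tokens; the cap stops anything further.
        if used > token_cap:
            raise BudgetExceeded(f"used {used} tokens, cap is {token_cap}")
        outputs.append(text)
    return outputs

calls_made = {"summarize": 0}

def summarize() -> tuple[str, int]:
    calls_made["summarize"] += 1
    if calls_made["summarize"] == 1:
        raise ConnectionError("network blip")  # first call fails
    return "summary", 120

print(run_pipeline([lambda: ("search hits", 300), summarize], token_cap=1000))
```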

The case for is also real, and it sharpens as the system grows. The moment you need durable checkpointing, resumable runs, human-in-the-loop pauses, parallel branches, and tracing, you are building a framework whether you admit it or not. The choice is a known one with documentation and a community, or a private one only your team understands. The MAST failure taxonomy and Anthropic's own measurements, covered in the multi-agent debate, show how many ways coordination breaks once you go past a single agent. A framework encodes hard-won fixes for some of them.

A decision guide that holds up:

  • Linear flow, a handful of tools, one agent: skip the framework. A hand-written loop, or the thin OpenAI Agents SDK, is the right size. Do not buy infrastructure you will not use.
  • Production system needing persistence, crash recovery, human approval gates, and an audit trail: LangGraph, or Google ADK if you live on Google Cloud. The graph model earns its learning curve here.
  • A problem that cleanly splits into roles, and you want a prototype this week: CrewAI. Its speed is real, and you can harden later.
  • Already committed to the Microsoft stack: the Microsoft Agent Framework, not AutoGen.
  • Type safety and validated outputs are your top concern: look at PydanticAI, which treats agents as ordinary, type-checked Python.

Whatever you pick, keep your domain logic, prompts, and tools behind your own interfaces rather than scattered through framework-specific calls. The framework should be a layer you can replace, not a foundation you are married to. Given the churn rate, you will be glad you did.
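
One way to keep the framework replaceable is to let domain code depend only on a small interface you own. A minimal sketch using `typing.Protocol`; the `AgentRunner` interface, `HandWrittenRunner`, and `triage_ticket` are hypothetical names for illustration.

```python
from typing import Protocol

class AgentRunner(Protocol):
    """The only agent surface the rest of the codebase is allowed to see."""
    def run(self, task: str) -> str: ...

class HandWrittenRunner:
    """One implementation; a framework-backed adapter would sit beside it."""
    def run(self, task: str) -> str:
        return f"handled: {task}"  # stand-in for a real agent loop

def triage_ticket(ticket: str, runner: AgentRunner) -> str:
    # Domain logic depends on the interface, never on a framework import.
    return runner.run(f"classify ticket: {ticket}")

print(triage_ticket("login fails", HandWrittenRunner()))
```

Switching frameworks then means writing one new class with a `run` method, while the prompts and tools stay where they are.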

Future and impact

Two forces are pulling this category toward consolidation. The first is the protocol layer. The Model Context Protocol standardized how an agent reaches a tool, and Agent2Agent standardized how agents talk to each other, both now under Linux Foundation governance. As that wiring becomes shared infrastructure, it stops being a reason to pick one framework, and the frameworks compete on state, control flow, and observability instead. The second is convergence on design. OpenAI's SDK, Google ADK, and the rebuilt Microsoft Agent Framework all landed on explicit, graph-like control flow with checkpointing and human-in-the-loop. The free-form conversational model, exciting in 2023, turned out too loose for production.

The honest risk is that this stays a churning category for a while yet. APIs will keep moving, frameworks will keep merging and forking, and a tutorial written today may not compile in a year. That is the strongest argument for understanding the philosophies rather than memorizing the APIs. The graph model, the role model, and the conversation model are durable ideas. The function names are not. What a framework cannot give you either way is judgment about which design pattern a task actually needs, which is the work that decides whether the system is reliable.

For an enterprise moving agents from pilot to production, the framework is rarely the hard part. The hard part is the orchestration design underneath it: which steps are deterministic, where state must persist, where a human must approve, how failures are caught and traced. An implementation partner like Perform Digital spends its time there, on the architecture, treating the framework as a deliberate and replaceable choice rather than the starting point. Pick the worldview that matches your task, and the logo on it will matter less than you think.

Council summary

This post argues that LangGraph, CrewAI, and AutoGen are not rival feature sets but rival beliefs about what an agent is: a stateful graph you draw, a team of roles you staff, or a conversation you convene. It is current as of May 2026, with AutoGen now folded into the Microsoft Agent Framework and the whole field converging on explicit, graph-like control flow. The reader should leave able to match a framework to a system, and willing to skip all of them and write the loop when the task is small.
