📘 What LLM frameworks are (and why they matter)

An LLM framework is a library that sits between your application code and one or more large language models. At minimum it wraps provider APIs; at maximum it takes responsibility for prompt construction, tool calling, memory, retrieval, typed output, multi-step agent loops, evaluation and observability. Most of the work of building a production LLM feature is plumbing, and frameworks exist to save you from writing that plumbing from scratch.

In 2026 the category has matured in two directions at once. The big names (LangChain, LlamaIndex) have split into modular sub-projects with dedicated cloud platforms. At the same time, a new generation of smaller, more opinionated libraries (Mastra, PydanticAI, smolagents, BAML) has arrived for teams who found the older stacks too heavy. Model vendors have also started shipping their own agent SDKs (OpenAI Agents SDK, Claude Agent SDK), which handle the loop for their specific models.

This page lists the frameworks worth knowing, groups them by what they are good at, and gives you a short, neutral take on each. All links go to the official project site.

🗺️ The 2026 landscape at a glance

Rather than thinking of a single “best” framework, it helps to think in five overlapping layers:

  • Agent frameworks — run the model-tool-model loop, handle planning and multi-agent coordination.
  • RAG & data frameworks — ingest documents, chunk them, embed them, and retrieve relevant context at query time.
  • Orchestration & general — chains, graphs, routers and the glue that connects prompts, tools and data sources.
  • Typed output / structured generation — force the model to return valid JSON, Pydantic objects or schema-constrained strings.
  • Lightweight / single-purpose — one-file libraries that do one thing well, such as provider routing or validation.

Most production stacks mix several layers. A common 2026 shape: LiteLLM for provider routing, LlamaIndex or Haystack for retrieval, Instructor or BAML for structured output, and LangGraph or a vendor Agent SDK for the control loop.
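That layered shape can be sketched in miniature. Everything below is an illustrative stand-in, not the real LiteLLM, LlamaIndex or Instructor API: a router maps a model name to a provider, a retriever supplies context, and a parser enforces the shape of the reply.

```python
import json

def route(model_name):
    """Stand-in for a routing layer: map a model name to a provider callable."""
    providers = {
        "gpt": lambda prompt: json.dumps({"text": prompt.upper()}),
        "claude": lambda prompt: json.dumps({"text": prompt.lower()}),
    }
    return providers[model_name.split("-")[0]]

def retrieve(query, docs):
    """Stand-in retriever: naive substring match instead of vector search."""
    return [d for d in docs if query.lower() in d.lower()]

def parse_typed(raw):
    """Stand-in structured-output layer: parse and shape-check the reply."""
    data = json.loads(raw)
    if not isinstance(data.get("text"), str):
        raise ValueError("model reply has the wrong shape")
    return data

docs = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
llm = route("gpt-4o")
context = retrieve("paris", docs)
reply = parse_typed(llm(f"Answer from context: {context}"))
```

Each layer is replaceable in isolation, which is exactly why production stacks mix libraries rather than committing to one framework for everything.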

📊 Comparison table — every framework at a glance

| Framework | Language | Type | Maturity | Licence | Best for |
| --- | --- | --- | --- | --- | --- |
| LangChain | Python / TS | Orchestration | Mature | MIT | General-purpose chains, widest integration surface |
| LangGraph | Python / TS | Agent / graph | Mature | MIT | Stateful, branching agent workflows |
| LlamaIndex | Python / TS | RAG | Mature | MIT | Ingesting, indexing and querying your own data |
| Haystack | Python | RAG | Mature | Apache 2.0 | Production search and QA pipelines |
| RAGFlow | Python | RAG | Growing | Apache 2.0 | Deep document understanding with OCR and layout |
| Verba | Python | RAG | Growing | BSD-3 | Drop-in RAG chatbot on top of Weaviate |
| Semantic Kernel | .NET / Py / Java | Orchestration | Mature | MIT | Enterprise .NET and Microsoft-stack apps |
| Vercel AI SDK | TypeScript | Orchestration | Mature | Apache 2.0 | TypeScript web apps, streaming UIs, edge runtimes |
| AutoGen | Python / .NET | Agents | Mature | CC-BY-4.0 / MIT | Multi-agent conversation, research-grade experiments |
| CrewAI | Python | Agents | Mature | MIT | Role-based multi-agent teams with ergonomic APIs |
| OpenAI Agents SDK | Python / JS | Agents | Stable | MIT | Agents against GPT models with handoffs and guardrails |
| Claude Agent SDK | Python / TS | Agents | Stable | MIT | Agents against Claude with tool use and subagents |
| Mastra | TypeScript | Agents | Growing | Apache / Elastic | End-to-end TS agent apps with workflows and evals |
| PydanticAI | Python | Agents | Growing | MIT | Typed Python agents with Pydantic at the core |
| smolagents | Python | Agents | Growing | Apache 2.0 | Minimal code-writing agents, Hugging Face ecosystem |
| Instructor | Python / TS / Go | Typed output | Mature | MIT | Pydantic-typed outputs from any major provider |
| Outlines | Python | Typed output | Mature | Apache 2.0 | Regex / grammar-constrained generation, local models |
| Guidance | Python | Typed output | Mature | MIT | Interleaving generation and control flow |
| BAML | DSL + Py/TS | Typed output | Growing | Apache 2.0 | Schema-first prompts with a dedicated DSL |
| LiteLLM | Python | Routing | Mature | MIT | One API for 100+ providers, cost and fallback control |
| Guardrails | Python / JS | Validation | Mature | Apache 2.0 | Input / output validation and safety policies |
| DSPy | Python | Programming | Mature | MIT | Declarative prompt programs you can optimise |

🤖 Agent frameworks

Agent frameworks run the loop: the model picks a tool, the tool returns a result, the model reasons about the result, and the loop continues until a task is finished. They differ in how much structure they impose, how multi-agent coordination works, and which models they are tuned for.
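The loop itself is small enough to sketch. Here `fake_model` and the `add` tool are stand-ins, not any framework's real API; real agent frameworks wrap this same core with planning, streaming, retries and tracing.

```python
def fake_model(messages):
    """Stub LLM: request a tool call once, then produce a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {messages[-1]['content']}"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "answer" in reply:                           # model decided it is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent hit the step limit without finishing")
```

The `max_steps` cap is the part every serious framework also ships: without it, a confused model can loop forever.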

LangChain Agents
LangChain Inc.
Python / TS
#agents #tools #mature
The long-standing agent abstraction inside LangChain. In 2026 most new agent work has moved to LangGraph, but the classic AgentExecutor interface is still widely deployed and well documented. Pairs with LangSmith for tracing and evaluations.
MIT · Mature · Many integrations
AutoGen
Microsoft Research
Python / .NET
#multi-agent #research
A multi-agent framework that models systems as conversations between specialised agents (for example a planner, a coder, a critic). The v0.4 rewrite introduced a typed, event-driven runtime; AutoGen Studio gives you a low-code UI on top.
MIT / CC-BY-4.0 · Mature · Multi-agent
CrewAI
CrewAI Inc.
Python
#multi-agent #ergonomic
Role-based multi-agent framework: you define agents with goals and tools, then assemble them into a “crew” that executes tasks. Known for a friendly API, strong docs and a growing enterprise offering with observability.
MIT · Mature · Role-based
OpenAI Agents SDK
OpenAI
Python / JS
#vendor-sdk #handoffs
OpenAI’s official production agent framework. Provides agents, tools, handoffs between agents, guardrails and tracing; replaces the older Swarm experiment. Provider-agnostic in theory but optimised for GPT and Responses API models.
MIT · Stable · Official
Claude Agent SDK
Anthropic
Python / TS
#vendor-sdk #tool-use
Anthropic’s official agent SDK, extracted from the system that powers Claude Code. Exposes the same tool-use loop, subagent model, session management and MCP integrations that the CLI uses, for building custom agents against Claude.
MIT · Stable · Official
Mastra
Mastra
TypeScript
#typescript #workflows #evals
Open-source TypeScript framework for agents, workflows, RAG and evaluations, built on top of the Vercel AI SDK. Targets full-stack JS teams who want an all-in-one toolkit rather than stitching many libraries together.
Apache / Elastic · Growing · TS-first
PydanticAI
Pydantic Services
Python
#typed #pydantic
Agent framework from the team behind Pydantic. Treats models, tools and responses as fully typed Python objects, with provider-agnostic adapters. Appeals to developers who want FastAPI-style ergonomics for LLM apps.
MIT · Growing · Typed
smolagents
Hugging Face
Python
#minimal #code-agents
Deliberately small agent library from Hugging Face. Emphasises code-writing agents, where the model emits Python as its action space. Pairs naturally with open-source models from the Hugging Face Hub.
Apache 2.0 · Growing · Lightweight

📚 RAG & data frameworks

RAG (retrieval-augmented generation) frameworks focus on getting the right context into the prompt. They handle document ingestion, chunking, embedding, vector storage, hybrid search, re-ranking and query pipelines. If your product is “chat with your documents”, you live in this layer.
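The core retrieval step can be sketched end to end with a toy bag-of-words "embedding". Real frameworks substitute learned embeddings, vector stores and re-rankers, but the chunk-embed-rank shape below is the same.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Fixed-size word windows; real chunkers respect sentences and layout."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Rank chunks by similarity to the query; return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("The billing API uses OAuth2 tokens. Tokens expire after one hour. "
       "Invoices are generated nightly and emailed to the account owner.")
top = retrieve("when do tokens expire", chunk(doc))
```

Everything a RAG framework adds (loaders, hybrid search, re-ranking, evaluation) hangs off these three functions.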

LlamaIndex
LlamaIndex Inc.
Python / TS
#rag #indexing #agents
The reference framework for building RAG and “agents over your data”. Strong loaders, chunking strategies, query engines and evaluation tooling; LlamaCloud adds a managed ingestion and parsing service. Integrates with most vector stores and model providers.
MIT · Mature · Biggest RAG ecosystem
Haystack
deepset
Python
#rag #pipelines #enterprise
Mature pipeline framework for search, question answering and RAG. Built around composable components and explicit pipelines, which maps well to production deployments. Actively developed by deepset, which also offers a commercial platform.
Apache 2.0 · Mature · Production pipelines
RAGFlow
InfiniFlow
Python
#rag #document-ai
Open-source RAG engine with an emphasis on deep document understanding — OCR, layout parsing and table extraction — for complex PDFs and scanned files. Ships with a web UI and is popular in regulated and document-heavy verticals.
Apache 2.0 · Growing · Docs / OCR
Verba
Weaviate
Python
#rag #drop-in
Opinionated open-source RAG chatbot from Weaviate. Provides a ready-made stack — ingestion, retrieval, chat UI — that is easy to self-host and tweak. A good starting point when you want a working RAG app rather than a toolkit.
BSD-3 · Growing · Opinionated

🧵 Orchestration & general-purpose

Orchestration frameworks are the generalists. They give you chains, graphs, memory, tools, callbacks and integrations, so you can wire together prompts, retrievers, APIs and model calls into arbitrary flows. They tend to be the broadest and also the most opinionated about how an LLM app should be structured.
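The underlying "chain" abstraction is small: each step is a function from state to state, and a chain just runs the steps in order. A minimal sketch with a stubbed model call (none of these names come from a real library):

```python
def make_prompt(state):
    state["prompt"] = f"Summarise: {state['input']}"
    return state

def call_model(state):
    # stubbed model call; a real chain would invoke an LLM here
    state["output"] = state["prompt"].split(": ", 1)[1][:20]
    return state

def postprocess(state):
    state["output"] = state["output"].strip().capitalize()
    return state

def run_chain(steps, state):
    """Thread the state dict through each step in turn."""
    for step in steps:
        state = step(state)
    return state

result = run_chain([make_prompt, call_model, postprocess],
                   {"input": "long article text about frameworks"})
```

Graphs (as in LangGraph) generalise this straight line into nodes and edges so the next step can depend on the current state.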

LangChain
LangChain Inc.
Python / TS
#orchestration #integrations
The most widely used LLM framework. Provides chat models, prompts, chains, tools, retrievers, memory and hundreds of integrations across both Python and TypeScript. Often criticised for abstraction overhead, but still the default starting point for many teams.
MIT · Mature · Widest surface
LangGraph
LangChain Inc.
Python / TS
#graph #stateful #agents
Low-level graph framework from the LangChain team for building stateful, controllable agents. Flows are expressed as nodes and edges with explicit state; supports streaming, human-in-the-loop and durable execution via LangGraph Platform.
MIT · Mature · Stateful graphs
Semantic Kernel
Microsoft
.NET / Py / Java
#enterprise #dotnet
Microsoft’s SDK for integrating LLMs into enterprise applications, particularly on .NET. Centres on plugins (tools), planners and a kernel abstraction; first-class support for Azure OpenAI and Copilot-style scenarios.
MIT · Mature · .NET / Azure
Vercel AI SDK
Vercel
TypeScript
#web #streaming #edge
The de facto LLM SDK for TypeScript web apps. Provides a unified chat / generate / stream API across providers, React hooks, structured output helpers and tool calling. Runs well on serverless and edge runtimes.
Apache 2.0 · Mature · TS / web

🧱 Typed output & structured generation

A large share of production bugs in LLM apps are shape problems: the model returns nearly-valid JSON that your parser trips over. This group of frameworks exists to force the model to produce output that conforms to a schema, either at the token level or by validating and retrying.
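The validate-and-retry half of this space can be sketched with stdlib `json` alone. The `flaky_model` stub returns nearly-valid JSON on the first call and valid JSON on the retry; every name here is illustrative, not any library's real API.

```python
import json

calls = {"n": 0}

def flaky_model(prompt):
    """Stub LLM: nearly-valid JSON first, valid JSON on the retry."""
    calls["n"] += 1
    if calls["n"] == 1:
        return '{"name": "Ada", "age": 36,}'   # trailing comma breaks parsing
    return '{"name": "Ada", "age": 36}'

def generate_typed(prompt, max_retries=2):
    """Call the model, check the output shape, retry on any failure."""
    last_error = None
    for _ in range(max_retries + 1):
        raw = flaky_model(prompt)
        try:
            data = json.loads(raw)
            if not (isinstance(data.get("name"), str)
                    and isinstance(data.get("age"), int)):
                raise ValueError("wrong shape")
            return data
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc   # libraries feed the error back into the prompt
    raise RuntimeError(f"no valid output: {last_error}")
```

Token-level approaches (Outlines, Guidance) prevent the invalid output from being generated at all, rather than catching it afterwards.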

Instructor
Jason Liu et al.
Python / TS / Go
#pydantic #json
A thin wrapper that turns Pydantic models into response schemas across providers. You define a Python class, Instructor handles the schema, the function call and validation. Ports exist for TypeScript and Go. A go-to for “just give me typed JSON”.
MIT · Mature · Cross-provider
Outlines
dottxt
Python
#constrained #grammars #local
Constrained generation library: regex, JSON schema or context-free grammars are compiled into logit masks so the model can only emit valid tokens. Works especially well with open-source models you run yourself, where you have logit access.
Apache 2.0 · Mature · Local models
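The token-masking idea behind constrained generation can be shown in miniature. This toy enumerates the valid outputs and only allows tokens that keep the text a prefix of one of them; libraries like Outlines compile regexes and grammars into efficient logit masks instead of enumerating strings.

```python
VALID = ['{"answer": "yes"}', '{"answer": "no"}']      # the target "grammar"
VOCAB = ['{"answer": ', '"yes"', '"no"', '"maybe"', '}', 'hello']

def allowed(prefix):
    """Vocabulary tokens that keep `prefix` extendable to a valid output."""
    return [t for t in VOCAB if any(v.startswith(prefix + t) for v in VALID)]

def constrained_decode(pick):
    """Greedy decode where `pick` is the (stubbed) model's token choice,
    restricted at every step to the currently allowed tokens."""
    out = ""
    while out not in VALID:
        out += pick(allowed(out))
    return out

# a "model" that always takes the first allowed token
result = constrained_decode(lambda tokens: tokens[0])
```

However badly the stub model "wants" to emit `"maybe"` or `hello`, the mask makes those tokens unreachable, so the output is valid by construction.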
Guidance
Guidance AI
Python
#templates #control
Lets you interleave fixed text, generated text and control flow (loops, conditionals) into a single program that drives the model. Originally from Microsoft, now maintained independently; popular when you want tight control over the generation trace.
MIT · Mature · Control flow
BAML
Boundary
DSL + Python / TS
#dsl #schema-first
A small domain-specific language for writing prompts as typed functions. You define inputs, outputs and prompt text in .baml files; the toolchain generates typed Python or TypeScript clients with deterministic parsing. Strong VS Code tooling.
Apache 2.0 · Growing · DSL

🧰 Lightweight / single-purpose

The last group is small libraries that do one thing and do it well. They intentionally do not try to be a framework; they slot into whatever stack you have.

LiteLLM
BerriAI
Python
#router #proxy #fallback
A single client and optional proxy that speaks the OpenAI API shape to over 100 providers — OpenAI, Anthropic, Google, Azure, Bedrock, open-source hosts and more. Adds load balancing, budgets, retries and observability; widely used as a backplane.
MIT · Mature · 100+ providers
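The routing-with-fallback behaviour is easy to sketch with stubbed providers. These functions are illustrative stand-ins, not the real LiteLLM API; the point is the single call shape with ordered fallback.

```python
def provider_a(prompt):
    """Stub provider that is currently failing."""
    raise TimeoutError("provider A timed out")

def provider_b(prompt):
    """Stub provider returning an OpenAI-shaped response."""
    return {"choices": [{"message": {"content": f"echo: {prompt}"}}]}

def completion(prompt, providers):
    """Try each (name, fn) provider in order; fall back on any error."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:   # real routers distinguish retryable errors
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

used, response = completion("hi", [("a", provider_a), ("b", provider_b)])
```

Because every provider is presented behind the same response shape, swapping the primary model is a configuration change rather than a code change.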
Guardrails
Guardrails AI
Python / JS
#validation #safety
Validation layer for LLM inputs and outputs. You declare rules (PII leakage, toxicity, competitor names, schema shape, etc.) via reusable validators; the Guardrails Hub has community ones. Framework-agnostic and easy to bolt on.
Apache 2.0 · Mature · Policy layer
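The validator-pipeline idea can be sketched with two deliberately naive stand-in checks; real Guardrails validators are configurable, reusable and distributed via the Hub, but the compose-and-gate shape is the same.

```python
import re

def no_email(text):
    """Naive PII check: reject anything that looks like an email address."""
    if re.search(r"\S+@\S+\.\S+", text):
        raise ValueError("output leaks an email address")

def max_length(limit):
    """Validator factory: reject outputs longer than `limit` characters."""
    def check(text):
        if len(text) > limit:
            raise ValueError(f"output longer than {limit} chars")
    return check

def guard(text, validators):
    """Run every validator; return the text only if all of them pass."""
    for validate in validators:
        validate(text)
    return text

guard("All clear.", [no_email, max_length(100)])   # passes silently
```

Because the guard only sees strings in and strings out, it slots in front of or behind any agent or chain without caring which framework produced the text.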
DSPy
Stanford NLP
Python
#optimisation #programs
Declare LLM programs as modules with typed signatures, then let DSPy’s compilers optimise the prompts, demonstrations or even fine-tune weights against a metric. Shifts the mindset from “tune prompts by hand” to “program and compile”.
MIT · Mature · Research-led
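The "compile against a metric" idea in miniature: try candidate prompts on a tiny dev set and keep whichever scores best. The model is a stub and nothing here is DSPy's real API; DSPy's optimisers also generate demonstrations and instructions rather than just selecting from a fixed list.

```python
def stub_model(prompt, question):
    """Pretend the 'one word' instruction makes the model concise."""
    answer = {"capital of France?": "Paris", "2 + 2?": "4"}[question]
    return answer if "one word" in prompt else f"The answer is {answer}."

DEV_SET = [("capital of France?", "Paris"), ("2 + 2?", "4")]
CANDIDATES = ["Answer the question:", "Answer with one word:"]

def exact_match(pred, gold):
    return pred == gold

def compile_prompt(candidates, dev_set, metric):
    """Score each candidate prompt on the dev set; return the best one."""
    def score(prompt):
        return sum(metric(stub_model(prompt, q), gold) for q, gold in dev_set)
    return max(candidates, key=score)

best = compile_prompt(CANDIDATES, DEV_SET, exact_match)
```

The shift is that the prompt becomes an optimisable parameter judged by a metric, not hand-tuned prose.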

🎯 Decision matrix — use X if you need Y

Chat with your documents

Start with LlamaIndex or Haystack. Add RAGFlow if your corpus is heavy on scanned PDFs, tables or layout-sensitive documents.

Build a TypeScript web app

Pick Vercel AI SDK for the client-facing layer and Mastra if you need agents, workflows and evals in the same codebase.

Multi-step autonomous agents

Use LangGraph for fine-grained control over state, AutoGen for conversation-style multi-agent research, or CrewAI for role-based teams with ergonomic APIs.

Stay close to the model vendor

Use OpenAI Agents SDK for GPT-family models or Claude Agent SDK for Claude. Both ship official primitives for tools, handoffs and guardrails.

Strict typed JSON out

Instructor for cloud models with Pydantic schemas; Outlines for regex or grammar-constrained generation on local models; BAML if you want a schema-first DSL.

One provider API, many models

Put LiteLLM in front of everything. It abstracts over 100+ providers behind the OpenAI API shape and adds retries, budgets and routing.

Enterprise .NET stack

Semantic Kernel is the clearest fit, especially alongside Azure OpenAI and existing Microsoft identity, logging and deployment plumbing.

Small, minimal agents

smolagents for code-writing agents; PydanticAI if you want the typed-Python feel of FastAPI for LLM apps.

Tighten safety and validation

Layer Guardrails on top of whatever agent or chain you use — it is framework-agnostic and can gate both inputs and outputs.

Optimise prompts systematically

Reach for DSPy. It is the clearest answer to “treat prompts as code that can be compiled and optimised against a metric”.

🚫 When not to use a framework

A framework is not free. You pay in dependencies, abstraction layers, performance overhead and the risk that a future breaking change ripples through your code. There are real cases where calling the provider SDK (or plain HTTP) is the better choice:

  • Single-shot prompts or thin wrappers. If your feature is “send a prompt, show the answer”, the OpenAI, Anthropic or Gemini SDK plus a few lines of code will be shorter, faster and easier to debug than any framework.
  • Simple chatbots with short memory. You can manage a message list and a system prompt in a few dozen lines; frameworks become necessary when tool use, retrieval or multi-step planning enter the picture.
  • Prototyping with unusual providers or models. When you are experimenting with a new endpoint that lacks framework support, raw HTTP often wins on velocity.
  • Latency-critical code paths. Heavy orchestration layers can add noticeable overhead in streaming chat; measure before committing.
  • You already have strong internal libraries. Some mature teams have grown their own mini-framework that fits their codebase and domain better than anything off-the-shelf.

A useful heuristic: start with the vendor SDK. Move to a framework the first time you need retrieval, typed output, a tool loop, or a provider swap.

❓ Frequently asked questions

What is an LLM framework?

An LLM framework is a library or toolkit that sits between your application code and a large language model. It handles prompt construction, tool calling, memory, structured output, retrieval, evaluation and multi-step agent loops so you do not have to write all of that plumbing against a raw HTTP API.

Do I need an LLM framework at all?

No. For single-shot prompts, simple chatbots or thin wrappers, calling the provider SDK directly is often faster and easier to debug. Frameworks pay off once you need agents with tools, retrieval over your own data, structured outputs, routing across providers or repeatable evaluations.

What is the difference between LangChain and LlamaIndex?

They overlap heavily in 2026 but have different centres of gravity. LangChain and LangGraph focus on general orchestration, agents and tool use; LlamaIndex is optimised for ingesting, indexing and querying your own documents (RAG). Many teams use both in the same stack.

Which agent framework is best?

There is no single winner. OpenAI Agents SDK and Claude Agent SDK are the most tightly integrated with their respective models; LangGraph and AutoGen target complex multi-step or multi-agent flows; CrewAI and Mastra prioritise developer ergonomics; smolagents and PydanticAI keep the surface area small.

Are these frameworks free to use?

Most are open source under permissive licences (MIT, Apache 2.0) and free to self-host. Some vendors offer paid hosted platforms on top — for example LangSmith for LangChain, LlamaCloud for LlamaIndex, or Mastra Cloud — but the core framework is typically free.

Can I combine several frameworks in one stack?

Yes, and it is common. A typical 2026 stack might use LiteLLM for provider routing, Instructor or BAML for structured output, LlamaIndex for retrieval, and LangGraph or an Agent SDK for the control loop. Framework-agnostic pieces like Guardrails and LiteLLM are specifically designed to slot in alongside others.

Disclaimer: Licences, maturity labels and feature descriptions reflect the state of each project at the time of writing (April 2026) and are based on publicly available documentation. LLM frameworks iterate quickly; always check the upstream docs before committing to one. AIHumanLove.com is not affiliated with any of the listed projects or vendors and does not receive compensation for inclusion on this page.