What LLM frameworks are (and why they matter)
An LLM framework is a library that sits between your application code and one or more large language models. At minimum it wraps provider APIs; at maximum it takes responsibility for prompt construction, tool calling, memory, retrieval, typed output, multi-step agent loops, evaluation and observability. Most of the work of building a production LLM feature is plumbing, and frameworks exist to save you from writing that plumbing from scratch.
In 2026 the category has matured in two directions at once. The big names (LangChain, LlamaIndex) have split into modular sub-projects with dedicated cloud platforms. At the same time, a new generation of smaller, more opinionated libraries — Mastra, PydanticAI, smolagents, BAML — have arrived for teams who found the older stacks too heavy. Model vendors have also started shipping their own agent SDKs (OpenAI Agents SDK, Claude Agent SDK), which handle the loop for their specific models.
This page lists the frameworks worth knowing, groups them by what they are good at, and gives you a short, neutral take on each. All links go to the official project site.
The 2026 landscape at a glance
Rather than hunting for a single “best” framework, it helps to think in five overlapping layers:
- Agent frameworks — run the model-tool-model loop, handle planning and multi-agent coordination.
- RAG & data frameworks — ingest documents, chunk them, embed them, and retrieve relevant context at query time.
- Orchestration & general — chains, graphs, routers and the glue that connects prompts, tools and data sources.
- Typed output / structured generation — force the model to return valid JSON, Pydantic objects or schema-constrained strings.
- Lightweight / single-purpose — one-file libraries that do one thing well, such as provider routing or validation.
Most production stacks mix several layers. A common 2026 shape: LiteLLM for provider routing, LlamaIndex or Haystack for retrieval, Instructor or BAML for structured output, and LangGraph or a vendor Agent SDK for the control loop.
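To make that shape concrete, here is a minimal sketch of two of those layers working together: LiteLLM for routing and Instructor for typed output. It assumes Instructor’s LiteLLM adapter (instructor.from_litellm); the model name and schema are illustrative.

```python
# Sketch: LiteLLM handles provider routing, Instructor enforces a schema.
# Assumes Instructor's litellm adapter; model and schema are illustrative.
import instructor
import litellm
from pydantic import BaseModel

class Ticket(BaseModel):
    summary: str
    priority: str

# Wrap litellm's completion function so every call returns a typed object.
client = instructor.from_litellm(litellm.completion)

ticket = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Ticket,
    messages=[{"role": "user", "content": "Triage this bug report: ..."}],
)
print(ticket.priority)
```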
Comparison table — every framework at a glance
| Framework | Language | Type | Maturity | Licence | Best for |
|---|---|---|---|---|---|
| LangChain | Python / TS | Orchestration | Mature | MIT | General-purpose chains, widest integration surface |
| LangGraph | Python / TS | Agent / graph | Mature | MIT | Stateful, branching agent workflows |
| LlamaIndex | Python / TS | RAG | Mature | MIT | Ingesting, indexing and querying your own data |
| Haystack | Python | RAG | Mature | Apache 2.0 | Production search and QA pipelines |
| RAGFlow | Python | RAG | Growing | Apache 2.0 | Deep document understanding with OCR and layout |
| Verba | Python | RAG | Growing | BSD-3 | Drop-in RAG chatbot on top of Weaviate |
| Semantic Kernel | .NET / Py / Java | Orchestration | Mature | MIT | Enterprise .NET and Microsoft-stack apps |
| Vercel AI SDK | TypeScript | Orchestration | Mature | Apache 2.0 | TypeScript web apps, streaming UIs, edge runtimes |
| AutoGen | Python / .NET | Agents | Mature | CC-BY-4.0 / MIT | Multi-agent conversation, research-grade experiments |
| CrewAI | Python | Agents | Mature | MIT | Role-based multi-agent teams with ergonomic APIs |
| OpenAI Agents SDK | Python / JS | Agents | Stable | MIT | Agents against GPT models with handoffs and guardrails |
| Claude Agent SDK | Python / TS | Agents | Stable | MIT | Agents against Claude with tool use and subagents |
| Mastra | TypeScript | Agents | Growing | Apache / Elastic | End-to-end TS agent apps with workflows and evals |
| PydanticAI | Python | Agents | Growing | MIT | Typed Python agents with Pydantic at the core |
| smolagents | Python | Agents | Growing | Apache 2.0 | Minimal code-writing agents, Hugging Face ecosystem |
| Instructor | Python / TS / Go | Typed output | Mature | MIT | Pydantic-typed outputs from any major provider |
| Outlines | Python | Typed output | Mature | Apache 2.0 | Regex / grammar-constrained generation, local models |
| Guidance | Python | Typed output | Mature | MIT | Interleaving generation and control flow |
| BAML | DSL + Py/TS | Typed output | Growing | Apache 2.0 | Schema-first prompts with a dedicated DSL |
| LiteLLM | Python | Routing | Mature | MIT | One API for 100+ providers, cost and fallback control |
| Guardrails | Python / JS | Validation | Mature | Apache 2.0 | Input / output validation and safety policies |
| DSPy | Python | Programming | Mature | MIT | Declarative prompt programs you can optimise |
Agent frameworks
Agent frameworks run the loop: the model picks a tool, the tool returns a result, the model reasons about the result, and the loop continues until a task is finished. They differ in how much structure they impose, how multi-agent coordination works, and which models they are tuned for.
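Stripped to its essentials, that loop looks something like the sketch below, written against the raw OpenAI SDK. The tool, model name and prompt are illustrative, and retries, step limits and error handling are omitted.

```python
# Sketch: the hand-rolled model-tool-model loop that agent frameworks automate.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
while True:
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools
    ).choices[0].message
    if not reply.tool_calls:       # no tool requested: this is the final answer
        print(reply.content)
        break
    messages.append(reply)         # keep the assistant turn in context
    for call in reply.tool_calls:  # run each requested tool...
        args = json.loads(call.function.arguments)
        messages.append({          # ...and feed the result back to the model
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
```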
LangChain’s AgentExecutor, one long-standing packaged version of exactly this loop, is still widely deployed and well documented; it pairs with LangSmith for tracing and evaluations.
RAG & data frameworks
RAG (retrieval-augmented generation) frameworks focus on getting the right context into the prompt. They handle document ingestion, chunking, embedding, vector storage, hybrid search, re-ranking and query pipelines. If your product is “chat with your documents”, you live in this layer.
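The canonical “chat with your documents” flow can be sketched in a few lines with LlamaIndex; this assumes an OPENAI_API_KEY in the environment and documents in a local data directory.

```python
# Minimal RAG sketch with LlamaIndex: ingest, index, query.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # ingest and parse files
index = VectorStoreIndex.from_documents(documents)     # chunk, embed, store
query_engine = index.as_query_engine()                 # retrieval + answer synthesis
print(query_engine.query("What does the contract say about termination?"))
```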
Orchestration & general-purpose
Orchestration frameworks are the generalists. They give you chains, graphs, memory, tools, callbacks and integrations, so you can wire together prompts, retrievers, APIs and model calls into arbitrary flows. They tend to be the broadest and also the most opinionated about how an LLM app should be structured.
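The simplest form of that glue is a chain. With LangChain’s expression language, for instance, a prompt, a model and a parser compose into a single invocable pipeline (a sketch; the model name is a placeholder):

```python
# Sketch: a prompt, model and output parser piped into one chain.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Translate to French: {text}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
print(chain.invoke({"text": "good morning"}))
```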
Typed output & structured generation
A large share of production bugs in LLM apps are shape problems: the model returns nearly-valid JSON that your parser trips over. This group of frameworks exists to force the model to produce output that conforms to a schema, either at the token level or by validating and retrying.
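Here is roughly what the validate-and-retry approach looks like with Instructor: the Pydantic model is the contract, and a failed validation triggers a re-ask (a sketch; the model name and schema are placeholders).

```python
# Sketch: schema-enforced extraction with Instructor's retry-on-failure.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total_eur: float

client = instructor.from_openai(OpenAI())
invoice = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Invoice,  # output must parse and validate as Invoice
    max_retries=2,           # on validation failure, re-ask with the error
    messages=[{"role": "user", "content": "Extract: ACME GmbH invoice, 1240.50 EUR total"}],
)
print(invoice.total_eur)
```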
BAML takes the schema-first route instead: you define types and prompt functions in .baml files, and the toolchain generates typed Python or TypeScript clients with deterministic parsing, backed by strong VS Code tooling.
Lightweight / single-purpose
The last group is small libraries that do one thing and do it well. They intentionally do not try to be a framework; they slot into whatever stack you have.
Decision matrix — use X if you need Y
Chat with your documents
Start with LlamaIndex or Haystack. Add RAGFlow if your corpus is heavy on scanned PDFs, tables or layout-sensitive documents.
Build a TypeScript web app
Pick Vercel AI SDK for the client-facing layer and Mastra if you need agents, workflows and evals in the same codebase.
Multi-step autonomous agents
Use LangGraph for fine-grained control over state, AutoGen for conversation-style multi-agent research, or CrewAI for role-based teams with ergonomic APIs.
Stay close to the model vendor
Use OpenAI Agents SDK for GPT-family models or Claude Agent SDK for Claude. Both ship official primitives for tools, handoffs and guardrails.
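Both SDKs keep the surface small. The OpenAI Agents SDK quickstart shape, for instance, is roughly this (a sketch; the agent name and prompt are illustrative):

```python
# Sketch: a minimal agent with the OpenAI Agents SDK.
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
result = Runner.run_sync(agent, "Summarise the main risks in this plan: ...")
print(result.final_output)
```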
Strict typed JSON out
Instructor for cloud models with Pydantic schemas; Outlines for regex or grammar-constrained generation on local models; BAML if you want a schema-first DSL.
One provider API, many models
Put LiteLLM in front of everything. It abstracts over 100+ providers behind the OpenAI API shape and adds retries, budgets and routing.
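In practice that means one call shape for every backend, with only the model string changing (a sketch; model names are placeholders):

```python
# Sketch: the same OpenAI-shaped call routed to two different providers.
from litellm import completion

for model in ("gpt-4o-mini", "anthropic/claude-sonnet-4-5"):
    resp = completion(model=model, messages=[{"role": "user", "content": "hi"}])
    print(model, resp.choices[0].message.content)
```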
Enterprise .NET stack
Semantic Kernel is the clearest fit, especially alongside Azure OpenAI and existing Microsoft identity, logging and deployment plumbing.
Small, minimal agents
smolagents for code-writing agents; PydanticAI if you want the typed-Python feel of FastAPI for LLM apps.
Tighten safety and validation
Layer Guardrails on top of whatever agent or chain you use — it is framework-agnostic and can gate both inputs and outputs.
Optimise prompts systematically
Reach for DSPy. It is the clearest answer to “treat prompts as code that can be compiled and optimised against a metric”.
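In DSPy you declare a signature rather than writing a prompt; the framework compiles the prompt and can later optimise it against a metric (a sketch; the model name is a placeholder and the API reflects recent DSPy versions):

```python
# Sketch: a declarative DSPy program; the prompt itself is generated.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
qa = dspy.ChainOfThought("question -> answer")  # a signature, not a prompt string
print(qa(question="Why does chunk size matter in RAG?").answer)
```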
When not to use a framework
A framework is not free. You pay in dependencies, abstraction layers, performance overhead and the risk that a future breaking change ripples through your code. There are real cases where calling the provider SDK (or plain HTTP) is the better choice:
- Single-shot prompts or thin wrappers. If your feature is “send a prompt, show the answer”, the OpenAI, Anthropic or Gemini SDK plus a few lines of code will be shorter, faster and easier to debug than any framework.
- Simple chatbots with short memory. You can manage a message list and a system prompt in a few dozen lines (see the sketch after this list); frameworks become necessary when tool use, retrieval or multi-step planning enter the picture.
- Prototyping with unusual providers or models. When you are experimenting with a new endpoint that lacks framework support, raw HTTP often wins on velocity.
- Latency-critical code paths. Heavy orchestration layers can add noticeable overhead in streaming chat; measure before committing.
- You already have strong internal libraries. Some mature teams have grown their own mini-framework that fits their codebase and domain better than anything off-the-shelf.
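For reference, the no-framework chatbot from the second bullet really is this small (a sketch with the OpenAI SDK; the model name is a placeholder):

```python
# Sketch: a frameworkless chatbot: system prompt + message list + SDK call.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a concise assistant."}]

while True:
    messages.append({"role": "user", "content": input("> ")})
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```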
A useful heuristic: start with the vendor SDK. Move to a framework the first time you need retrieval, typed output, a tool loop, or a provider swap.
Frequently asked questions
What is an LLM framework?
An LLM framework is a library or toolkit that sits between your application code and a large language model. It handles prompt construction, tool calling, memory, structured output, retrieval, evaluation and multi-step agent loops so you do not have to write all of that plumbing against a raw HTTP API.
Do I always need one?
No. For single-shot prompts, simple chatbots or thin wrappers, calling the provider SDK directly is often faster and easier to debug. Frameworks pay off once you need agents with tools, retrieval over your own data, structured outputs, routing across providers or repeatable evaluations.
How do LangChain and LlamaIndex differ?
They overlap heavily in 2026 but have different centres of gravity. LangChain and LangGraph focus on general orchestration, agents and tool use; LlamaIndex is optimised for ingesting, indexing and querying your own documents (RAG). Many teams use both in the same stack.
Which agent framework is best?
There is no single winner. OpenAI Agents SDK and Claude Agent SDK are the most tightly integrated with their respective models; LangGraph and AutoGen target complex multi-step or multi-agent flows; CrewAI and Mastra prioritise developer ergonomics; smolagents and PydanticAI keep the surface area small.
Are these frameworks free?
Most are open source under permissive licences (MIT, Apache 2.0) and free to self-host. Some vendors offer paid hosted platforms on top — for example LangSmith for LangChain, LlamaCloud for LlamaIndex, or Mastra Cloud — but the core framework is typically free.
Can I mix several frameworks in one stack?
Yes, and it is common. A typical 2026 stack might use LiteLLM for provider routing, Instructor or BAML for structured output, LlamaIndex for retrieval, and LangGraph or an Agent SDK for the control loop. Framework-agnostic pieces like Guardrails and LiteLLM are specifically designed to slot in alongside others.