What is Hermes Agent?

Hermes Agent is an open-source autonomous AI agent built by Nous Research, released in February 2026 under the MIT licence. Unlike coding copilots that live inside an IDE or chatbot wrappers around a single API, Hermes is a self-improving agent that runs on your own infrastructure, persists memory across sessions, and accumulates capability the longer it runs. It reached 135,000 GitHub stars in under three months, making it one of the fastest-growing open-source agent frameworks of the year.

The agent runs on Linux, macOS, and WSL2 with no prerequisites beyond curl and a shell: the installer handles everything else from a single terminal command. The interface is a TUI (terminal user interface), so it operates entirely from the command line. Once running, Hermes connects to whichever AI model provider you select, and all your conversations and task history are stored locally.

The feature that separates Hermes from simpler agents is /goal. Giving Hermes a goal rather than a single task tells it to work autonomously until that goal is achieved — cycling through tool calls, checking its own output, and continuing without prompting. This loop can run for hours and work through thousands of tool calls, making it well-suited to research, code generation, and multi-step file tasks that would normally require you to stay at the keyboard.
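The shape of such a loop can be pictured with a minimal Python sketch. This is not Hermes's actual implementation: `run_goal`, `decide`, the action dictionary format, and the `max_steps` cap are all hypothetical names chosen for illustration. The idea is only that the agent asks the model for the next action, runs the tool, feeds the result back, and stops when the model reports the goal is done.

```python
from typing import Callable, Dict, List

# Hypothetical action format: {"tool": "list_files", "arg": "."}
# or {"done": True, "summary": "..."} when the goal is finished.
Action = Dict[str, object]
Tool = Callable[[str], str]


def run_goal(
    goal: str,
    decide: Callable[[str, List[str]], Action],
    tools: Dict[str, Tool],
    max_steps: int = 1000,
) -> List[str]:
    """Run tool calls until `decide` reports done or the step cap is hit.

    `decide` stands in for the model call: it sees the goal and the
    history of observations so far and returns the next action.
    """
    history: List[str] = []
    for _ in range(max_steps):
        action = decide(goal, history)
        if action.get("done"):
            history.append(f"done: {action.get('summary', '')}")
            break
        name = str(action["tool"])
        tool = tools.get(name)
        if tool is None:
            # Feed the error back so the next decision can correct course.
            history.append(f"error: unknown tool {name!r}")
            continue
        history.append(f"{name}: {tool(str(action.get('arg', '')))}")
    return history


# Usage with a scripted stand-in for the model (no API call involved).
def scripted_decide(goal: str, history: List[str]) -> Action:
    if not history:
        return {"tool": "list_files", "arg": "."}
    if len(history) == 1:
        return {"tool": "read_file", "arg": "notes.txt"}
    return {"done": True, "summary": "wrote summary.md"}


demo_tools: Dict[str, Tool] = {
    "list_files": lambda path: "notes.txt",
    "read_file": lambda path: f"contents of {path}",
}

trace = run_goal("Summarise this directory", scripted_decide, demo_tools)
print(trace)
```

The step cap matters in a sketch like this: a loop that can run for thousands of tool calls needs some bound so that a model which never reports completion cannot run, and bill, indefinitely.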

What is MiniMax M3?

MiniMax M3, launched on 1 June 2026, is an open-weights frontier model built on a proprietary architecture called MiniMax Sparse Attention (MSA). The core idea behind MSA is straightforward: instead of attending to every token in the context window, the model selects the most relevant blocks of its key-value cache for each query and skips the rest. This means that a 1-million-token context does not cost a million tokens' worth of compute; it costs roughly one-twentieth of that, because most of the context is skipped most of the time.

In practice this produces two effects. First, the model is fast: MiniMax reports more than 9× speedup in the prefilling stage and more than 15× in decoding at long context lengths, compared to a standard attention model. Second, 1-million-token context — enough to hold an entire large codebase or a book-length research brief — is available at a price point that would be uneconomical with conventional attention.
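The block-selection idea can be illustrated with a toy example in plain Python. This is a sketch of generic block-sparse attention, not MiniMax's MSA: the article does not describe how M3 scores or sizes its blocks, so the mean-key scoring rule and the `block_size` and `top_k` parameters below are assumptions made purely for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query: Vector, keys: List[Vector], block_size: int, top_k: int) -> List[int]:
    """Return indices of the top_k key blocks, scored by query . mean(block keys).

    Assumption: mean-key scoring is an illustrative stand-in, not MSA's rule.
    """
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), start // block_size))
    # Highest-scoring blocks first; keep only top_k of them.
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(index for _, index in best)


def sparse_attention(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    block_size: int,
    top_k: int,
) -> Vector:
    """Softmax attention restricted to tokens inside the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    token_ids = [
        i
        for b in chosen
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in token_ids]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, token_ids)) / total
        for d in range(dim)
    ]


# Eight tokens in four blocks of two; only two blocks are attended to.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.5, 0.0], [0.5, 0.0]]
values = [[float(i)] for i in range(8)]

print(select_blocks(query, keys, block_size=2, top_k=2))  # [1, 3]
print(sparse_attention(query, keys, values, block_size=2, top_k=2))
```

In this toy case the query only touches four of eight tokens. The per-query attention work scales with `top_k * block_size` rather than with the full context length, which is the general mechanism behind the savings described above.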

On benchmarks published by MiniMax, M3 scores 59.0% on SWE-Bench Pro and 83.5% on BrowseComp, outperforming GPT-5.5 and Gemini 3.1 Pro on coding tasks and exceeding Claude Opus 4.7 on SVG generation and browsing benchmarks. These figures come from MiniMax's own published results; independent third-party verification is still catching up with a model released only weeks ago.

Note on sponsored coverage. MiniMax M3 has been featured in a number of YouTube demonstrations that were sponsored by MiniMax. The technical facts about the model can be checked against MiniMax's research blog and third-party benchmark aggregators, and the pricing and subscription plans listed in this article are drawn from those primary sources.

Installing Hermes and connecting MiniMax

The full install takes one terminal command. Open a terminal (or WSL2 on Windows) and run the project's one-line installer:

curl -fsSL https://hermes-agent.org/install.sh | sh

The installer downloads Hermes and its dependencies automatically. Once complete, launch Hermes from the terminal. The first-run setup asks you to select a model provider — choose MiniMax from the list.

To get your MiniMax API key, visit minimax.io, create an account, and navigate to the API console. Copy the key and paste it when Hermes prompts for it. Then select MiniMax M3 as the model. From this point, Hermes uses M3 as its underlying intelligence for all tasks.

First task. Try a simple /goal to test the setup:

/goal Write a summary of the last 5 files in this directory and save it as summary.md

Hermes should list the files, read them, write the summary, and confirm, all without further prompting.

What you can actually do

The combination of a 1-million-token context and an autonomous goal loop opens up tasks that are impractical with smaller context windows or with models that require constant steering.

Deep research. Use /goal to ask Hermes to research a topic, gather sources, and produce a structured report. The 1M context means the agent can hold its entire working set — notes, sources, drafts — in memory without losing earlier material. This kind of task has previously required paid research tools or bespoke agent pipelines.

Codebase-scale work. Hermes can read an entire project, understand dependencies across files, implement a feature, write tests, and iterate — all as a single goal. With a large enough context, the agent does not need to paginate through the code or lose track of earlier findings.
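Before handing over a whole project, it can help to check roughly whether it fits in the context window. The sketch below does that under stated assumptions: the four-characters-per-token ratio is a rough rule of thumb rather than M3's real tokenizer, the reserved output budget is arbitrary, and the function names and file suffix list are hypothetical.

```python
from pathlib import Path
from typing import Iterable

CONTEXT_LIMIT_TOKENS = 1_000_000  # context ceiling quoted in this article
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not M3's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def project_tokens(root: Path, suffixes: Iterable[str] = (".py", ".md", ".txt")) -> int:
    """Sum estimated tokens over files under `root` with the given suffixes."""
    wanted = set(suffixes)
    total = 0
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in wanted:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(tokens: int, reserve_for_output: int = 50_000) -> bool:
    """True if the input plus a reserved output budget stays under the limit.

    Assumption: output tokens share the same window as input tokens.
    """
    return tokens + reserve_for_output <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    tokens = project_tokens(Path("."))
    print(f"estimated tokens: {tokens}, fits: {fits_in_context(tokens)}")
```

An estimate like this is only a first filter. A real tokenizer can differ noticeably from the character heuristic, especially on code, so treat a result near the limit as a "probably not".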

Creative generation. SVG animations, design mockups, and multi-file creative work are tasks where MiniMax M3's performance on SVG benchmarks shows up in practice. These tasks complete quickly and at very low cost per run.

Multi-step automation. Because Hermes persists memory between sessions and supports multi-channel integration (Telegram, Discord, Slack, and others), it can be left to monitor a task and report back — rather than requiring you to supervise each step.

How much does it actually cost?

MiniMax M3 is priced at $0.60 per million input tokens and $2.40 per million output tokens on the standard API tier, listed on OpenRouter and the MiniMax API console. At launch MiniMax offered a 50% promotional discount, bringing effective prices to roughly $0.30 input and $1.20 output per million tokens — though promotional rates change, and you should verify current pricing before budgeting a project.

| Tier | Input (per 1M tokens) | Output (per 1M tokens) | Context ceiling |
| --- | --- | --- | --- |
| Standard API | $0.60 | $2.40 | 1M tokens |
| Launch discount (50%) | ~$0.30 | ~$1.20 | 1M tokens |
| Long-context tier (>512K tokens) | $1.20 | $4.80 | 1M tokens |

To put the standard pricing in context: a research task that reads 50,000 tokens of source material and generates a 5,000-token report would cost roughly $0.04 at standard rates ($0.03 for input plus $0.012 for output). A longer autonomous session working through a mid-size codebase (say, 200,000 tokens in and 20,000 tokens out) would cost around $0.17 ($0.12 for input plus $0.048 for output). These numbers are illustrative; actual costs depend heavily on how long the goal loop runs and how much the model generates.
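The arithmetic behind these estimates is simple enough to script. The sketch below uses the per-million-token rates from the pricing table in this article; the function name and tier keys are made up for illustration, and the rates should be replaced with current ones before budgeting anything.

```python
# Rates in USD per 1M tokens (input, output), copied from this article.
# The launch-discount figures are approximate and may have changed.
RATES = {
    "standard": (0.60, 2.40),
    "launch_discount": (0.30, 1.20),
    "long_context": (1.20, 4.80),
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the estimated USD cost of one run at the given tier."""
    input_rate, output_rate = RATES[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


research = estimate_cost(50_000, 5_000)
codebase = estimate_cost(200_000, 20_000)
print(f"research run: ${research:.3f}")  # research run: $0.042
print(f"codebase run: ${codebase:.3f}")  # codebase run: $0.168
```

Note that this prices a single pass. It takes total input and output tokens as given, so for a long goal loop you would pass in the totals summed across every model call in the session.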

Hermes itself has no per-query cost — it is open source and runs locally. The only cost is the API usage for whichever model you connect.

Open Code: a bonus connection

If you already use a CLI coding tool, it is worth knowing that Open Code — an open-source alternative to commercial coding agents — also supports MiniMax M3 as a backend. The connection is made with a single command inside Open Code:

/connect minimax

This gives you a similar workflow to proprietary coding agents, powered by the same M3 model and at the same API rates. Whether you prefer Hermes's goal-based autonomous loop or Open Code's file-edit-focused approach depends on the task; both are compatible with the same MiniMax API key.