What the paper actually is

On 14 April 2026, four researchers — Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, and Zhiqiang Shen (MBZUAI, corresponding author) — uploaded a 70-page tour of Claude Code's architecture to arXiv. They had no internal access. They worked from the publicly distributed TypeScript build (v2.1.88), traced files like query.ts, permissions.ts, and tools.ts, and cross-checked their reading against an independent open-source agent called OpenClaw. The result is the first end-to-end source-level account of how a production coding agent is actually wired together.

We've read it so you don't have to. The lessons below are the parts that should change how you think about building your own agent harness. The work belongs to the authors; if any of this lands, please cite the original.

Paper at a glance. "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems," arXiv:2604.14228 [cs.SE], submitted 14 April 2026. Source code referenced throughout: VILA-Lab/Dive-into-Claude-Code. Licensed CC BY-NC-SA 4.0.

The loop is tiny — the harness is everything

The thing every tutorial calls "the agent" is, in Claude Code, a while-true cycle: call the model, parse any tool_use blocks it returned, route them through the permission system, dispatch them to tools, append the result, repeat. That's it. Community analysis quoted in the paper estimates the model-decision-logic portion of the codebase at around 1.6%; the other 98.4% is the operational infrastructure around the loop — context assembly, permissions, tool routing, sandboxing, recovery. The single queryLoop() function in query.ts runs whether the user is on the interactive CLI, the headless claude -p path, the Agent SDK, or an IDE integration. Only rendering changes.
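
In sketch form, the whole cycle fits in a few lines. This is not the actual queryLoop() from query.ts; every name and type here is illustrative, a minimal shape of the loop the paper describes:

```ts
type ToolUse = { id: string; name: string; input: unknown };
type Message = { role: "user" | "assistant" | "tool"; content: unknown };
type Verdict = { allowed: boolean; reason?: string };

interface Harness {
  callModel(history: Message[]): Promise<{ text: string; toolUses: ToolUse[] }>;
  checkPermissions(call: ToolUse): Promise<Verdict>;
  dispatch(call: ToolUse): Promise<unknown>;
}

async function agentLoop(harness: Harness, history: Message[]): Promise<string> {
  while (true) {
    const reply = await harness.callModel(history);          // 1. call the model
    if (reply.toolUses.length === 0) return reply.text;      // nothing proposed: done
    for (const call of reply.toolUses) {
      const verdict = await harness.checkPermissions(call);  // 2. permissions before execution
      const result = verdict.allowed
        ? await harness.dispatch(call)                       // 3. run the tool
        : { denied: true, reason: verdict.reason };          // 4. denial is data, not a crash
      history.push({ role: "tool", content: { id: call.id, result } }); // 5. append, repeat
    }
  }
}
```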

If you take one thing from the paper, take this: the agent is not a model with a prompt. It is a deterministic harness inside which a model is allowed to make decisions. The model never directly accesses the filesystem, runs shell commands, or makes network requests. Its only interface to the outside world is the structured tool_use protocol, which the harness validates before execution. A compromised or adversarially manipulated model cannot override the rules implemented in the harness, because reasoning and enforcement live in different code paths.
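
The boundary is easy to picture. The tool_use content-block shape below is the one Anthropic's Messages API actually returns; the validation step is a sketch of the idea, not Claude Code's own code:

```ts
interface ToolUseBlock {
  type: "tool_use";
  id: string;                       // correlates the call with its tool_result
  name: string;                     // must name a tool the harness registered
  input: Record<string, unknown>;   // arguments, validated before anything runs
}

function validateToolUse(block: ToolUseBlock, registry: Set<string>) {
  if (!registry.has(block.name)) {
    // Unknown tool: never executed; the error goes back to the model as a result.
    return { ok: false as const, error: `no such tool: ${block.name}` };
  }
  return { ok: true as const };
}
```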

Five values, thirteen principles

The authors don't describe the architecture as a feature list. They identify five human values — Human Decision Authority; Safety, Security and Privacy; Reliable Execution; Capability Amplification; Contextual Adaptability — and trace them through thirteen design principles, each of which answers a recurring question that any production coding agent must resolve. A handful of those principles will be familiar to anyone who has lost an afternoon to an unruly agent:

  • Deny-first with human escalation — unrecognised actions are denied or escalated, never allowed silently.
  • Defense in depth with layered mechanisms — multiple independent safety layers, each able to block on its own.
  • Context as a scarce resource with progressive management — treat the context window as the binding constraint and manage it in stages.
  • Reversibility-weighted risk assessment — lighter oversight for read-only and reversible actions, heavier for destructive ones.
  • Append-only durable state — mutate as little as possible; preserve audit trails.
  • Model judgment within a deterministic harness — trust the model's reasoning, but only inside guardrails the harness can enforce.
  • Minimal scaffolding, maximal operational harness — invest in tool routing and recovery, not explicit planning graphs.

Cataloguing them makes the trade-offs legible. If your harness embeds different principles — "explicit planning graphs" instead of "minimal scaffolding", or "container isolation" instead of "deny-first evaluation" — you'll get LangGraph, SWE-Agent or Aider rather than Claude Code, and that is a deliberate choice rather than an accident.

Deny-first because users say yes 93% of the time

The most useful empirical fact in the paper is borrowed from Anthropic's own auto-mode threat model: users approve about 93% of permission prompts. That number is not a victory. It means interactive confirmation is unreliable as a sole safety mechanism, because once habituated, humans rubber-stamp.

The architectural response is layered. Blanket-deny pre-filtering strips forbidden tools out of the model's view at tool-pool assembly time — the model never sees tools it cannot run, so it never wastes a turn proposing them. Deny rules outrank ask rules outrank allow rules, even when the allow rule is more specific. An optional ML classifier in auto-mode races against the user dialog when BASH_CLASSIFIER is enabled, approving high-confidence safe actions instantly and escalating the rest. Underneath all of it sits an optional shell sandbox. The seven permission modes documented in the paper (plan, default, acceptEdits, auto, dontAsk, bypassPermissions, plus the internal bubble used for subagent escalation) are points on a graduated trust spectrum, not toggles.
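
The precedence rule alone is worth internalising. A minimal sketch, assuming a flat rule list (the real matching in permissions.ts handles patterns, scopes, and modes well beyond this):

```ts
type Rule = {
  action: "deny" | "ask" | "allow";
  matches: (tool: string, input: unknown) => boolean;
};

function resolve(rules: Rule[], tool: string, input: unknown): "deny" | "ask" | "allow" {
  const hits = rules.filter(r => r.matches(tool, input));
  // Deny outranks ask outranks allow, even when the allow rule is more specific.
  if (hits.some(r => r.action === "deny")) return "deny";
  if (hits.some(r => r.action === "ask")) return "ask";
  if (hits.some(r => r.action === "allow")) return "allow";
  return "ask"; // deny-first posture: unmatched actions escalate, never pass silently
}
```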

One subtle detail worth lifting out: when the classifier or a deny rule blocks an action, the system treats the denial as a routing signal rather than a hard stop. The model receives the denial reason, revises its approach, and tries something safer next iteration. Permission enforcement shapes behaviour rather than halting it.
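
In message terms, a blocked call comes back to the model as an ordinary tool_result block. The block shape is the Messages API's; the id and the wording of the reason here are illustrative:

```ts
const denialResult = {
  type: "tool_result",
  tool_use_id: "toolu_01_example",   // hypothetical id
  is_error: true,
  content: "Permission denied: command matches a deny rule. Propose a narrower, reversible alternative.",
};
```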

Context as a five-stage pipeline, not one knob

Long-context models do not eliminate the context problem; they move it. Claude Code runs five distinct compactors before every model call, in increasing order of cost:

  1. Budget reduction — trims oversized tool outputs in place, replacing them with content references.
  2. Snip — lightweight history trimming gated by HISTORY_SNIP.
  3. Microcompact — fine-grained compression with an optional cache-aware path that defers boundary decisions until after the API response so it can use real cache_deleted_input_tokens rather than estimates.
  4. Context collapse — a read-time projection over the conversation, so the model sees the collapsed view while the full history remains intact for resume.
  5. Auto-compact — a model-generated summary that fires only when everything else has failed.

Each layer addresses a different kind of context pressure. Cheap layers run before expensive ones. Lazy-loaded CLAUDE.md, deferred tool schemas, and summary-only subagent returns are additional context-cost dampeners around the loop. If your harness has one auto-summary mechanism, it has the wrong number.
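
The staged escalation is simple to sketch. The stage ordering comes from the paper; the control flow and all identifiers here are illustrative:

```ts
type Msg = { role: string; content: string };
type Compactor = { name: string; run: (history: Msg[]) => Msg[] };

// Crude stand-in for real token accounting.
const estimateTokens = (h: Msg[]) => h.reduce((n, m) => n + m.content.length / 4, 0);

function compact(history: Msg[], budget: number, stages: Compactor[]): Msg[] {
  // Stages arrive cheapest-first: budget reduction, snip, microcompact,
  // context collapse, and auto-compact only as the last resort.
  for (const stage of stages) {
    if (estimateTokens(history) <= budget) return history; // stop as soon as we fit
    history = stage.run(history);
  }
  return history; // still over budget after auto-compact: surface it, don't hide it
}
```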

Four extension surfaces, not one mega-API

MCP servers, plugins, skills, and hooks. Each plugs in at a different injection point and at a different context cost.

  • MCP is the external tool integration path, with stdio, SSE, HTTP, WebSocket, SDK and IDE-specific transports.
  • Plugins package multiple component types into one distribution unit — commands, agents, skills, hooks, MCP and LSP servers, output styles, channels, settings, user config.
  • Skills are SKILL.md files with YAML frontmatter; their instructions only inject into context when invoked, so they're cheap when idle.
  • Hooks fire at one of 27 documented events (PreToolUse, PostCompact, SessionStart, FileChanged, SubagentStop, and so on), with four persistable command types (shell, prompt, http, agent) and a runtime callback variant for SDK use.
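
The hook surface, sketched. The event names and command types come from the paper's lists; the registry shape and dispatch logic are illustrative:

```ts
type HookEvent = "PreToolUse" | "PostCompact" | "SessionStart" | "FileChanged" | "SubagentStop";
type HookCommand =
  | { type: "shell"; command: string }
  | { type: "prompt"; text: string }
  | { type: "http"; url: string }
  | { type: "agent"; agent: string };

const registry = new Map<HookEvent, HookCommand[]>();

async function fire(event: HookEvent, payload: unknown): Promise<void> {
  for (const hook of registry.get(event) ?? []) {
    switch (hook.type) {
      case "shell":  /* spawn hook.command with payload on stdin */ break;
      case "prompt": /* inject hook.text into the next context assembly */ break;
      case "http":   /* POST payload to hook.url */ break;
      case "agent":  /* delegate payload to the named agent */ break;
    }
  }
}
```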

The deliberate refusal to ship one unified extension API is itself a design choice. Extension is layered so that low-cost mechanisms are favoured and the expensive ones (always-on tools, large MCP responses) stay opt-in.

Subagents are isolation, not parallelism

The mistake most agent builders make with subagents is treating them as workers in a pool. The paper makes the alternative reading explicit. Subagents in Claude Code create new, isolated context windows; they don't share the parent's permission state or transcript; they return only summary text to the parent. Worktree-based isolation lets a subagent run in a literal git worktree separate from the user's working tree. The point is not throughput — it is to bound the blast radius of a delegated task. If you wanted parallelism, you would build it differently.
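
A sketch of that delegation contract, with every identifier illustrative: a bounded blast radius, not a worker pool.

```ts
interface SubagentRuntime {
  freshPermissions(): unknown;              // parent approvals do NOT carry over
  createWorktree(): Promise<string>;        // optional: a separate git worktree
  run(task: string, opts: { permissions: unknown; cwd: string }): Promise<string>;
  summarise(transcript: string): Promise<string>;
}

async function delegate(rt: SubagentRuntime, task: string): Promise<string> {
  const cwd = await rt.createWorktree();    // isolated working tree
  const transcript = await rt.run(task, {
    permissions: rt.freshPermissions(),     // new, isolated context and permission state
    cwd,
  });
  return rt.summarise(transcript); // only a summary crosses back into the parent's context
}
```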

Append-only state — auditability over query power

Session transcripts are append-only JSONL files with read-time chain patching. Permissions are deliberately not restored across resume — the authors call this out as a safety choice rather than an oversight. Even compaction is implemented, where possible, as a read-time projection over the full history rather than a destructive edit.
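
In sketch form (illustrative, not Claude Code's actual transcript format), the write path and the read-time projection look like this:

```ts
import { appendFileSync, readFileSync } from "node:fs";

function logEntry(path: string, entry: object): void {
  appendFileSync(path, JSON.stringify(entry) + "\n"); // never rewritten, only appended
}

// Compaction as a read-time projection: the file keeps everything; the
// model sees a collapsed view computed at load time.
function loadCollapsed(path: string, collapse: (entries: object[]) => object[]) {
  const entries = readFileSync(path, "utf8")
    .split("\n")
    .filter(Boolean)
    .map(line => JSON.parse(line));
  return collapse(entries); // full history stays intact on disk for resume
}
```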

The trade-off is real and explicit: structured queries like "every tool call across all sessions that touched file X" require post-hoc reconstruction. The authors connect this commitment back to Human Decision Authority — humans can audit because nothing is overwritten.
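
The reconstruction cost is easy to see in a sketch. All names, paths, and the assumed entry shape below are illustrative:

```ts
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// "Every tool call across all sessions that touched file X" is a scan,
// not an indexed query.
function toolCallsTouching(sessionsDir: string, file: string): object[] {
  const hits: object[] = [];
  for (const name of readdirSync(sessionsDir).filter(f => f.endsWith(".jsonl"))) {
    for (const line of readFileSync(join(sessionsDir, name), "utf8").split("\n")) {
      if (!line) continue;
      const entry = JSON.parse(line);
      // Assumed entry shape: { type: "tool_use", input: { file_path?: string } }
      if (entry.type === "tool_use" && entry.input?.file_path === file) hits.push(entry);
    }
  }
  return hits;
}
```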

Three patterns recur everywhere

When the authors zoom out, three commitments show up in every subsystem they analyse:

  1. Graduated layering over monolithic mechanisms. The permission system has seven stages; context management has five compactors; extensibility has four mechanisms. Each layer is independently auditable.
  2. Append-only over query power — everywhere. Logs, transcripts, memory, hook outputs.
  3. Model judgment within a deterministic harness. The harness creates the conditions in which the model can decide well, then gets out of the way.

These three are useful checkpoints when reviewing your own design. If a subsystem is monolithic, mutates state, or constrains the model's choices procedurally, you have probably picked a different point in the design space than Claude Code did. That can be the right call, but it should be a call you make on purpose.

Six open questions for whoever builds the next one

The paper's final section is the most generous. It names six open directions, each with enough citations to send a research team off for a quarter:

  • The observability–evaluation gap. Industry surveys show roughly 89% of teams ship observability but only 52% offline evaluation, and Bessemer estimates that 78% of AI failures are invisible. Closing the gap likely requires harness-layer scaffolding (generator–evaluator separation, sprint contracts, post-hoc checks) rather than model improvements.
  • Cross-session persistence. What sits between static CLAUDE.md instructions and the per-session JSONL transcript? Auto-curated playbooks of strategies learned from past sessions are the natural next layer.
  • Harness-boundary evolution along where, when, what, and with-whom. Managed Agents virtualise session/harness/sandbox; KAIROS-style proactive ticks change when the agent acts; vision-language-action work changes what it acts on; multi-agent topologies change with whom.
  • Horizon scaling — from session to scientific programme. Whether the same primitives that work for a turn and a session continue to work over weeks of autonomous research.
  • Governance. The EU AI Act becomes fully applicable in August 2026, and only 13.3% of indexed agentic systems publish agent-specific safety cards.
  • The evaluative lens. The paper's own concern: short-term capability amplification may erode long-term human capability. Anthropic's own internal study found a 17% comprehension drop in AI-assisted developers. Worth keeping a column on the design dashboard for.

These are the parts of the agent stack most likely to look completely different in 18 months.

Credits and source

This article is a digest. The work belongs to the authors. The full reference is:

Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen. "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems." arXiv:2604.14228 [cs.SE], 14 April 2026. Code: github.com/VILA-Lab/Dive-into-Claude-Code. CC BY-NC-SA 4.0. Corresponding author: Zhiqiang Shen (MBZUAI).

If you only have time for parts of the original, head for Section 2 (the values-and-principles table), Section 4 (the compaction pipeline) and Section 11 (the cross-cutting design tensions). The paper is the source of truth; this post is just a reading guide.

Related reading on this site: What is MCP?, The best MCP servers in 2026, Cursor vs Windsurf vs Claude Code.

Related course. Coding agents like Claude Code rely on the same conventions that make whole websites legible to AI. The free AI SEO Foundations course covers AGENTS.md, llms.txt, schema markup, and the agent-readability frameworks — useful background for anyone shipping or instrumenting an agent.