Claude Code Nested Sub-Agents: 5 Levels Deep, Token Math, 3 Pitfalls (2026)
For most of 2026, “subagents cannot spawn other subagents” was a hard rule in the Claude Code docs — a deliberate guardrail against infinite recursion. That rule quietly broke on June 10, when v2.1.172 shipped one line of changelog: sub-agents can now nest, up to 5 levels deep.
The capability is new enough that the official sub-agents reference page still says nesting is impossible. If you’re already comfortable with single-level delegation, the upgrade is small, but the token math, the configuration surface, and the failure modes all differ from what your habits expect. This guide is what you should read before your first nested run, not after one you regret.
What You Can Do With Nested Sub-Agents (And What You Can’t)
| You Can | You Can’t |
|---|---|
| Spawn sub-agents from within a sub-agent, up to 5 levels deep | Skip configuring tools, model, or maxTurns — the defaults inherit everything from the parent |
| Route each level to a different model (opus → sonnet → haiku) | Share context between nested sub-agents: each runs in an isolated 200K window |
| Restrict which children a sub-agent can spawn with Agent(name1, name2) | Stop a parent without also stopping every descendant it has alive |
| Define inline mcpServers and hooks that only run during one sub-agent’s lifetime | Use EnterPlanMode or AskUserQuestion inside any nested sub-agent (those tools require main-thread state) |
| Set permissions.deny: ["Agent(name)"] to block specific sub-agents globally | Rely on the official sub-agents docs page: it hasn’t been updated to reflect 2.1.172 yet |
The 30-second mental model: nested sub-agents are recursion with a 5-frame stack limit, each frame carrying its own system prompt and model. The parent reads only the leaf’s summary. Everything in between costs tokens and disappears.
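The mental model above can be made concrete with a toy simulation. This is a sketch only: `run_agent`, the flat 1,000-token cost per frame, and the summary strings are hypothetical, and nothing here calls Claude Code. It shows two things: tokens accumulate at every frame even though only a short summary travels back up, and a sixth frame is refused.

```python
MAX_DEPTH = 5  # nesting cap as described in this article


def run_agent(depth: int, chain_length: int, work_tokens: int = 1000) -> tuple[str, int]:
    """Simulate one frame and return (summary, tokens spent at this frame and below).

    The frame's working context is discarded when it returns; only the
    summary string reaches the caller, but the tokens are still counted.
    """
    if depth > MAX_DEPTH:
        raise RecursionError(f"depth {depth} exceeds the {MAX_DEPTH}-level cap")
    spent = work_tokens  # hypothetical flat cost per frame
    if depth < chain_length:
        child_summary, child_spent = run_agent(depth + 1, chain_length, work_tokens)
        return f"L{depth}<-{child_summary}", spent + child_spent
    return f"L{depth}:leaf", spent


summary, total = run_agent(depth=1, chain_length=3)
print(summary, total)  # L1<-L2<-L3:leaf 3000
```

Asking for a six-level chain with `run_agent(1, 6)` raises `RecursionError` at the sixth frame, which is the stack-limit behaviour in miniature.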
Decision Frame: When to Nest (and When NOT)
When to use nested sub-agents
- The work is a tree, not a list — a top-level reviewer that needs to ask three specialised sub-reviewers (security, perf, style) the same question against the same diff, where each specialist has its own MCP server or skill preload.
- The work is recursive in shape: exploring a monorepo where each packages/* directory needs its own scoped exploration with its own conventions, and the leaf summaries roll up.
- You’re hitting context-window pressure at level 1: single-level sub-agents are returning 30K-token reports that swamp the main conversation, and pushing one more layer down would let each leaf summarise before sending up.
When NOT to use nested sub-agents
- The work is a sequence: five steps that depend on each other belong in a single sub-agent with maxTurns: 20, not five nested calls. Each layer adds an irreversible context split and a system-prompt write.
- You want parallel workers that talk to each other: that’s agent teams, not nesting. Nested sub-agents communicate only up and down, never sideways.
- The leaf would only call one tool — at that point you’re paying a full sub-agent system prompt for one Read or Grep. Just call the tool from the parent.
Stop rule
Before nesting, write the leaf summary you expect on a sticky note. If it’s under ~500 tokens or the parent could have produced it directly with two tool calls, don’t nest. Nesting earns its keep when the leaf does meaningful work and returns a meaningfully compressed answer.
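The stop rule reduces to a two-condition check. A minimal sketch, assuming you can estimate both inputs up front; the function name and parameters are hypothetical, and the thresholds are the ones stated above.

```python
def should_nest(expected_summary_tokens: int, parent_tool_calls_needed: int) -> bool:
    """Apply the stop rule: nest only when the leaf returns a substantial
    answer that the parent could not cheaply produce itself."""
    too_small = expected_summary_tokens < 500          # "under ~500 tokens"
    parent_can_do_it = parent_tool_calls_needed <= 2   # "two tool calls"
    return not (too_small or parent_can_do_it)


print(should_nest(300, 10))   # False: the leaf summary is too small to justify a frame
print(should_nest(2000, 2))   # False: the parent can do this directly
print(should_nest(2000, 8))   # True: meaningful work, meaningfully compressed
```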
System Requirements
| Requirement | Verify with |
|---|---|
| Claude Code v2.1.172 or newer | claude --version |
| Agent tool present in the parent sub-agent’s tools (allowlist) | grep -i agent .claude/agents/<parent>.md |
| Anthropic API access OR an Anthropic-compatible gateway like ofox.ai | echo $ANTHROPIC_BASE_URL |
| At least one child sub-agent definition in .claude/agents/ or ~/.claude/agents/ | ls -1 .claude/agents/ ~/.claude/agents/ 2>/dev/null |
Upgrade with npm i -g @anthropic-ai/claude-code if you’re on 2.1.171 or below. The CLI prints the resolved version on launch; older versions silently ignore nesting calls and treat the inner Agent tool use as a regular Read.
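If you want to gate a script on the minimum version, compare version components numerically rather than as strings. The sketch below assumes the text printed by `claude --version` contains a dotted `major.minor.patch` number somewhere; the exact output format is an assumption, and `supports_nesting` is a hypothetical helper.

```python
import re

MIN_NESTING_VERSION = (2, 1, 172)  # first version with nesting, per this article


def supports_nesting(version_output: str) -> bool:
    """Return True if the first x.y.z number in the text is >= 2.1.172."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    version = tuple(int(part) for part in match.groups())
    return version >= MIN_NESTING_VERSION  # tuple comparison is numeric, per component


print(supports_nesting("2.1.172"))  # True
print(supports_nesting("2.1.99"))   # False, although "99" sorts after "172" as a string
```

Feed it the captured output of `claude --version`, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.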
Step-by-Step: Set Up a Nested Sub-Agent Chain
The example below builds a three-level chain: a top-level triage-lead that classifies an incoming bug report, delegates to a repro-runner to confirm reproducibility, which in turn delegates to a log-summariser to compress raw container logs into a 200-line summary.
Step 1: Define the leaf
Create .claude/agents/log-summariser.md:
---
name: log-summariser
description: Reads raw container logs and returns a structured summary of errors, timing, and suspicious patterns.
tools: Read, Grep
model: haiku
maxTurns: 8
---
You are a log summariser. Given a path to a log file, return:
1. Top 5 distinct error signatures with frequency
2. Earliest and latest timestamps observed
3. Any panic, OOM, or segfault entries with line numbers
Return strictly Markdown — no preamble.
Expected result: the file is picked up immediately if you create it via /agents, or on next session start if you edit it on disk.
Step 2: Define the mid-tier
Create .claude/agents/repro-runner.md:
---
name: repro-runner
description: Reproduces a bug from a description, captures logs, and returns whether the failure is deterministic.
tools: Agent(log-summariser), Read, Bash
model: sonnet
maxTurns: 12
---
You reproduce bugs. Run the failing command, capture logs to /tmp/repro.log,
then call the log-summariser sub-agent to compress the output before returning.
Report: deterministic? yes/no, plus the summary.
The tools: Agent(log-summariser) field is the new piece. Pre-2.1.172, Agent(...) had no effect inside a sub-agent definition. It’s now the allowlist for what this sub-agent is permitted to spawn.
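To make the allowlist semantics explicit, here is a small model of the check as this guide describes it. It is a sketch, not Claude Code's implementation: `may_spawn` and the parsing regex are hypothetical, and the three cases (no Agent entry, bare Agent, Agent with names) follow the behaviour described in this article.

```python
import re

# One entry of a `tools:` value: a name, optionally followed by (...)
ENTRY = re.compile(r"[A-Za-z_][\w-]*(?:\([^)]*\))?")


def may_spawn(tools_field: str, child: str) -> bool:
    """Can an agent with this `tools:` value spawn the sub-agent named `child`?"""
    for entry in ENTRY.findall(tools_field):
        if entry == "Agent":
            return True  # bare Agent: unrestricted, per the article
        if entry.startswith("Agent("):
            inner = entry[len("Agent("):-1]
            names = [name.strip() for name in inner.split(",")]
            return child in names
    return False  # no Agent entry at all: the agent cannot spawn anything


tools = "Agent(log-summariser), Read, Bash"
print(may_spawn(tools, "log-summariser"))  # True
print(may_spawn(tools, "repro-runner"))    # False
```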
Step 3: Define the root
Create .claude/agents/triage-lead.md:
---
name: triage-lead
description: Triages incoming bug reports end-to-end — classification, reproduction, log review.
tools: Agent(repro-runner), Read, Grep, Bash
model: opus
maxTurns: 15
---
Classify the bug (P0/P1/P2), delegate reproduction to repro-runner, then
return a triage memo with severity, repro status, and the summarised logs.
The chain is now: main session → triage-lead (Opus) → repro-runner (Sonnet) → log-summariser (Haiku). Three levels.
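Before running a chain, it can help to check its depth statically. The sketch below takes a hand-written map of which agent may spawn which (mirroring the three files above) and computes the deepest path; `chain_depth` is a hypothetical helper, and it also rejects cycles, which the runtime cap would otherwise be the only thing to stop.

```python
MAX_DEPTH = 5  # nesting cap as described in this article


def chain_depth(spawns: dict[str, list[str]], root: str, _path: tuple[str, ...] = ()) -> int:
    """Depth of the deepest chain starting at `root` (the root counts as depth 1)."""
    if root in _path:
        raise ValueError("cycle: " + " -> ".join(_path + (root,)))
    children = spawns.get(root, [])
    if not children:
        return 1
    return 1 + max(chain_depth(spawns, child, _path + (root,)) for child in children)


spawns = {
    "triage-lead": ["repro-runner"],
    "repro-runner": ["log-summariser"],
    "log-summariser": [],
}
depth = chain_depth(spawns, "triage-lead")
print(depth, depth <= MAX_DEPTH)  # 3 True
```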
Step 4: Invoke from the main session
> Use the triage-lead agent on the bug report at ./bugs/2026-06-11-checkout-500.md
Claude reads the description fields and routes through the chain automatically. You can also force the delegation explicitly: Use triage-lead. Each handoff appears as a separate panel in the /agents view, with depth indicators showing the nesting.
Step 5: Verify the nesting actually happened
After the run completes, check /agents (the Running tab) or scroll the transcript for three distinct sub-agent panels. If you only see one, the parent treated the Agent(...) call as a regular tool invocation — almost always a version mismatch. Re-run claude --version and confirm 2.1.172+.
/agents
> [Running]
> triage-lead opus depth 1 ✓ completed
> repro-runner sonnet depth 2 ✓ completed
> log-summariser haiku depth 3 ✓ completed
Depth 3 is the floor for “this worked.” The 5-level cap is a stack limit, not a target — most useful chains live at depth 2-3.
Common Errors When Nesting Sub-Agents (and Fixes)
| Symptom | Likely cause | Quick fix |
|---|---|---|
| Agent tool absent from sub-agent’s available tools | Pre-2.1.172 behavior, or Agent omitted from the parent’s tools field | Upgrade Claude Code, then add Agent(child-name) to the parent’s tools |
| Sub-agent spawns child but child has no MCP servers the parent had | MCP servers are not inherited through nesting — each sub-agent reconnects per its own frontmatter | Add the server name to the child’s mcpServers field, or hoist to user-level settings.json |
| Agent(x) ignored even after upgrade | Restriction applied in a sub-agent definition referenced from a plugin (security policy strips permissionMode/hooks/mcpServers from plugin agents) | Copy the agent file into .claude/agents/ or ~/.claude/agents/ instead of leaving it in the plugin |
| Nested sub-agent reaches depth 6 and stops with no clear error | 5-level recursion cap reached; the 6th-level call fails silently in some transcripts | Refactor: either flatten one level or split the work across two top-level invocations |
| Parent reports “completed” but child agent stays in “active” state | Bug fixed in 2.1.172 for the most common variant; can still appear with background sub-agents | Update to 2.1.172+, or kill the child via /agents Running tab and re-run |
| Token bill spikes 7-12× without a visible behaviour change | Each nested level adds a system-prompt write; default model inherits the main session’s (Opus for many users) | Set CLAUDE_CODE_SUBAGENT_MODEL=haiku to force Haiku on every sub-agent that doesn’t override, then promote per-agent |
The fifth row is the one that bites teams. The v2.1.172 changelog explicitly notes the fix for background sub-agents stuck as active after a nested child was stopped. If you’re resuming a session that was started on an older version, start fresh once to clear any inherited state.
Token Math: Why Nesting Costs ~7× Per Branch
This is the calculation that makes or breaks the technique. A single-thread Claude Code session reuses one cached system prompt across turns. A sub-agent runs in its own context window with its own system prompt (smaller than the main session’s, but not free). Nest that, and you pay the same overhead at each depth.
| Depth | What gets written | What’s reusable | Approx. token overhead per call |
|---|---|---|---|
| 0 (main) | Main system prompt + tools + CLAUDE.md | Yes (cached across turns) | Baseline |
| 1 (sub-agent) | Sub-agent system prompt + tool list + task description | Partial — tool list cached within branch | +30-60% of baseline |
| 2 (nested) | Inner sub-agent prompt + tool list + task description from parent’s summary | Lower — each child writes a fresh prefix | +30-60% on top of depth 1 |
| 3-5 (deep nesting) | Same shape, cumulative | Vanishing — each leaf’s prefix is short-lived | Cumulative ~5-7× single-thread |
The “~7× token consumption” figure in the official cost-management guide is the canonical worst case for sub-agent-heavy workflows. Nesting doesn’t multiply that number unboundedly — leaf calls do less work and use cheaper models — but it does push you toward it whenever the tree is wide.
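A back-of-the-envelope model shows why width, more than depth alone, drives the total. This is an illustrative sketch, not Anthropic's accounting and not a reproduction of the table's figures: it assumes every sub-agent call costs a fixed fraction of the main session's baseline prefix (taken from the table's 30-60% range) and that every agent spawns the same number of children. `relative_overhead`, `fan_out`, and `per_call_fraction` are hypothetical names.

```python
def relative_overhead(depth: int, fan_out: int, per_call_fraction: float) -> float:
    """Total prefix cost relative to a single-thread baseline of 1.0.

    Level d of a uniform tree holds fan_out ** d sub-agent calls, and each
    call is assumed to write a prefix worth per_call_fraction of the baseline.
    """
    calls = sum(fan_out ** level for level in range(1, depth + 1))
    return 1.0 + per_call_fraction * calls


# A straight three-level chain: 3 sub-agent calls in total.
print(f"{relative_overhead(3, 1, 0.45):.2f}")  # 2.35
# Three levels with two children per agent: 2 + 4 + 8 = 14 calls.
print(f"{relative_overhead(3, 2, 0.45):.2f}")  # 7.30
```

Under these assumptions a narrow chain stays well below the ~7× worst case, while a fan-out of only two reaches it at the same depth.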
Two levers fight that math:
- Model tiering by depth. The leaf rarely needs Opus. A typical breakdown for a triage chain: Opus at root, Sonnet at level 1, Haiku from level 2 onward. Pricing tiers between Opus and Haiku run roughly an order of magnitude apart on input tokens, so the savings on the leaf tier compound across every nested call.
- maxTurns per layer. Without it, a leaf can churn for 30+ turns trying to “be helpful.” Cap leaves at 8 turns and mid-tiers at 12; mostly you’ll never hit the cap, and when you do it’s a signal that the task was wrong.
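The first lever can be sized with quick arithmetic. The weights below are placeholders, not real prices: only the roughly tenfold Opus-to-Haiku gap comes from this article, the Sonnet weight is an assumption, and the 20,000 input tokens per level is a made-up workload. Substitute current pricing before drawing conclusions.

```python
# ASSUMPTION: relative input-token price weights, not actual prices.
PRICE_WEIGHT = {"opus": 10.0, "sonnet": 3.0, "haiku": 1.0}


def chain_cost(levels: list[tuple[str, int]]) -> float:
    """Sum weight * input tokens over (model, tokens) pairs, one pair per level."""
    return sum(PRICE_WEIGHT[model] * tokens for model, tokens in levels)


all_opus = chain_cost([("opus", 20_000), ("opus", 20_000), ("opus", 20_000)])
tiered = chain_cost([("opus", 20_000), ("sonnet", 20_000), ("haiku", 20_000)])
print(f"tiered chain costs {tiered / all_opus:.0%} of the all-Opus chain")  # 47%
```

With these placeholder weights, tiering a three-level chain cuts the input bill by more than half, and the gap widens as more of the work moves to the leaf tier.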
For the deeper cost model on Claude Code session economics, see the Claude Code token optimization guide. Its cache_read_input_tokens analysis applies per sub-agent branch, not just to the main thread.
3 Anti-Patterns That Blow Your Budget
Anti-Pattern 1: Runaway nesting via general-purpose recursion
The path of least resistance is to give every sub-agent tools: Agent (the unrestricted allowlist) and let the model figure out who to spawn. Within hours you’ll see a transcript where the triage agent spawned a “general researcher” that spawned an “investigation specialist” that spawned a “log reader” that spawned… a general researcher again. Each layer adds tokens. The depth cap of 5 saves you from infinite loops but does nothing to save you from the bill.
Fix: Always use explicit allowlists. tools: Agent(repro-runner, log-summariser) is the contract between a sub-agent and its children. If the model can’t accomplish the task with those children, the task description was wrong — fix that, don’t widen the allowlist.
Anti-Pattern 2: Opus everywhere
Inheriting the main conversation’s model is the default, and it’s exactly wrong for a multi-level tree. Opus at the root is justified (you’re doing the synthesis). Opus at a leaf that’s reading a log file and counting error frequencies pays the top-tier rate for work that Haiku handles at a fraction of the cost.
Fix: Set CLAUDE_CODE_SUBAGENT_MODEL=haiku at the shell level so every sub-agent that doesn’t override gets Haiku. Then promote only the ones that need more — typically just root and depth-1. The Claude Code hybrid routing pattern walks through how to think about per-task model selection.
Anti-Pattern 3: Untyped Agent allowlists in production agent files
Project sub-agent files are checked into version control. tools: Agent without parentheses works in development and looks deceptively forgiving — but a teammate adding a new migration-runner sub-agent next month will silently grant the existing triage chain the ability to spawn it, because the allowlist accepts everything.
Fix: Anchor the allowlist explicitly in the file: tools: Agent(repro-runner, log-summariser), Read, Grep, Bash. Treat any change to that list as a security review, the same way you’d treat changing permissions.allow. The Claude Code safety guide covers the broader principle: every permission-adjacent field is a security boundary.
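A check like this is easy to automate in review. The sketch below flags agent files whose tools line contains a bare Agent entry or that have no tools line at all; `lint_agent_file` is a hypothetical helper, and for simplicity it scans the whole file text rather than parsing only the frontmatter.

```python
import re

ENTRY = re.compile(r"[A-Za-z_][\w-]*(?:\([^)]*\))?")


def lint_agent_file(text: str) -> list[str]:
    """Return warnings for an agent definition; an empty list means it passed."""
    # Simplification: matches the first `tools:` line anywhere in the file.
    match = re.search(r"^tools:[ \t]*(.*)$", text, flags=re.MULTILINE)
    if match is None:
        return ["no tools field: tools are inherited from the parent"]
    if "Agent" in ENTRY.findall(match.group(1)):
        return ["bare Agent: this agent may spawn any sub-agent"]
    return []


anchored = "---\nname: triage-lead\ntools: Agent(repro-runner), Read, Grep, Bash\n---\n"
untyped = "---\nname: triage-lead\ntools: Agent, Read, Grep, Bash\n---\n"
print(lint_agent_file(anchored))  # []
print(lint_agent_file(untyped))   # ['bare Agent: this agent may spawn any sub-agent']
```

Running it over every file in .claude/agents/ as a pre-merge step would turn the "treat it as a security review" convention into something a reviewer cannot miss.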
Team / Multi-Developer Configuration
For teams running shared Claude Code setups across many repositories, three conventions keep nested sub-agents cheap, predictable, and reviewable:
| Convention | Where to set it | Why it matters |
|---|---|---|
| CLAUDE_CODE_SUBAGENT_MODEL=haiku in shared shell rc | Per-developer dotfiles | Eliminates the “Opus everywhere” leak by default; teams that need Sonnet/Opus override per file |
| All shared sub-agents committed under .claude/agents/ with explicit Agent(name1, name2) allowlists | Per-repo | Code review surfaces nesting changes; new sub-agents don’t get auto-granted |
| permissions.deny: ["Agent(general-purpose)"] for sub-agent-heavy projects | .claude/settings.json | Forces specialised delegation; prevents the recursion anti-pattern |
For organisations standardising across many repos, also consider a dotfiles-managed ~/.claude/agents/ directory containing the cross-project agents (review, security, doc-summariser). A user-level definition applies only when the project doesn’t redeclare the same name, so the convention is “shared in user scope, repo-specific in project scope.”
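The precedence between the two scopes amounts to a keyed merge in which the project entry replaces a user entry of the same name. A minimal sketch of that rule as described here; `resolve_agents` and the file paths are illustrative, not Claude Code internals.

```python
def resolve_agents(user: dict[str, str], project: dict[str, str]) -> dict[str, str]:
    """Merge agent definitions by name; a project definition replaces
    a user-level definition that has the same name."""
    return {**user, **project}


user_scope = {
    "review": "~/.claude/agents/review.md",
    "security": "~/.claude/agents/security.md",
}
project_scope = {
    "review": ".claude/agents/review.md",
    "triage-lead": ".claude/agents/triage-lead.md",
}
resolved = resolve_agents(user_scope, project_scope)
print(resolved["review"])    # .claude/agents/review.md (project redeclares it)
print(resolved["security"])  # ~/.claude/agents/security.md (user scope only)
```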
Advanced: Routing Different Levels to Different Models via ofox
The CLAUDE_CODE_SUBAGENT_MODEL environment variable applies to every sub-agent that doesn’t set its own model. When pointed at an Anthropic-compatible gateway, that one variable can swap your entire leaf tier from one model alias to another without touching any sub-agent file. The integration is a base URL, an API key, and a model alias:
export ANTHROPIC_BASE_URL=https://api.ofox.ai/anthropic
export ANTHROPIC_API_KEY=<your_ofox_key>
export CLAUDE_CODE_SUBAGENT_MODEL=haiku
claude
Sub-agents at any depth that don’t explicitly override the model field now route through ofox to Haiku. The root agent and any sub-agent that explicitly sets model: opus or model: sonnet still routes through ofox to the corresponding tier — CLAUDE_CODE_SUBAGENT_MODEL is a default, not an override. The Claude Code ofox configuration guide covers the full setup, including how cache_control headers pass through unchanged for prompt caching savings on the parent.
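The "default, not an override" precedence can be written down as a three-step fallback. This sketch encodes the order as this article describes it (explicit model field, then the environment variable, then the session's model); `resolve_model` is a hypothetical helper, so verify the behaviour against your own Claude Code version.

```python
from typing import Mapping, Optional


def resolve_model(
    frontmatter_model: Optional[str],
    env: Mapping[str, str],
    session_model: str,
) -> str:
    """Pick a sub-agent's model using the precedence described in the article."""
    if frontmatter_model:
        return frontmatter_model  # an explicit `model:` field wins
    # Otherwise fall back to the env default, then to the main session's model.
    return env.get("CLAUDE_CODE_SUBAGENT_MODEL") or session_model


env = {"CLAUDE_CODE_SUBAGENT_MODEL": "haiku"}
print(resolve_model("sonnet", env, "opus"))  # sonnet
print(resolve_model(None, env, "opus"))      # haiku
print(resolve_model(None, {}, "opus"))       # opus
```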
ofox accepts both the bare Claude Code aliases (opus, sonnet, haiku) and the explicit provider-prefixed IDs (anthropic/claude-opus-4.8, anthropic/claude-sonnet-4.6, anthropic/claude-haiku-4.5) that you’d reference in a direct integration. Aliases keep your sub-agent frontmatter portable across direct-Anthropic and gateway routing; explicit IDs pin you to a specific version. The model marketplace at ofox.ai/models lists the current Claude line-up so you can size the tiered tree against actual bill impact before committing.
For very long nested runs, the 5-minute default prompt-cache TTL becomes the next constraint. Setting ENABLE_PROMPT_CACHING_1H=1 extends it to one hour at 2× the cache-write cost — typically worth it once a single triage chain crosses 15 minutes of wall-clock time.
Sources Checked for This Refresh
- Claude Code CHANGELOG v2.1.172 (verified 2026-06-11): github.com/anthropics/claude-code/blob/main/CHANGELOG.md — confirms “Sub-agents can now spawn their own sub-agents (up to 5 levels deep)” and the related bug fix for stuck background sub-agents
- Claude Code sub-agents reference (verified 2026-06-11): code.claude.com/docs/en/sub-agents, for the full frontmatter schema (tools, model, maxTurns, mcpServers, hooks, permissionMode); note the page still claims nesting is impossible, which contradicts the 2.1.172 changelog
- Claude Code cost management guide (verified 2026-06-11): code.claude.com/docs/en/costs, source of the “~7× tokens for sub-agent-heavy workflows” figure
- Claude Code changelog mirror (verified 2026-06-11): code.claude.com/docs/en/changelog — secondary confirmation of 2.1.172 entry
- ofox model gallery snapshot (verified 2026-06-11): ofox.ai/models, for the current anthropic/claude-opus-4.8, anthropic/claude-sonnet-4.6, and anthropic/claude-haiku-4.5 listings
The headline is “5 levels deep,” but the useful number is “depth 2.” Most nested chains earn their keep at root-Opus, mid-Sonnet, leaf-Haiku — three layers, three model tiers, one explicit Agent(name) allowlist at each handoff. Skip the allowlist or the tiering, and the 5-level cap becomes the wallet’s last line of defence rather than the first.


