How to Delegate Claude Code Tasks to Mistral Vibe — Save 2-4x on Tokens
TL;DR: Mistral Vibe (Mistral’s open-source coding CLI, running Mistral Medium 3.5 at $1.50/$7.50 per million input/output tokens) is roughly 3.3x cheaper than Claude Opus 4.7 ($5/$25). You don’t have to choose between them: Claude Code can spawn Vibe as a subagent via the Bash tool, keeping Opus 4.7 for planning and review while Vibe handles refactors, file scans, and bulk edits. One short agent-definition file gets you there.
For most of 2025 the agentic-CLI debate was “which one is best.” In 2026 the better question is “which one does each job best.” Claude Code’s subagent system lets you answer that pragmatically: keep the expensive model where reasoning matters, route the grunt work somewhere cheaper. Mistral Vibe is a particularly good worker because its non-interactive --prompt mode behaves like a function call — give it a task, get back code, no orchestration overhead.
Why this pattern exists
Claude Opus 4.7 bills run high because most of what you ask it to do doesn’t need Opus. Anthropic’s flagship runs at $5 per million input tokens and $25 per million output tokens (Anthropic pricing page), and a single agentic session that reads 20-30 files easily burns 100K+ tokens before the model writes a line of code. The token bill is dominated by exploration and bulk edits, not by the moments where Opus actually earns its keep.
Mistral Medium 3.5, the default model in Mistral Vibe since April 29, 2026, costs $1.50/$7.50 per million tokens and scores 77.6% on SWE-Bench Verified (Mistral announcement). It’s not as strong as Opus on novel reasoning, but for “rename this symbol in 14 files,” “add error handling to these three functions,” or “extract this prop into a config object,” its output is in practice hard to tell apart from Opus’s.
The delegation pattern lets you keep Opus for the decisions and hand the mechanics to Vibe. If you’re skeptical of mixing CLIs, the alternative (hybrid routing inside Claude Code itself) is a more tightly coupled approach worth comparing.
What Mistral Vibe actually is
Mistral Vibe is a terminal-based coding agent that ships as a single Python CLI with built-in subagent support, MCP integration, and a non-interactive prompt mode. Installing it takes one command, plus an API key in your environment:
curl -LsSf https://mistral.ai/vibe/install.sh | bash
export MISTRAL_API_KEY=...
Config (optional — Vibe ships with defaults) lives at ~/.vibe/config.toml. A minimal version using only documented keys:
active_model = "mistral-medium-3-5"
enable_auto_update = true
enable_telemetry = false
The flag you care about for delegation is --prompt, which runs Vibe one-shot and prints the result:
vibe --prompt "refactor src/utils/date.ts to use date-fns instead of moment"
That command is the entire integration surface. Any orchestrator that can shell out — Claude Code, a Makefile, a CI job — can call it.
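As a rough illustration of "any orchestrator that can shell out," here is a minimal Python sketch that wraps the one-shot call. Only the `vibe` binary and the `--prompt` flag come from the text above. The function names, the 600-second timeout, and the assumptions that Vibe writes its result to stdout and signals failure with a non-zero exit code are mine, so check them against your installed version.

```python
import subprocess
from typing import Callable, List


def build_vibe_command(task: str) -> List[str]:
    """Return the argv for a one-shot Vibe run (hypothetical helper)."""
    # Passing argv as a list avoids shell-quoting problems in the task text.
    return ["vibe", "--prompt", task]


def run_vibe(task: str, runner: Callable = subprocess.run, timeout_s: int = 600) -> str:
    """Run Vibe once and return whatever it printed.

    ASSUMPTION: the result arrives on stdout and a non-zero exit code
    means failure. The timeout value is illustrative, not a Vibe default.
    """
    result = runner(
        build_vibe_command(task),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"vibe exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

The `runner` parameter exists only so the wrapper can be exercised without Vibe installed; in normal use you would call `run_vibe("refactor src/utils/date.ts to use date-fns instead of moment")` and leave it at its default.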
The Claude Code half: defining a subagent
Claude Code routes work to subagents by reading the description field in each agent definition under .claude/agents/ (Claude Code subagents docs). To make Opus 4.7 delegate to Vibe, you write one Markdown file with a Bash-only tool scope and a description that tells Opus when this worker is the right pick.
Create .claude/agents/vibe-worker.md:
---
name: vibe-worker
description: Use for mechanical code changes where reasoning is shallow — renames, refactors across many files, adding error handling, extracting helpers, format/lint cleanup. Do NOT use for architectural decisions or novel logic.
tools: Bash
model: sonnet
---
You are a delegation wrapper around the Mistral Vibe CLI.
When invoked with a task description, run:
vibe --prompt "<task description>"
Capture the output, then return a short summary: which files changed, what the change was, and any warnings Vibe surfaced. Do not edit files yourself — only run the `vibe` command.
If `vibe` returns an error or asks for clarification, return the raw output to the parent and stop.
A few things to notice. tools: Bash is intentional: this subagent’s only superpower is shelling out, which keeps its context narrow. The description is what Opus reads to decide when to dispatch, so the “do NOT use for…” line matters as much as the positive cases. And the wrapper itself runs on Sonnet 4.6, not Opus, because all it has to do is format one shell command — see the Claude Code hooks, subagents, and skills guide for why model selection on subagents is its own cost lever.
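If you keep several agent definitions around, a small check can catch a worker whose tool scope has drifted. The sketch below is a hypothetical helper, not part of Claude Code: it parses only flat `key: value` frontmatter like the file above, uses an abbreviated description, and treats all four fields as required for this pattern even though Claude Code itself may be more lenient.

```python
from typing import Dict, List

# Abbreviated copy of the vibe-worker definition, for illustration only.
AGENT_FILE = """\
---
name: vibe-worker
description: Use for mechanical code changes where reasoning is shallow.
tools: Bash
model: sonnet
---
You are a delegation wrapper around the Mistral Vibe CLI.
"""


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so descriptions may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing --- line")


def check_worker(fields: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the file fits the pattern."""
    problems = []
    for required in ("name", "description", "tools", "model"):
        if not fields.get(required):
            problems.append(f"missing field: {required}")
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if tools != ["Bash"]:
        problems.append(f"expected Bash-only tool scope, got {tools}")
    return problems
```

Calling `check_worker(parse_frontmatter(AGENT_FILE))` returns an empty list for the definition as written.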
The real cost math
A typical “refactor 20 files to use the new API” task burns about 50K input tokens (reading files + scratch reasoning) and produces about 10K output tokens. Running it three ways:
| Path | Input cost | Output cost | Total |
|---|---|---|---|
| Claude Opus 4.7 direct | 50K × $5/M = $0.25 | 10K × $25/M = $0.25 | $0.50 |
| Mistral Vibe (Medium 3.5) | 50K × $1.50/M = $0.075 | 10K × $7.50/M = $0.075 | $0.15 |
| DeepSeek V4 Flash via ofox | 50K × $0.14/M = $0.007 | 10K × $0.28/M = $0.003 | $0.01 |
On this task Mistral Vibe comes out 3.3x cheaper than direct Opus ($0.15 versus $0.50). If you run 100 such tasks a month, that $0.35 difference per task keeps $35 in your wallet instead of Anthropic’s. The catch is that the saving evaporates the moment you delegate something Vibe can’t handle: Opus then re-does the work, so you pay twice. The decision rubric: only delegate when you’d be comfortable letting a junior engineer do it without supervision.
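The arithmetic in the table is easy to reproduce for your own token counts. The sketch below uses the per-million prices quoted in this article; the class and function names are illustrative, and the 50K/10K task size is the same rough estimate as above, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens, as quoted in the table above."""
    input_per_m: float
    output_per_m: float


OPUS_4_7 = Pricing(5.00, 25.00)
MISTRAL_MEDIUM_3_5 = Pricing(1.50, 7.50)
DEEPSEEK_V4_FLASH = Pricing(0.14, 0.28)


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given token counts."""
    return (
        input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    ) / 1_000_000


def monthly_saving(baseline: Pricing, worker: Pricing,
                   input_tokens: int, output_tokens: int, tasks: int) -> float:
    """USD saved per month by running `tasks` tasks on `worker` instead of `baseline`."""
    per_task = (task_cost(baseline, input_tokens, output_tokens)
                - task_cost(worker, input_tokens, output_tokens))
    return per_task * tasks


if __name__ == "__main__":
    for name, pricing in [("Opus 4.7", OPUS_4_7),
                          ("Mistral Medium 3.5", MISTRAL_MEDIUM_3_5),
                          ("DeepSeek V4 Flash", DEEPSEEK_V4_FLASH)]:
        print(f"{name}: ${task_cost(pricing, 50_000, 10_000):.4f}")
```

Swapping in token counts from your own sessions is the useful part; the ratio between two models stays fixed only as long as both see the same input and output volume.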
For the genuinely token-paranoid, the third row in that table is real — DeepSeek V4 Flash is $0.14/$0.28 per million tokens on ofox (DeepSeek V4 pricing breakdown). You can substitute it for Mistral Vibe in the same subagent pattern; see the variant below.
Variant: the same pattern, on ofox + DeepSeek V4 Flash
If you already have an ofox key for unified model access, you can skip Mistral Vibe entirely and have Claude Code dispatch to DeepSeek V4 Flash directly. The Bash wrapper changes from vibe --prompt to a curl call, but the subagent definition is otherwise identical.
.claude/agents/cheap-worker.md:
---
name: cheap-worker
description: Use for mechanical edits — renames, format cleanup, boilerplate generation, simple refactors. Routes to DeepSeek V4 Flash via ofox. NOT for design decisions or novel logic.
tools: Bash, Read, Edit
model: sonnet
---
For each delegated task, call:
curl https://api.ofox.ai/v1/chat/completions \
-H "Authorization: Bearer $OFOX_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"deepseek/deepseek-v4-flash","messages":[{"role":"user","content":"<task + relevant file contents>"}]}'
Apply the returned diff yourself using Read + your own edit primitives. Return a one-paragraph summary to the parent.
The trade-off: Mistral Vibe is a real coding agent with its own planning loop, so it handles multi-file tasks better. A raw DeepSeek V4 Flash call is just a language model — the orchestration logic falls on you (or on Opus, which costs Opus tokens). For single-file edits, the ofox variant wins on price. For multi-file refactors, Vibe pulls ahead because its agentic loop runs on the cheap model, not on Opus.
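For file contents with quotes or newlines, hand-building the JSON body inside a single-quoted `curl -d` string gets fragile. A possible alternative is to let a small script do the encoding. The sketch below reuses the endpoint URL and model ID from the curl call above; the function names and timeout are hypothetical, and the response shape (`choices[0].message.content`) is an assumption based on the OpenAI-style path, so verify it against ofox's documentation.

```python
import json
import urllib.request

OFOX_URL = "https://api.ofox.ai/v1/chat/completions"  # from the curl call above
MODEL = "deepseek/deepseek-v4-flash"                  # from the curl call above


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; json.dumps handles escaping of file contents."""
    body = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OFOX_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Pull the model's text out of the response.

    ASSUMPTION: the response follows the OpenAI-style chat completions shape.
    """
    return payload["choices"][0]["message"]["content"]


def ask_cheap_model(prompt: str, api_key: str, timeout_s: int = 60) -> str:
    """Send one prompt and return the reply text (performs a network call)."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_reply(json.load(response))
```

The subagent could then run this script through Bash instead of assembling the curl payload itself, reading the key from `OFOX_API_KEY` in the environment.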
Where this stops working
The delegation pattern breaks in three specific situations.
When the task involves judgment about API design or trade-offs. Mistral Medium 3.5 will pick an answer; Opus 4.7 will tell you why one option is wrong. Architecture decisions are not where you save tokens.
When the delegated task needs context the wrapper can’t supply. Vibe runs in a fresh process with no memory of your conversation. If “fix this bug” depends on three earlier discussions, you’ll pass them in as prompt context — paying input tokens twice, in both Opus and Vibe. Net cost can exceed the no-delegation baseline.
When Vibe’s tokenizer disagrees with Anthropic’s. Claude Opus 4.7 ships a new tokenizer that uses ~12-27% more tokens than Opus 4.6 on the same text. Mistral’s tokenizer is different again. Your “50K tokens” estimate from a Claude session is not what Vibe will count, and the bills won’t line up exactly. The 3.3x ratio holds in aggregate; trust it monthly, not per-task.
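Each of these failure modes ends the same way: Opus redoes the task and you pay for both runs. One way to reason about it is an expected-cost model. This is a deliberately simplified sketch under stated assumptions: a failed delegation costs the full worker price plus the full Opus price, and it ignores the Sonnet wrapper's tokens and any re-sent conversation context. The function names are illustrative.

```python
def expected_delegated_cost(worker_cost: float, opus_cost: float,
                            redo_rate: float) -> float:
    """Expected USD per task when a fraction of delegated tasks is redone on Opus.

    ASSUMPTION: every task pays the worker once; a fraction `redo_rate`
    additionally pays the full Opus price for the redo.
    """
    if not 0.0 <= redo_rate <= 1.0:
        raise ValueError("redo_rate must be between 0 and 1")
    return worker_cost + redo_rate * opus_cost


def break_even_redo_rate(worker_cost: float, opus_cost: float) -> float:
    """Redo rate at which delegating costs the same as running Opus directly."""
    return (opus_cost - worker_cost) / opus_cost


if __name__ == "__main__":
    # Per-task figures from the cost table: $0.15 on Vibe, $0.50 on Opus.
    for rate in (0.0, 0.2, 0.5):
        print(rate, round(expected_delegated_cost(0.15, 0.50, rate), 3))
    print("break-even:", round(break_even_redo_rate(0.15, 0.50), 2))
```

Within this simplified model, the table's figures put break-even at a 70% redo rate, which sounds forgiving. Once re-sent context and wrapper overhead are counted, the real threshold will be lower, so treat the number as an upper bound rather than a target.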
Picking the right delegation cutoff
A useful heuristic: if you can write a clear, two-sentence task description without referring to “the thing we discussed earlier” or “the approach you mentioned,” it’s delegable. If you can’t, keep it on Opus.
Tasks that consistently win when delegated:
- Symbol renames across the codebase
- Adding null-checks or error handling to a list of known functions
- Generating boilerplate (test scaffolding, type definitions from schemas, config files)
- Format/lint fixes that grep can target but humans hate doing
- Translating between formats (JSON ↔ YAML, OpenAPI ↔ TypeScript)
Tasks that lose when delegated:
- Anything involving “which approach is better”
- Novel algorithm work
- Bug fixes where the root cause isn’t established
- Reviewing AI-generated code (don’t ask a cheaper model to review its peer’s work)
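The two-sentence heuristic can be approximated mechanically as a first-pass filter. The sketch below is crude by design: the phrase list is a hypothetical starting point, the sentence splitter is a simple regex, and a task that passes still deserves a human glance before it goes to the cheaper model.

```python
import re

# Hypothetical phrases that signal dependence on earlier conversation.
CONTEXT_PHRASES = (
    "we discussed",
    "discussed earlier",
    "you mentioned",
    "as before",
    "like last time",
)


def count_sentences(text: str) -> int:
    """Count sentences by splitting on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def is_delegable(task: str, max_sentences: int = 2) -> bool:
    """True if the task is short and does not lean on prior conversation."""
    lowered = task.lower()
    if any(phrase in lowered for phrase in CONTEXT_PHRASES):
        return False
    return 1 <= count_sentences(task) <= max_sentences
```

A description such as "Rename getUser to fetchUser across src/. Update imports." passes, while "Fix the bug using the approach you mentioned." does not.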
If you’re already optimizing Claude Code spend, this pattern stacks on top of the strategies in the Claude Code token optimization guide — they target different cost drivers (this one targets which model does the work, the optimization guide targets how much context the model sees).
The full minimum-viable setup
Five minutes if you already have an Anthropic key:
- `curl -LsSf https://mistral.ai/vibe/install.sh | bash`
- `export MISTRAL_API_KEY=...` (get one from console.mistral.ai)
- Drop the `.claude/agents/vibe-worker.md` definition from earlier into your project root
- Restart Claude Code
- Next time you need to do a 20-file refactor, just ask — Opus will read the subagent description and delegate
The first time you watch Claude Code dispatch to vibe-worker and come back with a diff that cost $0.15 instead of $0.50, the pattern justifies itself.
When this is the wrong question entirely
If your monthly bill is dominated by one model and you’re chasing a single-digit-percent cost cut, this isn’t the lever to pull. Re-read the guide on how to reduce AI API costs and check whether prompt caching, batching, or context window discipline would save more for less engineering effort. Delegation overhead is real: every subagent dispatch is a Bash spawn, and every Bash spawn is a roundtrip Opus has to reason about.
But if you’ve already done the easy optimizations and you still see Opus burning tokens on tasks that look mechanical when you watch them happen, this is the pattern. Two CLIs, one config file, predictable savings. The broader model-selection question is independent — you can run the delegation pattern with any pair of orchestrator + worker; Opus + Vibe is just the version with the cleanest CLI ergonomics in May 2026.
What you’re really buying is the right to keep using the model you trust for hard problems, while paying a third of the price for the easy ones. That’s the deal, and it takes only one short Markdown file with a YAML header to claim it.
Sources & references
- Mistral Vibe + Medium 3.5 announcement — mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
- Mistral Medium 3.5 launch and pricing — Mistral AI pricing
- Claude Opus 4.7 pricing — Anthropic
- Claude Code custom subagents documentation — code.claude.com/docs/en/sub-agents
- DeepSeek V4 Flash pricing — DeepSeek API docs


