Doubao Seed 2.0 API Guide: ByteDance's Budget LLM Pricing, Setup & Benchmarks (2026)
TL;DR — ByteDance shipped Doubao Seed 2.0 on February 14, 2026, and it’s the most aggressive price/performance play of the year. Pro hits 98.3 on AIME 2025 and 76.5 on SWE-Bench Verified at ~$0.67/$3.36 per million tokens — roughly 4x cheaper output than GPT-5.2, with a 256K context window across all four variants. Skip Volcengine’s Chinese-phone-number wall and call the same models via ofox.ai’s OpenAI-compatible gateway. This guide covers pricing for all four tiers, real benchmark numbers, the exact Python setup, and which variant to actually pick for which job.
Doubao Seed 2.0 Pro scores 98.3 on AIME 2025 and ships at roughly 4x cheaper output pricing than GPT-5.2. That’s no longer “the cheap China model” framing — that’s pricing pressure on the entire frontier tier.
What is Doubao Seed 2.0?
Doubao Seed 2.0 is ByteDance’s frontier-class LLM family, released February 14, 2026, and the engine behind the Doubao consumer app (China’s largest AI chatbot by MAU). The “Seed” naming refers to ByteDance’s internal research lab — every variant in the 2.0 family was post-trained from the same base checkpoint, then specialized.
There are four variants:
- Seed 2.0 Pro — flagship reasoning + multimodal, competes with Claude Opus / GPT-5.2 / Gemini 3 Pro
- Seed 2.0 Code — same price as Pro, post-trained for software engineering tasks
- Seed 2.0 Lite — balanced general-purpose, competitive with mid-tier US models at a fraction of the cost
- Seed 2.0 Mini — the budget workhorse, designed for high-concurrency loops (classification, extraction, light agents)
All four share a 256K token context window with up to 32K output tokens, and all four support vision input, function calling, prompt caching, and structured outputs. Pro and Code additionally support extended reasoning (“thinking” mode), and Pro supports video input.
Doubao Seed 2.0 pricing — the actual numbers you’ll pay
Two pricing tracks matter to international developers.
Direct Volcengine ARK (China): Requires a Chinese phone number, mainland ID verification, and CNY top-up via Alipay or WeChat Pay. Cheapest if you can clear those hurdles.
ofox.ai OpenAI-compatible gateway: USD billing, email signup, no Chinese ID. A modest gateway markup over direct ARK, but you also get Claude, GPT, Gemini, DeepSeek, and Qwen on the same key.
Here’s what each tier actually costs via ofox (verified against ofox.ai/llms-full.txt on 2026-05-20):
| Variant | Input ($/MTok) | Output ($/MTok) | Context | Best for |
|---|---|---|---|---|
| Seed 2.0 Mini | $0.06 | $0.56 | 256K | High-volume classification, extraction, light routing |
| Seed 2.0 Lite | $0.13 | $0.76 | 256K | General-purpose chat, RAG, content generation |
| Seed 2.0 Code | $0.67 | $3.36 | 256K | Codegen, refactoring, agent loops with file edits |
| Seed 2.0 Pro | $0.67 | $3.36 | 256K | Hard reasoning, math, vision, multi-step planning |
For context, GPT-5.2 lists at $1.75/$14.00 per MTok, Claude Opus 4.6 at $5.00/$25.00, and Gemini 3.1 Pro at $2.00/$12.00. Doubao Seed 2.0 Pro at $0.67/$3.36 sits in a different price bracket entirely — Mini at $0.06/$0.56 is in a third.
The Pro/Code parity (same price per token) is intentional: ByteDance treats Code as Pro with a coding-specialized head. You pay the flagship rate either way; pick by which post-training you need.
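To turn the per-token prices into a budget figure, a small calculator helps. The sketch below assumes the list prices from the table above and ignores any prompt-caching discount; the workload of 4,000 input and 500 output tokens per request is a made-up example, and `request_cost` is an illustrative helper, not part of any SDK.

```python
# (input, output) list prices in USD per million tokens, copied from the table above.
PRICES = {
    "mini": (0.06, 0.56),
    "lite": (0.13, 0.76),
    "code": (0.67, 3.36),
    "pro": (0.67, 3.36),
}

def request_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price (no caching discount)."""
    in_price, out_price = PRICES[variant]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 4,000 input tokens and 500 output tokens per request.
for name in PRICES:
    per_million = request_cost(name, 4_000, 500) * 1_000_000
    print(f"{name:>4}: ${per_million:,.2f} per million requests")
```

On that assumed workload, a million requests cost about $520 on Mini, $900 on Lite, and $4,360 on Pro or Code.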
Benchmarks — what each variant actually delivers
Public benchmark scores from ByteDance’s release materials and third-party evaluations (apiyi.com, datalearner.com, evolink.ai), cross-referenced 2026-05-20:
| Benchmark | Pro | Lite | Code | Mini |
|---|---|---|---|---|
| AIME 2025 (math) | 98.3 | 93.0 | — | 87.0 |
| GPQA Diamond (science) | 88.9 | — | — | — |
| MMLU-Pro (knowledge) | 87.0 | 87.7 | — | — |
| LiveCodeBench v6 | — | — | 87.8 | — |
| SWE-Bench Verified | 76.5 | 73.5 | 76.5 | — |
| Codeforces rating | 3020 | — | — | — |
| VideoMME | 89.5 | — | — | — |
A few notes on these:
- The Lite oddity is real. Lite slightly out-scores Pro on MMLU-Pro (87.7 vs 87.0). This isn’t noise — Lite was post-trained on a knowledge-heavy distill, while Pro optimizes for hard reasoning. If your workload is “summarize a document” or “answer a factual question,” Lite is genuinely the right pick. Pro is the right pick when chains of inference matter.
- Code matches Pro on SWE-Bench Verified (76.5). That puts it within striking distance of Claude Sonnet 4.6 and ahead of GPT-5.4-mini on the same benchmark. At $0.67/$3.36 it’s the cheapest model in the “I can actually edit a real codebase” tier.
- Pro’s 3020 Codeforces is genuinely frontier. For comparison, GPT-5.5 sits around 3050 and Claude Opus 4.7 around 2980 on the same evaluation.
- Mini at 87.0 AIME is the surprise. That’s a higher math score than GPT-4o at a fraction of the price. For tasks where you want a smarter-than-3.5-class model in a tight loop, Mini is hard to beat.
What benchmarks don’t tell you: Doubao models still over-index on Chinese-language quality, and Pro’s English long-context recall is noticeably weaker than Gemini 3.1 Pro past ~150K tokens. For English-only agent workloads with deep context, Gemini still wins. For mixed-language or Chinese-primary work, Doubao Seed 2.0 Pro is the strongest option at this price.
How to access Doubao Seed 2.0 from outside China
Three real options:
1. Direct Volcengine ARK. Endpoint: https://ark.cn-beijing.volces.com/api/v3/chat/completions. Cheapest pricing. Requires Chinese phone + ID + Alipay/WeChat Pay for top-up. Not workable for most non-China-based teams.
2. BytePlus (ByteDance’s international arm). USD billing, no Chinese phone required, but pricing is roughly 1.7x the mainland ARK rate, and the onboarding still asks for a business entity. Workable for enterprises, painful for indie devs.
3. OpenAI-compatible gateway (ofox.ai). Email signup, top-up with international card, single API key for Doubao + Claude + GPT + Gemini + DeepSeek + Qwen. Pricing sits between direct ARK and BytePlus. This is the path most international developers take, and it’s what the setup section below uses.
The gateway path also gets you something the direct route doesn’t: model-switching with a one-line code change. When you outgrow Mini for a particular workload, you swap volcengine/doubao-seed-2.0-mini for volcengine/doubao-seed-2.0-pro (or anthropic/claude-sonnet-4-6, etc.) without touching the rest of your stack.
Setup — your first Doubao Seed 2.0 call in Python
The ofox.ai API is fully OpenAI-compatible. If you’ve ever used the openai Python SDK, you already know the shape:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-ofox-...",  # from ofox.ai/dashboard
    base_url="https://api.ofox.ai/v1",
)

resp = client.chat.completions.create(
    model="volcengine/doubao-seed-2.0-lite",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(resp.choices[0].message.content)
```
That’s it. Three changes from a direct OpenAI call: the api_key, the base_url, and the model slug. Everything else (streaming, function calling, structured outputs) works through the same SDK methods.
The four Doubao Seed 2.0 model slugs you’ll use:
- volcengine/doubao-seed-2.0-pro
- volcengine/doubao-seed-2.0-code
- volcengine/doubao-seed-2.0-lite
- volcengine/doubao-seed-2.0-mini
For vision input on Pro, pass an image URL or base64 payload in the content array exactly like you would with GPT-4o — the schema is identical. For function calling, tools + tool_choice works the same way. ofox normalizes the underlying Volcengine API quirks (different param names for max_tokens vs max_completion_tokens, slightly different reasoning-effort flags) into the OpenAI-standard shape, so your code stays clean.
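As a rough illustration of the vision payload described above, the sketch below builds an OpenAI-style `messages` list with one text part and one image part. The helper names `image_message` and `as_data_url` and the example URL are hypothetical; only the content-array shape follows the OpenAI chat format the article refers to.

```python
import base64

def image_message(prompt: str, image_url: str) -> list[dict]:
    """Build a chat `messages` list with one text part and one image part."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

def as_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL for inline upload."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Hypothetical image URL; a data URL from as_data_url() can be used in its place.
messages = image_message("What is in this picture?", "https://example.com/cat.png")
# Pass `messages` to client.chat.completions.create(
#     model="volcengine/doubao-seed-2.0-pro", messages=messages)
```

Keeping the payload construction separate from the network call makes it easy to unit-test and to reuse across variants.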
If you’re calling from Node, run npm install openai and the same three changes apply. Our OpenAI SDK migration walkthrough covers the edge cases (streaming, retries, custom headers).
When to pick which variant — a routing decision tree
Don’t default to Pro. The whole point of the Seed 2.0 family is cost-graded routing. Here’s how the decision actually breaks down:
Use Mini ($0.06/$0.56) when:
- Classification, extraction, intent routing
- Rewriting/condensing text in a tight loop
- Pre-filtering inputs before a more expensive model
- Generating prompts for an embedding step (when you need an LLM to format queries)
Use Lite ($0.13/$0.76) when:
- General chat, customer support, RAG generation
- Document summarization (256K context is the killer feature here)
- Content generation where Mini’s reasoning shows seams
- Default workhorse for the 80% of requests that don’t need flagship reasoning
Use Code ($0.67/$3.36) when:
- You’re inside an agent that edits real files
- Code review, refactoring suggestions, test generation
- Anything where SWE-Bench-style multi-file reasoning matters
- Don’t use it for non-coding tasks: Pro costs the same and has the more general post-training
Use Pro ($0.67/$3.36) when:
- Math, scientific reasoning, multi-step planning
- Multimodal (image + video) understanding
- Chinese-language work where Doubao’s training advantage shows
- You’ve tried Lite and it broke on the chain-of-reasoning
The cost-effective stack is usually: Lite as the default + Pro on escalation for the 5-10% of requests that need it. If you’re running an agent loop, route file-edit steps to Code and reasoning steps to Pro. We cover this routing pattern in detail in our hybrid Claude Code routing guide — the same shape applies here.
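The decision tree above can be sketched as a lookup table plus one escalation flag. The task labels here are hypothetical (a real system would derive them from its own request metadata), and sending every non-code escalation to Pro is one reading of the "Lite by default, Pro on escalation" advice, not a prescribed policy.

```python
MODEL_BY_TIER = {
    "mini": "volcengine/doubao-seed-2.0-mini",
    "lite": "volcengine/doubao-seed-2.0-lite",
    "code": "volcengine/doubao-seed-2.0-code",
    "pro": "volcengine/doubao-seed-2.0-pro",
}

# Hypothetical task labels mapped to the tiers recommended above.
TIER_BY_TASK = {
    "classification": "mini", "extraction": "mini", "intent_routing": "mini",
    "chat": "lite", "rag": "lite", "summarization": "lite",
    "code_edit": "code", "code_review": "code", "test_generation": "code",
    "math": "pro", "planning": "pro", "video": "pro",
}

def pick_model(task: str, escalate: bool = False) -> str:
    """Map a task label to a model slug; unknown tasks default to Lite.

    escalate=True models the "the cheaper tier failed, retry on Pro" path.
    """
    tier = TIER_BY_TASK.get(task, "lite")
    if escalate and tier in ("mini", "lite"):
        tier = "pro"
    return MODEL_BY_TIER[tier]
```

A caller would try `pick_model(task)` first and, if its own quality check fails, retry with `pick_model(task, escalate=True)`.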
Doubao Seed 2.0 vs other budget-tier APIs
The question every developer asks: is this actually better than DeepSeek or Qwen at the same price?
| Model | Input $/MTok | Output $/MTok | SWE-Bench V. | AIME 2025 | Context |
|---|---|---|---|---|---|
| Doubao Seed 2.0 Pro | $0.67 | $3.36 | 76.5 | 98.3 | 256K |
| Doubao Seed 2.0 Lite | $0.13 | $0.76 | 73.5 | 93.0 | 256K |
| DeepSeek V4 Pro | $1.74 | $3.48 | 72.0 | 95.5 | 1M |
| DeepSeek V4 Flash | $0.14 | $0.28 | 65.8 | 88.0 | 1M |
| Qwen 3.6 Plus | $0.50 | $3.00 | 78.8 | 94.1 | 1M |
| Kimi K2.5 | $0.60 | $3.00 | 71.0 | 90.5 | 256K |
What this matrix actually tells you:
- Doubao Lite vs DeepSeek V4 Flash is the most interesting cross-vendor matchup. V4 Flash is far cheaper on output ($0.28 vs $0.76) and ships a roughly 4x larger 1M context window, while Lite is marginally cheaper on input ($0.13 vs $0.14) and scores meaningfully higher on SWE-Bench Verified (73.5 vs 65.8). For short-context volume work where output cost dominates, V4 Flash is the cheaper pick; for tasks where reasoning quality matters more than headline price, Lite still wins on quality-per-token. See our DeepSeek V4 Pro vs Flash breakdown for the full cost analysis on that side.
- Doubao Pro vs Qwen 3.6 Plus is closer than the price suggests. Qwen leads SWE-Bench Verified (78.8 vs 76.5) at slightly lower output cost ($3.00 vs $3.36) and ships a 1M context window, but Doubao Pro is stronger on math (AIME 98.3 vs 94.1) and multimodal. For pure coding with long context, Qwen wins. For agentic multimodal work and hard math, pick Doubao Pro. Our Qwen 3.6 Plus guide goes deeper on the coding side.
- Doubao Mini at $0.06 input is the floor for any serious model. Cheaper open-weight models exist but nothing hosted and supported at this benchmark tier.
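Which of Lite and V4 Flash is cheaper depends on the input-to-output token ratio. The sketch below assumes only the list prices in the comparison table; `cost` and `cheaper` are illustrative helpers. Lite saves $0.01 per million input tokens and V4 Flash saves $0.48 per million output tokens, so Lite only comes out ahead when input outweighs output by more than about 48 to 1.

```python
# (input, output) list prices in USD per million tokens, from the comparison table.
LITE = (0.13, 0.76)
V4_FLASH = (0.14, 0.28)

def cost(prices: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the given (input, output) prices."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cheaper(input_tokens: int, output_tokens: int) -> str:
    """Name the cheaper model for a request of this shape."""
    lite = cost(LITE, input_tokens, output_tokens)
    flash = cost(V4_FLASH, input_tokens, output_tokens)
    return "lite" if lite < flash else "v4-flash"

print(cheaper(4_000, 500))      # chat-shaped request
print(cheaper(100_000, 1_000))  # long document, short answer
```

This covers price only; the benchmark gap in the table is a separate consideration.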
The broader picture: 2026 is the year Chinese-vendor frontier models broke the price floor without sacrificing benchmark performance. Our LLM leaderboard tracks the full ranking; the LLM API selection decision matrix maps use cases to picks.
Limitations and gotchas
A few things to know before you commit.
English-only long context. Pro past ~150K tokens has noticeable recall degradation on English-heavy documents. ByteDance optimized for Chinese; the long-context benchmark numbers reflect that. For 200K+ English-only work, Gemini 3.1 Pro is still the better pick — see our Gemini 3.1 Pro guide.
Function calling quirks. Doubao’s tool-use schema is more conservative than Claude’s — it occasionally hesitates to call a tool when Claude would. If you’re porting an agent from Claude Code or Codex CLI, expect to tighten your system prompts to be more directive about tool use.
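One way to be more directive is to pair a firm system prompt with an explicit tool choice. The sketch below builds request kwargs in the OpenAI chat-completions format; that the gateway passes a forced `tool_choice` through to Doubao is an assumption, and `get_weather` is a hypothetical tool used only for illustration.

```python
def forced_tool_request(model: str, user_text: str, tool_name: str,
                        description: str, parameters: dict) -> dict:
    """Build chat-completion kwargs that name the tool the model must call."""
    return {
        "model": model,
        "messages": [
            # Directive system prompt, in line with the advice above.
            {"role": "system",
             "content": f"Always call the `{tool_name}` tool; do not answer from memory."},
            {"role": "user", "content": user_text},
        ],
        "tools": [{
            "type": "function",
            "function": {"name": tool_name, "description": description,
                         "parameters": parameters},
        }],
        # OpenAI-style forced tool choice; gateway support is assumed, not confirmed.
        "tool_choice": {"type": "function", "function": {"name": tool_name}},
    }

kwargs = forced_tool_request(
    "volcengine/doubao-seed-2.0-pro",
    "What is the weather in Shenzhen?",
    "get_weather",  # hypothetical tool
    "Look up current weather for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
)
# Then: client.chat.completions.create(**kwargs)
```

If the forced choice is not honored, the system prompt alone is the fallback, so it is worth testing both paths against your own agent.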
Reasoning mode tokens count fully. Pro’s “thinking” output (the reasoning chain) bills at the full output rate. Unlike some providers that discount internal thinking tokens, ByteDance charges everything. A long reasoning chain on a hard problem can easily hit $0.50+ in output cost. Budget accordingly.
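The billing point can be checked with quick arithmetic. This is a minimal sketch assuming Pro's $3.36/MTok output price from the pricing table; the function names are illustrative, and the token counts are made-up inputs.

```python
PRO_OUTPUT_PRICE = 3.36  # USD per million output tokens, from the pricing table

def output_cost(reasoning_tokens: int, answer_tokens: int) -> float:
    """USD cost of the output side when reasoning tokens bill at the full rate."""
    return (reasoning_tokens + answer_tokens) * PRO_OUTPUT_PRICE / 1_000_000

def tokens_for_budget(budget_usd: float) -> int:
    """How many output tokens a USD budget buys at Pro's output rate."""
    return int(budget_usd / PRO_OUTPUT_PRICE * 1_000_000)

# Hypothetical response: 30,000 reasoning tokens plus a 2,000-token answer.
print(f"${output_cost(30_000, 2_000):.4f}")
print(tokens_for_budget(0.50))
```

At this rate $0.50 buys roughly 149K output tokens. If reasoning tokens count against the 32K output cap mentioned earlier (the source does not say either way), a bill of that size would accrue across several calls in an agent run rather than in one response.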
No US data residency option (yet). All Doubao Seed 2.0 inference runs on ByteDance infrastructure in either mainland China (direct) or Singapore/Hong Kong (via gateways). If your compliance requires US data residency, this is a non-starter — use Claude, GPT, or Gemini instead. For most consumer and developer-tools use cases this doesn’t matter, but enterprise procurement should know.
Rate limits are tighter than the headline pricing suggests. Default tier on direct Volcengine starts at ~5 RPS, scaling on usage history. ofox abstracts this into a unified queue, so most users won’t notice — but if you’re building something burst-heavy, plan a backoff strategy.
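A backoff strategy can be as small as the wrapper below: exponential delays with full jitter, written against a generic callable so it does not depend on any particular SDK. The default delays and attempt count are arbitrary assumptions to tune against your actual limits.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and retry with exponential backoff plus full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt; jitter spreads out
            # retries from parallel workers so they do not collide again.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

With the openai Python SDK you would typically pass `retry_on=(openai.RateLimitError,)` and wrap the `client.chat.completions.create(...)` call in a lambda. The SDK client also accepts a `max_retries` argument, which may be enough for light workloads.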
Final take
Doubao Seed 2.0 is the clearest signal yet that “Chinese-vendor LLM” no longer maps to “lower quality at lower price.” It now maps to “comparable benchmark to US flagships at meaningfully lower price.” Pro on hard math is genuinely competitive with GPT-5.2 and Claude Opus 4.6; Lite at $0.13/MTok input is the most cost-effective workhorse on the market today; Mini at $0.06/MTok input is the floor for any model worth using in production.
The catch is access — Volcengine’s direct registration is a wall for non-China-based developers, and BytePlus’s enterprise track is overkill for individual devs. An OpenAI-compatible gateway like ofox.ai flattens that to “email signup, one API key, swap models with a string change.” That aggregated single-endpoint model is what makes mixed-vendor routing actually practical.
Lite, at $0.13/MTok input with a 256K context, hits 87.7 on MMLU-Pro. If you’re still routing medium-tier work to GPT-4-class models at 10x the price, you’re funding the wrong vendor. Try Lite on a real workload before deciding.
If you’re building anything where token cost matters at scale — agents, RAG pipelines, content generation at volume — spend an afternoon swapping a portion of your traffic to Seed 2.0 Lite or Mini and measure the quality delta against your current model. The pricing leverage is the largest of any model release in 2026.
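Swapping a portion of traffic is easiest to measure when the split is deterministic, so the same request or user always lands on the same model. A minimal sketch, assuming you have a stable request or user ID to hash; `assign_model` and the 10% default are illustrative choices.

```python
import hashlib

def assign_model(request_id: str, candidate: str, baseline: str,
                 percent: int = 10) -> str:
    """Deterministically send `percent`% of requests to the candidate model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return candidate if bucket < percent else baseline

model = assign_model(
    "user-1234",                        # hypothetical stable ID
    "volcengine/doubao-seed-2.0-lite",  # candidate under test
    "your-current-model",               # placeholder for the model you use today
)
```

Log the chosen model alongside whatever quality signal you already track, then compare the two groups before raising the percentage.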
For cost-reduction patterns broadly, see our how to reduce AI API costs guide. For the flagship comparison view, the Claude vs GPT vs Gemini pillar sets the wider context.
Sources: Pricing and model IDs verified against ofox.ai/llms-full.txt and ofox.ai/models/volcengine/doubao-seed-2.0-lite (2026-05-20). Benchmark numbers from ByteDance Seed 2.0 release materials, datalearner.com, and apiyi.com cross-evaluation reports.


