Qwen 3.6 Plus API: Complete Guide to Pricing, Benchmarks, and Access (2026)

TL;DR — Qwen 3.6 Plus lands within two points of Claude Opus 4.6 on SWE-bench Verified (78.8% vs. 80.8%) at 1/30th the input price, with a 1M-token native context that Opus still doesn’t offer. Throughput is mixed: faster than Opus in tokens/sec, but below the median of its price tier on independent measurement (52 t/s vs. 58.9 t/s median). Access it on ofox.ai as bailian/qwen3.6-plus for $0.50/$3.00 per million tokens — same key as Claude, GPT, Gemini, DeepSeek.

Alibaba shipped a 1M-context model that comes within two points of Claude Opus 4.6 on SWE-bench Verified (78.8% vs. 80.8%) and charges less than Gemini Flash for input. That is not a typo, and the price war for serious coding models is not going to look the same after this quarter.

What is Qwen 3.6 Plus?

Qwen 3.6 Plus is Alibaba’s April 2026 flagship — a sparse mixture-of-experts model with always-on reasoning, released to the public API on April 2, 2026. It is the “Plus” tier of the Qwen 3.6 family, sitting between Qwen 3.6 Flash (cost-optimized) and Qwen 3.6 Max Preview (the top reasoning tier still in preview).

Three things make it interesting rather than just another model release:

  1. 1,000,000-token native context. Not a sliding-window trick — the model is trained for long context end-to-end, with up to 65,536 output tokens per response.
  2. Hybrid attention. Linear attention combined with sparse MoE routing; this is what lets it serve 1M context without blowing up latency the way standard transformers do at that length.
  3. Reasoning by default. No mode toggle, no extended_thinking flag — every response goes through chain-of-thought, and you receive reasoning_content alongside the answer.

If you’ve used Qwen 3.5 Plus, the architectural shift here matters more than the version bump suggests. Linear attention is a real bet, not a renaming exercise.

Qwen 3.6 Plus pricing — what you actually pay

You pay $0.50 per million input tokens and $3.00 per million output tokens on ofox.ai as of May 2026. That’s the practical number for developers reading this guide. For comparison, here is the model versus its direct peers, normalized to per-million-token rates:

| Model | Input | Output | Context |
| --- | --- | --- | --- |
| Qwen 3.6 Plus (ofox) | $0.50 | $3.00 | 1M |
| Claude Opus 4.6 | $15.00 | $75.00 | 200K |
| Claude Opus 4.7 | $15.00 | $75.00 | 200K |
| GPT-5.5 | $1.25 | $10.00 | 400K |
| Gemini 3.1 Pro | $1.25 | $10.00 | 2M |
| DeepSeek V4 Pro | $0.27 | $1.10 | 128K |
| Qwen 3 Max (older tier) | $0.36 | $1.43 | 256K |

For workloads where you’d consider Claude Opus, the input savings are 30× and the output savings are 25×. The honest comparison is not Opus though — most teams pick Sonnet or GPT-5 mini, where the gap narrows to 2-3×. That is still meaningful when you’re shipping millions of tokens through a coding agent.

Direct vs. gateway pricing. Alibaba’s DashScope publishes $0.325 / $1.95 per million for Qwen 3.6 Plus. The ofox markup buys you: one API key across the full catalog (Claude, GPT, Gemini, DeepSeek, Kimi, Llama, MiniMax, plus image/video models), USD invoicing, no ICP filing, and OpenAI-SDK drop-in compatibility. If you only ever call Qwen models from inside mainland China and have an Alibaba Cloud account already, go direct. If you’re routing between providers, the gateway pays for itself in operational time. See our gateway-vs-direct breakdown for the full math.
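
To sanity-check that tradeoff, here is the back-of-envelope arithmetic at the two rate cards above. The monthly volumes are illustrative assumptions, not measured traffic:

# Gateway-vs-direct cost at the per-million rates quoted above.
# Monthly token volumes are hypothetical, purely for illustration.
input_m, output_m = 200, 20  # millions of tokens per month (assumed)

direct = input_m * 0.325 + output_m * 1.95  # DashScope direct
gateway = input_m * 0.50 + output_m * 3.00  # ofox gateway

print(f"direct:  ${direct:,.2f}/mo")            # $104.00
print(f"gateway: ${gateway:,.2f}/mo")           # $160.00
print(f"markup:  ${gateway - direct:,.2f}/mo")  # $56.00

Run it at your own volumes; the markup scales linearly with traffic, while the operational overhead of going direct does not.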

Benchmarks: where Qwen 3.6 Plus actually wins

Qwen 3.6 Plus wins on practical coding benchmarks and loses on raw output speed against its price tier. Here is what the public numbers say:

Coding (SWE-bench Verified — repository-level patch tasks):

  • Claude Opus 4.6: 80.8%
  • GPT-5.4: ~80% (matches GPT-5.3-Codex)
  • Qwen 3.6 Plus: 78.8%
  • Gemini 3.1 Pro: comparable range (mid-70s)

On the harder SWE-bench Pro (multi-language, larger repos) the order changes: Opus 4.7 reaches 64.3%, GPT-5.4 lands at 57.7%, and Gemini 3.1 Pro at 54.2%. Don’t conflate the two — Pro is the meaningful one for production agent loops on real codebases, and Qwen 3.6 Plus has not posted a competitive Pro number yet. On Verified, the headline is “within two points of Opus at 1/30th the input price,” which is enough reason to A/B test it for routine “read this codebase, fix this issue, output a diff” agents.

Throughput and latency (Artificial Analysis, May 2026):

  • Intelligence Index score: 50 (well above the 35 average for comparable models)
  • Output speed: 52 tokens/sec (Alibaba API)
  • Time-to-first-token: 3.12 seconds
  • Median for reasoning models in this price tier: 58.9 tokens/sec

So it’s slower than median on independent measurement. Qwen 3.6 Plus is still faster than Opus (Opus is a slow model in absolute terms), but it is not the fastest thing in its price bracket. DeepSeek V4 Flash and Gemini Flash both win on raw tokens/sec.

What this means in practice. Pick Qwen 3.6 Plus when output quality on coding/reasoning matters more than latency. For chat UIs where users watch tokens stream, prefer Flash-tier models. For overnight batch agent runs, the latency doesn’t matter and the quality-per-dollar is hard to beat.
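
A rough way to feel those numbers: perceived wall-clock time is approximately time-to-first-token plus output length divided by throughput. A minimal sketch using the Artificial Analysis figures quoted above (it ignores network jitter and server-side queueing):

# Perceived latency ~= TTFT + tokens / throughput (crude model).
def wall_clock(tokens: int, ttft: float = 3.12, tps: float = 52.0) -> float:
    return ttft + tokens / tps

print(f"{wall_clock(200):.1f}s")   # ~7.0s  -- tolerable for async agents
print(f"{wall_clock(1000):.1f}s")  # ~22.4s -- painful in a streaming chat UI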

For deeper benchmark methodology and how we score reasoning tasks, see our LLM leaderboard for 2026.

API access: minimal working example

You can hit Qwen 3.6 Plus from any OpenAI-compatible SDK by pointing it at https://api.ofox.ai/v1 and setting model: "bailian/qwen3.6-plus". The full working Python call:

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-ofox-key",
    base_url="https://api.ofox.ai/v1",
)

response = client.chat.completions.create(
    model="bailian/qwen3.6-plus",
    messages=[{"role": "user", "content": "Refactor this loop to use map()"}],
)
print(response.choices[0].message.content)

For curl users:

curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"bailian/qwen3.6-plus","messages":[{"role":"user","content":"Hi"}]}'

That’s it. If you already have OpenAI SDK code in production, the migration is a two-line change — see our OpenAI SDK migration guide for the corner cases (streaming, tool use, response shape).
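
One of those corner cases is worth showing here: streaming goes through the same SDK, reusing the client above. A sketch; the per-chunk reasoning field name is an assumption extrapolated from the non-streaming response shape, so verify it against the ofox docs:

# Streaming through the standard OpenAI SDK, reusing `client` from above.
stream = client.chat.completions.create(
    model="bailian/qwen3.6-plus",
    messages=[{"role": "user", "content": "Refactor this loop to use map()"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning may arrive in a separate delta field before the answer;
    # the field name here is assumed from the non-streaming shape.
    if getattr(delta, "reasoning_content", None):
        continue  # or surface the chain of thought if your UI wants it
    if delta.content:
        print(delta.content, end="", flush=True)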

Reading the reasoning_content field

Always-on reasoning means every response includes a reasoning_content field alongside the regular content. If you’re used to OpenAI’s o1 pattern this will look familiar:

msg = response.choices[0].message
print(msg.content)            # the answer
print(msg.reasoning_content)  # the chain of thought

The reasoning tokens are billed at the output rate. For a typical SWE-bench task, expect 2-4× the visible-answer length in hidden reasoning. Budget for it: a “200-token answer” routinely costs 600-1,000 output tokens once reasoning is included. This is normal and shows up in every reasoning-mode model on the market.
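
To budget for it, price requests at the amplified output count rather than the visible answer. A quick sketch, assuming a mid-range 3× multiplier from the 2-4× band above:

# Effective per-request cost once hidden reasoning is billed as output.
# The 3x multiplier is an assumed mid-point of the 2-4x band above.
INPUT_RATE, OUTPUT_RATE = 0.50 / 1e6, 3.00 / 1e6  # dollars per token

def request_cost(input_tokens: int, answer_tokens: int, mult: float = 3.0) -> float:
    total_output = answer_tokens * (1 + mult)  # visible answer + hidden reasoning
    return input_tokens * INPUT_RATE + total_output * OUTPUT_RATE

# A "200-token answer" on a 5,000-token prompt:
print(f"${request_cost(5_000, 200):.4f}")  # ~$0.0049 vs. the naive ~$0.0031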

If you don’t need reasoning (chat UIs, classification, summarization), Qwen 3.6 Flash at $0.25/$1.50 is the right tier — don’t pay Plus rates for it.

Tool calling and 1M context: where the value lives

The combination of tool calling + 1M context is what makes Qwen 3.6 Plus a serious agent model rather than just a benchmark winner. Standard OpenAI tools parameter works:

tools = [{
    "type": "function",
    "function": {
        "name": "search_codebase",
        "description": "Search the repository",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}}}
    }
}]
response = client.chat.completions.create(
    model="bailian/qwen3.6-plus",
    messages=[...],
    tools=tools,
)
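
The response side follows the standard OpenAI tool-calling shape: check message.tool_calls, execute, append a tool message, and call again. A minimal loop, assuming messages is the list you passed in; execute_tool is a hypothetical dispatcher you would implement:

import json

msg = response.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = execute_tool(call.function.name, args)  # hypothetical dispatcher
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second round trip: the model turns tool results into a final answer.
    response = client.chat.completions.create(
        model="bailian/qwen3.6-plus",
        messages=messages,
        tools=tools,
    )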

The 1M window means you can drop entire mid-sized codebases into a single conversation without RAG — useful for “explain this monorepo” or “find every place this function is called” prompts. Most teams aren’t ready for 1M-token context yet because their orchestration code assumes ≤200K, but if you can wire it up, it eliminates a lot of retrieval plumbing. The function calling guide has the full schema reference.
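
If you want to try the no-RAG path, the wiring is mundane: concatenate source files into one user message and track a rough token estimate as you go. A sketch; the ~4-characters-per-token heuristic, the repo path, and the target function are all illustrative assumptions:

from pathlib import Path

# Naive codebase stuffing: no RAG, just one very large prompt.
# ~4 chars/token is a crude heuristic, not a real tokenizer count.
parts, total, budget_chars = [], 0, 900_000 * 4  # stay under the 1M window
for path in sorted(Path("my_repo").rglob("*.py")):  # hypothetical repo
    text = path.read_text(errors="ignore")
    if total + len(text) > budget_chars:
        break
    parts.append(f"# FILE: {path}\n{text}")
    total += len(text)

response = client.chat.completions.create(
    model="bailian/qwen3.6-plus",
    messages=[{"role": "user", "content": "\n\n".join(parts)
               + "\n\nFind every place load_config() is called."}],
)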

When to pick Qwen 3.6 Plus over the alternatives

Pick Qwen 3.6 Plus when:

  • You’re running a coding agent and Opus is eating your budget
  • You need >200K context for repo-level work without setting up RAG
  • You want reasoning-mode quality without GPT-5 / Opus prices
  • Your traffic isn’t latency-critical (batch jobs, async agents)

Pick something else when:

  • You need <1s time-to-first-token (use Flash-tier or Gemini Flash)
  • You’re doing pure chat — reasoning overhead wastes money
  • You’re inside the Anthropic ecosystem (Claude Code, MCP) and switching cost > savings
  • Your workload is multi-step agent loops with heavy tool use — Opus and GPT-5.5 still lead there

For a structured way to think about this, our LLM API selection decision matrix walks through the same tradeoffs across 12 use cases. For the underlying philosophy of when API gateways pay for themselves, see the Claude vs. GPT vs. Gemini comparison guide.

Migration checklist if you’re moving off Claude or GPT

A pragmatic order of operations for swapping in Qwen 3.6 Plus:

  1. Audit your current spend. Pull a week of token logs grouped by task type (chat / coding / summarization / agent loops).
  2. Pick one task type to migrate first. Routine code review or test generation usually has the cleanest A/B baseline.
  3. Run shadow traffic for 48 hours. Send 10% of requests to bailian/qwen3.6-plus in parallel; compare outputs offline.
  4. Watch reasoning_token usage. Always-on reasoning means a 2-4× output token amplification. Confirm your dollar savings hold after this multiplier.
  5. Keep an escape hatch. Route failures or low-confidence outputs back to your previous model — a minimal sketch follows this list; see how to reduce AI API costs for the routing patterns.
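
A minimal version of that escape hatch, assuming both models sit behind the same gateway key. The fallback model id is illustrative, and looks_reasonable is a placeholder for whatever offline validation you trust:

# Try Qwen first; fall back to the incumbent on errors or weak outputs.
def complete_with_fallback(messages, primary="bailian/qwen3.6-plus",
                           fallback="claude-opus-4.6"):  # fallback id illustrative
    try:
        resp = client.chat.completions.create(model=primary, messages=messages)
        text = resp.choices[0].message.content
        if looks_reasonable(text):  # hypothetical validator you supply
            return text
    except Exception:
        pass  # timeouts, rate limits, provider errors all route to fallback
    resp = client.chat.completions.create(model=fallback, messages=messages)
    return resp.choices[0].message.content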

If you’re coming from DeepSeek, the differences are subtler — DeepSeek V4 Pro is cheaper still but its long-context behavior is weaker past ~64K. Worth reading the DeepSeek API pricing guide for the side-by-side.

What is still rough about Qwen 3.6 Plus

Three things worth knowing before you commit:

  • Output speed below median. 52 t/s on Artificial Analysis is fine for batch but visibly slow in a chat UI. If users will see streaming, test before committing.
  • English-content benchmarks lag Chinese ones. The model is genuinely bilingual but its strongest training signal is on Chinese-language tasks. For pure-English creative writing the gap to Claude is visible.
  • Reasoning content is verbose. The default chain-of-thought is long. Most teams end up either stripping it from logs entirely or budgeting explicitly for the token multiplier; pick one approach and commit rather than doing both halfway.

None of these are blockers. They’re the things you’d discover yourself after a week of production traffic, written down so you don’t have to.

The interesting question after Qwen 3.6 Plus isn’t “is it good” — the benchmarks settle that. It’s whether the rest of the industry can keep charging $15/$75 for Opus-tier code quality when a 1M-context alternative costs less than coffee. The next twelve months of model pricing are going to be loud.

Sources

  • Alibaba Qwen 3.6 Plus product page and DashScope API reference
  • ofox.ai model catalog and pricing (May 2026)
  • Artificial Analysis intelligence and throughput benchmarks
  • SWE-bench Verified leaderboard
  • OpenRouter Qwen 3.6 Plus specification page