What is the ofox model ID for Qwen 3.7 Max?

Use `bailian/qwen3.7-max` as the `model` value. The same ofox API key works for every other model in the catalog — Claude, GPT, Gemini, DeepSeek — so you do not need a separate Alibaba Cloud or DashScope account.

How much does Qwen 3.7 Max cost compared to Claude Opus 4.7?

Qwen 3.7 Max is $2.50 per million input tokens and $7.50 per million output tokens. Claude Opus 4.7 is $5/M input and $25/M output. Qwen is half the input rate and roughly one-third of the output rate. Cached input on Qwen drops to $0.25/M, a 90% discount that materially changes the long-context economics.

Does Qwen 3.7 Max really support the Anthropic API protocol?

Yes, natively. Claude Code, OpenClaw, and any harness that speaks the Anthropic Messages format can point at the Qwen endpoint with only the base URL and model id changed. This is unusual — most non-Anthropic models still require a shim layer.

What is the catch with the 1M context window?

The catch is cost, not recall. Extended thinking is on by default and the model is verbose: Artificial Analysis's evaluation observed ~97M output tokens versus a ~24M median for comparable models on the same task. For long agent loops, plan for cached-input pricing, set a hard `max_tokens` cap, and assume your effective token bill scales with thinking length rather than just inputs.

Is Qwen 3.7 Max open-weight?

No. The Max-Preview tier is proprietary and hosted only. Alibaba has shifted its flagship tier to closed weights starting with the 3.6 generation; the open-weight path is Qwen 3.6 27B (the Apache 2.0 dense model released April 22, 2026, which replaced the older Qwen 3.5 397B MoE), not Max.

Qwen 3.7 Max Developer Guide: 1M Context & $2.50/MTok (2026)

TL;DR: Qwen 3.7 Max-Preview launched May 19, 2026 with a 1M-token context, native Anthropic Messages protocol support, $2.50/$7.50 per million input/output tokens, and a 90% cached-input discount ($0.25/M). It scores 56.6 on the Artificial Analysis Intelligence Index (top 10 globally and the highest-ranked Chinese model on the board) and 97.1 on HMMT 2026 February. Access it on ofox.ai as `bailian/qwen3.7-max` with the same key that gives you Claude, GPT, and Gemini. The catch: default extended thinking makes it verbose enough that effective costs run 3–4× the headline rate on long agent sessions unless you cap `max_tokens`.

A model that natively speaks the Anthropic Messages protocol at half the input cost of Claude Opus 4.7, with a real 1M-token window and a 90% cache discount, is the first credible drop-in replacement for Opus in a Claude Code harness. The pricing war for serious coding models just got uncomfortable.

What is Qwen 3.7 Max?

Qwen 3.7 Max-Preview is Alibaba’s May 2026 flagship reasoning model, announced at the Alibaba Cloud Summit on May 20 and live on the API one day earlier. It replaces Qwen 3.6 Max Preview as the company’s most capable hosted model. A few properties make it more than a routine version bump:

Native Anthropic Messages support. Most non-Anthropic models advertise “Anthropic compatibility” via a translation shim. Qwen 3.7 Max accepts the Anthropic Messages format directly at the endpoint level, so you can point Claude Code, OpenClaw, or any Anthropic SDK call at Qwen with only the base URL and model id changed.
1,000,000-token native context with 65,536-token max output. Not a sliding-window approximation. The model is trained for end-to-end long context and posts 90.4 on the MRCR-v2 128k retrieval benchmark, a test that most “1M context” competitors quietly fail.
Extended thinking on by default. Every response runs through deliberation before output. Quality goes up; verbosity goes up much further. Plan for it or watch your bill.

For background on why “1M context” claims are worth verifying, see Long-Context LLM Benchmarks. Most models marketed at 1M lose accuracy hard past 200K.

Qwen 3.7 Max pricing — what you actually pay

You pay $2.50 per million input tokens, $7.50 per million output tokens, and $0.25 per million cached input tokens on ofox.ai. The cached-input rate is the number that changes the planning, not the headline rate.

Model	Input ($/M tokens)	Output ($/M tokens)	Cached input ($/M tokens)	Context
Qwen 3.7 Max-Preview	$2.50	$7.50	$0.25	1M
Qwen 3.6 Plus	$0.50	$3.00	—	1M
Qwen 3.6 Max Preview	$2.00	$12.00	—	256K
Claude Opus 4.7	$5.00	$25.00	$0.50	200K
GPT-5.5	$3.00	$12.00	$0.30	400K
Gemini 3.1 Pro	$2.50	$10.00	$0.31	1M

Qwen 3.7 Max is the cheapest model in its tier on output by a meaningful margin: $7.50 versus $10–25 for the comparable flagships. Output tokens dominate agent-loop bills because thinking and tool calls all accumulate on the output side, so this is the line item that shapes the monthly bill.

The cached-input rate is the other half of the story. At $0.25 per million tokens, repeated reads of the same context cost the same as uncached input on Gemini 3.1 Flash. If you are doing RAG over a stable codebase, document QA over a fixed PDF set, or agent loops that carry a long system prompt across hundreds of calls, the cached rate is the one that decides the bill. The headline rate is misleading for those workloads.
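To make the cached-input argument concrete, here is a minimal sketch of the arithmetic for a session that resends the same context on every call. The prices come from the table above; the workload numbers and the `session_cost` helper are hypothetical, and the caching model (first call at the full rate, every later call at the cached rate) is a simplifying assumption rather than documented ofox behavior.

```python
# Prices in USD per million tokens, taken from the pricing table.
INPUT_PER_M = 2.50
CACHED_INPUT_PER_M = 0.25
OUTPUT_PER_M = 7.50


def session_cost(calls, context_tokens, new_input_tokens, output_tokens, cached=True):
    """Estimate the cost of `calls` requests that all resend the same context.

    Assumption: with caching, the first call pays the full input rate for the
    shared context and every later call pays the cached rate. Real cache
    behavior (expiry, minimum prefix length, write surcharges) may differ.
    """
    if cached:
        context_cost = (
            context_tokens * INPUT_PER_M
            + (calls - 1) * context_tokens * CACHED_INPUT_PER_M
        )
    else:
        context_cost = calls * context_tokens * INPUT_PER_M
    fresh_cost = calls * new_input_tokens * INPUT_PER_M
    output_cost = calls * output_tokens * OUTPUT_PER_M
    return (context_cost + fresh_cost + output_cost) / 1_000_000


# Hypothetical workload: 200 calls over a 150K-token codebase snapshot,
# with 500 fresh input tokens and 2,000 output tokens per call.
uncached = session_cost(200, 150_000, 500, 2_000, cached=False)
cached = session_cost(200, 150_000, 500, 2_000, cached=True)
print(f"uncached: ${uncached:.2f}")
print(f"cached:   ${cached:.2f}")
```

Under these assumptions the session costs about $78 uncached and about $11 cached, and almost all of the difference comes from the repeated context rather than from output.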

The output rate carries an asterisk we will get to in the verbosity section.

Quickstart: OpenAI SDK route

This is the standard chat completions path. Works with any OpenAI SDK without modification — same call shape as GPT or DeepSeek.

from openai import OpenAI

client = OpenAI(
    api_key="sk-ofox-xxx",
    base_url="https://api.ofox.ai/v1",
)

response = client.chat.completions.create(
    model="bailian/qwen3.7-max",
    messages=[{"role": "user", "content": "Explain MoE routing in three sentences."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

If you are already on the OpenAI SDK and switching from another model, swapping the model string is the whole migration. See the OpenAI SDK Migration Guide for the rest of the catalog.

Quickstart: Anthropic Messages route (drop-in for Claude Code)

This is the unusual part. Qwen 3.7 Max accepts the Anthropic Messages format at the protocol level, so anything that already targets Claude works by switching the base URL and model id.

export ANTHROPIC_BASE_URL=https://api.ofox.ai/anthropic
export ANTHROPIC_API_KEY=sk-ofox-xxx
export ANTHROPIC_MODEL=bailian/qwen3.7-max

# Claude Code, OpenClaw, or any Anthropic-SDK harness now routes to Qwen
claude  # runs against Qwen 3.7 Max with the same Claude Code UI

If you have been running Claude Code on Opus 4.7 and want to test Qwen on a real task without rewriting your harness, that is the whole setup. The protocol-level support is what sets this apart from most other “Anthropic-compatible” models on the market: those are shims, with the usual edge cases around tool-use schemas and streaming chunk boundaries.
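The same route can be exercised from Python with the official `anthropic` SDK. This is a sketch under assumptions: it reuses the environment variables from the shell snippet above, assumes the ofox endpoint accepts the path the SDK appends to `base_url`, and assumes text comes back in `text` content blocks as it does with Claude. The `build_request` and `ask` helpers are illustrative names, not part of any SDK.

```python
import os


def build_request(prompt, max_tokens=2048):
    """Build keyword arguments for an Anthropic Messages call routed to Qwen."""
    return {
        "model": "bailian/qwen3.7-max",
        # Required by the Messages API; it also caps output spend per call.
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt):
    # Imported lazily so build_request can be used without the SDK installed.
    from anthropic import Anthropic

    client = Anthropic(
        api_key=os.environ["ANTHROPIC_API_KEY"],
        base_url=os.environ.get("ANTHROPIC_BASE_URL", "https://api.ofox.ai/anthropic"),
    )
    message = client.messages.create(**build_request(prompt))
    # Assumption: the answer arrives in blocks of type "text"; any other
    # block types (for example thinking blocks) are skipped here.
    return "".join(block.text for block in message.content if block.type == "text")


# Only call the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    print(ask("Explain MoE routing in three sentences."))
```

Keeping the request construction separate from the network call makes it easy to swap the model id back to a Claude model and compare the two on the same prompt.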

For a comparison of the agent harnesses themselves (Claude Code vs Codex CLI vs Cursor vs DeepSeek TUI) on the same task, see AI Coding Agents Compared 2026.

Benchmarks — and which ones actually matter

Headline numbers from Artificial Analysis and the Qwen team’s own evaluation:

Benchmark	Qwen 3.7 Max	What it measures
AA Intelligence Index	56.6	Composite across 10 evaluations (MMLU-Pro, GPQA, HumanEval+, SWE-bench Verified, etc.)
HMMT 2026 Feb	97.1	Competition mathematics — top result in the AA leaderboard at launch
GPQA Diamond	92.4	Graduate-level science questions
MRCR-v2 128k	90.4	Long-context multi-hop retrieval at 128K tokens
LM Arena (Elo)	~1,475	Crowd-sourced pairwise preference

The math score matters more than it looks. Nobody is shipping production code that solves HMMT, but the benchmark measures whether the model can hold a long multi-step chain without losing the thread. That correlates closely with whether the model can hold its own during a debugging session where you need it to track three files and four constraints at once.

The MRCR-v2 score is what makes the 1M context window believable. Most 1M-context models drop below 70% retrieval accuracy past 200K. Qwen 3.7 Max retains accuracy at the length it advertises, which is what the window is being sold for.

The composite Intelligence Index puts it at #1 among Chinese models and inside the global top 10, a tier above Qwen 3.6 Max Preview, comparable to the GPT-5.5 / Claude Opus 4.6 cluster, and just below the absolute frontier covered in GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro.

The verbosity tax most reviews skip

The launch coverage glosses over this part. Artificial Analysis’s published evaluation of Qwen 3.7 Max generated ~97 million output tokens, against a ~24 million median for the comparable models on the same task. Roughly 4× the verbosity. Multiply by the $7.50/M output rate and the “cheap” model lands in Claude Opus 4.7 cost territory on workloads where thinking compounds.
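The arithmetic behind that comparison, as a minimal sketch. It assumes the ~97M and ~24M figures generalize to your workload and that Opus 4.7 sits near the median verbosity, neither of which the evaluation guarantees.

```python
QWEN_OUTPUT_PER_M = 7.50   # USD per million output tokens
OPUS_OUTPUT_PER_M = 25.00  # USD per million output tokens

qwen_eval_tokens_m = 97    # ~97M output tokens in the Artificial Analysis run
median_eval_tokens_m = 24  # ~24M median for comparable models

verbosity = qwen_eval_tokens_m / median_eval_tokens_m

# Cost of the output Qwen emits for work that a median-verbosity
# model would complete in 1M output tokens.
effective_rate = QWEN_OUTPUT_PER_M * verbosity

print(f"verbosity multiplier: {verbosity:.1f}x")
print(f"effective output rate: ${effective_rate:.2f} per median-equivalent 1M tokens")
print(f"Opus 4.7 list output rate: ${OPUS_OUTPUT_PER_M:.2f}")
```

With these inputs the effective rate comes out near $30 per median-equivalent million output tokens, which is above Opus 4.7's $25 list rate. An output cap changes the multiplier, which is why the mitigations that follow matter.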

This is a structural property, not a benchmark artifact. Extended thinking runs by default and there is no first-class flag to switch it off. The practical mitigation set:

Cap max_tokens aggressively. Most agent loops do not need 65K of output per turn. max_tokens=2048 or 4096 per turn cuts the worst-case bill without hurting quality on typical work.
Use the cached-input rate religiously. Anything that carries a stable system prompt or repeated context across many turns should hit the $0.25/M cached rate. A 10× input saving often outweighs the verbosity tax on the output side.
Route by task length. Send the hard long-context turns to Qwen 3.7 Max where reasoning earns its keep; route short turns to a cheaper model. The Claude Code hybrid routing pattern generalizes to any agent harness.
Watch the reasoning_content field. It is billed as output even when your application discards it from the final message. Dropping it on the client side does not reduce the charge.
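The first three mitigations can be combined in a small request planner. This is a sketch, not a recommended configuration: the cheap model id is a hypothetical placeholder, the 50,000-token threshold is arbitrary, and the four-characters-per-token estimate is a rough heuristic. It also assumes caching keys on a stable prefix, which is why the system prompt always goes first.

```python
QWEN_MAX = "bailian/qwen3.7-max"
CHEAP_MODEL = "example/flash-tier-model"  # hypothetical placeholder id

LONG_CONTEXT_THRESHOLD = 50_000  # tokens; arbitrary cutoff, tune per workload
PER_TURN_OUTPUT_CAP = 4096       # hard cap on output tokens per turn


def estimate_tokens(text):
    # Rough heuristic (~4 characters per token); a real tokenizer is more accurate.
    return max(1, len(text) // 4)


def plan_request(system_prompt, user_turn, needs_reasoning=False):
    """Pick a model and an output cap for one turn."""
    input_tokens = estimate_tokens(system_prompt) + estimate_tokens(user_turn)
    use_qwen = needs_reasoning or input_tokens >= LONG_CONTEXT_THRESHOLD
    return {
        "model": QWEN_MAX if use_qwen else CHEAP_MODEL,
        "max_tokens": PER_TURN_OUTPUT_CAP,
        "messages": [
            # Stable system prompt first, so repeated calls share a prefix.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_turn},
        ],
    }


short_turn = plan_request("You are a code assistant.", "Rename this variable.")
long_turn = plan_request("x" * 400_000, "Find the bug in the module above.")
print(short_turn["model"], long_turn["model"])
```

The returned dictionary matches the keyword arguments of the OpenAI SDK call shown in the quickstart, so it can be passed as `client.chat.completions.create(**plan_request(...))`.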

The economic shape of Qwen 3.7 Max is “very cheap headline, verbose middle, watch your output cap.” That is different enough from how you plan a Claude or GPT budget that ignoring it makes the model look more expensive than it needs to be.

Where Qwen 3.7 Max loses

Three places I would still reach for Claude Opus 4.7 or GPT-5.5:

Multi-step agent workflows with brittle tool schemas. Opus 4.7 is measurably more reliable at long-horizon tool use where a single malformed call breaks the loop. Qwen 3.7 Max is good here, better than 3.6 Max, but not best.
End-user-facing copy where short responses matter. If you need concise output for a chat UI and have no clean way to trim post-hoc, the cheaper-on-paper model becomes annoying to control.
Latency-sensitive interactive UIs. Extended thinking adds latency by design. At the same price point, a non-thinking model will feel faster to a user typing in a chat box.

None of these are showstoppers; they are tradeoffs to plan around. For the leaderboard view of when each model wins, see Best LLM for Coding 2026 and the LLM Leaderboard.

Practical recommendation

Use Qwen 3.7 Max as the default for two workload shapes: long-context document or codebase QA where cached input dominates the bill, and Claude Code / OpenClaw sessions where you want to cut Opus costs by roughly 3× without touching the harness. The Anthropic protocol support is the deciding factor. Most other cheap Opus alternatives require a shim that can break tool-use schemas in subtle ways. Qwen is the first model where that is no longer the case.

Keep Opus 4.7 for agent loops where you cannot afford a single failed tool call, and keep a Flash-tier model for short turns where reasoning is overkill. The operational argument for running all three behind one key sits in Why an LLM API Gateway and the AI API Aggregation Guide. For the routing patterns that benefit most from Qwen’s pricing shape, How to Reduce AI API Costs covers the mechanics.

The question after the Qwen 3.7 Max launch is no longer whether a Chinese model can compete at the frontier. It is which workloads you stop sending to Opus this month.

Sources

Artificial Analysis benchmark page: https://artificialanalysis.ai/models/qwen3-7-max
MarkTechPost launch coverage: https://www.marktechpost.com/2026/05/21/qwen-introduces-qwen3-7-max-a-reasoning-agent-model-with-a-1m-token-context-window/
DataCamp deep dive: https://www.datacamp.com/blog/qwen3-7-max
OpenRouter pricing page: https://openrouter.ai/qwen/qwen3.7-max
Qwen 3.6 vs 3.7 comparison: https://codersera.com/blog/qwen-3-7-vs-qwen-3-6-2026/
