Claude Opus 4.7 API Review: What Actually Changed, Real Costs, and Whether to Upgrade

TL;DR: Opus 4.7 is a real upgrade: 87.6% SWE-bench Verified (up from 80.8%), a major vision overhaul (98.5% benchmark accuracy, images up to 3.75MP), and a new xhigh effort level that is now the default in Claude Code. The part getting less attention: the new tokenizer means the same prompt costs 0-35% more in practice, with code-heavy workloads at the top of that range. Same sticker price, higher real bill. Worth it for most teams, but test before migrating production.

What Anthropic Actually Shipped

Anthropic released Claude Opus 4.7 on April 16, 2026. Two days later it’s already the default Opus route on most API platforms, including ofox.ai.

| Benchmark | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 87.6% | +6.8pp |
| SWE-bench Pro | 53.4% | 64.3% | +10.9pp |
| CursorBench | 58% | 70% | +12pp |
| Vision accuracy | 54.5% | 98.5% | +44pp |
| Max image resolution | ~1MP | 3.75MP | 3.75x |

The vision jump stands out. Opus 4.6 was mediocre at reading screenshots, diagrams, and dense charts. Opus 4.7 handles them well — 98.5% accuracy on the standard vision benchmark, accepting images up to 3.75 megapixels. If you’ve been routing vision tasks to GPT-5.4 because Claude’s image handling was unreliable, that’s worth revisiting.

On coding, 64.3% on SWE-bench Pro puts Opus 4.7 ahead of GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%) on real-world GitHub issue resolution. SWE-bench Pro uses actual open-source repositories, not synthetic tasks.

The Tokenizer Problem

Opus 4.7 ships with a new tokenizer. Anthropic’s migration guide says it uses “roughly 1.0 to 1.35x as many tokens” as 4.6 for the same content. That range matters a lot depending on what you’re building.

  • Natural language prose: ~1.0-1.05x (negligible)
  • Mixed code and text: ~1.1-1.2x (10-20% more tokens)
  • Dense code, especially Python or TypeScript: ~1.2-1.35x (20-35% more tokens)

The list price is unchanged at $5/$25 per million tokens. But code-heavy workloads will cost more. A team spending $2,000/month on Opus 4.6 for a code review pipeline should budget $2,200-2,700/month for the same volume on 4.7. The performance gains likely justify it. Just don’t get surprised by the invoice.
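That budgeting arithmetic can be sketched directly. The multipliers are the ranges from Anthropic's migration guide; the workload split below is a hypothetical example, not measured data:

```python
# Rough estimator for the effective cost increase from the 4.7 tokenizer.
# Multipliers come from Anthropic's published 1.0-1.35x range; the
# workload mix is a hypothetical split for a code-review pipeline.

def effective_monthly_cost(base_cost, workload_mix):
    """base_cost: current monthly spend on Opus 4.6 (USD).
    workload_mix: {token_multiplier: fraction_of_spend}, fractions sum to 1.0."""
    return sum(base_cost * frac * mult for mult, frac in workload_mix.items())

# Mostly dense code, some mixed content, a little plain prose (hypothetical).
mix = {1.30: 0.7, 1.15: 0.2, 1.0: 0.1}
print(round(effective_monthly_cost(2000, mix)))  # prints 2480 — inside the $2,200-2,700 range
```

The point of the sketch is that the blended multiplier depends entirely on your mix; a prose-heavy workload at the same spend would barely move.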

The xhigh Effort Level

Opus 4.7 adds a new reasoning tier: xhigh. It sits between high and max and is now the default in Claude Code for all plans.

  • high: Fast, limited thinking budget
  • xhigh: Up to 100K thinking tokens, balances depth and latency
  • max: Uncapped thinking, slowest and most expensive

For most coding work, xhigh is the right setting. It gives the model enough room to work through multi-step problems without the latency of max effort. Anthropic’s testing shows xhigh at 100K tokens matches medium-effort 4.6 on quality while being faster.

To match Claude Code’s default behavior when calling the API directly:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 100000},
    messages=[{"role": "user", "content": "..."}],
)

Four Breaking Changes Before You Migrate

Opus 4.7 isn’t a drop-in replacement. Anthropic documented four API changes that can break existing integrations.

More literal instruction-following. Opus 4.7 interprets prompts more literally than 4.6. If your system prompt says “respond in JSON,” 4.7 will respond in JSON even when a brief explanation would have been more useful. Prompts that relied on 4.6’s tendency to add helpful context may need adjustment.

Stricter output format adherence. Related — 4.7 is less likely to deviate from specified formats, even when deviation would improve the response. Good for structured output pipelines, potentially annoying for conversational use cases.

Tokenizer change affects cache hit rates. If you’re using prompt caching, hit rates will drop initially after migration because the tokenized representation of your prompts has changed. Cache rebuilds over time, but expect higher costs during the transition.
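To watch that transition rather than discover it on the invoice, you can track the hit rate from the usage data the API already returns. The field names (`cache_read_input_tokens`, `cache_creation_input_tokens`) follow Anthropic's prompt-caching documentation; the aggregation helper itself is an illustrative sketch:

```python
# Sketch: monitor prompt-cache hit rate during a tokenizer migration.
# Each record is the `usage` object from an API response, reduced to a dict.

def cache_hit_rate(usage_records):
    """Fraction of cacheable input tokens served from cache."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in usage_records)
    written = sum(u.get("cache_creation_input_tokens", 0) for u in usage_records)
    total = read + written
    return read / total if total else 0.0

# Right after migration, every prompt re-tokenizes and rewrites the cache;
# hits only resume once the new representations are stored.
records = [
    {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 5000},
    {"cache_read_input_tokens": 4800, "cache_creation_input_tokens": 200},
]
print(f"{cache_hit_rate(records):.0%}")  # prints 48%
```

A week of this metric tells you when the cache has rebuilt and costs have settled.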

Vision input handling. The new 3.75MP image support changes how the model processes image tokens. If you have hardcoded token estimates for image inputs, recalculate.

Pricing in Context

At $5/$25 per million tokens, Opus 4.7 is the premium tier. Here’s where it sits:

| Model | Input / 1M | Output / 1M | SWE-bench Verified |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | 87.6% |
| Claude Opus 4.6 | $5.00 | $25.00 | 80.8% |
| GPT-5.4 | $2.50 | $15.00 | ~57.7% |
| Gemini 3.1 Pro | $1.25 | $10.00 | ~54.2% |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 79.6% |

Prices via ofox.ai/models, April 2026.

Sonnet 4.6 is worth a second look here. It costs 40% less on both input and output, and scores 79.6% on SWE-bench Verified — 8 points behind Opus 4.7. For most production workloads, that gap doesn’t justify the price difference. Opus 4.7 earns its premium on the hardest tasks: complex multi-file refactoring, long autonomous agent runs, and vision-heavy workflows.

If you’re on Opus 4.6 and happy with results, upgrading to 4.7 gets you better performance at the same sticker price — just account for the tokenizer overhead. If you’re on Sonnet 4.6 and considering a step up, the question is whether your workload actually hits the ceiling where Opus’s extra capability pays off.

When to Upgrade vs. When to Wait

Upgrade immediately if you’re doing vision-heavy work. The 3.75x resolution improvement is the biggest single change in this release, and if you’ve been working around Claude’s image limitations, that workaround is now unnecessary. Same if you’re starting a new project — there’s no reason to start on 4.6.

Test before migrating if you have production prompts tuned for 4.6’s behavior, or if you’re using prompt caching. Cache hit rates will drop temporarily after migration. Run a representative sample of your prompts through 4.7 first and check the output quality and token counts before committing.
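One concrete way to run that check is to count tokens for the same prompt set under both models and summarize the inflation. Anthropic's Messages API exposes a token-counting endpoint you could use to get the counts; the helper below takes plain integers so the comparison logic stands on its own, and the sample numbers are hypothetical:

```python
# Sketch: summarize per-prompt token inflation between two tokenizers.
# The counts would come from counting the same prompts against
# claude-opus-4-6 and claude-opus-4-7; here they are plain ints.

def tokenizer_inflation(old_counts, new_counts):
    """Return (min, mean, max) of new/old token ratios across paired prompts."""
    ratios = [new / old for old, new in zip(old_counts, new_counts)]
    return min(ratios), sum(ratios) / len(ratios), max(ratios)

# Hypothetical sample: three prose-heavy prompts, two code-heavy ones.
old = [1000, 1200, 900, 4000, 5200]
new = [1020, 1230, 930, 5000, 6760]
lo, mean, hi = tokenizer_inflation(old, new)
print(f"range {lo:.2f}-{hi:.2f}x, mean {mean:.2f}x")  # range 1.02-1.30x, mean 1.13x
```

If the mean for your real prompt set lands near the top of Anthropic's range, that is the number to feed back into the budget before flipping production traffic.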

Stay on 4.6 for now if you’re cost-sensitive and the performance gains don’t justify 10-20% higher effective costs on code-heavy workloads. The model is better, but “better” doesn’t always mean “worth paying more for.”

Accessing Opus 4.7 via ofox.ai

The model ID is anthropic/claude-opus-4.7. Through ofox.ai, it’s available on the same OpenAI-compatible endpoint as every other model — no separate Anthropic account or billing required.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="your-ofox-key"
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Review this code..."}]
)

For thinking/xhigh features, use the Anthropic native protocol:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="your-ofox-key"
)

Going through an aggregator also makes A/B testing easier. You can compare Opus 4.7 against 4.6 or Sonnet 4.6 with the same API key and endpoint, without juggling multiple billing accounts. Useful during the migration period when you’re figuring out whether 4.7’s improvements justify the tokenizer overhead for your specific workload.

Verdict

Opus 4.7 is the best coding model available right now. The SWE-bench numbers are real, the vision upgrade is substantial, and xhigh effort is a better default than anything 4.6 offered.

“Same price” is technically accurate and practically misleading. Budget for 10-20% higher costs on code-heavy workloads, test your existing prompts before migrating production, and watch cache hit rates for the first week after switching.

For new projects, start on 4.7. For existing production systems, a careful migration over a week or two is the right call.


Related: Claude Opus 4.6 API Review — the predecessor that set the bar. Claude vs GPT vs Gemini: How to Pick the Right Model — full comparison across all frontier models. Best AI Model for Coding 2026 — where Opus 4.7 fits in the coding model landscape.