Claude API Pricing: Complete Breakdown 2026

TL;DR — Claude has six active API models in 2026. Haiku 4.5 starts at $1/$5 per million tokens; Opus 4.7 tops out at $5/$25. The sticker prices are identical across the Opus generation (4.5, 4.6, 4.7), but Opus 4.7’s new tokenizer means the same code prompt costs 5-35% more in practice. Here’s the full breakdown with a decision guide.

The Full Pricing Table

All prices via ofox.ai/models, verified April 20, 2026.

| Model | Model ID | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | $3.00 | $15.00 | 200K |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | $3.00 | $15.00 | 200K |
| Claude Opus 4.5 | anthropic/claude-opus-4.5 | $5.00 | $25.00 | 200K |
| Claude Opus 4.6 | anthropic/claude-opus-4.6 | $5.00 | $25.00 | 200K |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | $5.00 | $25.00 | 200K |

A few things stand out. First, Sonnet 4.5 and 4.6 are priced identically — if you’re starting fresh, use 4.6. Second, all three Opus generations share the same list price. The difference is capability: Opus 4.7 scores 87.6% on SWE-bench Verified, 4.6 scores 80.8%, and 4.5 is behind both. Third, the Haiku-to-Sonnet jump is 3x on input and 3x on output — a meaningful step, not a rounding error.

Pull quote: “Sonnet 4.6 scores 79.6% on SWE-bench Verified at 40% less than Opus 4.7. For most production workloads, that gap doesn’t justify the premium.”

What You Actually Pay: The Tokenizer Factor

List prices are clean. Real costs are messier.

Claude Opus 4.7 ships with a new tokenizer that maps the same content to more tokens than 4.6 did. Anthropic’s migration guide puts the range at 1.0-1.35x. In practice:

  • Natural language prose: ~1.0-1.05x (negligible)
  • Mixed code and text: ~1.1-1.2x (10-20% more)
  • Dense Python or TypeScript: ~1.2-1.35x (20-35% more)

A team spending $2,000/month on Opus 4.6 for a code review pipeline should budget $2,200-2,700/month for the same volume on 4.7. The performance gains are real — but “same price” is technically accurate and practically misleading.
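Since the adjustment is a straight multiplier on token count, it's easy to sanity-check against your own bill. A minimal sketch (the 1.0-1.35x range comes from the migration guide cited above; treating the entire bill as scaling with token count is a simplifying assumption):

```python
def adjusted_monthly_cost(current_spend: float, token_multiplier: float) -> float:
    """Scale a monthly Opus 4.6 bill by the Opus 4.7 tokenizer multiplier.

    Assumes the whole bill scales with token count, i.e. the same
    multiplier applies to input and output tokens alike.
    """
    return current_spend * token_multiplier

# The $2,000/month code-review pipeline from above:
low = adjusted_monthly_cost(2000, 1.10)   # mixed code/text, low end
high = adjusted_monthly_cost(2000, 1.35)  # dense code, high end
print(f"${low:,.0f} - ${high:,.0f}")      # → $2,200 - $2,700
```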

Haiku and Sonnet don’t have this issue. Their tokenizers are stable across generations.

Prompt Caching: Where the Real Savings Are

Prompt caching is the most underused cost lever in the Claude API. The math:

  • Cache write: base input price × 1.25
  • Cache read: base input price × 0.10

So for Sonnet 4.6 ($3.00/M input):

  • Cache write: $3.75/M
  • Cache read: $0.30/M

If you have a 10,000-token system prompt that you send with every request, and you make 1,000 requests per day:

  • Without caching: 10,000 × 1,000 × $3.00/M = $30/day
  • With caching (1 write + 999 reads): 10,000 × $3.75/M + 999 × 10,000 × $0.30/M ≈ $0.04 + $3.00 = ~$3.03/day

That’s roughly 90% savings on the system prompt portion. For RAG pipelines with large context windows, the numbers get even more dramatic.

Caching requires the cache_control parameter in your messages. The minimum cacheable block is 2,048 tokens for Sonnet 4.6, and 4,096 tokens for Opus (4.5/4.6/4.7) and Haiku 4.5.
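The arithmetic above generalizes to any prompt size and request volume. A small calculator, assuming (as in the example) one cache write followed by cache reads for every remaining request:

```python
def daily_prompt_cost(tokens: int, requests: int, input_price_per_m: float,
                      cached: bool = False) -> float:
    """Daily cost in dollars of resending a fixed prompt.

    With caching: 1 write at 1.25x the input price, then reads at 0.10x.
    Without: every request pays the full input price.
    """
    per_token = input_price_per_m / 1_000_000
    if not cached:
        return tokens * requests * per_token
    write = tokens * per_token * 1.25
    reads = tokens * (requests - 1) * per_token * 0.10
    return write + reads

# Sonnet 4.6: 10,000-token system prompt, 1,000 requests/day
uncached = daily_prompt_cost(10_000, 1_000, 3.00)             # $30.00
cached = daily_prompt_cost(10_000, 1_000, 3.00, cached=True)  # ~$3.03
print(f"savings: {1 - cached / uncached:.0%}")                # → savings: 90%
```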

Picking the Right Model

The decision isn’t just about price — it’s about where each model hits its ceiling.

Use Haiku 4.5 for:

  • Classification and routing
  • Simple summarization
  • High-volume, low-complexity tasks where latency matters
  • Anything where you’re making thousands of calls per hour

Use Sonnet 4.6 for:

  • Most production workloads
  • Code generation and review (79.6% SWE-bench Verified)
  • Customer-facing applications where quality matters but Opus is overkill
  • The default choice when you’re not sure

Use Opus 4.7 for:

  • Complex multi-file refactoring
  • Long autonomous agent runs
  • Vision-heavy workflows (98.5% vision accuracy, 3.75MP images)
  • Tasks where you’ve tested Sonnet and it’s actually hitting a ceiling

The honest answer for most teams: start with Sonnet 4.6, benchmark against your actual workload, and only move to Opus if you can measure the quality difference. The 8-point SWE-bench gap between Sonnet 4.6 and Opus 4.7 is real, but most production tasks don’t exercise the tail where that gap shows up.

Estimating Your Monthly Bill

Token counts vary by task, but rough benchmarks:

| Task | Typical Input Tokens | Typical Output Tokens |
|---|---|---|
| Short Q&A | 200-500 | 100-300 |
| Code review (single file) | 2,000-5,000 | 500-1,500 |
| Document summarization | 5,000-20,000 | 500-2,000 |
| Agent loop (per step) | 3,000-10,000 | 500-2,000 |

For a code review pipeline processing 500 files/day with Sonnet 4.6:

  • Input: 500 × 3,500 avg = 1.75M tokens × $3.00/M = $5.25/day
  • Output: 500 × 1,000 avg = 0.5M tokens × $15.00/M = $7.50/day
  • Total: ~$12.75/day, ~$383/month

Add prompt caching for the system prompt and you’d cut that by 20-30%.
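The per-file estimate above folds into a one-function estimator. A sketch using the Sonnet 4.6 list prices from the pricing table (the average token counts are the same assumptions as in the worked example):

```python
def monthly_bill(files_per_day: int, avg_in: int, avg_out: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Estimate a monthly bill in dollars from per-item token averages."""
    daily_in = files_per_day * avg_in / 1_000_000 * in_price
    daily_out = files_per_day * avg_out / 1_000_000 * out_price
    return (daily_in + daily_out) * days

# Code review pipeline: 500 files/day on Sonnet 4.6 ($3 in / $15 out)
print(f"${monthly_bill(500, 3_500, 1_000, 3.00, 15.00):,.2f}")  # → $382.50
```

Swap in Haiku or Opus prices from the table to compare tiers before committing to one.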

Accessing Claude via ofox.ai

Through ofox.ai, all Claude models are available on an OpenAI-compatible endpoint. One API key, no separate Anthropic account:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="your-ofox-key"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Review this code..."}]
)

Switch models by changing the model parameter. The same key works for anthropic/claude-haiku-4.5, anthropic/claude-opus-4.7, and every other model on the platform.

For Opus 4.7’s extended thinking features (xhigh effort, 100K thinking tokens), use the Anthropic native protocol:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="your-ofox-key"
)

The aggregator approach also makes A/B testing between models straightforward — same endpoint, same key, just change the model ID.


Related: Claude Opus 4.7 API Review — deep dive on the tokenizer change and xhigh effort level. Claude Opus 4.6 API Review — the previous generation benchmark. How to Reduce AI API Costs — prompt caching, batching, and model tiering strategies. Best AI Model for Coding 2026 — where Claude fits against GPT and Gemini on coding tasks.