Claude API Pricing: Complete Breakdown 2026

TL;DR — Claude has six active API models in 2026. Haiku 4.5 starts at $1/$5 per million tokens; Opus 4.7 tops out at $5/$25. The sticker prices are identical across the Opus generation (4.5, 4.6, 4.7), but Opus 4.7’s new tokenizer means the same code prompt costs 5-35% more in practice. Here’s the full breakdown with a decision guide.

The Full Pricing Table

All prices via ofox.ai/models, verified April 20, 2026.

| Model | Model ID | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | $3.00 | $15.00 | 200K |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | $3.00 | $15.00 | 200K |
| Claude Opus 4.5 | anthropic/claude-opus-4.5 | $5.00 | $25.00 | 200K |
| Claude Opus 4.6 | anthropic/claude-opus-4.6 | $5.00 | $25.00 | 200K |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | $5.00 | $25.00 | 200K |

A few things stand out. First, Sonnet 4.5 and 4.6 are priced identically — if you’re starting fresh, use 4.6. Second, all three Opus generations share the same list price. The difference is capability: Opus 4.7 scores 87.6% on SWE-bench Verified, 4.6 scores 80.8%, and 4.5 is behind both. Third, the Haiku-to-Sonnet jump is 3x on input and 3x on output — a meaningful step, not a rounding error.

Pull quote: “Sonnet 4.6 scores 79.6% on SWE-bench Verified at 40% less than Opus 4.7. For most production workloads, that gap doesn’t justify the premium.”

What You Actually Pay: The Tokenizer Factor

List prices are clean. Real costs are messier.

Claude Opus 4.7 ships with a new tokenizer that maps the same content to more tokens than 4.6 did. Anthropic’s migration guide puts the range at 1.0-1.35x. In practice:

  • Natural language prose: ~1.0-1.05x (negligible)
  • Mixed code and text: ~1.1-1.2x (10-20% more)
  • Dense Python or TypeScript: ~1.2-1.35x (20-35% more)

A team spending $2,000/month on Opus 4.6 for a code review pipeline should budget $2,200-2,700/month for the same volume on 4.7. The performance gains are real — but “same price” is technically accurate and practically misleading.
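Since the adjustment is a straight multiplier on token count, it's easy to sanity-check against your own bill. A minimal sketch (the 1.0-1.35x range comes from the migration guide cited above; treating the entire bill as scaling with token count is a simplifying assumption):

```python
def adjusted_monthly_cost(current_spend: float, token_multiplier: float) -> float:
    """Scale a monthly Opus 4.6 bill by the Opus 4.7 tokenizer multiplier.

    Assumes the whole bill scales with token count, i.e. the same
    multiplier applies to input and output tokens alike.
    """
    return current_spend * token_multiplier

# The $2,000/month code-review pipeline from above:
low = adjusted_monthly_cost(2000, 1.10)   # mixed code/text, low end
high = adjusted_monthly_cost(2000, 1.35)  # dense code, high end
print(f"${low:,.0f} - ${high:,.0f}")      # → $2,200 - $2,700
```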

Haiku and Sonnet don’t have this issue. Their tokenizers are stable across generations.

Prompt Caching: Where the Real Savings Are

Prompt caching is the most underused cost lever in the Claude API. The math:

  • Cache write: base input price × 1.25
  • Cache read: base input price × 0.10

So for Sonnet 4.6 ($3.00/M input):

  • Cache write: $3.75/M
  • Cache read: $0.30/M

If you have a 10,000-token system prompt that you send with every request, and you make 1,000 requests per day:

  • Without caching: 10,000 × 1,000 × $3.00/M = $30/day
  • With caching (1 write + 999 reads): 10,000 × $3.75/M + 999 × 10,000 × $0.30/M ≈ $0.04 + $3.00 = ~$3.03/day

That’s roughly 90% savings on the system prompt portion. For RAG pipelines with large context windows, the numbers get even more dramatic.

Caching requires the cache_control parameter in your messages. The minimum cacheable block is 2,048 tokens for Sonnet 4.6, and 4,096 tokens for Opus (4.5/4.6/4.7) and Haiku 4.5.
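The arithmetic above generalizes to any prompt size and request volume. A small calculator, assuming (as in the example) one cache write followed by cache reads for every remaining request:

```python
def daily_prompt_cost(tokens: int, requests: int, input_price_per_m: float,
                      cached: bool = False) -> float:
    """Daily cost in dollars of resending a fixed prompt.

    With caching: 1 write at 1.25x the input price, then reads at 0.10x.
    Without: every request pays the full input price.
    """
    per_token = input_price_per_m / 1_000_000
    if not cached:
        return tokens * requests * per_token
    write = tokens * per_token * 1.25
    reads = tokens * (requests - 1) * per_token * 0.10
    return write + reads

# Sonnet 4.6: 10,000-token system prompt, 1,000 requests/day
uncached = daily_prompt_cost(10_000, 1_000, 3.00)             # $30.00
cached = daily_prompt_cost(10_000, 1_000, 3.00, cached=True)  # ~$3.03
print(f"savings: {1 - cached / uncached:.0%}")                # → savings: 90%
```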

Picking the Right Model

The decision isn’t just about price — it’s about where each model hits its ceiling.

Use Haiku 4.5 for:

  • Classification and routing
  • Simple summarization
  • High-volume, low-complexity tasks where latency matters
  • Anything where you’re making thousands of calls per hour

Use Sonnet 4.6 for:

  • Most production workloads
  • Code generation and review (79.6% SWE-bench Verified)
  • Customer-facing applications where quality matters but Opus is overkill
  • The default choice when you’re not sure

Use Opus 4.7 for:

  • Complex multi-file refactoring
  • Long autonomous agent runs
  • Vision-heavy workflows (98.5% vision accuracy, 3.75MP images)
  • Tasks where you’ve tested Sonnet and it’s actually hitting a ceiling

The honest answer for most teams: start with Sonnet 4.6, benchmark against your actual workload, and only move to Opus if you can measure the quality difference. The 8-point SWE-bench gap between Sonnet 4.6 and Opus 4.7 is real, but most production tasks don’t exercise the tail where that gap shows up.

Estimating Your Monthly Bill

Token counts vary by task, but rough benchmarks:

| Task | Typical Input Tokens | Typical Output Tokens |
|---|---|---|
| Short Q&A | 200-500 | 100-300 |
| Code review (single file) | 2,000-5,000 | 500-1,500 |
| Document summarization | 5,000-20,000 | 500-2,000 |
| Agent loop (per step) | 3,000-10,000 | 500-2,000 |

For a code review pipeline processing 500 files/day with Sonnet 4.6:

  • Input: 500 × 3,500 avg = 1.75M tokens × $3.00/M = $5.25/day
  • Output: 500 × 1,000 avg = 0.5M tokens × $15.00/M = $7.50/day
  • Total: ~$12.75/day, ~$383/month

Add prompt caching for the system prompt and you’d cut that by 20-30%.
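The per-file estimate above folds into a one-function estimator. A sketch using the Sonnet 4.6 list prices from the pricing table (the average token counts are the same assumptions as in the worked example):

```python
def monthly_bill(files_per_day: int, avg_in: int, avg_out: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Estimate a monthly bill in dollars from per-item token averages."""
    daily_in = files_per_day * avg_in / 1_000_000 * in_price
    daily_out = files_per_day * avg_out / 1_000_000 * out_price
    return (daily_in + daily_out) * days

# Code review pipeline: 500 files/day on Sonnet 4.6 ($3 in / $15 out)
print(f"${monthly_bill(500, 3_500, 1_000, 3.00, 15.00):,.2f}")  # → $382.50
```

Swap in Haiku or Opus prices from the table to compare tiers before committing to one.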

Accessing Claude via ofox.ai

Through ofox.ai, all Claude models are available on an OpenAI-compatible endpoint. One API key, no separate Anthropic account:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="your-ofox-key"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Review this code..."}]
)

Switch models by changing the model parameter. The same key works for anthropic/claude-haiku-4.5, anthropic/claude-opus-4.7, and every other model on the platform.

For Opus 4.7’s extended thinking features (xhigh effort, 100K thinking tokens), use the Anthropic native protocol:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="your-ofox-key"
)

The aggregator approach also makes A/B testing between models straightforward — same endpoint, same key, just change the model ID.


Related: Claude Opus 4.7 API Review — deep dive on the tokenizer change and xhigh effort level. Claude Opus 4.6 API Review — the previous generation benchmark. How to Reduce AI API Costs — prompt caching, batching, and model tiering strategies. Best AI Model for Coding 2026 — where Claude fits against GPT and Gemini on coding tasks.