DeepSeek API Pricing: Complete Breakdown & How to Cut Costs (2026)

TL;DR: DeepSeek V3.2 costs $0.28/M input tokens on the official API — roughly 9× cheaper than GPT-5.4 and 18× cheaper than Claude Opus 4.7 on input tokens. Prompt caching cuts that to $0.028/M on repeated context. If you want to skip the Chinese phone number requirement and use a single API key for all your models, ofox.ai carries the same model at near-identical pricing.

What Is DeepSeek V3.2?

DeepSeek V3.2 is the current flagship from DeepSeek, released in December 2025. It uses a Mixture-of-Experts (MoE) architecture with 685 billion total parameters but only 37 billion active per forward pass — which is why the inference cost is so low despite the model’s capability. Context window is 128K tokens.

On benchmarks, it hit gold-medal level on the 2025 International Mathematical Olympiad and International Olympiad in Informatics. For coding and reasoning, it’s in the same tier as GPT-5 and Claude Sonnet 4.6 — at a fraction of the cost.

Official DeepSeek API Pricing

The official DeepSeek API (platform.deepseek.com) exposes two model IDs, both pointing to V3.2:

| Model ID          | Mode     | Input (cache miss) | Input (cache hit) | Output  |
|-------------------|----------|--------------------|-------------------|---------|
| deepseek-chat     | Standard | $0.28/M            | $0.028/M          | $0.42/M |
| deepseek-reasoner | Thinking | $0.28/M            | $0.028/M          | $0.42/M |

Prices are per million tokens. The cache hit rate depends on how much of your prompt is repeated across requests — system prompts and few-shot examples are the main candidates.

The 90% cache discount is automatic: DeepSeek’s API applies it whenever a prefix of your prompt matches a cached version. You don’t need to do anything special to enable it.
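In code, taking advantage of that just means assembling the message list so the static parts come first — a minimal sketch (the helper name and example content are illustrative, not part of DeepSeek's API):

```python
def build_messages(system_prompt, few_shot, user_query):
    """Static prefix first so the automatic prefix cache can match it."""
    return (
        [{"role": "system", "content": system_prompt}]
        + few_shot  # fixed few-shot examples, identical across requests
        + [{"role": "user", "content": user_query}]  # only this part varies
    )

msgs = build_messages("You are a support bot.", [], "Where is my order?")
```

Anything that varies per request (the user query, retrieved context) belongs at the end, after the stable prefix.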

Both models support 128K input. Default max output is 4K tokens for deepseek-chat and 32K for deepseek-reasoner, with hard caps of 8K and 64K respectively.

How to Get a DeepSeek API Key

Sign up at platform.deepseek.com with an email address. No Chinese phone number is required for the API platform (unlike the consumer app). Once logged in, go to API Keys and generate a key.

The base URL is https://api.deepseek.com/v1 and the API is OpenAI-compatible, so you can drop it into any OpenAI SDK call by changing base_url and api_key.

One practical issue: DeepSeek’s platform has had intermittent availability problems during high-demand periods. If you’re building something production-critical, it’s worth having a fallback.
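A minimal fallback sketch, assuming an OpenAI-compatible secondary endpoint (the ofox details here are the ones used later in this post; the bare try-each-in-order policy is illustrative, not a production circuit breaker):

```python
# Ordered endpoints: official DeepSeek API first, OpenAI-compatible mirror second.
ENDPOINTS = [
    {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
    {"base_url": "https://api.ofox.ai/v1", "model": "deepseek/deepseek-v3.2"},
]

def chat_with_fallback(messages, api_keys):
    """Try each endpoint in order; return the first successful completion."""
    from openai import OpenAI  # deferred so the module loads without the SDK
    last_err = None
    for endpoint, key in zip(ENDPOINTS, api_keys):
        try:
            client = OpenAI(api_key=key, base_url=endpoint["base_url"])
            resp = client.chat.completions.create(
                model=endpoint["model"], messages=messages
            )
            return resp.choices[0].message.content
        except Exception as err:  # connection errors, 5xx, rate limits
            last_err = err
    raise last_err
```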

Access DeepSeek via ofox

ofox.ai carries DeepSeek V3.2 as deepseek/deepseek-v3.2 at $0.29/M input and $0.43/M output — essentially at parity with the official API. The practical difference is that you get a single API key that also covers Claude Opus 4.7, GPT-5.4, Gemini 3.1 Pro, Qwen3, and 50+ other models. No separate accounts, no separate billing.

This matters if you’re routing between models based on task type or cost — something covered in more depth in the AI API aggregation guide.

Setup is a few lines:

from openai import OpenAI

# Any OpenAI-compatible SDK works — only api_key and base_url change.
client = OpenAI(api_key="sk-xxx", base_url="https://api.ofox.ai/v1")
response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[{"role": "user", "content": "Explain MoE architecture in one paragraph"}]
)
print(response.choices[0].message.content)

If you’re migrating from the OpenAI SDK, the migration guide covers the full swap in under 10 minutes.

How to Cut Your DeepSeek API Costs

The 90% cache discount on input tokens is the biggest lever available. Structure prompts so the system prompt and any static context come first — those are the parts most likely to be cached. A 2,000-token system prompt that hits the cache on every call saves $504 per million requests (2 billion tokens at a $0.252/M discount). You don’t configure anything; the API applies it automatically when a prefix matches.
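The arithmetic works out to $504 per million requests — a quick sanity check, using the prices from the table above:

```python
PRICE_MISS = 0.28   # $ per million input tokens, cache miss
PRICE_HIT = 0.028   # $ per million input tokens, cache hit (90% off)

def cache_savings(prompt_tokens: int, requests: int) -> float:
    """Dollars saved when a static prefix hits the cache on every request."""
    total_tokens_millions = prompt_tokens * requests / 1_000_000
    return total_tokens_millions * (PRICE_MISS - PRICE_HIT)

print(round(cache_savings(2_000, 1_000_000), 2))  # 2B tokens → 504.0
```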

Beyond caching: only use deepseek-reasoner when you actually need multi-step reasoning. The extended chain-of-thought adds latency and token count. For classification, summarization, or simple generation, deepseek-chat is faster and costs the same per token.
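One way to encode that rule in a pipeline — a sketch with an illustrative task taxonomy (the category names are assumptions, not DeepSeek API concepts):

```python
# Tasks that benefit from extended chain-of-thought vs. those that don't.
REASONING_TASKS = {"math", "planning", "multi_step_analysis"}

def pick_model(task_type: str) -> str:
    """Route to deepseek-reasoner only when multi-step reasoning pays off."""
    return "deepseek-reasoner" if task_type in REASONING_TASKS else "deepseek-chat"

print(pick_model("summarization"))  # → deepseek-chat
print(pick_model("math"))           # → deepseek-reasoner
```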

Context trimming matters too. The 128K window is generous, but you pay for every input token. RAG that pulls only the relevant chunks is cheaper than stuffing the full document — the embedding + RAG guide covers that pattern.

For offline jobs (document processing, dataset annotation, bulk summarization), parallelize with async calls during off-peak hours. The official API doesn’t have a formal batch endpoint, but async Python or Node handles this fine.
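A minimal concurrency helper for those bulk jobs — the semaphore limit is an illustrative starting point, not an official rate-limit figure:

```python
import asyncio

async def map_concurrently(fn, items, limit=8):
    """Run an async fn over items with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(item):
        async with sem:
            return await fn(item)

    return await asyncio.gather(*(guarded(i) for i in items))

# Usage with DeepSeek (assumes AsyncOpenAI from the openai package):
# client = AsyncOpenAI(api_key="sk-xxx", base_url="https://api.deepseek.com/v1")
# async def summarize(doc):
#     resp = await client.chat.completions.create(
#         model="deepseek-chat",
#         messages=[{"role": "user", "content": f"Summarize:\n{doc}"}])
#     return resp.choices[0].message.content
# summaries = asyncio.run(map_concurrently(summarize, docs))
```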

For a broader cost-reduction playbook across all models, see how to reduce AI API costs.

DeepSeek vs. Alternatives: Price Comparison

At $0.28/M input and $0.42/M output, DeepSeek V3.2 is hard to beat for general-purpose tasks. Here’s how it stacks up against models in the same capability tier (all prices via ofox.ai):

| Model             | Input   | Output  | Context |
|-------------------|---------|---------|---------|
| DeepSeek V3.2     | $0.29/M | $0.43/M | 128K    |
| Qwen3 Max         | $1.20/M | $6.00/M | 128K    |
| GPT-5.4           | $2.50/M | $15/M   | 128K    |
| Claude Sonnet 4.6 | $3/M    | $15/M   | 200K    |
| Claude Opus 4.7   | $5/M    | $25/M   | 200K    |

DeepSeek V3.2 is roughly 9× cheaper than GPT-5.4 on input and 35× cheaper on output. For high-volume workloads where you’re not specifically tied to OpenAI or Anthropic’s ecosystem, that gap is significant.

The tradeoff: DeepSeek’s API has had reliability issues during peak periods, and the model’s English instruction-following is slightly behind Claude Sonnet 4.6 on nuanced tasks. For a full comparison, see the model comparison guide.

Practical Limits to Know

A few things that will bite you if you don’t know them upfront:

  • Rate limits vary by account tier. New accounts start conservative — expect to hit them if you’re testing at volume.
  • deepseek-reasoner supports streaming, but thinking tokens come back separately from the final answer. If you’re parsing the stream, check the docs for the format.
  • Function calling / tool use works in both deepseek-chat and deepseek-reasoner, per the official pricing page.
  • The 128K limit is input-only. Output is capped separately: 8K for standard, 64K for reasoner.
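For the streaming caveat above, a sketch of separating thinking tokens from the answer — this assumes the `reasoning_content` delta field from DeepSeek's reasoning-model docs, so verify the field name against the current API reference before relying on it:

```python
def split_stream(chunks):
    """Collect reasoning deltas and answer deltas from a reasoner stream."""
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Assumed field: thinking tokens arrive as `reasoning_content`,
        # the final answer as the usual `content`.
        if getattr(delta, "reasoning_content", None):
            thinking.append(delta.reasoning_content)
        elif delta.content:
            answer.append(delta.content)
    return "".join(thinking), "".join(answer)
```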

The Bottom Line

At $0.28/M input with a 90% cache discount on repeated context, DeepSeek V3.2 is the obvious pick for high-volume workloads where you don’t need Claude’s instruction-following or GPT’s ecosystem integrations. Get started at platform.deepseek.com, or use ofox.ai if you want a single key that covers DeepSeek alongside Claude, GPT, and Gemini.