Grok API: Pricing, Setup & Access Guide (2026)

TL;DR — Three Grok models are worth knowing: Grok 4.1 Fast ($0.20/M input, 2M context) for high-volume work, Grok 4.20 ($2.00/M input) for deep reasoning, and Grok Code Fast 1 ($0.20/M input, 256K context) for coding agents. All three are OpenAI-compatible. Getting to a working API call is a two-line config change in your existing OpenAI SDK setup.

Grok 4.1 Fast gives you a 2-million-token context window at $0.20/M input — that’s 5x more context than GPT-5.4 Mini at a lower price, with real-time X search built in.

What the Grok API Actually Offers

xAI’s API covers three distinct use cases: a cost-efficient general model with an unusually large context window, a flagship reasoning model with a built-in multi-agent architecture, and a coding-specific model optimized for agentic workflows.

Function calling, structured output, image input, and streaming all follow the same schema as the OpenAI Chat Completions API. If you’re already using any OpenAI-compatible SDK, switching to Grok is a two-line change.

Pricing: Every Model Compared

All prices verified via ofox.ai/models, April 2026.

Model                  Model ID               Input / 1M   Output / 1M   Context
Grok 4.1 Fast          grok-4-1-fast          $0.20        $0.50         2M
Grok 4.20              grok-4.20              $2.00        $6.00         2M
Grok 4.20 Multi-Agent  grok-4.20-multi-agent  $2.00        $6.00         2M
Grok Code Fast 1       grok-code-fast-1       $0.20        $1.50         256K

Prompt caching is automatic — no configuration needed. Cache hit prices vary by model: Grok 4.1 Fast drops to $0.05/M (75% off), Grok 4.20 drops to $0.20/M (90% off from $2.00), and Grok Code Fast 1 drops to $0.02/M (90% off from $0.20). For long, repeated system prompts, this makes the effective cost significantly lower than the sticker price.

Built-in tool fees (web search, X search, code execution): $2.50–$5.00 per 1,000 successful tool calls. These are charged separately from token costs.
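The interaction between base rates, cache discounts, and tool fees is easy to misjudge at a glance. Here is a minimal cost-estimate sketch; the default rates assume Grok 4.1 Fast's numbers from the table above and the top end of the tool-fee range, so swap in your model's figures:

```python
# Sketch: estimate one request's cost from the rates quoted above.
# Defaults assume Grok 4.1 Fast ($0.20/M input, $0.05/M cached input,
# $0.50/M output) and the $5.00-per-1,000 built-in tool-call fee.

def estimate_cost(input_tokens, cached_tokens, output_tokens, tool_calls=0,
                  in_rate=0.20, cache_rate=0.05, out_rate=0.50, tool_fee=5.00):
    """Return estimated USD cost for a single request."""
    uncached = input_tokens - cached_tokens
    return (uncached * in_rate / 1e6          # fresh input tokens
            + cached_tokens * cache_rate / 1e6  # cache-hit input tokens
            + output_tokens * out_rate / 1e6    # output tokens
            + tool_calls * tool_fee / 1000)     # built-in tool calls

# A 100K-token prompt where 80K hits the cache, 2K of output, no tools:
cost = estimate_cost(100_000, 80_000, 2_000)
print(f"${cost:.4f}")  # $0.0090
```

Note how the cached 80K tokens cost the same as the 20K fresh ones here: that is the 75% discount doing its work on a long, stable system prompt.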

How Grok Pricing Compares

Model              Input / 1M   Output / 1M   Context
Grok 4.1 Fast      $0.20        $0.50         2M
Grok Code Fast 1   $0.20        $1.50         256K
DeepSeek V4        $0.30        $0.50         1M
Gemini 3.1 Flash   $0.30        $1.50         1M
GPT-5.4 Mini       ~$0.75       $4.50         400K
Claude Sonnet 4.6  $3.00        $15.00        200K
Grok 4.20          $2.00        $6.00         2M

Sources: ofox.ai/models, docs.x.ai/docs/models, April 2026.

Grok 4.1 Fast and Grok Code Fast 1 both land at $0.20 input, same tier as DeepSeek V4. Grok 4.1 Fast has a 2M context window vs DeepSeek’s 1M; Grok Code Fast 1 has 256K, tuned for large codebases. For a full cross-model cost breakdown, see our AI API cost reduction guide.

The Three Models Worth Knowing

Grok 4.1 Fast — the everyday workhorse

At $0.20/M input, Grok 4.1 Fast is cheaper than GPT-5.4 Mini per token, but with 5x more context. The 2M token window means you can load an entire codebase, a long document collection, or a multi-day conversation history without truncating.

Two things set it apart from other $0.20-tier models: real-time web search via X (no knowledge cutoff for current events), and automatic prompt caching that requires zero configuration. Reasoning and non-reasoning modes are both available — toggle based on whether the task needs deliberate step-by-step thinking or fast pattern matching.

Grok 4.20 — multi-agent reasoning

Grok 4.20 is the only model currently available via API that exposes a multi-agent architecture at the call level. One request dispatches four internal agents — Grok, Harper, Benjamin, and Lucas — that cross-check each other’s reasoning and actively debate conclusions to reduce hallucinations.

At $2.00/M input, it’s 10x more expensive than Grok 4.1 Fast. For most tasks, that premium isn’t justified. For high-stakes analysis — technical due diligence, research synthesis, complex decision support — the multi-perspective output is meaningfully better than a single-model call. The grok-4.20-multi-agent model ID explicitly routes to this architecture.
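Because the multi-agent model rides the same OpenAI-compatible endpoint, routing to it is only a model-ID change. A sketch under that assumption — the prompt content and the OFOX_API_KEY environment variable name are our own choices for illustration, not xAI or ofox.ai conventions:

```python
import os

# Routing to the multi-agent architecture is just a model-ID swap on the
# same OpenAI-compatible endpoint; the request body is an ordinary chat call.
request = {
    "model": "grok-4.20-multi-agent",
    "messages": [
        {"role": "system", "content": "You are a technical due-diligence analyst."},
        {"role": "user", "content": "Assess the risks of migrating our billing service to event sourcing."},
    ],
}

# Only fire the network call when a key is configured (hypothetical env var).
if os.environ.get("OFOX_API_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.ofox.ai/v1",
                    api_key=os.environ["OFOX_API_KEY"])
    response = client.chat.completions.create(**request)
    print(response.choices[0].message.content)
```

The four-agent debate happens server-side; from the client you see a single response, priced per token like any other call.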

Grok Code Fast 1 — for coding agents

Grok Code Fast 1 (grok-code-fast-1) is purpose-built for agentic coding workflows. The 256K context window holds large codebases in memory across multi-step tool calls. At $0.20/M input and $1.50/M output, it’s priced for high-volume use in CI pipelines, code review agents, and IDE integrations.

It supports function calling, structured output, and streaming — the full toolkit for building coding agents. For a comparison of how it stacks up against Claude and GPT on actual coding benchmarks, see our best AI model for coding guide.
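A sketch of what function calling looks like against grok-code-fast-1, using the OpenAI Chat Completions tools schema the article says Grok mirrors. The run_tests tool here is a hypothetical tool of our own (your own tools are billed at normal token rates, not the built-in tool fee), and the OFOX_API_KEY environment variable is our naming choice:

```python
import os

# Hypothetical tool of our own: run the project's test suite. Declared in
# the OpenAI Chat Completions tools schema, which Grok follows.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the test suite and return pass/fail results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory"},
            },
            "required": ["path"],
        },
    },
}]

request = {
    "model": "grok-code-fast-1",
    "messages": [{"role": "user",
                  "content": "Run the tests under tests/api and summarize any failures."}],
    "tools": tools,
}

# Only call the API when a key is configured (hypothetical env var).
if os.environ.get("OFOX_API_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.ofox.ai/v1",
                    api_key=os.environ["OFOX_API_KEY"])
    resp = client.chat.completions.create(**request)
    # The model returns tool_calls for us to execute locally and feed back.
    print(resp.choices[0].message.tool_calls)
```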

Setup: Two Ways to Access the Grok API

Option 1: Through ofox.ai

ofox.ai exposes the full Grok model family through an OpenAI-compatible endpoint. One API key covers Grok, Claude, GPT, Gemini, and 100+ other models — no separate xAI account needed.

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="your-ofox-api-key"
)

response = client.chat.completions.create(
    model="grok-4-1-fast",
    messages=[{"role": "user", "content": "Explain prompt caching in one paragraph."}]
)
print(response.choices[0].message.content)

JavaScript/TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://api.ofox.ai/v1",
    apiKey: "your-ofox-api-key"
});

const response = await client.chat.completions.create({
    model: "grok-4-1-fast",
    messages: [{ role: "user", content: "Explain prompt caching in one paragraph." }]
});

If you’re already using any model through ofox.ai, switching to Grok means changing one string: the model ID. For a full walkthrough of migrating from the OpenAI SDK to ofox.ai, see our OpenAI SDK migration guide.

Option 2: Through xAI directly

xAI’s official endpoint is https://api.x.ai/v1. New accounts receive free credits to get started — check x.ai/api for current amounts. Enrolling in xAI’s Data Sharing Program grants additional monthly free usage.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key="your-xai-api-key"
)

The direct route gets you the fastest access to new model releases and exclusive features like Live Search and X Search. The tradeoff: a separate billing relationship, and payment requires an international credit card.

When to Use Which Model

Context size and cost tolerance drive most of this decision:

  • High-volume pipelines, RAG, long-document tasks → Grok 4.1 Fast. The 2M context and $0.20 input price are hard to beat at this tier.
  • Coding agents, IDE integrations, code review → Grok Code Fast 1. The 256K window and coding-specific tuning make it the right fit.
  • Complex analysis, research, high-stakes decisions → Grok 4.20 or Grok 4.20 Multi-Agent. The 10x price premium is only worth it when output quality genuinely matters more than cost.
  • Real-time information → Any Grok model. The X integration gives you current data that models with fixed knowledge cutoffs can’t match.
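The decision rules above collapse into a small routing helper. The task labels are our own invention and the cheap default is a judgment call — a sketch, not a prescription:

```python
# Minimal model router based on the guidance above. Task labels are
# illustrative; model IDs come from the pricing table.
def pick_model(task: str) -> str:
    routes = {
        "bulk": "grok-4-1-fast",              # high-volume, RAG, long documents
        "code": "grok-code-fast-1",           # coding agents, IDE integrations
        "analysis": "grok-4.20",              # deep reasoning
        "critical": "grok-4.20-multi-agent",  # high-stakes, multi-perspective
    }
    return routes.get(task, "grok-4-1-fast")  # cheap, large-context default

print(pick_model("code"))  # grok-code-fast-1
```

Defaulting unknown tasks to the cheapest large-context model keeps a routing bug from silently running your pipeline at 10x the cost.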

For a broader comparison of how Grok fits into the current model landscape, see our LLM leaderboard and model comparison guide.

Practical Notes

Prompt caching is automatic. Unlike Anthropic’s explicit cache control headers, Grok caches repeated prefixes without any configuration. Cache hit rates differ by model: Grok 4.1 Fast saves 75% ($0.05/M), Grok 4.20 saves 90% ($0.20/M), and Grok Code Fast 1 saves 90% ($0.02/M). If your system prompt is long and consistent across requests, you’re already paying well below the sticker price.

Tool call pricing is separate. The $2.50–$5.00/1,000 fee applies to built-in tools (web search, X search, code execution). Standard function calling with your own tools is charged at normal token rates.

The 2M token window is real, but very long contexts can affect output quality on complex reasoning tasks. For most practical workloads — codebases, document collections, long conversations — you won’t hit the ceiling.
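For a cheap pre-flight check before shipping a huge payload, a character-based estimate is usually enough. The ~4 characters per token ratio below is a common English-text heuristic, not an xAI-published tokenizer figure:

```python
# Rough pre-flight check: does this payload plausibly fit the context window?
# Assumes ~4 characters per token (a heuristic, not xAI's tokenizer ratio).
def fits_context(text: str, limit: int = 2_000_000,
                 chars_per_token: float = 4.0) -> bool:
    est_tokens = len(text) / chars_per_token
    return est_tokens <= limit * 0.9  # keep ~10% headroom for output/overhead

print(fits_context("x" * 1_000_000))  # ~250K estimated tokens -> True
```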

New xAI accounts have conservative rate limits. If you’re building production workloads, plan for this or use an API gateway like ofox.ai that manages rate limits across multiple upstream providers.

Getting Started

  1. Get an API key at ofox.ai (covers Grok + all other models) or x.ai/api (xAI direct, free credits for new accounts)
  2. Set base_url to https://api.ofox.ai/v1 or https://api.x.ai/v1
  3. Use model ID grok-4-1-fast for general tasks, grok-code-fast-1 for coding, grok-4.20 for deep reasoning

The Grok API’s real unlock for developers isn’t just the pricing — it’s a 2M context window at the $0.20 tier that makes whole-codebase and whole-document reasoning practical without the cost math falling apart.

For teams already using an API gateway, adding Grok is a one-line model ID change. For teams evaluating whether to consolidate API providers, our AI API aggregation guide covers the tradeoffs.