Grok API: Pricing, Setup & Access Guide (2026)

TL;DR — Three Grok models are worth knowing: Grok 4.1 Fast ($0.20/M input, 2M context) for high-volume work, Grok 4.20 ($2.00/M input) for deep reasoning, and Grok Code Fast 1 ($0.20/M input, 256K context) for coding agents. All three are OpenAI-compatible. Getting to a working API call is a two-line config change in your existing OpenAI SDK setup.

Grok 4.1 Fast gives you a 2-million-token context window at $0.20/M input — that’s 5x more context than GPT-5.4 Mini at a lower price, with real-time X search built in.

What the Grok API Actually Offers

xAI’s API covers three distinct use cases: a cost-efficient general model with an unusually large context window, a flagship reasoning model with a built-in multi-agent architecture, and a coding-specific model optimized for agentic workflows.

Function calling, structured output, image input, and streaming all follow the same schema as the OpenAI Chat Completions API. If you’re already using any OpenAI-compatible SDK, switching to Grok is a two-line change.

Pricing: Every Model Compared

All prices verified via ofox.ai/models, April 2026.

Model                  Model ID               Input / 1M   Output / 1M   Context
Grok 4.1 Fast          grok-4-1-fast          $0.20        $0.50         2M
Grok 4.20              grok-4.20              $2.00        $6.00         2M
Grok 4.20 Multi-Agent  grok-4.20-multi-agent  $2.00        $6.00         2M
Grok Code Fast 1       grok-code-fast-1       $0.20        $1.50         256K

Prompt caching is automatic — no configuration needed. Cache hit prices vary by model: Grok 4.1 Fast drops to $0.05/M (75% off), Grok 4.20 drops to $0.20/M (90% off from $2.00), and Grok Code Fast 1 drops to $0.02/M (90% off from $0.20). For long, repeated system prompts, this makes the effective cost significantly lower than the sticker price.

Built-in tool fees (web search, X search, code execution): $2.50–$5.00 per 1,000 successful tool calls. These are charged separately from token costs.
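The interaction between base rates, cache discounts, and tool fees is easy to misjudge at a glance. Here is a minimal cost-estimate sketch; the default rates assume Grok 4.1 Fast's numbers from the table above and the top end of the tool-fee range, so swap in your model's figures:

```python
# Sketch: estimate one request's cost from the rates quoted above.
# Defaults assume Grok 4.1 Fast ($0.20/M input, $0.05/M cached input,
# $0.50/M output) and the $5.00-per-1,000 built-in tool-call fee.

def estimate_cost(input_tokens, cached_tokens, output_tokens, tool_calls=0,
                  in_rate=0.20, cache_rate=0.05, out_rate=0.50, tool_fee=5.00):
    """Return estimated USD cost for a single request."""
    uncached = input_tokens - cached_tokens
    return (uncached * in_rate / 1e6          # fresh input tokens
            + cached_tokens * cache_rate / 1e6  # cache-hit input tokens
            + output_tokens * out_rate / 1e6    # output tokens
            + tool_calls * tool_fee / 1000)     # built-in tool calls

# A 100K-token prompt where 80K hits the cache, 2K of output, no tools:
cost = estimate_cost(100_000, 80_000, 2_000)
print(f"${cost:.4f}")  # $0.0090
```

Note how the cached 80K tokens cost the same as the 20K fresh ones here: that is the 75% discount doing its work on a long, stable system prompt.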

How Grok Pricing Compares

Model              Input / 1M   Output / 1M   Context
Grok 4.1 Fast      $0.20        $0.50         2M
Grok Code Fast 1   $0.20        $1.50         256K
DeepSeek V4        $0.30        $0.50         1M
Gemini 3.1 Flash   $0.30        $1.50         1M
GPT-5.4 Mini       ~$0.75       $4.50         400K
Claude Sonnet 4.6  $3.00        $15.00        200K
Grok 4.20          $2.00        $6.00         2M

Sources: ofox.ai/models, docs.x.ai/docs/models, April 2026.

Grok 4.1 Fast and Grok Code Fast 1 both land at $0.20 input, same tier as DeepSeek V4. Grok 4.1 Fast has a 2M context window vs DeepSeek’s 1M; Grok Code Fast 1 has 256K, tuned for large codebases. For a full cross-model cost breakdown, see our AI API cost reduction guide.

The Three Models Worth Knowing

Grok 4.1 Fast — the everyday workhorse

At $0.20/M input, Grok 4.1 Fast is cheaper than GPT-5.4 Mini per token, but with 5x more context. The 2M token window means you can load an entire codebase, a long document collection, or a multi-day conversation history without truncating.

Two things set it apart from other $0.20-tier models: real-time web search via X (no knowledge cutoff for current events), and automatic prompt caching that requires zero configuration. Reasoning and non-reasoning modes are both available — toggle based on whether the task needs deliberate step-by-step thinking or fast pattern matching.

Grok 4.20 — multi-agent reasoning

Grok 4.20 is the only model currently available via API that exposes a multi-agent architecture at the call level. One request dispatches four internal agents — Grok, Harper, Benjamin, and Lucas — that cross-check each other’s reasoning and actively debate conclusions to reduce hallucinations.

At $2.00/M input, it’s 10x more expensive than Grok 4.1 Fast. For most tasks, that premium isn’t justified. For high-stakes analysis — technical due diligence, research synthesis, complex decision support — the multi-perspective output is meaningfully better than a single-model call. The grok-4.20-multi-agent model ID explicitly routes to this architecture.
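Because the multi-agent model rides the same OpenAI-compatible endpoint, routing to it is only a model-ID change. A sketch under that assumption — the prompt content and the OFOX_API_KEY environment variable name are our own choices for illustration, not xAI or ofox.ai conventions:

```python
import os

# Routing to the multi-agent architecture is just a model-ID swap on the
# same OpenAI-compatible endpoint; the request body is an ordinary chat call.
request = {
    "model": "grok-4.20-multi-agent",
    "messages": [
        {"role": "system", "content": "You are a technical due-diligence analyst."},
        {"role": "user", "content": "Assess the risks of migrating our billing service to event sourcing."},
    ],
}

# Only fire the network call when a key is configured (hypothetical env var).
if os.environ.get("OFOX_API_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.ofox.ai/v1",
                    api_key=os.environ["OFOX_API_KEY"])
    response = client.chat.completions.create(**request)
    print(response.choices[0].message.content)
```

The four-agent debate happens server-side; from the client you see a single response, priced per token like any other call.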

Grok Code Fast 1 — for coding agents

Grok Code Fast 1 (grok-code-fast-1) is purpose-built for agentic coding workflows. The 256K context window holds large codebases in memory across multi-step tool calls. At $0.20/M input and $1.50/M output, it’s priced for high-volume use in CI pipelines, code review agents, and IDE integrations.

It supports function calling, structured output, and streaming — the full toolkit for building coding agents. For a comparison of how it stacks up against Claude and GPT on actual coding benchmarks, see our best AI model for coding guide.
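A sketch of what function calling looks like against grok-code-fast-1, using the OpenAI Chat Completions tools schema the article says Grok mirrors. The run_tests tool here is a hypothetical tool of our own (your own tools are billed at normal token rates, not the built-in tool fee), and the OFOX_API_KEY environment variable is our naming choice:

```python
import os

# Hypothetical tool of our own: run the project's test suite. Declared in
# the OpenAI Chat Completions tools schema, which Grok follows.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the test suite and return pass/fail results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory"},
            },
            "required": ["path"],
        },
    },
}]

request = {
    "model": "grok-code-fast-1",
    "messages": [{"role": "user",
                  "content": "Run the tests under tests/api and summarize any failures."}],
    "tools": tools,
}

# Only call the API when a key is configured (hypothetical env var).
if os.environ.get("OFOX_API_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.ofox.ai/v1",
                    api_key=os.environ["OFOX_API_KEY"])
    resp = client.chat.completions.create(**request)
    # The model returns tool_calls for us to execute locally and feed back.
    print(resp.choices[0].message.tool_calls)
```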

Setup: Two Ways to Access the Grok API

Option 1: Through ofox.ai

ofox.ai exposes the full Grok model family through an OpenAI-compatible endpoint. One API key covers Grok, Claude, GPT, Gemini, and 100+ other models — no separate xAI account needed.

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="your-ofox-api-key"
)

response = client.chat.completions.create(
    model="grok-4-1-fast",
    messages=[{"role": "user", "content": "Explain prompt caching in one paragraph."}]
)
print(response.choices[0].message.content)

JavaScript/TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://api.ofox.ai/v1",
    apiKey: "your-ofox-api-key"
});

const response = await client.chat.completions.create({
    model: "grok-4-1-fast",
    messages: [{ role: "user", content: "Explain prompt caching in one paragraph." }]
});

If you’re already using any model through ofox.ai, switching to Grok means changing one string: the model ID. For a full walkthrough of migrating from the OpenAI SDK to ofox.ai, see our OpenAI SDK migration guide.

Option 2: Through xAI directly

xAI’s official endpoint is https://api.x.ai/v1. New accounts receive free credits to get started — check x.ai/api for current amounts. Enrolling in xAI’s Data Sharing Program grants additional monthly free usage.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key="your-xai-api-key"
)

The direct route gets you the fastest access to new model releases and exclusive features like Live Search and X Search. The tradeoff: a separate billing relationship, and payment requires an international credit card.

When to Use Which Model

Context size and cost tolerance drive most of this decision:

  • High-volume pipelines, RAG, long-document tasks → Grok 4.1 Fast. The 2M context and $0.20 input price are hard to beat at this tier.
  • Coding agents, IDE integrations, code review → Grok Code Fast 1. The 256K window and coding-specific tuning make it the right fit.
  • Complex analysis, research, high-stakes decisions → Grok 4.20 or Grok 4.20 Multi-Agent. The 10x price premium is only worth it when output quality genuinely matters more than cost.
  • Real-time information → Any Grok model. The X integration gives you current data that models with fixed knowledge cutoffs can’t match.
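The decision rules above collapse into a small routing helper. The task labels are our own invention and the cheap default is a judgment call — a sketch, not a prescription:

```python
# Minimal model router based on the guidance above. Task labels are
# illustrative; model IDs come from the pricing table.
def pick_model(task: str) -> str:
    routes = {
        "bulk": "grok-4-1-fast",              # high-volume, RAG, long documents
        "code": "grok-code-fast-1",           # coding agents, IDE integrations
        "analysis": "grok-4.20",              # deep reasoning
        "critical": "grok-4.20-multi-agent",  # high-stakes, multi-perspective
    }
    return routes.get(task, "grok-4-1-fast")  # cheap, large-context default

print(pick_model("code"))  # grok-code-fast-1
```

Defaulting unknown tasks to the cheapest large-context model keeps a routing bug from silently running your pipeline at 10x the cost.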

For a broader comparison of how Grok fits into the current model landscape, see our LLM leaderboard and model comparison guide.

Practical Notes

Prompt caching is automatic. Unlike Anthropic’s explicit cache control headers, Grok caches repeated prefixes without any configuration. Cache hit rates differ by model: Grok 4.1 Fast saves 75% ($0.05/M), Grok 4.20 saves 90% ($0.20/M), and Grok Code Fast 1 saves 90% ($0.02/M). If your system prompt is long and consistent across requests, you’re already paying well below the sticker price.

Tool call pricing is separate. The $2.50–$5.00/1,000 fee applies to built-in tools (web search, X search, code execution). Standard function calling with your own tools is charged at normal token rates.

The 2M token window is real, but very long contexts can affect output quality on complex reasoning tasks. For most practical workloads — codebases, document collections, long conversations — you won’t hit the ceiling.
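For a cheap pre-flight check before shipping a huge payload, a character-based estimate is usually enough. The ~4 characters per token ratio below is a common English-text heuristic, not an xAI-published tokenizer figure:

```python
# Rough pre-flight check: does this payload plausibly fit the context window?
# Assumes ~4 characters per token (a heuristic, not xAI's tokenizer ratio).
def fits_context(text: str, limit: int = 2_000_000,
                 chars_per_token: float = 4.0) -> bool:
    est_tokens = len(text) / chars_per_token
    return est_tokens <= limit * 0.9  # keep ~10% headroom for output/overhead

print(fits_context("x" * 1_000_000))  # ~250K estimated tokens -> True
```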

New xAI accounts have conservative rate limits. If you’re building production workloads, plan for this or use an API gateway like ofox.ai that manages rate limits across multiple upstream providers.

Getting Started

  1. Get an API key at ofox.ai (covers Grok + all other models) or x.ai/api (xAI direct, free credits for new accounts)
  2. Set base_url to https://api.ofox.ai/v1 or https://api.x.ai/v1
  3. Use model ID grok-4-1-fast for general tasks, grok-code-fast-1 for coding, grok-4.20 for deep reasoning

The Grok API’s real unlock for developers isn’t just the pricing — it’s a 2M context window at the $0.20 tier that makes whole-codebase and whole-document reasoning practical without the cost math falling apart.

For teams already using an API gateway, adding Grok is a one-line model ID change. For teams evaluating whether to consolidate API providers, our AI API aggregation guide covers the tradeoffs.