Cached Contents
Manage explicit context caches (cachedContents) via the Gemini native protocol: actively cache a large context as an object, reference it across requests for deterministic hits and lower cost. OfoxAI is compatible with the Google GenAI SDK.
For use cases, the difference from implicit caching, and best practices, see the Gemini Explicit Caching guide. This page is the endpoint-level API reference.
Endpoints
POST https://api.ofox.ai/gemini/v1beta/cachedContents # Create
GET https://api.ofox.ai/gemini/v1beta/cachedContents/{id} # Get
DELETE https://api.ofox.ai/gemini/v1beta/cachedContents/{id} # DeleteTo reference a cache for content generation, use the standard generateContent endpoint with a cachedContent field in the body:
POST https://api.ofox.ai/gemini/v1beta/models/{model}:generateContentAuthentication
Use the x-goog-api-key header:
x-goog-api-key: <YOUR_OFOXAI_API_KEY>Resource Fields
Key fields of the CachedContent resource:
| Field | Type | Description |
|---|---|---|
name | string (output only) | Cache handle, e.g. cachedContents/{id}, returned on create |
model | string (required, immutable) | Model the cache is bound to, e.g. models/gemini-3.1-pro-preview |
contents | array | Content to cache (same structure as generateContent’s contents) |
systemInstruction | object | System instruction to cache (optional) |
tools | array | Tool definitions to cache (optional) |
ttl | string | Time to live, a seconds string (e.g. "600s"); mutually exclusive with expireTime |
expireTime | string | Expiration timestamp (RFC 3339); mutually exclusive with ttl |
displayName | string (immutable) | Custom name (optional) |
usageMetadata.totalTokenCount | integer | Number of cached tokens (used for billing) |
Supported TTL range: minimum / default 600s (10 minutes), maximum 3600s (1 hour).
Create a Cache
Python
from google import genai
from google.genai import types
client = genai.Client(
api_key="<YOUR_OFOXAI_API_KEY>",
http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)
cache = client.caches.create(
model="google/gemini-3.1-pro-preview",
config=types.CreateCachedContentConfig(
contents=[open("knowledge_base.txt").read()],
system_instruction="You answer strictly based on the provided document.",
ttl="600s",
display_name="kb-v1",
),
)
print(cache.name) # cachedContents/xxxxxxxx
print(cache.usage_metadata.total_token_count)Response
{
"name": "cachedContents/xxxxxxxx",
"model": "google/gemini-3.1-pro-preview",
"createTime": "2026-06-26T08:00:00Z",
"updateTime": "2026-06-26T08:00:00Z",
"expireTime": "2026-06-26T08:10:00Z",
"displayName": "kb-v1",
"usageMetadata": {
"totalTokenCount": 14407
}
}Get / Delete
Get and delete do not require model; OfoxAI locates the upstream from the cache handle.
Python
# Get one
info = client.caches.get(name=cache.name)
print(info.expire_time)
# Delete
client.caches.delete(name=cache.name)Reference a Cache to Generate
Add a cachedContent field to the generateContent body to reference the cache; contents only carries the new question for this turn:
Python
response = client.models.generate_content(
model="google/gemini-3.1-pro-preview",
contents="Based on the document above, summarize three key points",
config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
print(response.usage_metadata.cached_content_token_count) # cached tokens hitOn a hit, the response usageMetadata.cachedContentTokenCount shows how many tokens came from the cache.
Billing
| Stage | Formula |
|---|---|
| Create cache | totalTokenCount × cache_write rate |
| Reference hit | cachedContentTokenCount × cache_read rate (~0.10x of standard input) |
| New content per reference | New prompt / output for the turn billed at standard rates |
Each model’s cache_write / cache_read unit prices are in the model catalog .
OfoxAI load-balances across multiple GCP projects, and explicit caches are region-scoped. OfoxAI automatically hard-locks references back to the upstream that created the cache, with zero drift; a cache handle can only be referenced / queried / deleted by the API Key that created it (cross-account access returns 403). See Explicit Caching guide · Deterministic Routing.