Cached Contents

Manage explicit context caches (cachedContents) through the Gemini native protocol: cache a large context as an object, then reference it across requests for deterministic cache hits and lower cost. OfoxAI is compatible with the Google GenAI SDK.

For use cases, the difference from implicit caching, and best practices, see the Gemini Explicit Caching guide. This page is the endpoint-level API reference.

Endpoints


POST   https://api.ofox.ai/gemini/v1beta/cachedContents              # Create
GET    https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # Get
DELETE https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # Delete

To reference a cache for content generation, use the standard generateContent endpoint with a cachedContent field in the body:


POST   https://api.ofox.ai/gemini/v1beta/models/{model}:generateContent

Authentication

Use the x-goog-api-key header:


x-goog-api-key: <YOUR_OFOXAI_API_KEY>
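For callers that do not use the SDK, the endpoints and header above can be assembled by hand. The following is a minimal sketch using plain Python; the helper names `cache_url`, `generate_content_url`, and `auth_headers` are illustrative and not part of any SDK.

```python
from typing import Dict

BASE_URL = "https://api.ofox.ai/gemini/v1beta"


def cache_url(name: str = "cachedContents") -> str:
    """URL for the collection (create) or for one cache handle (get/delete).

    `name` is the full handle as returned on create, e.g. "cachedContents/{id}".
    """
    return f"{BASE_URL}/{name}"


def generate_content_url(model: str) -> str:
    """URL of the standard generateContent endpoint for a model."""
    return f"{BASE_URL}/models/{model}:generateContent"


def auth_headers(api_key: str) -> Dict[str, str]:
    """Headers for a JSON request authenticated with x-goog-api-key."""
    return {"x-goog-api-key": api_key, "Content-Type": "application/json"}


print(cache_url())
print(cache_url("cachedContents/xxxxxxxx"))
print(generate_content_url("google/gemini-3.1-pro-preview"))
```

These values can be passed to any HTTP client; the key itself is best read from an environment variable rather than hard-coded.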

Resource Fields

Key fields of the CachedContent resource:

| Field | Type | Description |
| --- | --- | --- |
| `name` | string (output only) | Cache handle, e.g. `cachedContents/{id}`, returned on create |
| `model` | string (required, immutable) | Model the cache is bound to, e.g. `models/gemini-3.1-pro-preview` |
| `contents` | array | Content to cache (same structure as generateContent’s `contents`) |
| `systemInstruction` | object | System instruction to cache (optional) |
| `tools` | array | Tool definitions to cache (optional) |
| `ttl` | string | Time to live, a seconds string (e.g. `"600s"`); mutually exclusive with `expireTime` |
| `expireTime` | string | Expiration timestamp (RFC 3339); mutually exclusive with `ttl` |
| `displayName` | string (immutable) | Custom name (optional) |
| `usageMetadata.totalTokenCount` | integer | Number of cached tokens (used for billing) |

Supported TTL range: 600s (10 minutes, also the default) to 3600s (1 hour).
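The TTL range and the rule that ttl and expireTime are mutually exclusive can be checked on the client before a request is sent. The sketch below is one way to do that; `ttl_string` and `build_create_body` are hypothetical helpers, and the range check is applied only to ttl because the page does not state how the range maps to expireTime.

```python
from datetime import datetime, timezone
from typing import Optional

MIN_TTL_SECONDS = 600    # also the default when neither field is sent
MAX_TTL_SECONDS = 3600


def ttl_string(seconds: int) -> str:
    """Format a TTL as a seconds string such as "600s", rejecting out-of-range values."""
    if not MIN_TTL_SECONDS <= seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"ttl must be between {MIN_TTL_SECONDS} and {MAX_TTL_SECONDS} seconds"
        )
    return f"{seconds}s"


def build_create_body(
    model: str,
    text: str,
    *,
    ttl_seconds: Optional[int] = None,
    expire_time: Optional[datetime] = None,
) -> dict:
    """Build a create-cache JSON body with at most one of ttl / expireTime."""
    if ttl_seconds is not None and expire_time is not None:
        raise ValueError("ttl and expireTime are mutually exclusive")
    body = {
        "model": model,
        "contents": [{"role": "user", "parts": [{"text": text}]}],
    }
    if ttl_seconds is not None:
        body["ttl"] = ttl_string(ttl_seconds)
    elif expire_time is not None:
        if expire_time.tzinfo is None:
            raise ValueError("expire_time must be timezone-aware")
        # RFC 3339 timestamp in UTC, e.g. 2026-06-26T08:10:00Z
        utc = expire_time.astimezone(timezone.utc)
        body["expireTime"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return body


body = build_create_body("google/gemini-3.1-pro-preview", "<large context>", ttl_seconds=600)
print(body["ttl"])  # 600s
```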

Create a Cache

Python

create.py


from google import genai
from google.genai import types
 
client = genai.Client(
    api_key="<YOUR_OFOXAI_API_KEY>",
    http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)
 
cache = client.caches.create(
    model="google/gemini-3.1-pro-preview",
    config=types.CreateCachedContentConfig(
        contents=[open("knowledge_base.txt", encoding="utf-8").read()],
        system_instruction="You answer strictly based on the provided document.",
        ttl="600s",
        display_name="kb-v1",
    ),
)
 
print(cache.name)                          # cachedContents/xxxxxxxx
print(cache.usage_metadata.total_token_count)

TypeScript

create.ts


import { GoogleGenAI } from '@google/genai'
import fs from 'node:fs'
 
const ai = new GoogleGenAI({
  apiKey: '<YOUR_OFOXAI_API_KEY>',
  httpOptions: { apiVersion: 'v1beta', baseUrl: 'https://api.ofox.ai/gemini' },
})
 
const cache = await ai.caches.create({
  model: 'google/gemini-3.1-pro-preview',
  config: {
    contents: [fs.readFileSync('knowledge_base.txt', 'utf-8')],
    systemInstruction: 'You answer strictly based on the provided document.',
    ttl: '600s',
    displayName: 'kb-v1',
  },
})
 
console.log(cache.name) // cachedContents/xxxxxxxx
console.log(cache.usageMetadata?.totalTokenCount)

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/cachedContents" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-pro-preview",
    "contents": [
      { "role": "user", "parts": [{ "text": "<large context to cache>" }] }
    ],
    "ttl": "600s"
  }'

Response


{
  "name": "cachedContents/xxxxxxxx",
  "model": "google/gemini-3.1-pro-preview",
  "createTime": "2026-06-26T08:00:00Z",
  "updateTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "displayName": "kb-v1",
  "usageMetadata": {
    "totalTokenCount": 14407
  }
}
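When working with the raw JSON rather than the SDK objects, the timestamps in this response can be parsed to find out how long the cache will live. A small sketch using only the standard library and the sample response above; `parse_rfc3339` is an illustrative helper and assumes timestamps at second or microsecond precision.

```python
import json
from datetime import datetime

response_body = """
{
  "name": "cachedContents/xxxxxxxx",
  "model": "google/gemini-3.1-pro-preview",
  "createTime": "2026-06-26T08:00:00Z",
  "updateTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "displayName": "kb-v1",
  "usageMetadata": { "totalTokenCount": 14407 }
}
"""


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp into a timezone-aware datetime."""
    # fromisoformat accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


cache = json.loads(response_body)
lifetime = parse_rfc3339(cache["expireTime"]) - parse_rfc3339(cache["createTime"])
cache_id = cache["name"].split("/", 1)[1]  # the {id} part of the handle

print(cache_id)                               # xxxxxxxx
print(lifetime.total_seconds())               # 600.0
print(cache["usageMetadata"]["totalTokenCount"])  # 14407
```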

Get / Delete

Get and delete requests do not take a model; OfoxAI locates the upstream from the cache handle.

Python

manage.py


# Get one
info = client.caches.get(name=cache.name)
print(info.expire_time)
 
# Delete
client.caches.delete(name=cache.name)

TypeScript

manage.ts


// Get one
const info = await ai.caches.get({ name: cache.name })
console.log(info.expireTime)
 
// Delete
await ai.caches.delete({ name: cache.name })

cURL

Terminal


# Get one
curl "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"
 
# Delete
curl -X DELETE "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"
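Since a cache is billed on creation and lives until its TTL runs out, short-lived jobs may want to delete it as soon as they finish. One possible pattern is a context manager around the create and delete calls shown above; `temporary_cache` is a hypothetical helper, and it only assumes a client object exposing `caches.create(...)` and `caches.delete(name=...)`.

```python
from contextlib import contextmanager


@contextmanager
def temporary_cache(client, **create_kwargs):
    """Create a cache, yield it, and delete it on exit, even if the body raises."""
    cache = client.caches.create(**create_kwargs)
    try:
        yield cache
    finally:
        client.caches.delete(name=cache.name)
```

With the `client` from create.py, usage would look like `with temporary_cache(client, model=..., config=...) as cache:` followed by the generateContent calls that reference `cache.name`.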

Reference a Cache to Generate

Add a cachedContent field to the generateContent body to reference the cache; contents carries only the new question for this turn:

Python

use.py


response = client.models.generate_content(
    model="google/gemini-3.1-pro-preview",
    contents="Based on the document above, summarize three key points",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
 
print(response.text)
print(response.usage_metadata.cached_content_token_count)  # cached tokens hit

TypeScript

use.ts


const response = await ai.models.generateContent({
  model: 'google/gemini-3.1-pro-preview',
  contents: 'Based on the document above, summarize three key points',
  config: { cachedContent: cache.name },
})
 
console.log(response.text)
console.log(response.usageMetadata?.cachedContentTokenCount) // cached tokens hit

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/models/google/gemini-3.1-pro-preview:generateContent" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "cachedContent": "cachedContents/xxxxxxxx",
    "contents": [
      { "role": "user", "parts": [{ "text": "Based on the document above, summarize three key points" }] }
    ]
  }'

On a hit, usageMetadata.cachedContentTokenCount in the response reports how many tokens were served from the cache.
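To monitor hits over time, the cached share of each prompt can be derived from the usage metadata. The sketch below works on the raw JSON form of usageMetadata; `cache_hit_ratio` is an illustrative helper, and it assumes the response also carries promptTokenCount and that this count includes the cached tokens, which should be verified against an actual response.

```python
def cache_hit_ratio(usage_metadata: dict) -> float:
    """Fraction of prompt tokens that were served from the cache (0.0 on a miss)."""
    cached = usage_metadata.get("cachedContentTokenCount") or 0
    # Assumption: promptTokenCount is the total prompt size, cached tokens included.
    prompt = usage_metadata.get("promptTokenCount") or 0
    return cached / prompt if prompt else 0.0


# Hypothetical numbers for illustration only.
usage = {"promptTokenCount": 200, "cachedContentTokenCount": 150}
print(cache_hit_ratio(usage))  # 0.75
```

A ratio of 0.0 on a request that set cachedContent suggests the cache was not applied, for example because it had already expired.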

Billing

| Stage | Formula |
| --- | --- |
| Create cache | `totalTokenCount × cache_write rate` |
| Reference hit | `cachedContentTokenCount × cache_read rate` (about 0.10× the standard input rate) |
| New content per reference | New prompt / output for the turn billed at standard rates |

Each model’s cache_write / cache_read unit prices are in the model catalog.
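The billing formulas can be turned into a quick estimate of whether a cache pays for itself. The following is a sketch only: the `Rates` class and both functions are illustrative, and every price in it is a hypothetical placeholder (chosen so that cache_read is 0.10x of input), not a catalog value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices per one million tokens; real values come from the model catalog."""
    input: float
    output: float
    cache_write: float
    cache_read: float


def create_cost(total_token_count: int, rates: Rates) -> float:
    """Create cache: totalTokenCount x cache_write rate."""
    return total_token_count * rates.cache_write / 1_000_000


def reference_cost(cached_tokens: int, new_prompt_tokens: int,
                   output_tokens: int, rates: Rates) -> float:
    """One generateContent call: cached tokens at cache_read, the rest at standard rates."""
    return (
        cached_tokens * rates.cache_read
        + new_prompt_tokens * rates.input
        + output_tokens * rates.output
    ) / 1_000_000


# Hypothetical prices, for illustration only.
rates = Rates(input=2.00, output=12.00, cache_write=2.00, cache_read=0.20)

requests = 10
context, question, answer = 14407, 20, 300  # token counts per request (assumed)

with_cache = create_cost(context, rates) + requests * reference_cost(context, question, answer, rates)
# Without a cache, the whole context is resent as standard input every time.
without_cache = requests * reference_cost(0, context + question, answer, rates)

print(f"with cache: {with_cache:.4f}  without cache: {without_cache:.4f}")
```

With these placeholder numbers the cached variant is cheaper after a handful of requests; the break-even point depends on the actual cache_write and cache_read prices of the chosen model.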

OfoxAI load-balances across multiple GCP projects, and explicit caches are region-scoped. To keep references working, OfoxAI always routes a request that references a cache back to the upstream that created it. A cache handle can only be referenced, queried, or deleted by the API key that created it; cross-account access returns 403. See Explicit Caching guide · Deterministic Routing.
