Cached Contents

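Manage explicit context caches (cachedContents) through the Gemini native protocol: you cache a large context up front as a single object and reference it across requests, which makes cache hits highly deterministic and keeps billing low. OfoxAI is compatible with the Google GenAI SDK.

For use cases of explicit caching, how it differs from implicit caching, and best practices, see the Gemini explicit caching guide. This page is the endpoint-level API reference.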

Endpoints


POST   https://api.ofox.ai/gemini/v1beta/cachedContents              # create
GET    https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # get
DELETE https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # delete

To generate content against a cache, use the standard generateContent endpoint and add a cachedContent field to the request body:


POST   https://api.ofox.ai/gemini/v1beta/models/{model}:generateContent

Authentication

Use the x-goog-api-key header:


x-goog-api-key: <your OFOXAI_API_KEY>
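As a minimal sketch of how the header and the endpoints above fit together without the SDK, the following builds (but does not send) requests using only Python's standard library. The helper name `build_request` and the placeholder key are illustrative assumptions, not part of the OfoxAI API.

```python
import json
import urllib.request

BASE_URL = "https://api.ofox.ai/gemini/v1beta"


def build_request(method, path, api_key, body=None):
    """Build an authenticated request for a Gemini-native endpoint.

    Hypothetical helper: it only constructs the request. Pass the result to
    urllib.request.urlopen() to actually send it.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(f"{BASE_URL}/{path}", data=data, method=method)
    req.add_header("x-goog-api-key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("DELETE", "cachedContents/xxxxxxxx", "<your OFOXAI_API_KEY>")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the method, URL, and headers before any network call is made.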

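Resource fields

Main fields of the CachedContent resource:

Field	Type	Description
`name`	string (read-only)	Cache handle in the form `cachedContents/{id}`, returned after creation
`model`	string (required, immutable)	Model the cache is bound to, e.g. `models/gemini-3.1-pro-preview`
`contents`	array	Content to cache (same structure as `contents` in generateContent)
`systemInstruction`	object	System instruction to cache (optional)
`tools`	array	Tool definitions to cache (optional)
`ttl`	string	Time to live, as a string of seconds (e.g. `"600s"`). Set either this or `expireTime`, not both
`expireTime`	string	Expiration timestamp (RFC 3339). Set either this or `ttl`, not both
`displayName`	string (immutable)	Custom name (optional)
`usageMetadata.totalTokenCount`	integer	Number of cached tokens (used for billing)

Supported TTL range: minimum and default 600s (10 minutes); maximum 3600s (1 hour).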
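The TTL range above can be checked on the client before a create request is sent. This is a minimal sketch: `format_ttl` is a hypothetical helper, and this page does not say whether the server rejects or clamps out-of-range values, so raising an error locally is a design assumption.

```python
MIN_TTL_SECONDS = 600    # minimum and default, per the range above
MAX_TTL_SECONDS = 3600   # maximum, per the range above


def format_ttl(seconds=MIN_TTL_SECONDS):
    """Return a ttl string such as "600s", rejecting out-of-range values."""
    if not MIN_TTL_SECONDS <= seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"ttl must be between {MIN_TTL_SECONDS}s and {MAX_TTL_SECONDS}s, got {seconds}s"
        )
    return f"{int(seconds)}s"


print(format_ttl())      # 600s
print(format_ttl(1800))  # 1800s
```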

Create a cache

Python

create.py


from google import genai
from google.genai import types
 
client = genai.Client(
    api_key="<your OFOXAI_API_KEY>",
    http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)
 
cache = client.caches.create(
    model="google/gemini-3.1-pro-preview",
    config=types.CreateCachedContentConfig(
        contents=[open("knowledge_base.txt").read()],
        system_instruction="You are an assistant that answers based only on the provided documents.",
        ttl="600s",
        display_name="kb-v1",
    ),
)
 
print(cache.name)                          # cachedContents/xxxxxxxx
print(cache.usage_metadata.total_token_count)

TypeScript

create.ts


import { GoogleGenAI } from '@google/genai'
import fs from 'node:fs'
 
const ai = new GoogleGenAI({
  apiKey: '<your OFOXAI_API_KEY>',
  httpOptions: { apiVersion: 'v1beta', baseUrl: 'https://api.ofox.ai/gemini' },
})
 
const cache = await ai.caches.create({
  model: 'google/gemini-3.1-pro-preview',
  config: {
    contents: [fs.readFileSync('knowledge_base.txt', 'utf-8')],
    systemInstruction: 'You are an assistant that answers based only on the provided documents.',
    ttl: '600s',
    displayName: 'kb-v1',
  },
})
 
console.log(cache.name) // cachedContents/xxxxxxxx
console.log(cache.usageMetadata?.totalTokenCount)

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/cachedContents" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-pro-preview",
    "contents": [
      { "role": "user", "parts": [{ "text": "<large context to cache>" }] }
    ],
    "ttl": "600s"
  }'

Response


{
  "name": "cachedContents/xxxxxxxx",
  "model": "google/gemini-3.1-pro-preview",
  "createTime": "2026-06-26T08:00:00Z",
  "updateTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "displayName": "kb-v1",
  "usageMetadata": {
    "totalTokenCount": 14407
  }
}
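To illustrate how the response fields relate, the sketch below parses a trimmed copy of the sample response with the standard library, extracts the `{id}` part of the handle, and derives the cache lifetime from `createTime` and `expireTime`. The helper `parse_rfc3339` is a hypothetical name, and it only handles timestamps shaped like the ones in the sample.

```python
import json
from datetime import datetime

# Trimmed copy of the sample response above.
response_body = """
{
  "name": "cachedContents/xxxxxxxx",
  "createTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "usageMetadata": { "totalTokenCount": 14407 }
}
"""


def parse_rfc3339(value):
    """Parse a timestamp like 2026-06-26T08:00:00Z into an aware datetime.

    The trailing "Z" is rewritten because datetime.fromisoformat() only
    accepts it from Python 3.11 onward.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


cache = json.loads(response_body)
cache_id = cache["name"].split("/", 1)[1]
lifetime = parse_rfc3339(cache["expireTime"]) - parse_rfc3339(cache["createTime"])

print(cache_id)                       # xxxxxxxx
print(int(lifetime.total_seconds()))  # 600
```

The 600-second difference matches the `"600s"` ttl used in the create examples.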

Get / Delete

You do not need to pass model when getting or deleting a cache. OfoxAI identifies the upstream from the cache handle.

Python

manage.py


# Get a single cache
info = client.caches.get(name=cache.name)
print(info.expire_time)

# Delete
client.caches.delete(name=cache.name)

TypeScript

manage.ts


// Get a single cache
const info = await ai.caches.get({ name: cache.name })
console.log(info.expireTime)

// Delete
await ai.caches.delete({ name: cache.name })

cURL

Terminal


# Get a single cache
curl "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"

# Delete
curl -X DELETE "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"

Generate with a cache

Add the cachedContent field to the generateContent request body to reference the cache. Put only the new question for this request in contents:

Python

use.py


response = client.models.generate_content(
    model="google/gemini-3.1-pro-preview",
    contents="Based on the document above, summarize the three key points",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
 
print(response.text)
print(response.usage_metadata.cached_content_token_count)  # number of tokens served from the cache

TypeScript

use.ts


const response = await ai.models.generateContent({
  model: 'google/gemini-3.1-pro-preview',
  contents: 'Based on the document above, summarize the three key points',
  config: { cachedContent: cache.name },
})
 
console.log(response.text)
console.log(response.usageMetadata?.cachedContentTokenCount) // number of tokens served from the cache

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/models/google/gemini-3.1-pro-preview:generateContent" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "cachedContent": "cachedContents/xxxxxxxx",
    "contents": [
      { "role": "user", "parts": [{ "text": "Based on the document above, summarize the three key points" }] }
    ]
  }'

On a hit, usageMetadata.cachedContentTokenCount in the response shows the number of tokens served from the cache.
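One way to verify a hit programmatically is to compare cachedContentTokenCount against the total prompt size. This is a sketch under assumptions: `cache_hit_ratio` is a hypothetical helper, the numbers are made up for illustration, and it assumes the response also carries the upstream Gemini `promptTokenCount` field (which counts cached tokens as part of the prompt) unchanged.

```python
def cache_hit_ratio(usage_metadata):
    """Return the fraction of prompt tokens that were served from the cache."""
    cached = usage_metadata.get("cachedContentTokenCount", 0)
    prompt = usage_metadata.get("promptTokenCount", 0)
    return cached / prompt if prompt else 0.0


# Hypothetical usageMetadata values, for illustration only.
usage = {
    "promptTokenCount": 14430,
    "cachedContentTokenCount": 14407,
    "candidatesTokenCount": 120,
}

print(f"{cache_hit_ratio(usage):.1%}")  # 99.8%
```

A ratio of zero on a request that set cachedContent suggests the cache was not applied, for example because the field is missing from the response.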

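Billing

Stage	Formula
Cache creation	`totalTokenCount × cache_write unit price`
Cache hit on reference	`cachedContentTokenCount × cache_read unit price` (about 0.10x the standard input price)
New content on reference	The newly added prompt and the output are billed at the standard price

See the model catalog for each model's cache_write / cache_read unit prices.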
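The formulas in the table can be combined into a rough estimate of input-side cost for one cache reused across several requests. Everything numeric here is an assumption: the prices are hypothetical placeholders (real values come from the model catalog), output tokens are left out, and every request is assumed to hit the full cache so that cachedContentTokenCount equals totalTokenCount.

```python
def estimate_input_cost(cached_tokens, new_tokens, requests, prices):
    """Estimate input-side cost for one cache reused across `requests` calls.

    `prices` holds unit prices per 1M tokens under the keys
    "input", "cache_write", and "cache_read".
    """
    per_token = {name: price / 1_000_000 for name, price in prices.items()}
    write = cached_tokens * per_token["cache_write"]            # cache creation
    reads = requests * cached_tokens * per_token["cache_read"]  # cache hits
    fresh = requests * new_tokens * per_token["input"]          # new prompt tokens
    return write + reads + fresh


# Hypothetical prices per 1M tokens; cache_read is set to 0.10x the input price.
prices = {"input": 2.00, "cache_write": 2.00, "cache_read": 0.20}

with_cache = estimate_input_cost(14407, 50, 20, prices)
without_cache = 20 * (14407 + 50) * prices["input"] / 1_000_000

print(f"with cache:    {with_cache:.4f}")
print(f"without cache: {without_cache:.4f}")
```

Because cache creation is a one-time cost, the saving grows with the number of requests that reference the cache before it expires.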

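OfoxAI load-balances across multiple GCP projects, and explicit caches are region-scoped. OfoxAI uses the cache handle to pin each request to the upstream that created the cache, so references never drift to another upstream. A cache handle can be referenced, retrieved, or deleted only by the API key that created it (access across accounts returns 403). For details, see Explicit caching guide · Deterministic routing.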
