Cached Contents

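Manage explicit Context Caching (cachedContents) through the Gemini native protocol. Caching a large context as an explicit object and referencing it across requests makes cache hits more deterministic and lowers cost. OfoxAI is compatible with the Google GenAI SDK.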

For the use cases of explicit caching, how it differs from implicit caching, and best practices, see the Gemini explicit caching guide. This page is the endpoint-level API reference.

Endpoints


POST   https://api.ofox.ai/gemini/v1beta/cachedContents              # create
GET    https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # get
DELETE https://api.ofox.ai/gemini/v1beta/cachedContents/{id}         # delete

To generate content against a cache, use the standard generateContent endpoint and include the cachedContent field in the request body.


POST   https://api.ofox.ai/gemini/v1beta/models/{model}:generateContent

Authentication

Send your API key in the x-goog-api-key header.


x-goog-api-key: <your OFOXAI_API_KEY>
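
For callers who are not using the SDK, the endpoint and header above can be combined as in the following minimal sketch. It only builds the request object and does not send it. The helper name `build_cache_request` is hypothetical and is not part of any OfoxAI or Google library.

```python
import urllib.request

# Base URL taken from the endpoint list above.
BASE_URL = "https://api.ofox.ai/gemini/v1beta"


def build_cache_request(api_key: str, cache_id: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request for one cachedContents resource.

    Hypothetical helper for illustration only.
    """
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET and DELETE apply to cachedContents/{id}")
    return urllib.request.Request(
        url=f"{BASE_URL}/cachedContents/{cache_id}",
        method=method,
        headers={"x-goog-api-key": api_key},  # authentication header
    )


req = build_cache_request("<your OFOXAI_API_KEY>", "xxxxxxxx")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call.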

Resource fields

Key fields of the CachedContent resource:

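Field	Type	Description
`name`	string (read-only)	Cache handle in the form `cachedContents/{id}`; returned after creation
`model`	string (required, immutable)	Model the cache is bound to, e.g. `models/gemini-3.1-pro-preview`
`contents`	array	Content to cache (same structure as `contents` in generateContent)
`systemInstruction`	object	System instruction to cache (optional)
`tools`	array	Tool definitions to cache (optional)
`ttl`	string	Time to live as a string in seconds (e.g. `"600s"`); mutually exclusive with `expireTime`
`expireTime`	string	Expiration time (RFC 3339); mutually exclusive with `ttl`
`displayName`	string (immutable)	User-defined name (optional)
`usageMetadata.totalTokenCount`	integer	Number of cached tokens (used for billing)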

Supported TTL range: the minimum and the default are both 600s (10 minutes); the maximum is 3600s (1 hour).
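
The TTL range and the two lifetime fields can be checked client-side before a cache is created. The sketch below is one way to do that; the helper names `ttl_string` and `expire_time_string` are hypothetical, and rejecting out-of-range values locally is an assumption about what is convenient, not a description of how the server responds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_TTL_SECONDS = 600    # minimum and default, per the range above
MAX_TTL_SECONDS = 3600   # maximum, per the range above


def ttl_string(seconds: int = MIN_TTL_SECONDS) -> str:
    """Format a lifetime as the seconds-suffixed string used by `ttl`."""
    if not MIN_TTL_SECONDS <= seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"ttl must be between {MIN_TTL_SECONDS}s and {MAX_TTL_SECONDS}s, got {seconds}s"
        )
    return f"{seconds}s"


def expire_time_string(seconds: int, now: Optional[datetime] = None) -> str:
    """Express the same lifetime as an RFC 3339 `expireTime` in UTC.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    ttl_string(seconds)  # reuse the range check
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (now + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%SZ")


print(ttl_string())             # default lifetime
print(expire_time_string(600))  # same lifetime as an absolute timestamp
```

Send either `ttl` or `expireTime` in a create request, never both, since the two fields are mutually exclusive.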

Create a cache

Python

create.py


from pathlib import Path

from google import genai
from google.genai import types

# Point the Google GenAI SDK at the OfoxAI Gemini-compatible endpoint.
client = genai.Client(
    api_key="<your OFOXAI_API_KEY>",
    http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)

# Cache the large document together with the system instruction.
cache = client.caches.create(
    model="google/gemini-3.1-pro-preview",
    config=types.CreateCachedContentConfig(
        contents=[Path("knowledge_base.txt").read_text(encoding="utf-8")],
        system_instruction="You are an assistant that answers only from the provided documents.",
        ttl="600s",
        display_name="kb-v1",
    ),
)

print(cache.name)                          # cachedContents/xxxxxxxx
print(cache.usage_metadata.total_token_count)

TypeScript

create.ts


import { GoogleGenAI } from '@google/genai'
import fs from 'node:fs'
 
const ai = new GoogleGenAI({
  apiKey: '<your OFOXAI_API_KEY>',
  httpOptions: { apiVersion: 'v1beta', baseUrl: 'https://api.ofox.ai/gemini' },
})

const cache = await ai.caches.create({
  model: 'google/gemini-3.1-pro-preview',
  config: {
    contents: [fs.readFileSync('knowledge_base.txt', 'utf-8')],
    systemInstruction: 'You are an assistant that answers only from the provided documents.',
    ttl: '600s',
    displayName: 'kb-v1',
  },
})
 
console.log(cache.name) // cachedContents/xxxxxxxx
console.log(cache.usageMetadata?.totalTokenCount)

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/cachedContents" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-pro-preview",
    "contents": [
      { "role": "user", "parts": [{ "text": "<large context to cache>" }] }
    ],
    "ttl": "600s"
  }'

Response


{
  "name": "cachedContents/xxxxxxxx",
  "model": "google/gemini-3.1-pro-preview",
  "createTime": "2026-06-26T08:00:00Z",
  "updateTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "displayName": "kb-v1",
  "usageMetadata": {
    "totalTokenCount": 14407
  }
}
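
A client may want to know how long a cache has left before reusing its handle. The sketch below derives that from the `expireTime` field of a response shaped like the one above. The function name `seconds_until_expiry` is hypothetical, and the parsing assumes a whole-second or microsecond-precision timestamp ending in `Z`.

```python
import json
from datetime import datetime, timezone

# Sample body copied from the response shown above.
SAMPLE_RESPONSE = """
{
  "name": "cachedContents/xxxxxxxx",
  "model": "google/gemini-3.1-pro-preview",
  "createTime": "2026-06-26T08:00:00Z",
  "updateTime": "2026-06-26T08:00:00Z",
  "expireTime": "2026-06-26T08:10:00Z",
  "displayName": "kb-v1",
  "usageMetadata": { "totalTokenCount": 14407 }
}
"""


def seconds_until_expiry(cached_content: dict, now: datetime) -> float:
    """Return the remaining lifetime in seconds (negative once expired)."""
    # datetime.fromisoformat() on older Python versions does not accept "Z".
    expire = datetime.fromisoformat(cached_content["expireTime"].replace("Z", "+00:00"))
    return (expire - now).total_seconds()


cached = json.loads(SAMPLE_RESPONSE)
remaining = seconds_until_expiry(cached, datetime.now(timezone.utc))
print(cached["name"], "expired" if remaining <= 0 else f"{remaining:.0f}s left")
```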

Get / Delete

Get and delete requests do not need a model. OfoxAI locates the upstream from the cache handle alone.

Python

manage.py


# Get a single cache
info = client.caches.get(name=cache.name)
print(info.expire_time)

# Delete
client.caches.delete(name=cache.name)

TypeScript

manage.ts


// Get a single cache
const info = await ai.caches.get({ name: cache.name })
console.log(info.expireTime)

// Delete
await ai.caches.delete({ name: cache.name })

cURL

Terminal


# Get a single cache
curl "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"

# Delete
curl -X DELETE "https://api.ofox.ai/gemini/v1beta/cachedContents/xxxxxxxx" \
  -H "x-goog-api-key: $OFOX_API_KEY"

Generate with a cache reference

Reference the cache by adding the cachedContent field to the generateContent request body. Put only the new question for this request in contents.

Python

use.py


response = client.models.generate_content(
    model="google/gemini-3.1-pro-preview",
    contents="Based on the document above, summarize three key points",
    config=types.GenerateContentConfig(cached_content=cache.name),
)

print(response.text)
print(response.usage_metadata.cached_content_token_count)  # tokens served from the cache

TypeScript

use.ts


const response = await ai.models.generateContent({
  model: 'google/gemini-3.1-pro-preview',
  contents: 'Based on the document above, summarize three key points',
  config: { cachedContent: cache.name },
})

console.log(response.text)
console.log(response.usageMetadata?.cachedContentTokenCount) // tokens served from the cache

cURL

Terminal


curl "https://api.ofox.ai/gemini/v1beta/models/google/gemini-3.1-pro-preview:generateContent" \
  -H "x-goog-api-key: $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "cachedContent": "cachedContents/xxxxxxxx",
    "contents": [
      { "role": "user", "parts": [{ "text": "Based on the document above, summarize three key points" }] }
    ]
  }'

On a hit, usageMetadata.cachedContentTokenCount in the response reports the number of tokens served from the cache.
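
To confirm that a request actually hit the cache, the cached token count can be compared with the total prompt size. The sketch below works on a plain `usageMetadata` dictionary. The function name `cache_hit_ratio` is hypothetical, and it assumes that `promptTokenCount` is present and includes the cached tokens, as in Google's Gemini API; the token numbers in the example call are made up.

```python
def cache_hit_ratio(usage_metadata: dict) -> float:
    """Fraction of prompt tokens served from the cache (0.0 when nothing hit).

    Assumes promptTokenCount includes the cached tokens.
    """
    cached = usage_metadata.get("cachedContentTokenCount", 0)
    prompt = usage_metadata.get("promptTokenCount", 0)
    return cached / prompt if prompt else 0.0


# Illustrative numbers only: a 14407-token cache plus a short new question.
usage = {"promptTokenCount": 14420, "cachedContentTokenCount": 14407}
print(f"{cache_hit_ratio(usage):.1%} of prompt tokens came from the cache")
```

A ratio of 0.0 on a request that sets cachedContent suggests the handle was not applied, for example because the cache had expired.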

Billing

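Stage	Billing formula
Cache creation	`totalTokenCount × cache_write unit price`
Cache hit on reference	`cachedContentTokenCount × cache_read unit price` (about 0.10x the standard input price)
New content on reference	The newly added prompt and the output are billed at standard prices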

See the model catalog for each model's cache_write and cache_read unit prices.
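
The billing formulas above can be turned into a rough break-even estimate. The following sketch compares sending the full context on every call with caching it once. Every price in it is a placeholder chosen for illustration (only the 0.10x read ratio comes from the table), and the names `Prices`, `cached_cost`, and `uncached_cost` are hypothetical; take real unit prices from the model catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Unit prices in USD per million tokens. Placeholder values only."""
    input: float
    output: float
    cache_write: float
    cache_read: float


def cached_cost(p: Prices, cache_tokens: int, calls: int,
                new_input_tokens: int, output_tokens: int) -> float:
    """Cost of creating the cache once and referencing it `calls` times."""
    create = cache_tokens * p.cache_write
    hits = calls * cache_tokens * p.cache_read
    fresh = calls * (new_input_tokens * p.input + output_tokens * p.output)
    return (create + hits + fresh) / 1_000_000


def uncached_cost(p: Prices, cache_tokens: int, calls: int,
                  new_input_tokens: int, output_tokens: int) -> float:
    """Cost of resending the full context as ordinary input on every call."""
    per_call = (cache_tokens + new_input_tokens) * p.input + output_tokens * p.output
    return calls * per_call / 1_000_000


# Placeholder prices; cache_read is set to 0.10x the input price.
prices = Prices(input=2.0, output=12.0, cache_write=2.0, cache_read=0.2)
args = dict(cache_tokens=14407, calls=10, new_input_tokens=50, output_tokens=300)
print(f"cached:   ${cached_cost(prices, **args):.4f}")
print(f"uncached: ${uncached_cost(prices, **args):.4f}")
```

With a single call the cached path costs more because of the write charge, so the saving only appears once the same context is referenced repeatedly within its TTL.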

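OfoxAI load-balances across multiple GCP projects, and explicit caches are region-scoped. OfoxAI automatically hard-locks each cache handle to the upstream that created the cache, so references never drift to a different upstream. A cache handle can be referenced, retrieved, or deleted only by the API key that created it (access from another account returns 403). For details, see the explicit caching guide · deterministic routing.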
