Gemini

Google: Gemini 2.5 Flash Lite

Chat
google/gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, [thinking] (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

1M context window
66K max output tokens
Released: 2025-07-22
Supported Protocols:OpenAIopenaiGeminigemini
Available Providers:GoogleCloudVertex
Capabilities:VisionFunction CallingPrompt CachingPDF Input

Providers

GoogleCloudVertex
Input Tokens
$0.1/M
Output Tokens
$0.4/M
Cache Read
$0.025/M
Cache Write
$1/M
Audio Input
$0.3/M
Cached Audio
$0.3/M
Web Search
$0.035/R
Protocols
OpenAIopenai/v1/chat/completions
Geminigemini

Code Examples

from google import genai
client = genai.Client(
api_key="YOUR_OFOX_API_KEY",
http_options={"api_version": "v1beta", "url": "https://api.ofox.ai/gemini"},
)
response = client.models.generate_content(
model="google/gemini-2.5-flash-lite",
contents="Hello!",
)
print(response.text)

Frequently Asked Questions

Google: Gemini 2.5 Flash Lite on Ofox.ai costs $0.1/M per million input tokens and $0.4/M per million output tokens. Pay-as-you-go, no monthly fees.