Gemini

Google: Gemini 2.5 Flash Lite

Chat
google/gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers higher throughput, faster token generation, and better performance on common benchmarks than earlier Flash models. By default, thinking (multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade cost for intelligence.
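As a sketch of the trade-off described above, the snippet below builds a request body that enables thinking by setting a token budget. The `generationConfig.thinkingConfig.thinkingBudget` field is from the public Gemini REST API; whether Ofox.ai forwards it unchanged is an assumption.

```python
import json

# Sketch: enabling thinking on a Gemini 2.5 Flash-Lite request.
# thinkingConfig/thinkingBudget come from the public Gemini REST API;
# Ofox.ai passing them through unchanged is an assumption.
payload = {
    "contents": [{"role": "user", "parts": [{"text": "Hello!"}]}],
    "generationConfig": {
        # 0 disables thinking (the Flash-Lite default); a positive value
        # allows up to that many reasoning tokens per request.
        "thinkingConfig": {"thinkingBudget": 512},
    },
}
body = json.dumps(payload)
print(body)
```

A budget of 0 keeps the low-latency default; raising it spends more output-side tokens on reasoning for harder prompts.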

1M context window
66K max output tokens
Released: 2025-07-22
Supported Protocols: OpenAI, Gemini
Available Providers: Google Cloud Vertex
Capabilities: Vision, Function Calling, Prompt Caching, PDF Input

Pricing

Input Tokens: $0.10 / 1M tokens
Output Tokens: $0.40 / 1M tokens
Audio Input: $0.30 / 1M tokens
Cache Read: $0.025 / 1M tokens
Cache Write: $1.00 / 1M tokens
Cached Audio: $0.30 / 1M tokens
Web Search: $0.035 / request
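The rates above are per million tokens ($/M) or per request. A quick sketch of what a single text request costs at these listed rates:

```python
# Rough per-request cost at this page's listed rates
# ($/M = dollars per million tokens).
INPUT_PER_M = 0.10   # input tokens
OUTPUT_PER_M = 0.40  # output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one plain-text request (no caching, audio, or search)."""
    return (input_tokens / 1_000_000 * INPUT_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PER_M)

# e.g. a 12,000-token prompt with an 800-token reply:
cost = request_cost(12_000, 800)
print(f"${cost:.5f}")  # $0.00152
```

Cache reads, audio input, and web search add their own line items on top of this base token cost.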

Code Examples

from google import genai

client = genai.Client(
    api_key="YOUR_OFOX_API_KEY",
    # google-genai's HttpOptions takes "base_url" (not "url")
    http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)
response = client.models.generate_content(
    model="google/gemini-2.5-flash-lite",
    contents="Hello!",
)
print(response.text)
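This model also lists OpenAI among its supported protocols. A minimal sketch of an OpenAI-style chat-completions request body for it follows; the Ofox.ai base URL for the OpenAI protocol is not stated on this page, so only the body is shown rather than guessing an endpoint.

```python
import json

# Sketch of an OpenAI-protocol chat completions request body for this
# model. The Ofox.ai OpenAI-compatible endpoint URL is not given on
# this page, so no URL is assumed here.
payload = {
    "model": "google/gemini-2.5-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(json.dumps(payload, indent=2))
```

With an OpenAI SDK, the same body would be sent via the client's chat-completions call once pointed at the Ofox.ai endpoint.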

Frequently Asked Questions

How much does Gemini 2.5 Flash Lite cost on Ofox.ai?

Google: Gemini 2.5 Flash Lite on Ofox.ai costs $0.10 per million input tokens and $0.40 per million output tokens. Pay-as-you-go, no monthly fees.

Discord

Join our Discord server
