Google: Gemini 2.5 Flash Lite
`google/gemini-2.5-flash-lite`

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks than earlier Flash models. By default, thinking (multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the reasoning API parameter to selectively trade cost for intelligence.
- Context window: 1M tokens
- Max output: 66K tokens
- Released: 2025-07-22
- Supported protocols: openai, gemini
- Available providers: Vertex
- Capabilities: Vision, Function Calling, Prompt Caching, PDF Input
Pricing
| Type | Price |
|---|---|
| Input Tokens | $0.10/M |
| Output Tokens | $0.40/M |
| Audio Input | $0.30/M |
| Cache Read | $0.025/M |
| Cache Write | $1.00/M |
| Cached Audio | $0.30/M |
| Web Search | $0.035/R |

Prices marked /M are per million tokens; /R is per request.
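As a quick sanity check on the table above, per-request cost can be estimated from token counts. The helper below is a hypothetical sketch covering only the text rates (audio, cache-write, and web-search fees are omitted):

```python
# Hypothetical cost estimator for the text rates in the pricing table
# (USD per 1M tokens). Audio, cache-write, and web-search fees omitted.
RATES = {"input": 0.10, "output": 0.40, "cache_read": 0.025}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate USD cost; cached_tokens are billed at the cache-read rate."""
    fresh = input_tokens - cached_tokens
    usd = (fresh * RATES["input"]
           + cached_tokens * RATES["cache_read"]
           + output_tokens * RATES["output"])
    return usd / 1_000_000

# e.g. a 100K-token prompt (80K of it served from cache) with a 2K-token reply
print(f"${estimate_cost(100_000, 2_000, cached_tokens=80_000):.6f}")  # → $0.004800
```

Note how prompt caching dominates the savings here: the 80K cached tokens cost a quarter of a cent instead of eight cents.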
Code Examples
```python
from google import genai

# Point the SDK at the Ofox gateway instead of Google's default endpoint.
client = genai.Client(
    api_key="YOUR_OFOX_API_KEY",
    http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)

response = client.models.generate_content(
    model="google/gemini-2.5-flash-lite",
    contents="Hello!",
)
print(response.text)
```
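To enable the thinking mode described above, the Gemini API accepts a `thinkingConfig` inside `generationConfig`. The sketch below builds the raw REST request with only the standard library; the endpoint path is an assumption derived from the base URL in the example above, not a documented Ofox route:

```python
import json
import urllib.request

# Assumed endpoint, derived from the base URL in the SDK example (unverified).
URL = ("https://api.ofox.ai/gemini/v1beta/models/"
       "google/gemini-2.5-flash-lite:generateContent")

# A thinkingBudget > 0 turns multi-pass reasoning on; by default it is off
# for Flash-Lite, which keeps latency and cost low.
payload = {
    "contents": [{"parts": [{"text": "Hello!"}]}],
    "generationConfig": {"thinkingConfig": {"thinkingBudget": 1024}},
}

def generate(api_key: str) -> str:
    """POST the request and return the first candidate's text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]

# generate("YOUR_OFOX_API_KEY")  # requires a valid key
```

Setting the budget back to 0 restores the default fast path, so the trade-off can be made per request.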
Frequently Asked Questions
**How much does Google: Gemini 2.5 Flash Lite cost on Ofox.ai?**
Google: Gemini 2.5 Flash Lite on Ofox.ai costs $0.10 per million input tokens and $0.40 per million output tokens. Pay-as-you-go, no monthly fees.

**What context window does it support?**
Google: Gemini 2.5 Flash Lite supports a context window of 1M tokens with a max output of 66K tokens, allowing you to process large documents and maintain long conversations.

**How do I use it with existing OpenAI-compatible code?**
Simply set your base URL to https://api.ofox.ai/v1 and use your Ofox API key. The API is OpenAI-compatible: just change the base URL and API key in your existing code.

**What capabilities does it support?**
Google: Gemini 2.5 Flash Lite supports the following capabilities: Vision, Function Calling, Prompt Caching, PDF Input. Access all features through the Ofox.ai unified API.
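The OpenAI-compatible route described in the FAQ can be exercised without any SDK at all. This stdlib sketch posts a standard chat-completions request; the `/chat/completions` path follows the usual OpenAI convention and is assumed to apply to the Ofox base URL:

```python
import json
import urllib.request

BASE_URL = "https://api.ofox.ai/v1"  # from the FAQ above

# Standard OpenAI chat-completions payload; only the model id is Ofox-specific.
payload = {
    "model": "google/gemini-2.5-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}],
}

def chat(api_key: str) -> str:
    """POST an OpenAI-style request and return the assistant message."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("YOUR_OFOX_API_KEY")  # requires a valid key
```

Existing code built on the openai SDK should work the same way: construct the client with `base_url="https://api.ofox.ai/v1"` and your Ofox key, leaving the rest of the code unchanged.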