June ๐ŸŽ‰ GPT 15% Off ๐ŸŽ‰ All series, all month ๐Ÿ”ฅLearn more
Gemini

Google: Gemini 3.1 Flash Lite

Chat
google/gemini-3.1-flash-lite

Gemini 3.1 Flash Lite (GA) is Google's high-efficiency multimodal model optimized for low-latency, high-volume workloads. GA version of the preview model. Supports full thinking levels (minimal, low, medium, high) for cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash. Released May 7, 2026.

Context Window
1M
Max Output Tokens
64K
Released
2026-05-07
Capabilities
VisionFunction CallingReasoningPrompt CachingWeb SearchAudio InputVideo InputPDF Input
Available Providers
GoogleCloudVertex
Supported Protocols
OpenAIopenaiGeminigemini

Providers

GoogleCloudVertex
Input Tokens
$0.25/M
Output Tokens
$1.5/M
Cache Read
$0.025/M
Cache Write
$1/M
Audio Input
$0.5/M
Cache Write (1 hour)
$1/M
Cached Audio
$0.05/M
Web Search
$0.014/R
Protocols
OpenAIopenai/v1/chat/completions
Geminigemini

Code Examples

from google import genai
client = genai.Client(
api_key="YOUR_OFOX_API_KEY",
http_options={"api_version": "v1beta", "base_url": "https://api.ofox.ai/gemini"},
)
response = client.models.generate_content(
model="google/gemini-3.1-flash-lite",
contents="Hello!",
)
print(response.text)

Frequently Asked Questions

Google: Gemini 3.1 Flash Lite on Ofox.ai costs $0.25/M per million input tokens and $1.5/M per million output tokens. Pay-as-you-go, no monthly fees.