Gemini 3.5 Pro: Release Date, Expected Specs, and What Flash Already Tells Us
Google announced its flagship at I/O 2026 and then told the audience to wait a month for it. The Flash model that shipped instead is already outscoring the previous-generation Pro on coding benchmarks.
TL;DR. Gemini 3.5 Pro was supposed to headline Google I/O 2026 on May 19. It didn’t. Sundar Pichai told the audience “give us until next month” — meaning June 2026, no committed date. What did ship is Gemini 3.5 Flash, and its benchmarks are the most useful data we have for forecasting Pro: Flash already beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), and Finance Agent v2 (57.9% vs 43.0%). If Pro extends the same gap over Flash, Google is shipping a coding-and-agents flagship in June that will force a real rethink against Claude Opus 4.7 and GPT-5.5. This piece is the realistic read on dates, pricing, capabilities, and how to prepare your code.
What Google actually announced
Gemini 3.5 Pro was named, demoed internally, and pushed. Sundar Pichai’s exact phrasing on the I/O keynote: “We’re also hard at work on 3.5 Pro. It’s already being used internally, and we look forward to rolling it out next month.” That’s the entire official statement. No spec sheet, no benchmark card, no API preview, no pricing tier.
The delay drew audible groans from the live audience — Business Insider’s reporter on the floor caught it — because everything else in the keynote (Spark, Antigravity 2, Search AI Mode) was framed around the Pro tier that wasn’t there. (Let’s Data Science)
What we got instead was the Gemini 3.5 Flash launch — $1.50/M input, $9.00/M output, 1M context window, 4x output token throughput vs comparable frontier models, GA day-of on Gemini API, AI Studio, Vertex, Antigravity 2, and the Gemini app. Flash is the working artifact. Pro is the artifact we have to reason about from indirect evidence.
When “next month” probably means
Google’s I/O timing tells you the window even if the date is open. The I/O keynote was May 19, 2026. “Next month” gives a range of June 1 to June 30. Two priors narrow it:
- 3.5 is reversing the launch order — but Pichai has capped the wait. Historically Pro shipped first: Gemini 3 Pro on November 18, 2025 with 3 Flash following on December 17; Gemini 3.1 Pro on February 19, 2026 with the 3.1 Flash family rolling out afterward. With 3.5 Flash leading at I/O, there is no clean prior for a Flash → Pro gap. What we do have is Pichai’s “next month” commitment from the keynote, which caps the wait at June 30.
- Google’s quarterly cadence. Pro tiers historically ship before the end of a fiscal quarter when announced, partly for board optics. June 30 is the Q2 end. Expect a drop in the last full week of June — best guess June 22-26 — unless safety or serving capacity slips it.
What could push it out: additional Frontier Safety Framework evaluation (Google has telegraphed this process for every 3.x flagship), TPU serving capacity if Spark and the new agent platform are eating capacity, or a benchmark embargo with a paper drop. None of these would push past July.
What Flash already tells us about Pro
This is the actual analytical work, and it’s the only honest way to forecast a model that hasn’t shipped. Three things Flash makes legible.
1. The generation jump is real, not incremental
Gemini 3.5 Flash beats Gemini 3.1 Pro on the benchmarks Google itself prioritized. From Google’s published Gemini 3.5 Flash model card:
| Benchmark | What it measures | 3.5 Flash | 3.1 Pro | Delta |
|---|---|---|---|---|
| Terminal-Bench 2.1 | Real terminal coding tasks | 76.2% | 70.3% | +5.9 |
| MCP Atlas | Scaled tool-use reliability | 83.6% | 78.2% | +5.4 |
| Finance Agent v2 | Multi-step financial workflows | 57.9% | 43.0% | +14.9 |
| GDPval-AA (Elo) | Economic-value task suite | 1656 | 1314 | +342 |
| CharXiv Reasoning | Chart/figure understanding | 84.2% | 83.3% | +0.9 |
A Flash beating a Pro on coding-and-agentic work has not happened before in this family. The implication: the 3.5 generation isn’t just a quality bump, it’s a re-architecture for agentic loops specifically. Pro should extend the trend.
A useful crude forecast: if 3.1 Pro → 3.5 Pro mirrors the gap between 3.1 Flash and 3.5 Flash (roughly +6-15 points on agentic benchmarks), Gemini 3.5 Pro lands at ~82-85% Terminal-Bench, ~88-90% MCP Atlas, and well into 70+ on Finance Agent v2. That’s flagship territory against Claude Opus 4.7 and GPT-5.5, which trade leadership across this benchmark set depending on the task. Compare with the current flagship benchmark showdown — the picture changes meaningfully if Pro hits the upper end of this range.
2. Pricing has a floor and a ceiling
Flash launched at $1.50/M input, $9.00/M output. Gemini 3.1 Pro sits at $2.00/M input, $12.00/M output. That’s an unusual layout — Flash is now 25% cheaper than the prior-gen Pro while being benchmark-superior on coding. The new Pro has to be priced higher than 3.1 Pro to make commercial sense, but it can’t go too high without making the Flash + Pro combo less attractive than bundling DeepSeek V4 Pro and Gemini Flash through a single endpoint for cost-shaping.
Realistic price band for 3.5 Pro:
- Floor: $2.50/M input, $15/M output (a 25% premium over 3.1 Pro, mirroring the 3.5 Flash premium over 3.1 Flash)
- Ceiling: $3.50/M input, $20/M output (the upper bound before it starts overlapping with Anthropic and OpenAI flagship pricing and losing its differentiation)
- Most likely: $3.00/M input, $18/M output
For comparison: GPT-5.5 is roughly $5/$30, Claude Opus 4.7 is $5/$25. Even at the high end of the band Gemini 3.5 Pro stays meaningfully cheaper for output-heavy workloads — which is most agentic loops.
3. The 1M context window is staying
Gemini 3.5 Flash kept the 1,048,576 input / 65,536 output token window. No evidence Google is reducing context in the 3.5 generation. Pro almost certainly keeps or expands this — long context remains a Gemini selling point alongside Claude Opus 4.7 (200k default, 1M on the dedicated long-context variant) and GPT-5.5 (1M via the standard API, 400k inside Codex), and Google’s Project Mariner and Antigravity 2 product story both depend on it. If anything, expect 3.5 Pro to push to 2M context as a marketing point.
The remaining open question is recall quality at 128k+. 3.5 Flash actually regressed on MRCR v2 at 128k (77.3% vs 3.1 Pro’s 84.9%) — a six-point drop. That regression is the single biggest open question about 3.5 Pro. If Pro inherits it, the “1M context” claim becomes weaker in practice for real long-document retrieval and you’d still want 3.1 Pro for those workloads.
What Pro probably won’t be
A grounded forecast also needs to say what isn’t changing.
- It’s not getting a step-change in modalities. 3.5 Flash already takes text, images, video, audio, and PDFs in and emits text. Pro almost certainly matches but doesn’t extend this set on day one. Native image-out lives in Nano Banana / Imagen, not the main Gemini chat tier.
- It’s not going to drop below Flash’s prices. Google needs a Pro tier with margins. The whole point of having Flash + Pro is price discrimination across workload sensitivity.
- It’s not shipping with a public model card before launch day. Google’s pattern has been simultaneous model card + GA. Don’t expect benchmark leaks; expect a Tuesday morning launch with the deck ready.
- The naming probably stays “Gemini 3.5 Pro.” There’s been no signal pointing toward a rename, and Google has been more naming-disciplined than OpenAI in the 3.x generation.
How to prepare today
If you’re shipping anything that will rely on Gemini 3.5 Pro the moment it lands, the practical prep is:
1. Build against 3.5 Flash now. The API surface and tool-use shape are the same between Flash and Pro tiers in this generation. Through ofox the model id is google/gemini-3.5-flash. When Pro ships, swap to google/gemini-3.5-pro — no SDK or schema rewrite. The ofox OpenAI-compatible endpoint handles request translation either way.
# Today
client.chat.completions.create(
model="google/gemini-3.5-flash",
messages=[...],
)
# Day Pro ships, change one string
client.chat.completions.create(
model="google/gemini-3.5-pro",
messages=[...],
)
2. Use Flash as the floor in routing. A common pattern: route trivial work to Flash, escalate to a flagship (Opus 4.7, GPT-5.5, or soon Gemini 3.5 Pro) only when Flash returns low confidence. See the Claude Code hybrid routing pattern for the production-grade version of this. When 3.5 Pro lands, you swap which flagship sits behind the escalation gate — your routing logic doesn’t change.
3. Don’t pre-commit to pricing. The realistic range above ($2.50-$3.50 input, $15-$20 output) is informed speculation. If you’re writing a cost projection for finance, plug in both endpoints of the band and ship two scenarios.
4. Watch for the model card on Google’s blog. That’s how every 3.x model has launched — a single blog post on blog.google with the full benchmark grid. No staged rollouts, no Twitter teasers from product managers. Subscribe to the Gemini API changelog if you want push notification of the moment Pro becomes addressable.
The bigger picture for model selection
By late June 2026 three things are happening at once:
- Claude Opus 4.7 still holds reasoning-heavy benchmarks, especially long-horizon agent runs.
- GPT-5.5 owns raw multimodal reasoning and the deepest ecosystem of tooling.
- Gemini 3.5 Pro — if Flash’s gains carry forward — undercuts both on price and pushes them on Terminal-Bench-style agentic coding.
Picking the right model gets harder before it gets easier. The LLM API selection decision matrix and the leaderboard view will both need a rewrite the week Pro ships. For the canonical three-way framing of how to choose, see the Claude vs GPT vs Gemini comparison guide — that’s the piece that’ll get the biggest update.
If you’re choosing today and the work is coding-and-agents leaning, Gemini 3.5 Flash already beats last-generation Pro at 25% lower cost. There’s no reason to wait. If the work is reasoning-heavy or you care about long-context recall quality, stay on Gemini 3.1 Pro or Claude Opus 4.7 for now and re-evaluate when Pro lands. The thing not to do is sit on your hands assuming the new shiny thing will solve a problem you could solve this week.
Six weeks of public Flash benchmarks have arrived before the Pro model card exists — and they say the cost-quality frontier in coding is about to shift on a Tuesday in late June.
Sources and citations
- Sundar Pichai’s “next month” quote and the I/O 2026 keynote framing: Google’s official blog (May 19, 2026)
- Gemini 3.5 Flash benchmark grid: DeepMind model card
- Flash pricing and context window: Google AI for Developers — Gemini API changelog
- The delay framing and audience reaction: Let’s Data Science I/O recap
- Comparative pricing for Claude Opus 4.7 and GPT-5.5: confirmed against ofox model catalog at the time of writing


