Gemini 3.5 Flash Just Made Agentic Workflows Cheap
Google released Gemini 3.5 Flash at I/O 2026, and the headline is not the model. It is the price.
3.5 Flash beats Gemini 3.1 Pro on most coding and agentic benchmarks, runs roughly 4x faster, and costs $1.50 per million input tokens and $9.00 per million output tokens. That is Flash pricing on a model Google is positioning as frontier for agent work.
For anyone shipping AI products, this matters more than the demo where agents built a virtual city.
What actually shipped
Gemini 3.5 Flash is the first model in a new family Google is calling 3.5. Pro arrives next month. The Flash release is generally available across the Gemini app, AI Mode in Search, the Gemini API, Google AI Studio, Vertex AI, and Antigravity, which is Google's agent-first IDE.
The benchmark numbers Google self-reports against 3.1 Pro:
- Terminal-Bench 2.1: 76.2%
- MCP Atlas: 83.6%
- GDPval-AA: 1,656 Elo
- CharXiv Reasoning: 84.2%
3.5 Flash trails 3.1 Pro on raw knowledge benchmarks like Humanity's Last Exam and ARC-AGI-2. That tradeoff is the whole point. Google moved the model down the cost curve on workloads that look like real work, and accepted a small regression on benchmarks that look like exam questions.
Why this matters for AI products
Most agentic products fail one of two tests in production. Either the per-task cost is too high to make the unit economics work, or the latency is too high to make the experience feel responsive when you chain three or four tool calls.
3.5 Flash is the first frontier-tier model that makes both numbers smaller at the same time. Box reports a 19.6% improvement over 3 Flash on their enterprise eval set. Armadin reports a 72% reduction in token use on a multi-turn cyber benchmark. Those are the kinds of numbers that move a feature from "demo that works in a deck" to "feature we can leave on for paying customers."
The other shift is parallel subagents. 3.5 Flash is built to spawn multiple agents that operate at the same time on long-horizon tasks. Until recently, running five agents in parallel was a slide. Now it is a budget line item that is actually defensible.
What parallel subagents unlock
The shift from sequential to parallel agent design is the part most teams will underestimate. A sequential agent loop is easy to reason about. A parallel fan-out is harder to design but produces qualitatively different products.
Three patterns become viable on 3.5 Flash pricing. Research workflows that hit one source at a time can now hit ten in parallel and synthesize. Code modification across a large repo can split by file or module instead of crawling sequentially. Evaluation harnesses that grade output on multiple criteria can run those criteria as separate agents and merge scores in seconds.
None of these patterns are new. What is new is that they fit inside a startup budget. The public version of this shift is Google's demo where agents built a working operating system in six hours. The private version is dozens of teams quietly rewriting their agent loops this quarter.
What to do this week
If you are running an AI product or planning one:
- Reprice your current agentic workloads on 3.5 Flash. The new pricing changes which features survive.
- Look at the parts of your product where you skipped multi-agent designs because of cost. Some of those are now viable.
- If you are mid-build on an MVP, ask whether your model choice still holds. The frontier moved.
The bigger pattern across this release cycle is that agentic capability is no longer the bottleneck. Cost discipline and workflow design are. The teams that ship in the next two quarters will be the ones who treat model pricing as a product decision, not a procurement one.
