After briefly detailing it last week, Google is rolling out Gemini 2.5 Flash in preview today. A “thinking budget” lets developers control how much reasoning occurs depending on the prompt and use case.

All models in the Gemini 2.5 family are reasoning models that work “through their thoughts before responding” for “enhanced performance and improved accuracy.” This is ideal for prompts that require multi-step reasoning, like solving math problems and analyzing research questions.

Instead of immediately generating an output, the model can perform a “thinking” process to better understand the query, break down complex tasks, and plan its response.

For developers

Gemini’s Flash models are known for their speed and lower cost. That’s not changing with 2.5 Flash, but Google is adding reasoning capabilities and letting developers “set thinking budgets to control cost vs quality.”


Key specifications for Gemini 2.5 Flash in preview (gemini-2.5-flash-preview-04-17):

Specifically, developers control the “number of tokens a model can generate while thinking” from 0 to 24,576 tokens. There’s a slider in Google AI Studio and Vertex AI, as well as an API parameter. Google’s graphs show how reasoning quality improves as the budget increases.

If the thinking budget is set to zero, this new model will match 2.0 Flash’s cost and latency.
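For a rough idea of what the API parameter could look like in practice, here is a minimal Python sketch that validates a budget against the 0 to 24,576 range and builds a request body. The helper name build_request_body is hypothetical, and the JSON field names (generationConfig, thinkingConfig, thinkingBudget) are an assumption about the request shape rather than something stated in this article, so check Google's API reference before relying on them.

```python
import json

# Range stated in the article: 0 to 24,576 thinking tokens.
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576

# Preview model ID as given in the article.
MODEL_ID = "gemini-2.5-flash-preview-04-17"


def build_request_body(prompt: str, thinking_budget: int) -> dict:
    """Build a JSON-serializable request body with an explicit thinking budget.

    Assumption: the field names below mirror the public REST shape.
    They are not confirmed by the article.
    """
    if not MIN_THINKING_BUDGET <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be between {MIN_THINKING_BUDGET} "
            f"and {MAX_THINKING_BUDGET}, got {thinking_budget}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # A budget of 0 turns thinking off, per the article.
    body = build_request_body("How many prime numbers are below 100?", 0)
    print(MODEL_ID)
    print(json.dumps(body, indent=2))
```

Rejecting out-of-range values instead of silently clamping them is a design choice in this sketch. It surfaces configuration mistakes early, though clamping would work just as well for a slider-style interface.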

If a budget isn’t specified, Gemini 2.5 Flash “automatically decides how much to think based on the perceived task complexity.” Google provides examples of minimal, medium, and high reasoning: 


Prompts with minimal reasoning:

Prompts with medium reasoning:

Prompts with high reasoning:


In the context of agents, another example is how quick summaries would involve a low thinking budget, while detailed analysis requires a higher one. 
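One way an agent might apply this is to map task types to budgets and leave the budget unset for anything else, so the model falls back to deciding on its own. The sketch below is illustrative only: the task labels, the specific budget values, the helper name thinking_config_for, and the thinkingBudget field name are all assumptions, not values published by Google.

```python
from typing import Optional

# Upper bound stated in the article.
MAX_THINKING_BUDGET = 24_576

# Hypothetical task labels and budget values, chosen for illustration only.
TASK_BUDGETS = {
    "quick_summary": 512,         # low budget: favor speed and cost
    "detailed_analysis": 16_384,  # higher budget: favor reasoning quality
}


def thinking_config_for(task: str) -> Optional[dict]:
    """Return a thinking config for a known task type.

    Returns None for unknown tasks, meaning no budget is specified and
    the model decides how much to think based on perceived complexity.
    """
    budget = TASK_BUDGETS.get(task)
    if budget is None:
        return None
    # Never exceed the documented maximum, even if the table is edited later.
    return {"thinkingBudget": min(budget, MAX_THINKING_BUDGET)}


if __name__ == "__main__":
    for task in ("quick_summary", "detailed_analysis", "open_ended_chat"):
        print(task, "->", thinking_config_for(task))
```

Returning None rather than a default number keeps the automatic behavior available for prompts the agent cannot classify.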

Gemini 2.5 Flash is available to preview for developers in Google AI Studio and Vertex AI. Google says it will “continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.”

Gemini app

2.5 Flash (experimental) is also coming to the Gemini app with the ability to automatically adjust how much reasoning occurs based on the prompt’s complexity. End users don’t get any sort of manual adjustment in the app.

At launch, the model supports Gemini app capabilities such as apps/Extensions and file upload. It will replace 2.0 Flash Thinking (experimental), which was last updated in March.
