This uses Gemini's native [thinking
summaries](https://cloud.google.com/vertex-ai/generative-ai/docs/thinking#thought-summaries)
which were recently added to the API.
Why? The grafted thinking sometimes confused the model, especially Gemini
2.5 Flash, causing it to put Dyad tags like `<dyad-write>` inside the
`<think>` tags.
This also improves the UX: you can see the native thoughts instead of
watching the Gemini response load for a while without any feedback.
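For reference, here is a minimal sketch of how thought summaries can be requested and streamed with the standalone `@google/genai` SDK (an illustration, not Dyad's actual integration code; the model name and console output are placeholder assumptions):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function streamWithThoughtSummaries(prompt: string) {
  const stream = await ai.models.generateContentStream({
    model: "gemini-2.5-flash", // illustrative model choice
    contents: prompt,
    config: {
      // Ask the API to return thought summaries alongside the answer.
      thinkingConfig: { includeThoughts: true },
    },
  });

  for await (const chunk of stream) {
    for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {
      if (part.thought) {
        // Thought-summary parts are flagged with `thought: true`,
        // so they can be rendered separately in the UI.
        process.stdout.write(`[thinking] ${part.text}`);
      } else if (part.text) {
        process.stdout.write(part.text);
      }
    }
  }
}
```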
I tried adding Anthropic extended thinking, but it requires the
temperature to be set to 1, which isn't ideal for Dyad's use case, where
we need precise syntax following.
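For context, this is roughly what enabling extended thinking looks like with `@anthropic-ai/sdk` (a sketch for illustration only; the model ID and token budgets are placeholder assumptions):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const message = await anthropic.messages.create({
  model: "claude-3-7-sonnet-20250219", // illustrative model with extended thinking
  max_tokens: 16000,
  // Extended thinking is opted into with an explicit token budget...
  thinking: { type: "enabled", budget_tokens: 8000 },
  // ...and while it is enabled the API only accepts temperature 1,
  // which is the constraint that makes it a poor fit for Dyad's
  // precise syntax-following use case.
  temperature: 1,
  messages: [{ role: "user", content: "..." }],
});
```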
This code was quite complex and hairy and produced very opaque errors
(for both free and Pro users). There's also not much benefit to budget
saver anymore because Google removed the 2.5 Pro free quota a while ago
(after graduating the model from experimental to preview). Dyad Pro users
can still use the 2.5 Flash free quota by disabling Dyad Pro via the
Dyad Pro button at the top.
* Do not make the API key input a password field - it hurts usability
* Support LLM gateway (and add the GPT-4.1 mini model)
* Show the Dyad Pro button
* Fix Dyad Pro detection to use auto (not dyad)
* Fix the description of GPT-4.1 mini