15 Commits

Author SHA1 Message Date
Will Chen
6235f7bb9d Summarize chat trigger (#1890)
> [!NOTE]
> Adds a context-limit banner with one-click “summarize into new chat,” refactors token counting with react-query, and persists per-message max token usage.
>
> - **Chat UX**
>   - **Context limit banner** (`ContextLimitBanner.tsx`, `MessagesList.tsx`): shows when within 40k tokens of `contextWindow`, with a tooltip and an action to summarize into a new chat.
>   - **Summarize flow**: extracted to `useSummarizeInNewChat` and used in the chat input and the banner; new summarize system prompt (`summarize_chat_system_prompt.ts`).
> - **Token usage & counting**
>   - **Persist max tokens used per assistant message**: DB migration (`messages.max_tokens_used`), schema updates, and saving usage during streaming (`chat_stream_handlers.ts`).
>   - **Token counting refactor** (`useCountTokens.ts`): react-query with debounce; returns `estimatedTotalTokens` and `actualMaxTokens`; invalidated on model change and stream end; `TokenBar` updated.
>   - **Surfacing usage**: a tooltip on the latest assistant message shows total tokens (`ChatMessage.tsx`).
> - **Model/config tweaks**
>   - Set the `auto` model's `contextWindow` to `200_000` (`language_model_constants.ts`).
>   - Improve the chat auto-scroll dependency (`ChatPanel.tsx`).
>   - Fix the app path validation regex (`app_handlers.ts`).
> - **Testing & dev server**
>   - E2E tests for the banner and summarize flow (`e2e-tests/context_limit_banner.spec.ts` plus fixtures/snapshot).
>   - Fake LLM server streams usage to simulate high-token scenarios (`testing/fake-llm-server/*`); see the usage-chunk sketch after this note.
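
As a rough illustration (not the fake server's actual code), an OpenAI-style streaming chunk that reports inflated usage might look like this:

```ts
// Hypothetical chunk shape for a fake LLM server: empty choices plus an
// oversized usage block, so e2e tests can push a chat near the context window.
const usageChunk = {
  id: "chatcmpl-fake",
  object: "chat.completion.chunk",
  choices: [],
  usage: {
    prompt_tokens: 190_000,
    completion_tokens: 500,
    total_tokens: 190_500,
  },
};
```
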
---
## Summary by cubic
Adds a “Summarize into new chat” trigger and a context limit banner to
help keep conversations focused and avoid hitting model limits. Also
tracks and surfaces actual token usage per assistant message, with a
token counting refactor for reliability.

- **New Features**
  - Summarize into new chat from the input or the banner; improved system prompt with a clear output format.
  - Context limit banner shows when within 40k tokens of the model’s context window and offers a one-click summarize action (see the sketch after this list).
  - Tooltip on the latest assistant message shows total tokens used.
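
A minimal sketch of the banner's trigger condition, assuming the token estimate comes from the counting hook; names are illustrative, not Dyad's actual code:

```ts
// Hypothetical check: show the banner once the estimated conversation size is
// within 40k tokens of the model's context window.
const CONTEXT_LIMIT_BUFFER = 40_000;

export function shouldShowContextLimitBanner(
  estimatedTotalTokens: number,
  contextWindow: number,
): boolean {
  return contextWindow - estimatedTotalTokens <= CONTEXT_LIMIT_BUFFER;
}
```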

- **Refactors**
  - Token counting now uses react-query and returns `estimatedTotalTokens` and `actualMaxTokens`; counts are invalidated on model change and when streaming settles (see the hook sketch below).
  - Persist per-message `max_tokens_used` in the messages table; the backend aggregates model usage during streaming and saves it (see the schema sketch below).
  - Adjusted the default “Auto” model `contextWindow` to 200k for more realistic limits.
  - Improved chat scrolling while streaming; fixed the app path validation regex.
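
A hedged sketch of debounced token counting with react-query; `countTokens` is a hypothetical stand-in for the real counting call, and this is not the actual `useCountTokens.ts`:

```ts
import { useEffect, useState } from "react";
import { useQuery } from "@tanstack/react-query";

// Hypothetical counting call (e.g. an IPC round trip to a tokenizer).
declare function countTokens(input: string): Promise<number>;

function useDebounced<T>(value: T, delayMs: number): T {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const id = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(id);
  }, [value, delayMs]);
  return debounced;
}

export function useCountTokensSketch(input: string, modelId: string) {
  const debouncedInput = useDebounced(input, 300);
  // Keying on modelId re-runs the count on model change; callers can also
  // invalidate the "count-tokens" key when a stream settles.
  const { data } = useQuery({
    queryKey: ["count-tokens", modelId, debouncedInput],
    queryFn: () => countTokens(debouncedInput),
  });
  return { estimatedTotalTokens: data ?? 0 };
}
```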

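A hedged sketch of the new column, assuming a drizzle-orm SQLite schema; the real schema and migration may differ:

```ts
import { integer, sqliteTable, text } from "drizzle-orm/sqlite-core";

export const messages = sqliteTable("messages", {
  id: integer("id").primaryKey(),
  role: text("role").notNull(),
  content: text("content").notNull(),
  // Max tokens observed for this assistant message, saved while streaming.
  maxTokensUsed: integer("max_tokens_used"),
});
```
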
2025-12-04 23:00:28 -08:00
Tanner-Maasen
2ffbbbca8f Add Azure OpenAI Custom Model Integration (#1001)
Fixes #710 

This PR implements Azure OpenAI integration for Dyad, enabling users to use Azure OpenAI models through environment variable configuration. It adds Azure as a supported provider, fully integrated into the existing language model architecture, including support for GPT-5 models. Key features include environment-based configuration using `AZURE_API_KEY` and `AZURE_RESOURCE_NAME`, specialized UI components that provide clear setup instructions and status indicators, and seamless integration with Dyad's existing provider system. The Azure provider uses the @ai-sdk/azure package (v1.3.25) for compatibility with the current TypeScript language model interfaces.

The implementation includes robust error handling for missing configuration, comprehensive test coverage with 9 new unit tests covering critical paths such as model client creation and error scenarios, and an E2E test for the Azure-specific settings UI.
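
A minimal sketch of environment-based client creation using `createAzure` from @ai-sdk/azure; the helper name and error handling are illustrative, not Dyad's actual code:

```ts
import { createAzure } from "@ai-sdk/azure";

// Hypothetical helper: build a model client from the env vars this PR uses.
function createAzureModelClient(deploymentName: string) {
  const apiKey = process.env.AZURE_API_KEY;
  const resourceName = process.env.AZURE_RESOURCE_NAME;
  if (!apiKey || !resourceName) {
    throw new Error(
      "Azure OpenAI requires AZURE_API_KEY and AZURE_RESOURCE_NAME to be set.",
    );
  }
  const azure = createAzure({ resourceName, apiKey });
  // The Azure deployment name (e.g. a GPT-5 deployment) selects the model.
  return azure(deploymentName);
}
```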

<img width="1510" height="908" alt="Screenshot 2025-08-18 at 9 14 32 PM"
src="https://github.com/user-attachments/assets/04aa99e1-1590-4bb0-86c9-a67b97bc7500"
/>

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Will Chen <willchen90@gmail.com>
2025-08-30 20:47:25 -07:00
Will Chen
d535db6251 Upgrade to AI sdk with codemod (#1000) 2025-08-18 22:21:27 -07:00
Will Chen
bd809a010d GitHub workflows (#428)
Fixes #348 
Fixes #274 
Fixes #149 

- Connect to existing repos
- Push to other branches on GitHub besides main
- Allow force push, with a confirmation dialog

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
2025-06-17 16:59:26 -07:00
Will Chen
30b5c0d0ef Replace thinking with native Gemini thinking summaries (#400)
This uses Gemini's native [thinking
summaries](https://cloud.google.com/vertex-ai/generative-ai/docs/thinking#thought-summaries)
which were recently added to the API.

Why? The grafted thinking would sometimes cause weird issues where the
model, especially Gemini 2.5 Flash, got confused and put dyad tags like
`<dyad-write>` inside the `<think>` tags.

This also improves the UX because you can see the native thoughts rather
than having the Gemini response load for a while without any feedback.

I also tried adding Anthropic extended thinking, but it requires temperature to be set to 1, which isn't ideal for Dyad's use case, where we need precise syntax following.
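
For illustration, a hedged sketch of requesting native thought summaries through the AI SDK's Google provider; option names and stream part shapes depend on the SDK version, and this is not the PR's actual code:

```ts
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

const result = streamText({
  model: google("gemini-2.5-flash"),
  prompt: "Build a todo app...",
  providerOptions: {
    // Assumed option shape: ask Gemini to include summarized thoughts.
    google: { thinkingConfig: { includeThoughts: true } },
  },
});

// Thought summaries arrive as separate stream parts, so the UI can show
// reasoning progress instead of an empty loading state.
for await (const part of result.fullStream) {
  if (part.type.startsWith("reasoning")) {
    console.log("thought part:", part);
  }
}
```
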
2025-06-16 17:29:32 -07:00
Will Chen
c227a08d11 Gateway e2e (#323) 2025-06-03 16:35:46 -07:00
Will Chen
fc1ebe9e8a e2e tests for engine (#322) 2025-06-03 16:11:16 -07:00
Will Chen
c0adf8d3f2 Attach image e2e tests (#301) 2025-06-01 00:44:19 -07:00
Will Chen
8a743ca4f5 LM studio e2e test (#297) 2025-05-31 23:04:28 -07:00
Will Chen
af7d6fa9f8 Create ollama e2e test (#296) 2025-05-31 22:01:48 -07:00
Will Chen
efb814ec95 Create tests: dumps message, "retry" (#281) 2025-05-31 21:15:41 -07:00
Will Chen
647fd0169e make it easy to write multiple e2e tests (#280) 2025-05-29 00:03:51 -07:00
Will Chen
509e044137 Boilerplate free tests (#277) 2025-05-28 22:55:54 -07:00
Will Chen
f4c7d614bd Escape dyad tags inside thinking blocks (#229) 2025-05-22 16:06:28 -07:00
Will Chen
069c221292 Implement saver mode (#154) 2025-05-13 15:34:41 -07:00