Summarize chat trigger (#1890)

<!-- CURSOR_SUMMARY -->
> [!NOTE]
> Adds a context-limit banner with one-click “summarize into new chat,”
refactors token counting with react-query, and persists per-message max
token usage.
> 
> - **Chat UX**
>   - **Context limit banner** (`ContextLimitBanner.tsx`,
`MessagesList.tsx`): shows when within 40k tokens of `contextWindow`,
with tooltip and action to summarize into a new chat.
>   - **Summarize flow**: extracted to `useSummarizeInNewChat` and used in
chat input and banner; new summarize system prompt
(`summarize_chat_system_prompt.ts`).
> - **Token usage & counting**
>   - **Persist max tokens used per assistant message**: DB migration
(`messages.max_tokens_used`), schema updates, and saving usage during
streaming (`chat_stream_handlers.ts`).
>   - **Token counting refactor** (`useCountTokens.ts`): react-query with
debounce; returns `estimatedTotalTokens` and `actualMaxTokens`;
invalidated on model change and stream end; `TokenBar` updated.
>   - **Surfacing usage**: tooltip on latest assistant message shows total
tokens (`ChatMessage.tsx`).
> - **Model/config tweaks**
>   - Set `auto` model `contextWindow` to `200_000`
(`language_model_constants.ts`).
>   - Improve chat auto-scroll dependency (`ChatPanel.tsx`).
>   - Fix app path validation regex (`app_handlers.ts`).
> - **Testing & dev server**
>   - E2E tests for banner and summarize
(`e2e-tests/context_limit_banner.spec.ts` + fixtures/snapshot).
>   - Fake LLM server streams usage to simulate high token scenarios
(`testing/fake-llm-server/*`).
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
2ae16a14d50699cc772407426419192c2fdf2ec3. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
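
The banner rule described above ("within 40k tokens of `contextWindow`") can be read as a simple remaining-budget check. The following is a minimal sketch of that reading, not the code in `ContextLimitBanner.tsx`: the names `CONTEXT_LIMIT_THRESHOLD`, `ContextUsage`, `remainingTokens`, and `shouldShowContextLimitBanner` are hypothetical, and treating exactly 40k remaining as "within" the limit is an assumption.

```typescript
// Hypothetical names throughout; only the 40k figure and the
// `contextWindow` concept come from the summary above.
const CONTEXT_LIMIT_THRESHOLD = 40_000;

interface ContextUsage {
  /** Tokens consumed by the conversation so far. */
  tokensUsed: number;
  /** Maximum tokens the selected model accepts. */
  contextWindow: number;
}

/** Tokens left before the context window is full (never negative). */
function remainingTokens({ tokensUsed, contextWindow }: ContextUsage): number {
  return Math.max(0, contextWindow - tokensUsed);
}

/**
 * True when the remaining budget is at or below the threshold.
 * Assumption: the boundary (exactly 40k remaining) counts as "within".
 */
function shouldShowContextLimitBanner(usage: ContextUsage): boolean {
  return remainingTokens(usage) <= CONTEXT_LIMIT_THRESHOLD;
}
```

With the figures used in the E2E test in this commit, 110k tokens against a 128k window leaves 18k remaining, so the check is true; 50k tokens leaves 78k, so it is false.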

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds a “Summarize into new chat” trigger and a context limit banner to
help keep conversations focused and avoid hitting model limits. Also
tracks and surfaces actual token usage per assistant message, with a
token counting refactor for reliability.

- **New Features**
  - Summarize into new chat from the input or banner; improved system
prompt with clear output format.
  - Context limit banner shows when within 40k tokens of the model’s
context window and offers a one-click summarize action.
  - Tooltip on the latest assistant message shows total tokens used.

- **Refactors**
  - Token counting now uses react-query and returns estimatedTotalTokens
and actualMaxTokens; counts are invalidated on model change and when
streaming settles.
  - Persist per-message max_tokens_used in the messages table; backend
aggregates model usage during streaming and saves it.
  - Adjusted default “Auto” model contextWindow to 200k for more realistic
limits.
  - Improved chat scrolling while streaming; fixed app path validation
regex.

<sup>Written for commit 2ae16a14d50699cc772407426419192c2fdf2ec3.
Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
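
The summaries say the backend aggregates model usage during streaming and stores a per-message `max_tokens_used`. How `chat_stream_handlers.ts` actually does this is not shown here; the sketch below illustrates one plausible reading, in which each streaming step reports a usage record and the stored value is the largest total seen. The `StepUsage` shape and the functions `totalFor` and `maxTokensUsed` are hypothetical.

```typescript
// Hypothetical shape of a usage record reported by one streaming step.
interface StepUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

/** Total for one step; falls back to input + output when no total is reported. */
function totalFor(usage: StepUsage): number {
  return usage.totalTokens ?? (usage.inputTokens ?? 0) + (usage.outputTokens ?? 0);
}

/**
 * Largest per-step total across a stream, or null when no usage was reported.
 * Assumption: "max tokens used" means the maximum over steps, not their sum.
 */
function maxTokensUsed(steps: StepUsage[]): number | null {
  if (steps.length === 0) return null;
  return steps.reduce((max, step) => Math.max(max, totalFor(step)), 0);
}
```

Returning `null` for an empty stream keeps "no usage reported" distinguishable from a genuine zero, which suits a nullable database column; whether the real column is nullable is also an assumption.
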
This commit is contained in:
Will Chen
2025-12-04 23:00:28 -08:00
committed by GitHub
parent 90c5805b57
commit 6235f7bb9d
24 changed files with 1185 additions and 91 deletions

@@ -0,0 +1,46 @@
import { test, Timeout } from "./helpers/test_helper";
import { expect } from "@playwright/test";

test("context limit banner appears and summarize works", async ({ po }) => {
  await po.setUp();

  // Send a message that triggers high token usage (110k tokens)
  // With a default context window of 128k, this leaves only 18k tokens remaining
  // which is below the 40k threshold to show the banner
  await po.sendPrompt("tc=context-limit-response [high-tokens=110000]");

  // Verify the context limit banner appears
  const contextLimitBanner = po.page.getByTestId("context-limit-banner");
  await expect(contextLimitBanner).toBeVisible({ timeout: Timeout.MEDIUM });

  // Verify banner text
  await expect(contextLimitBanner).toContainText(
    "You're close to the context limit for this chat.",
  );

  // Click the summarize button
  await contextLimitBanner
    .getByRole("button", { name: "Summarize into new chat" })
    .click();

  // Wait for the new chat to load and message to complete
  await po.waitForChatCompletion();

  // Snapshot the messages in the new chat
  await po.snapshotMessages();
});

test("context limit banner does not appear when within limit", async ({
  po,
}) => {
  await po.setUp();

  // Send a message with low token usage (50k tokens)
  // With a 128k context window, this leaves 78k tokens remaining
  // which is above the 40k threshold - banner should NOT appear
  await po.sendPrompt("tc=context-limit-response [high-tokens=50000]");

  // Verify the context limit banner does NOT appear
  const contextLimitBanner = po.page.getByTestId("context-limit-banner");
  await expect(contextLimitBanner).not.toBeVisible();
});

@@ -0,0 +1,4 @@
Here is a simple response to test the context limit banner functionality.
This message simulates being close to the model's context window limit.

@@ -0,0 +1,14 @@
- paragraph: Summarize from chat-id=1
- img
- text: file1.txt
- button "Edit":
- img
- img
- text: file1.txt
- paragraph: More EOM
- button:
- img
- img
- text: less than a minute ago
- button "Retry":
- img