Summarize chat trigger (#1890)

> [!NOTE]
> Adds a context-limit banner with one-click “summarize into new chat,” refactors token counting with react-query, and persists per-message max token usage.
>
> - **Chat UX**
>   - **Context limit banner** (`ContextLimitBanner.tsx`, `MessagesList.tsx`): shows when within 40k tokens of `contextWindow`, with a tooltip and an action to summarize into a new chat.
>   - **Summarize flow**: extracted to `useSummarizeInNewChat` and used in the chat input and the banner; new summarize system prompt (`summarize_chat_system_prompt.ts`).
> - **Token usage & counting**
>   - **Persist max tokens used per assistant message**: DB migration (`messages.max_tokens_used`), schema updates, and saving usage during streaming (`chat_stream_handlers.ts`).
>   - **Token counting refactor** (`useCountTokens.ts`): react-query with debounce; returns `estimatedTotalTokens` and `actualMaxTokens`; invalidated on model change and stream end; `TokenBar` updated.
>   - **Surfacing usage**: tooltip on the latest assistant message shows total tokens (`ChatMessage.tsx`).
> - **Model/config tweaks**
>   - Set the `auto` model `contextWindow` to `200_000` (`language_model_constants.ts`).
>   - Improve the chat auto-scroll dependency (`ChatPanel.tsx`).
>   - Fix the app path validation regex (`app_handlers.ts`).
> - **Testing & dev server**
>   - E2E tests for the banner and summarize flow (`e2e-tests/context_limit_banner.spec.ts` + fixtures/snapshot).
>   - Fake LLM server streams usage to simulate high-token scenarios (`testing/fake-llm-server/*`).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 2ae16a14d50699cc772407426419192c2fdf2ec3. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---

## Summary by cubic

Adds a “Summarize into new chat” trigger and a context limit banner to help keep conversations focused and avoid hitting model limits. Also tracks and surfaces actual token usage per assistant message, with a token counting refactor for reliability.

- **New Features**
  - Summarize into new chat from the input or the banner; improved system prompt with a clear output format.
  - Context limit banner shows when within 40k tokens of the model’s context window and offers a one-click summarize action.
  - Tooltip on the latest assistant message shows total tokens used.

- **Refactors**
  - Token counting now uses react-query and returns `estimatedTotalTokens` and `actualMaxTokens`; counts are invalidated on model change and when streaming settles.
  - Persist per-message `max_tokens_used` in the `messages` table; the backend aggregates model usage during streaming and saves it.
  - Adjusted the default “Auto” model `contextWindow` to 200k for more realistic limits.
  - Improved chat scrolling while streaming; fixed the app path validation regex.

<sup>Written for commit 2ae16a14d50699cc772407426419192c2fdf2ec3. Summary will update automatically on new commits.</sup>
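The banner rule described above (show the banner once fewer than 40k tokens remain in `contextWindow`) can be sketched as a small predicate. This is a minimal sketch under stated assumptions, not the code in `ContextLimitBanner.tsx`. The names `CONTEXT_LIMIT_MARGIN`, `tokensInUse`, and `shouldShowContextLimitBanner` are hypothetical. Preferring `actualMaxTokens` over `estimatedTotalTokens` when both exist is a guess, and treating exactly 40k remaining as "not yet close" is also an assumption.

```typescript
// Hypothetical constant: the PR describes a 40k-token margin before the context window.
const CONTEXT_LIMIT_MARGIN = 40_000;

// Field names follow the PR description; the shape itself is an assumption.
interface TokenCounts {
  // Client-side estimate from the token-counting query.
  estimatedTotalTokens: number;
  // Provider-reported max usage for the latest assistant message, if any.
  actualMaxTokens?: number | null;
}

// Assumption: prefer the provider-reported usage when present, else fall back to the estimate.
function tokensInUse(counts: TokenCounts): number {
  return counts.actualMaxTokens ?? counts.estimatedTotalTokens;
}

// Returns true when the remaining room in the context window is below the margin.
function shouldShowContextLimitBanner(
  counts: TokenCounts,
  contextWindow: number,
): boolean {
  const remaining = contextWindow - tokensInUse(counts);
  // Assumption: strictly fewer than 40k remaining tokens triggers the banner.
  return remaining < CONTEXT_LIMIT_MARGIN;
}
```

With a 128k window, 110k tokens in use leaves 18k, so the predicate returns `true`. With 50k in use, 78k remain, so it returns `false`. These match the two scenarios exercised by `e2e-tests/context_limit_banner.spec.ts`.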
2025-12-04 23:00:28 -08:00
parent 90c5805b57
commit 6235f7bb9d
24 changed files with 1185 additions and 91 deletions
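Persisting `messages.max_tokens_used` means the usage reports received while a response streams must be reduced to a single number per assistant message. The following is a minimal sketch, not the logic in `chat_stream_handlers.ts`. It assumes a hypothetical `UsageReport` shape with `inputTokens`/`outputTokens` and a hypothetical `MaxTokensTracker` class, and it keeps the largest input-plus-output total seen.

```typescript
// Hypothetical shape of a usage report emitted during streaming (e.g. once per model step).
interface UsageReport {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical tracker: keeps the largest input + output total seen for one assistant message.
class MaxTokensTracker {
  private max: number | null = null;

  // Record one usage report; only a new maximum replaces the stored value.
  record(usage: UsageReport): void {
    const total = usage.inputTokens + usage.outputTokens;
    if (this.max === null || total > this.max) {
      this.max = total;
    }
  }

  // Value to persist in messages.max_tokens_used; null if no usage was reported.
  get maxTokensUsed(): number | null {
    return this.max;
  }
}
```

Taking the maximum rather than the sum is presumably what the column name refers to. If a multi-step response re-sends the conversation on each step, the largest single request is the one closest to the context window, whereas a sum would count the shared history repeatedly.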
--- a/e2e-tests/context_limit_banner.spec.ts
+++ b/e2e-tests/context_limit_banner.spec.ts
@@ -0,0 +1,46 @@
+import { test, Timeout } from "./helpers/test_helper";
+import { expect } from "@playwright/test";
+
+test("context limit banner appears and summarize works", async ({ po }) => {
+  await po.setUp();
+
+  // Send a message that triggers high token usage (110k tokens)
+  // With a default context window of 128k, this leaves only 18k tokens remaining
+  // which is below the 40k threshold, so the banner should appear
+  await po.sendPrompt("tc=context-limit-response [high-tokens=110000]");
+
+  // Verify the context limit banner appears
+  const contextLimitBanner = po.page.getByTestId("context-limit-banner");
+  await expect(contextLimitBanner).toBeVisible({ timeout: Timeout.MEDIUM });
+
+  // Verify banner text
+  await expect(contextLimitBanner).toContainText(
+    "You're close to the context limit for this chat.",
+  );
+
+  // Click the summarize button
+  await contextLimitBanner
+    .getByRole("button", { name: "Summarize into new chat" })
+    .click();
+
+  // Wait for the new chat to load and message to complete
+  await po.waitForChatCompletion();
+
+  // Snapshot the messages in the new chat
+  await po.snapshotMessages();
+});
+
+test("context limit banner does not appear when within limit", async ({
+  po,
+}) => {
+  await po.setUp();
+
+  // Send a message with low token usage (50k tokens)
+  // With a 128k context window, this leaves 78k tokens remaining
+  // which is above the 40k threshold, so the banner should NOT appear
+  await po.sendPrompt("tc=context-limit-response [high-tokens=50000]");
+
+  // Verify the context limit banner does NOT appear
+  const contextLimitBanner = po.page.getByTestId("context-limit-banner");
+  await expect(contextLimitBanner).not.toBeVisible();
+});
--- a/e2e-tests/fixtures/context-limit-response.md
+++ b/e2e-tests/fixtures/context-limit-response.md
@@ -0,0 +1,4 @@
+Here is a simple response to test the context limit banner functionality.
+
+This message simulates being close to the model's context window limit.
+
--- a/e2e-tests/snapshots/context_limit_banner.spec.ts_context-limit-banner-appears-and-summarize-works-1.aria.yml
+++ b/e2e-tests/snapshots/context_limit_banner.spec.ts_context-limit-banner-appears-and-summarize-works-1.aria.yml
@@ -0,0 +1,14 @@
+- paragraph: Summarize from chat-id=1
+- img
+- text: file1.txt
+- button "Edit":
+  - img
+- img
+- text: file1.txt
+- paragraph: More EOM
+- button:
+  - img
+- img
+- text: less than a minute ago
+- button "Retry":
+  - img