Summarize chat trigger (#1890)
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> Adds a context-limit banner with one-click “summarize into new chat,” refactors token counting with react-query, and persists per-message max token usage.
>
> - **Chat UX**
>   - **Context limit banner** (`ContextLimitBanner.tsx`, `MessagesList.tsx`): shows when within 40k tokens of `contextWindow`, with a tooltip and an action to summarize into a new chat.
>   - **Summarize flow**: extracted to `useSummarizeInNewChat` and used in the chat input and the banner; new summarize system prompt (`summarize_chat_system_prompt.ts`).
> - **Token usage & counting**
>   - **Persist max tokens used per assistant message**: DB migration (`messages.max_tokens_used`), schema updates, and saving usage during streaming (`chat_stream_handlers.ts`).
>   - **Token counting refactor** (`useCountTokens.ts`): react-query with debounce; returns `estimatedTotalTokens` and `actualMaxTokens`; invalidated on model change and stream end; `TokenBar` updated.
>   - **Surfacing usage**: tooltip on the latest assistant message shows total tokens (`ChatMessage.tsx`).
> - **Model/config tweaks**
>   - Set the `auto` model’s `contextWindow` to `200_000` (`language_model_constants.ts`).
>   - Improve the chat auto-scroll dependency (`ChatPanel.tsx`).
>   - Fix the app path validation regex (`app_handlers.ts`).
> - **Testing & dev server**
>   - E2E tests for the banner and summarize flow (`e2e-tests/context_limit_banner.spec.ts` plus fixtures/snapshot).
>   - Fake LLM server streams usage to simulate high-token scenarios (`testing/fake-llm-server/*`).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 2ae16a14d50699cc772407426419192c2fdf2ec3. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated description by cubic. -->

---

## Summary by cubic

Adds a “Summarize into new chat” trigger and a context limit banner to help keep conversations focused and avoid hitting model limits. Also tracks and surfaces actual token usage per assistant message, with a token-counting refactor for reliability.

- **New Features**
  - Summarize into a new chat from the input or the banner; improved system prompt with a clear output format.
  - Context limit banner shows when within 40k tokens of the model’s context window and offers a one-click summarize action.
  - Tooltip on the latest assistant message shows total tokens used.
- **Refactors**
  - Token counting now uses react-query and returns `estimatedTotalTokens` and `actualMaxTokens`; counts are invalidated on model change and when streaming settles.
  - Persist per-message `max_tokens_used` in the messages table; the backend aggregates model usage during streaming and saves it.
  - Adjusted the default “Auto” model `contextWindow` to 200k for more realistic limits.
  - Improved chat scrolling while streaming; fixed the app path validation regex.

<sup>Written for commit 2ae16a14d50699cc772407426419192c2fdf2ec3. Summary will update automatically on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
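The banner's 40k-token threshold described above can be sketched as a simple predicate. This is a hypothetical illustration, assuming the hook exposes an `estimatedTotalTokens` count; the function and constant names below are not taken from `ContextLimitBanner.tsx`:

```typescript
// Illustrative sketch only; names are hypothetical, not from the actual source.
const CONTEXT_LIMIT_WARNING_MARGIN = 40_000;

function shouldShowContextLimitBanner(
  estimatedTotalTokens: number,
  contextWindow: number,
): boolean {
  // Show the banner once the conversation is within 40k tokens of the limit.
  return contextWindow - estimatedTotalTokens <= CONTEXT_LIMIT_WARNING_MARGIN;
}

// With the "auto" model's 200_000-token window:
shouldShowContextLimitBanner(150_000, 200_000); // false: 50k tokens remain
shouldShowContextLimitBanner(165_000, 200_000); // true: only 35k remain
```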
```diff
@@ -371,6 +371,15 @@ export default Index;
       return;
     }

+    // Check for high token usage marker to simulate near context limit
+    const highTokensMatch =
+      typeof lastMessage?.content === "string" &&
+      !lastMessage?.content.startsWith("Summarize the following chat:") &&
+      lastMessage?.content?.match?.(/\[high-tokens=(\d+)\]/);
+    const highTokensValue = highTokensMatch
+      ? parseInt(highTokensMatch[1], 10)
+      : null;
+
     // Split the message into characters to simulate streaming
     const messageChars = messageContent.split("");

@@ -388,8 +397,15 @@ export default Index;
         res.write(createStreamChunk(batch));
         index += batchSize;
       } else {
-        // Send the final chunk
-        res.write(createStreamChunk("", "assistant", true));
+        // Send the final chunk with optional usage info for high token simulation
+        const usage = highTokensValue
+          ? {
+              prompt_tokens: highTokensValue - 100,
+              completion_tokens: 100,
+              total_tokens: highTokensValue,
+            }
+          : undefined;
+        res.write(createStreamChunk("", "assistant", true, usage));
         clearInterval(interval);
         res.end();
       }
```