# Fake LLM Server
A simple server that mimics the OpenAI chat completions API (streaming and non-streaming) for testing purposes.
## Features
- Implements a basic version of the OpenAI chat completions API
- Supports both streaming and non-streaming responses
- Always responds with a "hello world" message
- Simulates a 429 rate limit error when the last message content is exactly `[429]`
- Configurable through environment variables
## Installation
```bash
npm install
```
## Usage
Start the server:
```bash
# Development mode
npm run dev

# Production mode
npm run build
npm start
```
### Example usage
```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'
```
The server will be available at http://localhost:3500 by default.
## API Endpoints
### POST /v1/chat/completions
This endpoint mimics OpenAI's chat completions API.
#### Request Format
```json
{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}
```
- Set `stream: true` to receive a streaming response
- Set `stream: false` or omit it for a regular JSON response
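For example, the same curl request from above works as a non-streaming call by setting `stream` to `false`:
```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":false}'
```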
#### Response
For non-streaming requests, you'll get a standard JSON response:
```json
{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
```
For streaming requests, you'll receive a series of server-sent events (SSE), each containing a chunk of the response.
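The exact chunk payload depends on this server's implementation, but an OpenAI-compatible stream would look roughly like this (a sketch, assuming the standard `chat.completion.chunk` format with a `data: [DONE]` terminator):
```
data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":"hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":"stop"}]}

data: [DONE]
```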
### Simulating Rate Limit Errors
To test how your application handles rate limiting, send a message with content exactly equal to `[429]`:
```json
{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}
```
This will return a 429 status code with the following response:
```json
{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
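You can trigger this from the command line with curl (`-i` prints the response status so the 429 is visible):
```bash
curl -i -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"[429]"}],"model":"any-model"}'
```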
## Configuration
The server listens on port 3500 by default. You can change this by setting the `PORT` environment variable, or by editing the `PORT` variable in the code.
## Use Case
This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.
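For example, a test could point the official `openai` Node SDK at the fake server by overriding its base URL (a minimal sketch; whether this server accepts the SDK's exact request shape is an assumption):
```typescript
import OpenAI from "openai";

// Point the SDK at the fake server instead of api.openai.com.
// The SDK requires an API key, but this server is assumed to ignore it.
const client = new OpenAI({
  baseURL: "http://localhost:3500/v1",
  apiKey: "not-needed",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "any-model",
    messages: [{ role: "user", content: "Say something" }],
  });
  console.log(completion.choices[0].message.content); // "hello world"
}

main();
```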