Implement saver mode (#154)
testing/fake-llm-server/README.md (new file, 116 lines)
@@ -0,0 +1,116 @@
# Fake LLM Server

A simple server that mimics the OpenAI streaming chat completions API for testing purposes.

## Features

- Implements a basic version of the OpenAI chat completions API
- Supports both streaming and non-streaming responses
- Always responds with a "hello world" message
- Simulates a 429 rate limit error when the last message is "[429]"
- Configurable through environment variables

## Installation

```bash
npm install
```

## Usage

Start the server:

```bash
# Development mode
npm run dev

# Production mode
npm run build
npm start
```

### Example usage

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'
```

The server will be available at http://localhost:3500 by default.
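
For comparison, a non-streaming request to the same endpoint just flips the `stream` flag (or omits it) and receives the plain JSON body shown in the Response section below:

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":false}'
```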

## API Endpoints

### POST /v1/chat/completions

This endpoint mimics OpenAI's chat completions API.

#### Request Format

```json
{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}
```

- Set `stream: true` to receive a streaming response
- Set `stream: false` or omit it for a regular JSON response

#### Response

For non-streaming requests, you'll get a standard JSON response:

```json
{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
```

For streaming requests, you'll receive a series of server-sent events (SSE), each containing a chunk of the response.
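
The exact chunk payload isn't documented here, but an OpenAI-style stream generally looks something like the following (the field values are illustrative, not taken from the server's source):

```
data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":"hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: [DONE]
```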

### Simulating Rate Limit Errors

To test how your application handles rate limiting, send a message with content exactly equal to `[429]`:

```json
{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}
```

This will return a 429 status code with the following response:

```json
{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
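
From the command line, the same behaviour can be reproduced with curl (the `-i` flag prints the response headers, so the 429 status line is visible):

```bash
curl -i -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"[429]"}],"model":"any-model"}'
```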

## Configuration

You can configure the server by modifying the `PORT` variable in the code.
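
The feature list also mentions configuration through environment variables; assuming the server reads `PORT` from the environment at startup (not verified against the source), the port could be overridden without editing the code:

```bash
# Assumes PORT is read from the environment; otherwise edit the variable in the code.
PORT=4000 npm start
```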

## Use Case

This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.