# Fake LLM Server

A simple server that mimics the OpenAI streaming chat completions API for testing purposes.

## Features

- Implements a basic version of the OpenAI chat completions API
- Supports both streaming and non-streaming responses
- Always responds with a "hello world" message
- Simulates a 429 rate limit error when the last message is `[429]`
- Configurable through environment variables

## Installation

```bash
npm install
```

## Usage

Start the server:

```bash
# Development mode
npm run dev

# Production mode
npm run build
npm start
```

### Example Usage

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'
```

The server will be available at http://localhost:3500 by default.

## API Endpoints

### POST /v1/chat/completions

This endpoint mimics OpenAI's chat completions API.

#### Request Format

```json
{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}
```

- Set `stream: true` to receive a streaming response
- Set `stream: false` or omit it for a regular JSON response (see the example below)
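
For example, a non-streaming request from the command line:

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":false}'
```

This returns the standard JSON response shown in the next section.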

#### Response

For non-streaming requests, you'll get a standard JSON response:

```json
{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
```

For streaming requests, you'll receive a series of server-sent events (SSE), each containing a chunk of the response.
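
Assuming the server follows OpenAI's standard chunk format, each event carries a `chat.completion.chunk` object and the stream ends with a `[DONE]` sentinel; the exact field values below are illustrative:

```text
data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":"hello world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```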

### Simulating Rate Limit Errors

To test how your application handles rate limiting, send a message with content exactly equal to `[429]`:

```json
{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}
```

This will return a 429 status code with the following response:

```json
{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
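
For example, you can trigger and inspect the error with curl (`-i` prints the status line and headers, so you can confirm the 429):

```bash
curl -i -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"[429]"}],"model":"any-model"}'
```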

## Configuration

You can configure the server by modifying the `PORT` variable in the code; it defaults to 3500.
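
Since the Features list also mentions environment variables, here is a minimal sketch of how the port might be resolved, assuming an Express-based TypeScript entry point that falls back to the documented default (illustrative, not necessarily the actual implementation):

```typescript
import express from "express";

const app = express();

// Assumed: read the port from the environment, falling back to the
// documented default of 3500.
const PORT = Number(process.env.PORT ?? 3500);

app.listen(PORT, () => {
  console.log(`Fake LLM server listening on http://localhost:${PORT}`);
});
```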

## Use Case

This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.
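
For example, if your application uses the official `openai` Node SDK, you can point it at this server by overriding the base URL (a sketch; the placeholder API key is an assumption — the examples above send no Authorization header, so the server presumably does not check credentials):

```typescript
import OpenAI from "openai";

// Point the SDK at the fake server instead of api.openai.com.
const client = new OpenAI({
  baseURL: "http://localhost:3500/v1",
  apiKey: "not-a-real-key", // placeholder; assumed to be ignored by the fake server
});

const completion = await client.chat.completions.create({
  model: "any-model",
  messages: [{ role: "user", content: "Say something" }],
});

console.log(completion.choices[0].message.content); // "hello world"
```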