# Fake LLM Server

A simple server that mimics the OpenAI streaming chat completions API for testing purposes.

## Features

- Implements a basic version of the OpenAI chat completions API
- Supports both streaming and non-streaming responses
- Always responds with a "hello world" message
- Simulates a 429 rate limit error when the last message is "[429]"
- Configurable through environment variables

## Installation

```bash
npm install
```

## Usage

Start the server:

```bash
# Development mode
npm run dev

# Production mode
npm run build
npm start
```

### Example usage

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'
```

The server will be available at http://localhost:3500 by default.

## API Endpoints

### POST /v1/chat/completions

This endpoint mimics OpenAI's chat completions API.

#### Request Format

```json
{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}
```

- Set `stream: true` to receive a streaming response
- Set `stream: false` or omit it for a regular JSON response

#### Response

For non-streaming requests, you'll get a standard JSON response:

```json
{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
```

For streaming requests, you'll receive a series of server-sent events (SSE), each carrying a chunk of the response.

### Simulating Rate Limit Errors

To test how your application handles rate limiting, send a message whose content is exactly `[429]`:

```json
{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}
```

The server responds with a 429 status code and the following body:

```json
{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

## Configuration

You can configure the server by modifying the `PORT` variable in the code.

## Use Case

This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.
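For example, you can point an OpenAI-compatible client at the fake server. The sketch below assumes the official `openai` npm package (v4-style API); that dependency, the file name, and the `"fake-key"` placeholder are not part of this repo, so treat it as illustrative rather than the project's own test harness.

```typescript
// client-example.ts — a minimal sketch, assuming the official `openai`
// npm package (v4 API). Nothing below ships with this repo.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3500/v1", // point the SDK at the fake server
  apiKey: "fake-key", // any non-empty string; the fake server ignores it
});

async function main() {
  // Non-streaming: expect the canned "hello world" completion.
  const completion = await client.chat.completions.create({
    model: "any-model",
    messages: [{ role: "user", content: "Say something" }],
  });
  console.log(completion.choices[0].message.content); // -> "hello world"

  // Streaming: the SDK parses the SSE chunks into an async iterable.
  const stream = await client.chat.completions.create({
    model: "any-model",
    messages: [{ role: "user", content: "Say something" }],
    stream: true,
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
  process.stdout.write("\n");
}

main();
```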
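On the wire, the streaming response would look roughly like this. The exact field shapes here are an assumption based on OpenAI's documented chunk format and the non-streaming example above; the server code is authoritative.

```
data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":"hello world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```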
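The simulated rate limit can be asserted the same way, again under the `openai` SDK assumption. Note that the v4 SDK retries 429 responses by default, so the sketch disables retries to let the error surface immediately.

```typescript
// rate-limit-example.ts — sketch of exercising the simulated 429, under the
// same `openai` SDK assumption as the client example above.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3500/v1",
  apiKey: "fake-key",
  maxRetries: 0, // the SDK retries 429s by default; disable for this test
});

async function main() {
  try {
    // A message whose content is exactly "[429]" triggers the simulated error.
    await client.chat.completions.create({
      model: "any-model",
      messages: [{ role: "user", content: "[429]" }],
    });
    console.error("expected a 429, got a successful response");
  } catch (err) {
    if (err instanceof OpenAI.APIError && err.status === 429) {
      console.log("rate limited as expected:", err.message);
      return;
    }
    throw err; // any other failure is a real bug
  }
}

main();
```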