Fake LLM Server

A simple server that mimics the OpenAI chat completions API (streaming and non-streaming) for testing purposes.

Features

  • Implements a basic version of the OpenAI chat completions API
  • Supports both streaming and non-streaming responses
  • Responds with the message "hello world" to every request
  • Simulates a 429 rate limit error when the content of the last message is exactly "[429]"
  • Configurable through environment variables
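The reply rules in the feature list can be expressed as one small pure function. This is a hypothetical sketch of the described behavior, not the server's actual source: the names ChatMessage, FakeReply and chooseReply are invented here, and the 429 payload is copied from the rate-limit section of this README.

```typescript
// Hypothetical sketch of the fake server's reply rules (not its actual source).
interface ChatMessage {
  role: string;
  content: string;
}

type FakeReply =
  | { status: 200; content: string }
  | {
      status: 429;
      error: { message: string; type: string; param: null; code: string };
    };

function chooseReply(messages: ChatMessage[]): FakeReply {
  const last = messages[messages.length - 1];
  // Only the LAST message is inspected; it must be exactly "[429]".
  if (last !== undefined && last.content === "[429]") {
    return {
      status: 429,
      error: {
        message: "Too many requests. Please try again later.",
        type: "rate_limit_error",
        param: null,
        code: "rate_limit_exceeded",
      },
    };
  }
  // Every other request gets the same canned text.
  return { status: 200, content: "hello world" };
}
```

Whether the reply is then sent as a single JSON body or as a stream depends only on the request's stream flag, described under Request Format.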

Installation

npm install

Usage

Start the server:

# Development mode
npm run dev

# Production mode
npm run build
npm start

Example usage

curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'

The server will be available at http://localhost:3500 by default.
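The same request as the curl example can be built from TypeScript. In this sketch BASE_URL assumes the default port 3500, and buildChatRequest and ChatRequestInit are hypothetical helper names; the body mirrors the curl payload exactly.

```typescript
// Assumes the server runs on its default port.
const BASE_URL = "http://localhost:3500";

// Minimal subset of fetch's RequestInit, declared locally so no DOM typings are needed.
interface ChatRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: returns the URL and options for a chat completions call.
function buildChatRequest(
  prompt: string,
  stream: boolean,
): { url: string; init: ChatRequestInit } {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Same model name as the curl example above.
      body: JSON.stringify({
        messages: [{ role: "user", content: prompt }],
        model: "any-model",
        stream,
      }),
    },
  };
}
```

On Node.js 18 or later, which provides a global fetch, the result can be sent with fetch(url, init).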

API Endpoints

POST /v1/chat/completions

This endpoint mimics OpenAI's chat completions API.

Request Format

{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}

  • Set "stream": true to receive a streaming response
  • Set "stream": false, or omit it, to receive a regular JSON response
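The rule for the stream flag is that only an explicit true selects streaming. A minimal sketch of the request shape and that check follows; ChatCompletionRequest and wantsStream are illustrative names, not taken from the server's code.

```typescript
// Request shape from the "Request Format" example; stream is optional.
interface ChatCompletionRequest {
  messages: { role: string; content: string }[];
  model: string;
  stream?: boolean;
}

// Only `stream: true` selects SSE; `false` or a missing field means a plain JSON reply.
function wantsStream(req: ChatCompletionRequest): boolean {
  return req.stream === true;
}
```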

Response

For non-streaming requests, you'll get a standard JSON response:

{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
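For non-streaming calls, typing the fields shown above makes the reply text easy to extract. ChatCompletion and replyText are illustrative names chosen here, and the types cover only the fields in this README's sample, not the full OpenAI schema.

```typescript
// Types for the fields present in the sample non-streaming response.
interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;
  model: string;
  choices: {
    index: number;
    message: { role: string; content: string };
    finish_reason: string;
  }[];
}

// Returns the assistant text of the first choice, or throws if there is none.
function replyText(resp: ChatCompletion): string {
  const first = resp.choices[0];
  if (first === undefined) {
    throw new Error("response has no choices");
  }
  return first.message.content;
}

// The sample response from this README.
const sample: ChatCompletion = {
  id: "chatcmpl-123456789",
  object: "chat.completion",
  created: 1699000000,
  model: "fake-model",
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "hello world" },
      finish_reason: "stop",
    },
  ],
};

console.log(replyText(sample)); // prints: hello world
```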

For streaming requests, you'll receive a series of server-sent events (SSE), each containing a chunk of the response.
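This README does not show the exact chunk format. Assuming the server follows OpenAI's streaming convention (each event is a "data:" line carrying a JSON chunk whose choices[0].delta.content holds a text fragment, and the stream ends with "data: [DONE]"), a client could reassemble the text as below. collectStreamText is a hypothetical helper, and the synthetic stream in the example is made up for illustration, not captured from the server.

```typescript
// ASSUMPTION: OpenAI-style SSE, i.e. "data: <json>" lines ending with "data: [DONE]".
interface ChatCompletionChunk {
  choices: { delta: { content?: string } }[];
}

// Concatenates the delta text of every chunk in a complete SSE body.
function collectStreamText(sseBody: string): string {
  let text = "";
  for (const rawLine of sseBody.split("\n")) {
    const line = rawLine.trim();
    // Skip blank event separators and any non-data fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload) as ChatCompletionChunk;
    // Chunks without content (e.g. a role-only first chunk) contribute nothing.
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}

// Synthetic example stream (illustrative only).
const streamBody =
  'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"hello"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":" world"}}]}\n\n' +
  "data: [DONE]\n\n";

console.log(collectStreamText(streamBody)); // prints: hello world
```

This sketch takes the whole body as one string for simplicity. A client reading the stream incrementally should buffer network reads until a blank line completes each event, because a single read can end partway through a line.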

Simulating Rate Limit Errors

To test how your application handles rate limiting, send a request whose last message has content exactly equal to "[429]":

{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}

This will return a 429 status code with the following response:

{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
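On the client side, a test can check for this response and compute a retry delay. The sketch below is one common approach (exponential backoff with a cap), not something this README prescribes; isRateLimited, backoffDelayMs and the 500 ms / 8000 ms defaults are hypothetical choices, and the error shape is the one shown above.

```typescript
// Error body shape returned by the fake server on a simulated rate limit.
interface ApiErrorBody {
  error?: { type?: string; code?: string };
}

// True for the fake server's 429 response.
function isRateLimited(status: number, body: ApiErrorBody): boolean {
  return status === 429 && body.error?.type === "rate_limit_error";
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ..., never more than maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Because the fake server answers "[429]" with a rate limit on every request, it is suited to checking that a client gives up after its retry budget rather than to testing successful recovery.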

Configuration

The listening port (3500 by default) is set by the PORT variable in the server code; edit that variable to use a different port.
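If you would rather override the port without editing code, one option is to read it from the environment with the in-code value as a fallback. This is a hypothetical sketch: this README does not say the server reads a PORT environment variable, and resolvePort and DEFAULT_PORT are names invented here.

```typescript
// HYPOTHETICAL: env-based override; the README only documents the in-code PORT.
const DEFAULT_PORT = 3500;

function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env["PORT"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_PORT;
  const port = Number(raw);
  // Anything that is not a valid TCP port number falls back to the default.
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : DEFAULT_PORT;
}
```

In a Node.js server with @types/node installed, this would be called as resolvePort(process.env).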

Use Case

This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.