# LLM Gateway – Features & Implementation Status

This document provides a high-level overview of the LLM Gateway's capabilities and the current production status of each component.

## Core Features

- **Unified Interface**: Single API surface for text, image, video, and audio generation, abstracting away provider-specific SDKs.
- **Provider Agnostic**: Switch between Gemini, Hugging Face, Stability, WaveSpeed, etc., via configuration or runtime parameters.
- **Subscription Enforcement**: Strict pre-flight checks against user plans (Free, Basic, Pro, Enterprise) before any API call.
- **Cost Awareness**: Granular tracking of input/output tokens, request counts, and media generation costs per provider/model.
- **Resilience**: Built-in retries with exponential backoff for transient failures (rate limits, timeouts).
- **Observability**: Centralized logging (`APIUsageLog`) and usage aggregation (`UsageSummary`) for all modalities.
- **Streaming Support** (partial): Infrastructure for text streaming exists, but the gateway currently serves mostly blocking responses.

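
The Resilience item above mentions retries with exponential backoff. The gateway's actual retry code is not shown in this document, so the following is only a minimal sketch of the general technique. The names `TransientProviderError` and `call_with_backoff`, the attempt count, and the delay values are all assumptions made for illustration.

```python
import random
import time


class TransientProviderError(Exception):
    """Hypothetical error type standing in for rate limits and timeouts."""


def call_with_backoff(call, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Run `call()` and retry transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientProviderError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.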
## Implementation Status

### 1. Text Generation

| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Chat/Completion** | Google Gemini | ✅ Production | Default provider. Supports `gemini-2.0-flash`. |
| **Chat/Completion** | Hugging Face | ✅ Production | via Inference Providers (e.g., `mistralai/Mistral-7B`). |
| **Structured JSON** | Gemini | ✅ Production | Uses `response_schema` for reliable parsing. |
| **Structured JSON** | Hugging Face | ✅ Production | Uses `response_format={ "type": "json_object" }`. |

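
The two Structured JSON rows use different request options per provider. As a rough illustration of how a gateway might hide that difference, the sketch below maps a provider name to the option named in the table. The helper names `structured_json_kwargs` and `parse_json_reply` and the provider keys are hypothetical, and a real SDK call may need further settings that this document does not describe.

```python
import json


def structured_json_kwargs(provider, schema=None):
    """Return the provider-specific option that requests JSON output."""
    if provider == "gemini":
        if schema is None:
            raise ValueError("a schema is required for the Gemini path")
        # Option name taken from the table; how it is passed on is an assumption.
        return {"response_schema": schema}
    if provider == "huggingface":
        return {"response_format": {"type": "json_object"}}
    raise ValueError(f"unsupported provider: {provider!r}")


def parse_json_reply(text):
    """Parse a model reply as JSON and fail loudly if it is malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
```

Validating the reply in one place means callers get the same error type whichever provider produced the text.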
### 2. Image Generation

| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Image** | Google Gemini | ✅ Production | Imagen 3 models. |
| **Text-to-Image** | Hugging Face | ✅ Production | FLUX.1 via fal-ai/Black Forest Labs. |
| **Text-to-Image** | Stability AI | ✅ Production | Core/SD3 models. |
| **Text-to-Image** | WaveSpeed | ✅ Production | High-speed generation. |
| **Image Editing** | WaveSpeed | ✅ Production | Inpainting, background removal, face swap. |

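
With four text-to-image providers in production, selection "via configuration or runtime parameters" could look roughly like the sketch below. Everything here is an assumption for illustration: the `IMAGE_PROVIDER` environment variable, the `gemini` default, the provider keys, and the stub backends, which return a label instead of image data so the example runs offline.

```python
import os


def _stub(name):
    """Build a hypothetical backend that only echoes its provider name."""
    def backend(prompt):
        return f"{name}:{prompt}"
    return backend


# Registry of backends keyed by provider name (keys are illustrative).
IMAGE_PROVIDERS = {
    name: _stub(name)
    for name in ("gemini", "huggingface", "stability", "wavespeed")
}


def generate_image(prompt, provider=None):
    """Pick a provider from the runtime argument, then config, then a default."""
    name = provider or os.environ.get("IMAGE_PROVIDER", "gemini")
    try:
        backend = IMAGE_PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown image provider: {name!r}") from None
    return backend(prompt)
```

A registry keyed by name keeps provider-specific code out of the call site, so adding a provider means adding one entry.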
### 3. Video Generation

| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Video** | WaveSpeed | ✅ Production | HunyuanVideo-1.5, LTX-2 Pro. |
| **Image-to-Video** | WaveSpeed | 🚧 Planned | Roadmap item. |

### 4. Audio Generation

| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Speech** | Gemini | ✅ Production | Audio generation capability. |
| **Text-to-Speech** | WaveSpeed | ✅ Production | Fast TTS. |
| **Speech-to-Text** | Gemini | ✅ Production | Transcription (via `audio_to_text_generation`). |

### 5. Research & Tools

| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Web Search** | Tavily | ✅ Production | Integrated for grounded research. |
| **Web Search** | Serper | ✅ Production | Google Search API alternative. |
| **Web Search** | Exa | ✅ Production | Neural search. |

## Roadmap & Next Steps

- **Streaming Standardization**: Unify streaming interfaces across all text providers for a consistent frontend UX.
- **Model Fallbacks**: Automatic failover to secondary providers if the primary is down (currently manual/env-based).
- **Fine-tuning Support**: Add gateway endpoints for triggering fine-tuning jobs and using the resulting models.
- **Caching Layer**: Redis-based semantic caching for frequent queries to reduce costs.

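
The Model Fallbacks item is still on the roadmap, so no implementation exists to quote. One possible shape for automatic failover is sketched below, assuming each provider is a plain callable and that outages surface as a hypothetical `ProviderUnavailable` exception. The stub providers exist only to make the example runnable.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider cannot serve a request."""


def generate_with_fallback(prompt, providers):
    """Try each (name, generate_fn) pair in order and return the first success."""
    errors = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


# Stub providers for illustration: the primary is down, the secondary works.
def gemini_down(prompt):
    raise ProviderUnavailable("503")


def hf_ok(prompt):
    return f"echo: {prompt}"


print(generate_with_fallback("hi", [("gemini", gemini_down), ("huggingface", hf_ok)]))
# prints ('huggingface', 'echo: hi')
```

Returning the provider name alongside the result lets usage logging attribute cost to the provider that actually served the request.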