# LLM Gateway Features & Implementation Status
This document provides a high-level overview of the LLM Gateway's capabilities and the current production status of each component.
## Core Features
- **Unified Interface**: Single API surface for text, image, video, and audio generation, abstracting away provider-specific SDKs.
- **Provider Agnostic**: Switch between Gemini, Hugging Face, Stability, WaveSpeed, etc., via configuration or runtime parameters.
- **Subscription Enforcement**: Strict pre-flight checks against user plans (Free, Basic, Pro, Enterprise) before any API call.
- **Cost Awareness**: Granular tracking of input/output tokens, request counts, and media generation costs per provider/model.
- **Resilience**: Built-in retries (exponential backoff) for transient failures (rate limits, timeouts).
- **Observability**: Centralized logging (`APIUsageLog`) and usage aggregation (`UsageSummary`) for all modalities.
- **Streaming Support** (partial): Infrastructure for text streaming exists, but most calls currently return blocking (non-streamed) responses.
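
The subscription pre-flight check and the retry behaviour listed above can be sketched together. This is a minimal illustration, not the gateway's actual code: `PLAN_LIMITS`, `LimitExceeded`, `TransientError`, `preflight_check`, `call_with_retries`, and `gateway_call` are hypothetical names, and the per-plan limits are made-up numbers, since this document does not state the real ones.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical monthly call limits per plan (None = unlimited).
# The real limits are not given in this document.
PLAN_LIMITS = {"free": 10, "basic": 100, "pro": 1000, "enterprise": None}


class LimitExceeded(Exception):
    """Raised by the pre-flight check; the provider is never called."""


class TransientError(Exception):
    """Stand-in for a provider rate-limit or timeout error."""


def preflight_check(plan: str, calls_used: int) -> None:
    """Reject the request before any provider API call is made."""
    limit: Optional[int] = PLAN_LIMITS[plan]
    if limit is not None and calls_used >= limit:
        raise LimitExceeded(f"plan '{plan}' allows {limit} calls, {calls_used} used")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures, doubling the delay after each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


def gateway_call(plan: str, calls_used: int, provider_fn: Callable[[], T], **retry_kwargs) -> T:
    """Pre-flight check first, then the retry-wrapped provider call."""
    preflight_check(plan, calls_used)
    return call_with_retries(provider_fn, **retry_kwargs)
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; a production version would probably also add jitter and cap the maximum delay.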
## Implementation Status
### 1. Text Generation
| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Chat/Completion** | Google Gemini | ✅ Production | Default provider. Supports `gemini-2.0-flash`. |
| **Chat/Completion** | Hugging Face | ✅ Production | Via Inference Providers (e.g., `mistralai/Mistral-7B`). |
| **Structured JSON** | Gemini | ✅ Production | Uses `response_schema` for reliable parsing. |
| **Structured JSON** | Hugging Face | ✅ Production | Uses `response_format={ "type": "json_object" }`. |
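
The two structured-JSON rows differ only in which provider-specific option carries the request. The sketch below shows one way a gateway might translate a single "JSON mode" request into those options. The function name `structured_output_kwargs` and the provider keys are hypothetical, and pairing `response_schema` with `response_mime_type` is an assumption based on Google's Gen AI SDK configuration, not something this document states.

```python
from typing import Any, Dict, Optional


def structured_output_kwargs(
    provider: str, schema: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Map a generic JSON-mode request onto provider-specific options.

    Hypothetical helper: only `response_schema` (Gemini) and
    `response_format={"type": "json_object"}` (Hugging Face) come from
    the status table; everything else is an assumption.
    """
    if provider == "gemini":
        kwargs: Dict[str, Any] = {"response_mime_type": "application/json"}
        if schema is not None:
            kwargs["response_schema"] = schema
        return kwargs
    if provider == "huggingface":
        # JSON-object mode carries no schema, so callers should still
        # validate the parsed result themselves.
        return {"response_format": {"type": "json_object"}}
    raise ValueError(f"no structured-output mapping for provider '{provider}'")
```

Because the Hugging Face path only guarantees syntactically valid JSON, the parsed output would still need to be checked against the expected fields.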
### 2. Image Generation
| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Image** | Google Gemini | ✅ Production | Imagen 3 models. |
| **Text-to-Image** | Hugging Face | ✅ Production | FLUX.1 via fal-ai/Black Forest Labs. |
| **Text-to-Image** | Stability AI | ✅ Production | Core/SD3 models. |
| **Text-to-Image** | WaveSpeed | ✅ Production | High-speed generation. |
| **Image Editing** | WaveSpeed | ✅ Production | Inpainting, background removal, face swap. |
### 3. Video Generation
| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Video** | WaveSpeed | ✅ Production | HunyuanVideo-1.5, LTX-2 Pro. |
| **Image-to-Video** | WaveSpeed | 🚧 Planned | Roadmap item. |
### 4. Audio Generation
| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Text-to-Speech** | Gemini | ✅ Production | Audio generation capability. |
| **Text-to-Speech** | WaveSpeed | ✅ Production | Fast TTS. |
| **Speech-to-Text** | Gemini | ✅ Production | Transcription (via `audio_to_text_generation`). |
### 5. Research & Tools
| Feature | Provider | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Web Search** | Tavily | ✅ Production | Integrated for grounded research. |
| **Web Search** | Serper | ✅ Production | Google Search API alternative. |
| **Web Search** | Exa | ✅ Production | Neural search. |
## Roadmap & Next Steps
- **Streaming Standardization**: Unify streaming interfaces across all text providers for consistent frontend UX.
- **Model Fallbacks**: Automatic failover to secondary providers if the primary is down (switching providers is currently manual, via environment configuration).
- **Fine-tuning Support**: Add gateway endpoints for triggering fine-tuning jobs and using the resulting fine-tuned models.
- **Caching Layer**: Redis-based semantic caching for frequent queries to reduce costs.
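
As an illustration of the planned model-fallback behaviour, the sketch below tries providers in priority order and moves on when one is unavailable. It is not an existing gateway API: `ProviderUnavailable` and `generate_with_fallback` are hypothetical names, and the provider callables are stand-ins for real SDK calls.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Stand-in for an outage or exhausted-retries error from a provider."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, generate) pair in order.

    Returns the name of the provider that succeeded together with its
    output, so the caller can log which provider actually served the call.
    """
    errors: List[str] = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this one failed
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the result matters for cost tracking, since a fallback call is billed by a different provider than the one originally requested.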