LLM Gateway Features & Implementation Status

This document provides a high-level overview of the LLM Gateway's capabilities and the current production status of each component.

Core Features

  • Unified Interface: Single API surface for text, image, video, and audio generation, abstracting away provider-specific SDKs.
  • Provider Agnostic: Switch between Gemini, Hugging Face, Stability, WaveSpeed, etc., via configuration or runtime parameters.
  • Subscription Enforcement: Strict pre-flight checks against user plans (Free, Basic, Pro, Enterprise) before any API call.
  • Cost Awareness: Granular tracking of input/output tokens, request counts, and media generation costs per provider/model.
  • Resilience: Built-in retries (exponential backoff) for transient failures (rate limits, timeouts).
  • Observability: Centralized logging (APIUsageLog) and usage aggregation (UsageSummary) for all modalities.
  • Streaming Support (partial): Infrastructure for text streaming exists, but most calls currently use blocking responses.
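
The subscription pre-flight check and the retry behavior listed above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the plan limits, the exception classes, and the function names (preflight_check, call_with_retries, generate) are all hypothetical, and the real limits per plan are not stated in this document.

```python
import random
import time

# Hypothetical monthly call limits per plan; None means unlimited.
PLAN_LIMITS = {"free": 100, "basic": 1_000, "pro": 10_000, "enterprise": None}


class LimitExceeded(Exception):
    """Raised by the pre-flight check, before any provider call is made."""


class TransientError(Exception):
    """Stand-in for a provider rate-limit or timeout error."""


def preflight_check(plan: str, calls_used: int) -> None:
    """Reject the request up front if the plan's call budget is spent."""
    limit = PLAN_LIMITS[plan]
    if limit is not None and calls_used >= limit:
        raise LimitExceeded(f"{plan} plan limit of {limit} calls reached")


def call_with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (0.5s, 1s, ...) plus a little jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


def generate(plan, calls_used, provider_call, sleep=time.sleep):
    """Enforce the plan first, then make the provider call with retries."""
    preflight_check(plan, calls_used)
    return call_with_retries(provider_call, sleep=sleep)
```

Running the check before the retry loop means a user who is over their limit never triggers a billable provider request, and injecting sleep keeps the backoff easy to test.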

Implementation Status

1. Text Generation

| Feature | Provider | Status | Notes |
| --- | --- | --- | --- |
| Chat/Completion | Google Gemini | Production | Default provider. Supports gemini-2.0-flash. |
| Chat/Completion | Hugging Face | Production | Via Inference Providers (e.g., mistralai/Mistral-7B). |
| Structured JSON | Gemini | Production | Uses response_schema for reliable parsing. |
| Structured JSON | Hugging Face | Production | Uses response_format={ "type": "json_object" }. |
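
The two structured-JSON rows differ only in the request parameters each provider expects. The sketch below shows one way a gateway might map a single schema onto those parameters and validate the reply. The helper names (structured_output_params, parse_json_reply) are hypothetical, and it assumes the Gemini generation config takes response_mime_type alongside response_schema; it builds plain dictionaries and does not call either SDK.

```python
import json
from typing import Any


def structured_output_params(provider: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Return provider-specific request parameters for JSON output."""
    if provider == "gemini":
        # Assumption: JSON mode is enabled via the MIME type, and the
        # schema is passed as response_schema.
        return {
            "response_mime_type": "application/json",
            "response_schema": schema,
        }
    if provider == "huggingface":
        # JSON-object mode only; the schema itself is not sent here.
        return {"response_format": {"type": "json_object"}}
    raise ValueError(f"unsupported provider: {provider}")


def parse_json_reply(text: str, required_keys: list[str]) -> dict[str, Any]:
    """Parse a model reply and check that the expected keys are present."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Because json_object mode does not enforce a schema, a check like parse_json_reply is more important on the Hugging Face path than on the Gemini path.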

2. Image Generation

| Feature | Provider | Status | Notes |
| --- | --- | --- | --- |
| Text-to-Image | Google Gemini | Production | Imagen 3 models. |
| Text-to-Image | Hugging Face | Production | FLUX.1 via fal-ai/Black Forest Labs. |
| Text-to-Image | Stability AI | Production | Core/SD3 models. |
| Text-to-Image | WaveSpeed | Production | High-speed generation. |
| Image Editing | WaveSpeed | Production | Inpainting, background removal, face swap. |

3. Video Generation

| Feature | Provider | Status | Notes |
| --- | --- | --- | --- |
| Text-to-Video | WaveSpeed | Production | HunyuanVideo-1.5, LTX-2 Pro. |
| Image-to-Video | WaveSpeed | 🚧 Planned | Roadmap item. |

4. Audio Generation

| Feature | Provider | Status | Notes |
| --- | --- | --- | --- |
| Text-to-Speech | Gemini | Production | Audio generation capability. |
| Text-to-Speech | WaveSpeed | Production | Fast TTS. |
| Speech-to-Text | Gemini | Production | Transcription (via audio_to_text_generation). |

5. Research & Tools

| Feature | Provider | Status | Notes |
| --- | --- | --- | --- |
| Web Search | Tavily | Production | Integrated for grounded research. |
| Web Search | Serper | Production | Google Search API alternative. |
| Web Search | Exa | Production | Neural search. |

Roadmap & Next Steps

  • Streaming Standardization: Unify streaming interfaces across all text providers for consistent frontend UX.
  • Model Fallbacks: Automatic failover to secondary providers if the primary is down (currently manual/env-based).
  • Fine-tuning Support: Add gateway endpoints for triggering fine-tuning jobs and for using the resulting fine-tuned models.
  • Caching Layer: Redis-based semantic caching for frequent queries to reduce costs.
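
As an illustration of the planned Model Fallbacks item, automatic failover could look roughly like the sketch below. Since the feature is still on the roadmap, nothing here reflects existing gateway code: ProviderUnavailable, generate_with_fallback, and the (name, callable) provider list are all hypothetical.

```python
from typing import Callable


class ProviderUnavailable(Exception):
    """Stand-in for an outage or repeated failure at one provider."""


def generate_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, text)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))
```

Returning the provider name with the text lets usage logging attribute cost to the provider that actually served the request, not the one first attempted.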