ALwrity LLM Gateway – Architecture Overview
ALwrity’s LLM Gateway lives under llm_providers and provides a consistent, production‑oriented interface for text, image, audio, and video generation across multiple model providers. It encapsulates provider differences, applies subscription enforcement, and centralizes observability and reliability patterns.
Goals
- Unified surface for LLM operations across providers
- Strong subscription enforcement and cost awareness
- Resilient calls with retries and structured error handling
- Extensible provider architecture with clear contracts
- Transparent metrics, usage logging, and pricing integration
High‑Level Flow
- Entry points route requests to the appropriate capability:
  - Text generation via main_text_generation.py
  - Image generation and editing via image_generation
  - Video generation via video_generation
  - Audio/STT via audio_to_text_generation
- Subscription enforcement runs before provider calls:
  - Uses PricingService and UsageTrackingService to validate tokens/operations
  - Blocks requests that exceed limits with actionable error payloads
- The provider module performs the call with provider‑specific SDKs/APIs
- Results are normalized to ALwrity types and returned upstream
Core Components
- Text Generation Entry: main_text_generation.py
  - Detects available providers via APIKeyManager
  - Applies strict subscription checks using PricingService and UsageTrackingService
  - Routes to Gemini or Hugging Face implementations
- Image Generation Contracts: base.py
  - Options and Result dataclasses
  - Protocols for generation, edit, and face‑swap providers
- Video Generation Contracts: base.py
  - Options and Result dataclasses
  - Async protocol with progress callbacks
- Provider Implementations:
  - Gemini text: gemini_provider.py
  - Hugging Face text: huggingface_provider.py
  - Hugging Face image: hf_provider.py
  - WaveSpeed video: wavespeed_provider.py
Provider Abstraction
- Image providers conform to:
  - ImageGenerationProvider.generate(options) -> ImageGenerationResult
  - ImageEditProvider.edit(options) -> ImageGenerationResult
  - FaceSwapProvider.swap_face(options) -> ImageGenerationResult
- Video providers conform to:
  - VideoGenerationProvider.generate_video(options, progress_cb) -> VideoGenerationResult
These contracts ensure consistent options/result types so downstream UI and logging remain stable regardless of provider.
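A minimal sketch of how such a contract might look using typing.Protocol. The dataclass field names, ImageGenerationOptions, and FakeImageProvider are assumptions made for illustration; the real definitions in base.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@dataclass
class ImageGenerationOptions:
    # Field names are illustrative assumptions, not the gateway's real fields.
    prompt: str
    width: int = 1024
    height: int = 1024
    model: Optional[str] = None


@dataclass
class ImageGenerationResult:
    # Assumed shape: raw bytes plus normalized provider metadata.
    image_bytes: bytes
    provider: str
    model: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class ImageGenerationProvider(Protocol):
    def generate(self, options: ImageGenerationOptions) -> ImageGenerationResult:
        ...


class FakeImageProvider:
    """Hypothetical provider; conforms structurally without inheriting."""

    def generate(self, options: ImageGenerationOptions) -> ImageGenerationResult:
        return ImageGenerationResult(
            image_bytes=b"\x89PNG",
            provider="fake",
            model=options.model or "fake-default",
            metadata={"width": options.width, "height": options.height},
        )


def render(provider: ImageGenerationProvider, prompt: str) -> ImageGenerationResult:
    # Callers depend only on the contract, never on a concrete provider.
    return provider.generate(ImageGenerationOptions(prompt=prompt))
```

Because the protocol is structural, a new provider only needs a matching generate method; no shared base class is required.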
Subscription Enforcement
- Performed in the text pipeline entry point before any provider call:
  - See enforcement and usage checks in main_text_generation.py
- The preflight operations endpoint also validates multi‑operation costs and limits:
  - See preflight.py
- Image/video modules typically rely on the calling route to validate limits first, then perform provider calls.
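The check-before-call pattern can be sketched as follows. UsageLimitExceeded, PlanLimits, and check_token_budget are hypothetical names invented for this example; the actual checks live in PricingService and UsageTrackingService, whose interfaces are not shown in this overview.

```python
from dataclasses import dataclass


class UsageLimitExceeded(Exception):
    """Hypothetical exception carrying an HTTP-style status and usage info."""

    def __init__(self, message: str, usage: dict):
        super().__init__(message)
        self.status_code = 429
        self.usage = usage


@dataclass
class PlanLimits:
    # Assumed shape; real limits would come from the subscription services.
    monthly_tokens: int


def check_token_budget(limits: PlanLimits, tokens_used: int, tokens_requested: int) -> None:
    """Raise before any provider call if the request would exceed the plan."""
    projected = tokens_used + tokens_requested
    if projected > limits.monthly_tokens:
        raise UsageLimitExceeded(
            "Token limit exceeded for current plan",
            usage={
                "used": tokens_used,
                "requested": tokens_requested,
                "limit": limits.monthly_tokens,
            },
        )
```

Raising before the provider call means a blocked request costs nothing upstream, and the usage dictionary gives the client enough detail to act on.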
Configuration and Secrets
- Gemini: GEMINI_API_KEY
  - Loaded and validated in gemini_provider.py
- Hugging Face: HF_TOKEN
  - Loaded and validated in huggingface_provider.py
- Hugging Face image defaults: HF_IMAGE_MODEL
  - Used in image_generation/hf_provider.py
- Provider clients must never log secrets; logs are provider‑scoped via get_service_logger.
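One way to keep key validation local and logs secret-free, sketched with the standard library only. The helpers require_secret, mask, and image_model are hypothetical, and DEFAULT_IMAGE_MODEL is a placeholder value rather than the gateway's real default.

```python
import os
from typing import Mapping, Optional

# Placeholder only; the real default model is not stated in this overview.
DEFAULT_IMAGE_MODEL = "example/default-image-model"


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required environment value or fail fast with a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # The error names the variable but never echoes a value.
        raise RuntimeError(f"{name} is not set")
    return value


def mask(secret: str) -> str:
    """Return a log-safe form showing at most the last four characters."""
    return "****" + secret[-4:] if len(secret) > 4 else "****"


def image_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the optional HF_IMAGE_MODEL override, falling back to a default."""
    env = os.environ if env is None else env
    return env.get("HF_IMAGE_MODEL", DEFAULT_IMAGE_MODEL)
```

If a key ever needs to appear in a log line for debugging, passing it through mask first keeps the full value out of log storage.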
Reliability and Error Handling
- Exponential backoff retries using tenacity:
  - Gemini text: gemini_text_response
  - Hugging Face text: huggingface_text_response
- Limit breaches raise structured exceptions that surface as HTTP 429 responses with usage info
- Provider modules return normalized results; callers handle downstream persistence and telemetry
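The gateway uses tenacity for retries; the sketch below is a standard-library stand-in that shows the exponential backoff pattern itself, not the project's actual decorator. The name retry_with_backoff, its parameters, and the choice of retryable exceptions are all assumptions.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry_with_backoff(
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry a function on transient errors, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Delays grow as base_delay * 1, 2, 4, ...
                    sleep(base_delay * (2 ** attempt))

        return wrapper

    return decorator
```

Only transient errors are listed in retry_on here; a limit breach should not be retried, since repeating the call cannot succeed until the plan or usage changes.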
Pricing and Cost Awareness
- Preflight cost estimation computes operation costs per provider/model:
  - See multi‑operation handling in preflight.py
- Video cost calculation is provider/model aware:
  - See WaveSpeed services and calculate_cost in video_generation/wavespeed_provider.py
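A rough sketch of multi-operation cost estimation keyed by provider and model. The Operation dataclass, the estimate_cost function, the model names, and every price in PRICES are invented for illustration and do not reflect real pricing or the actual preflight.py logic.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Operation:
    provider: str
    model: str
    units: int  # e.g. thousands of tokens, or seconds of video


# Hypothetical price per unit; values are placeholders, not real rates.
PRICES: Dict[Tuple[str, str], Decimal] = {
    ("gemini", "example-text-model"): Decimal("0.002"),
    ("wavespeed", "example-video-model"): Decimal("0.05"),
}


def estimate_cost(
    operations: Iterable[Operation],
    prices: Dict[Tuple[str, str], Decimal],
) -> Decimal:
    """Sum the cost of all operations, failing on unknown provider/model pairs."""
    total = Decimal("0")
    for op in operations:
        key = (op.provider, op.model)
        if key not in prices:
            raise KeyError(f"No price configured for {op.provider}/{op.model}")
        total += prices[key] * op.units
    return total
```

Decimal is used instead of float so that summed prices do not pick up binary rounding error; failing on an unknown pair avoids silently treating an unpriced model as free.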
Observability
- Service‑scoped loggers for each provider/module
- Central usage logs recorded via subscription services on the calling routes
- Provider metadata normalized in result objects for consistent analytics
Extensibility Guidelines
- Implement the appropriate Protocol interface in a new provider module
- Normalize options and results to the gateway dataclasses
- Keep environment/key validation local to the provider module
- Add cost mapping in PricingService and preflight for new operations/models
- Wire subscription validation in the calling route before invoking provider
Request Lifecycle (Text)
- Client submits prompt to text endpoint
- Entry point determines provider (env or APIKeyManager) and validates subscription limits
- Provider‑specific function executes with retries and returns normalized text
- Caller logs usage and returns response to client
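The text lifecycle above can be sketched as a small dispatch function. choose_provider, generate_text, the provider ordering, and the callback signatures are assumptions for illustration; the real entry point in main_text_generation.py may be organized differently.

```python
from typing import Callable, Collection, Mapping, Optional

TextFn = Callable[[str], str]


def choose_provider(available: Collection[str], preferred: Optional[str] = None) -> str:
    """Pick the preferred provider if configured, else the first known one."""
    if preferred and preferred in available:
        return preferred
    # The fallback order is an assumption, not the gateway's documented policy.
    for name in ("gemini", "huggingface"):
        if name in available:
            return name
    raise RuntimeError("No text provider is configured")


def generate_text(
    prompt: str,
    providers: Mapping[str, TextFn],
    check_limits: Callable[[str], None],
    log_usage: Callable[[str, str, str], None],
    preferred: Optional[str] = None,
) -> str:
    name = choose_provider(providers.keys(), preferred)
    check_limits(prompt)            # raises before any provider call
    text = providers[name](prompt)  # provider function handles its own retries
    log_usage(name, prompt, text)   # usage is recorded only after success
    return text
```

Passing the limit check and usage logger in as callables keeps the sketch independent of the subscription services while preserving the order of steps.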
Request Lifecycle (Media)
- Client submits generation/edit/face‑swap request
- Route validates plan limits (tokens, requests, or per‑operation limits)
- Provider service executes call and produces normalized binary payload and metadata
- Caller logs usage and returns media/links to client
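For media, the same steps can be sketched around an async provider with a progress callback. The dataclass fields, the (fraction, message) callback signature, FakeVideoProvider, and handle_video_request are assumptions; only the generate_video(options, progress_cb) shape comes from the contracts described earlier.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Assumed callback signature: (fraction_complete, status_message).
ProgressCallback = Callable[[float, str], None]


@dataclass
class VideoGenerationOptions:
    prompt: str
    duration_seconds: int = 5


@dataclass
class VideoGenerationResult:
    video_bytes: bytes
    provider: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FakeVideoProvider:
    """Hypothetical provider that reports progress without doing real work."""

    async def generate_video(
        self,
        options: VideoGenerationOptions,
        progress_cb: Optional[ProgressCallback] = None,
    ) -> VideoGenerationResult:
        for fraction, message in ((0.0, "queued"), (0.5, "rendering"), (1.0, "done")):
            await asyncio.sleep(0)  # yield control, as a real poll loop would
            if progress_cb is not None:
                progress_cb(fraction, message)
        return VideoGenerationResult(
            video_bytes=b"\x00\x00",
            provider="fake",
            metadata={"duration_seconds": options.duration_seconds},
        )


async def handle_video_request(provider, options, validate_limits, log_usage, progress_cb=None):
    validate_limits(options)  # the route validates plan limits first
    result = await provider.generate_video(options, progress_cb)
    log_usage(result)         # the caller records usage after success
    return result
```

A route could run this with asyncio.run(handle_video_request(...)) in a script, or await it directly inside an async web handler.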
This architecture isolates provider variability while standardizing contracts, enabling safe expansion to new models and modalities without destabilizing upstream consumers.