ALwrity LLM Gateway Architecture Overview

ALwrity's LLM Gateway lives under llm_providers and provides a consistent, production-oriented interface for text, image, audio, and video generation across multiple model providers. It encapsulates provider differences, applies subscription enforcement, and centralizes observability and reliability patterns.

Goals

  • Unified surface for LLM operations across providers
  • Strong subscription enforcement and cost awareness
  • Resilient calls with retries and structured error handling
  • Extensible provider architecture with clear contracts
  • Transparent metrics, usage logging, and pricing integration

High-Level Flow

  1. Entry points route requests to the appropriate capability:
  2. Subscription enforcement integrates before provider calls:
    • Uses PricingService and UsageTrackingService to validate tokens/operations
    • Blocks requests that exceed limits with actionable error payloads
  3. Provider module performs the call with provider-specific SDKs/APIs
  4. Results are normalized to ALwrity types and returned upstream

Core Components

  • Text Generation Entry: main_text_generation.py
    • Detects available providers via APIKeyManager
    • Applies strict subscription checks using PricingService and UsageTrackingService
    • Routes to Gemini or Hugging Face implementations
  • Image Generation Contracts: base.py
    • Options and Result dataclasses
    • Protocols for generation, edit, and face-swap providers
  • Video Generation Contracts: base.py
    • Options and Result dataclasses
    • Async protocol with progress callbacks
  • Provider Implementations:

Provider Abstraction

  • Image providers conform to:
    • ImageGenerationProvider.generate(options) -> ImageGenerationResult
    • ImageEditProvider.edit(options) -> ImageGenerationResult
    • FaceSwapProvider.swap_face(options) -> ImageGenerationResult
  • Video providers conform to:
    • VideoGenerationProvider.generate_video(options, progress_cb) -> VideoGenerationResult

These contracts ensure consistent options/result types so downstream UI and logging remain stable regardless of provider.
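
As an illustration of how such a contract keeps callers independent of the concrete provider, here is a minimal sketch. The dataclass fields, `StubImageProvider`, and `render` are assumptions made for this example; the real definitions in base.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Protocol, runtime_checkable


@dataclass
class ImageGenerationOptions:
    # Hypothetical fields; the real dataclass in base.py may differ.
    prompt: str
    width: int = 1024
    height: int = 1024


@dataclass
class ImageGenerationResult:
    # Hypothetical fields; the real dataclass in base.py may differ.
    image_bytes: bytes
    provider: str
    model: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class ImageGenerationProvider(Protocol):
    def generate(self, options: ImageGenerationOptions) -> ImageGenerationResult:
        ...


class StubImageProvider:
    """Hypothetical provider that returns placeholder bytes instead of calling an API."""

    def generate(self, options: ImageGenerationOptions) -> ImageGenerationResult:
        return ImageGenerationResult(
            image_bytes=b"\x89PNG-placeholder",
            provider="stub",
            model="stub-v0",
            metadata={"width": options.width, "height": options.height},
        )


def render(provider: ImageGenerationProvider, prompt: str) -> ImageGenerationResult:
    # The caller depends only on the protocol, not on a concrete provider class.
    return provider.generate(ImageGenerationOptions(prompt=prompt))
```

Because the protocol is structural, `StubImageProvider` satisfies it without inheriting from it, which is what lets a new provider module be added without touching callers.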

Subscription Enforcement

  • Performed in the text pipeline entry point before any provider call:
  • Preflight operations endpoint also validates multi-operation cost/limits:
  • Image/video modules typically rely on the calling route to validate limits first, then perform provider calls.
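
The check-before-call pattern described above might look roughly like the following sketch. `UsageSnapshot`, `SubscriptionLimitExceeded`, `enforce_limits`, and the four-characters-per-token estimate are all hypothetical stand-ins; the actual checks go through PricingService and UsageTrackingService, whose interfaces are not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UsageSnapshot:
    # Hypothetical shape standing in for data held by UsageTrackingService.
    tokens_used: int
    token_limit: int


class SubscriptionLimitExceeded(Exception):
    """Raised before any provider call when a request would exceed the plan."""

    def __init__(self, usage: UsageSnapshot, requested_tokens: int) -> None:
        super().__init__("Token limit exceeded")
        self.status_code = 429
        # Actionable payload: tells the client what was used and what was asked for.
        self.payload = {
            "error": "limit_exceeded",
            "tokens_used": usage.tokens_used,
            "token_limit": usage.token_limit,
            "requested_tokens": requested_tokens,
        }


def enforce_limits(usage: UsageSnapshot, requested_tokens: int) -> None:
    if usage.tokens_used + requested_tokens > usage.token_limit:
        raise SubscriptionLimitExceeded(usage, requested_tokens)


def generate_text(prompt: str, usage: UsageSnapshot,
                  provider_call: Callable[[str], str]) -> str:
    # Crude estimate (assumption): roughly four characters per token.
    estimated_tokens = max(1, len(prompt) // 4)
    enforce_limits(usage, estimated_tokens)  # runs before the provider is touched
    return provider_call(prompt)
```

The important property is ordering: a request that would exceed the plan never reaches the provider, so it incurs no provider cost.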

Configuration and Secrets

Reliability and Error Handling

  • Exponential backoff retries using tenacity:
  • Structured exceptions surface HTTP 429 for limit breaches with usage info
  • Provider modules return normalized results; callers handle downstream persistence and telemetry
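
To make the retry behavior concrete, here is a dependency-free sketch of exponential backoff. The source uses tenacity; this example deliberately swaps in a small hand-rolled decorator built on the standard library so that it runs anywhere. `TransientProviderError` and `retry_with_backoff` are hypothetical names, not part of the gateway.

```python
import time
from functools import wraps


class TransientProviderError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 5xx)."""


def retry_with_backoff(max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry on TransientProviderError, doubling the delay after each failure."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientProviderError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delays grow as base_delay * 1, 2, 4, ...
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

The `sleep` parameter is injectable so that tests can record delays instead of waiting. Limit breaches (HTTP 429 from subscription checks) are not transient and should not be retried this way.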

Pricing and Cost Awareness

  • Preflight cost estimation computes operation costs per provider/model:
  • Video cost calculation is provider/model aware:
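
A preflight estimate of this kind can be sketched as a lookup keyed by provider and model. Everything below is illustrative: the provider and model names, the price table, and the function names are hypothetical, and the real rates and logic live in PricingService.

```python
from decimal import Decimal

# Hypothetical price table in USD; real values come from PricingService.
PRICES = {
    ("example-provider", "text-small"): {
        "per_1k_input": Decimal("0.0005"),
        "per_1k_output": Decimal("0.0015"),
    },
    ("example-provider", "video-basic"): {"per_second": Decimal("0.05")},
}


def estimate_text_cost(provider, model, input_tokens, output_tokens):
    rates = PRICES[(provider, model)]
    return (
        Decimal(input_tokens) / 1000 * rates["per_1k_input"]
        + Decimal(output_tokens) / 1000 * rates["per_1k_output"]
    )


def estimate_video_cost(provider, model, seconds):
    # Video is priced per second here; other providers may price per clip.
    return Decimal(seconds) * PRICES[(provider, model)]["per_second"]


def preflight_total(operations):
    """Sum the estimated cost of several planned operations."""
    total = Decimal("0")
    for op in operations:
        if op["kind"] == "text":
            total += estimate_text_cost(
                op["provider"], op["model"], op["input_tokens"], op["output_tokens"]
            )
        elif op["kind"] == "video":
            total += estimate_video_cost(op["provider"], op["model"], op["seconds"])
        else:
            raise ValueError(f"Unknown operation kind: {op['kind']}")
    return total
```

`Decimal` is used rather than `float` so that small per-token rates add up without binary rounding drift.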

Observability

  • Service-scoped loggers for each provider/module
  • Central usage logs recorded via subscription services on the calling routes
  • Provider metadata normalized in result objects for consistent analytics
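
One way to get service-scoped loggers with consistent fields is the standard `logging` module with a shared name prefix. The `llm_providers.` prefix, `get_service_logger`, `log_usage`, and the field names are assumptions for this sketch, not the gateway's actual logging setup.

```python
import logging


def get_service_logger(service: str) -> logging.Logger:
    # A shared prefix lets operators filter or configure all gateway logs at once.
    return logging.getLogger(f"llm_providers.{service}")


def log_usage(logger: logging.Logger, *, provider: str, model: str,
              operation: str, tokens: int) -> None:
    # Fields passed via `extra` become attributes on the LogRecord, so
    # structured handlers can read them without parsing the message text.
    logger.info(
        "usage provider=%s model=%s operation=%s tokens=%d",
        provider, model, operation, tokens,
        extra={"provider": provider, "model": model,
               "operation": operation, "tokens": tokens},
    )
```

Keeping the same field names across providers is what makes downstream analytics queries provider-agnostic.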

Extensibility Guidelines

  • Implement the appropriate Protocol interface in a new provider module
  • Normalize options and results to the gateway dataclasses
  • Keep environment/key validation local to the provider module
  • Add cost mapping in PricingService and preflight for new operations/models
  • Wire subscription validation in the calling route before invoking provider
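
The sketch below shows what a new video provider following these guidelines might look like: key validation stays inside the provider, and the result is normalized to a gateway-style dataclass. The environment variable name, the dataclass fields, and the synchronous `progress_cb` signature are assumptions; consult the video base.py for the real contract.

```python
import asyncio
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping, Optional


@dataclass
class VideoGenerationOptions:
    # Hypothetical fields; the real dataclass may differ.
    prompt: str
    duration_seconds: int = 4


@dataclass
class VideoGenerationResult:
    # Hypothetical fields; the real dataclass may differ.
    video_bytes: bytes
    provider: str
    model: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class MissingAPIKeyError(RuntimeError):
    """Raised when the provider's key is absent from the environment."""


class ExampleVideoProvider:
    ENV_VAR = "EXAMPLE_VIDEO_API_KEY"  # hypothetical variable name

    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        # Key validation is local to the provider, per the guidelines above.
        env = os.environ if env is None else env
        api_key = env.get(self.ENV_VAR)
        if not api_key:
            raise MissingAPIKeyError(f"{self.ENV_VAR} is not set")
        self._api_key = api_key

    async def generate_video(
        self,
        options: VideoGenerationOptions,
        progress_cb: Optional[Callable[[int], None]] = None,
    ) -> VideoGenerationResult:
        # A real provider would poll a remote job here; this stub only
        # reports progress and returns placeholder bytes.
        for percent in (0, 50, 100):
            await asyncio.sleep(0)
            if progress_cb is not None:
                progress_cb(percent)
        return VideoGenerationResult(
            video_bytes=b"placeholder-video",
            provider="example",
            model="example-video-v0",
            metadata={"duration_seconds": options.duration_seconds},
        )
```

Accepting an optional `env` mapping keeps the provider testable without mutating the process environment.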

Request Lifecycle (Text)

  1. Client submits prompt to text endpoint
  2. Entry point determines provider (env or APIKeyManager) and validates subscription limits
  3. Provider-specific function executes with retries and returns normalized text
  4. Caller logs usage and returns response to client
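
The four steps above can be sketched end to end as follows. This is a simplified illustration, not the code in main_text_generation.py: the `PREFERRED_TEXT_PROVIDER` variable, the `available_keys` mapping standing in for APIKeyManager, and the callback parameters are all hypothetical.

```python
from typing import Callable, Dict, Mapping


def select_provider(env: Mapping[str, str], available_keys: Mapping[str, str]) -> str:
    # An explicit preference wins if a key for that provider is configured.
    preferred = env.get("PREFERRED_TEXT_PROVIDER", "").strip().lower()
    if preferred and available_keys.get(preferred):
        return preferred
    # Otherwise fall back to the first provider that has a key.
    for name in ("gemini", "huggingface"):
        if available_keys.get(name):
            return name
    raise RuntimeError("No text provider is configured")


def handle_text_request(
    prompt: str,
    env: Mapping[str, str],
    available_keys: Mapping[str, str],
    providers: Dict[str, Callable[[str], str]],
    check_limits: Callable[[str], None],
    log_usage: Callable[[str, str, str], None],
) -> Dict[str, str]:
    provider_name = select_provider(env, available_keys)  # step 2: choose provider
    check_limits(prompt)                                  # step 2: raises if over plan
    text = providers[provider_name](prompt)               # step 3: provider call
    log_usage(provider_name, prompt, text)                # step 4: record usage
    return {"provider": provider_name, "text": text}
```

Passing `check_limits` and `log_usage` in as callables keeps the sketch independent of the real subscription services while preserving the order of operations.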

Request Lifecycle (Media)

  1. Client submits generation/edit/face-swap request
  2. Route validates plan limits (tokens, requests, or per-operation limits)
  3. Provider service executes call and produces normalized binary payload and metadata
  4. Caller logs usage and returns media/links to client

This architecture isolates provider variability while standardizing contracts, enabling safe expansion to new models and modalities without destabilizing upstream consumers.