Image Generation Implementation Comparison

Overview

This document compares how Podcast Maker, Story Writer, and Blog Writer implement AI image generation, focusing on model selection, provider routing, and best practices.


1. Podcast Maker (backend/api/podcast/handlers/images.py)

Key Features:

  • Dual Mode: Character-consistent generation (Ideogram Character) vs. standard generation
  • Auto Provider Selection: Uses provider: None to auto-select based on environment
  • Specialized Prompt Building: Podcast-optimized prompts with scene context
  • Pre-flight Validation: Subscription checks before API calls

Model Usage:

# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
    prompt=image_prompt,
    reference_image_bytes=base_avatar_bytes,
    user_id=user_id,
    style=style,  # "Realistic", "Fiction", "Auto"
    aspect_ratio=aspect_ratio,  # "1:1", "16:9", "9:16", "4:3", "3:4"
    rendering_speed=rendering_speed,  # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image

# Standard generation (no base avatar)
generate_image(
    prompt=image_prompt,
    options={
        "provider": None,  # Auto-select
        "width": request.width,
        "height": request.height,
    },
    user_id=user_id
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)
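
The choice between the two modes can be sketched as a small dispatcher. This is a minimal illustration, not the handler's actual code: `generate_podcast_image`, `character_fn`, and `standard_fn` are hypothetical names, and the two callables stand in for `generate_character_image()` and `generate_image()` so the sketch runs without the backend. The default dimensions are also an assumption.

```python
from typing import Callable, Optional, Tuple


def generate_podcast_image(
    prompt: str,
    user_id: str,
    base_avatar_bytes: Optional[bytes],
    character_fn: Callable[..., bytes],
    standard_fn: Callable[..., bytes],
    width: int = 1024,
    height: int = 576,
) -> Tuple[str, bytes]:
    """Pick the generation mode based on whether a base avatar is available."""
    if base_avatar_bytes:
        # Character-consistent path: the reference image anchors the likeness.
        image = character_fn(
            prompt=prompt,
            reference_image_bytes=base_avatar_bytes,
            user_id=user_id,
        )
        return "character", image
    # Standard path: provider=None defers the provider choice to the backend.
    image = standard_fn(
        prompt=prompt,
        options={"provider": None, "width": width, "height": height},
        user_id=user_id,
    )
    return "standard", image
```

Passing the two generators in as arguments keeps the branch easy to test with fakes; the real handler presumably imports them directly.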

Prompt Building Strategy:

  • Scene Context: Scene title, content preview, visual keywords
  • Podcast Theme: Idea/topic context
  • Technical Requirements: 16:9 aspect ratio, video-optimized composition
  • Style Constraints: Realistic photography, professional broadcast quality
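
One way to combine these four ingredients into a single prompt string is sketched below. The function name, parameter names, and the exact wording of each fragment are assumptions for illustration; the real prompt builder may phrase and order things differently.

```python
from typing import Sequence


def build_podcast_image_prompt(
    scene_title: str,
    content_preview: str,
    visual_keywords: Sequence[str],
    podcast_topic: str,
) -> str:
    """Assemble scene context, theme, and fixed constraints into one prompt."""
    parts = [
        f"Scene: {scene_title}",
        f"Context: {content_preview}",
        f"Podcast topic: {podcast_topic}",
    ]
    if visual_keywords:
        parts.append("Visual keywords: " + ", ".join(visual_keywords))
    # Fixed technical and style constraints listed in this section.
    parts.append("16:9 aspect ratio, video-optimized composition")
    parts.append("Realistic photography, professional broadcast quality")
    return ". ".join(parts)
```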

Error Handling:

  • Character Generation Failure: Raises HTTPException (no fallback to standard)
  • Timeout/Connection Issues: Returns 504 with retry recommendation
  • Other Errors: Returns 502 with error details
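
The status-code mapping described above could look roughly like this. It is a sketch under assumptions: `map_generation_error` is a hypothetical helper, the built-in `TimeoutError` and `ConnectionError` stand in for whatever exception types the providers actually raise, and the message text is invented.

```python
from typing import Tuple


def map_generation_error(exc: Exception) -> Tuple[int, str]:
    """Translate a generation failure into an HTTP status and message."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        # Transient upstream problem: tell the client a retry is reasonable.
        return 504, "Image generation timed out or the provider was unreachable. Please retry."
    # Anything else is reported as a bad upstream response, with details.
    return 502, f"Image generation failed: {exc}"
```

In a FastAPI handler the returned pair would typically feed `HTTPException(status_code=..., detail=...)`.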

2. Story Writer (backend/services/story_writer/image_generation_service.py)

Key Features:

  • Simple Wrapper: Thin service layer around generate_image()
  • Batch Processing: Generates images for multiple scenes sequentially
  • Progress Callbacks: Supports progress tracking for batch operations
  • Error Resilience: Continues with next scene if one fails

Model Usage:

# Single scene generation
generate_image(
    prompt=image_prompt,  # From scene.image_prompt
    options={
        "provider": provider,  # Optional, can be None for auto-select
        "width": width,  # Default: 1024
        "height": height,  # Default: 1024
        "model": model,  # Optional
    },
    user_id=user_id
)

# Batch generation
generate_scene_images(
    scenes=scenes_data,
    user_id=user_id,
    provider=request.provider,  # Optional
    width=request.width or 1024,
    height=request.height or 1024,
    model=request.model,  # Optional
    progress_callback=progress_callback  # Optional
)

Prompt Strategy:

  • Direct Use: Uses scene.image_prompt directly (no prompt building)
  • Pre-generated: Prompts are created during story outline phase
  • No Modification: Service doesn't modify prompts

Error Handling:

  • HTTPException: Re-raised (e.g., 429 subscription limits)
  • Other Exceptions: Wrapped in RuntimeError for single-scene calls; batch generation records the error and continues with the next scene
  • Partial Success: Returns results with error field for failed scenes
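
The batch behavior (sequential processing, progress callbacks, partial success) can be sketched as follows. This is an illustration, not the service's code: `generate_scene_images_sketch` and `generate_fn` are hypothetical, the result keys other than `error` and `image_url` are assumptions, and `QuotaExceeded` is a stand-in for an `HTTPException` with status 429 so the example runs without FastAPI.

```python
from typing import Any, Callable, Dict, List, Optional


class QuotaExceeded(Exception):
    """Hypothetical stand-in for an HTTPException carrying a 429 status."""


def generate_scene_images_sketch(
    scenes: List[Dict[str, Any]],
    generate_fn: Callable[[str], str],
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> List[Dict[str, Any]]:
    """Generate one image per scene, recording failures instead of aborting."""
    results: List[Dict[str, Any]] = []
    total = len(scenes)
    for index, scene in enumerate(scenes, start=1):
        try:
            image_url = generate_fn(scene["image_prompt"])
            results.append({"scene": scene.get("title"), "image_url": image_url, "error": None})
        except QuotaExceeded:
            # A subscription limit affects every remaining scene, so re-raise.
            raise
        except Exception as exc:
            # Any other failure is recorded and the loop moves on.
            results.append({"scene": scene.get("title"), "image_url": None, "error": str(exc)})
        if progress_callback:
            progress_callback(index, total)
    return results
```

Callers can then filter on the `error` field to decide which scenes to retry.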

3. Blog Writer (frontend/src/components/ImageGen/ImageGenerator.tsx)

Key Features:

  • Provider Selection: User can choose WaveSpeed, HuggingFace, or Stability
  • Model Selection: Dropdown based on selected provider
  • Dimension Validation: Frontend validation with model-specific limits
  • Prompt Optimization: "Optimize Prompt" button for blog-optimized prompts
  • Cost Display: Shows cost information for WaveSpeed models

Model Usage:

// Frontend component
const req: ImageGenerationRequest = {
  prompt,
  negative_prompt: negative,
  provider,  // 'wavespeed' | 'huggingface' | 'stability'
  model,  // e.g., 'qwen-image', 'ideogram-v3-turbo'
  width,
  height
};

# Backend routing (main_image_generation.py, Python)
# Auto-detects WaveSpeed models and remaps the provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"

Available Models:

  • WaveSpeed: qwen-image ($0.05), ideogram-v3-turbo ($0.10)
  • HuggingFace: black-forest-labs/FLUX.1-Krea-dev, black-forest-labs/FLUX.1-dev, runwayml/flux-dev
  • Stability AI: stable-diffusion-xl-1024-v1-0, stable-diffusion-xl-base-1.0

Dimension Limits:

  • WaveSpeed Models: Max 1024x1024
  • Other Models: Max 2048x2048
  • Frontend Validation: Clamps dimensions and shows errors
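
The clamping rule can be expressed compactly. The actual validation lives in the TypeScript component; the Python sketch below only mirrors the limits stated here so it matches the backend examples in this document. `max_dimension` and `clamp_dimensions` are hypothetical names, and the error wording is invented.

```python
from typing import List, Tuple

# Models the document lists as WaveSpeed-hosted.
WAVESPEED_MODELS = {"qwen-image", "ideogram-v3-turbo"}


def max_dimension(model: str) -> int:
    """Return the per-side pixel limit for a model (1024 or 2048)."""
    return 1024 if model.lower() in WAVESPEED_MODELS else 2048


def clamp_dimensions(model: str, width: int, height: int) -> Tuple[int, int, List[str]]:
    """Clamp width/height to the model limit and report what was adjusted."""
    limit = max_dimension(model)
    errors: List[str] = []
    if width > limit:
        errors.append(f"width {width} exceeds the {limit}px limit for {model}")
    if height > limit:
        errors.append(f"height {height} exceeds the {limit}px limit for {model}")
    return min(width, limit), min(height, limit), errors
```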

Prompt Optimization:

  • Backend Endpoint: /api/images/suggest-prompts
  • Blog-Optimized: Focuses on data visualization, infographics, text overlay areas
  • Context-Aware: Uses title, section, research, persona for better prompts

4. Common Patterns & Best Practices

Provider Selection:

# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None}  # Let _select_provider() decide

# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"}  # User or service specifies

# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name
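
For Pattern 1, environment-based auto-selection might work along these lines. Everything specific here is an assumption: the environment variable names, the preference order, and the error raised when nothing is configured are illustrative, and the real `_select_provider()` may behave differently.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names and priority order, for illustration only.
PROVIDER_ENV_KEYS = (
    ("wavespeed", "WAVESPEED_API_KEY"),
    ("huggingface", "HF_TOKEN"),
    ("stability", "STABILITY_API_KEY"),
)


def select_provider(explicit: Optional[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Honor an explicit provider, else pick the first one with credentials."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    for provider, key in PROVIDER_ENV_KEYS:
        if env.get(key):
            return provider
    raise RuntimeError("No image provider is configured")
```

Accepting the environment as a parameter keeps the function testable without touching real credentials.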

Model Routing:

# Backend auto-detection (main_image_generation.py)
# Detects WaveSpeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
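
Wrapped as a function, the same remapping logic becomes easy to exercise in isolation. `resolve_provider` is a hypothetical name, and normalizing a missing model to an empty string is an assumption about how `model_lower` is derived.

```python
from typing import Optional

WAVESPEED_MODELS = ["qwen-image", "ideogram-v3-turbo"]


def resolve_provider(provider_name: Optional[str], model: Optional[str]) -> Optional[str]:
    """Force the WaveSpeed provider when the model is WaveSpeed-only."""
    model_lower = (model or "").lower()
    if model_lower in WAVESPEED_MODELS and provider_name != "wavespeed":
        return "wavespeed"
    # Otherwise keep whatever the caller asked for (possibly None).
    return provider_name
```

For example, a request that names `huggingface` together with `qwen-image` would be routed to WaveSpeed rather than failing at the HuggingFace provider.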

Error Handling:

# Pattern 1: Re-raise HTTPExceptions (subscription limits)
except HTTPException:
    raise

# Pattern 2: Wrap in RuntimeError (Story Writer)
except Exception as e:
    raise RuntimeError(f"Failed to generate image: {e}") from e

# Pattern 3: Return error in result (Story Writer batch)
image_results.append({
    "error": str(e),
    "image_url": None,
})

Subscription Validation:

# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
    pricing_service=pricing_service,
    user_id=user_id,
    num_images=1
)

# Built-in validation (main_image_generation.py)
_validate_image_operation(
    user_id=user_id,
    operation_type="image-generation",
    num_operations=1,
)

5. Key Differences

| Feature | Podcast Maker | Story Writer | Blog Writer |
|---|---|---|---|
| Provider Selection | Auto-select | Optional explicit | User selects |
| Model Selection | Auto (Character) or Auto-select | Optional explicit | User selects |
| Prompt Building | Custom podcast prompts | Pre-generated | User + optimization |
| Dimension Limits | No validation | No validation | Frontend validation |
| Error Handling | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
| Cost Display | Estimated in response | Not shown | Shown in UI |
| Special Features | Character consistency | Batch processing | Prompt optimization |

6. Recommendations for Blog Writer

✅ Already Implemented:

  1. Provider/model selection UI
  2. Dimension validation
  3. Model-based provider remapping
  4. Cost information display
  5. Prompt optimization

🔄 Could Improve:

  1. Pre-flight Validation: Add subscription checks before API calls (like Podcast Maker)
  2. Error Messages: More specific error messages based on error type
  3. Batch Generation: Support generating multiple images for blog sections
  4. Progress Tracking: Show progress for multiple image generations
  5. Retry Logic: Automatic retry for transient failures
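
For the retry suggestion, a small exponential-backoff helper is one possible shape. This is a sketch, not an existing ALwrity utility: `retry_transient` is a hypothetical name, the attempt count and delays are arbitrary, and the built-in `TimeoutError` and `ConnectionError` stand in for the provider's transient error types.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    transient: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == attempts:
                raise  # out of attempts: surface the last transient error
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Only transient errors are retried; anything else, including subscription-limit errors, propagates immediately so the user is not charged or delayed for a request that cannot succeed.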

📝 Implementation Notes:

  • Provider Routing: Backend correctly auto-detects WaveSpeed models
  • Dimension Limits: Frontend validation prevents invalid dimensions
  • Cost Tracking: Handled by centralized generate_image() function
  • Asset Library: Images are saved to asset library automatically

7. Model-Specific Details

WaveSpeed Models:

  • qwen-image: $0.05/image, max 1024x1024, fast generation
  • ideogram-v3-turbo: $0.10/image, max 1024x1024, superior text rendering
  • ideogram-character: $0.10/image, character consistency (Podcast only)

HuggingFace Models:

  • FLUX.1-Krea-dev: Photorealistic, optimized for blog images
  • FLUX.1-dev: General purpose
  • flux-dev: RunwayML variant

Stability AI Models:

  • SDXL 1024: Professional quality, $0.04/image
  • SDXL Base: Standard quality
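
The per-image prices listed above lend themselves to a simple cost estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the table contains just the prices this document states, and mapping "SDXL 1024" to the `stable-diffusion-xl-1024-v1-0` identifier is an assumption. Models without a listed price return `None` rather than a guess.

```python
from decimal import Decimal
from typing import Optional

# Per-image prices in USD, taken from the figures in this document.
PRICE_PER_IMAGE = {
    "qwen-image": Decimal("0.05"),
    "ideogram-v3-turbo": Decimal("0.10"),
    "ideogram-character": Decimal("0.10"),
    "stable-diffusion-xl-1024-v1-0": Decimal("0.04"),  # assumed to be "SDXL 1024"
}


def estimate_cost(model: str, num_images: int) -> Optional[Decimal]:
    """Return the estimated cost in USD, or None if no price is listed."""
    price = PRICE_PER_IMAGE.get(model)
    if price is None:
        return None
    return price * num_images
```

For instance, five `ideogram-v3-turbo` images come to 5 × $0.10 = $0.50, while three `qwen-image` images come to $0.15. `Decimal` avoids the rounding noise that binary floats introduce in currency arithmetic.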

8. Code References

Backend:

  • backend/services/llm_providers/main_image_generation.py - Core generation logic
  • backend/services/llm_providers/image_generation/wavespeed_provider.py - WaveSpeed implementation
  • backend/api/podcast/handlers/images.py - Podcast image generation
  • backend/services/story_writer/image_generation_service.py - Story Writer service
  • backend/api/images.py - Blog Writer image API

Frontend:

  • frontend/src/components/ImageGen/ImageGenerator.tsx - Blog Writer component
  • frontend/src/components/shared/ImageGenerationModal.tsx - Shared modal (Podcast/YouTube)
  • frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx - Story Writer UI

Summary

All three tools route standard generation through the centralized generate_image() function, but each takes a different approach:

  1. Podcast Maker: Specialized for character consistency, auto-selects providers
  2. Story Writer: Simple wrapper, batch processing, error resilient
  3. Blog Writer: User-controlled provider/model selection, frontend validation, prompt optimization

The Blog Writer implementation gives users the most direct control, with explicit provider, model, and dimension choices, while Podcast Maker focuses on character-consistent output and Story Writer prioritizes simplicity and batch operations.