Files

ajaysi 8193cdba67 AI Analysis and Content Strategy fixes. Enhanced Strategy Routes refactoring.

2026-01-10 19:32:50 +05:30

9.9 KiB

Raw Blame History

Image Generation Implementation Comparison

Overview

This document compares how Podcast Maker, Story Writer, and Blog Writer implement AI image generation, focusing on model selection, provider routing, and best practices.

1. Podcast Maker (`backend/api/podcast/handlers/images.py`)

Key Features:

Dual Mode: Character-consistent generation (Ideogram Character) vs. standard generation
Auto Provider Selection: Uses provider: None to auto-select based on environment
Specialized Prompt Building: Podcast-optimized prompts with scene context
Pre-flight Validation: Subscription checks before API calls

Model Usage:

# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
    prompt=image_prompt,
    reference_image_bytes=base_avatar_bytes,
    user_id=user_id,
    style=style,  # "Realistic", "Fiction", "Auto"
    aspect_ratio=aspect_ratio,  # "1:1", "16:9", "9:16", "4:3", "3:4"
    rendering_speed=rendering_speed,  # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image

# Standard generation (no base avatar)
generate_image(
    prompt=image_prompt,
    options={
        "provider": None,  # Auto-select
        "width": request.width,
        "height": request.height,
    },
    user_id=user_id
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)

Prompt Building Strategy:

Scene Context: Scene title, content preview, visual keywords
Podcast Theme: Idea/topic context
Technical Requirements: 16:9 aspect ratio, video-optimized composition
Style Constraints: Realistic photography, professional broadcast quality

Error Handling:

Character Generation Failure: Raises HTTPException (no fallback to standard)
Timeout/Connection Issues: Returns 504 with retry recommendation
Other Errors: Returns 502 with error details

2. Story Writer (`backend/services/story_writer/image_generation_service.py`)

Key Features:

Simple Wrapper: Thin service layer around generate_image()
Batch Processing: Generates images for multiple scenes sequentially
Progress Callbacks: Supports progress tracking for batch operations
Error Resilience: Continues with next scene if one fails

Model Usage:

# Single scene generation
generate_image(
    prompt=image_prompt,  # From scene.image_prompt
    options={
        "provider": provider,  # Optional, can be None for auto-select
        "width": width,  # Default: 1024
        "height": height,  # Default: 1024
        "model": model,  # Optional
    },
    user_id=user_id
)

# Batch generation
generate_scene_images(
    scenes=scenes_data,
    user_id=user_id,
    provider=request.provider,  # Optional
    width=request.width or 1024,
    height=request.height or 1024,
    model=request.model,  # Optional
    progress_callback=progress_callback  # Optional
)

Prompt Strategy:

Direct Use: Uses scene.image_prompt directly (no prompt building)
Pre-generated: Prompts are created during story outline phase
No Modification: Service doesn't modify prompts

Error Handling:

HTTPException: Re-raised (e.g., 429 subscription limits)
Other Exceptions: Wrapped in RuntimeError, continues with next scene
Partial Success: Returns results with error field for failed scenes

3. Blog Writer (`frontend/src/components/ImageGen/ImageGenerator.tsx`)

Key Features:

Provider Selection: User can choose WaveSpeed, HuggingFace, or Stability
Model Selection: Dropdown based on selected provider
Dimension Validation: Frontend validation with model-specific limits
Prompt Optimization: "Optimize Prompt" button for blog-optimized prompts
Cost Display: Shows cost information for WaveSpeed models

Model Usage:

// Frontend component
const req: ImageGenerationRequest = {
  prompt,
  negative_prompt: negative,
  provider,  // 'wavespeed' | 'huggingface' | 'stability'
  model,  // e.g., 'qwen-image', 'ideogram-v3-turbo'
  width,
  height
};

// Backend routing (main_image_generation.py)
// Auto-detects Wavespeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"

Available Models:

WaveSpeed: qwen-image ($0.05), ideogram-v3-turbo ($0.10)
HuggingFace: black-forest-labs/FLUX.1-Krea-dev, black-forest-labs/FLUX.1-dev, runwayml/flux-dev
Stability AI: stable-diffusion-xl-1024-v1-0, stable-diffusion-xl-base-1.0

Dimension Limits:

WaveSpeed Models: Max 1024x1024
Other Models: Max 2048x2048
Frontend Validation: Clamps dimensions and shows errors

Prompt Optimization:

Backend Endpoint: /api/images/suggest-prompts
Blog-Optimized: Focuses on data visualization, infographics, text overlay areas
Context-Aware: Uses title, section, research, persona for better prompts

4. Common Patterns & Best Practices

Provider Selection:

# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None}  # Let _select_provider() decide

# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"}  # User or service specifies

# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name

Model Routing:

# Backend auto-detection (main_image_generation.py)
# Detects Wavespeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"

Error Handling:

# Pattern 1: Re-raise HTTPExceptions (subscription limits)
except HTTPException:
    raise

# Pattern 2: Wrap in RuntimeError (Story Writer)
except Exception as e:
    raise RuntimeError(f"Failed to generate image: {str(e)}") from e

# Pattern 3: Return error in result (Story Writer batch)
image_results.append({
    "error": str(e),
    "image_url": None,
})

Subscription Validation:

# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
    pricing_service=pricing_service,
    user_id=user_id,
    num_images=1
)

# Built-in validation (main_image_generation.py)
_validate_image_operation(
    user_id=user_id,
    operation_type="image-generation",
    num_operations=1,
)

5. Key Differences

Feature	Podcast Maker	Story Writer	Blog Writer
Provider Selection	Auto-select	Optional explicit	User selects
Model Selection	Auto (Character) or Auto-select	Optional explicit	User selects
Prompt Building	Custom podcast prompts	Pre-generated	User + optimization
Dimension Limits	No validation	No validation	Frontend validation
Error Handling	Strict (no fallback)	Resilient (continues)	User-friendly alerts
Cost Display	Estimated in response	Not shown	Shown in UI
Special Features	Character consistency	Batch processing	Prompt optimization

6. Recommendations for Blog Writer

✅ Already Implemented:

✅ Provider/model selection UI
✅ Dimension validation
✅ Model-based provider remapping
✅ Cost information display
✅ Prompt optimization

🔄 Could Improve:

Pre-flight Validation: Add subscription checks before API calls (like Podcast Maker)
Error Messages: More specific error messages based on error type
Batch Generation: Support generating multiple images for blog sections
Progress Tracking: Show progress for multiple image generations
Retry Logic: Automatic retry for transient failures

📝 Implementation Notes:

Provider Routing: Backend correctly auto-detects Wavespeed models
Dimension Limits: Frontend validation prevents invalid dimensions
Cost Tracking: Handled by centralized generate_image() function
Asset Library: Images are saved to asset library automatically

7. Model-Specific Details

WaveSpeed Models:

qwen-image: $0.05/image, max 1024x1024, fast generation
ideogram-v3-turbo: $0.10/image, max 1024x1024, superior text rendering
ideogram-character: $0.10/image, character consistency (Podcast only)

HuggingFace Models:

FLUX.1-Krea-dev: Photorealistic, optimized for blog images
FLUX.1-dev: General purpose
flux-dev: RunwayML variant

Stability AI Models:

SDXL 1024: Professional quality, $0.04/image
SDXL Base: Standard quality

8. Code References

Backend:

backend/services/llm_providers/main_image_generation.py - Core generation logic
backend/services/llm_providers/image_generation/wavespeed_provider.py - WaveSpeed implementation
backend/api/podcast/handlers/images.py - Podcast image generation
backend/services/story_writer/image_generation_service.py - Story Writer service
backend/api/images.py - Blog Writer image API

Frontend:

frontend/src/components/ImageGen/ImageGenerator.tsx - Blog Writer component
frontend/src/components/shared/ImageGenerationModal.tsx - Shared modal (Podcast/YouTube)
frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx - Story Writer UI

Summary

All three tools use the centralized generate_image() function but with different approaches:

Podcast Maker: Specialized for character consistency, auto-selects providers
Story Writer: Simple wrapper, batch processing, error resilient
Blog Writer: User-controlled provider/model selection, frontend validation, prompt optimization

The Blog Writer implementation is the most user-friendly with explicit controls, while Podcast Maker focuses on specialized use cases and Story Writer prioritizes simplicity and batch operations.

9.9 KiB Raw Blame History

Image Generation Implementation Comparison

Overview

1. Podcast Maker (backend/api/podcast/handlers/images.py)

Key Features:

Model Usage:

Prompt Building Strategy:

Error Handling:

2. Story Writer (backend/services/story_writer/image_generation_service.py)

Key Features:

Model Usage:

Prompt Strategy:

Error Handling:

3. Blog Writer (frontend/src/components/ImageGen/ImageGenerator.tsx)

Key Features:

Model Usage:

Available Models:

Dimension Limits:

Prompt Optimization:

4. Common Patterns & Best Practices

Provider Selection:

Model Routing:

Error Handling:

Subscription Validation:

5. Key Differences

6. Recommendations for Blog Writer

✅ Already Implemented:

🔄 Could Improve:

📝 Implementation Notes:

7. Model-Specific Details

WaveSpeed Models:

HuggingFace Models:

Stability AI Models:

8. Code References

Backend:

Frontend:

Summary

9.9 KiB

Raw Blame History

1. Podcast Maker (`backend/api/podcast/handlers/images.py`)

2. Story Writer (`backend/services/story_writer/image_generation_service.py`)

3. Blog Writer (`frontend/src/components/ImageGen/ImageGenerator.tsx`)