ALwrity/docs/image-generation-comparison.md

# Image Generation Implementation Comparison

## Overview
This document compares how **Podcast Maker**, **Story Writer**, and **Blog Writer** implement AI image generation, focusing on model selection, provider routing, and best practices.

---

## 1. **Podcast Maker** (`backend/api/podcast/handlers/images.py`)

### Key Features:
- **Dual Mode**: Character-consistent generation (Ideogram Character) vs. standard generation
- **Auto Provider Selection**: Passes `"provider": None` so the backend auto-selects a provider based on the environment
- **Specialized Prompt Building**: Podcast-optimized prompts with scene context
- **Pre-flight Validation**: Subscription checks before API calls

### Model Usage:
```python
# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
    prompt=image_prompt,
    reference_image_bytes=base_avatar_bytes,
    user_id=user_id,
    style=style,  # "Realistic", "Fiction", "Auto"
    aspect_ratio=aspect_ratio,  # "1:1", "16:9", "9:16", "4:3", "3:4"
    rendering_speed=rendering_speed,  # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image

# Standard generation (no base avatar)
generate_image(
    prompt=image_prompt,
    options={
        "provider": None,  # Auto-select
        "width": request.width,
        "height": request.height,
    },
    user_id=user_id
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)
```

### Prompt Building Strategy:
- **Scene Context**: Scene title, content preview, visual keywords
- **Podcast Theme**: Idea/topic context
- **Technical Requirements**: 16:9 aspect ratio, video-optimized composition
- **Style Constraints**: Realistic photography, professional broadcast quality
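
The four ingredient groups above could be combined roughly as follows. This is a minimal sketch, not the handler's actual code: the function name `build_podcast_image_prompt`, its parameters, and the 200-character preview cutoff are all assumptions made for illustration.

```python
from typing import Sequence


def build_podcast_image_prompt(
    scene_title: str,
    content_preview: str,
    visual_keywords: Sequence[str],
    podcast_idea: str,
) -> str:
    """Assemble one prompt string from scene context, theme, and fixed constraints.

    Hypothetical helper: names and the 200-character cutoff are illustrative.
    """
    parts = [
        # Scene context: title plus a short preview of the scene content
        f"Scene: {scene_title.strip()}",
        f"Context: {content_preview.strip()[:200]}",
    ]
    if visual_keywords:
        parts.append("Visual elements: " + ", ".join(visual_keywords))
    # Podcast theme: the overall idea/topic
    parts.append(f"Podcast topic: {podcast_idea.strip()}")
    # Technical requirements and style constraints from the list above
    parts.append("16:9 aspect ratio, video-optimized composition")
    parts.append("Realistic photography, professional broadcast quality")
    return ". ".join(parts)
```

Keeping the fixed constraints at the end means the scene-specific context leads the prompt, which is a design choice of this sketch rather than something the source confirms.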

### Error Handling:
- **Character Generation Failure**: Raises HTTPException (no fallback to standard generation)
- **Timeout/Connection Issues**: Returns 504 with retry recommendation
- **Other Errors**: Returns 502 with error details
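
The status-code mapping above can be sketched as a small classifier. The function name `classify_generation_error` and the choice of Python's built-in `TimeoutError` and `ConnectionError` as the "transient" exception types are assumptions; the real handler may inspect provider-specific exceptions or error strings instead.

```python
def classify_generation_error(exc: Exception) -> tuple[int, str]:
    """Map a provider failure to an (HTTP status, detail) pair.

    Hypothetical helper: a handler could pass the result to
    HTTPException(status_code=..., detail=...).
    """
    # Timeouts and connection problems are treated as transient: 504 plus a retry hint
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return 504, f"Image generation timed out or could not connect; please retry: {exc}"
    # Everything else is reported as an upstream failure with its details
    return 502, f"Image generation failed: {exc}"
```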

---

## 2. **Story Writer** (`backend/services/story_writer/image_generation_service.py`)

### Key Features:
- **Simple Wrapper**: Thin service layer around `generate_image()`
- **Batch Processing**: Generates images for multiple scenes sequentially
- **Progress Callbacks**: Supports progress tracking for batch operations
- **Error Resilience**: Continues with next scene if one fails

### Model Usage:
```python
# Single scene generation
generate_image(
    prompt=image_prompt,  # From scene.image_prompt
    options={
        "provider": provider,  # Optional, can be None for auto-select
        "width": width,  # Default: 1024
        "height": height,  # Default: 1024
        "model": model,  # Optional
    },
    user_id=user_id
)

# Batch generation
generate_scene_images(
    scenes=scenes_data,
    user_id=user_id,
    provider=request.provider,  # Optional
    width=request.width or 1024,
    height=request.height or 1024,
    model=request.model,  # Optional
    progress_callback=progress_callback  # Optional
)
```

### Prompt Strategy:
- **Direct Use**: Uses `scene.image_prompt` directly (no prompt building)
- **Pre-generated**: Prompts are created during story outline phase
- **No Modification**: Service doesn't modify prompts

### Error Handling:
- **HTTPException**: Re-raised (e.g., 429 subscription limits)
- **Other Exceptions**: Wrapped in RuntimeError by the single-scene call; the batch loop records the failure and continues with the next scene
- **Partial Success**: Returns results with error field for failed scenes
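
A rough sketch of how the batch behavior described above might look. The names `generate_scene_images_sketch`, `QuotaExceeded`, and the `generate_one` callable are hypothetical; `QuotaExceeded` stands in for the `HTTPException` that the real service re-raises, and the result keys other than `error` and `image_url` are assumptions.

```python
from typing import Any, Callable, Optional


class QuotaExceeded(Exception):
    """Stand-in for the HTTPException (e.g. 429) that must abort the batch."""


def generate_scene_images_sketch(
    scenes: list[dict[str, Any]],
    generate_one: Callable[[str], str],
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> list[dict[str, Any]]:
    """Generate images scene by scene, recording failures instead of stopping."""
    results: list[dict[str, Any]] = []
    total = len(scenes)
    for index, scene in enumerate(scenes, start=1):
        try:
            image_url = generate_one(scene["image_prompt"])
            results.append({"scene_number": index, "image_url": image_url, "error": None})
        except QuotaExceeded:
            # Subscription-style errors are re-raised so the caller sees them
            raise
        except Exception as exc:
            # Any other failure is stored on the result; the loop continues
            results.append({"scene_number": index, "image_url": None, "error": str(exc)})
        if progress_callback is not None:
            progress_callback(index, total)
    return results
```

Callers can then separate successes from failures by checking each result's `error` field.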

---

## 3. **Blog Writer** (`frontend/src/components/ImageGen/ImageGenerator.tsx`)

### Key Features:
- **Provider Selection**: User can choose WaveSpeed, HuggingFace, or Stability
- **Model Selection**: Dropdown based on selected provider
- **Dimension Validation**: Frontend validation with model-specific limits
- **Prompt Optimization**: "Optimize Prompt" button for blog-optimized prompts
- **Cost Display**: Shows cost information for WaveSpeed models

### Model Usage:
```typescript
// Frontend component
const req: ImageGenerationRequest = {
  prompt,
  negative_prompt: negative,
  provider,  // 'wavespeed' | 'huggingface' | 'stability'
  model,  // e.g., 'qwen-image', 'ideogram-v3-turbo'
  width,
  height
};
```

```python
# Backend routing (main_image_generation.py)
# Auto-detects WaveSpeed models and remaps the provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
```

### Available Models:
- **WaveSpeed**: `qwen-image` ($0.05), `ideogram-v3-turbo` ($0.10)
- **HuggingFace**: `black-forest-labs/FLUX.1-Krea-dev`, `black-forest-labs/FLUX.1-dev`, `runwayml/flux-dev`
- **Stability AI**: `stable-diffusion-xl-1024-v1-0`, `stable-diffusion-xl-base-1.0`

### Dimension Limits:
- **WaveSpeed Models**: Max 1024x1024
- **Other Models**: Max 2048x2048
- **Frontend Validation**: Clamps dimensions and shows errors
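
The clamping rule can be illustrated in a few lines. The actual validation lives in the TypeScript component; this Python sketch only mirrors the limits listed above, and the names `clamp_dimensions` and `max_dimension_for` as well as the message wording are assumptions.

```python
WAVESPEED_MODELS = {"qwen-image", "ideogram-v3-turbo"}


def max_dimension_for(model: str) -> int:
    """Return the per-side pixel limit: 1024 for WaveSpeed models, 2048 otherwise."""
    return 1024 if model.lower() in WAVESPEED_MODELS else 2048


def clamp_dimensions(model: str, width: int, height: int) -> tuple[int, int, list[str]]:
    """Clamp width/height to the model's limit and collect user-facing messages."""
    limit = max_dimension_for(model)
    messages = []
    for name, value in (("width", width), ("height", height)):
        if value > limit:
            messages.append(f"{name} {value} exceeds {limit} for {model}")
    return min(width, limit), min(height, limit), messages
```

Returning the messages alongside the clamped values lets a UI both correct the input and explain why it changed.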

### Prompt Optimization:
- **Backend Endpoint**: `/api/images/suggest-prompts`
- **Blog-Optimized**: Focuses on data visualization, infographics, text overlay areas
- **Context-Aware**: Uses title, section, research, persona for better prompts

---

## 4. **Common Patterns & Best Practices**

### Provider Selection:
```python
# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None}  # Let _select_provider() decide

# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"}  # User or service specifies

# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name
```

### Model Routing:
```python
# Backend auto-detection (main_image_generation.py)
# Detects WaveSpeed models and remaps the provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
```
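
For readers who want to try the remapping in isolation, here is a self-contained version of the fragment above. The wrapper function `resolve_provider` and its lowercase normalization of both arguments are assumptions; the fragment does not show how `model_lower` and `provider_name` are derived in `main_image_generation.py`.

```python
from typing import Optional

WAVESPEED_MODELS = ("qwen-image", "ideogram-v3-turbo")


def resolve_provider(provider: Optional[str], model: Optional[str]) -> Optional[str]:
    """Return the provider to use, forcing "wavespeed" for WaveSpeed-only models."""
    provider_name = provider.lower() if provider else None
    model_lower = model.lower() if model else ""
    if model_lower in WAVESPEED_MODELS and provider_name != "wavespeed":
        return "wavespeed"
    # None is passed through so that auto-selection can still happen downstream
    return provider_name
```

With this sketch, `resolve_provider("huggingface", "qwen-image")` yields `"wavespeed"`, while a request with no model keeps whatever provider was given.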

### Error Handling:
```python
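# Pattern 1: Re-raise HTTPExceptions unchanged (e.g., 429 subscription limits)
try:
    result = generate_image(prompt=image_prompt, options=options, user_id=user_id)
except HTTPException:
    raise
# Pattern 2: Wrap anything else in RuntimeError (Story Writer, single scene)
except Exception as e:
    raise RuntimeError(f"Failed to generate image: {e}") from e

# Pattern 3: Return error in result (Story Writer batch)
# Runs inside the batch loop's `except Exception as e:` handler
image_results.append({
    "error": str(e),
    "image_url": None,
})
```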

### Subscription Validation:
```python
# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
    pricing_service=pricing_service,
    user_id=user_id,
    num_images=1
)

# Built-in validation (main_image_generation.py)
_validate_image_operation(
    user_id=user_id,
    operation_type="image-generation",
    num_operations=1,
)
```

---

## 5. **Key Differences**

| Feature | Podcast Maker | Story Writer | Blog Writer |
|---------|---------------|--------------|-------------|
| **Provider Selection** | Auto-select | Optional explicit | User selects |
| **Model Selection** | Ideogram Character (with base avatar), otherwise auto-select | Optional explicit | User selects |
| **Prompt Building** | Custom podcast prompts | Pre-generated | User + optimization |
| **Dimension Limits** | No validation | No validation | Frontend validation |
| **Error Handling** | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
| **Cost Display** | Estimated in response | Not shown | Shown in UI |
| **Special Features** | Character consistency | Batch processing | Prompt optimization |

---

## 6. **Recommendations for Blog Writer**

### ✅ Already Implemented:
1. ✅ Provider/model selection UI
2. ✅ Dimension validation
3. ✅ Model-based provider remapping
4. ✅ Cost information display
5. ✅ Prompt optimization

### 🔄 Could Improve:
1. **Pre-flight Validation**: Add subscription checks before API calls (like Podcast Maker)
2. **Error Messages**: More specific error messages based on error type
3. **Batch Generation**: Support generating multiple images for blog sections
4. **Progress Tracking**: Show progress for multiple image generations
5. **Retry Logic**: Automatic retry for transient failures
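
As one possible shape for the retry suggestion, the sketch below retries only timeout and connection errors with exponential backoff. Nothing here exists in the codebase: `retry_transient`, its parameters, and the choice of which exceptions count as transient are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on TimeoutError/ConnectionError with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # Out of retries: surface the last transient error
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Non-transient exceptions (for example subscription errors) propagate immediately, so a retry wrapper like this would not mask quota failures. The `sleep` parameter is injectable mainly to keep tests fast.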

### 📝 Implementation Notes:
- **Provider Routing**: Backend correctly auto-detects WaveSpeed models
- **Dimension Limits**: Frontend validation prevents invalid dimensions
- **Cost Tracking**: Handled by centralized `generate_image()` function
- **Asset Library**: Images are saved to asset library automatically

---

## 7. **Model-Specific Details**

### WaveSpeed Models:
- **qwen-image**: $0.05/image, max 1024x1024, fast generation
- **ideogram-v3-turbo**: $0.10/image, max 1024x1024, superior text rendering
- **ideogram-character**: $0.10/image, character consistency (Podcast only)

### HuggingFace Models:
- **FLUX.1-Krea-dev**: Photorealistic, optimized for blog images
- **FLUX.1-dev**: General purpose
- **flux-dev**: RunwayML variant

### Stability AI Models:
- **SDXL 1024**: Professional quality, $0.04/image
- **SDXL Base**: Standard quality

---

## 8. **Code References**

### Backend:
- `backend/services/llm_providers/main_image_generation.py` - Core generation logic
- `backend/services/llm_providers/image_generation/wavespeed_provider.py` - WaveSpeed implementation
- `backend/api/podcast/handlers/images.py` - Podcast image generation
- `backend/services/story_writer/image_generation_service.py` - Story Writer service
- `backend/api/images.py` - Blog Writer image API

### Frontend:
- `frontend/src/components/ImageGen/ImageGenerator.tsx` - Blog Writer component
- `frontend/src/components/shared/ImageGenerationModal.tsx` - Shared modal (Podcast/YouTube)
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx` - Story Writer UI

---

## Summary

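All three tools route through the centralized `generate_image()` function (Podcast Maker additionally calls `generate_character_image()` when a base avatar is provided), but each takes a different approach: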

1. **Podcast Maker**: Specialized for character consistency, auto-selects providers
2. **Story Writer**: Simple wrapper, batch processing, error resilient
3. **Blog Writer**: User-controlled provider/model selection, frontend validation, prompt optimization

The Blog Writer implementation gives users the most explicit control over provider, model, and dimensions, while Podcast Maker focuses on character-consistent scenes and Story Writer prioritizes simplicity and batch operations.