# Image Generation Implementation Comparison
## Overview
This document compares how **Podcast Maker**, **Story Writer**, and **Blog Writer** implement AI image generation, focusing on model selection, provider routing, and best practices.

---
## 1. **Podcast Maker** (`backend/api/podcast/handlers/images.py`)
### Key Features:
- **Dual Mode**: Character-consistent generation (Ideogram Character) vs. standard generation
- **Auto Provider Selection**: Uses `provider: None` to auto-select based on environment
- **Specialized Prompt Building**: Podcast-optimized prompts with scene context
- **Pre-flight Validation**: Subscription checks before API calls
### Model Usage:
```python
# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
    prompt=image_prompt,
    reference_image_bytes=base_avatar_bytes,
    user_id=user_id,
    style=style,  # "Realistic", "Fiction", "Auto"
    aspect_ratio=aspect_ratio,  # "1:1", "16:9", "9:16", "4:3", "3:4"
    rendering_speed=rendering_speed,  # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image

# Standard generation (no base avatar)
generate_image(
    prompt=image_prompt,
    options={
        "provider": None,  # Auto-select
        "width": request.width,
        "height": request.height,
    },
    user_id=user_id,
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)
```
### Prompt Building Strategy:
- **Scene Context**: Scene title, content preview, visual keywords
- **Podcast Theme**: Idea/topic context
- **Technical Requirements**: 16:9 aspect ratio, video-optimized composition
- **Style Constraints**: Realistic photography, professional broadcast quality
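
As a rough illustration of how the four ingredients above could be combined, here is a minimal sketch. The `SceneContext` class, the `build_podcast_image_prompt` function, and the exact wording of each prompt fragment are assumptions made for this example; the actual handler in `images.py` may assemble its prompt differently.

```python
from dataclasses import dataclass, field


@dataclass
class SceneContext:
    """Hypothetical container for the scene fields listed above."""
    title: str
    content_preview: str
    visual_keywords: list[str] = field(default_factory=list)


def build_podcast_image_prompt(scene: SceneContext, podcast_idea: str) -> str:
    """Join scene context, theme, technical and style requirements into one prompt."""
    parts = [
        f"Scene: {scene.title}",
        f"Context: {scene.content_preview[:200]}",  # keep the preview short
    ]
    if scene.visual_keywords:
        parts.append("Visual elements: " + ", ".join(scene.visual_keywords))
    parts.append(f"Podcast topic: {podcast_idea}")
    # Technical requirements and style constraints from the list above
    parts.append("16:9 aspect ratio, video-optimized composition")
    parts.append("Realistic photography, professional broadcast quality")
    return ". ".join(parts)


prompt = build_podcast_image_prompt(
    SceneContext("Opening", "The hosts introduce the topic.", ["studio", "microphones"]),
    podcast_idea="AI in healthcare",
)
```

Keeping the technical and style fragments at the end means every scene prompt closes with the same constraints, regardless of how much scene context is available.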
### Error Handling:
- **Character Generation Failure**: Raises HTTPException (no fallback to standard)
- **Timeout/Connection Issues**: Returns 504 with retry recommendation
- **Other Errors**: Returns 502 with error details
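
The status-code mapping above can be sketched as a small classifier. This is an assumption-laden illustration: `classify_generation_error` is a hypothetical helper, and the real handler may detect timeouts differently (for example by inspecting provider-specific exceptions or message text) before raising `HTTPException`.

```python
def classify_generation_error(exc: Exception) -> tuple[int, str]:
    """Map a provider exception to an (HTTP status, detail) pair."""
    # Timeouts and connection failures are treated as transient: 504 plus a retry hint.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return 504, f"Image provider timed out or was unreachable; please retry. ({exc})"
    # Everything else is reported as an upstream failure with the error details.
    return 502, f"Image generation failed: {exc}"


status, detail = classify_generation_error(TimeoutError("read timed out"))
```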
---
## 2. **Story Writer** (`backend/services/story_writer/image_generation_service.py`)
### Key Features:
- **Simple Wrapper**: Thin service layer around `generate_image()`
- **Batch Processing**: Generates images for multiple scenes sequentially
- **Progress Callbacks**: Supports progress tracking for batch operations
- **Error Resilience**: Continues with next scene if one fails
### Model Usage:
```python
# Single scene generation
generate_image(
    prompt=image_prompt,  # From scene.image_prompt
    options={
        "provider": provider,  # Optional, can be None for auto-select
        "width": width,  # Default: 1024
        "height": height,  # Default: 1024
        "model": model,  # Optional
    },
    user_id=user_id,
)

# Batch generation
generate_scene_images(
    scenes=scenes_data,
    user_id=user_id,
    provider=request.provider,  # Optional
    width=request.width or 1024,
    height=request.height or 1024,
    model=request.model,  # Optional
    progress_callback=progress_callback,  # Optional
)
```
### Prompt Strategy:
- **Direct Use**: Uses `scene.image_prompt` directly (no prompt building)
- **Pre-generated**: Prompts are created during story outline phase
- **No Modification**: Service doesn't modify prompts
### Error Handling:
- **HTTPException**: Re-raised (e.g., 429 subscription limits)
- **Other Exceptions**: Wrapped in RuntimeError, continues with next scene
- **Partial Success**: Returns results with error field for failed scenes
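
The batch behavior described above (sequential generation, progress callbacks, partial success) might look roughly like the sketch below. Everything here is illustrative: `HTTPError` stands in for FastAPI's `HTTPException` so the example has no dependencies, `generate_scene_images_sketch` and `fake_generate` are hypothetical names, and the result field names are assumptions. For brevity the sketch records the original error message rather than wrapping it in `RuntimeError` first.

```python
from typing import Any, Callable, Optional


class HTTPError(Exception):
    """Stand-in for FastAPI's HTTPException so the sketch has no dependencies."""

    def __init__(self, status_code: int, detail: str = ""):
        super().__init__(detail)
        self.status_code = status_code


def generate_scene_images_sketch(
    scenes: list[dict[str, Any]],
    generate_fn: Callable[[str], str],
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> list[dict[str, Any]]:
    """Generate one image per scene sequentially, recording failures per scene."""
    results: list[dict[str, Any]] = []
    total = len(scenes)
    for index, scene in enumerate(scenes, start=1):
        try:
            image_url = generate_fn(scene["image_prompt"])
            results.append({"scene": scene.get("title"), "image_url": image_url, "error": None})
        except HTTPError:
            raise  # e.g. a 429 subscription limit aborts the whole batch
        except Exception as exc:
            # Any other failure is recorded and the loop moves on to the next scene
            results.append({"scene": scene.get("title"), "image_url": None, "error": str(exc)})
        if progress_callback is not None:
            progress_callback(index, total)
    return results


def fake_generate(prompt: str) -> str:
    """Hypothetical generator that fails on one prompt to show partial success."""
    if "broken" in prompt:
        raise ValueError("provider rejected prompt")
    return f"/images/{len(prompt)}.png"


progress: list[tuple[int, int]] = []
results = generate_scene_images_sketch(
    [
        {"title": "A", "image_prompt": "a castle"},
        {"title": "B", "image_prompt": "broken"},
        {"title": "C", "image_prompt": "a forest"},
    ],
    fake_generate,
    progress_callback=lambda done, total: progress.append((done, total)),
)
```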
---
## 3. **Blog Writer** (`frontend/src/components/ImageGen/ImageGenerator.tsx`)
### Key Features:
- **Provider Selection**: User can choose WaveSpeed, HuggingFace, or Stability
- **Model Selection**: Dropdown based on selected provider
- **Dimension Validation**: Frontend validation with model-specific limits
- **Prompt Optimization**: "Optimize Prompt" button for blog-optimized prompts
- **Cost Display**: Shows cost information for WaveSpeed models
### Model Usage:
```typescript
// Frontend component
const req: ImageGenerationRequest = {
  prompt,
  negative_prompt: negative,
  provider, // 'wavespeed' | 'huggingface' | 'stability'
  model, // e.g., 'qwen-image', 'ideogram-v3-turbo'
  width,
  height,
};

// Backend routing (main_image_generation.py, Python) auto-detects
// WaveSpeed models and remaps the provider:
//   wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
//   if model_lower in wavespeed_models and provider_name != "wavespeed":
//       provider_name = "wavespeed"
```
### Available Models:
- **WaveSpeed**: `qwen-image` ($0.05), `ideogram-v3-turbo` ($0.10)
- **HuggingFace**: `black-forest-labs/FLUX.1-Krea-dev`, `black-forest-labs/FLUX.1-dev`, `runwayml/flux-dev`
- **Stability AI**: `stable-diffusion-xl-1024-v1-0`, `stable-diffusion-xl-base-1.0`
### Dimension Limits:
- **WaveSpeed Models**: Max 1024x1024
- **Other Models**: Max 2048x2048
- **Frontend Validation**: Clamps dimensions and shows errors
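
To make the clamping rule concrete, here is a minimal sketch in Python (the actual validation lives in the TypeScript frontend). The function name `clamp_dimensions`, the message wording, and the choice to return messages alongside the clamped values are assumptions for illustration; only the 1024 and 2048 limits come from the list above.

```python
WAVESPEED_MODELS = {"qwen-image", "ideogram-v3-turbo"}
WAVESPEED_MAX = 1024  # limit stated above for WaveSpeed models
DEFAULT_MAX = 2048  # limit stated above for all other models


def clamp_dimensions(model: str, width: int, height: int) -> tuple[int, int, list[str]]:
    """Clamp width/height to the model's limit and collect validation messages."""
    limit = WAVESPEED_MAX if model.lower() in WAVESPEED_MODELS else DEFAULT_MAX
    errors: list[str] = []
    if width > limit:
        errors.append(f"Width {width} exceeds the {limit}px limit for {model}")
    if height > limit:
        errors.append(f"Height {height} exceeds the {limit}px limit for {model}")
    return min(width, limit), min(height, limit), errors


width, height, errors = clamp_dimensions("qwen-image", 2048, 512)
```

Returning the messages instead of raising lets a UI both correct the value and explain the correction to the user.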
### Prompt Optimization:
- **Backend Endpoint**: `/api/images/suggest-prompts`
- **Blog-Optimized**: Focuses on data visualization, infographics, text overlay areas
- **Context-Aware**: Uses title, section, research, persona for better prompts
---
## 4. **Common Patterns & Best Practices**
### Provider Selection:
```python
# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None} # Let _select_provider() decide
# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"} # User or service specifies
# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name
```
### Model Routing:
```python
# Backend auto-detection (main_image_generation.py)
# Detects WaveSpeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
```
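
The fragment above relies on variables defined elsewhere in `main_image_generation.py`. A self-contained version of the same rule, wrapped in a hypothetical `resolve_provider` helper (the function name and the `None` handling are assumptions, not the project's API), could look like this:

```python
from typing import Optional

WAVESPEED_MODELS = ["qwen-image", "ideogram-v3-turbo"]


def resolve_provider(model: Optional[str], provider_name: Optional[str]) -> Optional[str]:
    """Return "wavespeed" for known WaveSpeed models, else the requested provider."""
    model_lower = (model or "").lower()
    if model_lower in WAVESPEED_MODELS and provider_name != "wavespeed":
        return "wavespeed"
    return provider_name


remapped = resolve_provider("Qwen-Image", "huggingface")
unchanged = resolve_provider("black-forest-labs/FLUX.1-dev", "huggingface")
```

Here `remapped` becomes `"wavespeed"` because the model name matches case-insensitively, while `unchanged` keeps the provider the caller asked for.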
### Error Handling:
```python
# Pattern 1: Re-raise HTTPExceptions (subscription limits)
try:
    ...  # call generate_image()
except HTTPException:
    raise
# Pattern 2: Wrap in RuntimeError (Story Writer)
except Exception as e:
    raise RuntimeError(f"Failed to generate image: {e}") from e

# Pattern 3: Return error in result (Story Writer batch)
try:
    ...  # generate the image for one scene
except Exception as e:
    image_results.append({
        "error": str(e),
        "image_url": None,
    })
```
### Subscription Validation:
```python
# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
    pricing_service=pricing_service,
    user_id=user_id,
    num_images=1,
)

# Built-in validation (main_image_generation.py)
_validate_image_operation(
    user_id=user_id,
    operation_type="image-generation",
    num_operations=1,
)
```
---
## 5. **Key Differences**
| Feature | Podcast Maker | Story Writer | Blog Writer |
|---------|---------------|--------------|-------------|
| **Provider Selection** | Auto-select | Optional explicit | User selects |
| **Model Selection** | Ideogram Character (with base avatar) or auto-select | Optional explicit | User selects |
| **Prompt Building** | Custom podcast prompts | Pre-generated | User + optimization |
| **Dimension Limits** | No validation | No validation | Frontend validation |
| **Error Handling** | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
| **Cost Display** | Estimated in response | Not shown | Shown in UI |
| **Special Features** | Character consistency | Batch processing | Prompt optimization |
---
## 6. **Recommendations for Blog Writer**
### ✅ Already Implemented:
1. ✅ Provider/model selection UI
2. ✅ Dimension validation
3. ✅ Model-based provider remapping
4. ✅ Cost information display
5. ✅ Prompt optimization
### 🔄 Could Improve:
1. **Pre-flight Validation**: Add subscription checks before API calls (like Podcast Maker)
2. **Error Messages**: More specific error messages based on error type
3. **Batch Generation**: Support generating multiple images for blog sections
4. **Progress Tracking**: Show progress for multiple image generations
5. **Retry Logic**: Automatic retry for transient failures
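
For the retry suggestion, one common approach is exponential backoff around the generation call. The sketch below is only one possible shape: `with_retries`, the choice of `TimeoutError` and `ConnectionError` as the transient error set, and the delay values are all assumptions rather than existing project code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # assumed set of retryable errors


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1.0s, 2.0s, ...
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


attempts = {"count": 0}


def flaky() -> str:
    """Hypothetical call that times out twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("provider timed out")
    return "image-bytes"


delays: list[float] = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Non-transient errors (such as subscription-limit responses) are deliberately not retried, since repeating the call would not change the outcome.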
### 📝 Implementation Notes:
- **Provider Routing**: Backend correctly auto-detects WaveSpeed models
- **Dimension Limits**: Frontend validation prevents invalid dimensions
- **Cost Tracking**: Handled by centralized `generate_image()` function
- **Asset Library**: Images are saved to asset library automatically
---
## 7. **Model-Specific Details**
### WaveSpeed Models:
- **qwen-image**: $0.05/image, max 1024x1024, fast generation
- **ideogram-v3-turbo**: $0.10/image, max 1024x1024, superior text rendering
- **ideogram-character**: $0.10/image, character consistency (Podcast only)
### HuggingFace Models:
- **FLUX.1-Krea-dev**: Photorealistic, optimized for blog images
- **FLUX.1-dev**: General purpose
- **flux-dev**: RunwayML variant
### Stability AI Models:
- **SDXL 1024**: Professional quality, $0.04/image
- **SDXL Base**: Standard quality
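
The per-image prices listed above lend themselves to a simple cost estimate. The sketch below is an illustration only: `estimate_cost` is a hypothetical helper, the mapping of "SDXL 1024" to the model id `stable-diffusion-xl-1024-v1-0` is an assumption, and models without a listed price return `None` rather than a guessed value.

```python
from decimal import Decimal
from typing import Optional

# Per-image prices as listed above; models without a listed price are omitted.
PRICE_PER_IMAGE = {
    "qwen-image": Decimal("0.05"),
    "ideogram-v3-turbo": Decimal("0.10"),
    "ideogram-character": Decimal("0.10"),
    "stable-diffusion-xl-1024-v1-0": Decimal("0.04"),  # assumed id for "SDXL 1024"
}


def estimate_cost(model: str, num_images: int = 1) -> Optional[Decimal]:
    """Return the estimated cost in USD, or None when no price is listed."""
    price = PRICE_PER_IMAGE.get(model)
    return None if price is None else price * num_images


blog_cost = estimate_cost("qwen-image", num_images=3)
unknown_cost = estimate_cost("black-forest-labs/FLUX.1-dev")
```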
---
## 8. **Code References**
### Backend:
- `backend/services/llm_providers/main_image_generation.py` - Core generation logic
- `backend/services/llm_providers/image_generation/wavespeed_provider.py` - WaveSpeed implementation
- `backend/api/podcast/handlers/images.py` - Podcast image generation
- `backend/services/story_writer/image_generation_service.py` - Story Writer service
- `backend/api/images.py` - Blog Writer image API
### Frontend:
- `frontend/src/components/ImageGen/ImageGenerator.tsx` - Blog Writer component
- `frontend/src/components/shared/ImageGenerationModal.tsx` - Shared modal (Podcast/YouTube)
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx` - Story Writer UI
---
## Summary
All three tools use the centralized `generate_image()` function, each with a different approach:

1. **Podcast Maker**: Specialized for character consistency (via `generate_character_image()`), auto-selects providers
2. **Story Writer**: Simple wrapper, batch processing, error resilient
3. **Blog Writer**: User-controlled provider/model selection, frontend validation, prompt optimization

The Blog Writer implementation is the most user-friendly, with explicit controls. Podcast Maker focuses on specialized use cases, and Story Writer prioritizes simplicity and batch operations.