288 lines
9.9 KiB
Markdown
288 lines
9.9 KiB
Markdown
# Image Generation Implementation Comparison
|
|
|
|
## Overview
|
|
This document compares how **Podcast Maker**, **Story Writer**, and **Blog Writer** implement AI image generation, focusing on model selection, provider routing, and best practices.
|
|
|
|
---
|
|
|
|
## 1. **Podcast Maker** (`backend/api/podcast/handlers/images.py`)
|
|
|
|
### Key Features:
|
|
- **Dual Mode**: Character-consistent generation (Ideogram Character) vs. standard generation
|
|
- **Auto Provider Selection**: Uses `provider: None` to auto-select based on environment
|
|
- **Specialized Prompt Building**: Podcast-optimized prompts with scene context
|
|
- **Pre-flight Validation**: Subscription checks before API calls
|
|
|
|
### Model Usage:
|
|
```python
|
|
# Character-consistent generation (when base_avatar_url provided)
|
|
generate_character_image(
|
|
prompt=image_prompt,
|
|
reference_image_bytes=base_avatar_bytes,
|
|
user_id=user_id,
|
|
style=style, # "Realistic", "Fiction", "Auto"
|
|
aspect_ratio=aspect_ratio, # "1:1", "16:9", "9:16", "4:3", "3:4"
|
|
rendering_speed=rendering_speed, # "Default", "Turbo", "Quality"
|
|
)
|
|
# Model: ideogram-ai/ideogram-character (WaveSpeed)
|
|
# Cost: ~$0.10/image
|
|
|
|
# Standard generation (no base avatar)
|
|
generate_image(
|
|
prompt=image_prompt,
|
|
options={
|
|
"provider": None, # Auto-select
|
|
"width": request.width,
|
|
"height": request.height,
|
|
},
|
|
user_id=user_id
|
|
)
|
|
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
|
|
# Cost: ~$0.04/image (varies by provider)
|
|
```
|
|
|
|
### Prompt Building Strategy:
|
|
- **Scene Context**: Scene title, content preview, visual keywords
|
|
- **Podcast Theme**: Idea/topic context
|
|
- **Technical Requirements**: 16:9 aspect ratio, video-optimized composition
|
|
- **Style Constraints**: Realistic photography, professional broadcast quality
|
|
|
|
### Error Handling:
|
|
- **Character Generation Failure**: Raises HTTPException (no fallback to standard)
|
|
- **Timeout/Connection Issues**: Returns 504 with retry recommendation
|
|
- **Other Errors**: Returns 502 with error details
|
|
|
|
---
|
|
|
|
## 2. **Story Writer** (`backend/services/story_writer/image_generation_service.py`)
|
|
|
|
### Key Features:
|
|
- **Simple Wrapper**: Thin service layer around `generate_image()`
|
|
- **Batch Processing**: Generates images for multiple scenes sequentially
|
|
- **Progress Callbacks**: Supports progress tracking for batch operations
|
|
- **Error Resilience**: Continues with next scene if one fails
|
|
|
|
### Model Usage:
|
|
```python
|
|
# Single scene generation
|
|
generate_image(
|
|
prompt=image_prompt, # From scene.image_prompt
|
|
options={
|
|
"provider": provider, # Optional, can be None for auto-select
|
|
"width": width, # Default: 1024
|
|
"height": height, # Default: 1024
|
|
"model": model, # Optional
|
|
},
|
|
user_id=user_id
|
|
)
|
|
|
|
# Batch generation
|
|
generate_scene_images(
|
|
scenes=scenes_data,
|
|
user_id=user_id,
|
|
provider=request.provider, # Optional
|
|
width=request.width or 1024,
|
|
height=request.height or 1024,
|
|
model=request.model, # Optional
|
|
progress_callback=progress_callback # Optional
|
|
)
|
|
```
|
|
|
|
### Prompt Strategy:
|
|
- **Direct Use**: Uses `scene.image_prompt` directly (no prompt building)
|
|
- **Pre-generated**: Prompts are created during story outline phase
|
|
- **No Modification**: Service doesn't modify prompts
|
|
|
|
### Error Handling:
|
|
- **HTTPException**: Re-raised (e.g., 429 subscription limits)
|
|
- **Other Exceptions**: Wrapped in RuntimeError, continues with next scene
|
|
- **Partial Success**: Returns results with error field for failed scenes
|
|
|
|
---
|
|
|
|
## 3. **Blog Writer** (`frontend/src/components/ImageGen/ImageGenerator.tsx`)
|
|
|
|
### Key Features:
|
|
- **Provider Selection**: User can choose WaveSpeed, HuggingFace, or Stability
|
|
- **Model Selection**: Dropdown based on selected provider
|
|
- **Dimension Validation**: Frontend validation with model-specific limits
|
|
- **Prompt Optimization**: "Optimize Prompt" button for blog-optimized prompts
|
|
- **Cost Display**: Shows cost information for WaveSpeed models
|
|
|
|
### Model Usage:
|
|
```typescript
|
|
// Frontend component
|
|
const req: ImageGenerationRequest = {
|
|
prompt,
|
|
negative_prompt: negative,
|
|
provider, // 'wavespeed' | 'huggingface' | 'stability'
|
|
model, // e.g., 'qwen-image', 'ideogram-v3-turbo'
|
|
width,
|
|
height
|
|
};
|
|
|
|
// Backend routing (main_image_generation.py)
|
|
// Auto-detects Wavespeed models and remaps provider
|
|
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
|
|
if model_lower in wavespeed_models and provider_name != "wavespeed":
|
|
provider_name = "wavespeed"
|
|
```
|
|
|
|
### Available Models:
|
|
- **WaveSpeed**: `qwen-image` ($0.05), `ideogram-v3-turbo` ($0.10)
|
|
- **HuggingFace**: `black-forest-labs/FLUX.1-Krea-dev`, `black-forest-labs/FLUX.1-dev`, `runwayml/flux-dev`
|
|
- **Stability AI**: `stable-diffusion-xl-1024-v1-0`, `stable-diffusion-xl-base-1.0`
|
|
|
|
### Dimension Limits:
|
|
- **WaveSpeed Models**: Max 1024x1024
|
|
- **Other Models**: Max 2048x2048
|
|
- **Frontend Validation**: Clamps dimensions and shows errors
|
|
|
|
### Prompt Optimization:
|
|
- **Backend Endpoint**: `/api/images/suggest-prompts`
|
|
- **Blog-Optimized**: Focuses on data visualization, infographics, text overlay areas
|
|
- **Context-Aware**: Uses title, section, research, persona for better prompts
|
|
|
|
---
|
|
|
|
## 4. **Common Patterns & Best Practices**
|
|
|
|
### Provider Selection:
|
|
```python
|
|
# Pattern 1: Auto-select (Podcast Maker)
|
|
options = {"provider": None} # Let _select_provider() decide
|
|
|
|
# Pattern 2: Explicit (Story Writer, Blog Writer)
|
|
options = {"provider": "wavespeed"} # User or service specifies
|
|
|
|
# Pattern 3: Model-based remapping (Blog Writer backend)
|
|
# Automatically remaps provider based on model name
|
|
```
|
|
|
|
### Model Routing:
|
|
```python
|
|
# Backend auto-detection (main_image_generation.py)
|
|
# Detects Wavespeed models and remaps provider
|
|
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
|
|
if model_lower in wavespeed_models and provider_name != "wavespeed":
|
|
provider_name = "wavespeed"
|
|
```
|
|
|
|
### Error Handling:
|
|
```python
|
|
# Pattern 1: Re-raise HTTPExceptions (subscription limits)
|
|
except HTTPException:
|
|
raise
|
|
|
|
# Pattern 2: Wrap in RuntimeError (Story Writer)
|
|
except Exception as e:
|
|
raise RuntimeError(f"Failed to generate image: {str(e)}") from e
|
|
|
|
# Pattern 3: Return error in result (Story Writer batch)
|
|
image_results.append({
|
|
"error": str(e),
|
|
"image_url": None,
|
|
})
|
|
```
|
|
|
|
### Subscription Validation:
|
|
```python
|
|
# Pre-flight validation (Podcast Maker)
|
|
validate_image_generation_operations(
|
|
pricing_service=pricing_service,
|
|
user_id=user_id,
|
|
num_images=1
|
|
)
|
|
|
|
# Built-in validation (main_image_generation.py)
|
|
_validate_image_operation(
|
|
user_id=user_id,
|
|
operation_type="image-generation",
|
|
num_operations=1,
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
## 5. **Key Differences**
|
|
|
|
| Feature | Podcast Maker | Story Writer | Blog Writer |
|
|
|---------|---------------|--------------|-------------|
|
|
| **Provider Selection** | Auto-select | Optional explicit | User selects |
|
|
| **Model Selection** | Auto (Character) or Auto-select | Optional explicit | User selects |
|
|
| **Prompt Building** | Custom podcast prompts | Pre-generated | User + optimization |
|
|
| **Dimension Limits** | No validation | No validation | Frontend validation |
|
|
| **Error Handling** | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
|
|
| **Cost Display** | Estimated in response | Not shown | Shown in UI |
|
|
| **Special Features** | Character consistency | Batch processing | Prompt optimization |
|
|
|
|
---
|
|
|
|
## 6. **Recommendations for Blog Writer**
|
|
|
|
### ✅ Already Implemented:
|
|
1. ✅ Provider/model selection UI
|
|
2. ✅ Dimension validation
|
|
3. ✅ Model-based provider remapping
|
|
4. ✅ Cost information display
|
|
5. ✅ Prompt optimization
|
|
|
|
### 🔄 Could Improve:
|
|
1. **Pre-flight Validation**: Add subscription checks before API calls (like Podcast Maker)
|
|
2. **Error Messages**: More specific error messages based on error type
|
|
3. **Batch Generation**: Support generating multiple images for blog sections
|
|
4. **Progress Tracking**: Show progress for multiple image generations
|
|
5. **Retry Logic**: Automatic retry for transient failures
|
|
|
|
### 📝 Implementation Notes:
|
|
- **Provider Routing**: Backend correctly auto-detects Wavespeed models
|
|
- **Dimension Limits**: Frontend validation prevents invalid dimensions
|
|
- **Cost Tracking**: Handled by centralized `generate_image()` function
|
|
- **Asset Library**: Images are saved to asset library automatically
|
|
|
|
---
|
|
|
|
## 7. **Model-Specific Details**
|
|
|
|
### WaveSpeed Models:
|
|
- **qwen-image**: $0.05/image, max 1024x1024, fast generation
|
|
- **ideogram-v3-turbo**: $0.10/image, max 1024x1024, superior text rendering
|
|
- **ideogram-character**: $0.10/image, character consistency (Podcast only)
|
|
|
|
### HuggingFace Models:
|
|
- **FLUX.1-Krea-dev**: Photorealistic, optimized for blog images
|
|
- **FLUX.1-dev**: General purpose
|
|
- **flux-dev**: RunwayML variant
|
|
|
|
### Stability AI Models:
|
|
- **SDXL 1024**: Professional quality, $0.04/image
|
|
- **SDXL Base**: Standard quality
|
|
|
|
---
|
|
|
|
## 8. **Code References**
|
|
|
|
### Backend:
|
|
- `backend/services/llm_providers/main_image_generation.py` - Core generation logic
|
|
- `backend/services/llm_providers/image_generation/wavespeed_provider.py` - WaveSpeed implementation
|
|
- `backend/api/podcast/handlers/images.py` - Podcast image generation
|
|
- `backend/services/story_writer/image_generation_service.py` - Story Writer service
|
|
- `backend/api/images.py` - Blog Writer image API
|
|
|
|
### Frontend:
|
|
- `frontend/src/components/ImageGen/ImageGenerator.tsx` - Blog Writer component
|
|
- `frontend/src/components/shared/ImageGenerationModal.tsx` - Shared modal (Podcast/YouTube)
|
|
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx` - Story Writer UI
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
All three tools use the centralized `generate_image()` function but with different approaches:
|
|
|
|
1. **Podcast Maker**: Specialized for character consistency, auto-selects providers
|
|
2. **Story Writer**: Simple wrapper, batch processing, error resilient
|
|
3. **Blog Writer**: User-controlled provider/model selection, frontend validation, prompt optimization
|
|
|
|
The Blog Writer implementation is the most user-friendly with explicit controls, while Podcast Maker focuses on specialized use cases and Story Writer prioritizes simplicity and batch operations.
|