AI Analysis and Content Strategy fixes. Enhanced Strategy Routes refactoring.
docs/image-generation-comparison.md
# Image Generation Implementation Comparison

## Overview

This document compares how **Podcast Maker**, **Story Writer**, and **Blog Writer** implement AI image generation, focusing on model selection, provider routing, and best practices.

---

## 1. **Podcast Maker** (`backend/api/podcast/handlers/images.py`)

### Key Features:
- **Dual Mode**: Character-consistent generation (Ideogram Character) vs. standard generation
- **Auto Provider Selection**: Uses `provider: None` to auto-select based on environment
- **Specialized Prompt Building**: Podcast-optimized prompts with scene context
- **Pre-flight Validation**: Subscription checks before API calls

### Model Usage:
```python
# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
    prompt=image_prompt,
    reference_image_bytes=base_avatar_bytes,
    user_id=user_id,
    style=style,  # "Realistic", "Fiction", "Auto"
    aspect_ratio=aspect_ratio,  # "1:1", "16:9", "9:16", "4:3", "3:4"
    rendering_speed=rendering_speed,  # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image

# Standard generation (no base avatar)
generate_image(
    prompt=image_prompt,
    options={
        "provider": None,  # Auto-select
        "width": request.width,
        "height": request.height,
    },
    user_id=user_id,
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)
```

### Prompt Building Strategy:
- **Scene Context**: Scene title, content preview, visual keywords
- **Podcast Theme**: Idea/topic context
- **Technical Requirements**: 16:9 aspect ratio, video-optimized composition
- **Style Constraints**: Realistic photography, professional broadcast quality
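
The four ingredient groups above could be combined along the following lines. This is a minimal sketch, not the handler's actual code: `SceneContext` and `build_podcast_image_prompt` are hypothetical names, and the exact wording and ordering of the prompt fragments are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SceneContext:
    """Hypothetical container for the scene fields the prompt draws on."""
    title: str
    content_preview: str
    visual_keywords: list[str] = field(default_factory=list)


def build_podcast_image_prompt(scene: SceneContext, podcast_idea: str) -> str:
    """Assemble one prompt string from scene, theme, technical, and style parts."""
    parts = [
        f"Scene: {scene.title}",
        f"Context: {scene.content_preview}",
    ]
    if scene.visual_keywords:
        parts.append("Visual elements: " + ", ".join(scene.visual_keywords))
    parts.append(f"Podcast topic: {podcast_idea}")
    # Technical requirements and style constraints are the same for every scene.
    parts.append("16:9 aspect ratio, video-optimized composition")
    parts.append("Realistic photography, professional broadcast quality")
    return ". ".join(parts)
```

Keeping the fixed technical and style fragments at the end means scene-specific context leads the prompt, which is one reasonable ordering among several.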

### Error Handling:
- **Character Generation Failure**: Raises HTTPException (no fallback to standard)
- **Timeout/Connection Issues**: Returns 504 with retry recommendation
- **Other Errors**: Returns 502 with error details
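
The 504/502 split described above can be illustrated with a small classifier. This is a sketch under assumptions: `classify_image_error` is a hypothetical helper, and using Python's built-in `TimeoutError` and `ConnectionError` as the "timeout/connection" signal is a guess at how the real handler detects those cases.

```python
def classify_image_error(exc: Exception) -> tuple[int, str]:
    """Map an exception to an (HTTP status, detail message) pair."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        # Transient problem: tell the caller that a retry is worthwhile.
        return 504, "Image generation timed out or the provider was unreachable; please retry."
    # Anything else is reported as an upstream failure with the error details.
    return 502, f"Image generation failed: {exc}"
```

A handler would then raise its HTTP exception with the returned status and detail, without falling back to standard generation.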

---

## 2. **Story Writer** (`backend/services/story_writer/image_generation_service.py`)

### Key Features:
- **Simple Wrapper**: Thin service layer around `generate_image()`
- **Batch Processing**: Generates images for multiple scenes sequentially
- **Progress Callbacks**: Supports progress tracking for batch operations
- **Error Resilience**: Continues with next scene if one fails

### Model Usage:
```python
# Single scene generation
generate_image(
    prompt=image_prompt,  # From scene.image_prompt
    options={
        "provider": provider,  # Optional, can be None for auto-select
        "width": width,  # Default: 1024
        "height": height,  # Default: 1024
        "model": model,  # Optional
    },
    user_id=user_id,
)

# Batch generation
generate_scene_images(
    scenes=scenes_data,
    user_id=user_id,
    provider=request.provider,  # Optional
    width=request.width or 1024,
    height=request.height or 1024,
    model=request.model,  # Optional
    progress_callback=progress_callback,  # Optional
)
```

### Prompt Strategy:
- **Direct Use**: Uses `scene.image_prompt` directly (no prompt building)
- **Pre-generated**: Prompts are created during story outline phase
- **No Modification**: Service doesn't modify prompts

### Error Handling:
- **HTTPException**: Re-raised (e.g., 429 subscription limits)
- **Other Exceptions**: Wrapped in RuntimeError, continues with next scene
- **Partial Success**: Returns results with error field for failed scenes
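
The batch behavior above (abort on subscription limits, record other failures, keep going) can be sketched as follows. The names `generate_scene_images_sketch`, `generate_fn`, and `QuotaExceeded` are hypothetical; `QuotaExceeded` stands in for the HTTPException case so the example runs without a web framework, and the result dictionary shape is an assumption.

```python
from typing import Any, Callable, Optional


class QuotaExceeded(Exception):
    """Stand-in for the HTTPException (e.g. 429) that must abort the batch."""


def generate_scene_images_sketch(
    scenes: list[dict[str, Any]],
    generate_fn: Callable[[str], str],
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> list[dict[str, Any]]:
    """Generate one image per scene, recording failures instead of stopping."""
    results: list[dict[str, Any]] = []
    total = len(scenes)
    for index, scene in enumerate(scenes, start=1):
        try:
            image_url = generate_fn(scene["image_prompt"])
            results.append({"scene": index, "image_url": image_url, "error": None})
        except QuotaExceeded:
            raise  # subscription limits stop the whole batch
        except Exception as exc:
            # Any other failure is recorded and the loop moves to the next scene.
            results.append({"scene": index, "image_url": None, "error": str(exc)})
        if progress_callback is not None:
            progress_callback(index, total)
    return results
```

Callers can then separate successes from failures by checking the `error` field of each result.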

---

## 3. **Blog Writer** (`frontend/src/components/ImageGen/ImageGenerator.tsx`)

### Key Features:
- **Provider Selection**: User can choose WaveSpeed, HuggingFace, or Stability
- **Model Selection**: Dropdown based on selected provider
- **Dimension Validation**: Frontend validation with model-specific limits
- **Prompt Optimization**: "Optimize Prompt" button for blog-optimized prompts
- **Cost Display**: Shows cost information for WaveSpeed models

### Model Usage:
```typescript
// Frontend component
const req: ImageGenerationRequest = {
  prompt,
  negative_prompt: negative,
  provider, // 'wavespeed' | 'huggingface' | 'stability'
  model, // e.g., 'qwen-image', 'ideogram-v3-turbo'
  width,
  height
};
```

```python
# Backend routing (main_image_generation.py)
# Auto-detects WaveSpeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
```

### Available Models:
- **WaveSpeed**: `qwen-image` ($0.05), `ideogram-v3-turbo` ($0.10)
- **HuggingFace**: `black-forest-labs/FLUX.1-Krea-dev`, `black-forest-labs/FLUX.1-dev`, `runwayml/flux-dev`
- **Stability AI**: `stable-diffusion-xl-1024-v1-0`, `stable-diffusion-xl-base-1.0`

### Dimension Limits:
- **WaveSpeed Models**: Max 1024x1024
- **Other Models**: Max 2048x2048
- **Frontend Validation**: Clamps dimensions and shows errors
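
The clamping rule above can be sketched as follows. The real validation lives in the TypeScript component; it is written in Python here only to match the other examples in this document. `clamp_dimensions` and the error message wording are hypothetical, and only the upper limits come from the list above.

```python
WAVESPEED_MAX = 1024  # limit for WaveSpeed models, per the list above
DEFAULT_MAX = 2048  # limit for all other models


def clamp_dimensions(width: int, height: int, provider: str) -> tuple[int, int, list[str]]:
    """Clamp width/height to the provider limit and collect user-facing errors."""
    limit = WAVESPEED_MAX if provider.lower() == "wavespeed" else DEFAULT_MAX
    errors: list[str] = []
    if width > limit:
        errors.append(f"Width {width} exceeds the {limit}px limit")
    if height > limit:
        errors.append(f"Height {height} exceeds the {limit}px limit")
    return min(width, limit), min(height, limit), errors
```

Returning both the clamped values and the messages lets the UI correct the input and explain the correction at the same time.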

### Prompt Optimization:
- **Backend Endpoint**: `/api/images/suggest-prompts`
- **Blog-Optimized**: Focuses on data visualization, infographics, text overlay areas
- **Context-Aware**: Uses title, section, research, persona for better prompts

---

## 4. **Common Patterns & Best Practices**

### Provider Selection:
```python
# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None}  # Let _select_provider() decide

# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"}  # User or service specifies

# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name
```
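
How auto-selection "based on environment" might work is not shown in this document, so the following is only an illustrative sketch. The function name `select_provider_sketch`, the environment variable names, and the priority order are all assumptions and are not taken from `_select_provider()`.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names and priority order; the real ones may differ.
PROVIDER_KEYS = (
    ("wavespeed", "WAVESPEED_API_KEY"),
    ("huggingface", "HF_TOKEN"),
    ("stability", "STABILITY_API_KEY"),
)


def select_provider_sketch(
    explicit: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit provider if given, else the first configured one."""
    if explicit:
        return explicit.lower()
    for provider, key_name in PROVIDER_KEYS:
        if env.get(key_name):
            return provider
    raise RuntimeError("No image provider is configured")
```

Passing `env` as a parameter keeps the function testable without touching the real process environment.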

### Model Routing:
```python
# Backend auto-detection (main_image_generation.py)
# Detects WaveSpeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
    provider_name = "wavespeed"
```
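
The fragment above relies on `model_lower` and `provider_name` being defined by surrounding code. A self-contained version of the same remapping rule might look like this; `resolve_provider` is a hypothetical wrapper, and treating a missing model as "no remap" is an assumption.

```python
from typing import Optional

WAVESPEED_MODELS = {"qwen-image", "ideogram-v3-turbo"}


def resolve_provider(provider_name: Optional[str], model: Optional[str]) -> Optional[str]:
    """Force the WaveSpeed provider whenever a WaveSpeed-only model is requested."""
    model_lower = (model or "").lower()
    if model_lower in WAVESPEED_MODELS and provider_name != "wavespeed":
        return "wavespeed"
    return provider_name
```

For example, `resolve_provider("huggingface", "qwen-image")` yields `"wavespeed"`, so a mismatched provider/model pair sent by a client is corrected rather than rejected.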

### Error Handling:
```python
# Pattern 1: Re-raise HTTPExceptions (subscription limits)
try:
    ...
except HTTPException:
    raise

# Pattern 2: Wrap in RuntimeError (Story Writer)
try:
    ...
except Exception as e:
    raise RuntimeError(f"Failed to generate image: {e}") from e

# Pattern 3: Return error in result (Story Writer batch)
image_results.append({
    "error": str(e),
    "image_url": None,
})
```

### Subscription Validation:
```python
# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
    pricing_service=pricing_service,
    user_id=user_id,
    num_images=1,
)

# Built-in validation (main_image_generation.py)
_validate_image_operation(
    user_id=user_id,
    operation_type="image-generation",
    num_operations=1,
)
```

---

## 5. **Key Differences**

| Feature | Podcast Maker | Story Writer | Blog Writer |
|---------|---------------|--------------|-------------|
| **Provider Selection** | Auto-select | Optional explicit | User selects |
| **Model Selection** | Auto (Character) or Auto-select | Optional explicit | User selects |
| **Prompt Building** | Custom podcast prompts | Pre-generated | User + optimization |
| **Dimension Limits** | No validation | No validation | Frontend validation |
| **Error Handling** | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
| **Cost Display** | Estimated in response | Not shown | Shown in UI |
| **Special Features** | Character consistency | Batch processing | Prompt optimization |

---

## 6. **Recommendations for Blog Writer**

### ✅ Already Implemented:
1. ✅ Provider/model selection UI
2. ✅ Dimension validation
3. ✅ Model-based provider remapping
4. ✅ Cost information display
5. ✅ Prompt optimization

### 🔄 Could Improve:
1. **Pre-flight Validation**: Add subscription checks before API calls (like Podcast Maker)
2. **Error Messages**: More specific error messages based on error type
3. **Batch Generation**: Support generating multiple images for blog sections
4. **Progress Tracking**: Show progress for multiple image generations
5. **Retry Logic**: Automatic retry for transient failures
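
As an illustration of the retry recommendation, the following sketch retries an operation with exponential backoff. It is not existing project code: `retry_transient` is a hypothetical helper, and treating only `TimeoutError` and `ConnectionError` as transient is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Non-transient errors, such as subscription limits, propagate immediately because they are not in the caught exception tuple, which avoids retrying requests that cannot succeed.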

### 📝 Implementation Notes:
- **Provider Routing**: Backend correctly auto-detects WaveSpeed models
- **Dimension Limits**: Frontend validation prevents invalid dimensions
- **Cost Tracking**: Handled by centralized `generate_image()` function
- **Asset Library**: Images are saved to asset library automatically

---

## 7. **Model-Specific Details**

### WaveSpeed Models:
- **qwen-image**: $0.05/image, max 1024x1024, fast generation
- **ideogram-v3-turbo**: $0.10/image, max 1024x1024, superior text rendering
- **ideogram-character**: $0.10/image, character consistency (Podcast only)

### HuggingFace Models:
- **FLUX.1-Krea-dev**: Photorealistic, optimized for blog images
- **FLUX.1-dev**: General purpose
- **flux-dev**: RunwayML variant

### Stability AI Models:
- **SDXL 1024**: Professional quality, $0.04/image
- **SDXL Base**: Standard quality
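
The per-image prices listed above lend themselves to a simple lookup for estimating batch cost. This is a sketch only: `estimate_cost_cents` is a hypothetical helper, prices are stored in cents to avoid floating-point rounding, and mapping "SDXL 1024" to the `stable-diffusion-xl-1024-v1-0` identifier is an assumption. Models without a listed price return `None`.

```python
from typing import Optional

# Prices in US cents per image, taken from the lists above.
PRICE_CENTS = {
    "qwen-image": 5,
    "ideogram-v3-turbo": 10,
    "ideogram-character": 10,
    "stable-diffusion-xl-1024-v1-0": 4,  # assumed identifier for "SDXL 1024"
}


def estimate_cost_cents(model: str, num_images: int) -> Optional[int]:
    """Return the estimated cost in cents, or None if the price is not listed."""
    price = PRICE_CENTS.get(model.lower())
    return None if price is None else price * num_images
```

For instance, three `qwen-image` generations come to 15 cents, while a FLUX model yields `None` because no price is given for it here.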

---

## 8. **Code References**

### Backend:
- `backend/services/llm_providers/main_image_generation.py` - Core generation logic
- `backend/services/llm_providers/image_generation/wavespeed_provider.py` - WaveSpeed implementation
- `backend/api/podcast/handlers/images.py` - Podcast image generation
- `backend/services/story_writer/image_generation_service.py` - Story Writer service
- `backend/api/images.py` - Blog Writer image API

### Frontend:
- `frontend/src/components/ImageGen/ImageGenerator.tsx` - Blog Writer component
- `frontend/src/components/shared/ImageGenerationModal.tsx` - Shared modal (Podcast/YouTube)
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx` - Story Writer UI

---

## Summary

All three tools route standard generation through the centralized `generate_image()` function, but each takes a different approach:

1. **Podcast Maker**: Specialized for character consistency, auto-selects providers
2. **Story Writer**: Simple wrapper, batch processing, error resilient
3. **Blog Writer**: User-controlled provider/model selection, frontend validation, prompt optimization

The Blog Writer implementation is the most user-friendly, with explicit controls, while Podcast Maker focuses on specialized use cases and Story Writer prioritizes simplicity and batch operations.