9.9 KiB
9.9 KiB
Image Generation Implementation Comparison
Overview
This document compares how Podcast Maker, Story Writer, and Blog Writer implement AI image generation, focusing on model selection, provider routing, and best practices.
1. Podcast Maker (backend/api/podcast/handlers/images.py)
Key Features:
- Dual Mode: Character-consistent generation (Ideogram Character) vs. standard generation
- Auto Provider Selection: Uses
provider: Noneto auto-select based on environment - Specialized Prompt Building: Podcast-optimized prompts with scene context
- Pre-flight Validation: Subscription checks before API calls
Model Usage:
# Character-consistent generation (when base_avatar_url provided)
generate_character_image(
prompt=image_prompt,
reference_image_bytes=base_avatar_bytes,
user_id=user_id,
style=style, # "Realistic", "Fiction", "Auto"
aspect_ratio=aspect_ratio, # "1:1", "16:9", "9:16", "4:3", "3:4"
rendering_speed=rendering_speed, # "Default", "Turbo", "Quality"
)
# Model: ideogram-ai/ideogram-character (WaveSpeed)
# Cost: ~$0.10/image
# Standard generation (no base avatar)
generate_image(
prompt=image_prompt,
options={
"provider": None, # Auto-select
"width": request.width,
"height": request.height,
},
user_id=user_id
)
# Provider: Auto-selected (WaveSpeed, HuggingFace, or Stability)
# Cost: ~$0.04/image (varies by provider)
Prompt Building Strategy:
- Scene Context: Scene title, content preview, visual keywords
- Podcast Theme: Idea/topic context
- Technical Requirements: 16:9 aspect ratio, video-optimized composition
- Style Constraints: Realistic photography, professional broadcast quality
Error Handling:
- Character Generation Failure: Raises HTTPException (no fallback to standard)
- Timeout/Connection Issues: Returns 504 with retry recommendation
- Other Errors: Returns 502 with error details
2. Story Writer (backend/services/story_writer/image_generation_service.py)
Key Features:
- Simple Wrapper: Thin service layer around
generate_image() - Batch Processing: Generates images for multiple scenes sequentially
- Progress Callbacks: Supports progress tracking for batch operations
- Error Resilience: Continues with next scene if one fails
Model Usage:
# Single scene generation
generate_image(
prompt=image_prompt, # From scene.image_prompt
options={
"provider": provider, # Optional, can be None for auto-select
"width": width, # Default: 1024
"height": height, # Default: 1024
"model": model, # Optional
},
user_id=user_id
)
# Batch generation
generate_scene_images(
scenes=scenes_data,
user_id=user_id,
provider=request.provider, # Optional
width=request.width or 1024,
height=request.height or 1024,
model=request.model, # Optional
progress_callback=progress_callback # Optional
)
Prompt Strategy:
- Direct Use: Uses
scene.image_promptdirectly (no prompt building) - Pre-generated: Prompts are created during story outline phase
- No Modification: Service doesn't modify prompts
Error Handling:
- HTTPException: Re-raised (e.g., 429 subscription limits)
- Other Exceptions: Wrapped in RuntimeError, continues with next scene
- Partial Success: Returns results with error field for failed scenes
3. Blog Writer (frontend/src/components/ImageGen/ImageGenerator.tsx)
Key Features:
- Provider Selection: User can choose WaveSpeed, HuggingFace, or Stability
- Model Selection: Dropdown based on selected provider
- Dimension Validation: Frontend validation with model-specific limits
- Prompt Optimization: "Optimize Prompt" button for blog-optimized prompts
- Cost Display: Shows cost information for WaveSpeed models
Model Usage:
// Frontend component
const req: ImageGenerationRequest = {
prompt,
negative_prompt: negative,
provider, // 'wavespeed' | 'huggingface' | 'stability'
model, // e.g., 'qwen-image', 'ideogram-v3-turbo'
width,
height
};
// Backend routing (main_image_generation.py)
// Auto-detects Wavespeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
provider_name = "wavespeed"
Available Models:
- WaveSpeed:
qwen-image($0.05),ideogram-v3-turbo($0.10) - HuggingFace:
black-forest-labs/FLUX.1-Krea-dev,black-forest-labs/FLUX.1-dev,runwayml/flux-dev - Stability AI:
stable-diffusion-xl-1024-v1-0,stable-diffusion-xl-base-1.0
Dimension Limits:
- WaveSpeed Models: Max 1024x1024
- Other Models: Max 2048x2048
- Frontend Validation: Clamps dimensions and shows errors
Prompt Optimization:
- Backend Endpoint:
/api/images/suggest-prompts - Blog-Optimized: Focuses on data visualization, infographics, text overlay areas
- Context-Aware: Uses title, section, research, persona for better prompts
4. Common Patterns & Best Practices
Provider Selection:
# Pattern 1: Auto-select (Podcast Maker)
options = {"provider": None} # Let _select_provider() decide
# Pattern 2: Explicit (Story Writer, Blog Writer)
options = {"provider": "wavespeed"} # User or service specifies
# Pattern 3: Model-based remapping (Blog Writer backend)
# Automatically remaps provider based on model name
Model Routing:
# Backend auto-detection (main_image_generation.py)
# Detects Wavespeed models and remaps provider
wavespeed_models = ["qwen-image", "ideogram-v3-turbo"]
if model_lower in wavespeed_models and provider_name != "wavespeed":
provider_name = "wavespeed"
Error Handling:
# Pattern 1: Re-raise HTTPExceptions (subscription limits)
except HTTPException:
raise
# Pattern 2: Wrap in RuntimeError (Story Writer)
except Exception as e:
raise RuntimeError(f"Failed to generate image: {str(e)}") from e
# Pattern 3: Return error in result (Story Writer batch)
image_results.append({
"error": str(e),
"image_url": None,
})
Subscription Validation:
# Pre-flight validation (Podcast Maker)
validate_image_generation_operations(
pricing_service=pricing_service,
user_id=user_id,
num_images=1
)
# Built-in validation (main_image_generation.py)
_validate_image_operation(
user_id=user_id,
operation_type="image-generation",
num_operations=1,
)
5. Key Differences
| Feature | Podcast Maker | Story Writer | Blog Writer |
|---|---|---|---|
| Provider Selection | Auto-select | Optional explicit | User selects |
| Model Selection | Auto (Character) or Auto-select | Optional explicit | User selects |
| Prompt Building | Custom podcast prompts | Pre-generated | User + optimization |
| Dimension Limits | No validation | No validation | Frontend validation |
| Error Handling | Strict (no fallback) | Resilient (continues) | User-friendly alerts |
| Cost Display | Estimated in response | Not shown | Shown in UI |
| Special Features | Character consistency | Batch processing | Prompt optimization |
6. Recommendations for Blog Writer
✅ Already Implemented:
- ✅ Provider/model selection UI
- ✅ Dimension validation
- ✅ Model-based provider remapping
- ✅ Cost information display
- ✅ Prompt optimization
🔄 Could Improve:
- Pre-flight Validation: Add subscription checks before API calls (like Podcast Maker)
- Error Messages: More specific error messages based on error type
- Batch Generation: Support generating multiple images for blog sections
- Progress Tracking: Show progress for multiple image generations
- Retry Logic: Automatic retry for transient failures
📝 Implementation Notes:
- Provider Routing: Backend correctly auto-detects Wavespeed models
- Dimension Limits: Frontend validation prevents invalid dimensions
- Cost Tracking: Handled by centralized
generate_image()function - Asset Library: Images are saved to asset library automatically
7. Model-Specific Details
WaveSpeed Models:
- qwen-image: $0.05/image, max 1024x1024, fast generation
- ideogram-v3-turbo: $0.10/image, max 1024x1024, superior text rendering
- ideogram-character: $0.10/image, character consistency (Podcast only)
HuggingFace Models:
- FLUX.1-Krea-dev: Photorealistic, optimized for blog images
- FLUX.1-dev: General purpose
- flux-dev: RunwayML variant
Stability AI Models:
- SDXL 1024: Professional quality, $0.04/image
- SDXL Base: Standard quality
8. Code References
Backend:
backend/services/llm_providers/main_image_generation.py- Core generation logicbackend/services/llm_providers/image_generation/wavespeed_provider.py- WaveSpeed implementationbackend/api/podcast/handlers/images.py- Podcast image generationbackend/services/story_writer/image_generation_service.py- Story Writer servicebackend/api/images.py- Blog Writer image API
Frontend:
frontend/src/components/ImageGen/ImageGenerator.tsx- Blog Writer componentfrontend/src/components/shared/ImageGenerationModal.tsx- Shared modal (Podcast/YouTube)frontend/src/components/StoryWriter/Phases/StoryOutlineParts/ImageEditModal.tsx- Story Writer UI
Summary
All three tools use the centralized generate_image() function but with different approaches:
- Podcast Maker: Specialized for character consistency, auto-selects providers
- Story Writer: Simple wrapper, batch processing, error resilient
- Blog Writer: User-controlled provider/model selection, frontend validation, prompt optimization
The Blog Writer implementation is the most user-friendly with explicit controls, while Podcast Maker focuses on specialized use cases and Story Writer prioritizes simplicity and batch operations.