# Story Writer Video Generation Enhancement Plan

---

## Current State Analysis

### Current Video Generation

- **Provider**: HuggingFace (tencent/HunyuanVideo via fal-ai)
- **Issues**:
  - Unreliable API responses
  - Limited quality control
  - No audio synchronization
  - Single-provider dependency
  - Poor error handling

### Current Audio Generation

- **Provider**: gTTS (Google Text-to-Speech)
- **Limitations**:
  - Robotic, unnatural voice
  - No brand voice consistency
  - Limited language options
  - No emotion control
  - Cannot clone the user's voice

### Current Story Writer Workflow

1. User creates a story outline with scenes
2. Each scene has `audio_narration` text
3. Audio is generated via gTTS per scene
4. Video is generated via HuggingFace per scene
5. Videos are compiled into the final story video

**Location**: `backend/api/story_writer/` and `frontend/src/components/StoryWriter/`

---

## Proposed Enhancements

### Core Principles

**Provider Abstraction**:
- Users should NOT see provider names (HuggingFace, WaveSpeed, etc.)
- All provider routing/switching happens automatically in the background
- Users only see user-friendly options like "Standard Quality" or "Premium Quality"
- System automatically selects best available provider based on user's subscription and credits

**Preserve Existing Options**:
- gTTS remains available as free fallback when credits run out
- HuggingFace remains available as fallback option
- All existing functionality preserved
- New features are additions, not replacements

**Cost Transparency**:
- All buttons show cost information in tooltips
- Users make informed decisions before generating
- No surprise costs

---

### 1. Provider-Agnostic Video Generation System

#### 1.1 Smart Provider Routing

**Backend Implementation** (`backend/services/llm_providers/main_video_generation.py`):

```python
from typing import Any, Optional, Tuple


def ai_video_generate(
    prompt: str,
    quality: str = "standard",  # "standard" (480p), "high" (720p), "premium" (1080p)
    duration: int = 5,
    audio_file_path: Optional[str] = None,
    *,
    user_id: str,  # keyword-only: a required argument cannot follow defaulted ones
    **kwargs: Any,
) -> bytes:
    """
    Unified video generation entry point.
    Automatically routes to best available provider:
    - WaveSpeed WAN 2.5 (primary, if credits available)
    - HuggingFace (fallback, if WaveSpeed unavailable)

    Users never see provider names - only quality options.
    """
    # 1. Check user subscription and credits
    # 2. Select best available provider automatically
    # 3. Route to appropriate provider function
    # 4. Handle fallbacks transparently
    raise NotImplementedError


def _select_video_provider(
    user_id: str,
    quality: str,
    pricing_service: "PricingService",  # from backend/services/subscription/pricing_service.py
) -> Tuple[str, str]:
    """
    Automatically select best video provider.
    Returns: (provider_name, model_name)

    Selection logic:
    1. Check user credits/subscription
    2. Prefer WaveSpeed if available and credits sufficient
    3. Fall back to HuggingFace if WaveSpeed unavailable
    4. Raise an error if no providers available
    """
    # Implementation details...
    raise NotImplementedError
```

**Key Features**:
- Automatic provider selection (users don't choose)
- Seamless fallback between providers
- Quality-based options (Standard/High/Premium) instead of provider names
- Cost-aware routing (uses the cheapest available option for the selected quality)
- Transparent error handling

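
The fallback behaviour described above can be sketched as an ordered list of provider adapters tried in turn. This is a minimal sketch, not the planned `main_video_generation.py` code: `ProviderUnavailable`, `generate_with_fallback`, and the `(prompt, quality, duration)` adapter signature are hypothetical names chosen for illustration. The provider name is returned only so the backend can log and bill the call; it is never shown to the user.

```python
from typing import Callable, List, Tuple

# Hypothetical adapter signature: (prompt, quality, duration) -> video bytes.
ProviderFn = Callable[[str, str, int], bytes]


class ProviderUnavailable(Exception):
    """Raised by an adapter when it cannot serve the request (no credits, outage, ...)."""


def generate_with_fallback(
    providers: List[Tuple[str, ProviderFn]],
    prompt: str,
    quality: str,
    duration: int,
) -> Tuple[str, bytes]:
    """Try providers in priority order; return (provider_name, video_bytes) from the first success."""
    errors: List[str] = []
    for name, generate in providers:
        try:
            return name, generate(prompt, quality, duration)
        except ProviderUnavailable as exc:
            # Recorded for logs only; the user sees a provider-neutral message.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No video provider available: " + "; ".join(errors))
```

In `ai_video_generate`, the list would be ordered by `_select_video_provider`, for example WaveSpeed first and HuggingFace second.
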

**Quality Mapping**:
- **Standard Quality** (480p): $0.05/second - Uses WaveSpeed 480p or HuggingFace
- **High Quality** (720p): $0.10/second - Uses WaveSpeed 720p
- **Premium Quality** (1080p): $0.15/second - Uses WaveSpeed 1080p

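
As a worked check of the mapping above, the per-scene price is the per-second rate multiplied by the clip length. A small sketch, assuming estimates are computed in USD from the listed rates; `QUALITY_RATES` and `estimate_video_cost` are illustrative names:

```python
# Per-second rates (USD) from the Quality Mapping above.
QUALITY_RATES = {
    "standard": 0.05,  # 480p
    "high": 0.10,      # 720p
    "premium": 0.15,   # 1080p
}


def estimate_video_cost(quality: str, duration_seconds: int, num_scenes: int = 1) -> float:
    """Estimated video cost in USD for num_scenes clips of duration_seconds each."""
    if quality not in QUALITY_RATES:
        raise ValueError(f"Unknown quality: {quality!r}")
    # Rounded to cents for display; keep unrounded values when summing many estimates.
    return round(QUALITY_RATES[quality] * duration_seconds * num_scenes, 2)
```

A 5-second Standard clip comes to $0.25 per scene, and 10 High Quality scenes of 5 seconds come to $5.00.
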

**Cost Optimization**:
- Default to Standard Quality (480p) for cost-effectiveness
- Allow upgrade to High/Premium for final export
- Pre-flight validation prevents waste
- Automatic fallback to free options when credits exhausted

---

### 2. Enhanced Audio Generation with Voice Cloning

#### 2.1 User-Friendly Voice Selection

**Key Principle**: Users choose between "AI Clone Voice" or "Default Voice" (gTTS) - no provider names shown.

**Backend Implementation** (`backend/services/story_writer/audio_generation_service.py`):

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class StoryAudioGenerationService:
    def generate_scene_audio(
        self,
        scene: Dict[str, Any],
        user_id: str,
        use_ai_voice: bool = False,  # User's choice: AI Clone or Default
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """
        Generate audio with automatic provider selection.

        If use_ai_voice=True:
        - Try persona voice clone (if trained)
        - Try Minimax voice clone (if credits available)
        - Fall back to gTTS if no credits

        If use_ai_voice=False:
        - Use gTTS (always free, always available)
        """
        if use_ai_voice:
            # Try AI voice options
            if self._has_persona_voice(user_id):
                return self._generate_with_persona_voice(scene, user_id)
            elif self._has_credits_for_voice_clone(user_id):
                return self._generate_with_minimax_voice_clone(scene, user_id)
            else:
                # Fall back to gTTS and log it (the user should also be notified)
                logger.info("Credits exhausted, falling back to gTTS for user %s", user_id)
                return self._generate_with_gtts(scene, **kwargs)
        else:
            # User explicitly chose default voice
            return self._generate_with_gtts(scene, **kwargs)
```

**Voice Options in Story Setup**:
- **Default Voice (gTTS)**: Free, always available, robotic but functional
- **AI Clone Voice**: Natural, human-like, requires credits ($0.02/minute)

**Cost Considerations**:
- Voice training: One-time cost (~$0.75) - only if user wants to train custom voice
- Voice generation: ~$0.02 per minute (only when AI Clone Voice selected)
- gTTS: Always free, always available as fallback
- Automatic fallback to gTTS when credits exhausted (with user notification)

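
Voice cost depends on narration length rather than on scene count. A sketch under two stated assumptions: generation is billed pro rata per second of audio (the plan only gives a per-minute rate), and the $0.75 training fee is added only when the user trains a custom voice. `estimate_audio_cost` and its parameters are hypothetical:

```python
VOICE_CLONE_RATE_PER_MINUTE = 0.02  # AI Clone Voice generation rate from the list above
VOICE_TRAINING_COST = 0.75          # one-time, only when a custom voice is trained


def estimate_audio_cost(
    total_narration_seconds: float,
    use_ai_voice: bool,
    include_training: bool = False,
) -> float:
    """Estimated narration cost in USD; Default Voice (gTTS) is always free."""
    if not use_ai_voice:
        return 0.0
    # Assumption: billed pro rata per second of audio, not per started minute.
    cost = VOICE_CLONE_RATE_PER_MINUTE * total_narration_seconds / 60
    if include_training:
        cost += VOICE_TRAINING_COST
    return round(cost, 2)
```

For example, 10 scenes with about 30 seconds of narration each (300 seconds) cost $0.10 with AI Clone Voice, $0.85 if the voice is trained in the same run, and nothing with Default Voice.
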

---

### 3. Enhanced Story Setup UI

#### 3.1 Video Generation Settings (Provider-Agnostic)

**Location**: `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx`

**User-Friendly Settings** (No Provider Names):
```typescript
interface VideoGenerationSettings {
  // Quality selection (NOT provider selection)
  videoQuality: 'standard' | 'high' | 'premium'; // Maps to 480p/720p/1080p

  // Duration
  videoDuration: 5 | 10; // seconds

  // Cost estimation (shown in tooltip)
  estimatedCostPerScene: number;
  totalEstimatedCost: number;

  // Provider routing happens automatically in backend
  // Users never see "WaveSpeed" or "HuggingFace"
}
```

**UI Components**:
- Quality selector: "Standard" / "High" / "Premium" (with cost in tooltip)
- Duration selector: 5s (default) / 10s (premium)
- Cost tooltip: Shows estimated cost per scene and total
- Pre-flight validation warnings
- **No provider selector** - routing is automatic

**Tooltip Example**:
```
Standard Quality (480p)
├─ Cost: $0.25 per scene (5 seconds)
├─ Quality: Good for previews and testing
└─ Provider: Automatically selected based on credits
```

#### 3.2 Audio Generation Settings (Simple Choice)

**New Settings**:
```typescript
interface AudioGenerationSettings {
  // Simple user choice - no provider names
  voiceType: 'default' | 'ai_clone'; // "Default Voice" or "AI Clone Voice"

  // Only shown if ai_clone selected
  voiceTrainingStatus: 'not_trained' | 'training' | 'ready' | 'failed';

  // Existing gTTS settings (preserved)
  audioLang: string;
  audioSlow: boolean;
  audioRate: number;
}
```

**UI Components**:
- **Voice Type Selector**:
  - "Default Voice (gTTS)" - Free, always available
  - "AI Clone Voice" - Natural, $0.02/minute (with cost tooltip)
- Voice training section (only if AI Clone Voice selected)
- Existing gTTS settings (preserved for Default Voice)
- Cost per minute display in tooltip

**Tooltip for "AI Clone Voice"**:
```
AI Clone Voice
├─ Cost: $0.02 per minute
├─ Quality: Natural, human-like narration
├─ Fallback: Automatically uses Default Voice if credits exhausted
└─ Training: One-time $0.75 to train your custom voice (optional)
```

**Tooltip for "Default Voice"**:
```
Default Voice (gTTS)
├─ Cost: Free
├─ Quality: Standard text-to-speech
└─ Always Available: Works even when credits exhausted
```

---

### 4. New "Animate Scene" Feature in Outline Phase

#### 4.1 Per-Scene Animation Preview

**Location**: `frontend/src/components/StoryWriter/Phases/StoryOutline.tsx`

**Feature**: Add "Animate Scene" hover option alongside existing scene actions

**Implementation**:
- Add to `OutlineHoverActions` component
- Appears on hover over scene cards
- Only generates for single scene (never bulk)
- Uses cheapest option (480p/Standard Quality) to give users a feel
- Shows cost in tooltip before generation

**UI Component**:
```typescript
// In OutlineHoverActions.tsx
const sceneHoverActions = [
  // Existing actions...
  {
    icon: <PlayArrowIcon />,
    label: 'Animate Scene',
    action: 'animate-scene',
    tooltip: `Animate this scene with video\nCost: ~$0.25 (5 seconds, Standard Quality)\nPreview only - uses cheapest option`,
    onClick: handleAnimateScene,
  },
];
```

**Backend Endpoint**:
```python
@router.post("/animate-scene-preview")
async def animate_scene_preview(
    request: SceneAnimationRequest,
    current_user: Dict[str, Any] = Depends(get_current_user),
) -> SceneAnimationResponse:
    """
    Generate preview animation for a single scene.
    Always uses cheapest option (480p/Standard Quality).
    Per-scene only - never bulk generation.
    """
    # 1. Validate single scene only
    # 2. Use Standard Quality (480p) - cheapest option
    # 3. Generate video with automatic provider routing
    # 4. Return preview video URL
    pass
```

**Cost Management**:
- Always uses Standard Quality (480p) - $0.25 per scene
- Pre-flight validation before generation
- Clear cost display in tooltip
- Per-scene only prevents bulk waste

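
Steps 1 and 2 of the preview endpoint (single scene only, always Standard Quality) can be enforced before any provider is called. A minimal sketch; `PreviewJob` and `build_scene_preview_job` are hypothetical names, and the 5-second duration and $0.05/second rate are taken from the Standard Quality settings above:

```python
from dataclasses import dataclass
from typing import List

PREVIEW_QUALITY = "standard"    # 480p, the cheapest tier
PREVIEW_DURATION_SECONDS = 5
PREVIEW_RATE_PER_SECOND = 0.05  # Standard Quality rate


@dataclass(frozen=True)
class PreviewJob:
    scene_id: str
    quality: str
    duration: int
    estimated_cost: float


def build_scene_preview_job(scene_ids: List[str]) -> PreviewJob:
    """Reject bulk requests and pin the preview to the cheapest settings."""
    if len(scene_ids) != 1:
        raise ValueError("Animate Scene accepts exactly one scene per request")
    return PreviewJob(
        scene_id=scene_ids[0],
        quality=PREVIEW_QUALITY,
        duration=PREVIEW_DURATION_SECONDS,
        estimated_cost=round(PREVIEW_RATE_PER_SECOND * PREVIEW_DURATION_SECONDS, 2),
    )
```

Pinning quality and duration on the server, rather than trusting values sent by the client, keeps a modified request from turning a preview into a more expensive generation.
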

---

### 5. New "Animate Story with VoiceOver" Button in Writing Phase

#### 5.1 Complete Story Animation

**Location**: `frontend/src/components/StoryWriter/Phases/StoryWriting.tsx`

**Feature**: New button alongside existing HuggingFace video options

**Implementation**:
- Add button in Writing phase toolbar
- Generates complete animated story with synchronized voiceover
- Uses user's voice preference from Setup (AI Clone or Default)
- Shows comprehensive cost breakdown in tooltip
- Pre-flight validation before generation

**UI Component**:
```typescript
<Button
  variant="contained"
  startIcon={<SmartDisplayIcon />}
  onClick={handleAnimateStoryWithVoiceOver}
  disabled={!state.storyContent || isGenerating}
  title={`Animate Story with VoiceOver\n\nCost Breakdown:\n- Video: $${videoCost} (${scenes.length} scenes × $${costPerScene})\n- Audio: $${audioCost} (${totalAudioMinutes} minutes)\n- Total: $${totalCost}\n\nQuality: ${state.videoQuality}\nVoice: ${state.voiceType === 'ai_clone' ? 'AI Clone' : 'Default'}`}
>
  Animate Story with VoiceOver
</Button>
```

**Backend Endpoint**:
```python
@router.post("/animate-story-with-voiceover")
async def animate_story_with_voiceover(
    request: StoryAnimationRequest,
    current_user: Dict[str, Any] = Depends(get_current_user),
) -> StoryAnimationResponse:
    """
    Generate complete animated story with synchronized voiceover.
    Uses user's quality and voice preferences from Setup.
    """
    # 1. Pre-flight validation (cost, credits, limits)
    # 2. Generate audio for all scenes (using user's voice preference)
    # 3. Generate videos for all scenes (using user's quality preference)
    # 4. Synchronize audio with video
    # 5. Compile into final story video
    # 6. Return video URL and cost breakdown
    pass
```

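
Step 4 (synchronize audio with video) needs a rule for narration that is shorter or longer than its clip. The sketch below follows the approach listed in this plan's error handling (auto-trim or extend audio, warn the user). `SyncPlan`, `plan_scene_sync`, the 0.05-second tolerance and the 20% trim threshold are hypothetical choices, and the function only plans the adjustment rather than editing audio files:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    action: str                 # "none", "pad_audio", or "trim_audio"
    audio_target_seconds: float
    warn_user: bool


def plan_scene_sync(
    audio_seconds: float,
    video_seconds: float,
    max_trim_ratio: float = 0.2,  # hypothetical: warn if more than 20% of narration is cut
) -> SyncPlan:
    """Decide how to fit one scene's narration to its clip length."""
    if abs(audio_seconds - video_seconds) < 0.05:
        return SyncPlan("none", audio_seconds, False)
    if audio_seconds < video_seconds:
        # Extend the audio with trailing silence so it spans the whole clip.
        return SyncPlan("pad_audio", video_seconds, False)
    # Narration is longer than the clip: trim it, warning if a large share would be lost.
    cut_ratio = (audio_seconds - video_seconds) / audio_seconds
    return SyncPlan("trim_audio", video_seconds, cut_ratio > max_trim_ratio)
```

Cutting most of a 30-second narration to fit a 5-second clip is rarely acceptable, which is why the warning fires before generation; requesting a 10-second clip is the other lever available in the settings.
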

**Cost Tooltip Example**:
```
Animate Story with VoiceOver

Cost Breakdown:
├─ Video (Standard Quality): $2.50
│  └─ 10 scenes × $0.25 per scene
├─ Audio (AI Clone Voice): $0.10
│  └─ 5 minutes total × $0.02/minute
└─ Total: $2.60

Settings:
├─ Quality: Standard (480p)
├─ Voice: AI Clone Voice
└─ Duration: 5 seconds per scene

⚠️ This will use $2.60 of your monthly credits
```

---

## Implementation Phases

### Phase 1: Provider-Agnostic Video System (Week 1-2)

**Priority**: HIGH - Solves immediate HuggingFace issues with provider abstraction

**Tasks**:
1. ✅ Create WaveSpeed API client (`backend/services/wavespeed/client.py`)
2. ✅ Add WAN 2.5 text-to-video function
3. ✅ Implement smart provider routing in `main_video_generation.py`
4. ✅ Add quality-based selection (Standard/High/Premium)
5. ✅ Preserve HuggingFace as fallback option
6. ✅ Update `hd_video.py` with provider routing
7. ✅ Add pre-flight cost validation
8. ✅ Update frontend with quality selector (remove provider names)
9. ✅ Add cost tooltips to all buttons
10. ✅ Update subscription limits
11. ✅ Testing and error handling

**Files to Modify**:
- `backend/services/llm_providers/main_video_generation.py` (add routing logic)
- `backend/api/story_writer/utils/hd_video.py` (use quality-based API)
- `backend/api/story_writer/routes/video_generation.py`
- `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx` (quality selector)
- `frontend/src/components/StoryWriter/components/HdVideoSection.tsx`
- `backend/services/subscription/pricing_service.py`

**Success Criteria**:
- Video generation works reliably with automatic provider routing
- Users see quality options, not provider names
- HuggingFace preserved as fallback
- Cost tracking accurate
- Pre-flight validation prevents waste
- Error messages clear and actionable

---

### Phase 2: Voice Cloning Integration (Week 3-4)

**Priority**: MEDIUM - Enhances audio quality with simple user choice

**Tasks**:
1. ✅ Create Minimax API client (`backend/services/minimax/voice_clone.py`)
2. ✅ Add voice training endpoint
3. ✅ Add voice generation endpoint
4. ✅ Update `audio_generation_service.py` with "AI Clone" vs "Default" logic
5. ✅ Preserve gTTS as always-available fallback
6. ✅ Add automatic fallback when credits exhausted
7. ✅ Update Story Setup with simple voice type selector
8. ✅ Add cost tooltips to voice options
9. ✅ Add voice preview and testing (if AI Clone selected)
10. ✅ Ensure gTTS always works even when credits exhausted

**Files to Create**:
- `backend/services/minimax/voice_clone.py`
- `backend/services/story_writer/voice_management_service.py`

**Files to Modify**:
- `backend/services/story_writer/audio_generation_service.py` (add voice type logic)
- `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx` (voice type selector)
- `backend/models/story_models.py` (add voice type field)

**Success Criteria**:
- Users see simple choice: "Default Voice" or "AI Clone Voice"
- gTTS always available as fallback
- Automatic fallback when credits exhausted
- Cost tracking accurate
- Voice quality significantly better than gTTS when AI Clone used

---

### Phase 3: New Features - Animate Scene & Animate Story (Week 5-6)

**Priority**: MEDIUM - Add preview and complete animation features

**Tasks**:
1. ✅ Add "Animate Scene" hover option in Outline phase
2. ✅ Implement per-scene animation preview (cheapest option only)
3. ✅ Add "Animate Story with VoiceOver" button in Writing phase
4. ✅ Implement complete story animation with voiceover
5. ✅ Add comprehensive cost tooltips to all buttons
6. ✅ Add pre-flight validation for all animation features
7. ✅ Ensure per-scene only (no bulk generation in Outline)
8. ✅ Update documentation
9. ✅ User testing and feedback

**Files to Create**:
- `backend/api/story_writer/routes/scene_animation.py` (new endpoint)
- `frontend/src/components/StoryWriter/components/AnimateSceneButton.tsx`

**Files to Modify**:
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/OutlineHoverActions.tsx` (add Animate Scene)
- `frontend/src/components/StoryWriter/Phases/StoryWriting.tsx` (add Animate Story button)
- `backend/api/story_writer/routes/video_generation.py` (add story animation endpoint)

**Success Criteria**:
- "Animate Scene" works in Outline (per-scene, cheapest option)
- "Animate Story with VoiceOver" works in Writing phase
- All buttons show cost in tooltips
- Pre-flight validation prevents waste
- Good user experience

---

### Phase 4: Integration & Optimization (Week 7-8)

**Priority**: MEDIUM - Polish and optimize

**Tasks**:
1. ✅ Integrate audio with video (synchronized videos)
2. ✅ Improve error handling and retry logic
3. ✅ Add progress indicators
4. ✅ Optimize cost calculations
5. ✅ Add usage analytics
6. ✅ Update documentation
7. ✅ User testing and feedback

**Success Criteria**:
- Smooth end-to-end workflow
- Cost-effective for users
- Reliable generation
- Excellent user experience
- All features work seamlessly together

---

## Cost Management & Prevention of Waste

### Pre-Flight Validation

**Implementation**: `backend/services/subscription/preflight_validator.py`

**Checks Before Generation**:
1. User has sufficient subscription tier
2. Estimated cost within monthly budget
3. Video generation limit not exceeded
4. Audio generation limit not exceeded
5. Total story cost reasonable (<$5 for typical story)

**Validation Flow**:
```python
from typing import Any, Dict, Tuple


def validate_story_generation(
    pricing_service: "PricingService",
    user_id: str,
    num_scenes: int,
    video_resolution: str,
    video_duration: int,
    use_voice_clone: bool,
    audio_minutes_per_scene: float = 0.5,  # assumed average narration length
) -> Tuple[bool, str, Dict[str, Any]]:
    """
    Pre-flight validation before story generation.
    Returns: (allowed, message, cost_breakdown)
    """
    # Calculate estimated costs: video is priced per clip, voice clone per minute of audio
    video_cost_per_scene = get_wavespeed_cost(video_resolution, video_duration)
    # get_voice_clone_cost() is assumed to return the per-minute rate; gTTS is free
    audio_cost_per_scene = (
        get_voice_clone_cost() * audio_minutes_per_scene if use_voice_clone else 0.0
    )

    total_estimated_cost = (video_cost_per_scene + audio_cost_per_scene) * num_scenes
    cost_breakdown = {
        "video": video_cost_per_scene * num_scenes,
        "audio": audio_cost_per_scene * num_scenes,
        "total": total_estimated_cost,
    }

    # Check limits
    limits = pricing_service.get_user_limits(user_id)
    current_usage = pricing_service.get_current_usage(user_id)

    # Validation logic (tier, monthly budget, video/audio limits) sets these;
    # fail closed until it is implemented.
    allowed, message = False, "Pre-flight validation not implemented"
    return (allowed, message, cost_breakdown)
```

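
The five checks listed above can be expressed as one function over the user's limits and current usage. The `UserLimits` and `CurrentUsage` shapes are assumptions about what `get_user_limits()` and `get_current_usage()` might return, not their actual return types, and check 5 is treated as a warning rather than a hard block:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UserLimits:  # hypothetical shape of pricing_service.get_user_limits()
    tier_allows_story_video: bool
    monthly_budget: float
    max_video_generations: int
    max_audio_generations: int


@dataclass
class CurrentUsage:  # hypothetical shape of pricing_service.get_current_usage()
    spent_this_month: float
    video_generations: int
    audio_generations: int


TYPICAL_STORY_COST_CEILING = 5.00  # check 5: "<$5 for typical story"


def run_preflight_checks(
    limits: UserLimits, usage: CurrentUsage, num_scenes: int, estimated_cost: float
) -> Tuple[bool, List[str], List[str]]:
    """Return (allowed, blocking_errors, warnings) for checks 1-5."""
    errors: List[str] = []
    warnings: List[str] = []
    if not limits.tier_allows_story_video:                                   # 1. tier
        errors.append("Your plan does not include story video generation.")
    if usage.spent_this_month + estimated_cost > limits.monthly_budget:      # 2. budget
        errors.append("Estimated cost exceeds your remaining monthly budget.")
    if usage.video_generations + num_scenes > limits.max_video_generations:  # 3. video limit
        errors.append("This story would exceed your video generation limit.")
    if usage.audio_generations + num_scenes > limits.max_audio_generations:  # 4. audio limit
        errors.append("This story would exceed your audio generation limit.")
    if estimated_cost > TYPICAL_STORY_COST_CEILING:                          # 5. sanity check
        warnings.append("This story costs more than a typical story; consider fewer scenes or lower quality.")
    return (not errors, errors, warnings)
```

Returning every failed check at once lets the UI show the upgrade prompt and the "reduce scenes/resolution" suggestion together instead of one error at a time.
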

### Cost Estimation Display

**Frontend Implementation**:
- Real-time cost calculator in Story Setup
- Per-scene cost breakdown
- Total story cost estimate
- Monthly budget remaining
- Warning if approaching limits

**UI Example**:
```
Video Generation Cost Estimate:
├─ Resolution: 720p ($0.10/second)
├─ Duration: 5 seconds per scene
├─ Scenes: 10
└─ Total: $5.00

Audio Generation Cost Estimate:
├─ Provider: Voice Clone ($0.02/minute)
├─ Average: 30 seconds per scene
├─ Scenes: 10
└─ Total: $0.10

Total Estimated Cost: $5.10
Monthly Budget Remaining: $44.00
```

### Usage Tracking

**Enhanced Tracking**:
- Track video generation per scene
- Track audio generation per scene
- Track total story cost
- Alert users approaching limits
- Provide cost breakdown in analytics

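
A minimal in-memory sketch of this tracking, assuming each generated clip or narration is recorded as one event and that `events` covers the current billing month. `UsageEvent`, `StoryUsageTracker`, and the 80% alert ratio are hypothetical; a real implementation would persist events alongside the existing subscription usage data:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    story_id: str
    scene_id: str
    kind: str  # "video" or "audio"
    cost: float


@dataclass
class StoryUsageTracker:
    monthly_budget: float
    alert_ratio: float = 0.8  # hypothetical: alert at 80% of the monthly budget
    events: List[UsageEvent] = field(default_factory=list)

    def record(self, event: UsageEvent) -> None:
        self.events.append(event)

    def story_breakdown(self, story_id: str) -> Dict[str, float]:
        """Per-kind and total cost for one story, for the analytics breakdown."""
        totals: Dict[str, float] = defaultdict(float)
        for event in self.events:
            if event.story_id == story_id:
                totals[event.kind] += event.cost
                totals["total"] += event.cost
        return dict(totals)

    def approaching_limit(self) -> bool:
        """True once month-to-date spend reaches alert_ratio of the budget."""
        spent = sum(event.cost for event in self.events)
        return spent >= self.alert_ratio * self.monthly_budget
```
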

---

## Pricing Integration

### WaveSpeed WAN 2.5 Pricing

**Add to `pricing_service.py`**:
```python
# WaveSpeed WAN 2.5 Text-to-Video
{
    "provider": APIProvider.VIDEO,  # Or new WAVESPEED provider
    "model_name": "wan-2.5-480p",
    "cost_per_second": 0.05,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (480p)"
},
{
    "provider": APIProvider.VIDEO,
    "model_name": "wan-2.5-720p",
    "cost_per_second": 0.10,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (720p)"
},
{
    "provider": APIProvider.VIDEO,
    "model_name": "wan-2.5-1080p",
    "cost_per_second": 0.15,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (1080p)"
}
```

### Minimax Voice Clone Pricing

**Add to `pricing_service.py`**:
```python
# Minimax Voice Clone
{
    "provider": APIProvider.AUDIO,  # New provider type
    "model_name": "minimax-voice-clone-train",
    "cost_per_request": 0.75,  # One-time training cost
    "description": "Minimax Voice Clone Training"
},
{
    "provider": APIProvider.AUDIO,
    "model_name": "minimax-voice-clone-generate",
    "cost_per_minute": 0.02,  # Per minute of generated audio
    "description": "Minimax Voice Clone Generation"
}
```

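
The pricing entries above mix three billing units (`cost_per_second`, `cost_per_minute`, and `cost_per_request`), so whatever consumes them has to know which quantity to multiply. A sketch of that dispatch, assuming each entry defines exactly one of those fields; `entry_cost` is a hypothetical helper, not an existing `pricing_service.py` function, and the `provider` field is omitted to keep the example self-contained:

```python
from typing import Any, Dict


def entry_cost(
    entry: Dict[str, Any],
    *,
    seconds: float = 0.0,
    minutes: float = 0.0,
    requests: int = 0,
) -> float:
    """Cost in USD for one pricing entry, using whichever unit field it defines."""
    if "cost_per_second" in entry:   # WaveSpeed video: billed per second of output
        return entry["cost_per_second"] * seconds
    if "cost_per_minute" in entry:   # voice clone generation: per minute of audio
        return entry["cost_per_minute"] * minutes
    if "cost_per_request" in entry:  # voice clone training: flat fee per request
        return entry["cost_per_request"] * requests
    raise KeyError(f"No cost field in pricing entry {entry.get('model_name')!r}")
```

Failing loudly on an entry with no recognised cost field is deliberate: silently returning zero would let a misconfigured model slip past pre-flight validation.
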
### Subscription Tier Limits

**Update subscription limits**:
- **Free**: 3 stories/month, 480p only, gTTS only
- **Basic**: 10 stories/month, up to 720p, voice clone available
- **Pro**: 50 stories/month, up to 1080p, voice clone included
- **Enterprise**: Unlimited, all features

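
One way to apply these tiers is to clamp a request to what the tier allows instead of rejecting it, which matches the plan's automatic-fallback principle. The rules below are transcribed from the list; `TierRule` and `resolve_story_request` are hypothetical names, and the Basic/Pro difference between voice cloning being "available" and "included" (presumably a billing distinction) is not modelled:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

QUALITY_ORDER = ["standard", "high", "premium"]  # 480p < 720p < 1080p


@dataclass(frozen=True)
class TierRule:
    stories_per_month: Optional[int]  # None = unlimited
    max_quality: str
    voice_clone: bool


TIER_RULES: Dict[str, TierRule] = {
    "free": TierRule(3, "standard", False),
    "basic": TierRule(10, "high", True),
    "pro": TierRule(50, "premium", True),
    "enterprise": TierRule(None, "premium", True),
}


def resolve_story_request(
    tier: str, stories_this_month: int, quality: str, voice_type: str
) -> Tuple[str, str]:
    """Return the (quality, voice_type) the tier actually permits."""
    rule = TIER_RULES[tier]
    if rule.stories_per_month is not None and stories_this_month >= rule.stories_per_month:
        raise PermissionError(f"Monthly story limit reached for the {tier} plan")
    # Clamp quality to the tier's maximum instead of failing the request.
    if QUALITY_ORDER.index(quality) > QUALITY_ORDER.index(rule.max_quality):
        quality = rule.max_quality
    # Tiers without voice cloning fall back to the Default Voice (gTTS).
    if voice_type == "ai_clone" and not rule.voice_clone:
        voice_type = "default"
    return quality, voice_type
```

Any clamping should be surfaced in the cost tooltip so the user knows what will actually be generated.
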

---

## Technical Architecture

### Backend Services

```
backend/services/
├── wavespeed/
│   ├── __init__.py
│   ├── client.py           # WaveSpeed API client
│   ├── wan25_video.py      # WAN 2.5 video generation
│   └── models.py           # Request/response models
├── minimax/
│   ├── __init__.py
│   ├── client.py           # Minimax API client
│   ├── voice_clone.py      # Voice cloning service
│   └── models.py
└── story_writer/
    ├── audio_generation_service.py   # Updated with voice clone
    └── video_generation_service.py   # Updated with WaveSpeed
```

### Frontend Components

```
frontend/src/components/StoryWriter/
├── Phases/StorySetup/
│   └── GenerationSettingsSection.tsx   # Enhanced with new settings
├── components/
│   ├── HdVideoSection.tsx              # Updated for WaveSpeed
│   ├── VoiceTrainingSection.tsx        # NEW: Voice training UI
│   └── CostEstimationDisplay.tsx       # NEW: Cost calculator
└── hooks/
    └── useStoryGenerationCost.ts       # NEW: Cost calculation hook
```

---

## Error Handling & User Experience

### Error Scenarios

1. **WaveSpeed API Failure**:
   - Retry with exponential backoff (3 attempts)
   - Fallback to HuggingFace if available
   - Clear error message with cost refund notice

2. **Voice Clone Training Failure**:
   - Provide specific error (audio quality, length, format)
   - Suggest improvements
   - Allow retry with different audio

3. **Cost Limit Exceeded**:
   - Pre-flight validation prevents this
   - Show upgrade prompt
   - Suggest reducing scenes/resolution

4. **Audio/Video Mismatch**:
   - Validate audio length matches video duration
   - Auto-trim or extend audio
   - Warn user before generation

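
The retry policy for a WaveSpeed failure (exponential backoff, 3 attempts, then HuggingFace) might look like the sketch below. `TransientProviderError` and `call_with_backoff` are hypothetical names; "3 attempts" is read here as three calls in total, with delays doubling from `base_delay`; `sleep` is injectable so the policy can be tested without waiting:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical error type for retryable failures (timeouts, 5xx, rate limits)."""


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures after delays of base_delay, 2*base_delay, ..."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except TransientProviderError as exc:
            last_error = exc
            if attempt < attempts - 1:  # no pause after the final attempt
                sleep(base_delay * (2 ** attempt))
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error
```

If the final attempt still fails, the caller catches the error and moves on to the HuggingFace fallback. Adding random jitter to each delay is a common refinement when many requests retry at the same time.
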

### User Feedback

- Progress indicators for all operations
- Clear cost breakdowns
- Quality previews before final generation
- Regeneration options with cost tracking
- Usage analytics dashboard

---

## Testing Plan

### Unit Tests
- WaveSpeed API client
- Voice clone service
- Cost calculation
- Pre-flight validation

### Integration Tests
- End-to-end story generation
- Audio + video synchronization
- Error handling and fallbacks
- Subscription limit enforcement

### User Acceptance Tests
- Story generation workflow
- Voice training process
- Cost estimation accuracy
- Error recovery

---

## Success Metrics

### Technical Metrics
- Video generation success rate >95%
- Audio generation success rate >98%
- Average generation time per scene <30s
- API error rate <2%

### Business Metrics
- User satisfaction with video quality
- Cost per story (target: <$5 for 10-scene story)
- Voice clone adoption rate
- Story completion rate

### User Experience Metrics
- Time to generate story
- Error recovery time
- User understanding of costs
- Feature discovery rate

---

## Provider Management Strategy

### Always-Available Options
- **gTTS**: Always available, always free, works even when credits exhausted
- **HuggingFace**: Preserved as fallback option, works when WaveSpeed unavailable

### Automatic Provider Routing
- **Primary**: WaveSpeed WAN 2.5 (when credits available)
- **Fallback**: HuggingFace (when WaveSpeed unavailable or credits exhausted)
- **Audio Fallback**: gTTS (always available, always free)

### User Experience
- Users never see provider names
- System automatically selects best available option
- Seamless fallback when credits exhausted
- Clear notifications when fallback occurs
- No user intervention required

### No Deprecation
- **HuggingFace**: Kept as permanent fallback option
- **gTTS**: Kept as permanent free option
- All existing functionality preserved
- New features are additions, not replacements

---

## Next Steps

1. **Week 1**: Set up WaveSpeed API access and credentials
2. **Week 1**: Implement provider-agnostic routing system
3. **Week 2**: Integrate into Story Writer with quality-based UI
4. **Week 3**: Implement voice cloning with simple "AI Clone" vs "Default" choice
5. **Week 4**: Add voice training UI (only if AI Clone selected)
6. **Week 5**: Add "Animate Scene" hover option in Outline
7. **Week 6**: Add "Animate Story with VoiceOver" button in Writing
8. **Week 7-8**: Testing, optimization, and polish

## Key Design Principles

1. **Provider Abstraction**: Users never see provider names - only quality/voice options
2. **Preserve Existing**: gTTS and HuggingFace remain available as fallbacks
3. **Cost Transparency**: All buttons show costs in tooltips
4. **Automatic Fallback**: System automatically uses free options when credits exhausted
5. **Per-Scene Only**: Outline phase only allows per-scene generation (no bulk)
6. **User-Friendly**: Simple choices like "Standard Quality" not "WaveSpeed 480p"

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| WaveSpeed API changes | Version pinning, abstraction layer |
| Cost overruns | Strict pre-flight validation |
| Voice quality issues | Quality checks, fallback options |
| User confusion | Clear UI, tooltips, documentation |
| Integration complexity | Phased rollout, extensive testing |

---

*Document Version: 1.0*

*Last Updated: January 2025*

*Priority: HIGH - Immediate Implementation*