AI Image Studio, AI podcast Maker, AI product Marketing
830
docs/story writer/STORY_WRITER_VIDEO_ENHANCEMENT.md
Normal file
@@ -0,0 +1,830 @@
# Story Writer Video Generation Enhancement Plan

---

## Current State Analysis

### Current Video Generation
- **Provider**: HuggingFace (tencent/HunyuanVideo via fal-ai)
- **Issues**:
  - Unreliable API responses
  - Limited quality control
  - No audio synchronization
  - Single provider dependency
  - Poor error handling

### Current Audio Generation
- **Provider**: gTTS (Google Text-to-Speech)
- **Limitations**:
  - Robotic, non-natural voice
  - No brand voice consistency
  - Limited language options
  - No emotion control
  - Cannot clone user's voice

### Current Story Writer Workflow
1. User creates story outline with scenes
2. Each scene has `audio_narration` text
3. Audio generated via gTTS per scene
4. Video generated via HuggingFace per scene
5. Videos compiled into final story video

**Location**: `backend/api/story_writer/` and `frontend/src/components/StoryWriter/`

---

## Proposed Enhancements

### Core Principles

**Provider Abstraction**:
- Users should NOT see provider names (HuggingFace, WaveSpeed, etc.)
- All provider routing/switching happens automatically in the background
- Users only see user-friendly options like "Standard Quality" or "Premium Quality"
- System automatically selects best available provider based on user's subscription and credits

**Preserve Existing Options**:
- gTTS remains available as free fallback when credits run out
- HuggingFace remains available as fallback option
- All existing functionality preserved
- New features are additions, not replacements

**Cost Transparency**:
- All buttons show cost information in tooltips
- Users make informed decisions before generating
- No surprise costs

---

### 1. Provider-Agnostic Video Generation System

#### 1.1 Smart Provider Routing

**Backend Implementation** (`backend/services/llm_providers/main_video_generation.py`):

```python
from typing import Optional, Tuple


def ai_video_generate(
    prompt: str,
    *,  # keyword-only from here, so the required user_id may follow defaulted parameters
    quality: str = "standard",  # "standard" (480p), "high" (720p), "premium" (1080p)
    duration: int = 5,
    audio_file_path: Optional[str] = None,
    user_id: str,
    **kwargs,
) -> bytes:
    """
    Unified video generation entry point.
    Automatically routes to best available provider:
    - WaveSpeed WAN 2.5 (primary, if credits available)
    - HuggingFace (fallback, if WaveSpeed unavailable)

    Users never see provider names - only quality options.
    """
    # 1. Check user subscription and credits
    # 2. Select best available provider automatically
    # 3. Route to appropriate provider function
    # 4. Handle fallbacks transparently
    pass


def _select_video_provider(
    user_id: str,
    quality: str,
    pricing_service: "PricingService",
) -> Tuple[str, str]:
    """
    Automatically select best video provider.
    Returns: (provider_name, model_name)

    Selection logic:
    1. Check user credits/subscription
    2. Prefer WaveSpeed if available and credits sufficient
    3. Fallback to HuggingFace if WaveSpeed unavailable
    4. Return error if no providers available
    """
    # Implementation details...
```
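
The selection logic in the `_select_video_provider` docstring can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name `select_video_provider`, its parameters (a credit balance, an estimated cost, and two availability flags), and the provider labels `"wavespeed"` and `"huggingface"` are hypothetical, and the sketch assumes the HuggingFace fallback applies at every quality level. The model names are taken from elsewhere in this plan.

```python
from typing import Tuple

# Model names as they appear in this plan's pricing and current-state sections.
WAVESPEED_MODELS = {
    "standard": "wan-2.5-480p",
    "high": "wan-2.5-720p",
    "premium": "wan-2.5-1080p",
}
HUGGINGFACE_MODEL = "tencent/HunyuanVideo"


def select_video_provider(
    quality: str,
    credits_usd: float,
    estimated_cost_usd: float,
    wavespeed_available: bool = True,
    huggingface_available: bool = True,
) -> Tuple[str, str]:
    """Return (provider_name, model_name); all parameters are hypothetical."""
    if quality not in WAVESPEED_MODELS:
        raise ValueError(f"Unknown quality: {quality!r}")
    # Prefer WaveSpeed when it is reachable and the user can afford the clip.
    if wavespeed_available and credits_usd >= estimated_cost_usd:
        return ("wavespeed", WAVESPEED_MODELS[quality])
    # Otherwise fall back to HuggingFace (assumed valid for any quality).
    if huggingface_available:
        return ("huggingface", HUGGINGFACE_MODEL)
    # No provider left: surface an error to the caller.
    raise RuntimeError("No video provider is currently available")
```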

**Key Features**:
- Automatic provider selection (users don't choose)
- Seamless fallback between providers
- Quality-based options (Standard/High/Premium) instead of provider names
- Cost-aware routing (uses cheapest available option)
- Transparent error handling

**Quality Mapping**:
- **Standard Quality** (480p): $0.05/second - Uses WaveSpeed 480p or HuggingFace
- **High Quality** (720p): $0.10/second - Uses WaveSpeed 720p
- **Premium Quality** (1080p): $0.15/second - Uses WaveSpeed 1080p

**Cost Optimization**:
- Default to Standard Quality (480p) for cost-effectiveness
- Allow upgrade to High/Premium for final export
- Pre-flight validation prevents waste
- Automatic fallback to free options when credits exhausted
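
The per-second rates in the quality mapping translate into a per-scene price by multiplying by the clip duration. A minimal sketch, assuming a hypothetical helper named `video_cost_per_scene` and a plain dictionary of the rates listed above:

```python
# Per-second rates from the quality mapping (USD).
QUALITY_RATES_USD_PER_SECOND = {
    "standard": 0.05,  # 480p
    "high": 0.10,      # 720p
    "premium": 0.15,   # 1080p
}


def video_cost_per_scene(quality: str, duration_seconds: int) -> float:
    """Return the cost of one scene in USD, rounded to cents (hypothetical helper)."""
    rate = QUALITY_RATES_USD_PER_SECOND[quality]
    return round(rate * duration_seconds, 2)


# A 5-second Standard clip costs $0.25, matching the tooltips in this plan.
standard_preview_cost = video_cost_per_scene("standard", 5)
```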

---

### 2. Enhanced Audio Generation with Voice Cloning

#### 2.1 User-Friendly Voice Selection

**Key Principle**: Users choose between "AI Clone Voice" or "Default Voice" (gTTS) - no provider names shown.

**Backend Implementation** (`backend/services/story_writer/audio_generation_service.py`):

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class StoryAudioGenerationService:
    def generate_scene_audio(
        self,
        scene: Dict[str, Any],
        user_id: str,
        use_ai_voice: bool = False,  # User's choice: AI Clone or Default
        **kwargs,
    ) -> Dict[str, Any]:
        """
        Generate audio with automatic provider selection.

        If use_ai_voice=True:
        - Try persona voice clone (if trained)
        - Try Minimax voice clone (if credits available)
        - Fallback to gTTS if no credits

        If use_ai_voice=False:
        - Use gTTS (always free, always available)
        """
        if use_ai_voice:
            # Try AI voice options
            if self._has_persona_voice(user_id):
                return self._generate_with_persona_voice(scene, user_id)
            elif self._has_credits_for_voice_clone(user_id):
                return self._generate_with_minimax_voice_clone(scene, user_id)
            else:
                # Fallback to gTTS with notification
                logger.info(f"Credits exhausted, falling back to gTTS for user {user_id}")
                return self._generate_with_gtts(scene, **kwargs)
        else:
            # User explicitly chose default voice
            return self._generate_with_gtts(scene, **kwargs)
```

**Voice Options in Story Setup**:
- **Default Voice (gTTS)**: Free, always available, robotic but functional
- **AI Clone Voice**: Natural, human-like, requires credits ($0.02/minute)

**Cost Considerations**:
- Voice training: One-time cost (~$0.75) - only if user wants to train custom voice
- Voice generation: ~$0.02 per minute (only when AI Clone Voice selected)
- gTTS: Always free, always available as fallback
- Automatic fallback to gTTS when credits exhausted (with user notification)

---

### 3. Enhanced Story Setup UI

#### 3.1 Video Generation Settings (Provider-Agnostic)

**Location**: `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx`

**User-Friendly Settings** (No Provider Names):
```typescript
interface VideoGenerationSettings {
  // Quality selection (NOT provider selection)
  videoQuality: 'standard' | 'high' | 'premium'; // Maps to 480p/720p/1080p

  // Duration
  videoDuration: 5 | 10; // seconds

  // Cost estimation (shown in tooltip)
  estimatedCostPerScene: number;
  totalEstimatedCost: number;

  // Provider routing happens automatically in backend
  // Users never see "WaveSpeed" or "HuggingFace"
}
```

**UI Components**:
- Quality selector: "Standard" / "High" / "Premium" (with cost in tooltip)
- Duration selector: 5s (default) / 10s (premium)
- Cost tooltip: Shows estimated cost per scene and total
- Pre-flight validation warnings
- **No provider selector** - routing is automatic

**Tooltip Example**:
```
Standard Quality (480p)
├─ Cost: $0.25 per scene (5 seconds)
├─ Quality: Good for previews and testing
└─ Provider: Automatically selected based on credits
```

#### 3.2 Audio Generation Settings (Simple Choice)

**New Settings**:
```typescript
interface AudioGenerationSettings {
  // Simple user choice - no provider names
  voiceType: 'default' | 'ai_clone'; // "Default Voice" or "AI Clone Voice"

  // Only shown if ai_clone selected
  voiceTrainingStatus: 'not_trained' | 'training' | 'ready' | 'failed';

  // Existing gTTS settings (preserved)
  audioLang: string;
  audioSlow: boolean;
  audioRate: number;
}
```

**UI Components**:
- **Voice Type Selector**:
  - "Default Voice (gTTS)" - Free, always available
  - "AI Clone Voice" - Natural, $0.02/minute (with cost tooltip)
- Voice training section (only if AI Clone Voice selected)
- Existing gTTS settings (preserved for Default Voice)
- Cost per minute display in tooltip

**Tooltip for "AI Clone Voice"**:
```
AI Clone Voice
├─ Cost: $0.02 per minute
├─ Quality: Natural, human-like narration
├─ Fallback: Automatically uses Default Voice if credits exhausted
└─ Training: One-time $0.75 to train your custom voice (optional)
```

**Tooltip for "Default Voice"**:
```
Default Voice (gTTS)
├─ Cost: Free
├─ Quality: Standard text-to-speech
└─ Always Available: Works even when credits exhausted
```

---

### 4. New "Animate Scene" Feature in Outline Phase

#### 4.1 Per-Scene Animation Preview

**Location**: `frontend/src/components/StoryWriter/Phases/StoryOutline.tsx`

**Feature**: Add "Animate Scene" hover option alongside existing scene actions

**Implementation**:
- Add to `OutlineHoverActions` component
- Appears on hover over scene cards
- Only generates for single scene (never bulk)
- Uses cheapest option (480p/Standard Quality) to give users a feel
- Shows cost in tooltip before generation

**UI Component**:
```typescript
// In OutlineHoverActions.tsx
const sceneHoverActions = [
  // Existing actions...
  {
    icon: <PlayArrowIcon />,
    label: 'Animate Scene',
    action: 'animate-scene',
    tooltip: `Animate this scene with video\nCost: ~$0.25 (5 seconds, Standard Quality)\nPreview only - uses cheapest option`,
    onClick: handleAnimateScene,
  },
];
```

**Backend Endpoint**:
```python
@router.post("/animate-scene-preview")
async def animate_scene_preview(
    request: SceneAnimationRequest,
    current_user: Dict[str, Any] = Depends(get_current_user),
) -> SceneAnimationResponse:
    """
    Generate preview animation for a single scene.
    Always uses cheapest option (480p/Standard Quality).
    Per-scene only - never bulk generation.
    """
    # 1. Validate single scene only
    # 2. Use Standard Quality (480p) - cheapest option
    # 3. Generate video with automatic provider routing
    # 4. Return preview video URL
    pass
```

**Cost Management**:
- Always uses Standard Quality (480p) - $0.25 per scene
- Pre-flight validation before generation
- Clear cost display in tooltip
- Per-scene only prevents bulk waste

---

### 5. New "Animate Story with VoiceOver" Button in Writing Phase

#### 5.1 Complete Story Animation

**Location**: `frontend/src/components/StoryWriter/Phases/StoryWriting.tsx`

**Feature**: New button alongside existing HuggingFace video options

**Implementation**:
- Add button in Writing phase toolbar
- Generates complete animated story with synchronized voiceover
- Uses user's voice preference from Setup (AI Clone or Default)
- Shows comprehensive cost breakdown in tooltip
- Pre-flight validation before generation

**UI Component**:
```typescript
<Button
  variant="contained"
  startIcon={<SmartDisplayIcon />}
  onClick={handleAnimateStoryWithVoiceOver}
  disabled={!state.storyContent || isGenerating}
  title={`Animate Story with VoiceOver\n\nCost Breakdown:\n- Video: $${videoCost} (${scenes.length} scenes × $${costPerScene})\n- Audio: $${audioCost} (${totalAudioMinutes} minutes)\n- Total: $${totalCost}\n\nQuality: ${state.videoQuality}\nVoice: ${state.voiceType === 'ai_clone' ? 'AI Clone' : 'Default'}`}
>
  Animate Story with VoiceOver
</Button>
```

**Backend Endpoint**:
```python
@router.post("/animate-story-with-voiceover")
async def animate_story_with_voiceover(
    request: StoryAnimationRequest,
    current_user: Dict[str, Any] = Depends(get_current_user),
) -> StoryAnimationResponse:
    """
    Generate complete animated story with synchronized voiceover.
    Uses user's quality and voice preferences from Setup.
    """
    # 1. Pre-flight validation (cost, credits, limits)
    # 2. Generate audio for all scenes (using user's voice preference)
    # 3. Generate videos for all scenes (using user's quality preference)
    # 4. Synchronize audio with video
    # 5. Compile into final story video
    # 6. Return video URL and cost breakdown
    pass
```

**Cost Tooltip Example**:
```
Animate Story with VoiceOver

Cost Breakdown:
├─ Video (Standard Quality): $2.50
│ └─ 10 scenes × $0.25 per scene
├─ Audio (AI Clone Voice): $1.00
│ └─ 50 minutes total × $0.02/minute
└─ Total: $3.50

Settings:
├─ Quality: Standard (480p)
├─ Voice: AI Clone Voice
└─ Duration: 5 seconds per scene

⚠️ This will use $3.50 of your monthly credits
```
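
The figures in the tooltip above follow from two multiplications and a sum. The sketch below reproduces them; the names `CostBreakdown` and `estimate_story_cost` are hypothetical and are not part of the existing backend. Passing an audio rate of `0.0` would model the free Default Voice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBreakdown:
    """Hypothetical container for the values shown in the tooltip."""
    video_usd: float
    audio_usd: float

    @property
    def total_usd(self) -> float:
        return round(self.video_usd + self.audio_usd, 2)


def estimate_story_cost(
    num_scenes: int,
    cost_per_scene_usd: float,
    audio_minutes: float,
    audio_rate_usd_per_minute: float,
) -> CostBreakdown:
    """Video cost = scenes x per-scene price; audio cost = minutes x per-minute rate."""
    video = round(num_scenes * cost_per_scene_usd, 2)
    audio = round(audio_minutes * audio_rate_usd_per_minute, 2)
    return CostBreakdown(video_usd=video, audio_usd=audio)


# Inputs taken from the tooltip: 10 scenes at $0.25, 50 minutes at $0.02/minute.
breakdown = estimate_story_cost(10, 0.25, 50, 0.02)
```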

---

## Implementation Phases

### Phase 1: Provider-Agnostic Video System (Week 1-2)

**Priority**: HIGH - Solves immediate HuggingFace issues with provider abstraction

**Tasks**:
1. ✅ Create WaveSpeed API client (`backend/services/wavespeed/client.py`)
2. ✅ Add WAN 2.5 text-to-video function
3. ✅ Implement smart provider routing in `main_video_generation.py`
4. ✅ Add quality-based selection (Standard/High/Premium)
5. ✅ Preserve HuggingFace as fallback option
6. ✅ Update `hd_video.py` with provider routing
7. ✅ Add pre-flight cost validation
8. ✅ Update frontend with quality selector (remove provider names)
9. ✅ Add cost tooltips to all buttons
10. ✅ Update subscription limits
11. ✅ Testing and error handling

**Files to Modify**:
- `backend/services/llm_providers/main_video_generation.py` (add routing logic)
- `backend/api/story_writer/utils/hd_video.py` (use quality-based API)
- `backend/api/story_writer/routes/video_generation.py`
- `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx` (quality selector)
- `frontend/src/components/StoryWriter/components/HdVideoSection.tsx`
- `backend/services/subscription/pricing_service.py`

**Success Criteria**:
- Video generation works reliably with automatic provider routing
- Users see quality options, not provider names
- HuggingFace preserved as fallback
- Cost tracking accurate
- Pre-flight validation prevents waste
- Error messages clear and actionable

---

### Phase 2: Voice Cloning Integration (Week 3-4)

**Priority**: MEDIUM - Enhances audio quality with simple user choice

**Tasks**:
1. ✅ Create Minimax API client (`backend/services/minimax/voice_clone.py`)
2. ✅ Add voice training endpoint
3. ✅ Add voice generation endpoint
4. ✅ Update `audio_generation_service.py` with "AI Clone" vs "Default" logic
5. ✅ Preserve gTTS as always-available fallback
6. ✅ Add automatic fallback when credits exhausted
7. ✅ Update Story Setup with simple voice type selector
8. ✅ Add cost tooltips to voice options
9. ✅ Add voice preview and testing (if AI Clone selected)
10. ✅ Ensure gTTS always works even when credits exhausted

**Files to Create**:
- `backend/services/minimax/voice_clone.py`
- `backend/services/story_writer/voice_management_service.py`

**Files to Modify**:
- `backend/services/story_writer/audio_generation_service.py` (add voice type logic)
- `frontend/src/components/StoryWriter/Phases/StorySetup/GenerationSettingsSection.tsx` (voice type selector)
- `backend/models/story_models.py` (add voice type field)

**Success Criteria**:
- Users see simple choice: "Default Voice" or "AI Clone Voice"
- gTTS always available as fallback
- Automatic fallback when credits exhausted
- Cost tracking accurate
- Voice quality significantly better than gTTS when AI Clone used

---

### Phase 3: New Features - Animate Scene & Animate Story (Week 5-6)

**Priority**: MEDIUM - Add preview and complete animation features

**Tasks**:
1. ✅ Add "Animate Scene" hover option in Outline phase
2. ✅ Implement per-scene animation preview (cheapest option only)
3. ✅ Add "Animate Story with VoiceOver" button in Writing phase
4. ✅ Implement complete story animation with voiceover
5. ✅ Add comprehensive cost tooltips to all buttons
6. ✅ Add pre-flight validation for all animation features
7. ✅ Ensure per-scene only (no bulk generation in Outline)
8. ✅ Update documentation
9. ✅ User testing and feedback

**Files to Create**:
- `backend/api/story_writer/routes/scene_animation.py` (new endpoint)
- `frontend/src/components/StoryWriter/components/AnimateSceneButton.tsx`

**Files to Modify**:
- `frontend/src/components/StoryWriter/Phases/StoryOutlineParts/OutlineHoverActions.tsx` (add Animate Scene)
- `frontend/src/components/StoryWriter/Phases/StoryWriting.tsx` (add Animate Story button)
- `backend/api/story_writer/routes/video_generation.py` (add story animation endpoint)

**Success Criteria**:
- "Animate Scene" works in Outline (per-scene, cheapest option)
- "Animate Story with VoiceOver" works in Writing phase
- All buttons show cost in tooltips
- Pre-flight validation prevents waste
- Good user experience

---

### Phase 4: Integration & Optimization (Week 7-8)

**Priority**: MEDIUM - Polish and optimize

**Tasks**:
1. ✅ Integrate audio with video (synchronized videos)
2. ✅ Improve error handling and retry logic
3. ✅ Add progress indicators
4. ✅ Optimize cost calculations
5. ✅ Add usage analytics
6. ✅ Update documentation
7. ✅ User testing and feedback

**Success Criteria**:
- Smooth end-to-end workflow
- Cost-effective for users
- Reliable generation
- Excellent user experience
- All features work seamlessly together

---

## Cost Management & Prevention of Waste

### Pre-Flight Validation

**Implementation**: `backend/services/subscription/preflight_validator.py`

**Checks Before Generation**:
1. User has sufficient subscription tier
2. Estimated cost within monthly budget
3. Video generation limit not exceeded
4. Audio generation limit not exceeded
5. Total story cost reasonable (<$5 for typical story)

**Validation Flow**:
```python
from typing import Any, Dict, Tuple


def validate_story_generation(
    pricing_service: "PricingService",
    user_id: str,
    num_scenes: int,
    video_resolution: str,
    video_duration: int,
    use_voice_clone: bool,
) -> Tuple[bool, str, Dict[str, Any]]:
    """
    Pre-flight validation before story generation.
    Returns: (allowed, message, cost_breakdown)
    """
    # Calculate estimated costs
    video_cost_per_scene = get_wavespeed_cost(video_resolution, video_duration)
    audio_cost_per_scene = get_voice_clone_cost() if use_voice_clone else 0.0

    total_estimated_cost = (video_cost_per_scene + audio_cost_per_scene) * num_scenes

    # Check limits
    limits = pricing_service.get_user_limits(user_id)
    current_usage = pricing_service.get_current_usage(user_id)

    # Validation logic (sets allowed, message, cost_breakdown)...
    return (allowed, message, cost_breakdown)
```
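
The elided validation step can be sketched without any service dependencies. This is an illustration only: the function name `preflight_check`, its parameters, and the message strings are assumptions, and it covers just two of the listed checks (generation limit and monthly budget).

```python
from typing import Any, Dict, Tuple


def preflight_check(
    estimated_cost_usd: float,
    budget_remaining_usd: float,
    videos_used: int,
    video_limit: int,
    num_scenes: int,
) -> Tuple[bool, str, Dict[str, Any]]:
    """Return (allowed, message, details) before any provider is called."""
    details = {
        "estimated_cost_usd": estimated_cost_usd,
        "budget_remaining_usd": budget_remaining_usd,
        "videos_after_generation": videos_used + num_scenes,
        "video_limit": video_limit,
    }
    # Check 3 from the list: the video generation limit must not be exceeded.
    if videos_used + num_scenes > video_limit:
        return (False, "Video generation limit would be exceeded", details)
    # Check 2 from the list: the estimated cost must fit the monthly budget.
    if estimated_cost_usd > budget_remaining_usd:
        return (False, "Estimated cost exceeds remaining monthly budget", details)
    return (True, "OK", details)
```

Because the function only inspects numbers, it can run both in the API route and in a unit test without mocking `PricingService`.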

### Cost Estimation Display

**Frontend Implementation**:
- Real-time cost calculator in Story Setup
- Per-scene cost breakdown
- Total story cost estimate
- Monthly budget remaining
- Warning if approaching limits

**UI Example**:
```
Video Generation Cost Estimate:
├─ Resolution: 720p ($0.10/second)
├─ Duration: 5 seconds per scene
├─ Scenes: 10
└─ Total: $5.00

Audio Generation Cost Estimate:
├─ Provider: Voice Clone ($0.02/minute)
├─ Average: 30 seconds per scene
├─ Scenes: 10
└─ Total: $0.10

Total Estimated Cost: $5.10
Monthly Budget Remaining: $44.90
```

### Usage Tracking

**Enhanced Tracking**:
- Track video generation per scene
- Track audio generation per scene
- Track total story cost
- Alert users approaching limits
- Provide cost breakdown in analytics

---

## Pricing Integration

### WaveSpeed WAN 2.5 Pricing

**Add to `pricing_service.py`**:
```python
# WaveSpeed WAN 2.5 Text-to-Video
{
    "provider": APIProvider.VIDEO,  # Or new WAVESPEED provider
    "model_name": "wan-2.5-480p",
    "cost_per_second": 0.05,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (480p)"
},
{
    "provider": APIProvider.VIDEO,
    "model_name": "wan-2.5-720p",
    "cost_per_second": 0.10,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (720p)"
},
{
    "provider": APIProvider.VIDEO,
    "model_name": "wan-2.5-1080p",
    "cost_per_second": 0.15,
    "description": "WaveSpeed WAN 2.5 Text-to-Video (1080p)"
}
```

### Minimax Voice Clone Pricing

**Add to `pricing_service.py`**:
```python
# Minimax Voice Clone
{
    "provider": APIProvider.AUDIO,  # New provider type
    "model_name": "minimax-voice-clone-train",
    "cost_per_request": 0.75,  # One-time training cost
    "description": "Minimax Voice Clone Training"
},
{
    "provider": APIProvider.AUDIO,
    "model_name": "minimax-voice-clone-generate",
    "cost_per_minute": 0.02,  # Per minute of generated audio
    "description": "Minimax Voice Clone Generation"
}
```

### Subscription Tier Limits

**Update subscription limits**:
- **Free**: 3 stories/month, 480p only, gTTS only
- **Basic**: 10 stories/month, up to 720p, voice clone available
- **Pro**: 50 stories/month, up to 1080p, voice clone included
- **Enterprise**: Unlimited, all features
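
The tier limits above could be encoded as a lookup table and checked before generation. The sketch below is an assumption-laden illustration: the names `TIER_LIMITS` and `is_request_allowed` are hypothetical, "Unlimited" is modeled as `None`, and Enterprise is assumed to top out at 1080p because that is the highest resolution this plan mentions.

```python
from typing import Optional, TypedDict


class TierLimits(TypedDict):
    stories_per_month: Optional[int]  # None models "Unlimited"
    max_resolution: int               # vertical resolution in pixels
    voice_clone: bool


TIER_LIMITS: dict[str, TierLimits] = {
    "free": {"stories_per_month": 3, "max_resolution": 480, "voice_clone": False},
    "basic": {"stories_per_month": 10, "max_resolution": 720, "voice_clone": True},
    "pro": {"stories_per_month": 50, "max_resolution": 1080, "voice_clone": True},
    "enterprise": {"stories_per_month": None, "max_resolution": 1080, "voice_clone": True},
}


def is_request_allowed(
    tier: str,
    stories_this_month: int,
    resolution: int,
    use_voice_clone: bool,
) -> bool:
    """Return True if the request fits the limits of the given tier."""
    limits = TIER_LIMITS[tier]
    cap = limits["stories_per_month"]
    if cap is not None and stories_this_month >= cap:
        return False  # monthly story quota already used up
    if resolution > limits["max_resolution"]:
        return False  # requested quality is above the tier's ceiling
    if use_voice_clone and not limits["voice_clone"]:
        return False  # voice clone is not part of this tier
    return True
```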

---

## Technical Architecture

### Backend Services

```
backend/services/
├── wavespeed/
│   ├── __init__.py
│   ├── client.py                     # WaveSpeed API client
│   ├── wan25_video.py                # WAN 2.5 video generation
│   └── models.py                     # Request/response models
├── minimax/
│   ├── __init__.py
│   ├── client.py                     # Minimax API client
│   ├── voice_clone.py                # Voice cloning service
│   └── models.py
└── story_writer/
    ├── audio_generation_service.py   # Updated with voice clone
    └── video_generation_service.py   # Updated with WaveSpeed
```

### Frontend Components

```
frontend/src/components/StoryWriter/
├── Phases/StorySetup/
│   └── GenerationSettingsSection.tsx # Enhanced with new settings
├── components/
│   ├── HdVideoSection.tsx            # Updated for WaveSpeed
│   ├── VoiceTrainingSection.tsx      # NEW: Voice training UI
│   └── CostEstimationDisplay.tsx     # NEW: Cost calculator
└── hooks/
    └── useStoryGenerationCost.ts     # NEW: Cost calculation hook
```

---

## Error Handling & User Experience

### Error Scenarios

1. **WaveSpeed API Failure**:
   - Retry with exponential backoff (3 attempts)
   - Fallback to HuggingFace if available
   - Clear error message with cost refund notice

2. **Voice Clone Training Failure**:
   - Provide specific error (audio quality, length, format)
   - Suggest improvements
   - Allow retry with different audio

3. **Cost Limit Exceeded**:
   - Pre-flight validation prevents this
   - Show upgrade prompt
   - Suggest reducing scenes/resolution

4. **Audio/Video Mismatch**:
   - Validate audio length matches video duration
   - Auto-trim or extend audio
   - Warn user before generation
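
The first scenario (three attempts with exponential backoff, then a fallback provider) can be sketched as a small wrapper. The name `generate_with_retry`, the 1-second base delay, and the injectable `sleep` parameter are assumptions made for illustration; `primary` and `fallback` stand in for the WaveSpeed and HuggingFace calls.

```python
import time
from typing import Callable, Optional


def generate_with_retry(
    primary: Callable[[], bytes],
    fallback: Optional[Callable[[], bytes]] = None,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `primary` up to `attempts` times, doubling the wait between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return primary()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                # Waits of base_delay, 2 * base_delay, ... between attempts.
                sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        return fallback()
    raise RuntimeError("Video generation failed after all retries") from last_error
```

Injecting `sleep` keeps the wrapper testable: a test can pass a list's `append` method and assert on the recorded delays instead of actually waiting.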

### User Feedback

- Progress indicators for all operations
- Clear cost breakdowns
- Quality previews before final generation
- Regeneration options with cost tracking
- Usage analytics dashboard

---

## Testing Plan

### Unit Tests
- WaveSpeed API client
- Voice clone service
- Cost calculation
- Pre-flight validation

### Integration Tests
- End-to-end story generation
- Audio + video synchronization
- Error handling and fallbacks
- Subscription limit enforcement

### User Acceptance Tests
- Story generation workflow
- Voice training process
- Cost estimation accuracy
- Error recovery

---

## Success Metrics

### Technical Metrics
- Video generation success rate >95%
- Audio generation success rate >98%
- Average generation time per scene <30s
- API error rate <2%

### Business Metrics
- User satisfaction with video quality
- Cost per story (target: <$5 for 10-scene story)
- Voice clone adoption rate
- Story completion rate

### User Experience Metrics
- Time to generate story
- Error recovery time
- User understanding of costs
- Feature discovery rate

---

## Provider Management Strategy

### Always-Available Options
- **gTTS**: Always available, always free, works even when credits exhausted
- **HuggingFace**: Preserved as fallback option, works when WaveSpeed unavailable

### Automatic Provider Routing
- **Primary**: WaveSpeed WAN 2.5 (when credits available)
- **Fallback**: HuggingFace (when WaveSpeed unavailable or credits exhausted)
- **Audio Fallback**: gTTS (always available, always free)

### User Experience
- Users never see provider names
- System automatically selects best available option
- Seamless fallback when credits exhausted
- Clear notifications when fallback occurs
- No user intervention required

### No Deprecation
- **HuggingFace**: Kept as permanent fallback option
- **gTTS**: Kept as permanent free option
- All existing functionality preserved
- New features are additions, not replacements

---

## Next Steps

1. **Week 1**: Set up WaveSpeed API access and credentials
2. **Week 1**: Implement provider-agnostic routing system
3. **Week 2**: Integrate into Story Writer with quality-based UI
4. **Week 3**: Implement voice cloning with simple "AI Clone" vs "Default" choice
5. **Week 4**: Add voice training UI (only if AI Clone selected)
6. **Week 5**: Add "Animate Scene" hover option in Outline
7. **Week 6**: Add "Animate Story with VoiceOver" button in Writing
8. **Week 7-8**: Testing, optimization, and polish

## Key Design Principles

1. **Provider Abstraction**: Users never see provider names - only quality/voice options
2. **Preserve Existing**: gTTS and HuggingFace remain available as fallbacks
3. **Cost Transparency**: All buttons show costs in tooltips
4. **Automatic Fallback**: System automatically uses free options when credits exhausted
5. **Per-Scene Only**: Outline phase only allows per-scene generation (no bulk)
6. **User-Friendly**: Simple choices like "Standard Quality" not "WaveSpeed 480p"

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| WaveSpeed API changes | Version pinning, abstraction layer |
| Cost overruns | Strict pre-flight validation |
| Voice quality issues | Quality checks, fallback options |
| User confusion | Clear UI, tooltips, documentation |
| Integration complexity | Phased rollout, extensive testing |

---

*Document Version: 1.0*
*Last Updated: January 2025*
*Priority: HIGH - Immediate Implementation*