# Video Generation Refactoring Plan

## Goal

Remove duplicate code across Video Studio, Image Studio, Story Writer, and related modules, and ensure all video generation goes through the unified `ai_video_generate()` entry point.

## Current State Analysis

### ✅ Already Using Unified Entry Point

1. **Image Studio Transform Service** (`backend/services/image_studio/transform_service.py`)
   - ✅ Uses `ai_video_generate()` for image-to-video
   - ✅ Properly handles file saving and asset library

2. **Video Studio Service - Image-to-Video** (`backend/services/video_studio/video_studio_service.py`)
   - ✅ `generate_image_to_video()` uses `ai_video_generate()`
   - ✅ Properly handles file saving and asset library

3. **Story Writer** (`backend/api/story_writer/utils/hd_video.py`)
   - ✅ Uses `ai_video_generate()` for text-to-video
   - ✅ Properly handles file saving

### ❌ Issues Found - Redundant Code

1. **Video Studio Service - Text-to-Video** (`backend/services/video_studio/video_studio_service.py:99`)
   - ❌ Calls `self.wavespeed_client.generate_video()` which **DOES NOT EXIST**
   - ❌ Bypasses unified entry point
   - ❌ Missing pre-flight validation
   - ❌ Missing usage tracking
   - **Action**: Refactor to use `ai_video_generate()`

2. **Video Studio Service - Avatar Generation** (`backend/services/video_studio/video_studio_service.py:320`)
   - ❌ Calls `self.wavespeed_client.generate_video()` which **DOES NOT EXIST**
   - ⚠️ This is a different operation (talking avatar) - may need separate handling
   - **Action**: Investigate if this should use unified entry point or stay separate

3. **Video Studio Service - Video Enhancement** (`backend/services/video_studio/video_studio_service.py:405`)
   - ❌ Calls `self.wavespeed_client.generate_video()` which **DOES NOT EXIST**
   - ⚠️ This is a different operation (video-to-video) - may need separate handling
   - **Action**: Investigate if this should use unified entry point or stay separate

4. **Unified Entry Point - WaveSpeed Text-to-Video** (`backend/services/llm_providers/main_video_generation.py:454`)
   - ❌ Currently raises `VideoProviderNotImplemented` for WaveSpeed text-to-video
   - **Action**: Implement WaveSpeed text-to-video support

### ⚠️ Special Cases (Keep Separate for Now)

1. **Podcast InfiniteTalk** (`backend/services/wavespeed/infinitetalk.py`)
   - ✅ Specialized operation: talking avatar with audio sync
   - ✅ Has its own polling and error handling
   - **Decision**: Keep separate - this is a specialized use case

## Refactoring Steps

### Phase 1: Implement WaveSpeed Text-to-Video in Unified Entry Point

**File**: `backend/services/llm_providers/main_video_generation.py`

**Changes**:
1. Add `_generate_text_to_video_wavespeed()` function
2. Use `WaveSpeedClient.generate_text_video()` or `submit_text_to_video()` + polling
3. Support models: hunyuan-video-1.5, ltx-2-pro, ltx-2-fast, ltx-2-retake
4. Return metadata dict with video_bytes, cost, duration, etc.

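
Step 3 lists the models by short name, while the implementation in this phase keys its mapping on vendor-prefixed names and defaults to a full path. A minimal sketch of a resolver that accepts all three forms follows. The helper name `resolve_wavespeed_model` is hypothetical, and the full paths simply mirror the `<name>/text-to-video` pattern used in this plan rather than values confirmed against the WaveSpeed API.

```python
from typing import Dict

# Full model paths keyed by the short names from step 3.
# These paths are assumptions copied from the pattern used in this plan.
_TEXT_TO_VIDEO_MODELS: Dict[str, str] = {
    "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
    "ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
    "ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
    "ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
}


def resolve_wavespeed_model(model: str) -> str:
    """Return the full model path for a short, vendor-prefixed, or full name."""
    name = model.strip()
    if name in _TEXT_TO_VIDEO_MODELS.values():
        return name  # already a full path
    short = name.split("/")[-1]  # drop a vendor prefix such as "lightricks/"
    if short in _TEXT_TO_VIDEO_MODELS:
        return _TEXT_TO_VIDEO_MODELS[short]
    raise ValueError(f"Unsupported WaveSpeed text-to-video model: {model!r}")
```

Unlike a `dict.get(model, model)` lookup, this sketch rejects unknown names instead of passing them through, so a typo would fail before any request is sent. Whether that strictness is wanted is a design choice for the team.
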
**Implementation**:
```python
from typing import Any, Callable, Dict, Optional


async def _generate_text_to_video_wavespeed(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    model: str = "hunyuan-video-1.5/text-to-video",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    audio_base64: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    progress_callback: Optional[Callable[[float, str], None]] = None,
    **kwargs
) -> Dict[str, Any]:
    """Generate text-to-video using WaveSpeed models."""
    from services.wavespeed.client import WaveSpeedClient

    client = WaveSpeedClient()

    # Map model names to full paths
    model_mapping = {
        "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
        "lightricks/ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
        "lightricks/ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
        "lightricks/ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
    }
    full_model = model_mapping.get(model, model)

    # Use generate_text_video which handles polling internally.
    # TODO: full_model is resolved above but not yet forwarded to the client;
    # pass it once the client's model parameter is confirmed.
    # progress_callback and **kwargs are likewise accepted but unused here.
    result = await client.generate_text_video(
        prompt=prompt,
        resolution=resolution,
        duration=duration,
        negative_prompt=negative_prompt,
        seed=seed,
        audio_base64=audio_base64,
        enable_prompt_expansion=enable_prompt_expansion,
        enable_sync_mode=False,  # Use async mode with polling
        timeout=600,  # 10 minutes
    )

    return {
        "video_bytes": result["video_bytes"],
        "prompt": prompt,
        "duration": float(duration),
        "model_name": full_model,
        "cost": result.get("cost", 0.0),
        "provider": "wavespeed",
        "resolution": resolution,
        "width": result.get("width", 1280),
        "height": result.get("height", 720),
        "metadata": result.get("metadata", {}),
    }
```

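
If the `submit_text_to_video()` + polling route from step 2 is chosen instead of `generate_text_video()`, the loop could look roughly like the sketch below. `FakeClient`, `get_status()`, the task id, and the status strings are stand-ins invented for illustration; the real `WaveSpeedClient` polling API should be checked before adapting any of this.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Optional


class FakeClient:
    """Stand-in for WaveSpeedClient; reports completion after a few polls."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self._remaining = polls_until_done

    async def submit_text_to_video(self, prompt: str) -> str:
        return "task-123"  # hypothetical task id

    async def get_status(self, task_id: str) -> Dict[str, Any]:
        self._remaining -= 1
        if self._remaining > 0:
            return {"status": "processing"}
        return {"status": "completed", "video_bytes": b"fake-video"}


async def submit_and_poll(
    client: Any,
    prompt: str,
    timeout: float = 600.0,
    interval: float = 2.0,
    progress_callback: Optional[Callable[[float, str], None]] = None,
) -> Dict[str, Any]:
    """Submit a job, then poll until it completes, fails, or times out."""
    task_id = await client.submit_text_to_video(prompt)
    deadline = time.monotonic() + timeout
    while True:
        status = await client.get_status(task_id)
        if status["status"] == "completed":
            if progress_callback:
                progress_callback(100.0, "completed")
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"Video generation failed for task {task_id}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")
        if progress_callback:
            # No real percentage is available in this sketch; report the state only.
            progress_callback(0.0, status["status"])
        await asyncio.sleep(interval)


result = asyncio.run(submit_and_poll(FakeClient(), "a red kite", interval=0.01))
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments, and `asyncio.sleep()` yields control so other requests are not blocked while waiting.
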
### Phase 2: Refactor VideoStudioService.generate_text_to_video()

**File**: `backend/services/video_studio/video_studio_service.py`

**Changes**:
1. Replace `self.wavespeed_client.generate_video()` call with `ai_video_generate()`
2. Remove model mapping (handled in unified entry point)
3. Remove cost calculation (handled in unified entry point)
4. Add file saving and asset library integration
5. Preserve existing return format for backward compatibility

**Before**:
```python
result = await self.wavespeed_client.generate_video(...)  # DOES NOT EXIST
```

**After**:
```python
result = ai_video_generate(
    prompt=prompt,
    operation_type="text-to-video",
    provider=provider,
    user_id=user_id,
    duration=duration,
    resolution=resolution,
    negative_prompt=negative_prompt,
    model=model,
    **kwargs
)

# Save file and update asset library
save_result = self._save_video_file(...)
```

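
The arguments to `_save_video_file(...)` are elided in this plan. As a rough sketch of the file-saving half of step 4, assuming videos are written to a per-user directory under a configurable root, a helper might look like the following. The function name `save_video_bytes`, the directory layout, and the returned keys are all illustrative and not the service's existing format.

```python
import uuid
from pathlib import Path
from typing import Any, Dict


def save_video_bytes(video_bytes: bytes, user_id: str, root: Path) -> Dict[str, Any]:
    """Write video bytes to <root>/<user_id>/video_<random>.mp4 and describe the file."""
    if not video_bytes:
        raise ValueError("video_bytes is empty; nothing to save")
    user_dir = root / user_id
    user_dir.mkdir(parents=True, exist_ok=True)
    # A random name avoids collisions between concurrent generations.
    filename = f"video_{uuid.uuid4().hex}.mp4"
    file_path = user_dir / filename
    file_path.write_bytes(video_bytes)
    return {
        "filename": filename,
        "file_path": str(file_path),
        "file_size": len(video_bytes),
    }
```

In a real service the `user_id` would need sanitizing before being used as a directory name, and the returned dict would be mapped onto whatever format existing callers expect (step 5).
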
### Phase 3: Fix Avatar and Enhancement Methods

**Decision Needed**:
- Are avatar generation and video enhancement different enough to warrant separate handling?
- Or should they be integrated into the unified entry point?

**Options**:
1. **Keep Separate**: Create separate unified entry points (`ai_avatar_generate()`, `ai_video_enhance()`)
2. **Integrate**: Add `operation_type="avatar"` and `operation_type="enhance"` to `ai_video_generate()`

**Recommendation**: Keep separate for now, but ensure they use proper WaveSpeed client methods.

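
If Option 1 is chosen, the separate entry points would still need the pre-flight validation and usage tracking that `ai_video_generate()` provides. One way to avoid duplicating that logic is a shared decorator, sketched below. The decorator name, the `USAGE_LOG` list, the validation rule, the parameter names, and the cost values are all placeholders for illustration and do not reflect the existing codebase.

```python
import functools
from typing import Any, Callable, Dict, List

USAGE_LOG: List[Dict[str, Any]] = []  # stand-in for the real usage tracker


def tracked_operation(operation_type: str) -> Callable:
    """Wrap an entry point with a pre-flight check and usage recording."""

    def decorator(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(func)
        def wrapper(*, user_id: str, **kwargs: Any) -> Dict[str, Any]:
            if not user_id:  # placeholder for the real pre-flight validation
                raise ValueError("user_id is required")
            result = func(user_id=user_id, **kwargs)
            # Record usage only after the wrapped call succeeds.
            USAGE_LOG.append(
                {
                    "user_id": user_id,
                    "operation": operation_type,
                    "cost": result.get("cost", 0.0),
                }
            )
            return result

        return wrapper

    return decorator


@tracked_operation("avatar")
def ai_avatar_generate(*, user_id: str, image_base64: str, audio_base64: str) -> Dict[str, Any]:
    # The real body would call the appropriate WaveSpeed client method.
    return {"video_bytes": b"", "cost": 0.25}  # dummy values


@tracked_operation("enhance")
def ai_video_enhance(*, user_id: str, video_base64: str) -> Dict[str, Any]:
    return {"video_bytes": b"", "cost": 0.10}  # dummy values
```
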
## Testing Strategy

### Pre-Refactoring
1. ✅ Document current behavior
2. ✅ Identify all call sites
3. ✅ Create test cases for each scenario

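
For step 2, call sites of the nonexistent `generate_video()` method can be located mechanically rather than by text search. The sketch below uses the standard-library `ast` module (`ast.unparse` needs Python 3.9 or later); the helper name and the sample source are made up for illustration.

```python
import ast
from typing import List, Tuple


def find_method_calls(source: str, method_name: str) -> List[Tuple[int, str]]:
    """Return (line number, receiver) for each `<receiver>.<method_name>(...)` call."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method_name
        ):
            hits.append((node.lineno, ast.unparse(node.func.value)))
    return sorted(hits)


# Illustrative source only; not copied from the real service.
sample = '''
class VideoStudioService:
    async def generate_text_to_video(self, prompt):
        return await self.wavespeed_client.generate_video(prompt=prompt)

    async def generate_image_to_video(self, image):
        return ai_video_generate(image=image)
'''

calls = find_method_calls(sample, "generate_video")
```

To scan the repository, the same function could be applied to the text of each file yielded by `Path("backend").rglob("*.py")`.
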
### Post-Refactoring
1. Test text-to-video with WaveSpeed models
2. Test image-to-video (already working)
3. Verify pre-flight validation works
4. Verify usage tracking works
5. Verify file saving works
6. Verify asset library integration works

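
A test for item 1 can check that the service delegates to the unified entry point without calling the real provider. The sketch below uses `unittest.mock.MagicMock` with a simplified stand-in service that takes the entry point as a constructor argument; the class, its return format, and the cost value are assumptions for illustration only.

```python
from typing import Any, Callable, Dict
from unittest.mock import MagicMock


class TextToVideoService:
    """Simplified stand-in for VideoStudioService with an injectable entry point."""

    def __init__(self, generate: Callable[..., Dict[str, Any]]) -> None:
        self._generate = generate

    def generate_text_to_video(self, prompt: str, user_id: str) -> Dict[str, Any]:
        result = self._generate(
            prompt=prompt,
            operation_type="text-to-video",
            provider="wavespeed",
            user_id=user_id,
        )
        return {"success": True, "cost": result["cost"]}


def test_text_to_video_uses_unified_entry_point() -> None:
    fake_generate = MagicMock(return_value={"video_bytes": b"x", "cost": 0.5})
    service = TextToVideoService(generate=fake_generate)

    response = service.generate_text_to_video("a red kite", user_id="user-1")

    # The entry point is called exactly once with the expected routing arguments.
    fake_generate.assert_called_once()
    assert fake_generate.call_args.kwargs["operation_type"] == "text-to-video"
    assert fake_generate.call_args.kwargs["user_id"] == "user-1"
    assert response == {"success": True, "cost": 0.5}
```

If the real service imports `ai_video_generate` at module level rather than receiving it as an argument, `unittest.mock.patch` on that module attribute would serve the same purpose.
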
## Risk Mitigation

1. **Backward Compatibility**: Preserve existing return formats
2. **Gradual Migration**: Refactor one method at a time
3. **Feature Flags**: Consider feature flag for new unified path
4. **Comprehensive Testing**: Test all scenarios before deployment

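
For item 3, a feature flag can be as small as an environment variable read at call time. The sketch below assumes a variable named `VIDEO_USE_UNIFIED_ENTRY_POINT`, which is a hypothetical name and not an existing setting.

```python
import os
from typing import Mapping, Optional


def use_unified_video_path(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the (hypothetical) flag enables the unified entry point."""
    source = os.environ if env is None else env
    value = source.get("VIDEO_USE_UNIFIED_ENTRY_POINT", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

Accepting an optional mapping makes the function easy to test without touching the process environment. Since the current text-to-video path calls a method that does not exist, the flag would gate the rollout of the new path rather than provide a working fallback for that operation.
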
## Files to Modify

1. `backend/services/llm_providers/main_video_generation.py`
   - Add `_generate_text_to_video_wavespeed()`
   - Update `ai_video_generate()` to support WaveSpeed text-to-video

2. `backend/services/video_studio/video_studio_service.py`
   - Refactor `generate_text_to_video()` to use `ai_video_generate()`
   - Fix `generate_avatar()` and `enhance_video()` method calls

3. `backend/routers/video_studio.py`
   - Update to use refactored service methods

## Success Criteria

- ✅ All video generation goes through unified entry point
- ✅ No redundant code
- ✅ Pre-flight validation works everywhere
- ✅ Usage tracking works everywhere
- ✅ File saving works everywhere
- ✅ Asset library integration works everywhere
- ✅ No breaking changes
- ✅ All existing functionality preserved