# Video Generation Refactoring Plan

## Goal
Remove redundant/duplicate code across video studio, image studio, story writer, etc., and ensure all video generation goes through the unified `ai_video_generate()` entry point.
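
For orientation, this is the call shape the rest of this plan assumes for that entry point. The parameter names are taken from the Phase 2 example below, not from a confirmed signature, and the argument values are purely illustrative:

```python
# Assumed call shape for the unified entry point (see Phase 2 "After").
# All parameter names here are assumptions drawn from this plan itself.
result = ai_video_generate(
    prompt="a drone shot over a rocky coastline at sunset",
    operation_type="text-to-video",
    provider="wavespeed",
    user_id="user-123",  # hypothetical ID for illustration
    duration=5,          # seconds
    resolution="720p",
)
```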

## Current State Analysis

### ✅ Already Using Unified Entry Point
1. **Image Studio Transform Service** (`backend/services/image_studio/transform_service.py`)
   - ✅ Uses `ai_video_generate()` for image-to-video
   - ✅ Properly handles file saving and asset library

2. **Video Studio Service - Image-to-Video** (`backend/services/video_studio/video_studio_service.py`)
   - ✅ `generate_image_to_video()` uses `ai_video_generate()`
   - ✅ Properly handles file saving and asset library

3. **Story Writer** (`backend/api/story_writer/utils/hd_video.py`)
   - ✅ Uses `ai_video_generate()` for text-to-video
   - ✅ Properly handles file saving

### ❌ Issues Found - Redundant Code

1. **Video Studio Service - Text-to-Video** (`backend/services/video_studio/video_studio_service.py:99`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ❌ Bypasses the unified entry point
   - ❌ Missing pre-flight validation
   - ❌ Missing usage tracking
   - **Action**: Refactor to use `ai_video_generate()`

2. **Video Studio Service - Avatar Generation** (`backend/services/video_studio/video_studio_service.py:320`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ⚠️ This is a different operation (talking avatar) and may need separate handling
   - **Action**: Investigate whether this should use the unified entry point or stay separate

3. **Video Studio Service - Video Enhancement** (`backend/services/video_studio/video_studio_service.py:405`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ⚠️ This is a different operation (video-to-video) and may need separate handling
   - **Action**: Investigate whether this should use the unified entry point or stay separate

4. **Unified Entry Point - WaveSpeed Text-to-Video** (`backend/services/llm_providers/main_video_generation.py:454`)
   - ❌ Currently raises `VideoProviderNotImplemented` for WaveSpeed text-to-video
   - **Action**: Implement WaveSpeed text-to-video support

### ⚠️ Special Cases (Keep Separate for Now)

1. **Podcast InfiniteTalk** (`backend/services/wavespeed/infinitetalk.py`)
   - ✅ Specialized operation: talking avatar with audio sync
   - ✅ Has its own polling and error handling
   - **Decision**: Keep separate - this is a specialized use case

## Refactoring Steps

### Phase 1: Implement WaveSpeed Text-to-Video in the Unified Entry Point

**File**: `backend/services/llm_providers/main_video_generation.py`

**Changes**:
1. Add a `_generate_text_to_video_wavespeed()` function
2. Use `WaveSpeedClient.generate_text_video()`, or `submit_text_to_video()` plus polling
3. Support models: hunyuan-video-1.5, ltx-2-pro, ltx-2-fast, ltx-2-retake
4. Return a metadata dict with `video_bytes`, cost, duration, etc.

**Implementation**:
```python
from typing import Any, Callable, Dict, Optional


async def _generate_text_to_video_wavespeed(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    model: str = "hunyuan-video-1.5/text-to-video",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    audio_base64: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    progress_callback: Optional[Callable[[float, str], None]] = None,
    **kwargs,
) -> Dict[str, Any]:
    """Generate text-to-video using WaveSpeed models."""
    from services.wavespeed.client import WaveSpeedClient

    client = WaveSpeedClient()

    # Map short model names to full API paths
    model_mapping = {
        "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
        "lightricks/ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
        "lightricks/ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
        "lightricks/ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
    }
    full_model = model_mapping.get(model, model)

    # Use generate_text_video, which handles polling internally.
    # NOTE: if generate_text_video() accepts a model argument, pass
    # full_model here; as written, full_model is only recorded in the
    # returned metadata.
    result = await client.generate_text_video(
        prompt=prompt,
        resolution=resolution,
        duration=duration,
        negative_prompt=negative_prompt,
        seed=seed,
        audio_base64=audio_base64,
        enable_prompt_expansion=enable_prompt_expansion,
        enable_sync_mode=False,  # use async mode with polling
        timeout=600,  # 10 minutes
    )

    return {
        "video_bytes": result["video_bytes"],
        "prompt": prompt,
        "duration": float(duration),
        "model_name": full_model,
        "cost": result.get("cost", 0.0),
        "provider": "wavespeed",
        "resolution": resolution,
        "width": result.get("width", 1280),
        "height": result.get("height", 720),
        "metadata": result.get("metadata", {}),
    }
```
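
Once this helper exists, `ai_video_generate()` needs a branch that routes WaveSpeed text-to-video requests to it. A minimal sketch, assuming the entry point is synchronous (the Phase 2 call site below invokes it without `await`) and that the surrounding dispatch structure looks roughly like this stub:

```python
import asyncio
from typing import Any, Dict

# Hypothetical stub showing only the new branch; the real function in
# main_video_generation.py has validation, tracking, and other providers.
def ai_video_generate(operation_type: str, provider: str, **kwargs) -> Dict[str, Any]:
    if operation_type == "text-to-video" and provider == "wavespeed":
        # A synchronous entry point has to drive the coroutine itself.
        return asyncio.run(_generate_text_to_video_wavespeed(**kwargs))
    raise VideoProviderNotImplemented(f"{provider}/{operation_type}")
```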

### Phase 2: Refactor VideoStudioService.generate_text_to_video()

**File**: `backend/services/video_studio/video_studio_service.py`

**Changes**:
1. Replace the `self.wavespeed_client.generate_video()` call with `ai_video_generate()`
2. Remove model mapping (handled in the unified entry point)
3. Remove cost calculation (handled in the unified entry point)
4. Add file saving and asset library integration
5. Preserve the existing return format for backward compatibility (see the adapter sketch after the code below)

**Before**:
```python
result = await self.wavespeed_client.generate_video(...)  # DOES NOT EXIST
```

**After**:
```python
result = ai_video_generate(
    prompt=prompt,
    operation_type="text-to-video",
    provider=provider,
    user_id=user_id,
    duration=duration,
    resolution=resolution,
    negative_prompt=negative_prompt,
    model=model,
    **kwargs,
)

# Save file and update asset library
save_result = self._save_video_file(...)
```
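
For change 5, the unified result dict can be adapted back onto whatever the method returned before. A minimal sketch of the tail of `generate_text_to_video()`; every legacy key here is a hypothetical placeholder, so the actual field names should be copied from the current service code, not from this snippet:

```python
# Hypothetical backward-compatibility adapter; legacy keys and the
# shape of save_result are placeholders, not confirmed formats.
legacy_response = {
    "video_path": save_result.get("path"),  # from _save_video_file(...)
    "model": result.get("model_name"),
    "cost": result.get("cost", 0.0),
    "duration": result.get("duration"),
    "provider": result.get("provider"),
}
return legacy_response
```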

### Phase 3: Fix Avatar and Enhancement Methods

**Decision Needed**:
- Are avatar generation and video enhancement different enough to warrant separate handling?
- Or should they be integrated into the unified entry point?

**Options**:
1. **Keep Separate**: Create separate unified entry points (`ai_avatar_generate()`, `ai_video_enhance()`)
2. **Integrate**: Add `operation_type="avatar"` and `operation_type="enhance"` to `ai_video_generate()` (see the dispatch sketch below)

**Recommendation**: Keep separate for now, but ensure they call WaveSpeed client methods that actually exist.
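
If option 2 is chosen instead, the integration could look like the following sketch. Only `_generate_text_to_video_wavespeed()` is defined in this plan; the avatar and enhance handlers are placeholders that would still need to be written:

```python
# Hypothetical sketch of option 2: operation_type dispatch inside
# ai_video_generate(). Avatar/enhance handlers do not exist yet.
async def _dispatch_operation(operation_type: str, **kwargs):
    if operation_type == "text-to-video":
        return await _generate_text_to_video_wavespeed(**kwargs)
    if operation_type == "avatar":
        raise NotImplementedError("avatar handler still to be written")
    if operation_type == "enhance":
        raise NotImplementedError("enhance handler still to be written")
    raise ValueError(f"Unknown operation_type: {operation_type!r}")
```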

## Testing Strategy

### Pre-Refactoring
1. ✅ Document current behavior
2. ✅ Identify all call sites
3. ✅ Create test cases for each scenario

### Post-Refactoring
1. Test text-to-video with WaveSpeed models (see the test sketch after this list)
2. Test image-to-video (already working)
3. Verify pre-flight validation works
4. Verify usage tracking works
5. Verify file saving works
6. Verify asset library integration works
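
A minimal test sketch for item 1, assuming `pytest-asyncio` is installed. It relies on the Phase 1 snippet importing `WaveSpeedClient` inside the function, so patching the client module takes effect at call time; the prompt and fake payload are illustrative:

```python
import pytest
from unittest.mock import AsyncMock, patch

from services.llm_providers.main_video_generation import (
    _generate_text_to_video_wavespeed,
)


@pytest.mark.asyncio
async def test_wavespeed_text_to_video_returns_video_bytes():
    fake_client = AsyncMock()
    fake_client.generate_text_video.return_value = {
        "video_bytes": b"\x00\x01",
        "cost": 0.5,
        "width": 1280,
        "height": 720,
        "metadata": {},
    }
    with patch("services.wavespeed.client.WaveSpeedClient",
               return_value=fake_client):
        result = await _generate_text_to_video_wavespeed(prompt="a red fox")

    assert result["video_bytes"] == b"\x00\x01"
    assert result["provider"] == "wavespeed"
    assert result["model_name"] == "hunyuan-video-1.5/text-to-video"
```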

## Risk Mitigation

1. **Backward Compatibility**: Preserve existing return formats
2. **Gradual Migration**: Refactor one method at a time
3. **Feature Flags**: Consider a feature flag for the new unified path (see the sketch after this list)
4. **Comprehensive Testing**: Test all scenarios before deployment
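
A minimal sketch of item 3, assuming an environment-variable flag inside `VideoStudioService`; the variable name and both private method names are hypothetical:

```python
import os

# Hypothetical flag; flip it off to fall back to the legacy path.
USE_UNIFIED_VIDEO_PATH = (
    os.getenv("USE_UNIFIED_VIDEO_PATH", "false").lower() == "true"
)


# Inside VideoStudioService; both helper method names are placeholders.
async def generate_text_to_video(self, prompt: str, **kwargs):
    if USE_UNIFIED_VIDEO_PATH:
        # New path: route through the unified entry point (Phase 2).
        return await self._generate_via_unified(prompt, **kwargs)
    # Old path, kept until the unified path is proven in production.
    return await self._generate_legacy(prompt, **kwargs)
```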

## Files to Modify

1. `backend/services/llm_providers/main_video_generation.py`
   - Add `_generate_text_to_video_wavespeed()`
   - Update `ai_video_generate()` to support WaveSpeed text-to-video

2. `backend/services/video_studio/video_studio_service.py`
   - Refactor `generate_text_to_video()` to use `ai_video_generate()`
   - Fix the `generate_avatar()` and `enhance_video()` method calls

3. `backend/routers/video_studio.py`
   - Update to use the refactored service methods

## Success Criteria

- ✅ All video generation goes through the unified entry point
- ✅ No redundant code
- ✅ Pre-flight validation works everywhere
- ✅ Usage tracking works everywhere
- ✅ File saving works everywhere
- ✅ Asset library integration works everywhere
- ✅ No breaking changes
- ✅ All existing functionality preserved