# Video Generation Refactoring Plan

## Goal
Remove redundant/duplicate code across video studio, image studio, story writer, etc., and ensure all video generation goes through the unified `ai_video_generate()` entry point.
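
For orientation, this is the call shape the rest of this plan assumes for that entry point. The parameter names are taken from the Phase 2 example below, not from a confirmed signature, and the argument values are purely illustrative:

```python
# Assumed call shape for the unified entry point (see Phase 2 "After").
# All parameter names here are assumptions drawn from this plan itself.
result = ai_video_generate(
    prompt="a drone shot over a rocky coastline at sunset",
    operation_type="text-to-video",
    provider="wavespeed",
    user_id="user-123",  # hypothetical ID for illustration
    duration=5,          # seconds
    resolution="720p",
)
```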

## Current State Analysis

### ✅ Already Using Unified Entry Point
1. **Image Studio Transform Service** (`backend/services/image_studio/transform_service.py`)
   - ✅ Uses `ai_video_generate()` for image-to-video
   - ✅ Properly handles file saving and asset library

2. **Video Studio Service - Image-to-Video** (`backend/services/video_studio/video_studio_service.py`)
   - ✅ `generate_image_to_video()` uses `ai_video_generate()`
   - ✅ Properly handles file saving and asset library

3. **Story Writer** (`backend/api/story_writer/utils/hd_video.py`)
   - ✅ Uses `ai_video_generate()` for text-to-video
   - ✅ Properly handles file saving

### ❌ Issues Found - Redundant Code

1. **Video Studio Service - Text-to-Video** (`backend/services/video_studio/video_studio_service.py:99`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ❌ Bypasses the unified entry point
   - ❌ Missing pre-flight validation
   - ❌ Missing usage tracking
   - **Action**: Refactor to use `ai_video_generate()`

2. **Video Studio Service - Avatar Generation** (`backend/services/video_studio/video_studio_service.py:320`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ⚠️ This is a different operation (talking avatar) and may need separate handling
   - **Action**: Investigate whether this should use the unified entry point or stay separate

3. **Video Studio Service - Video Enhancement** (`backend/services/video_studio/video_studio_service.py:405`)
   - ❌ Calls `self.wavespeed_client.generate_video()`, which **DOES NOT EXIST**
   - ⚠️ This is a different operation (video-to-video) and may need separate handling
   - **Action**: Investigate whether this should use the unified entry point or stay separate

4. **Unified Entry Point - WaveSpeed Text-to-Video** (`backend/services/llm_providers/main_video_generation.py:454`)
   - ❌ Currently raises `VideoProviderNotImplemented` for WaveSpeed text-to-video
   - **Action**: Implement WaveSpeed text-to-video support

### ⚠️ Special Cases (Keep Separate for Now)

1. **Podcast InfiniteTalk** (`backend/services/wavespeed/infinitetalk.py`)
   - ✅ Specialized operation: talking avatar with audio sync
   - ✅ Has its own polling and error handling
   - **Decision**: Keep separate - this is a specialized use case

## Refactoring Steps

### Phase 1: Implement WaveSpeed Text-to-Video in the Unified Entry Point

**File**: `backend/services/llm_providers/main_video_generation.py`

**Changes**:
1. Add a `_generate_text_to_video_wavespeed()` function
2. Use `WaveSpeedClient.generate_text_video()`, or `submit_text_to_video()` plus polling
3. Support models: hunyuan-video-1.5, ltx-2-pro, ltx-2-fast, ltx-2-retake
4. Return a metadata dict with `video_bytes`, cost, duration, etc.

**Implementation**:
```python
from typing import Any, Callable, Dict, Optional


async def _generate_text_to_video_wavespeed(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    model: str = "hunyuan-video-1.5/text-to-video",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    audio_base64: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    progress_callback: Optional[Callable[[float, str], None]] = None,
    **kwargs,
) -> Dict[str, Any]:
    """Generate text-to-video using WaveSpeed models."""
    from services.wavespeed.client import WaveSpeedClient

    client = WaveSpeedClient()

    # Map short model names to full API paths
    model_mapping = {
        "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
        "lightricks/ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
        "lightricks/ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
        "lightricks/ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
    }
    full_model = model_mapping.get(model, model)

    # Use generate_text_video, which handles polling internally.
    # NOTE: if generate_text_video() accepts a model argument, pass
    # full_model here; as written, full_model is only recorded in the
    # returned metadata.
    result = await client.generate_text_video(
        prompt=prompt,
        resolution=resolution,
        duration=duration,
        negative_prompt=negative_prompt,
        seed=seed,
        audio_base64=audio_base64,
        enable_prompt_expansion=enable_prompt_expansion,
        enable_sync_mode=False,  # use async mode with polling
        timeout=600,  # 10 minutes
    )

    return {
        "video_bytes": result["video_bytes"],
        "prompt": prompt,
        "duration": float(duration),
        "model_name": full_model,
        "cost": result.get("cost", 0.0),
        "provider": "wavespeed",
        "resolution": resolution,
        "width": result.get("width", 1280),
        "height": result.get("height", 720),
        "metadata": result.get("metadata", {}),
    }
```
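
Once this helper exists, `ai_video_generate()` needs a branch that routes WaveSpeed text-to-video requests to it. A minimal sketch, assuming the entry point is synchronous (the Phase 2 call site below invokes it without `await`) and that the surrounding dispatch structure looks roughly like this stub:

```python
import asyncio
from typing import Any, Dict

# Hypothetical stub showing only the new branch; the real function in
# main_video_generation.py has validation, tracking, and other providers.
def ai_video_generate(operation_type: str, provider: str, **kwargs) -> Dict[str, Any]:
    if operation_type == "text-to-video" and provider == "wavespeed":
        # A synchronous entry point has to drive the coroutine itself.
        return asyncio.run(_generate_text_to_video_wavespeed(**kwargs))
    raise VideoProviderNotImplemented(f"{provider}/{operation_type}")
```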

### Phase 2: Refactor VideoStudioService.generate_text_to_video()

**File**: `backend/services/video_studio/video_studio_service.py`

**Changes**:
1. Replace the `self.wavespeed_client.generate_video()` call with `ai_video_generate()`
2. Remove model mapping (handled in the unified entry point)
3. Remove cost calculation (handled in the unified entry point)
4. Add file saving and asset library integration
5. Preserve the existing return format for backward compatibility (see the adapter sketch after the code below)

**Before**:
```python
result = await self.wavespeed_client.generate_video(...)  # DOES NOT EXIST
```

**After**:
```python
result = ai_video_generate(
    prompt=prompt,
    operation_type="text-to-video",
    provider=provider,
    user_id=user_id,
    duration=duration,
    resolution=resolution,
    negative_prompt=negative_prompt,
    model=model,
    **kwargs,
)

# Save file and update asset library
save_result = self._save_video_file(...)
```
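
For change 5, the unified result dict can be adapted back onto whatever the method returned before. A minimal sketch of the tail of `generate_text_to_video()`; every legacy key here is a hypothetical placeholder, so the actual field names should be copied from the current service code, not from this snippet:

```python
# Hypothetical backward-compatibility adapter; legacy keys and the
# shape of save_result are placeholders, not confirmed formats.
legacy_response = {
    "video_path": save_result.get("path"),  # from _save_video_file(...)
    "model": result.get("model_name"),
    "cost": result.get("cost", 0.0),
    "duration": result.get("duration"),
    "provider": result.get("provider"),
}
return legacy_response
```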

### Phase 3: Fix Avatar and Enhancement Methods

**Decision Needed**:
- Are avatar generation and video enhancement different enough to warrant separate handling?
- Or should they be integrated into the unified entry point?

**Options**:
1. **Keep Separate**: Create separate unified entry points (`ai_avatar_generate()`, `ai_video_enhance()`)
2. **Integrate**: Add `operation_type="avatar"` and `operation_type="enhance"` to `ai_video_generate()` (see the dispatch sketch below)

**Recommendation**: Keep separate for now, but ensure they call WaveSpeed client methods that actually exist.
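
If option 2 is chosen instead, the integration could look like the following sketch. Only `_generate_text_to_video_wavespeed()` is defined in this plan; the avatar and enhance handlers are placeholders that would still need to be written:

```python
# Hypothetical sketch of option 2: operation_type dispatch inside
# ai_video_generate(). Avatar/enhance handlers do not exist yet.
async def _dispatch_operation(operation_type: str, **kwargs):
    if operation_type == "text-to-video":
        return await _generate_text_to_video_wavespeed(**kwargs)
    if operation_type == "avatar":
        raise NotImplementedError("avatar handler still to be written")
    if operation_type == "enhance":
        raise NotImplementedError("enhance handler still to be written")
    raise ValueError(f"Unknown operation_type: {operation_type!r}")
```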

## Testing Strategy

### Pre-Refactoring
1. ✅ Document current behavior
2. ✅ Identify all call sites
3. ✅ Create test cases for each scenario

### Post-Refactoring
1. Test text-to-video with WaveSpeed models (see the test sketch after this list)
2. Test image-to-video (already working)
3. Verify pre-flight validation works
4. Verify usage tracking works
5. Verify file saving works
6. Verify asset library integration works
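
A minimal test sketch for item 1, assuming `pytest-asyncio` is installed. It relies on the Phase 1 snippet importing `WaveSpeedClient` inside the function, so patching the client module takes effect at call time; the prompt and fake payload are illustrative:

```python
import pytest
from unittest.mock import AsyncMock, patch

from services.llm_providers.main_video_generation import (
    _generate_text_to_video_wavespeed,
)


@pytest.mark.asyncio
async def test_wavespeed_text_to_video_returns_video_bytes():
    fake_client = AsyncMock()
    fake_client.generate_text_video.return_value = {
        "video_bytes": b"\x00\x01",
        "cost": 0.5,
        "width": 1280,
        "height": 720,
        "metadata": {},
    }
    with patch("services.wavespeed.client.WaveSpeedClient",
               return_value=fake_client):
        result = await _generate_text_to_video_wavespeed(prompt="a red fox")

    assert result["video_bytes"] == b"\x00\x01"
    assert result["provider"] == "wavespeed"
    assert result["model_name"] == "hunyuan-video-1.5/text-to-video"
```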

## Risk Mitigation

1. **Backward Compatibility**: Preserve existing return formats
2. **Gradual Migration**: Refactor one method at a time
3. **Feature Flags**: Consider a feature flag for the new unified path (see the sketch after this list)
4. **Comprehensive Testing**: Test all scenarios before deployment
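
A minimal sketch of item 3, assuming an environment-variable flag inside `VideoStudioService`; the variable name and both private method names are hypothetical:

```python
import os

# Hypothetical flag; flip it off to fall back to the legacy path.
USE_UNIFIED_VIDEO_PATH = (
    os.getenv("USE_UNIFIED_VIDEO_PATH", "false").lower() == "true"
)


# Inside VideoStudioService; both helper method names are placeholders.
async def generate_text_to_video(self, prompt: str, **kwargs):
    if USE_UNIFIED_VIDEO_PATH:
        # New path: route through the unified entry point (Phase 2).
        return await self._generate_via_unified(prompt, **kwargs)
    # Old path, kept until the unified path is proven in production.
    return await self._generate_legacy(prompt, **kwargs)
```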

## Files to Modify

1. `backend/services/llm_providers/main_video_generation.py`
   - Add `_generate_text_to_video_wavespeed()`
   - Update `ai_video_generate()` to support WaveSpeed text-to-video

2. `backend/services/video_studio/video_studio_service.py`
   - Refactor `generate_text_to_video()` to use `ai_video_generate()`
   - Fix the `generate_avatar()` and `enhance_video()` method calls

3. `backend/routers/video_studio.py`
   - Update to use the refactored service methods

## Success Criteria

- ✅ All video generation goes through the unified entry point
- ✅ No redundant code
- ✅ Pre-flight validation works everywhere
- ✅ Usage tracking works everywhere
- ✅ File saving works everywhere
- ✅ Asset library integration works everywhere
- ✅ No breaking changes
- ✅ All existing functionality preserved