Video Generation Refactoring Plan

Goal

Remove redundant/duplicate code across video studio, image studio, story writer, etc., and ensure all video generation goes through the unified ai_video_generate() entry point.
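
For orientation, the sketch below shows the responsibilities the unified entry point is expected to centralize for every caller. It is a simplified illustration only: the helper names (validate_video_request, track_video_usage, _dispatch) are placeholders, not functions from the codebase, and the real signature lives in main_video_generation.py.

# Simplified sketch of what the unified entry point centralizes; helper names
# are placeholders and async/sync details are omitted for brevity.
def ai_video_generate(prompt, operation_type, provider, user_id, **kwargs):
    validate_video_request(user_id, provider, operation_type)        # pre-flight validation
    result = _dispatch(provider, operation_type, prompt, **kwargs)   # provider-specific generation
    track_video_usage(user_id, provider, result.get("cost", 0.0))    # usage tracking
    return result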

Current State Analysis

Already Using Unified Entry Point

  1. Image Studio Transform Service (backend/services/image_studio/transform_service.py)

    • Uses ai_video_generate() for image-to-video
    • Properly handles file saving and asset library
  2. Video Studio Service - Image-to-Video (backend/services/video_studio/video_studio_service.py)

    • generate_image_to_video() uses ai_video_generate()
    • Properly handles file saving and asset library
  3. Story Writer (backend/api/story_writer/utils/hd_video.py)

    • Uses ai_video_generate() for text-to-video
    • Properly handles file saving

Issues Found - Redundant Code

  1. Video Studio Service - Text-to-Video (backend/services/video_studio/video_studio_service.py:99)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • Bypasses unified entry point
    • Missing pre-flight validation
    • Missing usage tracking
    • Action: Refactor to use ai_video_generate()
  2. Video Studio Service - Avatar Generation (backend/services/video_studio/video_studio_service.py:320)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • ⚠️ This is a different operation (talking avatar) - may need separate handling
    • Action: Investigate if this should use unified entry point or stay separate
  3. Video Studio Service - Video Enhancement (backend/services/video_studio/video_studio_service.py:405)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • ⚠️ This is a different operation (video-to-video) - may need separate handling
    • Action: Investigate if this should use unified entry point or stay separate
  4. Unified Entry Point - WaveSpeed Text-to-Video (backend/services/llm_providers/main_video_generation.py:454)

    • Currently raises VideoProviderNotImplemented for WaveSpeed text-to-video
    • Action: Implement WaveSpeed text-to-video support

⚠️ Special Cases (Keep Separate for Now)

  1. Podcast InfiniteTalk (backend/services/wavespeed/infinitetalk.py)
    • Specialized operation: talking avatar with audio sync
    • Has its own polling and error handling
    • Decision: Keep separate - this is a specialized use case

Refactoring Steps

Phase 1: Implement WaveSpeed Text-to-Video in Unified Entry Point

File: backend/services/llm_providers/main_video_generation.py

Changes:

  1. Add _generate_text_to_video_wavespeed() function
  2. Use WaveSpeedClient.generate_text_video() or submit_text_to_video() + polling
  3. Support models: hunyuan-video-1.5, ltx-2-pro, ltx-2-fast, ltx-2-retake
  4. Return metadata dict with video_bytes, cost, duration, etc.

Implementation:

from typing import Any, Callable, Dict, Optional  # shown for completeness; main_video_generation.py likely imports these already


async def _generate_text_to_video_wavespeed(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    model: str = "hunyuan-video-1.5/text-to-video",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    audio_base64: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    progress_callback: Optional[Callable[[float, str], None]] = None,  # accepted for parity with other providers; not yet used here
    **kwargs
) -> Dict[str, Any]:
    """Generate text-to-video using WaveSpeed models."""
    from services.wavespeed.client import WaveSpeedClient
    
    client = WaveSpeedClient()
    
    # Map short model names to full WaveSpeed model paths
    model_mapping = {
        "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
        "lightricks/ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
        "lightricks/ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
        "lightricks/ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
    }
    full_model = model_mapping.get(model, model)
    
    # Use generate_text_video which handles polling internally.
    # NOTE: full_model is not forwarded here; confirm whether generate_text_video()
    # accepts a model argument, otherwise non-default models will be ignored.
    result = await client.generate_text_video(
        prompt=prompt,
        resolution=resolution,
        duration=duration,
        negative_prompt=negative_prompt,
        seed=seed,
        audio_base64=audio_base64,
        enable_prompt_expansion=enable_prompt_expansion,
        enable_sync_mode=False,  # Use async mode with polling
        timeout=600,  # 10 minutes
    )
    
    return {
        "video_bytes": result["video_bytes"],
        "prompt": prompt,
        "duration": float(duration),
        "model_name": full_model,
        "cost": result.get("cost", 0.0),
        "provider": "wavespeed",
        "resolution": resolution,
        "width": result.get("width", 1280),
        "height": result.get("height", 720),
        "metadata": result.get("metadata", {}),
    }
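
The new helper then needs to be wired into ai_video_generate()'s text-to-video dispatch, replacing the branch that currently raises VideoProviderNotImplemented (see issue 4 above). The sketch below only illustrates where the WaveSpeed call slots in; the actual dispatch structure in main_video_generation.py is assumed, not copied.

# Illustration only: the real branch lives inside ai_video_generate() (or its
# text-to-video helper). VideoProviderNotImplemented is the exception name the
# current code raises; its constructor usage here is assumed.
async def _dispatch_text_to_video(provider: str, **params) -> Dict[str, Any]:
    if provider == "wavespeed":
        # Previously: raise VideoProviderNotImplemented for WaveSpeed text-to-video
        return await _generate_text_to_video_wavespeed(**params)
    raise VideoProviderNotImplemented(f"text-to-video not supported for provider: {provider}")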

Phase 2: Refactor VideoStudioService.generate_text_to_video()

File: backend/services/video_studio/video_studio_service.py

Changes:

  1. Replace self.wavespeed_client.generate_video() call with ai_video_generate()
  2. Remove model mapping (handled in unified entry point)
  3. Remove cost calculation (handled in unified entry point)
  4. Add file saving and asset library integration
  5. Preserve existing return format for backward compatibility

Before:

result = await self.wavespeed_client.generate_video(...)  # DOES NOT EXIST

After:

result = ai_video_generate(
    prompt=prompt,
    operation_type="text-to-video",
    provider=provider,
    user_id=user_id,
    duration=duration,
    resolution=resolution,
    negative_prompt=negative_prompt,
    model=model,
    **kwargs
)

# Save file and update asset library
save_result = self._save_video_file(...)
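
To make step 5 concrete, the sketch below shows one way the unified result and the save step could be folded back into the existing response shape. The key names are illustrative only; the real format should be copied from the current generate_text_to_video() implementation so existing callers keep working.

# Illustrative helper; key names must match the current response format.
def _to_legacy_response(result: dict, save_result: dict) -> dict:
    return {
        "video_url": save_result.get("url"),   # from _save_video_file()
        "cost": result.get("cost", 0.0),       # from ai_video_generate()
        "duration": result.get("duration"),
        "model": result.get("model_name"),
        "provider": result.get("provider"),
    }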

Phase 3: Fix Avatar and Enhancement Methods

Decision Needed:

  • Are avatar generation and video enhancement different enough to warrant separate handling?
  • Or should they be integrated into unified entry point?

Options:

  1. Keep Separate: Create separate unified entry points (ai_avatar_generate(), ai_video_enhance())
  2. Integrate: Add operation_type="avatar" and operation_type="enhance" to ai_video_generate()

Recommendation: Keep separate for now, but ensure they call WaveSpeed client methods that actually exist. A signature sketch for Option 1 follows.
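
If Option 1 is chosen, the separate entry points could mirror the shape of ai_video_generate() so callers get the same pre-flight validation and usage tracking guarantees. The signatures below are illustrative only; the real parameters would be settled during Phase 3.

from typing import Any, Dict

# Illustrative signatures only; bodies intentionally left as stubs.
async def ai_avatar_generate(image_bytes: bytes, audio_bytes: bytes, user_id: str,
                             provider: str = "wavespeed", **kwargs) -> Dict[str, Any]:
    """Talking-avatar generation routed through shared validation and usage tracking."""
    ...

async def ai_video_enhance(video_bytes: bytes, user_id: str,
                           provider: str = "wavespeed", **kwargs) -> Dict[str, Any]:
    """Video-to-video enhancement routed through shared validation and usage tracking."""
    ...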

Testing Strategy

Pre-Refactoring

  1. Document current behavior
  2. Identify all call sites
  3. Create test cases for each scenario

Post-Refactoring

  1. Test text-to-video with WaveSpeed models (a test sketch follows this list)
  2. Test image-to-video (already working)
  3. Verify pre-flight validation works
  4. Verify usage tracking works
  5. Verify file saving works
  6. Verify asset library integration works
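
A minimal sketch of how the first check might be automated is shown below. The module paths follow the file references in this plan, but the test name, fixtures, and the mocking approach are assumptions; the real suite should mock WaveSpeedClient rather than hit the API, and pytest-asyncio is assumed to be installed.

# Hypothetical test sketch; mocking target and plugin availability are assumptions.
import pytest
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_wavespeed_text_to_video_goes_through_unified_entry_point():
    fake_result = {"video_bytes": b"fake", "cost": 0.1, "width": 1280, "height": 720}
    with patch("services.wavespeed.client.WaveSpeedClient.generate_text_video",
               new=AsyncMock(return_value=fake_result)):
        from services.llm_providers.main_video_generation import (
            _generate_text_to_video_wavespeed,
        )
        result = await _generate_text_to_video_wavespeed(prompt="a cat surfing", duration=5)
        assert result["video_bytes"] == b"fake"
        assert result["provider"] == "wavespeed"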

Risk Mitigation

  1. Backward Compatibility: Preserve existing return formats
  2. Gradual Migration: Refactor one method at a time
  3. Feature Flags: Consider a feature flag for the new unified path (sketched after this list)
  4. Comprehensive Testing: Test all scenarios before deployment
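
Item 3 could be as simple as an environment-driven switch inside VideoStudioService.generate_text_to_video(). The flag name, wrapper function, and settings mechanism below are hypothetical and shown only to illustrate the gradual-migration idea.

import os

from services.llm_providers.main_video_generation import ai_video_generate

# Hypothetical flag name and mechanism.
USE_UNIFIED_VIDEO_PATH = os.getenv("USE_UNIFIED_VIDEO_PATH", "true").lower() == "true"

def generate_with_flag(**params):
    if USE_UNIFIED_VIDEO_PATH:
        return ai_video_generate(operation_type="text-to-video", **params)
    # The legacy self.wavespeed_client.generate_video() call never existed,
    # so the flag can only disable the feature rather than fall back to it.
    raise RuntimeError("Text-to-video is disabled: USE_UNIFIED_VIDEO_PATH is off")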

Files to Modify

  1. backend/services/llm_providers/main_video_generation.py

    • Add _generate_text_to_video_wavespeed()
    • Update ai_video_generate() to support WaveSpeed text-to-video
  2. backend/services/video_studio/video_studio_service.py

    • Refactor generate_text_to_video() to use ai_video_generate()
    • Fix generate_avatar() and enhance_video() method calls
  3. backend/routers/video_studio.py

    • Update to use refactored service methods

Success Criteria

  • All video generation goes through unified entry point
  • No redundant code
  • Pre-flight validation works everywhere
  • Usage tracking works everywhere
  • File saving works everywhere
  • Asset library integration works everywhere
  • No breaking changes
  • All existing functionality preserved