Video Generation Refactoring Plan

Goal

Remove redundant/duplicate code across video studio, image studio, story writer, etc., and ensure all video generation goes through the unified ai_video_generate() entry point.
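
For orientation, the sketch below shows the responsibilities the unified entry point is expected to centralize for every caller. It is a simplified illustration only: the helper names (validate_video_request, track_video_usage, _dispatch) are placeholders, not functions from the codebase, and the real signature lives in main_video_generation.py.

# Simplified sketch of what the unified entry point centralizes; helper names
# are placeholders and async/sync details are omitted for brevity.
def ai_video_generate(prompt, operation_type, provider, user_id, **kwargs):
    validate_video_request(user_id, provider, operation_type)        # pre-flight validation
    result = _dispatch(provider, operation_type, prompt, **kwargs)   # provider-specific generation
    track_video_usage(user_id, provider, result.get("cost", 0.0))    # usage tracking
    return result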

Current State Analysis

Already Using Unified Entry Point

  1. Image Studio Transform Service (backend/services/image_studio/transform_service.py)

    • Uses ai_video_generate() for image-to-video
    • Properly handles file saving and asset library
  2. Video Studio Service - Image-to-Video (backend/services/video_studio/video_studio_service.py)

    • generate_image_to_video() uses ai_video_generate()
    • Properly handles file saving and asset library
  3. Story Writer (backend/api/story_writer/utils/hd_video.py)

    • Uses ai_video_generate() for text-to-video
    • Properly handles file saving

Issues Found - Redundant Code

  1. Video Studio Service - Text-to-Video (backend/services/video_studio/video_studio_service.py:99)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • Bypasses unified entry point
    • Missing pre-flight validation
    • Missing usage tracking
    • Action: Refactor to use ai_video_generate()
  2. Video Studio Service - Avatar Generation (backend/services/video_studio/video_studio_service.py:320)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • ⚠️ This is a different operation (talking avatar) - may need separate handling
    • Action: Investigate if this should use unified entry point or stay separate
  3. Video Studio Service - Video Enhancement (backend/services/video_studio/video_studio_service.py:405)

    • Calls self.wavespeed_client.generate_video() which DOES NOT EXIST
    • ⚠️ This is a different operation (video-to-video) - may need separate handling
    • Action: Investigate if this should use unified entry point or stay separate
  4. Unified Entry Point - WaveSpeed Text-to-Video (backend/services/llm_providers/main_video_generation.py:454)

    • Currently raises VideoProviderNotImplemented for WaveSpeed text-to-video
    • Action: Implement WaveSpeed text-to-video support

⚠️ Special Cases (Keep Separate for Now)

  1. Podcast InfiniteTalk (backend/services/wavespeed/infinitetalk.py)
    • Specialized operation: talking avatar with audio sync
    • Has its own polling and error handling
    • Decision: Keep separate - this is a specialized use case

Refactoring Steps

Phase 1: Implement WaveSpeed Text-to-Video in Unified Entry Point

File: backend/services/llm_providers/main_video_generation.py

Changes:

  1. Add _generate_text_to_video_wavespeed() function
  2. Use WaveSpeedClient.generate_text_video() or submit_text_to_video() + polling
  3. Support models: hunyuan-video-1.5, ltx-2-pro, ltx-2-fast, ltx-2-retake
  4. Return metadata dict with video_bytes, cost, duration, etc.

Implementation:

from typing import Any, Callable, Dict, Optional  # shown for completeness; main_video_generation.py likely imports these already


async def _generate_text_to_video_wavespeed(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    model: str = "hunyuan-video-1.5/text-to-video",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    audio_base64: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    progress_callback: Optional[Callable[[float, str], None]] = None,  # accepted for parity with other providers; not yet used here
    **kwargs
) -> Dict[str, Any]:
    """Generate text-to-video using WaveSpeed models."""
    from services.wavespeed.client import WaveSpeedClient
    
    client = WaveSpeedClient()
    
    # Map short model names to full WaveSpeed model paths
    model_mapping = {
        "hunyuan-video-1.5": "hunyuan-video-1.5/text-to-video",
        "lightricks/ltx-2-pro": "lightricks/ltx-2-pro/text-to-video",
        "lightricks/ltx-2-fast": "lightricks/ltx-2-fast/text-to-video",
        "lightricks/ltx-2-retake": "lightricks/ltx-2-retake/text-to-video",
    }
    full_model = model_mapping.get(model, model)
    
    # Use generate_text_video which handles polling internally.
    # NOTE: full_model is not forwarded here; confirm whether generate_text_video()
    # accepts a model argument, otherwise non-default models will be ignored.
    result = await client.generate_text_video(
        prompt=prompt,
        resolution=resolution,
        duration=duration,
        negative_prompt=negative_prompt,
        seed=seed,
        audio_base64=audio_base64,
        enable_prompt_expansion=enable_prompt_expansion,
        enable_sync_mode=False,  # Use async mode with polling
        timeout=600,  # 10 minutes
    )
    
    return {
        "video_bytes": result["video_bytes"],
        "prompt": prompt,
        "duration": float(duration),
        "model_name": full_model,
        "cost": result.get("cost", 0.0),
        "provider": "wavespeed",
        "resolution": resolution,
        "width": result.get("width", 1280),
        "height": result.get("height", 720),
        "metadata": result.get("metadata", {}),
    }
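
The new helper then needs to be wired into ai_video_generate()'s text-to-video dispatch, replacing the branch that currently raises VideoProviderNotImplemented (see issue 4 above). The sketch below only illustrates where the WaveSpeed call slots in; the actual dispatch structure in main_video_generation.py is assumed, not copied.

# Illustration only: the real branch lives inside ai_video_generate() (or its
# text-to-video helper). VideoProviderNotImplemented is the exception name the
# current code raises; its constructor usage here is assumed.
async def _dispatch_text_to_video(provider: str, **params) -> Dict[str, Any]:
    if provider == "wavespeed":
        # Previously: raise VideoProviderNotImplemented for WaveSpeed text-to-video
        return await _generate_text_to_video_wavespeed(**params)
    raise VideoProviderNotImplemented(f"text-to-video not supported for provider: {provider}")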

Phase 2: Refactor VideoStudioService.generate_text_to_video()

File: backend/services/video_studio/video_studio_service.py

Changes:

  1. Replace self.wavespeed_client.generate_video() call with ai_video_generate()
  2. Remove model mapping (handled in unified entry point)
  3. Remove cost calculation (handled in unified entry point)
  4. Add file saving and asset library integration
  5. Preserve existing return format for backward compatibility

Before:

result = await self.wavespeed_client.generate_video(...)  # DOES NOT EXIST

After:

result = ai_video_generate(
    prompt=prompt,
    operation_type="text-to-video",
    provider=provider,
    user_id=user_id,
    duration=duration,
    resolution=resolution,
    negative_prompt=negative_prompt,
    model=model,
    **kwargs
)

# Save file and update asset library
save_result = self._save_video_file(...)
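
To make step 5 concrete, the sketch below shows one way the unified result and the save step could be folded back into the existing response shape. The key names are illustrative only; the real format should be copied from the current generate_text_to_video() implementation so existing callers keep working.

# Illustrative helper; key names must match the current response format.
def _to_legacy_response(result: dict, save_result: dict) -> dict:
    return {
        "video_url": save_result.get("url"),   # from _save_video_file()
        "cost": result.get("cost", 0.0),       # from ai_video_generate()
        "duration": result.get("duration"),
        "model": result.get("model_name"),
        "provider": result.get("provider"),
    }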

Phase 3: Fix Avatar and Enhancement Methods

Decision Needed:

  • Are avatar generation and video enhancement different enough to warrant separate handling?
  • Or should they be integrated into unified entry point?

Options:

  1. Keep Separate: Create separate unified entry points (ai_avatar_generate(), ai_video_enhance())
  2. Integrate: Add operation_type="avatar" and operation_type="enhance" to ai_video_generate()

Recommendation: Keep separate for now, but ensure they call WaveSpeed client methods that actually exist. A signature sketch for Option 1 follows.
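
If Option 1 is chosen, the separate entry points could mirror the shape of ai_video_generate() so callers get the same pre-flight validation and usage tracking guarantees. The signatures below are illustrative only; the real parameters would be settled during Phase 3.

from typing import Any, Dict

# Illustrative signatures only; bodies intentionally left as stubs.
async def ai_avatar_generate(image_bytes: bytes, audio_bytes: bytes, user_id: str,
                             provider: str = "wavespeed", **kwargs) -> Dict[str, Any]:
    """Talking-avatar generation routed through shared validation and usage tracking."""
    ...

async def ai_video_enhance(video_bytes: bytes, user_id: str,
                           provider: str = "wavespeed", **kwargs) -> Dict[str, Any]:
    """Video-to-video enhancement routed through shared validation and usage tracking."""
    ...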

Testing Strategy

Pre-Refactoring

  1. Document current behavior
  2. Identify all call sites
  3. Create test cases for each scenario

Post-Refactoring

  1. Test text-to-video with WaveSpeed models (a test sketch follows this list)
  2. Test image-to-video (already working)
  3. Verify pre-flight validation works
  4. Verify usage tracking works
  5. Verify file saving works
  6. Verify asset library integration works
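
A minimal sketch of how the first check might be automated is shown below. The module paths follow the file references in this plan, but the test name, fixtures, and the mocking approach are assumptions; the real suite should mock WaveSpeedClient rather than hit the API, and pytest-asyncio is assumed to be installed.

# Hypothetical test sketch; mocking target and plugin availability are assumptions.
import pytest
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_wavespeed_text_to_video_goes_through_unified_entry_point():
    fake_result = {"video_bytes": b"fake", "cost": 0.1, "width": 1280, "height": 720}
    with patch("services.wavespeed.client.WaveSpeedClient.generate_text_video",
               new=AsyncMock(return_value=fake_result)):
        from services.llm_providers.main_video_generation import (
            _generate_text_to_video_wavespeed,
        )
        result = await _generate_text_to_video_wavespeed(prompt="a cat surfing", duration=5)
        assert result["video_bytes"] == b"fake"
        assert result["provider"] == "wavespeed"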

Risk Mitigation

  1. Backward Compatibility: Preserve existing return formats
  2. Gradual Migration: Refactor one method at a time
  3. Feature Flags: Consider a feature flag for the new unified path (sketched after this list)
  4. Comprehensive Testing: Test all scenarios before deployment
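
Item 3 could be as simple as an environment-driven switch inside VideoStudioService.generate_text_to_video(). The flag name, wrapper function, and settings mechanism below are hypothetical and shown only to illustrate the gradual-migration idea.

import os

from services.llm_providers.main_video_generation import ai_video_generate

# Hypothetical flag name and mechanism.
USE_UNIFIED_VIDEO_PATH = os.getenv("USE_UNIFIED_VIDEO_PATH", "true").lower() == "true"

def generate_with_flag(**params):
    if USE_UNIFIED_VIDEO_PATH:
        return ai_video_generate(operation_type="text-to-video", **params)
    # The legacy self.wavespeed_client.generate_video() call never existed,
    # so the flag can only disable the feature rather than fall back to it.
    raise RuntimeError("Text-to-video is disabled: USE_UNIFIED_VIDEO_PATH is off")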

Files to Modify

  1. backend/services/llm_providers/main_video_generation.py

    • Add _generate_text_to_video_wavespeed()
    • Update ai_video_generate() to support WaveSpeed text-to-video
  2. backend/services/video_studio/video_studio_service.py

    • Refactor generate_text_to_video() to use ai_video_generate()
    • Fix generate_avatar() and enhance_video() method calls
  3. backend/routers/video_studio.py

    • Update to use refactored service methods

Success Criteria

  • All video generation goes through unified entry point
  • No redundant code
  • Pre-flight validation works everywhere
  • Usage tracking works everywhere
  • File saving works everywhere
  • Asset library integration works everywhere
  • No breaking changes
  • All existing functionality preserved