Files
moreminimore-marketing/docs/HUNYUAN_VIDEO_IMPLEMENTATION_COMPLETE.md
Kunthawat Greethong c35fa52117 Base code
2026-01-08 22:39:53 +07:00

5.0 KiB

HunyuanVideo-1.5 Text-to-Video Implementation - Complete

Summary

Successfully implemented HunyuanVideo-1.5 text-to-video generation with modular architecture, following separation of concerns principles.

Implementation Details

1. Service Structure

File: backend/services/llm_providers/video_generation/wavespeed_provider.py

  • HunyuanVideoService: Complete implementation
    • Model-specific validation (duration: 5, 8, or 10 seconds, resolution: 480p or 720p)
    • Based on official API docs: https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-1.5-text-to-video
    • Size format conversion (resolution + aspect_ratio → "width*height")
    • Cost calculation ($0.02/s for 480p, $0.04/s for 720p)
    • Full API integration (submit → poll → download)
    • Progress callback support
    • Comprehensive error handling

2. Unified Entry Point Integration

File: backend/services/llm_providers/main_video_generation.py

  • _generate_text_to_video_wavespeed(): New async function

    • Routes to appropriate service based on model
    • Handles all parameters
    • Returns standardized metadata dict
  • ai_video_generate(): Updated

    • Now supports WaveSpeed text-to-video
    • Default model: hunyuan-video-1.5
    • Async/await properly handled

3. API Integration

Model: wavespeed-ai/hunyuan-video-1.5/text-to-video

Parameters Supported:

  • prompt (required)
  • negative_prompt (optional)
  • size (auto-calculated from resolution + aspect_ratio)
  • duration (5, 8, or 10 seconds)
  • seed (optional, default: -1)

Workflow:

  1. Submit request to WaveSpeed API
  2. Get prediction ID
  3. Poll /api/v3/predictions/{id}/result with progress callbacks
  4. Download video from outputs[0]
  5. Return metadata dict

4. Features

  • Pre-flight validation: Subscription limits checked before API calls
  • Usage tracking: Integrated with existing tracking system
  • Progress callbacks: Real-time progress updates (10% → 20-80% → 90% → 100%)
  • Error handling: Comprehensive error messages with prediction_id for resume
  • Cost calculation: Accurate pricing ($0.02/s 480p, $0.04/s 720p)
  • Metadata return: Full metadata including dimensions, cost, prediction_id

5. Size Format Mapping

Resolution → Size Format:

  • 480p + 16:9"832*480" (landscape)
  • 480p + 9:16"480*832" (portrait)
  • 720p + 16:9"1280*720" (landscape)
  • 720p + 9:16"720*1280" (portrait)

6. Validation

HunyuanVideo-1.5 Specific:

  • Duration: Must be 5, 8, or 10 seconds (per official API docs)
  • Resolution: Must be 480p or 720p (not 1080p)
  • Prompt: Required and cannot be empty

Code Structure

backend/services/llm_providers/
├── main_video_generation.py          # Unified entry point
│   ├── ai_video_generate()          # Main function (async)
│   └── _generate_text_to_video_wavespeed()  # WaveSpeed router
│
└── video_generation/                # Modular services
    ├── base.py                       # Base classes
    └── wavespeed_provider.py         # WaveSpeed services
        ├── BaseWaveSpeedTextToVideoService  # Base class
        ├── HunyuanVideoService      # ✅ Implemented
        └── get_wavespeed_text_to_video_service()  # Factory

Usage Example

from services.llm_providers.main_video_generation import ai_video_generate

result = await ai_video_generate(
    prompt="A tiny robot hiking across a kitchen table",
    operation_type="text-to-video",
    provider="wavespeed",
    model="hunyuan-video-1.5",
    duration=5,
    resolution="720p",
    user_id="user123",
    progress_callback=lambda progress, msg: print(f"{progress}%: {msg}")
)

video_bytes = result["video_bytes"]
cost = result["cost"]  # $0.20 for 5s @ 720p

Testing Checklist

  • Test with valid prompt
  • Test with 5-second duration
  • Test with 8-second duration
  • Test with 10-second duration
  • Test with 480p resolution
  • Test with 720p resolution
  • Test with negative_prompt
  • Test with seed
  • Test progress callbacks
  • Test error handling (invalid duration)
  • Test error handling (invalid resolution)
  • Test cost calculation
  • Test metadata return

Next Steps

  1. HunyuanVideo-1.5: Complete
  2. LTX-2 Pro: Pending documentation
  3. LTX-2 Fast: Pending documentation
  4. LTX-2 Retake: Pending documentation

Notes

  • Audio support: Not supported by HunyuanVideo-1.5 (ignored with warning)
  • Prompt expansion: Not supported by HunyuanVideo-1.5 (ignored with warning)
  • Aspect ratio: Used for size calculation (landscape vs portrait)
  • Polling interval: 0.5 seconds (as per example code)
  • Timeout: 10 minutes maximum

Ready for Testing

The implementation is complete and ready for testing. All features are implemented following the modular architecture with separation of concerns.