HunyuanVideo-1.5 Text-to-Video Implementation - Complete ✅
Summary
HunyuanVideo-1.5 text-to-video generation is implemented with a modular architecture that follows separation-of-concerns principles.
Implementation Details
1. Service Structure ✅
File: backend/services/llm_providers/video_generation/wavespeed_provider.py
HunyuanVideoService: Complete implementation
- Model-specific validation (duration: 5, 8, or 10 seconds; resolution: 480p or 720p)
- Based on official API docs: https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-1.5-text-to-video
- Size format conversion (resolution + aspect_ratio → "width*height")
- Cost calculation ($0.02/s for 480p, $0.04/s for 720p)
- Full API integration (submit → poll → download)
- Progress callback support
- Comprehensive error handling
2. Unified Entry Point Integration ✅
File: backend/services/llm_providers/main_video_generation.py
- _generate_text_to_video_wavespeed(): New async function
  - Routes to the appropriate service based on model
  - Handles all parameters
  - Returns standardized metadata dict
- ai_video_generate(): Updated
  - Now supports WaveSpeed text-to-video
  - Default model: hunyuan-video-1.5
  - Async/await properly handled
3. API Integration ✅
Model: wavespeed-ai/hunyuan-video-1.5/text-to-video
Parameters Supported:
- ✅ prompt (required)
- ✅ negative_prompt (optional)
- ✅ size (auto-calculated from resolution + aspect_ratio)
- ✅ duration (5, 8, or 10 seconds)
- ✅ seed (optional, default: -1)
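As a rough illustration of how these parameters might be assembled into a request body, here is a minimal sketch. The function name `build_payload` is hypothetical, and leaving `negative_prompt` out of the body when it is not supplied is an assumption rather than documented behavior.

```python
from typing import Optional


def build_payload(
    prompt: str,
    size: str,
    duration: int,
    negative_prompt: Optional[str] = None,
    seed: int = -1,
) -> dict:
    """Assemble a request body from the supported parameters (hypothetical helper)."""
    payload = {
        "prompt": prompt,
        "size": size,          # "width*height", derived from resolution + aspect_ratio
        "duration": duration,  # 5, 8, or 10 seconds
        "seed": seed,          # -1 is the default listed above
    }
    # Assumption: the optional field is simply left out when not provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```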
Workflow:
- ✅ Submit request to WaveSpeed API
- ✅ Get prediction ID
- ✅ Poll /api/v3/predictions/{id}/result with progress callbacks
- ✅ Download video from outputs[0]
- ✅ Return metadata dict
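The polling step of this workflow could look roughly like the sketch below. It is not the actual service code: `poll_until_complete` and its injected `fetch_result` callable are hypothetical, the `status` values ("completed", "failed") and the `outputs` field are assumed response fields, and which progress percentage belongs to which stage is a guess. The 0.5-second interval and 10-minute timeout come from the Notes section of this document.

```python
import time
from typing import Callable, Optional


def poll_until_complete(
    fetch_result: Callable[[], dict],
    progress_callback: Optional[Callable[[int, str], None]] = None,
    interval_seconds: float = 0.5,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the prediction finishes and return the first output URL."""
    deadline = clock() + timeout_seconds
    progress = 20  # assumed start of the 20-80% polling band
    while clock() < deadline:
        result = fetch_result()
        status = result.get("status")
        if status == "completed":
            outputs = result.get("outputs") or []
            if not outputs:
                raise RuntimeError("Prediction completed without outputs")
            if progress_callback:
                progress_callback(90, "Generation finished, downloading video")
            return outputs[0]
        if status == "failed":
            raise RuntimeError(f"Prediction failed: {result.get('error')}")
        if progress_callback:
            progress_callback(progress, f"Status: {status}")
        progress = min(progress + 5, 80)  # never leave the polling band
        sleep(interval_seconds)
    raise TimeoutError(f"Prediction did not finish within {timeout_seconds} seconds")
```

Passing `fetch_result`, `sleep`, and `clock` as arguments keeps the loop free of network and timing dependencies, so it can be exercised with canned responses.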
4. Features ✅
- ✅ Pre-flight validation: Subscription limits checked before API calls
- ✅ Usage tracking: Integrated with existing tracking system
- ✅ Progress callbacks: Real-time progress updates (10% → 20-80% → 90% → 100%)
- ✅ Error handling: Comprehensive error messages with prediction_id for resume
- ✅ Cost calculation: Accurate pricing ($0.02/s 480p, $0.04/s 720p)
- ✅ Metadata return: Full metadata including dimensions, cost, prediction_id
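The per-second pricing above reduces to a small lookup and a multiplication. The following is a minimal sketch; `PRICE_PER_SECOND` and `estimate_cost` are hypothetical names, not the ones used in the service.

```python
from decimal import Decimal

# Per-second prices as stated in this document.
PRICE_PER_SECOND = {
    "480p": Decimal("0.02"),
    "720p": Decimal("0.04"),
}


def estimate_cost(duration_seconds: int, resolution: str) -> Decimal:
    """Return the estimated cost in USD for one generation."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"No price defined for resolution {resolution!r}")
    return PRICE_PER_SECOND[resolution] * duration_seconds


print(estimate_cost(5, "720p"))  # 0.20
```

`Decimal` is used here only to avoid floating-point rounding in currency arithmetic; the real implementation may well use plain floats.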
5. Size Format Mapping ✅
Resolution → Size Format:
- 480p + 16:9 → "832*480" (landscape)
- 480p + 9:16 → "480*832" (portrait)
- 720p + 16:9 → "1280*720" (landscape)
- 720p + 9:16 → "720*1280" (portrait)
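The four mappings above fit naturally into a lookup table. This is a minimal sketch under that assumption; `to_size` and the table name are hypothetical, and rejecting any other combination with a `ValueError` is an illustrative choice.

```python
# Lookup table built from the four mappings listed above.
SIZE_BY_RESOLUTION_AND_ASPECT = {
    ("480p", "16:9"): "832*480",
    ("480p", "9:16"): "480*832",
    ("720p", "16:9"): "1280*720",
    ("720p", "9:16"): "720*1280",
}


def to_size(resolution: str, aspect_ratio: str) -> str:
    """Return the "width*height" string for a resolution/aspect-ratio pair."""
    try:
        return SIZE_BY_RESOLUTION_AND_ASPECT[(resolution, aspect_ratio)]
    except KeyError:
        raise ValueError(
            f"Unsupported combination: resolution={resolution!r}, "
            f"aspect_ratio={aspect_ratio!r}"
        ) from None
```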
6. Validation ✅
HunyuanVideo-1.5 Specific:
- Duration: Must be 5, 8, or 10 seconds (per official API docs)
- Resolution: Must be 480p or 720p (not 1080p)
- Prompt: Required and cannot be empty
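These three rules can be checked before any API call is made. The sketch below shows one way to do it; `validate_request` is a hypothetical name, and the error messages are illustrative.

```python
ALLOWED_DURATIONS = (5, 8, 10)
ALLOWED_RESOLUTIONS = ("480p", "720p")


def validate_request(prompt: str, duration: int, resolution: str) -> None:
    """Raise ValueError if the request breaks a HunyuanVideo-1.5 rule."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required and cannot be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(
            f"duration must be one of {ALLOWED_DURATIONS}, got {duration!r}"
        )
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {ALLOWED_RESOLUTIONS}, got {resolution!r}"
        )
```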
Code Structure
backend/services/llm_providers/
├── main_video_generation.py                     # Unified entry point
│   ├── ai_video_generate()                      # Main function (async)
│   └── _generate_text_to_video_wavespeed()      # WaveSpeed router
│
└── video_generation/                            # Modular services
    ├── base.py                                  # Base classes
    └── wavespeed_provider.py                    # WaveSpeed services
        ├── BaseWaveSpeedTextToVideoService      # Base class
        ├── HunyuanVideoService                  # ✅ Implemented
        └── get_wavespeed_text_to_video_service() # Factory
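To illustrate how the factory in this layout might pick a service by model name, here is a minimal sketch. The class and function names come from the structure above, but the class bodies, the `MODEL_PATH` attribute, and the `_SERVICES` registry are assumptions made for illustration.

```python
class BaseWaveSpeedTextToVideoService:
    """Stand-in for the shared base class; the real one holds the API logic."""

    MODEL_PATH = ""  # hypothetical attribute


class HunyuanVideoService(BaseWaveSpeedTextToVideoService):
    MODEL_PATH = "wavespeed-ai/hunyuan-video-1.5/text-to-video"


# Hypothetical registry; further models would be added here once implemented.
_SERVICES = {
    "hunyuan-video-1.5": HunyuanVideoService,
}


def get_wavespeed_text_to_video_service(
    model: str = "hunyuan-video-1.5",
) -> BaseWaveSpeedTextToVideoService:
    """Return a service instance for the given model name."""
    try:
        service_cls = _SERVICES[model]
    except KeyError:
        supported = ", ".join(sorted(_SERVICES))
        raise ValueError(
            f"Unsupported model {model!r}; supported: {supported}"
        ) from None
    return service_cls()
```

With a registry like this, supporting another model means adding a subclass and one dictionary entry, which matches the modular intent described in this document.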
Usage Example
import asyncio

from services.llm_providers.main_video_generation import ai_video_generate


async def main():
    # Generate a 5-second 720p clip and print progress as it runs.
    result = await ai_video_generate(
        prompt="A tiny robot hiking across a kitchen table",
        operation_type="text-to-video",
        provider="wavespeed",
        model="hunyuan-video-1.5",
        duration=5,
        resolution="720p",
        user_id="user123",
        progress_callback=lambda progress, msg: print(f"{progress}%: {msg}"),
    )
    video_bytes = result["video_bytes"]
    cost = result["cost"]  # $0.20 for 5s @ 720p


asyncio.run(main())
Testing Checklist
- Test with valid prompt
- Test with 5-second duration
- Test with 8-second duration
- Test with 10-second duration
- Test with 480p resolution
- Test with 720p resolution
- Test with negative_prompt
- Test with seed
- Test progress callbacks
- Test error handling (invalid duration)
- Test error handling (invalid resolution)
- Test cost calculation
- Test metadata return
Next Steps
- ✅ HunyuanVideo-1.5: Complete
- ⏳ LTX-2 Pro: Pending documentation
- ⏳ LTX-2 Fast: Pending documentation
- ⏳ LTX-2 Retake: Pending documentation
Notes
- Audio support: Not supported by HunyuanVideo-1.5 (ignored with warning)
- Prompt expansion: Not supported by HunyuanVideo-1.5 (ignored with warning)
- Aspect ratio: Used for size calculation (landscape vs portrait)
- Polling interval: 0.5 seconds (as per example code)
- Timeout: 10 minutes maximum
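The "ignored with warning" behavior for unsupported options might be handled along these lines. This is only a sketch: the option keys `audio` and `enable_prompt_expansion` and the helper `drop_unsupported_options` are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical option keys for the two unsupported features noted above.
UNSUPPORTED_OPTIONS = ("audio", "enable_prompt_expansion")


def drop_unsupported_options(options: dict) -> dict:
    """Return a copy of options without unsupported keys, warning for each one dropped."""
    supported = dict(options)
    for name in UNSUPPORTED_OPTIONS:
        if supported.pop(name, None) is not None:
            logger.warning(
                "%s is not supported by HunyuanVideo-1.5 and will be ignored", name
            )
    return supported
```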
Ready for Testing ✅
The implementation is complete and ready for testing. All features are implemented following the modular architecture with separation of concerns.