Base code

2026-01-08 22:39:53 +07:00
parent 697115c61a
commit c35fa52117
2169 changed files with 626670 additions and 0 deletions
--- a/docs/HUNYUAN_VIDEO_IMPLEMENTATION_COMPLETE.md
+++ b/docs/HUNYUAN_VIDEO_IMPLEMENTATION_COMPLETE.md
@@ -0,0 +1,147 @@
+# HunyuanVideo-1.5 Text-to-Video Implementation - Complete ✅
+
+## Summary
+
+Successfully implemented HunyuanVideo-1.5 text-to-video generation with modular architecture, following separation of concerns principles.
+
+## Implementation Details
+
+### 1. Service Structure ✅
+
+**File**: `backend/services/llm_providers/video_generation/wavespeed_provider.py`
+
+- **`HunyuanVideoService`**: Complete implementation
+  - Model-specific validation (duration: 5, 8, or 10 seconds, resolution: 480p or 720p)
+  - Based on official API docs: https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-1.5-text-to-video
+  - Size format conversion (resolution + aspect_ratio → "width*height")
+  - Cost calculation ($0.02/s for 480p, $0.04/s for 720p)
+  - Full API integration (submit → poll → download)
+  - Progress callback support
+  - Comprehensive error handling
+
+### 2. Unified Entry Point Integration ✅
+
+**File**: `backend/services/llm_providers/main_video_generation.py`
+
+- **`_generate_text_to_video_wavespeed()`**: New async function
+  - Routes to appropriate service based on model
+  - Handles all parameters
+  - Returns standardized metadata dict
+
+- **`ai_video_generate()`**: Updated
+  - Now supports WaveSpeed text-to-video
+  - Default model: `hunyuan-video-1.5`
+  - Async/await properly handled
+
+### 3. API Integration ✅
+
+**Model**: `wavespeed-ai/hunyuan-video-1.5/text-to-video`
+
+**Parameters Supported**:
+- ✅ `prompt` (required)
+- ✅ `negative_prompt` (optional)
+- ✅ `size` (auto-calculated from resolution + aspect_ratio)
+- ✅ `duration` (5, 8, or 10 seconds)
+- ✅ `seed` (optional, default: -1)
+
+**Workflow**:
+1. ✅ Submit request to WaveSpeed API
+2. ✅ Get prediction ID
+3. ✅ Poll `/api/v3/predictions/{id}/result` with progress callbacks
+4. ✅ Download video from `outputs[0]`
+5. ✅ Return metadata dict
+
+### 4. Features ✅
+
+- ✅ **Pre-flight validation**: Subscription limits checked before API calls
+- ✅ **Usage tracking**: Integrated with existing tracking system
+- ✅ **Progress callbacks**: Real-time progress updates (10% → 20-80% → 90% → 100%)
+- ✅ **Error handling**: Comprehensive error messages with prediction_id for resume
+- ✅ **Cost calculation**: Accurate pricing ($0.02/s 480p, $0.04/s 720p)
+- ✅ **Metadata return**: Full metadata including dimensions, cost, prediction_id
+
+### 5. Size Format Mapping ✅
+
+**Resolution → Size Format**:
+- `480p` + `16:9` → `"832*480"` (landscape)
+- `480p` + `9:16` → `"480*832"` (portrait)
+- `720p` + `16:9` → `"1280*720"` (landscape)
+- `720p` + `9:16` → `"720*1280"` (portrait)
+
+### 6. Validation ✅
+
+**HunyuanVideo-1.5 Specific**:
+- Duration: Must be 5, 8, or 10 seconds (per official API docs)
+- Resolution: Must be 480p or 720p (not 1080p)
+- Prompt: Required and cannot be empty
+
+## Code Structure
+
+```
+backend/services/llm_providers/
+├── main_video_generation.py          # Unified entry point
+│   ├── ai_video_generate()          # Main function (async)
+│   └── _generate_text_to_video_wavespeed()  # WaveSpeed router
+│
+└── video_generation/                # Modular services
+    ├── base.py                       # Base classes
+    └── wavespeed_provider.py         # WaveSpeed services
+        ├── BaseWaveSpeedTextToVideoService  # Base class
+        ├── HunyuanVideoService      # ✅ Implemented
+        └── get_wavespeed_text_to_video_service()  # Factory
+```
+
+## Usage Example
+
+```python
+from services.llm_providers.main_video_generation import ai_video_generate
+
+result = await ai_video_generate(
+    prompt="A tiny robot hiking across a kitchen table",
+    operation_type="text-to-video",
+    provider="wavespeed",
+    model="hunyuan-video-1.5",
+    duration=5,
+    resolution="720p",
+    user_id="user123",
+    progress_callback=lambda progress, msg: print(f"{progress}%: {msg}")
+)
+
+video_bytes = result["video_bytes"]
+cost = result["cost"]  # $0.20 for 5s @ 720p
+```
+
+## Testing Checklist
+
+- [ ] Test with valid prompt
+- [ ] Test with 5-second duration
+- [ ] Test with 8-second duration
+- [ ] Test with 10-second duration
+- [ ] Test with 480p resolution
+- [ ] Test with 720p resolution
+- [ ] Test with negative_prompt
+- [ ] Test with seed
+- [ ] Test progress callbacks
+- [ ] Test error handling (invalid duration)
+- [ ] Test error handling (invalid resolution)
+- [ ] Test cost calculation
+- [ ] Test metadata return
+
+## Next Steps
+
+1. ✅ **HunyuanVideo-1.5**: Complete
+2. ⏳ **LTX-2 Pro**: Pending documentation
+3. ⏳ **LTX-2 Fast**: Pending documentation
+4. ⏳ **LTX-2 Retake**: Pending documentation
+
+## Notes
+
+- **Audio support**: Not supported by HunyuanVideo-1.5 (ignored with warning)
+- **Prompt expansion**: Not supported by HunyuanVideo-1.5 (ignored with warning)
+- **Aspect ratio**: Used for size calculation (landscape vs portrait)
+- **Polling interval**: 0.5 seconds (as per example code)
+- **Timeout**: 10 minutes maximum
+
+## Ready for Testing ✅
+
+The implementation is complete and ready for testing. All features are implemented following the modular architecture with separation of concerns.