133 lines
4.2 KiB
Markdown
133 lines
4.2 KiB
Markdown
# Text-to-Video Implementation Plan - Phase 1
|
|
|
|
## Goal
|
|
Implement WaveSpeed text-to-video support in the unified `ai_video_generate()` entry point with modular, maintainable code structure.
|
|
|
|
## Proposed Architecture
|
|
|
|
### Modular Structure (Following Image Generation Pattern)
|
|
|
|
```
|
|
backend/services/llm_providers/
|
|
├── main_video_generation.py # Unified entry point (already exists)
|
|
└── video_generation/ # NEW: Modular video generation services
|
|
├── __init__.py
|
|
├── base.py # Base classes/interfaces
|
|
└── wavespeed_provider.py # WaveSpeed text-to-video models
|
|
├── HunyuanVideoService # HunyuanVideo-1.5
|
|
├── LTX2ProService # LTX-2 Pro
|
|
├── LTX2FastService # LTX-2 Fast
|
|
└── LTX2RetakeService # LTX-2 Retake
|
|
```
|
|
|
|
### Implementation Strategy
|
|
|
|
**Step 1: Create Base Structure**
|
|
- Create `video_generation/` directory
|
|
- Create `base.py` with base classes/interfaces
|
|
- Create `wavespeed_provider.py` with service classes
|
|
|
|
**Step 2: Implement First Model (HunyuanVideo-1.5)**
|
|
- Create `HunyuanVideoService` class
|
|
- Implement model-specific logic
|
|
- Add progress callback support
|
|
- Return metadata dict
|
|
|
|
**Step 3: Integrate into Unified Entry Point**
|
|
- Add `_generate_text_to_video_wavespeed()` function
|
|
- Route to appropriate service based on model
|
|
- Handle async/sync properly
|
|
|
|
**Step 4: Test and Validate**
|
|
- Test with one model
|
|
- Verify all features work
|
|
- Ensure backward compatibility
|
|
|
|
**Step 5: Add Remaining Models**
|
|
- Follow same pattern for LTX-2 Pro, Fast, Retake
|
|
- Reuse common logic
|
|
- Model-specific differences only
|
|
|
|
## Model Selection
|
|
|
|
**Recommended Starting Model:** **HunyuanVideo-1.5**
|
|
- Most commonly used
|
|
- Good documentation availability
|
|
- Standard parameters
|
|
|
|
**Alternative:** Any model you prefer - we'll follow the same pattern.
|
|
|
|
## Service Class Structure
|
|
|
|
```python
|
|
class HunyuanVideoService:
|
|
"""Service for HunyuanVideo-1.5 text-to-video generation."""
|
|
|
|
MODEL_PATH = "wavespeed-ai/hunyuan-video-1.5/text-to-video"
|
|
MODEL_NAME = "hunyuan-video-1.5"
|
|
|
|
def __init__(self, client: Optional[WaveSpeedClient] = None):
|
|
self.client = client or WaveSpeedClient()
|
|
|
|
async def generate_video(
|
|
self,
|
|
prompt: str,
|
|
duration: int = 5,
|
|
resolution: str = "720p",
|
|
negative_prompt: Optional[str] = None,
|
|
seed: Optional[int] = None,
|
|
audio_base64: Optional[str] = None,
|
|
enable_prompt_expansion: bool = True,
|
|
progress_callback: Optional[Callable[[float, str], None]] = None,
|
|
**kwargs
|
|
) -> Dict[str, Any]:
|
|
"""
|
|
Generate video using HunyuanVideo-1.5.
|
|
|
|
Returns:
|
|
Dict with video_bytes, prompt, duration, model_name, cost, etc.
|
|
"""
|
|
# 1. Validate inputs
|
|
# 2. Build payload
|
|
# 3. Submit to WaveSpeed
|
|
# 4. Poll with progress callbacks
|
|
# 5. Download video
|
|
# 6. Return metadata dict
|
|
```
|
|
|
|
## Integration Points
|
|
|
|
### Unified Entry Point
|
|
```python
|
|
# In main_video_generation.py
|
|
async def _generate_text_to_video_wavespeed(
|
|
prompt: str,
|
|
model: str = "hunyuan-video-1.5",
|
|
progress_callback: Optional[Callable[[float, str], None]] = None,
|
|
**kwargs
|
|
) -> Dict[str, Any]:
|
|
"""Route to appropriate WaveSpeed text-to-video service."""
|
|
from .video_generation.wavespeed_provider import get_wavespeed_text_to_video_service
|
|
|
|
service = get_wavespeed_text_to_video_service(model)
|
|
return await service.generate_video(
|
|
prompt=prompt,
|
|
progress_callback=progress_callback,
|
|
**kwargs
|
|
)
|
|
```
|
|
|
|
## Next Steps
|
|
|
|
1. **Wait for Model Documentation** - You'll provide documentation for the first model
|
|
2. **Create Base Structure** - Set up directory and base classes
|
|
3. **Implement First Model** - HunyuanVideo-1.5 (or your chosen model)
|
|
4. **Test** - Verify functionality
|
|
5. **Add Remaining Models** - Follow same pattern
|
|
|
|
## Questions
|
|
|
|
1. **Which model should we start with?** (Recommended: HunyuanVideo-1.5)
|
|
2. **Do you have the model documentation ready?** (API endpoints, parameters, response format)
|
|
3. **Any specific requirements for the first model?** (Parameters, features, etc.)
|