# Video Studio: Model Documentation Needed **Last Updated**: Current Session **Purpose**: Track which AI model documentation is needed to complete immediate next steps --- ## Immediate Next Steps (1-2 Weeks) ### 1. Complete Enhance Studio Frontend ### 2. Add Remaining Text-to-Video Models ### 3. Add Image-to-Video Alternatives --- ## Required Model Documentation ### Priority 1: Enhance Studio Models ⚠️ **URGENT** #### 1. **FlashVSR (Video Upscaling)** ✅ **RECEIVED** - **Model**: `wavespeed-ai/flashvsr` - **Purpose**: Video super-resolution and upscaling - **Use Case**: Enhance Studio - upscale videos from 480p/720p to 1080p/4K - **Status**: ✅ Documentation received, implementation in progress - **Documentation**: https://wavespeed.ai/docs/docs-api/wavespeed-ai/flashvsr - **Implementation Notes**: - Endpoint: `https://api.wavespeed.ai/api/v3/wavespeed-ai/flashvsr` - Input: `video` (base64 or URL), `target_resolution` ("720p", "1080p", "2k", "4k") - Pricing: $0.06-$0.16 per 5 seconds (based on resolution) - Max clip length: 10 minutes - Processing: 3-20 seconds wall time per 1 second of video #### 2. **Video Extend/Outpaint** ✅ **RECEIVED & IMPLEMENTED** - **Models**: - `alibaba/wan-2.5/video-extend` (Full Featured) - `wavespeed-ai/wan-2.2-spicy/video-extend` (Fast & Affordable) - `bytedance/seedance-v1.5-pro/video-extend` (Advanced) - **Purpose**: Extend video duration with motion/audio continuity - **Use Case**: Extend Studio - extend short clips into longer videos - **Status**: ✅ Documentation received, all three models implemented with model selector and comparison UI - **Documentation**: - WAN 2.5: https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.5-video-extend - WAN 2.2 Spicy: https://wavespeed.ai/docs/docs-api/wavespeed-ai/wan-2.2-spicy/video-extend - Seedance 1.5 Pro: https://wavespeed.ai/docs/docs-api/bytedance/seedance-v1.5-pro/video-extend - **Implementation Notes**: - **WAN 2.5**: Full featured model - Endpoint: `https://api.wavespeed.ai/api/v3/alibaba/wan-2.5/video-extend` - Required: `video`, `prompt` - Optional: `audio` (URL, ≤15MB, 3-30s), `negative_prompt`, `resolution` (480p/720p/1080p), `duration` (3-10s), `enable_prompt_expansion`, `seed` - Pricing: $0.05/s (480p), $0.10/s (720p), $0.15/s (1080p) - Audio handling: If audio > video length, only first segment used; if audio < video length, remaining is silent; if no audio, can auto-generate - Multilingual: Supports Chinese and English prompts - **WAN 2.2 Spicy**: Fast and affordable model - Endpoint: `https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.2-spicy/video-extend` - Required: `video`, `prompt` - Optional: `resolution` (480p/720p only), `duration` (5 or 8s only), `seed` - Pricing: $0.03/s (480p), $0.06/s (720p) - **Most affordable option** - No audio, negative prompt, or prompt expansion support - Simpler API for quick extensions - Optimized for expressive visuals, smooth temporal coherence, and cinematic color - **Seedance 1.5 Pro**: Advanced model with unique features - Endpoint: `https://api.wavespeed.ai/api/v3/bytedance/seedance-v1.5-pro/video-extend` - Required: `video`, `prompt` - Optional: `resolution` (480p/720p only), `duration` (4-12s), `generate_audio` (boolean, default true), `camera_fixed` (boolean, default false), `seed` - Pricing (with audio): $0.024/s (480p), $0.052/s (720p) - Pricing (without audio): $0.012/s (480p), $0.026/s (720p) - **Audio generation doubles the cost** - disable for budget-friendly extensions - Unique features: Auto audio generation, camera position control - No audio upload, negative prompt, or prompt expansion support - Ideal for ad creatives and short dramas - Natural motion continuation, stable aesthetics, upscaled output - Best practices: Use clean input videos, keep prompts specific but short, start with 5s to validate --- ### Priority 2: Additional Text-to-Video Models #### 3. **LTX-2 Fast** - **Model**: `lightricks/ltx-2-fast/text-to-video` - **Purpose**: Fast draft generation for quick iterations - **Use Case**: Create Studio - quick previews, draft mode - **Documentation Needed**: - API endpoint - Input parameters (prompt, duration, resolution, aspect ratio) - Speed/latency characteristics - Quality trade-offs vs LTX-2 Pro - Pricing (likely lower than Pro) - Supported resolutions and durations - **WaveSpeed Link**: https://wavespeed.ai/models/lightricks/ltx-2-fast/text-to-video - **Status**: Mentioned in plan, TODO in code (`# "lightricks/ltx-2-fast": LTX2FastService`) #### 4. **LTX-2 Retake** - **Model**: `lightricks/ltx-2-retake` - **Purpose**: Regenerate/retake videos with variations - **Use Case**: Create Studio - regeneration workflows, variations - **Documentation Needed**: - API endpoint - How it differs from initial generation - Seed/prompt variation parameters - Pricing (likely similar to LTX-2 Pro) - Use cases and best practices - **WaveSpeed Link**: Check for `lightricks/ltx-2-retake` documentation - **Status**: Mentioned in plan, TODO in code (`# "lightricks/ltx-2-retake": LTX2RetakeService`) --- ### Priority 3: Image-to-Video Alternatives #### 5. **Kandinsky 5 Pro Image-to-Video** - **Model**: `wavespeed-ai/kandinsky5-pro/image-to-video` - **Purpose**: Alternative image-to-video model - **Use Case**: Create Studio - image-to-video with different quality/style - **Documentation Needed**: - API endpoint - Input parameters (image, prompt, duration, resolution) - Quality characteristics vs WAN 2.5 - Pricing structure - Supported resolutions (512p/1024p mentioned in plan) - Duration limits - Best use cases - **WaveSpeed Link**: https://wavespeed.ai/models/wavespeed-ai/kandinsky5-pro/image-to-video - **Note**: Plan mentions 5s MP4, 512p/1024p, ~$0.20/0.60 per run --- ## Currently Implemented Models ✅ These models are already implemented and working: - ✅ **HunyuanVideo-1.5** (`wavespeed-ai/hunyuan-video-1.5/text-to-video`) - ✅ **LTX-2 Pro** (`lightricks/ltx-2-pro/text-to-video`) - ✅ **Google Veo 3.1** (`google/veo3.1/text-to-video`) - ✅ **Hunyuan Avatar** (`wavespeed-ai/hunyuan-avatar`) - ✅ **InfiniteTalk** (`wavespeed-ai/infinitetalk`) - ✅ **WAN 2.5** (text-to-video and image-to-video via unified generation) --- ## Documentation Request Format For each model, please provide: 1. **API Documentation Link** (WaveSpeed model page) 2. **Input Schema**: - Required parameters - Optional parameters - Parameter types and constraints - Default values 3. **Output Schema**: - Response format - File URLs or data format - Metadata returned 4. **Pricing Information**: - Cost per second/run - Resolution-based pricing - Duration limits and pricing 5. **Capabilities**: - Supported resolutions - Duration limits - Aspect ratios - Special features (audio, style, etc.) 6. **Example Requests/Responses**: - cURL examples - Python examples - Response samples --- ## Implementation Priority ### Week 1 Focus: 1. **FlashVSR** - Critical for Enhance Studio frontend 2. **LTX-2 Fast** - Quick to implement (similar to LTX-2 Pro) ### Week 2 Focus: 3. **LTX-2 Retake** - Complete LTX-2 suite 4. **Kandinsky 5 Pro** - Image-to-video alternative ### Future (Phase 3): 5. **Video-extend** - For Enhance Studio temporal features 6. Other enhancement models as needed --- ## Notes - All models should follow the same pattern as existing implementations - Use `BaseWaveSpeedTextToVideoService` or similar base classes - Integrate into `main_video_generation.py` unified entry point - Add to model selector in frontend with education system - Ensure cost estimation and preflight validation work correctly