
# Video Studio: Model Documentation Needed

**Last Updated**: Current Session
**Purpose**: Track which AI model documentation is needed to complete immediate next steps

---

## Immediate Next Steps (1-2 Weeks)

1. Complete Enhance Studio Frontend
2. Add Remaining Text-to-Video Models
3. Add Image-to-Video Alternatives

---

## Required Model Documentation

### Priority 1: Enhance Studio Models ⚠️ **URGENT**

#### 1. **FlashVSR (Video Upscaling)** ✅ **RECEIVED**
- **Model**: `wavespeed-ai/flashvsr`
- **Purpose**: Video super-resolution and upscaling
- **Use Case**: Enhance Studio - upscale videos from 480p/720p to 1080p/4K
- **Status**: ✅ Documentation received, implementation in progress
- **Documentation**: https://wavespeed.ai/docs/docs-api/wavespeed-ai/flashvsr
- **Implementation Notes**:
  - Endpoint: `https://api.wavespeed.ai/api/v3/wavespeed-ai/flashvsr`
  - Input: `video` (base64 or URL), `target_resolution` ("720p", "1080p", "2k", "4k")
  - Pricing: $0.06-$0.16 per 5 seconds (based on resolution)
  - Max clip length: 10 minutes
  - Processing: roughly 3-20 seconds of wall-clock time per second of input video
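To make the FlashVSR notes concrete for preflight validation and cost estimation, here is a minimal Python sketch. It only builds the request body and estimates bounds; it does not send the request, because the auth scheme is not documented here. The field names `video` and `target_resolution` come from the notes above. Two assumptions are made: billing is rounded up to whole 5-second blocks, and the $0.06-$0.16 range is used as lower/upper bounds because the per-resolution breakdown is not listed. The helper names are hypothetical.

```python
import math

# Values taken from the FlashVSR implementation notes.
FLASHVSR_ENDPOINT = "https://api.wavespeed.ai/api/v3/wavespeed-ai/flashvsr"
FLASHVSR_RESOLUTIONS = {"720p", "1080p", "2k", "4k"}
FLASHVSR_MAX_SECONDS = 10 * 60                   # max clip length: 10 minutes
PRICE_PER_5S_MIN, PRICE_PER_5S_MAX = 0.06, 0.16  # USD, varies by resolution


def build_flashvsr_payload(video: str, target_resolution: str, duration_s: float) -> dict:
    """Hypothetical helper: validate inputs and return the JSON body for FlashVSR."""
    if target_resolution not in FLASHVSR_RESOLUTIONS:
        raise ValueError(f"unsupported target_resolution: {target_resolution!r}")
    if not 0 < duration_s <= FLASHVSR_MAX_SECONDS:
        raise ValueError(f"clip must be 0-{FLASHVSR_MAX_SECONDS}s long, got {duration_s}")
    # `video` may be a URL or base64 data, per the notes.
    return {"video": video, "target_resolution": target_resolution}


def estimate_flashvsr(duration_s: float) -> dict:
    """Cost and wall-time bounds, ASSUMING billing in 5 s blocks rounded up."""
    blocks = math.ceil(duration_s / 5)
    return {
        "cost_usd": (round(blocks * PRICE_PER_5S_MIN, 2), round(blocks * PRICE_PER_5S_MAX, 2)),
        "wall_time_s": (3 * duration_s, 20 * duration_s),  # 3-20 s per second of video
    }
```

For example, under these assumptions a 12-second clip is billed as three blocks ($0.18-$0.48) and may take roughly 36-240 seconds to process. Showing that range in the UI before submission helps set expectations for long clips.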

#### 2. **Video Extend/Outpaint** ✅ **RECEIVED & IMPLEMENTED**
- **Models**:
  - `alibaba/wan-2.5/video-extend` (Full Featured)
  - `wavespeed-ai/wan-2.2-spicy/video-extend` (Fast & Affordable)
  - `bytedance/seedance-v1.5-pro/video-extend` (Advanced)
- **Purpose**: Extend video duration with motion/audio continuity
- **Use Case**: Extend Studio - extend short clips into longer videos
- **Status**: ✅ Documentation received, all three models implemented with model selector and comparison UI
- **Documentation**:
  - WAN 2.5: https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.5-video-extend
  - WAN 2.2 Spicy: https://wavespeed.ai/docs/docs-api/wavespeed-ai/wan-2.2-spicy/video-extend
  - Seedance 1.5 Pro: https://wavespeed.ai/docs/docs-api/bytedance/seedance-v1.5-pro/video-extend
- **Implementation Notes**:
  - **WAN 2.5**: Full featured model
    - Endpoint: `https://api.wavespeed.ai/api/v3/alibaba/wan-2.5/video-extend`
    - Required: `video`, `prompt`
    - Optional: `audio` (URL, ≤15MB, 3-30s), `negative_prompt`, `resolution` (480p/720p/1080p), `duration` (3-10s), `enable_prompt_expansion`, `seed`
    - Pricing: $0.05/s (480p), $0.10/s (720p), $0.15/s (1080p)
    - Audio handling: if the audio is longer than the video, only the first segment is used; if it is shorter, the remainder is silent; if no audio is provided, audio can be auto-generated
    - Multilingual: Supports Chinese and English prompts
  - **WAN 2.2 Spicy**: Fast and affordable model
    - Endpoint: `https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.2-spicy/video-extend`
    - Required: `video`, `prompt`
    - Optional: `resolution` (480p/720p only), `duration` (5 or 8s only), `seed`
    - Pricing: $0.03/s (480p), $0.06/s (720p); cheaper than WAN 2.5, though Seedance 1.5 Pro costs less per second at both resolutions even with audio generation enabled
    - No audio, negative prompt, or prompt expansion support
    - Simpler API for quick extensions
    - Optimized for expressive visuals, smooth temporal coherence, and cinematic color
  - **Seedance 1.5 Pro**: Advanced model with unique features
    - Endpoint: `https://api.wavespeed.ai/api/v3/bytedance/seedance-v1.5-pro/video-extend`
    - Required: `video`, `prompt`
    - Optional: `resolution` (480p/720p only), `duration` (4-12s), `generate_audio` (boolean, default true), `camera_fixed` (boolean, default false), `seed`
    - Pricing (with audio): $0.024/s (480p), $0.052/s (720p)
    - Pricing (without audio): $0.012/s (480p), $0.026/s (720p)
    - **Audio generation doubles the cost** - disable for budget-friendly extensions
    - Unique features: Auto audio generation, camera position control
    - No audio upload, negative prompt, or prompt expansion support
    - Ideal for ad creatives and short dramas
    - Natural motion continuation, stable aesthetics, upscaled output
    - Best practices: Use clean input videos, keep prompts specific but short, start with 5s to validate
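The three extend models differ in allowed resolutions, durations, audio handling, and pricing. A table-driven check like the following Python sketch could drive the model selector's comparison view and cost estimate. All rates and limits come from the notes above. Several assumptions are made: durations are treated as whole seconds, cost is computed as the per-second rate times the requested `duration`, and the names `ExtendModelSpec`, `estimate_extend_cost`, and `check_audio_upload` are hypothetical.

```python
from dataclasses import dataclass

BASE = "https://api.wavespeed.ai/api/v3/"


@dataclass(frozen=True)
class ExtendModelSpec:
    endpoint: str
    rates: dict                         # resolution -> USD per second
    durations: tuple                    # allowed durations (ASSUMED whole seconds)
    supports_audio_upload: bool = False
    audio_doubles_cost: bool = False    # Seedance: generate_audio=True doubles the rate


EXTEND_MODELS = {
    "alibaba/wan-2.5/video-extend": ExtendModelSpec(
        BASE + "alibaba/wan-2.5/video-extend",
        {"480p": 0.05, "720p": 0.10, "1080p": 0.15},
        tuple(range(3, 11)), supports_audio_upload=True),
    "wavespeed-ai/wan-2.2-spicy/video-extend": ExtendModelSpec(
        BASE + "wavespeed-ai/wan-2.2-spicy/video-extend",
        {"480p": 0.03, "720p": 0.06}, (5, 8)),
    "bytedance/seedance-v1.5-pro/video-extend": ExtendModelSpec(
        BASE + "bytedance/seedance-v1.5-pro/video-extend",
        {"480p": 0.012, "720p": 0.026},  # no-audio base rates
        tuple(range(4, 13)), audio_doubles_cost=True),
}


def estimate_extend_cost(model: str, resolution: str, duration: int,
                         generate_audio: bool = True) -> float:
    """Validate the request against the model's limits and return estimated USD."""
    spec = EXTEND_MODELS[model]
    if resolution not in spec.rates:
        raise ValueError(f"{model} does not support {resolution}")
    if duration not in spec.durations:
        raise ValueError(f"{model} does not support duration={duration}s")
    rate = spec.rates[resolution]
    if spec.audio_doubles_cost and generate_audio:
        rate *= 2  # $0.024/s and $0.052/s with audio
    return round(rate * duration, 3)


def check_audio_upload(model: str, size_mb: float, audio_seconds: float) -> None:
    """WAN 2.5 only: audio must be <= 15 MB and 3-30 s long."""
    if not EXTEND_MODELS[model].supports_audio_upload:
        raise ValueError(f"{model} does not accept an audio upload")
    if size_mb > 15 or not 3 <= audio_seconds <= 30:
        raise ValueError("audio must be <= 15 MB and 3-30 s long")
```

Keeping the limits in one table keeps the selector, the cost estimate, and the preflight checks consistent. For example, a 5-second 720p Seedance extension comes to about $0.26 with audio and $0.13 without.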

---

### Priority 2: Additional Text-to-Video Models

#### 3. **LTX-2 Fast**
- **Model**: `lightricks/ltx-2-fast/text-to-video`
- **Purpose**: Fast draft generation for quick iterations
- **Use Case**: Create Studio - quick previews, draft mode
- **Documentation Needed**:
  - API endpoint
  - Input parameters (prompt, duration, resolution, aspect ratio)
  - Speed/latency characteristics
  - Quality trade-offs vs LTX-2 Pro
  - Pricing (likely lower than Pro)
  - Supported resolutions and durations
- **WaveSpeed Link**: https://wavespeed.ai/models/lightricks/ltx-2-fast/text-to-video
- **Status**: Mentioned in plan, TODO in code (`# "lightricks/ltx-2-fast": LTX2FastService`)

#### 4. **LTX-2 Retake**
- **Model**: `lightricks/ltx-2-retake`
- **Purpose**: Regenerate/retake videos with variations
- **Use Case**: Create Studio - regeneration workflows, variations
- **Documentation Needed**:
  - API endpoint
  - How it differs from initial generation
  - Seed/prompt variation parameters
  - Pricing (likely similar to LTX-2 Pro)
  - Use cases and best practices
- **WaveSpeed Link**: Check for `lightricks/ltx-2-retake` documentation
- **Status**: Mentioned in plan, TODO in code (`# "lightricks/ltx-2-retake": LTX2RetakeService`)
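The commented TODO lines suggest that services are registered in a dictionary keyed by model ID. The Python sketch below assumes that shape to show how the two LTX-2 services would slot in once their docs arrive. Everything in it is hypothetical: `BaseServiceStub` stands in for the real `BaseWaveSpeedTextToVideoService`, the `lightricks/ltx-2-pro` key is inferred by analogy, and `build_payload` is an illustrative method, not the codebase's API.

```python
from typing import Dict, Type


class BaseServiceStub:
    """Hypothetical stand-in for BaseWaveSpeedTextToVideoService."""
    model_id: str = ""

    def build_payload(self, prompt: str, **options) -> dict:
        return {"prompt": prompt, **options}


class LTX2ProService(BaseServiceStub):
    model_id = "lightricks/ltx-2-pro/text-to-video"


class LTX2FastService(BaseServiceStub):
    # Parameters still to be confirmed from the LTX-2 Fast documentation.
    model_id = "lightricks/ltx-2-fast/text-to-video"


class LTX2RetakeService(BaseServiceStub):
    # Retake-specific inputs (source video, variation controls) are undocumented.
    model_id = "lightricks/ltx-2-retake"


TEXT_TO_VIDEO_SERVICES: Dict[str, Type[BaseServiceStub]] = {
    "lightricks/ltx-2-pro": LTX2ProService,
    "lightricks/ltx-2-fast": LTX2FastService,      # currently a TODO in code
    "lightricks/ltx-2-retake": LTX2RetakeService,  # currently a TODO in code
}


def get_service(model_key: str) -> BaseServiceStub:
    """Look up and instantiate the service registered for a model key."""
    try:
        return TEXT_TO_VIDEO_SERVICES[model_key]()
    except KeyError:
        raise ValueError(f"unknown model: {model_key}") from None
```

Once the documentation is in hand, enabling each model should mostly mean uncommenting its registry line and filling in the service's parameters.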

---

### Priority 3: Image-to-Video Alternatives

#### 5. **Kandinsky 5 Pro Image-to-Video**
- **Model**: `wavespeed-ai/kandinsky5-pro/image-to-video`
- **Purpose**: Alternative image-to-video model
- **Use Case**: Create Studio - image-to-video with different quality/style
- **Documentation Needed**:
  - API endpoint
  - Input parameters (image, prompt, duration, resolution)
  - Quality characteristics vs WAN 2.5
  - Pricing structure
  - Supported resolutions (512p/1024p mentioned in plan)
  - Duration limits
  - Best use cases
- **WaveSpeed Link**: https://wavespeed.ai/models/wavespeed-ai/kandinsky5-pro/image-to-video
- **Note**: Plan mentions 5s MP4 output at 512p or 1024p, at roughly $0.20 (512p) / $0.60 (1024p) per run
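Kandinsky 5 Pro is quoted per run, while most other models here are priced per second. The short Python sketch below converts the plan's estimates to an effective per-second rate so they can be compared directly. It assumes a fixed 5-second clip and that $0.20 applies to 512p and $0.60 to 1024p. Both figures are unconfirmed plan estimates.

```python
# Plan estimates for Kandinsky 5 Pro (unconfirmed): fixed 5 s clip, priced per run.
KANDINSKY_CLIP_SECONDS = 5
KANDINSKY_PRICE_PER_RUN = {"512p": 0.20, "1024p": 0.60}  # ASSUMED resolution mapping


def effective_rate_per_second(price_per_run: float, clip_seconds: float) -> float:
    """Convert per-run pricing to USD/second for side-by-side comparison."""
    return round(price_per_run / clip_seconds, 4)


rates = {res: effective_rate_per_second(price, KANDINSKY_CLIP_SECONDS)
         for res, price in KANDINSKY_PRICE_PER_RUN.items()}
```

Under these assumptions, this gives about $0.04/s at 512p and $0.12/s at 1024p.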

---

## Currently Implemented Models ✅

These models are already implemented and working:
- ✅ **HunyuanVideo-1.5** (`wavespeed-ai/hunyuan-video-1.5/text-to-video`)
- ✅ **LTX-2 Pro** (`lightricks/ltx-2-pro/text-to-video`)
- ✅ **Google Veo 3.1** (`google/veo3.1/text-to-video`)
- ✅ **Hunyuan Avatar** (`wavespeed-ai/hunyuan-avatar`)
- ✅ **InfiniteTalk** (`wavespeed-ai/infinitetalk`)
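- ✅ **WAN 2.5** (text-to-video and image-to-video via unified generation)
- ✅ **Video Extend** (`alibaba/wan-2.5/video-extend`, `wavespeed-ai/wan-2.2-spicy/video-extend`, `bytedance/seedance-v1.5-pro/video-extend`)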

---

## Documentation Request Format

For each model, please provide:

1. **API Documentation Link** (WaveSpeed model page)
2. **Input Schema**:
   - Required parameters
   - Optional parameters
   - Parameter types and constraints
   - Default values
3. **Output Schema**:
   - Response format
   - File URLs or data format
   - Metadata returned
4. **Pricing Information**:
   - Cost per second/run
   - Resolution-based pricing
   - Duration limits and pricing
5. **Capabilities**:
   - Supported resolutions
   - Duration limits
   - Aspect ratios
   - Special features (audio, style, etc.)
6. **Example Requests/Responses**:
   - cURL examples
   - Python examples
   - Response samples
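Tracking the six requested items per model makes it easy to see what is still outstanding. The following is one possible Python shape for that checklist. The class name, field names, and the `missing()` helper are hypothetical and simply mirror the numbered list above.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ModelDocRequest:
    """One entry per model, mirroring the six requested documentation items."""
    model_id: str
    api_doc_link: Optional[str] = None
    input_schema: Optional[dict] = None    # required/optional params, types, defaults
    output_schema: Optional[dict] = None   # response format, file URLs, metadata
    pricing: Optional[dict] = None         # per second/run, by resolution/duration
    capabilities: Optional[dict] = None    # resolutions, durations, aspect ratios, features
    examples: List[str] = field(default_factory=list)  # cURL/Python requests, responses

    def missing(self) -> List[str]:
        """Names of requested items that are still empty."""
        return [f.name for f in fields(self)
                if f.name != "model_id" and not getattr(self, f.name)]
```

For example, an LTX-2 Fast entry with only its WaveSpeed link filled in would report the input and output schemas, pricing, capabilities, and examples as missing.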

---

## Implementation Priority

### Week 1 Focus:
1. **FlashVSR** - Critical for Enhance Studio frontend
2. **LTX-2 Fast** - Quick to implement (similar to LTX-2 Pro)

### Week 2 Focus:
3. **LTX-2 Retake** - Complete LTX-2 suite
4. **Kandinsky 5 Pro** - Image-to-video alternative

### Future (Phase 3):
5. **Video-extend** - ✅ Already implemented (WAN 2.5, WAN 2.2 Spicy, Seedance 1.5 Pro); see Priority 1 above
6. Other enhancement models as needed

---

## Notes

- All models should follow the same pattern as existing implementations
- Use `BaseWaveSpeedTextToVideoService` or similar base classes
- Integrate into `main_video_generation.py` unified entry point
- Add to model selector in frontend with education system
- Ensure cost estimation and preflight validation work correctly
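One way to make "cost estimation and preflight validation" a required part of every new model service is an abstract contract plus a shared preflight step that runs before any API call. The Python sketch below illustrates that idea under stated assumptions. `VideoModelService`, `PreflightError`, and `preflight` are hypothetical names, not the existing base classes. The example implementation uses the WAN 2.2 Spicy limits and rates listed under Priority 1.

```python
from abc import ABC, abstractmethod


class PreflightError(ValueError):
    """Raised before any API call when a request would fail or exceed budget."""


class VideoModelService(ABC):
    """Hypothetical contract each new model service would implement."""

    @abstractmethod
    def validate(self, params: dict) -> None: ...

    @abstractmethod
    def estimate_cost(self, params: dict) -> float: ...


def preflight(service: VideoModelService, params: dict, budget_usd: float) -> float:
    """Validate params, check the estimate against a budget, return the estimate."""
    service.validate(params)
    cost = service.estimate_cost(params)
    if cost > budget_usd:
        raise PreflightError(f"estimated ${cost:.2f} exceeds budget ${budget_usd:.2f}")
    return cost


class WanSpicyExtendService(VideoModelService):
    """Example using the WAN 2.2 Spicy limits: 480p/720p, 5 or 8 s."""
    RATES = {"480p": 0.03, "720p": 0.06}

    def validate(self, params: dict) -> None:
        if not params.get("video") or not params.get("prompt"):
            raise PreflightError("video and prompt are required")
        if params.get("resolution") not in self.RATES:
            raise PreflightError("resolution must be 480p or 720p")
        if params.get("duration") not in (5, 8):
            raise PreflightError("duration must be 5 or 8 seconds")

    def estimate_cost(self, params: dict) -> float:
        return round(self.RATES[params["resolution"]] * params["duration"], 2)
```

Because `preflight` depends only on the abstract methods, the unified entry point in `main_video_generation.py` could call it uniformly for any model that implements the contract.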