feat: voice clone audio generation + podcast workspace architecture

- Voice clone integration: When the user selects voice clone in the Write phase, the backend uses their uploaded voice sample and the scene script text to generate audio via the qwen3/minimax/cosyvoice voice clone APIs
- Multi-tenant workspace storage: All podcast assets (audio, video, images, charts) now use per-user workspace directories
- Chart preview improvements: Card-based B-Roll charts UI with thumbnails, takeaway text, and action buttons; public endpoint for image serving
- Voice clone caching: In-memory LRU cache for voice samples (avoids re-downloading the sample for every scene); frontend caches voice clone metadata
- Thread pool for voice clone: Audio generation runs in a ThreadPoolExecutor so it does not block the FastAPI event loop
- Auto-detect voice clone IDs (vc_*, MY_VOICE_CLONE) and route them to the voice clone path
- DB fallback for voice sample URL: Fetches it from ContentAsset if it is not passed in the request
- Fixed API URL resolution for chart previews
- Fixed GlassyCard DOM warnings for motion props
- Fixed ScriptGenerationProgressView syntax error
- Fixed usePodcastWorkflow scriptData reference
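The routing, caching, and thread-pool items above fit together roughly as follows. This is a minimal sketch, not the code in this commit: `is_voice_clone_id`, `load_voice_sample`, `synthesize_blocking`, and `generate_scene_audio` are hypothetical names, the "download" and "synthesis" steps are placeholders that return bytes, and `functools.lru_cache` (with an assumed `maxsize=32`) stands in for the in-memory LRU cache. Only the ID patterns (`vc_*`, `MY_VOICE_CLONE`) and the idea of offloading blocking work to a `ThreadPoolExecutor` come from the commit message.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Optional

# ID patterns named in the commit message; everything else here is hypothetical.
VOICE_CLONE_PREFIX = "vc_"
VOICE_CLONE_SENTINEL = "MY_VOICE_CLONE"

# Shared pool (assumed size) so blocking TTS/HTTP work stays off the event loop thread.
_executor = ThreadPoolExecutor(max_workers=4)


def is_voice_clone_id(voice_id: Optional[str]) -> bool:
    """Return True when a voice ID refers to the user's cloned voice."""
    if not voice_id:
        return False
    return voice_id == VOICE_CLONE_SENTINEL or voice_id.startswith(VOICE_CLONE_PREFIX)


@lru_cache(maxsize=32)
def load_voice_sample(url: str) -> bytes:
    """Placeholder download; lru_cache keeps each URL's bytes in memory after the first call."""
    # A real implementation would fetch the URL with an HTTP client here.
    return f"sample-bytes-for:{url}".encode()


def synthesize_blocking(text: str, voice_id: str, sample_url: Optional[str]) -> bytes:
    """Placeholder for a blocking clone or standard TTS call."""
    if is_voice_clone_id(voice_id) and sample_url:
        sample = load_voice_sample(sample_url)  # cached after the first scene
        return b"clone:" + sample[:8] + b":" + text.encode()
    return b"standard:" + text.encode()


async def generate_scene_audio(
    text: str, voice_id: str, sample_url: Optional[str] = None
) -> bytes:
    """Run the blocking synthesis in the thread pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, synthesize_blocking, text, voice_id, sample_url)
```

A FastAPI handler would `await generate_scene_audio(...)` once per scene, or fan scenes out with `asyncio.gather`. Note that `lru_cache` does not deduplicate in-flight calls, so two scenes requesting the same uncached sample at the same moment can both miss and download it once each; later scenes then hit the cache.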
2026-04-21 19:38:50 +05:30
parent 7637babd7d
commit 91b2f996fd
33 changed files with 1642 additions and 457 deletions
--- a/backend/api/podcast/models.py
+++ b/backend/api/podcast/models.py
@@ -223,6 +223,9 @@ class PodcastAudioRequest(BaseModel):
    text: str
    voice_id: Optional[str] = "Wise_Woman"
    custom_voice_id: Optional[str] = None  # Voice clone ID for custom voice
+    use_voice_clone: Optional[bool] = False  # If True, use voice clone with voice_sample_url
+    voice_sample_url: Optional[str] = None  # URL to user's voice sample for cloning
+    voice_clone_engine: Optional[str] = None  # Engine: "qwen3", "minimax", "cosyvoice"
    speed: Optional[float] = 1.0
    volume: Optional[float] = 1.0
    pitch: Optional[float] = 0.0
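The three new `PodcastAudioRequest` fields, combined with the ContentAsset fallback described in the commit message, suggest a resolution step like the one below. This is a sketch under stated assumptions: `AudioRequestSketch` is a plain dataclass standing in for the Pydantic model so the example runs without dependencies, `resolve_clone_params` and the `lookup_sample_url` callback (in place of the real ContentAsset query) are hypothetical, and defaulting to `"qwen3"` when no engine is given is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Engines listed in the diff's field comment.
SUPPORTED_CLONE_ENGINES = {"qwen3", "minimax", "cosyvoice"}


@dataclass
class AudioRequestSketch:
    """Dataclass stand-in mirroring PodcastAudioRequest's voice fields (not the real model)."""
    text: str
    voice_id: Optional[str] = "Wise_Woman"
    custom_voice_id: Optional[str] = None
    use_voice_clone: Optional[bool] = False
    voice_sample_url: Optional[str] = None
    voice_clone_engine: Optional[str] = None


def resolve_clone_params(
    req: AudioRequestSketch,
    lookup_sample_url: Callable[[], Optional[str]],
    default_engine: str = "qwen3",  # hypothetical default
) -> Optional[Tuple[str, str]]:
    """Return (engine, sample_url) when voice cloning applies, else None."""
    if not req.use_voice_clone:
        return None
    engine = req.voice_clone_engine or default_engine
    if engine not in SUPPORTED_CLONE_ENGINES:
        raise ValueError(f"unsupported voice clone engine: {engine!r}")
    # Prefer the URL sent by the client; otherwise fall back to the stored asset.
    sample_url = req.voice_sample_url or lookup_sample_url()
    if not sample_url:
        raise ValueError("voice clone requested but no voice sample URL is available")
    return engine, sample_url
```

Passing the lookup as a callback keeps the database query lazy, so it only runs when the client omitted `voice_sample_url`.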