- Fix text selection menu not showing: wire contentRef via inputRef on multiline TextField
- Fix blog title not truncating: add min-w-0 for flex item overflow
- Fix outline generation 500: escape curly braces in f-string prompt template
- Fix content generation 'NoneType not callable': replace SessionLocal() with get_session_for_user(), add db param to MediumBlogGenerator, fix signature mismatch in database_task_manager
- Fix writing assistant suggest 500: add auth + user_id to API endpoint and service, replace sync requests with httpx.AsyncClient
- Fix hallucination detector 404: explicitly include router in main.py and app.py
- Fix missing error_data in task failure responses
- Hide CopilotKit web inspector button
- Remove hardcoded fallback suggestions from SmartTypingAssist
- Fix stale closure refs in SmartTypingAssist handleTypingChange
- Add two-column editor layout, stats bar, section hover menu
- Various subscription, billing, and research module improvements
- Move voice clone cache from module-level memory to localStorage
so it survives page refresh and works across browser tabs
- VoiceAvatarPlaceholder now syncs clone result to localStorage
immediately after creation (both design and clone paths)
- usePodcastProjectState auto-merges voice clone cache into project
knobs when loading a project (fills gap for projects created
before voice clone or when voice clone was created after)
- CreateModal now detects voice clone IDs by prefix (vc_*) not
just by VOICE_CLONE_ID constant, fixing the mismatch where
VoiceSelector passes the actual clone ID but CreateModal
expected the placeholder ID
- AudioRegenerateModal is intentionally per-scene override and
does not write back to knobs (by design)
- trends.py handler added for podcast topic trend analysis
Frontend Changes:
- Add scene numbering badge (1/N) next to scene titles
- Add inline status chips (Complete, Audio, Image, Voice, Why Script)
- Professional AI-like gradient styling for all chips with shadows
- Remove Script Editor header and 'Why This Script Format?' collapsible
- Move Voice and Why Script info to per-scene chips
- Make scene section mobile-responsive (responsive layout, button sizing)
- Rename 'B-Roll Charts' to 'Podcast Charts' with accordion (collapsed by default)
- Add sceneIndex prop to SceneEditor for scene numbering
- Enhanced accessibility with keyboard navigation and focus states
Backend Changes:
- Audio handler improvements
- B-roll handler enhancements
- Script handler updates
- B-roll composer and service improvements
- Removed temporary broll_temp files
Technical:
- Full mobile responsiveness for scene cards
- Gradient chip styling: vibrant colors with white text and shadows
- Non-breaking approval/generation flow preserved
- TypeScript compatibility maintained
- Fix voice clone preview saved as .wav regardless of actual format (MP3/WebM
content from WaveSpeed was saved with .wav extension causing NotSupportedError)
- Add detect_audio_format() and ensure_audio_extension() to media_utils
- Fix assets_serving.py: use storage_paths for root resolution, add proper
MIME types to FileResponse, add auth via query token for <audio> elements
- Fix assets_serving.py: add path traversal security check
- Fix step4_asset_routes.py: use get_user_workspace() instead of WORKSPACE_DIR,
detect actual audio format before saving preview
- Fix get_db() in database.py: raise HTTPException(401) instead of raw Exception,
catch engine creation failures with HTTPException(503)
- Fix avatar.py: add auth error handling, diagnostic logging for path resolution,
graceful DB save degradation
- Voice clone integration: When user selects voice clone in Write phase,
backend uses their uploaded voice sample + scene script text to generate
audio via qwen3/minimax/cosyvoice voice clone APIs
- Multi-tenant workspace storage: All podcast assets (audio, video, images,
charts) now use workspace-specific directories per user
- Chart preview improvements: Card-based B-Roll charts UI with thumbnails,
takeaway text, and action buttons; public endpoint for image serving
- Voice clone caching: In-memory LRU cache for voice samples (avoids
re-downloading per scene); frontend caches voice clone metadata
- Thread pool for voice clone: Audio generation uses ThreadPoolExecutor to
avoid blocking the FastAPI event loop
- Auto-detect voice clone IDs (vc_*, MY_VOICE_CLONE) to route correctly
- DB fallback for voice sample URL: Fetches from ContentAsset if not passed
- Fixed API URL resolution for chart previews
- Fixed GlassyCard DOM warnings for motion props
- Fixed ScriptGenerationProgressView syntax error
- Fixed usePodcastWorkflow scriptData reference
- Fix database session handling in main_image_editing.py to use proper generator handling
- Add graceful handling of validation errors in podcast-only mode
- Add better error messages when WAVESPEED_API_KEY or HF_TOKEN is missing
- Add specific HTTP 503 error for configuration issues
- Add ALWRITY_SKIP_IMAGE_EDITING_VALIDATION env var to bypass validation in dev
- Add pandas to requirements-podcast.txt for usage tracking
- Fix LLM prompt to return plain strings instead of objects for enhanced_ideas
- Add object-to-string normalization for LLM responses that return objects
- Skip bible generation in podcast-only mode (onboarding disabled)
- Skip alerts polling in AlertsBadge when in podcast-only demo mode
- script.py: set preferred_provider=None to respect GPT_PROVIDER
- research.py: set preferred_provider=None to respect GPT_PROVIDER
- Now all podcast handlers use GPT_PROVIDER env var
- Remove hardcoded preferred_provider=huggingface in podcast handlers
- Set preferred_provider=None to respect GPT_PROVIDER env var
- Change default model from Qwen to gpt-oss-120b:cerebras (the model user had access to)
- WaveSpeed will now use gpt-oss-120b model instead of Qwen
This commit adds the Auto-Dubbing feature for Podcast Maker with support
for translating podcast audio to different languages with optional voice
cloning to preserve the original speaker's voice.
New Features:
- Translation Service (common module): DeepL integration for low-cost
translation, WaveSpeed integration for high-quality translation
- Audio Dubbing Service: STT -> Translate -> TTS pipeline with
voice cloning support
- 9 new API endpoints for dubbing and voice cloning
- Support for 34+ languages
- Cost estimation utilities
- Comprehensive documentation
Files Added:
- services/translation/ (5 files): Translation service module
- services/dubbing/: Audio dubbing service
- api/podcast/handlers/dubbing.py: API endpoints
- docs/AUTO_DUBBING.md: Feature documentation
- CHANGELOG.md: Change log
Files Modified:
- api/podcast/models.py: Added dubbing request/response models
- api/podcast/router.py: Added dubbing routes
- services/__init__.py: Export translation and dubbing services
- scene_animation.py: Fixed missing Path import