- Upgrade utils/storage_paths.py with robust find_repo_root() (env var override + validation + fallback)
- Remove broken _find_root() from podcast/constants.py, import from storage_paths instead
- Fix ROOT_DIR resolving to backend/ instead of project root (caused avatar upload 500s on Render.com)
- Fix video_combination_service.py default output dir (was writing to data/media instead of workspace)
- Add deprecation comments to global data/media constants in media_utils.py
- Pass user_id through resolve_media_path for tenant-scoped podcast resolution
- Add ALWRITY_ROOT_DIR env var support for explicit production overrides
- Log warning when get_podcast_media_dir called without user_id
- Use OperationButton with cost display for scene action buttons
- Voice clone integration: When user selects voice clone in Write phase,
backend uses their uploaded voice sample + scene script text to generate
audio via qwen3/minimax/cosyvoice voice clone APIs
- Multi-tenant workspace storage: All podcast assets (audio, video, images,
charts) now use workspace-specific directories per user
- Chart preview improvements: Card-based B-Roll charts UI with thumbnails,
takeaway text, and action buttons; public endpoint for image serving
- Voice clone caching: In-memory LRU cache for voice samples (avoids
re-downloading per scene); frontend caches voice clone metadata
- Thread pool for voice clone: Audio generation uses ThreadPoolExecutor to
avoid blocking the FastAPI event loop
- Auto-detect voice clone IDs (vc_*, MY_VOICE_CLONE) to route correctly
- DB fallback for voice sample URL: Fetches from ContentAsset if not passed
- Fixed API URL resolution for chart previews
- Fixed GlassyCard DOM warnings for motion props
- Fixed ScriptGenerationProgressView syntax error
- Fixed usePodcastWorkflow scriptData reference
The save_file_safely function may add a UUID suffix to the filename if there's
a collision, but the preview URL was being constructed using the original
filename instead of the actual saved filename. Now uses saved_preview_path.name.
- Remove nltk dependency from step4_assets by inlining _extract_user_id
- Add graceful error handling for step4_assets in podcast-only mode
- Fix media_utils.py to use tenant-aware get_podcast_media_read_dirs()
- Resolves avatar image 404 during scene image generation
- Fix database session handling in main_image_editing.py to use proper generator handling
- Add graceful handling of validation errors in podcast-only mode
- Add better error messages when WAVESPEED_API_KEY or HF_TOKEN is missing
- Add specific HTTP 503 error for configuration issues
- Add ALWRITY_SKIP_IMAGE_EDITING_VALIDATION env var to bypass validation in dev
- Add pandas to requirements-podcast.txt for usage tracking
- Fix LLM prompt to return plain strings instead of objects for enhanced_ideas
- Add object-to-string normalization for LLM responses that return objects
- Skip bible generation in podcast-only mode (onboarding disabled)
- Skip alerts polling in AlertsBadge when in podcast-only demo mode
- script.py: set preferred_provider=None to respect GPT_PROVIDER
- research.py: set preferred_provider=None to respect GPT_PROVIDER
- Now all podcast handlers use GPT_PROVIDER env var
- Remove hardcoded preferred_provider=huggingface in podcast handlers
- Set preferred_provider=None to respect GPT_PROVIDER env var
- Change default model from Qwen to gpt-oss-120b:cerebras (the model user had access to)
- WaveSpeed will now use gpt-oss-120b model instead of Qwen
- Add dedicated image_generation module with statistical extraction
- Support 16 industry domains with visual concept detection
- Add model-specific guidance for Ideogram, FLUX, GLM, Qwen, MAI
- Extract statistics, rankings, comparisons, and trends automatically
- Refactor backend/api/images.py to use new module
This commit adds the Auto-Dubbing feature for Podcast Maker with support
for translating podcast audio to different languages with optional voice
cloning to preserve the original speaker's voice.
New Features:
- Translation Service (common module): DeepL integration for low-cost
translation, WaveSpeed integration for high-quality translation
- Audio Dubbing Service: STT -> Translate -> TTS pipeline with
voice cloning support
- 9 new API endpoints for dubbing and voice cloning
- Support for 34+ languages
- Cost estimation utilities
- Comprehensive documentation
Files Added:
- services/translation/ (5 files): Translation service module
- services/dubbing/: Audio dubbing service
- api/podcast/handlers/dubbing.py: API endpoints
- docs/AUTO_DUBBING.md: Feature documentation
- CHANGELOG.md: Change log
Files Modified:
- api/podcast/models.py: Added dubbing request/response models
- api/podcast/router.py: Added dubbing routes
- services/__init__.py: Export translation and dubbing services
- scene_animation.py: Fixed missing Path import
- Add get_current_user authentication to all user data endpoints
- Pass authenticated user_id from auth context to service methods
- Add proper HTTPException handling for missing data
- Fix user_id type from int to str in service methods
- Ensure endpoints only return data for authenticated user