- Add dedicated image_generation module with statistical extraction
- Support 16 industry domains with visual concept detection
- Add model-specific guidance for Ideogram, FLUX, GLM, Qwen, MAI
- Extract statistics, rankings, comparisons, and trends automatically
- Refactor backend/api/images.py to use new module
This commit adds the Auto-Dubbing feature for Podcast Maker with support
for translating podcast audio to different languages with optional voice
cloning to preserve the original speaker's voice.
New Features:
- Translation Service (common module): DeepL integration for low-cost
translation, WaveSpeed integration for high-quality translation
- Audio Dubbing Service: STT -> Translate -> TTS pipeline with
voice cloning support
- 9 new API endpoints for dubbing and voice cloning
- Support for 34+ languages
- Cost estimation utilities
- Comprehensive documentation
Files Added:
- services/translation/ (5 files): Translation service module
- services/dubbing/: Audio dubbing service
- api/podcast/handlers/dubbing.py: API endpoints
- docs/AUTO_DUBBING.md: Feature documentation
- CHANGELOG.md: Change log
Files Modified:
- api/podcast/models.py: Added dubbing request/response models
- api/podcast/router.py: Added dubbing routes
- services/__init__.py: Export translation and dubbing services
- scene_animation.py: Fixed missing Path import
- Add services/startup_health.py with health check functions:
- get_startup_status(): Returns current startup status
- readiness_under_auth_context(): Validates tenant DB under auth context
- run_startup_health_routine(): Runs all startup health checks
- Add /health/readiness endpoint for tenant DB validation
- Update startup_event() to use run_startup_health_routine()
- Add raise to startup_event to fail fast on errors
- Import APIKeyManager for provider key checking
- Use APIKeyManager.get_api_key() instead of get_api_key() function
- Add wavespeed provider to available_providers check
- Add detailed provider preflight logging with flow_type tag
- Improve fallback logic when preferred provider is unavailable
These improvements come from PRs #423-#431 while maintaining the modular textgen_utils structure.
huggingface_provider.py:
- Add retry logic with _should_retry_hf_error and _is_non_retryable_hf_error
- Update default models from :groq to :cerebras (HF_FALLBACK_MODELS)
- Add fallback_models parameter to huggingface_text_response
- Add get_available_models with updated model list
main_text_generation.py:
- Add GPT_PROVIDER and TEXTGEN_AI_MODELS env var support
- Add preferred_provider and flow_type parameters to llm_text_gen
- Add HF_MODEL_MAPPING for short model name resolution
- Add flow_type logging tag for better observability
sif_agents.py:
- Add LOW_COST_SHARED_REMOTE_MODELS for SIF agents
- Update SharedLLMWrapper to use preferred_hf_models and flow_type
These changes preserve the modular textgen_utils structure while incorporating
the useful routing and retry logic improvements from the pending PRs.
- Add get_current_user authentication to all user data endpoints
- Pass authenticated user_id from auth context to service methods
- Add proper HTTPException handling for missing data
- Fix user_id type from int to str in service methods
- Ensure endpoints only return data for authenticated user