ajaysi f503a24b3b feat: Add Auto-Dubbing feature for Podcast Maker

This commit adds the Auto-Dubbing feature for Podcast Maker with support
for translating podcast audio to different languages with optional voice
cloning to preserve the original speaker's voice.

New Features:
- Translation Service (common module): DeepL integration for low-cost
  translation, WaveSpeed integration for high-quality translation
- Audio Dubbing Service: STT -> Translate -> TTS pipeline with
  voice cloning support
- 9 new API endpoints for dubbing and voice cloning
- Support for 34+ languages
- Cost estimation utilities
- Comprehensive documentation

Files Added:
- services/translation/ (5 files): Translation service module
- services/dubbing/: Audio dubbing service
- api/podcast/handlers/dubbing.py: API endpoints
- docs/AUTO_DUBBING.md: Feature documentation
- CHANGELOG.md: Change log

Files Modified:
- api/podcast/models.py: Added dubbing request/response models
- api/podcast/router.py: Added dubbing routes
- services/__init__.py: Export translation and dubbing services
- scene_animation.py: Fixed missing Path import

2026-03-24 15:45:51 +05:30

Auto-Dubbing Feature Documentation

Overview

Auto-Dubbing enables automatic translation of podcast audio to different languages with optional voice cloning to preserve the original speaker's voice.

Features

- Text Translation: Translate audio transcripts using DeepL (low-cost) or WaveSpeed (high-quality)
- Voice Cloning: Preserve the original speaker's voice in dubbed audio
- Multiple Quality Tiers: Choose between low-cost (DeepL) and high-quality (WaveSpeed) translation
- Cost Estimation: Preview costs before starting dubbing tasks
- Progress Tracking: Real-time progress updates for long-running tasks

Architecture

backend/
├── services/
│   ├── translation/          # Common translation service
│   │   ├── __init__.py
│   │   ├── base_translation.py
│   │   ├── deepl_translator.py
│   │   ├── wavespeed_translator.py
│   │   └── translation_factory.py
│   │
│   └── dubbing/             # Audio dubbing service
│       └── __init__.py      # AudioDubbingService
│
└── api/podcast/
    ├── handlers/
    │   └── dubbing.py   # API endpoints
    └── models.py        # Request/response models

Quick Start

1. Configure Environment

Add your DeepL API key to .env:

# backend/.env
DEEPL_API_KEY=your-deepl-api-key-here

Get a free DeepL API key at: https://www.deepl.com/pro-api

2. Basic Audio Dubbing

from services.dubbing import AudioDubbingService

service = AudioDubbingService()
result = service.dub_audio(
    source_audio="/path/to/audio.mp3",
    target_language="Spanish",
    quality="low",  # or "high"
)

3. High-Quality Dubbing with Voice Clone

result = service.dub_audio(
    source_audio="/path/to/audio.mp3",
    target_language="French",
    quality="high",
    use_voice_clone=True,  # Preserve original voice
    custom_voice_id="my_podcast_voice",
    accuracy=0.8,  # 0.1-1.0
)

API Endpoints

Create Dubbing Task

POST /api/podcast/dub/audio

Request:

{
    "source_audio_url": "https://example.com/audio.mp3",
    "target_language": "Spanish",
    "quality": "low",
    "voice_id": "Wise_Woman",
    "speed": 1.0,
    "use_voice_clone": false
}

Response:

{
    "task_id": "abc123",
    "status": "pending",
    "message": "Audio dubbing task created"
}
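
As a rough client-side sketch, the request above could be assembled with the standard library as follows. The base URL and the helper name `build_dub_request` are assumptions made for illustration and are not part of the service; the field names are taken from the request body shown above.

```python
import json
import urllib.request

# Assumption: the API is served from this base URL; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_dub_request(source_audio_url, target_language, quality="low",
                      voice_id="Wise_Woman", speed=1.0, use_voice_clone=False):
    """Build (but do not send) a POST request for /api/podcast/dub/audio."""
    if quality not in ("low", "high"):
        raise ValueError("quality must be 'low' or 'high'")
    payload = {
        "source_audio_url": source_audio_url,
        "target_language": target_language,
        "quality": quality,
        "voice_id": voice_id,
        "speed": speed,
        "use_voice_clone": use_voice_clone,
    }
    return urllib.request.Request(
        BASE_URL + "/api/podcast/dub/audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_dub_request("https://example.com/audio.mp3", "Spanish")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)  # expected to contain "task_id" and "status"
```

Building the request separately from sending it keeps the payload easy to inspect and test. Your deployment may also require authentication headers, which this sketch does not cover.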

Get Dubbing Result

GET /api/podcast/dub/{task_id}/result

Response (completed):

{
    "task_id": "abc123",
    "status": "completed",
    "dubbed_audio_url": "/api/podcast/dub/audio/dubbed_xyz123.mp3",
    "original_transcript": "Hello, welcome to my podcast...",
    "translated_transcript": "Hola, bienvenidos a mi podcast...",
    "source_language": "en",
    "target_language": "Spanish",
    "voice_id": "Wise_Woman",
    "quality": "low",
    "voice_clone_used": false,
    "cost": 0.05,
    "file_size": 45000
}
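
A client may want to read this response into a typed object. The sketch below is one way to do that; the `DubbingResult` class and `parse_result` helper are hypothetical, and the `error` field is an assumption about what a failed task returns, since only the completed shape is documented here.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DubbingResult:
    task_id: str
    status: str
    dubbed_audio_url: Optional[str] = None
    translated_transcript: Optional[str] = None
    cost: Optional[float] = None
    error: Optional[str] = None  # assumption: present when status is "failed"


def parse_result(body: str) -> DubbingResult:
    """Parse a result response, ignoring fields this client does not model."""
    data = json.loads(body)
    known = {f.name for f in fields(DubbingResult)}
    return DubbingResult(**{k: v for k, v in data.items() if k in known})


sample = '{"task_id": "abc123", "status": "completed", "cost": 0.05, "file_size": 45000}'
result = parse_result(sample)
print(result.status, result.cost)
```

Dropping unknown keys keeps the client working if the server adds fields later.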

Clone Voice

POST /api/podcast/dub/voices/clone

Request:

{
    "source_audio_url": "https://example.com/voice_sample.mp3",
    "custom_voice_id": "podcast_voice_1",
    "accuracy": 0.7,
    "language_boost": "Spanish"
}

Response:

{
    "task_id": "clone123",
    "status": "pending",
    "message": "Voice cloning task created"
}
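
Validating the clone request before sending it can save a round trip. The sketch below assumes the 0.1-1.0 accuracy range from the Quick Start comment also applies to this endpoint, and that `language_boost` is optional; `build_clone_payload` is a hypothetical helper, not part of the service.

```python
def build_clone_payload(source_audio_url, custom_voice_id,
                        accuracy=0.7, language_boost=None):
    """Return a request body for POST /api/podcast/dub/voices/clone."""
    # Assumption: the server accepts the same 0.1-1.0 range noted in Quick Start.
    if not 0.1 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.1 and 1.0")
    payload = {
        "source_audio_url": source_audio_url,
        "custom_voice_id": custom_voice_id,
        "accuracy": accuracy,
    }
    # Assumption: language_boost may be omitted when no boost is wanted.
    if language_boost is not None:
        payload["language_boost"] = language_boost
    return payload


payload = build_clone_payload(
    "https://example.com/voice_sample.mp3", "podcast_voice_1",
    accuracy=0.7, language_boost="Spanish",
)
```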

Estimate Cost

POST /api/podcast/dub/estimate

Request:

{
    "audio_duration_seconds": 60,
    "target_language": "Spanish",
    "quality": "low",
    "use_voice_clone": false
}

Response:

{
    "estimated_characters": 900,
    "translation_cost": 0.009,
    "tts_cost": 0.9,
    "voice_clone_cost": 0.0,
    "total_cost": 0.909,
    "currency": "USD"
}
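
The figures in this response can be reproduced from the per-character rates listed under Cost Structure. The sketch below is not the service's implementation: the 15 characters-per-second rate is inferred from this example (900 characters for 60 seconds), and `estimate_cost` is a hypothetical name.

```python
# Per-unit rates copied from the Cost Structure table in this document.
TRANSLATION_RATE = {"low": 0.00001, "high": 0.0001}  # USD per character
TTS_RATE = 0.001         # USD per character
VOICE_CLONE_FEE = 0.05   # USD per cloned voice
CHARS_PER_SECOND = 15    # assumption inferred from 900 chars / 60 s


def estimate_cost(audio_duration_seconds, quality="low", use_voice_clone=False):
    """Estimate dubbing cost from audio duration and quality tier."""
    chars = int(audio_duration_seconds * CHARS_PER_SECOND)
    translation = chars * TRANSLATION_RATE[quality]
    tts = chars * TTS_RATE
    clone = VOICE_CLONE_FEE if use_voice_clone else 0.0
    return {
        "estimated_characters": chars,
        "translation_cost": round(translation, 6),
        "tts_cost": round(tts, 6),
        "voice_clone_cost": clone,
        "total_cost": round(translation + tts + clone, 6),
        "currency": "USD",
    }


estimate = estimate_cost(60, quality="low")
print(estimate["total_cost"])  # 0.909
```

TTS dominates the total at these rates, so the choice of translation tier changes the estimate far less than the audio length does.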

Get Supported Languages

GET /api/podcast/dub/languages

Response:

{
    "languages": [
        {"code": "es", "name": "Spanish"},
        {"code": "fr", "name": "French"},
        {"code": "de", "name": "German"},
        ...
    ],
    "count": 34
}
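
Requests in this document pass the language by name ("Spanish"), while this endpoint also returns codes. A small lookup over the response, sketched below with a hypothetical `language_code` helper, lets a client convert between the two.

```python
def language_code(name, languages):
    """Return the code for a language name (case-insensitive), or None."""
    lookup = {lang["name"].lower(): lang["code"] for lang in languages}
    return lookup.get(name.lower())


# A subset of the response shown above.
languages = [
    {"code": "es", "name": "Spanish"},
    {"code": "fr", "name": "French"},
    {"code": "de", "name": "German"},
]
print(language_code("french", languages))  # fr
```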

Get Available Voices

GET /api/podcast/dub/voices

Response:

{
    "voices": [
        {"id": "Wise_Woman", "name": "Wise Woman", "gender": "female"},
        {"id": "Warm_Man", "name": "Warm Man", "gender": "male"},
        ...
    ],
    "count": 10
}

Translation Pipeline

Low Quality (DeepL)

Source Audio → Download → STT (Gemini) → Translate (DeepL) → TTS (WaveSpeed) → Dubbed Audio

High Quality (WaveSpeed + Voice Clone)

Source Audio → Voice Clone → Download → STT → Translate (WaveSpeed) → TTS (cloned voice) → Dubbed Audio
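
Both pipelines share the same STT → Translate → TTS core and differ in which provider fills each stage. The sketch below illustrates that idea with injectable stage functions; it is not the `AudioDubbingService` implementation, and `DubbingStages`, `run_pipeline`, and the stub stages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingStages:
    transcribe: Callable[[bytes], str]       # STT: audio -> transcript
    translate: Callable[[str, str], str]     # (text, target_language) -> text
    synthesize: Callable[[str, str], bytes]  # TTS: (text, voice_id) -> audio


def run_pipeline(audio, target_language, voice_id, stages):
    """Run STT, translation, and TTS in order and collect the outputs."""
    transcript = stages.transcribe(audio)
    translated = stages.translate(transcript, target_language)
    dubbed = stages.synthesize(translated, voice_id)
    return {
        "original_transcript": transcript,
        "translated_transcript": translated,
        "dubbed_audio": dubbed,
    }


# Stub stages so the sketch runs without any external service.
stub = DubbingStages(
    transcribe=lambda audio: "Hello",
    translate=lambda text, lang: f"[{lang}] {text}",
    synthesize=lambda text, voice: text.encode("utf-8"),
)
result = run_pipeline(b"fake-audio", "Spanish", "Wise_Woman", stub)
print(result["translated_transcript"])  # [Spanish] Hello
```

Swapping the `translate` stage (DeepL or WaveSpeed) or the voice passed to `synthesize` (stock or cloned) is then enough to switch between the two tiers.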

Cost Structure

| Component   | Low Quality   | High Quality |
|-------------|---------------|--------------|
| Translation | $0.00001/char | $0.0001/char |
| TTS         | $0.001/char   | $0.001/char  |
| Voice Clone | N/A           | $0.05/voice  |

Example: 60-second audio (~900 chars)

Low quality: ~$0.91 ($0.009 translation + $0.90 TTS)
High quality with voice clone: ~$1.04 ($0.09 translation + $0.90 TTS + $0.05 voice clone)

Common Module Usage

The translation service can be used anywhere in the application:

from services.translation import (
    TranslationQuality,
    translate_batch,
    translate_text,
)

# Simple translation
result = translate_text(
    text="Hello world",
    target_language="Spanish",
    quality=TranslationQuality.LOW
)
print(result.translated_text)  # "Hola mundo"

# Batch translation
results = translate_batch(
    texts=["Hello", "Goodbye"],
    target_language="French",
    quality=TranslationQuality.LOW
)

Error Handling

The dubbing endpoints report errors with standard HTTP status codes:

400 Bad Request: Invalid parameters
404 Not Found: Task or file not found
500 Internal Server Error: Dubbing failed (check task error message)
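
On the client side, these status codes can be mapped to exceptions so that callers handle each case explicitly. The exception classes and the `raise_for_status` helper below are hypothetical and only illustrate one way to structure this.

```python
class DubbingAPIError(Exception):
    """Base class for errors returned by the dubbing endpoints."""


class InvalidRequest(DubbingAPIError):
    """400: invalid parameters."""


class NotFound(DubbingAPIError):
    """404: task or file not found."""


class DubbingFailed(DubbingAPIError):
    """500: dubbing failed; check the task error message."""


_ERRORS = {400: InvalidRequest, 404: NotFound, 500: DubbingFailed}


def raise_for_status(status_code, detail=""):
    """Raise the matching exception for an error status; return on success."""
    if status_code < 400:
        return
    exc_type = _ERRORS.get(status_code, DubbingAPIError)
    raise exc_type(f"{status_code}: {detail}")


try:
    raise_for_status(404, "Task not found")
except NotFound as exc:
    print(f"Handled: {exc}")
```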

Background Tasks

Dubbing tasks run in the background. Poll the result endpoint:

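import time

# get_dubbing_result is a placeholder for your own client helper that calls
# GET /api/podcast/dub/{task_id}/result and returns the parsed response.
deadline = time.monotonic() + 600  # stop polling after 10 minutes
while time.monotonic() < deadline:
    result = get_dubbing_result(task_id)
    if result.status == "completed":
        print(f"Dubbed audio: {result.dubbed_audio_url}")
        break
    elif result.status == "failed":
        print(f"Failed: {result.error}")
        break
    time.sleep(2)  # wait before polling again
else:
    print("Timed out waiting for the dubbing task")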

Environment Variables

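| Variable            | Description                                                | Required              |
|---------------------|------------------------------------------------------------|-----------------------|
| `DEEPL_API_KEY`     | DeepL API key for low-cost (`quality="low"`) translation   | Yes (for low quality) |
| `DEEPL_USE_PRO`     | Use DeepL Pro API                                          | No                    |
| `WAVESPEED_API_KEY` | WaveSpeed API key (already configured)                     | Yes                   |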
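
A startup check can catch missing keys before a dubbing task fails partway through. The sketch below follows the requirements listed above (WaveSpeed always, DeepL only for low quality); `missing_env_vars` is a hypothetical helper, not part of the service.

```python
import os


def missing_env_vars(quality="low", env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ["WAVESPEED_API_KEY"]
    if quality == "low":
        required.append("DEEPL_API_KEY")  # only the low tier uses DeepL
    return [name for name in required if not env.get(name)]


missing = missing_env_vars("low", env={"WAVESPEED_API_KEY": "example-key"})
print(missing)  # ['DEEPL_API_KEY']
```

Passing `env` explicitly makes the check easy to test; in normal use, call it without arguments to read the process environment.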

Supported Languages

DeepL supports 34 languages including:

English, Spanish, French, German, Italian, Portuguese
Japanese, Chinese, Korean, Arabic, Hindi
Russian, Dutch, Polish, Turkish, Vietnamese
And more...

See full list via: GET /api/podcast/dub/languages
