ALwrity/backend/docs/AUTO_DUBBING.md
ajaysi f503a24b3b feat: Add Auto-Dubbing feature for Podcast Maker
This commit adds the Auto-Dubbing feature for Podcast Maker with support
for translating podcast audio to different languages with optional voice
cloning to preserve the original speaker's voice.

New Features:
- Translation Service (common module): DeepL integration for low-cost
  translation, WaveSpeed integration for high-quality translation
- Audio Dubbing Service: STT -> Translate -> TTS pipeline with
  voice cloning support
- 9 new API endpoints for dubbing and voice cloning
- Support for 34+ languages
- Cost estimation utilities
- Comprehensive documentation

Files Added:
- services/translation/ (5 files): Translation service module
- services/dubbing/: Audio dubbing service
- api/podcast/handlers/dubbing.py: API endpoints
- docs/AUTO_DUBBING.md: Feature documentation
- CHANGELOG.md: Change log

Files Modified:
- api/podcast/models.py: Added dubbing request/response models
- api/podcast/router.py: Added dubbing routes
- services/__init__.py: Export translation and dubbing services
- scene_animation.py: Fixed missing Path import
2026-03-24 15:45:51 +05:30

Auto-Dubbing Feature Documentation

Overview

Auto-Dubbing enables automatic translation of podcast audio to different languages with optional voice cloning to preserve the original speaker's voice.

Features

  • Text Translation: Translate audio transcripts using DeepL (low-cost) or WaveSpeed (high-quality)
  • Voice Cloning: Preserve original speaker's voice in dubbed audio
  • Multiple Quality Tiers: Choose between low-cost (DeepL) and high-quality (WaveSpeed) translation
  • Cost Estimation: Preview costs before starting dubbing tasks
  • Progress Tracking: Real-time progress updates for long-running tasks

Architecture

backend/
├── services/
│   ├── translation/          # Common translation service
│   │   ├── __init__.py
│   │   ├── base_translation.py
│   │   ├── deepl_translator.py
│   │   ├── wavespeed_translator.py
│   │   └── translation_factory.py
│   │
│   └── dubbing/              # Audio dubbing service
│       └── __init__.py       # AudioDubbingService
│
└── api/podcast/
    ├── handlers/
    │   └── dubbing.py        # API endpoints
    └── models.py             # Request/response models
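
The file names in the translation module suggest a shared base class, one implementation per provider, and a factory that picks between them. The following is a minimal sketch of that layout, not the module's actual code: `Quality`, `BaseTranslator`, the two stub classes, and `get_translator` are all hypothetical names, and the stubs return tagged strings instead of calling DeepL or WaveSpeed.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Quality(Enum):
    """Hypothetical stand-in for the module's quality tiers."""
    LOW = "low"    # DeepL tier in the real service
    HIGH = "high"  # WaveSpeed tier in the real service


class BaseTranslator(ABC):
    """Hypothetical common interface that each provider implements."""

    @abstractmethod
    def translate(self, text: str, target_language: str) -> str:
        ...


class StubLowCostTranslator(BaseTranslator):
    # Placeholder: a real implementation would call the DeepL API here.
    def translate(self, text: str, target_language: str) -> str:
        return f"[low:{target_language}] {text}"


class StubHighQualityTranslator(BaseTranslator):
    # Placeholder: a real implementation would call the WaveSpeed API here.
    def translate(self, text: str, target_language: str) -> str:
        return f"[high:{target_language}] {text}"


# The factory maps each tier to the class that serves it.
_REGISTRY = {
    Quality.LOW: StubLowCostTranslator,
    Quality.HIGH: StubHighQualityTranslator,
}


def get_translator(quality: Quality) -> BaseTranslator:
    """Return a translator instance for the requested tier."""
    return _REGISTRY[quality]()


translator = get_translator(Quality.LOW)
print(translator.translate("Hello world", "Spanish"))
```

Keeping provider selection in one factory means callers only choose a tier, which would let a provider be swapped without touching the dubbing code.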

Quick Start

1. Configure Environment

Add your DeepL API key to .env:

# backend/.env
DEEPL_API_KEY=your-deepl-api-key-here

Get a free DeepL API key at: https://www.deepl.com/pro-api
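
A missing key is easier to diagnose at startup than halfway through a dubbing task. The sketch below checks for the variable before any work begins. It assumes the application has already loaded `backend/.env` into the process environment, and `require_env` is a hypothetical helper, not part of the codebase.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to backend/.env")
    return value
```

Calling `require_env("DEEPL_API_KEY")` once during startup makes a misconfigured environment fail fast with a readable message.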

2. Basic Audio Dubbing

from services.dubbing import AudioDubbingService

service = AudioDubbingService()
result = service.dub_audio(
    source_audio="/path/to/audio.mp3",
    target_language="Spanish",
    quality="low",  # or "high"
)

3. High-Quality Dubbing with Voice Clone

# Reuses the `service` instance created in step 2
result = service.dub_audio(
    source_audio="/path/to/audio.mp3",
    target_language="French",
    quality="high",
    use_voice_clone=True,  # Preserve original voice
    custom_voice_id="my_podcast_voice",
    accuracy=0.8,  # valid range: 0.1-1.0
)

API Endpoints

Create Dubbing Task

POST /api/podcast/dub/audio

Request:

{
    "source_audio_url": "https://example.com/audio.mp3",
    "target_language": "Spanish",
    "quality": "low",
    "voice_id": "Wise_Woman",
    "speed": 1.0,
    "use_voice_clone": false
}

Response:

{
    "task_id": "abc123",
    "status": "pending",
    "message": "Audio dubbing task created"
}
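
As an illustration, the request above could be sent with the Python standard library alone. This is a sketch under assumptions: `BASE_URL`, `build_dub_request`, and `create_dub_task` are hypothetical, and any authentication headers your deployment requires are not shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def build_dub_request(source_audio_url, target_language, quality="low",
                      voice_id="Wise_Woman", speed=1.0, use_voice_clone=False):
    """Build a urllib Request for POST /api/podcast/dub/audio."""
    if quality not in ("low", "high"):
        raise ValueError("quality must be 'low' or 'high'")
    payload = {
        "source_audio_url": source_audio_url,
        "target_language": target_language,
        "quality": quality,
        "voice_id": voice_id,
        "speed": speed,
        "use_voice_clone": use_voice_clone,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/podcast/dub/audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_dub_task(**kwargs):
    """Send the request and return the parsed JSON response."""
    request = build_dub_request(**kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Building the request separately from sending it lets the payload be inspected or tested without a running server.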

Get Dubbing Result

GET /api/podcast/dub/{task_id}/result

Response (completed):

{
    "task_id": "abc123",
    "status": "completed",
    "dubbed_audio_url": "/api/podcast/dub/audio/dubbed_xyz123.mp3",
    "original_transcript": "Hello, welcome to my podcast...",
    "translated_transcript": "Hola, bienvenidos a mi podcast...",
    "source_language": "en",
    "target_language": "Spanish",
    "voice_id": "Wise_Woman",
    "quality": "low",
    "voice_clone_used": false,
    "cost": 0.05,
    "file_size": 45000
}

Clone Voice

POST /api/podcast/dub/voices/clone

Request:

{
    "source_audio_url": "https://example.com/voice_sample.mp3",
    "custom_voice_id": "podcast_voice_1",
    "accuracy": 0.7,
    "language_boost": "Spanish"
}

Response:

{
    "task_id": "clone123",
    "status": "pending",
    "message": "Voice cloning task created"
}

Estimate Cost

POST /api/podcast/dub/estimate

Request:

{
    "audio_duration_seconds": 60,
    "target_language": "Spanish",
    "quality": "low",
    "use_voice_clone": false
}

Response:

{
    "estimated_characters": 900,
    "translation_cost": 0.009,
    "tts_cost": 0.9,
    "voice_clone_cost": 0.0,
    "total_cost": 0.909,
    "currency": "USD"
}

Get Supported Languages

GET /api/podcast/dub/languages

Response:

{
    "languages": [
        {"code": "es", "name": "Spanish"},
        {"code": "fr", "name": "French"},
        {"code": "de", "name": "German"},
        ...
    ],
    "count": 34
}
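
A client may want to check that a language is supported before creating a task. The sketch below looks up a code by name in a response of the shape shown above. `find_language_code` is a hypothetical helper, and `sample` holds only the three entries visible in the example, not the full list.

```python
from typing import Optional


def find_language_code(response: dict, name: str) -> Optional[str]:
    """Return the code for a language name, or None if it is not listed."""
    wanted = name.strip().lower()
    for entry in response.get("languages", []):
        if entry.get("name", "").lower() == wanted:
            return entry.get("code")
    return None


# Truncated sample built from the entries shown in the example response.
sample = {
    "languages": [
        {"code": "es", "name": "Spanish"},
        {"code": "fr", "name": "French"},
        {"code": "de", "name": "German"},
    ]
}

print(find_language_code(sample, "french"))
```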

Get Available Voices

GET /api/podcast/dub/voices

Response:

{
    "voices": [
        {"id": "Wise_Woman", "name": "Wise Woman", "gender": "female"},
        {"id": "Warm_Man", "name": "Warm Man", "gender": "male"},
        ...
    ],
    "count": 10
}

Translation Pipeline

Low Quality (DeepL)

Source Audio → Download → STT (Gemini) → Translate (DeepL) → TTS (WaveSpeed) → Dubbed Audio

High Quality (WaveSpeed + Voice Clone)

Source Audio → Voice Clone → Download → STT → Translate (WaveSpeed) → TTS (cloned voice) → Dubbed Audio
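
The two flows differ only in their stage lists, which can be modeled as ordered stage names run over a shared job dictionary. This is an illustrative sketch, not the service's implementation: the stage names, `run_pipeline`, and the stub stages are hypothetical.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

# Stage order taken from the two flows described above.
LOW_QUALITY_STAGES = ["download", "stt", "translate", "tts"]
HIGH_QUALITY_STAGES = ["voice_clone", "download", "stt", "translate", "tts"]


def run_pipeline(stage_names: List[str], stages: Dict[str, Stage], job: Dict) -> Dict:
    """Run the named stages in order, threading the job dict through each."""
    for name in stage_names:
        job = stages[name](job)
        # Record which stages have finished, without mutating earlier dicts.
        job = {**job, "completed": job.get("completed", []) + [name]}
    return job


def make_stub(name: str) -> Stage:
    """Create a placeholder stage that only marks itself as done."""
    def stage(job: Dict) -> Dict:
        return {**job, name: "done"}
    return stage


stubs = {name: make_stub(name) for name in HIGH_QUALITY_STAGES}
result = run_pipeline(LOW_QUALITY_STAGES, stubs, {"source": "/path/to/audio.mp3"})
print(result["completed"])
```

Recording completed stages in the job dictionary also gives a natural hook for progress reporting.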

Cost Structure

| Component   | Low Quality   | High Quality |
|-------------|---------------|--------------|
| Translation | $0.00001/char | $0.0001/char |
| TTS         | $0.001/char   | $0.001/char  |
| Voice Clone | N/A           | $0.05/voice  |

Example: 60-second audio (~900 chars)

  • Low quality: ~$0.91 (translation $0.009 + TTS $0.90)
  • High quality with voice clone: ~$1.04 (translation $0.09 + TTS $0.90 + voice clone $0.05)
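
The per-character rates in the table can be turned into a small estimator. The sketch below is not the service's own code: `estimate_cost` is hypothetical, `CHARS_PER_SECOND = 15` is inferred from the ~900 characters per 60 seconds used in the example, and voice cloning is treated as free on the low tier because the table lists it as N/A there. At these rates, 900 characters on the high tier with one cloned voice come to 0.09 + 0.90 + 0.05 = 1.04 USD.

```python
RATES = {
    "low": {"translation": 0.00001, "tts": 0.001},
    "high": {"translation": 0.0001, "tts": 0.001},
}
VOICE_CLONE_COST = 0.05  # USD per cloned voice, high quality only
CHARS_PER_SECOND = 15    # assumption: ~900 characters per 60 s of audio


def estimate_cost(duration_seconds, quality="low", use_voice_clone=False):
    """Estimate dubbing cost in USD from the audio duration and tier."""
    chars = int(duration_seconds * CHARS_PER_SECOND)
    rates = RATES[quality]
    translation = chars * rates["translation"]
    tts = chars * rates["tts"]
    clone = VOICE_CLONE_COST if (use_voice_clone and quality == "high") else 0.0
    return {
        "estimated_characters": chars,
        "translation_cost": round(translation, 6),
        "tts_cost": round(tts, 6),
        "voice_clone_cost": clone,
        "total_cost": round(translation + tts + clone, 6),
        "currency": "USD",
    }


print(estimate_cost(60, "low"))
print(estimate_cost(60, "high", use_voice_clone=True))
```

TTS dominates the total at these rates, so the choice of translation tier changes the cost of a one-minute clip by less than ten cents.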

Common Module Usage

The translation service can be used anywhere in the application:

from services.translation import translate_text, TranslationQuality

# Simple translation
result = translate_text(
    text="Hello world",
    target_language="Spanish",
    quality=TranslationQuality.LOW
)
print(result.translated_text)  # "Hola mundo"

# Batch translation
from services.translation import translate_batch
results = translate_batch(
    texts=["Hello", "Goodbye"],
    target_language="French",
    quality=TranslationQuality.LOW
)

Error Handling

The dubbing endpoints signal errors with standard HTTP status codes:

  • 400 Bad Request: Invalid parameters
  • 404 Not Found: Task or file not found
  • 500 Internal Server Error: Dubbing failed (check task error message)
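
On the client side, these codes can be mapped to actionable messages. The sketch below is one way to do that. `STATUS_HINTS` and `explain_status` are hypothetical names, and only the three codes listed above are covered.

```python
STATUS_HINTS = {
    400: "Invalid parameters: check the request body.",
    404: "Task or file not found: check the task_id or file name.",
    500: "Dubbing failed: inspect the task's error message.",
}


def explain_status(code: int) -> str:
    """Return a short, human-readable hint for an HTTP status code."""
    hint = STATUS_HINTS.get(code, "Unexpected response.")
    return f"HTTP {code}: {hint}"


print(explain_status(404))
```

With `urllib`, for example, the code would come from the `code` attribute of a caught `urllib.error.HTTPError`.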

Background Tasks

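Dubbing tasks run in the background. Poll the result endpoint until the task status is "completed" or "failed":

import time

# get_dubbing_result is a placeholder for your own client call to
# GET /api/podcast/dub/{task_id}/result; it is not defined in this document.
while True:
    result = get_dubbing_result(task_id)
    if result.status == "completed":
        print(f"Dubbed audio: {result.dubbed_audio_url}")
        break
    elif result.status == "failed":
        print(f"Failed: {result.error}")
        break
    time.sleep(2)  # wait before polling again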
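
The loop above never exits if a task stays in a non-terminal state. A variant with a deadline is sketched below. `wait_for_result` is hypothetical, `fetch` stands in for whatever function calls the result endpoint and returns its JSON as a dict, and the 600-second default is an arbitrary assumption.

```python
import time
from typing import Callable, Dict


def wait_for_result(fetch: Callable[[], Dict], timeout: float = 600.0,
                    interval: float = 2.0, sleep=time.sleep) -> Dict:
    """Poll fetch() until the task completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.get("status") in ("completed", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("dubbing task did not finish in time")
        sleep(interval)  # wait before polling again
```

Passing `sleep` as a parameter keeps the helper testable without real delays.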

Environment Variables

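| Variable          | Description                               | Required              |
|-------------------|-------------------------------------------|-----------------------|
| DEEPL_API_KEY     | DeepL API key for low-quality translation | Yes (for low quality) |
| DEEPL_USE_PRO     | Use DeepL Pro API                         | No                    |
| WAVESPEED_API_KEY | WaveSpeed API key (already configured)    | Yes                   |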

Supported Languages

DeepL supports 34 languages including:

  • English, Spanish, French, German, Italian, Portuguese
  • Japanese, Chinese, Korean, Arabic, Hindi
  • Russian, Dutch, Polish, Turkish, Vietnamese
  • And more...

See full list via: GET /api/podcast/dub/languages