2026-04-19 13:21:36 +05:30
parent e704aa7d87
commit 0732887c09
17 changed files with 6225 additions and 0 deletions
--- a/Maker/AUDIO_ONLY_PODCAST_OPTIMIZATION.md
+++ b/Maker/AUDIO_ONLY_PODCAST_OPTIMIZATION.md
@@ -0,0 +1,530 @@
+# Audio-Only Podcast Optimization Plan
+
+## Executive Summary
+
+This document outlines the optimization strategy for audio-only podcasts in ALwrity's Podcast Maker. The goal is to maximize the character throughput per API request while maintaining cost efficiency and audio quality.
+
+---
+
+## 1. Current Cost Analysis
+
+### 1.1 Pricing Structure
+
+| Service | Provider | Cost Formula | Notes |
+|---------|----------|--------------|-------|
+| **TTS (Audio)** | Minimax Speech-02-HD (WaveSpeed) | $0.05 per 1,000 chars | Exact billing per character |
+| **Voice Clone** | Minimax Voice Clone | $0.50 per clone | One-time if using custom voice |
+| **Research** | Exa Neural Search | $0.005 per query | + ~$0.001 for LLM insight extraction |
+| **Avatar** | Ideogram Character | $0.10 per image | Only if AI-generated |
+
+### 1.2 Cost Examples
+
+| Podcast Duration | Characters (est.) | TTS Cost | Total Cost (audio-only) |
+|------------------|-------------------|----------|--------------------------|
+| 1 minute | 750 | $0.04 | $0.07 |
+| 3 minutes | 2,250 | $0.11 | $0.14 |
+| 5 minutes | 3,750 | $0.19 | $0.22 |
+| 10 minutes | 7,500 | $0.38 | $0.41 |
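
The figures in this table can be reproduced with a few lines of arithmetic. The sketch below assumes the 750 characters-per-minute estimate and the $0.05 per 1,000 characters rate given above. The flat $0.03 used to reach the Total column is inferred from the gap between the two cost columns; this plan does not itemize it. `estimate_audio_cost` is a hypothetical helper name, not an existing function.

```python
from decimal import Decimal, ROUND_HALF_UP

CHARS_PER_MINUTE = 750                 # estimate used in the table above
TTS_RATE_PER_1K = Decimal("0.05")      # USD per 1,000 characters (section 1.1)
FLAT_OVERHEAD = Decimal("0.03")        # assumption: Total minus TTS in the table
CENT = Decimal("0.01")


def estimate_audio_cost(minutes: int) -> dict:
    """Return estimated characters, TTS cost, and total cost in USD."""
    chars = minutes * CHARS_PER_MINUTE
    tts = Decimal(chars) * TTS_RATE_PER_1K / 1000
    return {
        "chars": chars,
        "tts": tts.quantize(CENT, rounding=ROUND_HALF_UP),
        "total": (tts + FLAT_OVERHEAD).quantize(CENT, rounding=ROUND_HALF_UP),
    }


for minutes in (1, 3, 5, 10):
    print(minutes, estimate_audio_cost(minutes))
```

`Decimal` is used instead of floats so that half-cent values such as $0.0375 round the same way the table does.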
+
+---
+
+## 2. Technical Constraints
+
+### 2.1 API Limits
+
+**Backend**: `main_audio_generation.py` (line 100)
+```python
+if len(text) > 10000:
+    raise ValueError(f"Text is too long ({len(text)} characters). Maximum is 10,000 characters.")
+```
+
+**Current Limit**: 10,000 characters per single API request
+
+### 2.2 Scene-Based Architecture
+
+- Each scene = 1 API call
+- Default scene length: 45 seconds (`scene_length_target` knob)
+- Audio is generated per scene, then concatenated
+
+---
+
+## 3. Optimization Strategies
+
+### 3.1 Strategy 1: Fewer, Longer Scenes
+
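+**Problem**: More scenes = more API calls. TTS itself is billed per character (Section 1.1), so the extra cost of more calls is per-request overhead (latency, rate limits, any per-call charges) rather than TTS spend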
+
+**Solution**: 
+- Increase `scene_length_target` from 45s to 60s or 90s
+- Fewer scenes for the same podcast duration
+
+**Impact**:
+| Duration | Scenes (45s) | Scenes (60s) | Scenes (90s) | API Call Savings (45s → 90s) |
+|----------|-------------|--------------|--------------|------------------|
+| 5 min | 7 | 5 | 3 | 57% fewer calls |
+| 10 min | 13 | 10 | 7 | 46% fewer calls |
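
The scene counts in this table are consistent with dividing the duration by the scene length and rounding to the nearest whole scene. That rounding rule is inferred from the numbers, not taken from the codebase, and the helper names below are hypothetical. A minimal sketch:

```python
import math


def scene_count(duration_s: int, scene_length_s: int) -> int:
    """Number of scenes, rounded to the nearest whole scene (at least 1)."""
    return max(1, math.floor(duration_s / scene_length_s + 0.5))


def call_savings_pct(duration_s: int, old_len_s: int, new_len_s: int) -> int:
    """Percentage of API calls saved by moving from old_len_s to new_len_s."""
    old = scene_count(duration_s, old_len_s)
    new = scene_count(duration_s, new_len_s)
    return round(100 * (old - new) / old)


for minutes in (5, 10):
    duration = minutes * 60
    counts = [scene_count(duration, length) for length in (45, 60, 90)]
    print(minutes, counts, call_savings_pct(duration, 45, 90))
```

If the real script generator rounds up instead, the 90-second column would read 4 scenes for a 5-minute episode, so the savings figures are best treated as estimates.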
+
+### 3.2 Strategy 2: Per-Scene Character Budgeting
+
+**Current behavior**: Each scene text is sent separately to TTS API
+
+**Optimization options**:
+
+1. **Text Concatenation**: Combine multiple scene texts with `<#x#>` pause markers, where `x` is the pause length in seconds (for example `<#0.5#>`)
+   ```python
+   # Example: combine scenes with half-second pause markers
+   combined_text = "Scene 1 text.<#0.5#>Scene 2 text.<#0.5#>Scene 3 text."
+   ```
+   - Risk: May hit 10,000 char limit faster
+   - Benefit: Single API call for multiple scenes
+
+2. **Smart Chunking**: Dynamically batch scenes based on character count
+   ```python
+   MAX_CHARS_PER_REQUEST = 9500  # Leave buffer
+   # Group scenes until approaching limit
+   ```
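
The two options above can be combined into one greedy batching pass. The sketch below is one possible shape, not the planned implementation: `batch_scenes` is a hypothetical helper, and the half-second pause value is an assumption chosen for illustration.

```python
MAX_CHARS_PER_REQUEST = 9500   # buffer below the 10,000-character limit
PAUSE_MARKER = "<#0.5#>"       # assumption: half-second pause between scenes


def batch_scenes(scene_texts, max_chars=MAX_CHARS_PER_REQUEST, pause=PAUSE_MARKER):
    """Greedily pack scene texts, in order, into as few requests as possible."""
    batches = []
    current = ""
    for text in scene_texts:
        if len(text) > max_chars:
            raise ValueError(f"Scene is too long ({len(text)} characters)")
        candidate = current + pause + text if current else text
        if len(candidate) > max_chars:
            batches.append(current)   # current batch is full; start a new one
            current = text
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


scenes = ["a" * 4000, "b" * 4000, "c" * 4000]
print([len(batch) for batch in batch_scenes(scenes)])
```

With three 4,000-character scenes this yields two requests instead of three. Note that the pause markers count toward the character limit, and that a batched request returns one audio file, so per-scene boundaries would have to be tracked separately if the editor needs them.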
+
+### 3.3 Strategy 3: Voice Settings for Longer Content
+
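+**Speed factor impacts** (assuming `speed` scales playback rate linearly):
+- Speed 0.8 = the same text runs about 25% longer (1 / 0.8 = 1.25), so a fixed duration fits 20% fewer characters
+- Speed 1.2 = the same text runs about 17% shorter (1 / 1.2 ≈ 0.83), so a fixed duration fits 20% more characters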
+
+**Recommendation**: Use speed 0.9-1.0 for optimal quality/cost balance
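
The relationship between speed, text length, and audio duration can be sketched as follows. This assumes that `speed` is a linear playback-rate multiplier and that the 750 characters-per-minute estimate from Section 1.2 applies at speed 1.0; both are assumptions, and the function names are hypothetical.

```python
BASE_CHARS_PER_MINUTE = 750  # from section 1.2, assumed to apply at speed 1.0


def audio_seconds(chars: int, speed: float = 1.0) -> float:
    """Estimated audio length for a given amount of text at a given speed."""
    return chars / BASE_CHARS_PER_MINUTE * 60 / speed


def chars_for_duration(minutes: float, speed: float = 1.0) -> int:
    """Estimated characters needed to fill a target duration at a given speed."""
    return round(minutes * BASE_CHARS_PER_MINUTE * speed)


for speed in (0.8, 1.0, 1.2):
    print(speed, audio_seconds(3750, speed), chars_for_duration(5, speed))
```

Because TTS is billed per character, a slower speed makes a fixed-length episode cheaper (fewer characters are needed), while a faster speed fits more content into the same running time at a proportionally higher cost.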
+
+### 3.4 Strategy 4: Audio-Only Mode Skip
+
+**For audio-only podcasts** (no video):
+
+1. **Skip avatar generation** - Save $0.10 per speaker
+2. **Skip video rendering** - Save $0.30 per scene  
+3. **Skip scene images** - Save $0.04-$0.10 per scene
+
+**Estimated savings for 5-min, 5-scene audio podcast**:
+| Component | Cost | Audio-Only Savings |
+|-----------|------|---------------------|
+| Avatar | $0.10 | $0.10 |
+| Video (5 scenes) | $1.50 | $1.50 |
+| Images (5 scenes) | $0.20-$0.50 | $0.20-$0.50 |
+| **Total** | $1.80-$2.10 | **$1.80-$2.10** |
+
+---
+
+## 4. Implementation Plan
+
+### 4.1 Phase 1: User-Facing Controls (Frontend)
+
+#### 4.1.1 Add "Audio Only" Toggle
+- Location: `CreateModal.tsx` or `PodcastConfiguration.tsx`
+- Options: `Audio Only` | `Video Only` | `Audio + Video`
+- When `Audio Only` is selected: skip avatar, image, and video generation
+- Pass `audio_only: true` or `video_only: true` to the backend (both `false` means Audio + Video)
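
The mapping from the three toggle options to request flags and generation steps can be written down compactly. The sketch below uses Python for consistency with the backend examples, although the toggle itself lives in the TypeScript frontend; `PodcastMode`, `request_flags`, and `generation_steps` are hypothetical names.

```python
from enum import Enum


class PodcastMode(str, Enum):
    AUDIO_ONLY = "audio_only"
    VIDEO_ONLY = "video_only"
    AUDIO_AND_VIDEO = "audio_and_video"


ALL_STEPS = ("audio", "avatar", "image", "video")


def request_flags(mode: PodcastMode) -> dict:
    """Flags sent to the backend; both False means Audio + Video."""
    return {
        "audio_only": mode is PodcastMode.AUDIO_ONLY,
        "video_only": mode is PodcastMode.VIDEO_ONLY,
    }


def generation_steps(mode: PodcastMode) -> tuple:
    """Audio-only skips avatar, image, and video generation."""
    if mode is PodcastMode.AUDIO_ONLY:
        return ("audio",)
    return ALL_STEPS


for mode in PodcastMode:
    print(mode.value, request_flags(mode), generation_steps(mode))
```

Deriving both flags from a single mode value makes it impossible for the UI to send `audio_only` and `video_only` as `true` at the same time.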
+
+#### 4.1.2 Cost Preview Updates
+- Show cost comparison based on selected mode
+- Display potential savings for audio-only vs video
+
+### 4.2 Phase 2: Script Editor UI (NEW - CRITICAL)
+
+#### 4.2.1 Three Mode UI Strategy
+
+The script editor needs to adapt based on the podcast mode:
+
+| Mode | Script Editor UI | Available Actions |
+|------|------------------|-------------------|
+| **Audio Only** | Single audio-optimized script | Generate Audio only |
+| **Video Only** | Current video script editor | Generate Audio + Image + Video |
+| **Audio + Video** | Two tabs: "Audio Script" + "Video Script" | Full generation options |
+
+#### 4.2.2 Implementation Details
+
+**File:** `frontend/src/components/PodcastMaker/ScriptEditor/ScriptEditor.tsx`
+
+**New Component Structure:**
+
+```typescript
+interface ScriptEditorProps {
+  // ... existing props
+  audioOnlyMode: boolean;    // Audio-only podcast
+  videoOnlyMode: boolean;    // Video-only podcast (current behavior)
+  audioScript?: Script;      // Audio-optimized script (3-4 scenes, more lines)
+  videoScript?: Script;      // Video-optimized script (current)
+  onAudioScriptChange?: (script: Script) => void;
+  onVideoScriptChange?: (script: Script) => void;
+}
+```
+
+**UI Layout:**
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Script Editor                              [Audio] [Video] tabs (if both)
+├─────────────────────────────────────────────────────────────┤
+│  Mode: Audio-Only                                          │
+│  ┌─────────────────────────────────────────────────────┐  │
+│  │ Scene 1: Introduction (90s)                     [Edit]│  │
+│  │   Host: Welcome to today's episode...                 │  │
+│  │   Host: Today we're diving deep into...               │  │
+│  │   ... (6-10 lines per scene for audio)                │  │
+│  └─────────────────────────────────────────────────────┘  │
+│                                                             │
+│  [Generate Audio] $0.04                                   │
+└─────────────────────────────────────────────────────────────┘
+```
+
+#### 4.2.3 Tab Implementation for Audio + Video Mode
+
+**When both Audio and Video are selected:**
+
+1. Show two tabs in script editor:
+   - **Tab 1: "Audio Script"** - Audio-optimized (fewer scenes, more content)
+   - **Tab 2: "Video Script"** - Current video script (more scenes, visual)
+
+2. Each tab has independent:
+   - Scene structure
+   - Edit capabilities
+   - Generation buttons
+
+3. Generation actions differ by tab:
+   - Audio Tab: "Generate Audio" button only
+   - Video Tab: "Generate Audio" + "Generate Image" + "Generate Video"
+
+#### 4.2.4 Backend Script Generation Updates
+
+**Script generation endpoint changes:**
+
+```python
+# In PodcastScriptRequest model
+class PodcastScriptRequest(BaseModel):
+    # ... existing fields
+    audio_only: bool = False      # Generate audio-optimized script
+    video_only: bool = False     # Generate video-optimized script (current)
+    # If both False AND audio/video mode is "both", generate both scripts
+```
+
+**Prompt Selection Logic:**
+
+```python
+# Select which prompt(s) to run; each key maps to one generated script
+if request.audio_only:
+    prompts = {"audio": AUDIO_ONLY_PROMPT}   # 3-4 scenes, 6-10 lines/scene
+elif request.video_only:
+    prompts = {"video": VIDEO_PROMPT}        # Current 5-6 scenes, 2-4 lines/scene
+else:
+    # Audio + Video mode: generate both scripts with their respective prompts
+    prompts = {"audio": AUDIO_ONLY_PROMPT, "video": VIDEO_PROMPT}
+
+### 4.3 Phase 3: Backend Script Generation (AI Prompts)
+
+#### 4.3.1 Two-Tier Script Generation Strategy
+
+**Current Behavior (Video Podcast):**
+- Existing prompt in `backend/api/podcast/handlers/script.py` (lines 125-151)
+- Optimized for video with shorter scenes (2-4 lines per scene)
+- 5-6 scenes max for visual storytelling
+- Less content per scene to match video duration
+
+**New Audio-Only Mode:**
+- New prompt optimized for audio-only content
+- More content-dense, information-rich
+- Fewer scenes with MORE content per scene
+- Maximizes use of research data
+- Reduces API calls while delivering more value
+
+#### 4.3.2 Audio-Only Script Prompt
+
+**Location:** `backend/api/podcast/handlers/script.py`
+
+**New Prompt for Audio-Only:**
+
+```python
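+# Must be built inside the handler, where request, research_context, bible_context, and analysis_context are in scope
+AUDIO_ONLY_PROMPT = f"""Create a DEEP, content-rich podcast script optimized for AUDIO-ONLY delivery.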
+
+{f"RESEARCH DATA (Use extensively - this is audio only, more content is better): {research_context[:3000]}" if research_context else "No research available - generate general content"}
+
+{f"BIBLE: {bible_context[:1500]}" if bible_context else ""}
+{f"{analysis_context}" if analysis_context else ""}
+
+Topic: "{request.idea}"
+Duration: {request.duration_minutes} min | Speakers: {request.speakers}
+MODE: AUDIO-ONLY (no video constraints - maximize content density)
+
+COST OPTIMIZATION (Audio-Only):
+- 3-4 scenes MAX for entire episode (fewer scenes = fewer API calls)
+- EACH scene should have 6-10 LINES (more content per scene)
+- Each line: 3-5 sentences, information-dense
+- Include: facts, statistics, examples, insights from research
+- NO visual descriptions needed (save tokens for content)
+- Make every line deliver unique value
+
+STRUCTURE per scene:
+- scene_id: string
+- title: short descriptive title
+- duration: seconds (target {request.duration_minutes*60 // 4}-{request.duration_minutes*60 // 3} per scene)
+- emotion: neutral|happy|excited|serious|curious|confident
+- lines: array of {{speaker, text, emphasis}}
+  - speaker: "Host" or "Guest"
+  - text: 3-5 sentences, rich with facts/insights
+  - emphasis: true|false for important points
+
+Return JSON with scenes array.
+"""
+```
+
+**Key Differences:**
+
+| Aspect | Video (Current) | Audio-Only (New) |
+|--------|------------------|------------------|
+| Scenes | 5-6 | 3-4 |
+| Lines/Scene | 2-4 | 6-10 |
+| Sentences/Line | 1-3 | 3-5 |
+| Research Usage | 1,200 chars | 3,000 chars |
+| Focus | Visual storytelling | Content density |
+| API Calls | More (lower cost/scene) | Fewer (higher cost/scene) |
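
Since the model is only asked to respect these limits, the returned script may need checking before audio generation starts. The sketch below is one way to do that, assuming the response has already been parsed into a list of scene dictionaries with the `scene_id` and `lines` keys described in the prompt; `validate_audio_script` is a hypothetical helper.

```python
def validate_audio_script(scenes: list) -> list:
    """Return a list of human-readable problems; empty means the script fits."""
    problems = []
    if not 3 <= len(scenes) <= 4:
        problems.append(f"expected 3-4 scenes, got {len(scenes)}")
    for scene in scenes:
        lines = scene.get("lines", [])
        if not 6 <= len(lines) <= 10:
            problems.append(
                f"scene {scene.get('scene_id')}: expected 6-10 lines, got {len(lines)}"
            )
    return problems


line = {"speaker": "Host", "text": "Example line.", "emphasis": False}
good_script = [{"scene_id": str(i), "lines": [line] * 8} for i in range(3)]
print(validate_audio_script(good_script))
```

Returning a list of problems rather than raising lets the caller decide whether to retry generation or accept a script that is slightly outside the targets.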
+
+#### 4.3.3 Implementation Details
+
+**File:** `backend/api/podcast/handlers/script.py`
+
+1. Add `audio_only: bool` parameter to `PodcastScriptRequest`
+2. Conditionally select prompt based on `audio_only` flag
+3. For audio-only:
+   - Use expanded research context (3,000 chars vs 1,200)
+   - Request more lines per scene
+   - Fewer total scenes
+   - More content per line
+
+### 4.4 Phase 4: Backend Optimizations
+
+#### 4.4.1 Smart Scene Batching
+- File: `backend/api/podcast/handlers/audio.py`
+- Logic: Group scenes while the combined text stays within `MAX_CHARS_PER_REQUEST` (9,500 in Section 3.2)
+- Add pause markers between scenes
+
+#### 4.4.2 Audio-Only Flag in Project
+- Model: Add `audio_only: bool` to project settings
+- Skip: Avatar generation, image generation, video rendering
+
+### 4.5 Phase 5: Cost Calculation Updates
+
+#### 4.5.1 Update Frontend Estimation
+- File: `frontend/src/services/podcastApi.ts`
+- Formula updates:
+  ```typescript
+  const estimatedApiCalls = Math.ceil(totalChars / 9500);
+  // TTS is billed per character ($0.05 per 1,000 chars), not per request
+  const ttsCost = (totalChars / 1000) * 0.05;
+  ```
+
+---
+
+## 5. Technical Details
+
+### 5.1 Files to Modify
+
+| File | Changes |
+|------|---------|
+| `frontend/src/components/PodcastMaker/types.ts` | Add `audio_only`, `video_only`, `podcast_mode` to project settings |
+| `frontend/src/components/PodcastMaker/CreateModal.tsx` | Add mode toggle (Audio/Video/Both) |
+| `frontend/src/services/podcastApi.ts` | Update cost estimation for each mode |
+| `frontend/src/components/PodcastMaker/ScriptEditor/ScriptEditor.tsx` | Add tab support for Audio + Video mode |
+| `frontend/src/components/PodcastMaker/ScriptEditor/SceneEditor.tsx` | Conditional action buttons per mode |
+| `backend/api/podcast/models.py` | Add `audio_only`, `video_only` fields to request model |
+| `backend/api/podcast/handlers/script.py` | Add audio-only + video-only prompts, return both scripts when needed |
+| `backend/api/podcast/handlers/audio.py` | Implement smart batching |
+
+### 5.2 API Endpoints
+
+```python
+# PodcastScriptRequest model changes
+class PodcastScriptRequest(BaseModel):
+    idea: str
+    duration_minutes: int
+    speakers: int
+    research: Optional[Dict] = None
+    bible: Optional[Dict] = None
+    analysis: Optional[Dict] = None
+    outline: Optional[Dict] = None
+    # NEW FIELDS:
+    audio_only: bool = False      # Generate audio-optimized script
+    video_only: bool = False      # Generate video-optimized script (current)
+    # Both False = generate both scripts for audio+video mode
+
+# Response includes both scripts when needed
+class PodcastScriptResponse(BaseModel):
+    audio_script: Optional[Script] = None   # Audio-optimized
+    video_script: Optional[Script] = None   # Video-optimized
+```
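
The model above leaves one case open: a request with both flags set to `true`. The sketch below shows one way to reject that case and to decide which scripts to return. It uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without dependencies; `ScriptFlags` and `scripts_to_generate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScriptFlags:
    """Stdlib stand-in for the two new PodcastScriptRequest fields."""
    audio_only: bool = False
    video_only: bool = False

    def __post_init__(self):
        if self.audio_only and self.video_only:
            raise ValueError("audio_only and video_only are mutually exclusive")

    def scripts_to_generate(self) -> tuple:
        """Names of the response fields that should be populated."""
        if self.audio_only:
            return ("audio_script",)
        if self.video_only:
            return ("video_script",)
        return ("audio_script", "video_script")


print(ScriptFlags().scripts_to_generate())
print(ScriptFlags(audio_only=True).scripts_to_generate())
```

In the real Pydantic model the same check would typically live in a model-level validator, so the API responds with a validation error instead of silently preferring one flag.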
+
+### 5.3 Database Schema
+
+```python
+# In PodcastProject model
+audio_only: bool = False
+scene_length_target: int = 60  # seconds
+```
+
+---
+
+## 6. User Experience
+
+### 6.1 Create Phase - Mode Toggle
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  🎙️ Create New Podcast                                     │
+├─────────────────────────────────────────────────────────────┤
+│  Duration: [5] minutes   Speakers: [1] [2]                   │
+│                                                             │
+│  Podcast Mode:                                              │
+│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐          │
+│  │ Audio Only  │ │ Video Only  │ │ Audio+Video │          │
+│  │   ($0.22)   │ │   ($2.02)   │ │   ($2.24)   │          │
+│  └─────────────┘ └─────────────┘ └─────────────┘          │
+│                                                             │
+│  Est. Cost: $0.22 (audio only) vs $2.02 (with video)       │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 6.2 Script Editor - Audio Only Mode
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Script Editor                                              │
+├─────────────────────────────────────────────────────────────┤
+│  📻 Audio-Only Mode                                         │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │ Scene 1: Introduction (90s)                     [Edit]│
+│  │   Host: Welcome to today's episode on AI...         │
+│  │   Host: Today we're diving deep into how AI...      │
+│  │   Host: I'm excited to share three key insights...  │
+│  │   ... (6-10 lines for audio)                        │
+│  │                                                      │
+│  │ Scene 2: Main Topic (120s)                      [Edit]│
+│  │   ...                                               │
+│  └─────────────────────────────────────────────────────┘    │
+│                                                             │
+│  [Generate Audio] $0.04      [Generate Image] Disabled    │
+│  [Generate Video] Disabled                                   │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 6.3 Script Editor - Video Only Mode (Current)
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Script Editor                                              │
+├─────────────────────────────────────────────────────────────┤
+│  🎬 Video Mode                                               │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │ Scene 1: Intro (30s)          [Image] [Audio] [V] │
+│  │ Scene 2: Hook (30s)            [Image] [Audio] [V]  │
+│  │ Scene 3: Content (45s)         [Image] [Audio] [V]  │
+│  │ Scene 4: Example (30s)         [Image] [Audio] [V]  │
+│  │ Scene 5: CTA (15s)             [Image] [Audio] [V]   │
+│  └─────────────────────────────────────────────────────┘    │
+│                                                             │
+│  [Generate Audio] $0.19   [Generate Image] $0.10           │
+│  [Generate Video] $1.50                                     │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 6.4 Script Editor - Audio + Video Mode (Both)
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Script Editor                             [Audio] [Video] │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────────────────────────────────────────────┐  │
+│  │ [Audio] Tab | [Video] Tab                           │  │
+│  ├─────────────────────────────────────────────────────┤  │
+│  │ Audio Script:                                        │  │
+│  │   Scene 1: Intro (90s) - 8 lines                   │  │
+│  │   Scene 2: Deep Dive (120s) - 10 lines              │  │
+│  │                                                      │  │
+│  │ [Generate Audio] $0.04                              │  │
+│  └─────────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────────┘
+OR
+┌─────────────────────────────────────────────────────────────┐
+│  Script Editor                             [Audio] [Video] │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────────────────────────────────────────────┐  │
+│  │ [Audio] Tab | [Video] Tab                           │  │
+│  ├─────────────────────────────────────────────────────┤  │
+│  │ Video Script:                                       │  │
+│  │   Scene 1: Intro (30s)    [Img] [Aud] [Vid]         │  │
+│  │   Scene 2: Hook (30s)      [Img] [Aud] [Vid]        │  │
+│  │   Scene 3: Content (45s)   [Img] [Aud] [Vid]        │  │
+│  │                                                      │  │
+│  │ [Generate Audio] [Generate Image] [Generate Video]  │  │
+│  └─────────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 6.5 Cost Comparison UI
+
+| Mode | Scenes | Lines/Scene | TTS Cost | Video Cost | Total |
+|------|--------|-------------|----------|------------|-------|
+| Audio Only | 3-4 | 6-10 | $0.19 | $0 | **$0.22** |
+| Video Only | 5-6 | 2-4 | $0.19 | $1.50 | **$1.69** |
+| Audio+Video | 3-4 + 5-6 | varies | $0.19 | $1.50 | **$1.72** |
+
+These totals cover TTS and video rendering only. The $2.02 shown for Video Only in the Section 6.1 mockup matches $0.22 + $1.50 + $0.10 (avatar) + $0.20 (five scene images at $0.04), so it appears to include the avatar and image costs that this table leaves out.
+
+---
+
+## 7. Testing Plan
+
+### 7.1 Unit Tests
+
+1. Test character count calculation
+2. Test scene batching logic (under 10k chars)
+3. Test cost estimation accuracy
+
+### 7.2 Integration Tests
+
+1. Generate audio for 10-minute podcast with 5 scenes
+2. Verify all scenes generate correctly
+3. Verify cost tracking in database
+
+### 7.3 Performance Tests
+
+1. Measure time for batched vs sequential API calls
+2. Verify no timeout issues with longer text
+
+---
+
+## 8. Success Metrics
+
+| Metric | Target | Current |
+|--------|--------|---------|
+| API calls per 5-min podcast | 5 | 7 |
+| Cost per 5-min audio podcast | $0.22 | $0.22 + video |
+| User-visible savings | 50%+ | N/A |
+| Scene length default | 60s | 45s |
+
+---
+
+## 9. Appendix: Related Files
+
+### Backend
+- `backend/services/llm_providers/main_audio_generation.py` - TTS cost calculation
+- `backend/api/podcast/handlers/audio.py` - Audio generation endpoint
+- `backend/api/podcast/handlers/script.py` - Script generation
+- `backend/services/subscription/pricing_service.py` - Pricing configuration
+
+### Frontend  
+- `frontend/src/services/podcastApi.ts` - Cost estimation
+- `frontend/src/components/PodcastMaker/CreateModal.tsx` - Create UI
+- `frontend/src/components/PodcastMaker/types.ts` - Type definitions
+
+---
+
+## Document History
+
+| Version | Date | Author | Changes |
+|---------|------|--------|---------|
+| 1.0 | 2026-04-08 | ALwrity Team | Initial document creation |
+
+---
+
+*This document serves as the reference for audio-only podcast optimization in ALwrity Podcast Maker.*