feat: validate podcast cost estimation accuracy, document per-token costs, and fix subscription/plan enforcement

Issue #543: Validate Estimated Cost Accuracy (UI vs Backend)

Backend:
- cost_estimator.py uses pricing catalog (APIProviderPricing) as single source of truth
- All 7 cost components: analysis, research (search+LLM), script, TTS, voice clone, avatar, video
- initialize_default_pricing() runs on every app startup for auto-sync

Frontend cost estimation fixes:
- Added missing analysisCost, scriptCost, voiceCloneCost to PodcastEstimate type
- toPodcastEstimate() now extracts all 7 backend fields (was dropping 3)
- headerCostEst maps analysisCost->Analyze, scriptCost->Write, voiceCloneCost->Produce
- EstimateCard shows 5 chips: Analysis, Research, Script, Voice(TTS+clone), Visuals(avatar+video)
- Chip sum now equals backend total for all configurations

Subscription & plan fixes:
- Removed Stripe re-verification from checkSubscription() (downgrade regression fix #539)
- Added verifyCheckoutRef pattern for reliable mount-time checkout polling
- One-time Stripe sync effect with pending_subscription_change flag for Customer Portal returns
- Free plan limits: stability_calls 3->10, audio_calls 5->10 (supports 2 podcasts)
- Image enforcement uses actual provider (GPT_PROVIDER), not hardcoded Stability
- Billing/pricing pages bypass onboarding check in ProtectedRoute
- Gradient buttons + loading spinner on plan chip in UserBadge
- Added metadata-based Stripe lookup fallback (Issue #538)

Documentation:
- TESTING_GUIDE.md: comprehensive testing instructions for non-technical testers
- Free plan limits, usage tracking, cost estimation formulas
- 10 test cases for UI verification
- Troubleshooting guide
- Quick-reference cost formulas with all default rates

Cleanup: removed legacy ToBeMigrated directory (70+ files, ~22K LOC)

GSC Brainstorm: service, hook, modal, and UI components for blog topic brainstorming
2026-05-27 08:46:38 +05:30
parent 96fa469fe8
commit aaf94049da
100 changed files with 2953 additions and 22118 deletions
--- a/Maker/AUDIO_ONLY_PODCAST_OPTIMIZATION.md
+++ b/Maker/AUDIO_ONLY_PODCAST_OPTIMIZATION.md
@@ -1,530 +0,0 @@
-# Audio-Only Podcast Optimization Plan
-
-## Executive Summary
-
-This document outlines the optimization strategy for audio-only podcasts in ALwrity's Podcast Maker. The goal is to maximize the character throughput per API request while maintaining cost efficiency and audio quality.
-
---
-
-## 1. Current Cost Analysis
-
-### 1.1 Pricing Structure
-
-| Service | Provider | Cost Formula | Notes |
-|---------|----------|--------------|-------|
-| **TTS (Audio)** | Minimax Speech-02-HD (WaveSpeed) | $0.05 per 1,000 chars | Exact billing per character |
-| **Voice Clone** | Minimax Voice Clone | $0.50 per clone | One-time if using custom voice |
-| **Research** | Exa Neural Search | $0.005 per query | + ~$0.001 for LLM insight extraction |
-| **Avatar** | Ideogram Character | $0.10 per image | Only if AI-generated |
-
-### 1.2 Cost Examples
-
-| Podcast Duration | Characters (est.) | TTS Cost | Total Cost (audio-only) |
-|------------------|-------------------|----------|--------------------------|
-| 1 minute | 750 | $0.04 | $0.07 |
-| 3 minutes | 2,250 | $0.11 | $0.14 |
-| 5 minutes | 3,750 | $0.19 | $0.22 |
-| 10 minutes | 7,500 | $0.38 | $0.41 |
-
---
-
-## 2. Technical Constraints
-
-### 2.1 API Limits
-
-**Backend**: `main_audio_generation.py` (line 100)
-```python
-if len(text) > 10000:
-    raise ValueError(f"Text is too long ({len(text)} characters). Maximum is 10,000 characters.")
-```
-
-**Current Limit**: 10,000 characters per single API request
-
-### 2.2 Scene-Based Architecture
-
- Each scene = 1 API call
- Default scene length: 45 seconds (`scene_length_target` knob)
- Audio is generated per scene, then concatenated
-
---
-
-## 3. Optimization Strategies
-
-### 3.1 Strategy 1: Fewer, Longer Scenes
-
-**Problem**: More scenes = more API calls = higher costs
-
-**Solution**: 
- Increase `scene_length_target` from 45s to 60s or 90s
- Fewer scenes for the same podcast duration
-
-**Impact**:
-| Duration | Scenes (45s) | Scenes (60s) | Scenes (90s) | API Call Savings |
-|----------|-------------|--------------|--------------|------------------|
-| 5 min | 7 | 5 | 3 | 57% fewer calls |
-| 10 min | 13 | 10 | 7 | 46% fewer calls |
-
-### 3.2 Strategy 2: Per-Scene Character Budgeting
-
-**Current behavior**: Each scene text is sent separately to TTS API
-
-**Optimization options**:
-
-1. **Text Concatenation**: Combine multiple scene texts with `<#x#>` pause markers
-   ```python
-   # Example: Combine scenes with pause markers
-   combined_text = "Scene 1 text.<#x#>Scene 2 text.<#x#>Scene 3 text."
-   ```
-   - Risk: May hit 10,000 char limit faster
-   - Benefit: Single API call for multiple scenes
-
-2. **Smart Chunking**: Dynamically batch scenes based on character count
-   ```python
-   MAX_CHARS_PER_REQUEST = 9500  # Leave buffer
-   # Group scenes until approaching limit
-   ```
-
-### 3.3 Strategy 3: Voice Settings for Longer Content
-
-**Speed factor impacts**:
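- Speed 0.8 = the same text plays 25% longer, so a fixed duration needs about 20% fewer characters
- Speed 1.2 = the same text plays about 17% shorter, so a fixed duration needs about 20% more characters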
-
-**Recommendation**: Use speed 0.9-1.0 for optimal quality/cost balance
-
-### 3.4 Strategy 4: Audio-Only Mode Skip
-
-**For audio-only podcasts** (no video):
-
-1. **Skip avatar generation** - Save $0.10 per speaker
-2. **Skip video rendering** - Save $0.30 per scene  
-3. **Skip scene images** - Save $0.04-$0.10 per scene
-
-**Estimated savings for 5-min, 5-scene audio podcast**:
-| Component | Cost | Audio-Only Savings |
-|-----------|------|---------------------|
-| Avatar | $0.10 | $0.10 |
-| Video (5 scenes) | $1.50 | $1.50 |
-| Images (5 scenes) | $0.20-$0.50 | $0.20-$0.50 |
-| **Total** | $1.80-$2.10 | **$1.80-$2.10** |
-
---
-
-## 4. Implementation Plan
-
-### 4.1 Phase 1: User-Facing Controls (Frontend)
-
-#### 4.1.1 Add "Audio Only" Toggle
- Location: `CreateModal.tsx` or `PodcastConfiguration.tsx`
- Options: `Audio Only` | `Video Only` | `Audio + Video`
- When enabled: Skip avatar, image, video generation
- Pass `audio_only: true` or `video_only: true` to backend
-
-#### 4.1.2 Cost Preview Updates
- Show cost comparison based on selected mode
- Display potential savings for audio-only vs video
-
-### 4.2 Phase 2: Script Editor UI (NEW - CRITICAL)
-
-#### 4.2.1 Three Mode UI Strategy
-
-The script editor needs to adapt based on the podcast mode:
-
-| Mode | Script Editor UI | Available Actions |
-|------|------------------|-------------------|
-| **Audio Only** | Single audio-optimized script | Generate Audio only |
-| **Video Only** | Current video script editor | Generate Audio + Image + Video |
-| **Audio + Video** | Two tabs: "Audio Script" + "Video Script" | Full generation options |
-
-#### 4.2.2 Implementation Details
-
-**File:** `frontend/src/components/PodcastMaker/ScriptEditor/ScriptEditor.tsx`
-
-**New Component Structure:**
-
-```typescript
-interface ScriptEditorProps {
-  // ... existing props
-  audioOnlyMode: boolean;    // Audio-only podcast
-  videoOnlyMode: boolean;    // Video-only podcast (current behavior)
-  audioScript?: Script;      // Audio-optimized script (3-4 scenes, more lines)
-  videoScript?: Script;      // Video-optimized script (current)
-  onAudioScriptChange?: (script: Script) => void;
-  onVideoScriptChange?: (script: Script) => void;
-}
-```
-
-**UI Layout:**
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  Script Editor                              [Audio] [Video] tabs (if both)
-├─────────────────────────────────────────────────────────────┤
-│  Mode: Audio-Only                                          │
-│  ┌─────────────────────────────────────────────────────┐  │
-│  │ Scene 1: Introduction (90s)                     [Edit]│  │
-│  │   Host: Welcome to today's episode...                 │  │
-│  │   Host: Today we're diving deep into...               │  │
-│  │   ... (6-10 lines per scene for audio)                │  │
-│  └─────────────────────────────────────────────────────┘  │
-│                                                             │
-│  [Generate Audio] $0.04                                   │
-└─────────────────────────────────────────────────────────────┘
-```
-
-#### 4.2.3 Tab Implementation for Audio + Video Mode
-
-**When both Audio and Video are selected:**
-
-1. Show two tabs in script editor:
-   - **Tab 1: "Audio Script"** - Audio-optimized (fewer scenes, more content)
-   - **Tab 2: "Video Script"** - Current video script (more scenes, visual)
-
-2. Each tab has independent:
-   - Scene structure
-   - Edit capabilities
-   - Generation buttons
-
-3. Generation actions differ by tab:
-   - Audio Tab: "Generate Audio" button only
-   - Video Tab: "Generate Audio" + "Generate Image" + "Generate Video"
-
-#### 4.2.4 Backend Script Generation Updates
-
-**Script generation endpoint changes:**
-
-```python
-# In PodcastScriptRequest model
-class PodcastScriptRequest(BaseModel):
-    # ... existing fields
-    audio_only: bool = False      # Generate audio-optimized script
-    video_only: bool = False     # Generate video-optimized script (current)
-    # If both False AND audio/video mode is "both", generate both scripts
-```
-
-**Prompt Selection Logic:**
-
-```python
-if request.audio_only:
-    prompt = AUDIO_ONLY_PROMPT  # 3-4 scenes, 6-10 lines/scene
-elif request.video_only:
-    prompt = VIDEO_PROMPT        # Current 5-6 scenes, 2-4 lines/scene
-else:
-    # Generate both scripts with respective prompts
-    audio_prompt = AUDIO_ONLY_PROMPT
-    video_prompt = VIDEO_PROMPT
-```
-
-### 4.3 Phase 3: Backend Script Generation (AI Prompts)
-
-#### 4.3.1 Two-Tier Script Generation Strategy
-
-**Current Behavior (Video Podcast):**
- Existing prompt in `backend/api/podcast/handlers/script.py` (lines 125-151)
- Optimized for video with shorter scenes (2-4 lines per scene)
- 5-6 scenes max for visual storytelling
- Less content per scene to match video duration
-
-**New Audio-Only Mode:**
- New prompt optimized for audio-only content
- More content-dense, information-rich
- Fewer scenes with MORE content per scene
- Maximizes use of research data
- Reduces API calls while delivering more value
-
-#### 4.3.2 Audio-Only Script Prompt
-
-**Location:** `backend/api/podcast/handlers/script.py`
-
-**New Prompt for Audio-Only:**
-
-```python
-AUDIO_ONLY_PROMPT = """Create a DEEP, content-rich podcast script optimized for AUDIO-ONLY delivery.
-
-{f"RESEARCH DATA (Use extensively - this is audio only, more content is better): {research_context[:3000]}" if research_context else "No research available - generate general content"}
-
-{f"BIBLE: {bible_context[:1500]}" if bible_context else ""}
-{f"{analysis_context}" if analysis_context else ""}
-
-Topic: "{request.idea}"
-Duration: {request.duration_minutes} min | Speakers: {request.speakers}
-MODE: AUDIO-ONLY (no video constraints - maximize content density)
-
-COST OPTIMIZATION (Audio-Only):
- 3-4 scenes MAX for entire episode (fewer scenes = fewer API calls)
- EACH scene should have 6-10 LINES (more content per scene)
- Each line: 3-5 sentences, information-dense
- Include: facts, statistics, examples, insights from research
- NO visual descriptions needed (save tokens for content)
- Make every line deliver unique value
-
-STRUCTURE per scene:
- scene_id: string
- title: short descriptive title
- duration: seconds (target {request.duration_minutes*60 // 4}-{request.duration_minutes*60 // 3} per scene)
- emotion: neutral|happy|excited|serious|curious|confident
- lines: array of {{speaker, text, emphasis}}
-  - speaker: "Host" or "Guest"
-  - text: 3-5 sentences, rich with facts/insights
-  - emphasis: true|false for important points
-
-Return JSON with scenes array.
-"""
-```
-
-**Key Differences:**
-
-| Aspect | Video (Current) | Audio-Only (New) |
-|--------|------------------|------------------|
-| Scenes | 5-6 | 3-4 |
-| Lines/Scene | 2-4 | 6-10 |
-| Sentences/Line | 1-3 | 3-5 |
-| Research Usage | 1,200 chars | 3,000 chars |
-| Focus | Visual storytelling | Content density |
-| API Calls | More (lower cost/scene) | Fewer (higher cost/scene) |
-
-#### 4.3.3 Implementation Details
-
-**File:** `backend/api/podcast/handlers/script.py`
-
-1. Add `audio_only: bool` parameter to `PodcastScriptRequest`
-2. Conditionally select prompt based on `audio_only` flag
-3. For audio-only:
-   - Use expanded research context (3,000 chars vs 1,200)
-   - Request more lines per scene
-   - Fewer total scenes
-   - More content per line
-
-### 4.4 Phase 4: Backend Optimizations
-
-#### 4.4.1 Smart Scene Batching
- File: `backend/api/podcast/handlers/audio.py`
- Logic: Group scenes with total chars < 9000
- Add pause markers between scenes
-
-#### 4.4.2 Audio-Only Flag in Project
- Model: Add `audio_only: bool` to project settings
- Skip: Avatar generation, image generation, video rendering
-
-### 4.5 Phase 5: Cost Calculation Updates
-
-#### 4.5.1 Update Frontend Estimation
- File: `frontend/src/services/podcastApi.ts`
- Formula updates:
-  ```typescript
-  const estimatedApiCalls = Math.ceil(totalChars / 9500);
-  const ttsCost = (totalChars / 1000) * 0.05; // billed per 1,000 chars, not per API call
-  ```
-
---
-
-## 5. Technical Details
-
-### 5.1 Files to Modify
-
-| File | Changes |
-|------|---------|
-| `frontend/src/components/PodcastMaker/types.ts` | Add `audio_only`, `video_only`, `podcast_mode` to project settings |
-| `frontend/src/components/PodcastMaker/CreateModal.tsx` | Add mode toggle (Audio/Video/Both) |
-| `frontend/src/services/podcastApi.ts` | Update cost estimation for each mode |
-| `frontend/src/components/PodcastMaker/ScriptEditor/ScriptEditor.tsx` | Add tab support for Audio + Video mode |
-| `frontend/src/components/PodcastMaker/ScriptEditor/SceneEditor.tsx` | Conditional action buttons per mode |
-| `backend/api/podcast/models.py` | Add `audio_only`, `video_only` fields to request model |
-| `backend/api/podcast/handlers/script.py` | Add audio-only + video-only prompts, return both scripts when needed |
-| `backend/api/podcast/handlers/audio.py` | Implement smart batching |
-
-### 5.2 API Endpoints
-
-```python
-# PodcastScriptRequest model changes
-class PodcastScriptRequest(BaseModel):
-    idea: str
-    duration_minutes: int
-    speakers: int
-    research: Optional[Dict] = None
-    bible: Optional[Dict] = None
-    analysis: Optional[Dict] = None
-    outline: Optional[Dict] = None
-    # NEW FIELDS:
-    audio_only: bool = False      # Generate audio-optimized script
-    video_only: bool = False      # Generate video-optimized script (current)
-    # Both False = generate both scripts for audio+video mode
-
-# Response includes both scripts when needed
-class PodcastScriptResponse(BaseModel):
-    audio_script: Optional[Script] = None   # Audio-optimized
-    video_script: Optional[Script] = None   # Video-optimized
-```
-
-### 5.3 Database Schema
-
-```python
-# In PodcastProject model
-audio_only: bool = False
-scene_length_target: int = 60  # seconds
-```
-
---
-
-## 6. User Experience
-
-### 6.1 Create Phase - Mode Toggle
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  🎙️ Create New Podcast                                     │
-├─────────────────────────────────────────────────────────────┤
-│  Duration: [5] minutes   Speakers: [1] [2]                   │
-│                                                             │
-│  Podcast Mode:                                              │
-│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐          │
-│  │ Audio Only  │ │ Video Only  │ │ Audio+Video │          │
-│  │   ($0.22)   │ │   ($2.02)   │ │   ($2.24)   │          │
-│  └─────────────┘ └─────────────┘ └─────────────┘          │
-│                                                             │
-│  Est. Cost: $0.22 (audio only) vs $2.02 (with video)       │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### 6.2 Script Editor - Audio Only Mode
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  Script Editor                                              │
-├─────────────────────────────────────────────────────────────┤
-│  📻 Audio-Only Mode                                         │
-│  ┌─────────────────────────────────────────────────────┐    │
-│  │ Scene 1: Introduction (90s)                     [Edit]│
-│  │   Host: Welcome to today's episode on AI...         │
-│  │   Host: Today we're diving deep into how AI...      │
-│  │   Host: I'm excited to share three key insights...  │
-│  │   ... (6-10 lines for audio)                        │
-│  │                                                      │
-│  │ Scene 2: Main Topic (120s)                      [Edit]│
-│  │   ...                                               │
-│  └─────────────────────────────────────────────────────┘    │
-│                                                             │
-│  [Generate Audio] $0.04      [Generate Image] Disabled    │
-│  [Generate Video] Disabled                                   │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### 6.3 Script Editor - Video Only Mode (Current)
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  Script Editor                                              │
-├─────────────────────────────────────────────────────────────┤
-│  🎬 Video Mode                                               │
-│  ┌─────────────────────────────────────────────────────┐    │
-│  │ Scene 1: Intro (30s)          [Image] [Audio] [V] │
-│  │ Scene 2: Hook (30s)            [Image] [Audio] [V]  │
-│  │ Scene 3: Content (45s)         [Image] [Audio] [V]  │
-│  │ Scene 4: Example (30s)         [Image] [Audio] [V]  │
-│  │ Scene 5: CTA (15s)             [Image] [Audio] [V]   │
-│  └─────────────────────────────────────────────────────┘    │
-│                                                             │
-│  [Generate Audio] $0.19   [Generate Image] $0.10           │
-│  [Generate Video] $1.50                                     │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### 6.4 Script Editor - Audio + Video Mode (Both)
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  Script Editor                             [Audio] [Video] │
-├─────────────────────────────────────────────────────────────┤
-│  ┌─────────────────────────────────────────────────────┐  │
-│  │ [Audio] Tab | [Video] Tab                           │  │
-│  ├─────────────────────────────────────────────────────┤  │
-│  │ Audio Script:                                        │  │
-│  │   Scene 1: Intro (90s) - 8 lines                   │  │
-│  │   Scene 2: Deep Dive (120s) - 10 lines              │  │
-│  │                                                      │  │
-│  │ [Generate Audio] $0.04                              │  │
-│  └─────────────────────────────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────┘
-OR
-┌─────────────────────────────────────────────────────────────┐
-│  Script Editor                             [Audio] [Video] │
-├─────────────────────────────────────────────────────────────┤
-│  ┌─────────────────────────────────────────────────────┐  │
-│  │ [Audio] Tab | [Video] Tab                           │  │
-│  ├─────────────────────────────────────────────────────┤  │
-│  │ Video Script:                                       │  │
-│  │   Scene 1: Intro (30s)    [Img] [Aud] [Vid]         │  │
-│  │   Scene 2: Hook (30s)      [Img] [Aud] [Vid]        │  │
-│  │   Scene 3: Content (45s)   [Img] [Aud] [Vid]        │  │
-│  │                                                      │  │
-│  │ [Generate Audio] [Generate Image] [Generate Video]  │  │
-│  └─────────────────────────────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### 6.5 Cost Comparison UI
-
-| Mode | Scenes | Lines/Scene | TTS Cost | Video Cost | Total |
-|------|--------|-------------|----------|------------|-------|
-| Audio Only | 3-4 | 6-10 | $0.19 | $0 | **$0.22** |
-| Video Only | 5-6 | 2-4 | $0.19 | $1.50 | **$1.69** |
-| Audio+Video | 3-4 + 5-6 | varies | $0.19 | $1.50 | **$1.72** |
-
---
-
-## 7. Testing Plan
-
-### 7.1 Unit Tests
-
-1. Test character count calculation
-2. Test scene batching logic (under 10k chars)
-3. Test cost estimation accuracy
-
-### 7.2 Integration Tests
-
-1. Generate audio for 10-minute podcast with 5 scenes
-2. Verify all scenes generate correctly
-3. Verify cost tracking in database
-
-### 7.3 Performance Tests
-
-1. Measure time for batched vs sequential API calls
-2. Verify no timeout issues with longer text
-
---
-
-## 8. Success Metrics
-
-| Metric | Target | Current |
-|--------|--------|---------|
-| API calls per 5-min podcast | 5 | 7 |
-| Cost per 5-min audio podcast | $0.22 | $0.22 + video |
-| User-visible savings | 50%+ | N/A |
-| Scene length default | 60s | 45s |
-
---
-
-## 9. Appendix: Related Files
-
-### Backend
- `backend/services/llm_providers/main_audio_generation.py` - TTS cost calculation
- `backend/api/podcast/handlers/audio.py` - Audio generation endpoint
- `backend/api/podcast/handlers/script.py` - Script generation
- `backend/services/subscription/pricing_service.py` - Pricing configuration
-
-### Frontend  
- `frontend/src/services/podcastApi.ts` - Cost estimation
- `frontend/src/components/PodcastMaker/CreateModal.tsx` - Create UI
- `frontend/src/components/PodcastMaker/types.ts` - Type definitions
-
---
-
-## Document History
-
-| Version | Date | Author | Changes |
-|---------|------|--------|---------|
-| 1.0 | 2026-04-08 | ALwrity Team | Initial document creation |
-
---
-
-*This document serves as the reference for audio-only podcast optimization in ALwrity Podcast Maker.*
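
Section 3.2 of the removed plan describes "Smart Chunking" only as a comment inside a code stub. The following is a minimal Python sketch of that grouping, assuming scenes arrive as plain strings. `batch_scenes` is a hypothetical helper for illustration, not a function from the ALwrity codebase, and the `<#x#>` marker is copied literally from the plan with `x` left as a placeholder for a pause length.

```python
from typing import List

MAX_CHARS_PER_REQUEST = 9500  # buffer under the 10,000-character backend limit
PAUSE_MARKER = "<#x#>"        # as written in the plan; x is a placeholder


def batch_scenes(
    scene_texts: List[str],
    max_chars: int = MAX_CHARS_PER_REQUEST,
    pause: str = PAUSE_MARKER,
) -> List[str]:
    """Greedily join consecutive scene texts into request-sized batches.

    Hypothetical helper: scenes keep their order, and a new batch starts
    whenever adding the next scene (plus a pause marker) would exceed max_chars.
    """
    batches: List[str] = []
    current = ""
    for text in scene_texts:
        if len(text) > max_chars:
            # A single oversized scene cannot be batched; surface it early.
            raise ValueError(
                f"Scene is too long ({len(text)} characters). Maximum is {max_chars}."
            )
        candidate = text if not current else current + pause + text
        if len(candidate) > max_chars:
            batches.append(current)
            current = text
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Under these assumptions, three scenes of 5,000, 4,000, and 3,000 characters would collapse into two requests instead of three, because the first two fit under the 9,500-character buffer once joined by the marker.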