Base code

Kunthawat Greethong
2026-01-08 22:39:53 +07:00
parent 697115c61a
commit c35fa52117
2169 changed files with 626670 additions and 0 deletions


@@ -0,0 +1,148 @@
# AI Podcast Backend Reference
Curated overview of the backend surfaces that the AI Podcast Maker
should call. Covers service clients, research providers, subscription
controls, and FastAPI routes relevant to analysis, research, scripting,
and rendering.
---
## WaveSpeed & Audio Infrastructure
- `backend/services/wavespeed/client.py`
  - `WaveSpeedClient.submit_image_to_video(model_path, payload)`:
    submits WAN 2.5 / InfiniteTalk jobs and returns prediction IDs.
  - `WaveSpeedClient.get_prediction_result(prediction_id)` /
    `poll_until_complete(...)`: shared polling helpers for render jobs.
  - `WaveSpeedClient.generate_image(...)`: synchronous Ideogram V3 /
    Qwen image bytes (mirrors Image Studio usage).
  - `WaveSpeedClient.generate_speech(...)`: Minimax Speech 02 HD via
    WaveSpeed; accepts `voice_id`, `speed`, `sample_rate`, etc. Returns
    raw audio bytes (sync) or prediction IDs (async).
  - `WaveSpeedClient.optimize_prompt(...)`: prompt optimizer that can
    improve image/video prompts before rendering.
- `backend/services/wavespeed/infinitetalk.py`
  - `animate_scene_with_voiceover(...)`: wraps InfiniteTalk (image +
    narration to talking video). Enforces payload limits, pulls the
    final MP4, and reports cost/duration metadata.
- `backend/services/llm_providers/main_audio_generation.py`
  - `generate_audio(...)`: subscription-aware TTS orchestration built
    on `WaveSpeedClient.generate_speech`. Applies PricingService checks,
    records UsageSummary/APIUsageLog entries, and returns provider/model
    metadata for frontends.
---
## Research Providers & Adapters
- `backend/services/blog_writer/research/research_service.py`
  - Central orchestrator for grounded research. Supports Google Search
    grounding (Gemini) and Exa neural search via a configurable provider.
  - Calls `validate_research_operations` / `validate_exa_research_operations`
    before touching external APIs and logs usage through PricingService.
  - Returns fact cards (`ResearchSource`, `GroundingMetadata`) already
    normalized for downstream mapping.
- `backend/services/blog_writer/research/exa_provider.py`
  - `ExaResearchProvider.search(...)`: executes Exa queries, converts
    results into `ResearchSource` objects, estimates cost, and tracks it.
  - Provides helpers for excerpt extraction, aggregation, and usage
    tracking (`track_exa_usage`).
- `backend/services/llm_providers/gemini_grounded_provider.py`
  - Implements Gemini + Google Grounding calls with support for cached
    metadata, chunk/support parsing, and debugging hooks used by Story
    Writer and LinkedIn flows.
- `backend/api/research_config.py`
  - Exposes feature flags such as `exa_available`, suggested categories,
    and other metadata needed by the frontend to decide provider options.
---
## Subscription & Pre-flight Validation
- `backend/services/subscription/preflight_validator.py`
  - `validate_research_operations(pricing_service, user_id, gpt_provider)`:
    blocks research runs if Gemini/HF token budgets would be exceeded
    (covers Google Grounding + analyzer passes).
  - `validate_exa_research_operations(...)`: same for Exa workflows;
    validates Exa call count plus follow-up LLM usage.
  - `validate_image_generation_operations(...)`,
    `validate_image_upscale_operations(...)`,
    `validate_image_editing_operations(...)`: templates for validating
    other expensive steps (useful for render queue and avatar creation).
- `backend/services/subscription/pricing_service.py`
  - Provides `check_usage_limits`, `check_comprehensive_limits`, and
    plan metadata (limits per provider) used across validators.
Frontends must call these validators (via thin API wrappers) before
initiating script generation, research, or rendering to surface tier
errors without wasting API calls.
---
## REST Routes to Reuse
### Story Writer (`backend/api/story_writer/router.py`)
- `POST /api/story/generate-setup`: generate initial story setups from
  an idea (`story_setup.py::generate_story_setup`).
- `POST /api/story/generate-outline`: structured outline generation via
  Gemini with persona/settings context.
- `POST /api/story/generate-images`: batch scene image creation backed
  by WaveSpeed (WAN 2.5 / Ideogram). Returns per-scene URLs + metadata.
- `POST /api/story/generate-ai-audio`: Minimax Speech 02 HD render for
  a single scene with knob controls (voice, speed, pitch, emotion).
- `POST /api/story/optimize-prompt`: WaveSpeed prompt optimization API
  for cleaning up image/video prompts before rendering.
- `POST /api/story/generate-audio`: legacy multi-scene TTS (gTTS) if a
  lower-cost fallback is needed.
- `GET /api/story/images/{filename}` & `/audio/{filename}`: authenticated
  asset delivery for generated media.
These endpoints already enforce auth, asset tracking, and subscription
limits; the podcast UI should simply adopt their payloads.
### Blog Writer (`backend/api/blog_writer/router.py`)
- `POST /api/blog/research` (inside router earlier in file): executes
  grounded research via Google or Exa depending on `provider`.
- `POST /api/blog/flow-analysis/basic|advanced`: example of long-running
  job orchestration with task IDs (pattern for script/performance analysis).
- `POST /api/blog/seo/analyze` & `/seo/metadata`: illustrate how to pass
  authenticated user IDs into PricingService checks, useful for podcast
  metadata generation.
- Cache endpoints (`GET/DELETE /api/blog/cache/*`): provide research
  cache stats/clear operations that podcast flows can reuse.
### Image Studio (`backend/api/images.py`)
- `POST /api/images/generate`: subscription-aware image creation with
  asset tracking (pattern for cost estimates + upload paths).
- `GET /api/images/image-studio/images/{file}`: serves generated images;
  demonstrates query-token auth used by `<img>` tags.
Reuse these routes for avatar defaults or background art inside the
podcast builder instead of writing bespoke services.
---
## Key Data Flow Hooks
- Research job polling: `backend/api/story_writer/routes/story_tasks.py`
plus `task_manager.py` define consistent job IDs and status payloads.
- Media job polling: `StoryImageGenerationService` and `StoryAudioGenerationService`
already drop artifacts into disk/CDN with tracked filenames; the
podcast render queue can subscribe to those patterns.
- Persona assets: onboarding routes in `backend/api/onboarding_endpoints.py`
expose upload endpoints for voice/avatars; pass resulting asset IDs to
the podcast APIs instead of raw files.
Use this reference to swap out the mock podcast helpers with production
APIs while staying inside existing authentication, subscription, and
asset storage conventions.


@@ -0,0 +1,187 @@
# AI Podcast Maker - User Experience Enhancements
## ✅ Implemented Enhancements
### 1. **Hidden AI Backend Details**
- **Before**: "WaveSpeed audio rendering", "Google Grounding", "Exa Neural Search"
- **After**:
- "Natural voice narration" instead of "WaveSpeed audio"
- "Standard Research" and "Deep Research" instead of technical provider names
- "Voice" and "Visuals" instead of "TTS" and "Avatars"
- User-friendly descriptions throughout
### 2. **Improved Dashboard Integration**
- Updated `toolCategories.ts` with better description:
- **Old**: "Generate research-grounded podcast scripts and audio"
- **New**: "Create professional podcast episodes with AI-powered research, scriptwriting, and voice narration"
- Updated features list to be user-focused:
- **Old**: ['Research Workflow', 'Editable Script', 'Scene Approvals', 'WaveSpeed Audio']
- **New**: ['AI Research', 'Smart Scripting', 'Voice Narration', 'Export & Share', 'Episode Library']
### 3. **Inline Audio Player**
- Added `InlineAudioPlayer` component that:
- Plays audio directly in the UI (no new tab)
- Shows progress bar with time scrubbing
- Displays current time and duration
- Includes download button
- Better user experience than opening new tabs
### 4. **Enhanced Export & Sharing**
- Download button for completed audio files
- Share button with native sharing API support
- Fallback to clipboard copy if sharing not available
- Proper file naming based on scene title
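
The share fallback and scene-based file naming described above could look roughly like the sketch below. `pickShareStrategy`, `sceneTitleToFilename`, the `ShareEnv` shape, and the `mp3` default extension are hypothetical names and values chosen for illustration, not the component's actual code. In a browser, `canShare` and `canCopy` would be derived from `navigator.share` and `navigator.clipboard`.

```typescript
// Hypothetical helpers; names and shapes are assumptions, not the real component code.
type ShareStrategy = 'native' | 'clipboard' | 'none';

interface ShareEnv {
  canShare: boolean; // e.g. typeof navigator.share === 'function'
  canCopy: boolean;  // e.g. navigator.clipboard is available
}

// Prefer the native share sheet, then fall back to copying the link.
function pickShareStrategy(env: ShareEnv): ShareStrategy {
  if (env.canShare) return 'native';
  if (env.canCopy) return 'clipboard';
  return 'none';
}

// Build a download filename from the scene title.
function sceneTitleToFilename(title: string, extension: string = 'mp3'): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each non-alphanumeric run into one dash
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
  return `${slug || 'scene'}.${extension}`;
}
```

Keeping the strategy decision separate from the browser calls makes the fallback order easy to unit test without a DOM.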
### 5. **Better Button Labels & Tooltips**
- "Preview Sample" instead of "Preview"
- "Generate Audio" instead of "Start Full Render"
- "Help" instead of "Docs"
- "My Episodes" button for future episode library
- All tooltips explain user benefits, not technical details
### 6. **Improved Cost Display**
- Changed "TTS" to "Voice"
- Changed "Avatars" to "Visuals"
- Added tooltips explaining what each cost item means
- Removed technical provider names from cost display
## 🚀 Recommended Future Enhancements
### High Priority
#### 1. **Episode Templates & Presets**
```typescript
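// Suggested templates (illustrative shape; field names are not final):
const episodeTemplates = [
  { name: 'Interview Style', speakers: '2', style: 'conversational' },
  { name: 'Educational', speakers: '1', style: 'structured' },
  { name: 'Storytelling', speakers: '1', style: 'narrative' },
  { name: 'News/Update', speakers: '1', style: 'factual' },
  { name: 'Roundtable Discussion', speakers: '3+' },
];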
```
**Benefits**:
- Faster episode creation
- Consistent quality
- Better for beginners
#### 2. **Episode Library/History**
- Save completed episodes
- View past episodes
- Re-edit or regenerate from saved projects
- Export history
**Implementation**:
- Add backend endpoint to save/load episodes
- Create episode list view
- Add search/filter functionality
#### 3. **Transcript & Show Notes Export**
- Auto-generate transcript from script
- Create show notes with:
- Episode summary
- Key points
- Timestamps
- Links to sources
- Export formats: PDF, Markdown, HTML
#### 4. **Cost Display Improvements**
- Show in credits (if subscription-based)
- "Estimated 5 credits" instead of "$2.50"
- Progress bar showing remaining budget
- Warning when approaching limits
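
A credit-based display might be derived as in the sketch below. The conversion rate of $0.50 per credit is an assumption inferred only from the "5 credits" vs "$2.50" example above, and `toCredits`, `budgetStatus`, and the 80% warning threshold are hypothetical.

```typescript
// Assumption: 1 credit = $0.50, inferred from the "5 credits" vs "$2.50" example only.
const USD_PER_CREDIT = 0.5;

// Round up so the UI never under-reports what an action will consume.
function toCredits(usd: number): number {
  return Math.ceil(usd / USD_PER_CREDIT);
}

interface BudgetStatus {
  remainingCredits: number;
  usedFraction: number; // 0..1, suitable for a progress bar
  warn: boolean;        // true once usage reaches the warning threshold
}

function budgetStatus(usedCredits: number, limitCredits: number, warnAt = 0.8): BudgetStatus {
  const usedFraction = limitCredits > 0 ? Math.min(usedCredits / limitCredits, 1) : 1;
  return {
    remainingCredits: Math.max(limitCredits - usedCredits, 0),
    usedFraction,
    warn: usedFraction >= warnAt,
  };
}
```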
#### 5. **Quick Start Wizard**
- Step-by-step guided creation
- Template selection
- Smart defaults based on template
- Skip advanced options for beginners
### Medium Priority
#### 6. **Real-time Collaboration**
- Share draft episodes with team
- Comments on scenes
- Approval workflow
- Version history
#### 7. **Voice Customization**
- Voice library with samples
- Voice cloning from samples
- Multiple voices per episode
- Voice emotion preview
#### 8. **Smart Editing**
- AI-powered script suggestions
- Grammar and flow improvements
- Pacing recommendations
- Natural pause detection
#### 9. **Analytics & Insights**
- Episode performance metrics
- Listener engagement predictions
- SEO optimization suggestions
- Social sharing optimization
#### 10. **Integration Features**
- Direct upload to podcast platforms (Spotify, Apple Podcasts)
- RSS feed generation
- Social media preview cards
- Blog post integration
### Low Priority / Nice to Have
#### 11. **Background Music**
- Royalty-free music library
- Auto-sync with script pacing
- Fade in/out controls
#### 12. **Multi-language Support**
- Translate scripts
- Generate audio in multiple languages
- Localized voice options
#### 13. **Mobile App**
- Create episodes on the go
- Voice recording integration
- Quick edits
#### 14. **AI Guest Suggestions**
- Suggest relevant experts
- Generate interview questions
- Contact information lookup
## 📋 Implementation Checklist
### Completed ✅
- [x] Hide technical terms (WaveSpeed, Google Grounding, Exa)
- [x] Update dashboard description
- [x] Add inline audio player
- [x] Add download/share buttons
- [x] Improve button labels and tooltips
- [x] Better cost display with user-friendly terms
### Next Steps (Recommended Order)
1. [ ] Episode templates/presets
2. [ ] Episode library backend + UI
3. [ ] Transcript export
4. [ ] Show notes generation
5. [ ] Cost display in credits
6. [ ] Quick start wizard
## 🎯 User Experience Principles Applied
1. **Hide Complexity**: Users don't need to know about "WaveSpeed" or "Minimax" - they just want good audio
2. **Focus on Outcomes**: "Generate Audio" not "Start Full Render"
3. **Provide Context**: Tooltips explain *why* not *how*
4. **Reduce Friction**: Inline player instead of new tabs
5. **Enable Sharing**: Easy export and sharing options
6. **Guide Users**: Clear labels and helpful descriptions
## 💡 Key Insights
- **Technical terms confuse users**: "WaveSpeed" means nothing to end users
- **Actions should be clear**: "Generate Audio" is better than "Start Full Render"
- **Inline experiences are better**: No need to open new tabs for previews
- **Export is essential**: Users need to download and share their work
- **Templates reduce friction**: Most users want quick starts, not full customization


@@ -0,0 +1,295 @@
# Podcast Maker External API Call Analysis
## Overview
This document analyzes all external API calls made during the podcast creation workflow and how they scale with duration, number of speakers, and other factors.
---
## External API Providers
1. **Gemini (Google)** - LLM for story setup and script generation
2. **Google Grounding** - Research via Gemini's native search grounding
3. **Exa** - Alternative neural search provider for research
4. **WaveSpeed** - API gateway for:
- **Minimax Speech 02 HD** - Text-to-Speech (TTS)
- **InfiniteTalk** - Avatar animation (image + audio → video)
---
## Workflow Phases & API Calls
### Phase 1: Project Creation (`createProject`)
**External API Calls:**
1. **Gemini LLM** - Story setup generation
- **Endpoint**: `/api/story/generate-setup`
- **Backend**: `storyWriterApi.generateStorySetup()`
- **Service**: `backend/services/story_writer/service_components/setup.py`
- **Function**: `llm_text_gen()` → Gemini API
- **Calls per project**: **1 call**
- **Scaling**: Fixed (1 call regardless of duration)
2. **Research Config** (Optional)
- **Endpoint**: `/api/research-config`
- **Calls per project**: **0-1 call** (cached)
- **Scaling**: Fixed
**Total Phase 1**: **1 external API call** (fixed), plus an optional cached request to the backend's `/api/research-config` endpoint
---
### Phase 2: Research (`runResearch`)
**External API Calls:**
1. **Google Grounding** (via Gemini) OR **Exa Neural Search**
- **Endpoint**: `/api/blog/research/start` → async task
- **Backend**: `blogWriterApi.startResearch()`
- **Service**: `backend/services/blog_writer/research/research_service.py`
- **Provider Selection**:
- **Google Grounding**: Uses Gemini's native Google Search grounding
- **Exa**: Direct Exa API calls
- **Calls per research**: **1 call** (handles all keywords in one request)
- **Scaling**:
- **Fixed per research operation** (1 call regardless of number of queries)
- **Queries are batched** into a single research request
- **Number of queries**: Typically 1-6 (from `mapPersonaQueries`)
**Polling Calls:**
- **Internal task polling**: `blogWriterApi.pollResearchStatus()`
- **Not external API calls** (internal task status checks)
- **Polling frequency**: Every 2.5 seconds, max 120 attempts (5 minutes)
**Total Phase 2**: **1 external API call** (fixed per research operation)
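
The polling cadence above (every 2.5 seconds, at most 120 attempts) can be sketched as a generic loop. This is not the implementation of `pollResearchStatus`; the `TaskStatus` shape, `pollUntilDone`, and `maxWaitSeconds` are hypothetical names used only to illustrate the pattern.

```typescript
// Hypothetical task-status shape; the real payload may differ.
interface TaskStatus<T> {
  status: 'pending' | 'running' | 'completed' | 'failed';
  result?: T;
  error?: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Upper bound on the total wait, in seconds.
function maxWaitSeconds(intervalMs: number, maxAttempts: number): number {
  return (intervalMs * maxAttempts) / 1000;
}

// Poll an internal status endpoint until the task completes, fails, or times out.
async function pollUntilDone<T>(
  fetchStatus: () => Promise<TaskStatus<T>>,
  intervalMs: number = 2500,
  maxAttempts: number = 120,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const task = await fetchStatus();
    if (task.status === 'completed') return task.result as T;
    if (task.status === 'failed') throw new Error(task.error ?? 'Task failed');
    await sleep(intervalMs);
  }
  throw new Error(`Task did not finish within ${maxWaitSeconds(intervalMs, maxAttempts)} seconds`);
}
```

With the defaults, the loop gives up after 120 × 2.5 s = 300 s, which matches the five-minute ceiling stated above.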
---
### Phase 3: Script Generation (`generateScript`)
**External API Calls:**
1. **Gemini LLM** - Story outline generation
- **Endpoint**: `/api/story/generate-outline`
- **Backend**: `storyWriterApi.generateOutline()`
- **Service**: `backend/services/story_writer/service_components/outline.py`
- **Function**: `llm_text_gen()` → Gemini API
- **Calls per script**: **1 call**
- **Scaling**:
- **Fixed per script generation** (1 call regardless of duration)
- **Duration affects output length** (more scenes), but not number of API calls
**Total Phase 3**: **1 external API call** (fixed)
---
### Phase 4: Audio Rendering (`renderSceneAudio`)
**External API Calls:**
1. **WaveSpeed → Minimax Speech 02 HD** - Text-to-Speech
- **Endpoint**: `/api/story/generate-ai-audio`
- **Backend**: `storyWriterApi.generateAIAudio()`
- **Service**: `backend/services/wavespeed/client.py::generate_speech()`
- **External API**: WaveSpeed API → Minimax Speech 02 HD
- **Calls per scene**: **1 call per scene**
- **Scaling with duration**:
- **Number of scenes** = `Math.ceil((duration * 60) / scene_length_target)`
- **Default scene_length_target**: 45 seconds
- **Example calculations**:
- 5 minutes → `ceil(300 / 45)` = **7 scenes** = **7 TTS calls**
- 10 minutes → `ceil(600 / 45)` = **14 scenes** = **14 TTS calls**
- 15 minutes → `ceil(900 / 45)` = **20 scenes** = **20 TTS calls**
- 30 minutes → `ceil(1800 / 45)` = **40 scenes** = **40 TTS calls**
- **Scaling with speakers**:
- **Fixed per scene** (1 call per scene regardless of speakers)
- **Speakers affect text splitting** (lines per speaker), but not API calls
- **Text length per call**:
- **Characters per scene** ≈ `(scene_length_target * 15)` (assuming ~15 chars/second)
- **5-minute podcast**: ~675 chars/scene × 7 scenes = ~4,725 total chars
- **30-minute podcast**: ~675 chars/scene × 40 scenes = ~27,000 total chars
**Total Phase 4**: **N external API calls** where **N = number of scenes**
---
### Phase 5: Video Rendering (`generateVideo`) - Optional
**External API Calls:**
1. **WaveSpeed → InfiniteTalk** - Avatar animation
- **Endpoint**: `/api/podcast/render/video`
- **Backend**: `podcastApi.generateVideo()`
- **Service**: `backend/services/wavespeed/infinitetalk.py::animate_scene_with_voiceover()`
- **External API**: WaveSpeed API → InfiniteTalk
- **Calls per scene**: **1 call per scene** (if video is generated)
- **Scaling with duration**:
- **Same as audio rendering**: 1 call per scene
- **5 minutes**: **7 video calls**
- **10 minutes**: **14 video calls**
- **15 minutes**: **20 video calls**
- **30 minutes**: **40 video calls**
- **Scaling with speakers**:
- **Fixed per scene** (1 call per scene regardless of speakers)
- **Avatar image is provided** (not generated per speaker)
**Polling Calls:**
- **Internal task polling**: `podcastApi.pollTaskStatus()`
- **Not external API calls** (internal task status checks)
- **Polling frequency**: Every 2.5 seconds until completion (can take up to 10 minutes per video)
**Total Phase 5**: **N external API calls** where **N = number of scenes** (if video is enabled)
---
## Summary: Total External API Calls
### Minimum Workflow (No Video, 5-minute podcast)
1. Project Creation: **1 call** (Gemini - story setup)
2. Research: **1 call** (Google Grounding or Exa)
3. Script Generation: **1 call** (Gemini - outline)
4. Audio Rendering: **7 calls** (Minimax TTS - 7 scenes)
5. Video Rendering: **0 calls** (not enabled)
**Total**: **10 external API calls** for a 5-minute podcast
### Full Workflow (With Video, 5-minute podcast)
1. Project Creation: **1 call** (Gemini - story setup)
2. Research: **1 call** (Google Grounding or Exa)
3. Script Generation: **1 call** (Gemini - outline)
4. Audio Rendering: **7 calls** (Minimax TTS - 7 scenes)
5. Video Rendering: **7 calls** (InfiniteTalk - 7 scenes)
**Total**: **17 external API calls** for a 5-minute podcast
### Scaling with Duration
| Duration | Scenes | Audio Calls | Video Calls | Total (Audio Only) | Total (Audio + Video) |
|----------|--------|-------------|-------------|-------------------|----------------------|
| 5 min | 7 | 7 | 7 | 10 | 17 |
| 10 min | 14 | 14 | 14 | 17 | 31 |
| 15 min | 20 | 20 | 20 | 23 | 43 |
| 30 min | 40 | 40 | 40 | 43 | 83 |
**Formula**:
- **Scenes** = `ceil((duration_minutes * 60) / scene_length_target)`
- **Total (Audio Only)** = `3 + scenes` (3 fixed + N scenes)
- **Total (Audio + Video)** = `3 + (scenes * 2)` (3 fixed + N audio + N video)
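
The formulas above translate directly into code. The function names below are hypothetical; the constants (3 fixed calls, 45-second default scene length) come from this document.

```typescript
// Fixed external calls per podcast: story setup + research + script outline.
const FIXED_CALLS = 3;

// Scenes = ceil((duration_minutes * 60) / scene_length_target)
function sceneCount(durationMinutes: number, sceneLengthTargetSec: number = 45): number {
  return Math.ceil((durationMinutes * 60) / sceneLengthTargetSec);
}

// One TTS call per scene, plus one video call per scene when video is enabled.
function totalExternalCalls(
  durationMinutes: number,
  withVideo: boolean,
  sceneLengthTargetSec: number = 45,
): number {
  const scenes = sceneCount(durationMinutes, sceneLengthTargetSec);
  return FIXED_CALLS + scenes * (withVideo ? 2 : 1);
}
```

For example, `totalExternalCalls(5, false)` and `totalExternalCalls(5, true)` reproduce the 5-minute row of the table.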
---
## Scaling Factors
### 1. Duration
- **Impact**: Linear scaling of rendering calls (audio + video)
- **Fixed calls**: 3 (setup, research, script)
- **Variable calls**: `2 * scenes` (if video enabled) or `1 * scenes` (audio only)
- **Scene count formula**: `ceil((duration * 60) / scene_length_target)`
### 2. Number of Speakers
- **Impact**: **No impact on external API calls**
- **Reason**:
- Text is split into lines per speaker **before** API calls
- Each scene makes **1 TTS call** regardless of speaker count
- Video uses **1 avatar image** (not per speaker)
### 3. Scene Length Target
- **Impact**: Affects number of scenes (and thus rendering calls)
- **Default**: 45 seconds
- **Shorter scenes** = More scenes = More API calls
- **Longer scenes** = Fewer scenes = Fewer API calls
### 4. Research Provider
- **Impact**: **No impact on call count**
- **Google Grounding**: 1 call (batched)
- **Exa**: 1 call (batched)
- **Both**: Same number of calls
### 5. Video Generation
- **Impact**: **Doubles rendering calls** (adds 1 call per scene)
- **Audio only**: `N` calls (N = scenes)
- **Audio + Video**: `2N` calls (N audio + N video)
---
## Cost Implications
### API Call Costs (Estimated)
1. **Gemini LLM** (Story Setup & Script):
- **Setup**: ~2,000 tokens → ~$0.001-0.002
- **Outline**: ~3,000-5,000 tokens → ~$0.002-0.005
- **Total**: ~$0.003-0.007 per podcast
2. **Google Grounding** (Research):
- **Per research**: ~1,200 tokens → ~$0.001-0.002
- **Fixed cost** regardless of query count
3. **Exa Neural Search** (Alternative):
- **Per research**: ~$0.005 (flat rate)
- **Fixed cost** regardless of query count
4. **Minimax TTS** (Audio):
- **Rate**: ~$0.05 per 1,000 characters (~$0.03 for a 675-character scene)
- **5-minute podcast**: ~4,725 chars → ~$0.24
- **30-minute podcast**: ~27,000 chars → ~$1.35
- **Scales linearly with duration**
5. **InfiniteTalk** (Video):
- **Rate**: ~$0.03-0.06 per second of video (depending on resolution)
- **5-minute podcast**: 7 scenes × 45s × $0.03 = ~$9.45
- **30-minute podcast**: 40 scenes × 45s × $0.03 = ~$54.00
- **Scales linearly with duration**
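
The TTS and video arithmetic above can be reproduced with the sketch below. The rates are the estimates quoted in this section, not confirmed provider pricing, and the function names are hypothetical.

```typescript
// Rates copied from the estimates above; treat them as assumptions, not quoted pricing.
const CHARS_PER_SECOND = 15;
const TTS_USD_PER_1K_CHARS = 0.05;
const VIDEO_USD_PER_SECOND = 0.03; // low end of the $0.03-0.06 range

// TTS cost: characters = scenes * scene length * chars per second.
function ttsCostUsd(scenes: number, sceneLengthSec: number = 45): number {
  const chars = scenes * sceneLengthSec * CHARS_PER_SECOND;
  return (chars / 1000) * TTS_USD_PER_1K_CHARS;
}

// Video cost: billed per second of rendered scene.
function videoCostUsd(scenes: number, sceneLengthSec: number = 45): number {
  return scenes * sceneLengthSec * VIDEO_USD_PER_SECOND;
}
```

Both functions are linear in the scene count, so cost grows with duration in steps of one scene rather than continuously.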
### Total Cost Examples
| Duration | Audio Only | Audio + Video (720p) |
|----------|-----------|---------------------|
| 5 min | ~$0.25 | ~$9.70 |
| 10 min | ~$0.48 | ~$19.40 |
| 15 min | ~$0.69 | ~$27.70 |
| 30 min | ~$1.36 | ~$55.40 |
**Note**: Costs are estimates and may vary based on actual API pricing, text length, and video resolution.
---
## Optimization Opportunities
1. **Batch TTS Calls**: Currently 1 call per scene. Could batch multiple scenes if API supports it.
2. **Cache Research Results**: Already implemented for exact keyword matches.
3. **Parallel Rendering**: Audio and video rendering could be parallelized per scene.
4. **Scene Length Optimization**: Longer scenes = fewer API calls (but may reduce quality).
5. **Video Optional**: Video generation doubles costs - make it optional/on-demand.
---
## Internal vs External Calls
### Internal (Not Counted as External)
- Preflight validation checks (`/api/billing/preflight`)
- Task status polling (`/api/story/task/{taskId}/status`)
- Project persistence (`/api/podcast/projects/*`)
- Content asset library (`/api/content-assets/*`)
### External (Counted)
- Gemini LLM (story setup, script generation)
- Google Grounding (research)
- Exa (research alternative)
- WaveSpeed → Minimax TTS (audio)
- WaveSpeed → InfiniteTalk (video)
---
## Conclusion
**Key Findings:**
1. **Fixed overhead**: 3 external API calls per podcast (setup, research, script)
2. **Variable overhead**: 1-2 calls per scene (audio, optionally video)
3. **Duration is the primary scaling factor** for rendering calls
4. **Number of speakers does NOT affect API call count**
5. **Video generation doubles rendering API calls**
**Recommendations:**
- Monitor API call counts and costs per podcast duration
- Consider batching strategies for TTS calls if supported
- Make video generation optional/on-demand to reduce costs
- Optimize scene length to balance quality vs. API call count


@@ -0,0 +1,167 @@
# Podcast Maker - Persistence & Asset Library Integration
## ✅ Phase 1 Implementation Complete
### 1. **Backend Changes**
#### AssetSource Enum Update
- ✅ Added `PODCAST_MAKER = "podcast_maker"` to `backend/models/content_asset_models.py`
- Allows podcast episodes to be tracked in the unified asset library
#### Content Assets API Enhancement
- ✅ Added `POST /api/content-assets/` endpoint in `backend/api/content_assets/router.py`
- Enables frontend to save audio files directly to asset library
- Validates asset_type and source_module enums
- Returns created asset with full metadata
### 2. **Frontend Changes**
#### Persistence Hook (`usePodcastProjectState.ts`)
- ✅ Created comprehensive state management hook
- ✅ Auto-saves to `localStorage` on every state change
- ✅ Restores state on page load/refresh
- ✅ Tracks all project data:
- Project metadata (id, idea, duration, speakers)
- Step results (analysis, queries, research, script)
- Render jobs with status and progress
- Settings (knobs, research provider, budget cap)
- UI state (current step, visibility flags)
- ✅ Handles Set serialization/deserialization for JSON storage
- ✅ Provides helper functions: `resetState`, `initializeProject`
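
`JSON.stringify` serializes a `Set` as `{}`, which is why the hook needs explicit Set handling. A minimal sketch of that round trip follows. `MiniState` is a simplified stand-in for the real state, and `serializeState` / `deserializeState` are hypothetical names, not the hook's actual helpers.

```typescript
// Simplified stand-in for the persisted state; only the Set-valued field matters here.
interface MiniState {
  selectedQueries: Set<string>;
  currentStep: string | null;
}

// Convert the Set to an array before stringifying so its members survive.
function serializeState(state: MiniState): string {
  return JSON.stringify({ ...state, selectedQueries: Array.from(state.selectedQueries) });
}

// Rebuild the Set on load; tolerate missing fields from older snapshots.
function deserializeState(raw: string): MiniState {
  const parsed = JSON.parse(raw) as {
    selectedQueries?: string[];
    currentStep?: string | null;
  };
  return {
    selectedQueries: new Set<string>(parsed.selectedQueries ?? []),
    currentStep: parsed.currentStep ?? null,
  };
}
```

In the browser, the serialized string would be what gets written to and read from `localStorage`.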
#### Podcast Dashboard Integration
- ✅ Refactored `PodcastDashboard.tsx` to use persistence hook
- ✅ All state now persists automatically
- ✅ Resume alert shows when project is restored
- ✅ "My Episodes" button navigates to Asset Library filtered by podcasts
- ✅ Recent Episodes preview component shows latest 6 episodes
#### Render Queue Enhancement
- ✅ Updated to use persisted render jobs
- ✅ Auto-saves completed audio files to Asset Library
- ✅ Includes metadata: project_id, scene_id, cost, provider, model
- ✅ Proper initialization when moving to render phase
#### Script Editor Enhancement
- ✅ Syncs script changes with persisted state
- ✅ Prevents regeneration if script already exists
- ✅ Scene approvals persist across refreshes
#### Asset Library Integration
- ✅ Updated `AssetLibrary.tsx` to read URL search params
- ✅ Supports filtering by `source_module` and `asset_type` from URL
- ✅ Navigation: `/asset-library?source_module=podcast_maker&asset_type=audio`
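
Reading and building the filter query string can be done with the standard `URLSearchParams` API, as in this sketch. `parseAssetFilters` and `buildAssetLibraryUrl` are hypothetical helpers, not the functions used in `AssetLibrary.tsx`.

```typescript
interface AssetFilters {
  sourceModule: string | null;
  assetType: string | null;
}

// Parse filters from a location.search-style string (a leading "?" is accepted).
function parseAssetFilters(search: string): AssetFilters {
  const params = new URLSearchParams(search);
  return {
    sourceModule: params.get('source_module'),
    assetType: params.get('asset_type'),
  };
}

// Build the Asset Library link, omitting filters that are not set.
function buildAssetLibraryUrl(filters: { sourceModule?: string; assetType?: string }): string {
  const params = new URLSearchParams();
  if (filters.sourceModule) params.set('source_module', filters.sourceModule);
  if (filters.assetType) params.set('asset_type', filters.assetType);
  const query = params.toString();
  return query ? `/asset-library?${query}` : '/asset-library';
}
```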
### 3. **API Service Updates**
#### Podcast API (`podcastApi.ts`)
- ✅ Added `saveAudioToAssetLibrary()` function
- ✅ Saves audio files with proper metadata
- ✅ Tags assets with project_id for easy filtering
- ✅ Includes cost, provider, and model information
## 🔄 How It Works
### LocalStorage Persistence Flow
1. **User creates project** → State saved to `localStorage` with key `podcast_project_state`
2. **Each step completion** → State automatically updated in `localStorage`
3. **Browser refresh** → State restored from `localStorage` on mount
4. **Resume alert** → Shows which step was in progress
5. **Audio generation** → Completed files saved to Asset Library via API
### Asset Library Integration Flow
1. **Audio render completes** → `saveAudioToAssetLibrary()` called
2. **Backend saves asset** → Creates entry in `content_assets` table
3. **Asset appears in library** → Filterable by `source_module=podcast_maker`
4. **User navigates** → "My Episodes" button opens filtered Asset Library view
5. **Unified management** → All podcast episodes visible alongside other content
## 📋 State Structure
```typescript
interface PodcastProjectState {
  // Project metadata
  project: { id: string; idea: string; duration: number; speakers: number } | null;

  // Step results
  analysis: PodcastAnalysis | null;
  queries: Query[];
  selectedQueries: Set<string>;
  research: Research | null;
  rawResearch: BlogResearchResponse | null;
  estimate: PodcastEstimate | null;
  scriptData: Script | null;

  // Render jobs
  renderJobs: Job[];

  // Settings
  knobs: Knobs;
  researchProvider: ResearchProvider;
  budgetCap: number;

  // UI state
  showScriptEditor: boolean;
  showRenderQueue: boolean;
  currentStep: 'create' | 'analysis' | 'research' | 'script' | 'render' | null;

  // Timestamps
  createdAt?: string;
  updatedAt?: string;
}
```
## 🎯 User Experience
### Resume After Refresh
- User creates project → Works on analysis → Refreshes browser
- ✅ Project state restored
- ✅ Resume alert shows "Resuming from Analysis step"
- ✅ User can continue where they left off
### Resume After Restart
- User completes research → Closes browser → Returns later
- ✅ Project state restored from localStorage
- ✅ All research data available
- ✅ Can proceed to script generation
### Asset Library Access
- User completes episode → Audio saved to library
- ✅ "My Episodes" button shows all podcast episodes
- ✅ Filtered view: `source_module=podcast_maker&asset_type=audio`
- ✅ Can download, share, favorite episodes
- ✅ Unified with all other ALwrity content
## 🚀 Phase 2: Database Persistence (Future)
For long-term persistence across devices/browsers:
1. **Create `podcast_projects` table** or use `content_assets` with project metadata
2. **Add endpoints**:
- `POST /api/podcast/projects` - Save project snapshot
- `GET /api/podcast/projects/{id}` - Load project
- `GET /api/podcast/projects` - List user's projects
3. **Sync strategy**: Save to DB after each major step completion
4. **Resume UI**: Show list of saved projects on dashboard
## ✅ Testing Checklist
- [x] Project state persists after browser refresh
- [x] Resume alert shows correct step
- [x] Script doesn't regenerate if already exists
- [x] Render jobs persist and restore correctly
- [x] Audio files save to Asset Library
- [x] Asset Library filters by podcast_maker
- [x] Navigation to Asset Library works
- [x] Recent Episodes preview displays correctly
- [x] No console errors or warnings
## 📝 Notes
- **localStorage limit**: ~5-10MB per origin, depending on the browser. Podcast projects are typically <100KB, so they fit comfortably.
- **Data loss risk**: localStorage can be cleared by user. Phase 2 (DB persistence) will address this.
- **Cross-device**: localStorage is browser-specific. Phase 2 will enable cross-device access.
- **Performance**: Auto-save happens on every state change. Debouncing could be added if needed.
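
If debouncing is added, a trailing-edge debounce is one option: rapid state changes collapse into a single write after a quiet period. The `debounce` helper below is a generic sketch, not code from `usePodcastProjectState.ts`, and the wait time is an arbitrary choice.

```typescript
// Trailing-edge debounce: only the last call within the wait window runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer); // cancel the pending call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args); // run with the most recent arguments
    }, waitMs);
  };
}
```

The trade-off is that a tab closed inside the wait window loses the last change, so the wait should stay short or be paired with a flush on unload.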


@@ -0,0 +1,261 @@
# AI Podcast Maker Integration Plan - Completion Status
## Overview
This document tracks the completion status of each item in the AI Podcast Maker Integration Plan.
---
## 1. Backend Discovery & Interfaces ✅ **COMPLETED**
**Status**: ✅ Complete
**Completed Items**:
- ✅ Reviewed existing services in `backend/services/wavespeed/`, `backend/services/minimax/`
- ✅ Reviewed research adapters (Google Grounding, Exa)
- ✅ Documented REST routes in `backend/api/story_writer/`, `backend/api/blog_writer/`
- ✅ Created `docs/AI_PODCAST_BACKEND_REFERENCE.md` with comprehensive API documentation
**Evidence**:
- `docs/AI_PODCAST_BACKEND_REFERENCE.md` exists and catalogs all relevant endpoints
- `frontend/src/services/podcastApi.ts` uses real backend endpoints
- Backend services properly integrated
---
## 2. Frontend Data Layer Refactor ✅ **COMPLETED**
**Status**: ✅ Complete
**Completed Items**:
- ✅ Replaced all mock helpers with real API wrappers in `podcastApi.ts`
- ✅ Integrated with `aiApiClient` and `pollingApiClient` for backend communication
- ✅ Implemented job polling helper (`waitForTaskCompletion`) for async research/render jobs
- ✅ All API calls use real endpoints (createProject, runResearch, generateScript, renderSceneAudio)
**Evidence**:
- `frontend/src/services/podcastApi.ts` - All functions use real API calls
- No mock data remaining in the codebase
- Proper error handling and async job polling implemented
---
## 3. Subscription & Cost Safeguards ⚠️ **PARTIALLY COMPLETED**
**Status**: ⚠️ Partial - Preflight checks implemented, but UI blocking needs enhancement
**Completed Items**:
- ✅ Pre-flight validation implemented (`ensurePreflight` function)
- ✅ Preflight checks before research (`runResearch`) - lines 286-291
- ✅ Preflight checks before script generation (`generateScript`) - lines 307-312
- ✅ Preflight checks before render operations (`renderSceneAudio`) - lines 373-378
- ✅ Preflight checks before preview (`previewLine`) - lines 344-349
- ✅ Cost estimation function (`estimateCosts`) implemented
- ✅ Estimate displayed in UI
**Missing/Incomplete Items**:
- ⚠️ UI blocking when preflight fails - errors are thrown but UI doesn't proactively prevent actions
- ⚠️ Budget cap enforcement - budget cap is set but not enforced before expensive operations
- ⚠️ Subscription tier-based UI restrictions - HD/multi-speaker modes not hidden for lower tiers
- ⚠️ Preflight validation UI feedback - users don't see why operations are blocked
**Evidence**:
- `frontend/src/services/podcastApi.ts` lines 210-217, 286-291, 307-312, 344-349, 373-378 show preflight checks
- `frontend/src/components/PodcastMaker/PodcastDashboard.tsx` shows estimate but no proactive blocking UI
**Recommendations**:
- Add UI blocking before render operations if preflight fails
- Enforce budget cap before expensive operations
- Hide premium features based on subscription tier
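
One way to approach the first two recommendations is to turn the preflight result and budget cap into a single decision that the UI evaluates before enabling an action. The sketch below is an assumption-laden illustration: `PreflightResult`, `GateInput`, `GateDecision`, and `evaluateGate` are hypothetical names, and the real preflight response and cost estimate shapes are not shown in this document.

```typescript
// Hypothetical shapes for illustration only.
interface PreflightResult {
  allowed: boolean;
  reason?: string;
}

interface GateInput {
  preflight: PreflightResult;
  estimatedCost: number; // cost of the operation about to run
  spentSoFar: number; // cost already incurred in this project
  budgetCap: number | null; // null means no cap was set
}

type GateDecision =
  | { canProceed: true }
  | { canProceed: false; message: string };

// Returns a decision the UI can use to disable a button and explain why.
function evaluateGate(input: GateInput): GateDecision {
  if (!input.preflight.allowed) {
    return {
      canProceed: false,
      message: input.preflight.reason ?? "Operation blocked by subscription limits.",
    };
  }
  if (
    input.budgetCap !== null &&
    input.spentSoFar + input.estimatedCost > input.budgetCap
  ) {
    const remaining = Math.max(0, input.budgetCap - input.spentSoFar);
    return {
      canProceed: false,
      message: `Estimated cost ${input.estimatedCost.toFixed(2)} exceeds remaining budget ${remaining.toFixed(2)}.`,
    };
  }
  return { canProceed: true };
}
```

Returning a message instead of throwing lets the dashboard disable the render button and show the reason in a tooltip, which also addresses the missing validation feedback.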
---
## 4. Research Workflow Integration ✅ **COMPLETED**
**Status**: ✅ Complete
**Completed Items**:
- ✅ "Generate queries" wired to backend (uses `storyWriterApi.generateStorySetup`)
- ✅ "Run research" wired to backend Google Grounding & Exa routes
- ✅ Query selection UI implemented
- ✅ Research provider selection (Google/Exa) implemented
- ✅ Async research jobs handled with polling (`waitForTaskCompletion`)
- ✅ Fact cards map correctly to script lines
- ✅ Error/timeout handling implemented
**Evidence**:
- `frontend/src/services/podcastApi.ts` lines 265-297 - `runResearch` function
- `frontend/src/components/PodcastMaker/PodcastDashboard.tsx` - Research UI with provider selection
- Research polling uses `blogWriterApi.pollResearchStatus`
---
## 5. Script Authoring & Approvals ✅ **COMPLETED**
**Status**: ✅ Complete
**Completed Items**:
- ✅ Script generation tied to story writer script API (Gemini-based)
- ✅ Scene IDs persisted from backend
- ✅ Scene approval toggles replaced with actual `/script/approve` API calls
- ✅ Backend gating matches UI state (`approveScene` function)
- ✅ TTS preview implemented using Minimax/WaveSpeed (`previewLine` function)
**Evidence**:
- `frontend/src/services/podcastApi.ts` lines 299-360 - `generateScript` function
- `frontend/src/services/podcastApi.ts` lines 404-411 - `approveScene` function
- `frontend/src/services/podcastApi.ts` lines 362-400 - `previewLine` function
- `backend/api/story_writer/routes/story_content.py` - Scene approval endpoint
---
## 6. Rendering Pipeline ⚠️ **PARTIALLY COMPLETED**
**Status**: ⚠️ Partial - Audio rendering works, but video/avatar rendering not implemented
**Completed Items**:
- ✅ Preview/full render buttons connected to WaveSpeed/Minimax render routes
- ✅ Scene content, knob settings supplied to render API
- ✅ Audio rendering working (`renderSceneAudio`)
- ✅ Render job status tracking in UI
- ✅ Audio files saved to asset library
**Missing/Incomplete Items**:
- ❌ Video rendering not implemented (only audio)
- ❌ Avatar rendering not implemented
- ❌ Job polling for render progress (`/media/jobs/{jobId}`) not implemented
- ❌ Render cancellation not implemented
- ⚠️ Polling intervals cleanup on unmount - needs verification
**Evidence**:
- `frontend/src/services/podcastApi.ts` lines 413-451 - `renderSceneAudio` function
- `frontend/src/components/PodcastMaker/RenderQueue.tsx` - Render queue UI
- Audio generation works; video and avatar features are not implemented
**Recommendations**:
- Implement video rendering using WaveSpeed InfiniteTalk
- Add avatar rendering support
- Implement job polling for long-running render operations
- Add cancellation support
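
The job polling and cleanup recommendations could be sketched as a poller that returns a stop function. This is an illustration under stated assumptions: the `RenderJob` shape and the `startRenderPolling` name are hypothetical, and `fetchJob` stands in for whatever call would query `/media/jobs/{jobId}`. Stopping the poller only stops client-side polling; it does not cancel the job on the backend.

```typescript
// Hypothetical job shape; the real render job payload is not shown in this document.
interface RenderJob {
  status: "queued" | "running" | "completed" | "failed";
  progress: number; // 0..1
}

// Starts polling and returns a function that stops it and clears any pending timer.
function startRenderPolling(
  fetchJob: () => Promise<RenderJob>,
  onUpdate: (job: RenderJob) => void,
  intervalMs: number,
  onError: (error: Error) => void = () => {},
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      const job = await fetchJob();
      if (stopped) return; // stop() was called while the request was in flight
      onUpdate(job);
      if (job.status === "completed" || job.status === "failed") return;
      timer = setTimeout(() => void tick(), intervalMs);
    } catch (err) {
      // Errors end polling; the caller decides whether to restart.
      if (!stopped) onError(err instanceof Error ? err : new Error(String(err)));
    }
  };

  void tick();

  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

In a React component, returning the stop function from a `useEffect` callback ensures the timer is cleared on unmount.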
---
## 7. Testing & Telemetry ⚠️ **PARTIALLY COMPLETED**
**Status**: ⚠️ Partial - Logging integrated, but no formal tests
**Completed Items**:
- ✅ Logging integrated with centralized logger (backend uses `loguru`)
- ✅ Error handling and user feedback implemented
- ✅ Structured events for observability (backend logging)
**Missing/Incomplete Items**:
- ❌ Integration tests not created
- ❌ Storybook fixtures not created
- ❌ UI transition tests not implemented
- ❌ Error state tests not implemented
**Evidence**:
- Backend services use `loguru` logger
- Frontend has error handling but no tests
- No test files found for podcast maker
**Recommendations**:
- Create integration tests for API endpoints
- Add Storybook fixtures for UI components
- Test UI transitions and error states
---
## 8. Rollout Considerations ⚠️ **PARTIALLY COMPLETED**
**Status**: ⚠️ Partial - Basic fallbacks exist, but subscription tier restrictions not implemented
**Completed Items**:
- ✅ Fallback to stock voices if voice cloning unavailable
- ✅ Basic error handling and graceful degradation
**Missing/Incomplete Items**:
- ❌ Subscription tier validation in the UI not implemented
- ❌ HD quality options not hidden for lower plans
- ❌ Multi-speaker modes not restricted by subscription tier
- ❌ Quality options not filtered by user tier
**Evidence**:
- `frontend/src/components/PodcastMaker/CreateModal.tsx` - Quality options always visible
- No subscription tier checks in UI
- No tier-based feature restrictions
**Recommendations**:
- Add subscription tier checks before showing premium options
- Hide HD/multi-speaker for lower tiers
- Add tier-based UI restrictions
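
Tier-based filtering of quality options could look roughly like the sketch below. Every name here is hypothetical: the tier names, the `QualityOption` shape, and `visibleOptions` are assumptions for illustration, since the actual subscription tiers and the option model in `CreateModal.tsx` are not described in this document.

```typescript
// Hypothetical tier names, ordered from lowest to highest.
type Tier = "free" | "basic" | "pro";

interface QualityOption {
  id: string;
  label: string;
  minTier: Tier; // lowest tier allowed to see this option
}

const TIER_RANK: Record<Tier, number> = { free: 0, basic: 1, pro: 2 };

// Keeps only the options the given tier is entitled to see.
function visibleOptions(options: QualityOption[], tier: Tier): QualityOption[] {
  return options.filter((option) => TIER_RANK[tier] >= TIER_RANK[option.minTier]);
}

// Example option list (illustrative values only).
const EXAMPLE_OPTIONS: QualityOption[] = [
  { id: "standard", label: "Standard", minTier: "free" },
  { id: "hd", label: "HD", minTier: "basic" },
  { id: "multi-speaker", label: "Multi-speaker", minTier: "pro" },
];
```

Keeping the entitlement in data (`minTier`) rather than in scattered conditionals means the modal can render whatever `visibleOptions` returns. The backend preflight checks should remain the authority, since hiding an option in the UI does not by itself prevent the request.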
---
## Summary
### Overall Completion: ~75%
**Fully Completed (4/8, plus 1 bonus item)**:
1. ✅ Backend Discovery & Interfaces
2. ✅ Frontend Data Layer Refactor
3. ✅ Research Workflow Integration
4. ✅ Script Authoring & Approvals
5. ✅ Database Persistence (Phase 2 - Bonus)
**Partially Completed (4/8)**:
1. ⚠️ Subscription & Cost Safeguards (80% - preflight checks exist, needs better UI feedback and budget enforcement)
2. ⚠️ Rendering Pipeline (60% - audio works, video/avatar missing, no job polling)
3. ⚠️ Testing & Telemetry (40% - logging yes, tests no)
4. ⚠️ Rollout Considerations (30% - basic fallbacks, no tier restrictions)
### Priority Next Steps:
1. **High Priority**:
- Add UI blocking for preflight validation failures
- Implement budget cap enforcement
- Add subscription tier-based UI restrictions
2. **Medium Priority**:
- Implement video rendering (WaveSpeed InfiniteTalk)
- Add render job polling for progress tracking
- Implement render cancellation
3. **Low Priority**:
- Create integration tests
- Add Storybook fixtures
- Comprehensive error state testing
---
## Additional Completed Items (Beyond Original Plan)
### Phase 2 - Database Persistence ✅ **COMPLETED**
- ✅ Database model created (`PodcastProject`)
- ✅ API endpoints for save/load/list projects
- ✅ Automatic database sync after major steps
- ✅ Project list view for resume
- ✅ Cross-device persistence working
### UI/UX Enhancements ✅ **COMPLETED**
- ✅ Modern AI-like styling with MUI and Tailwind
- ✅ Compact UI design
- ✅ Well-written tooltips and messages
- ✅ Progress stepper visualization
- ✅ Component refactoring for maintainability
### Asset Library Integration ✅ **COMPLETED**
- ✅ Completed audio files saved to asset library
- ✅ Asset Library filtering by podcast source
- ✅ "My Episodes" navigation button
---
## Notes
- The core functionality (research, script generation, and audio rendering) is working and production-ready
- Audio generation is fully functional
- Database persistence enables cross-device resume
- UI is modern and user-friendly
- Main gaps are video/avatar rendering, subscription tier restrictions, budget cap enforcement, and automated tests