AI Image Studio, AI Podcast Maker, AI Product Marketing

2025-11-28 14:33:52 +05:30
parent 77d7c0cde6
commit 49e2131715
122 changed files with 22311 additions and 4331 deletions
--- a/docs/AI_PODCAST_BACKEND_REFERENCE.md
+++ b/docs/AI_PODCAST_BACKEND_REFERENCE.md
@@ -0,0 +1,148 @@
+# AI Podcast Backend Reference
+
+Curated overview of the backend surfaces that the AI Podcast Maker
+should call. Covers service clients, research providers, subscription
+controls, and FastAPI routes relevant to analysis, research, scripting,
+and rendering.
+
+---
+
+## WaveSpeed & Audio Infrastructure
+
+- `backend/services/wavespeed/client.py`
+  - `WaveSpeedClient.submit_image_to_video(model_path, payload)` –
+    submit WAN 2.5 / InfiniteTalk jobs and receive prediction IDs.
+  - `WaveSpeedClient.get_prediction_result(prediction_id)` /
+    `poll_until_complete(...)` – shared polling helpers for render jobs.
+  - `WaveSpeedClient.generate_image(...)` – synchronous Ideogram V3 /
+    Qwen image bytes (mirrors Image Studio usage).
+  - `WaveSpeedClient.generate_speech(...)` – Minimax Speech 02 HD via
+    WaveSpeed; accepts `voice_id`, `speed`, `sample_rate`, etc. Returns
+    raw audio bytes (sync) or prediction IDs (async).
+  - `WaveSpeedClient.optimize_prompt(...)` – prompt optimizer that can
+    improve image/video prompts before rendering.
+
+- `backend/services/wavespeed/infinitetalk.py`
+  - `animate_scene_with_voiceover(...)` – wraps InfiniteTalk (image +
+    narration to talking video). Enforces payload limits, pulls the
+    final MP4, and reports cost/duration metadata.
+
+- `backend/services/llm_providers/main_audio_generation.py`
+  - `generate_audio(...)` – subscription-aware TTS orchestration built
+    on `WaveSpeedClient.generate_speech`. Applies PricingService checks,
+    records UsageSummary/APIUsageLog entries, and returns provider/model
+    metadata for frontends.
+
+---
+
+## Research Providers & Adapters
+
+- `backend/services/blog_writer/research/research_service.py`
+  - Central orchestrator for grounded research. Supports Google Search
+    grounding (Gemini) and Exa neural search via configurable provider.
+  - Calls `validate_research_operations` / `validate_exa_research_operations`
+    before touching external APIs and logs usage through PricingService.
+  - Returns fact cards (`ResearchSource`, `GroundingMetadata`) already
+    normalized for downstream mapping.
+
+- `backend/services/blog_writer/research/exa_provider.py`
+  - `ExaResearchProvider.search(...)` – Executes Exa queries, converts
+    results into `ResearchSource` objects, estimates cost, and tracks it.
+  - Provides helpers for excerpt extraction, aggregation, and usage
+    tracking (`track_exa_usage`).
+
+- `backend/services/llm_providers/gemini_grounded_provider.py`
+  - Implements Gemini + Google Grounding calls with support for cached
+    metadata, chunk/support parsing, and debugging hooks used by Story
+    Writer and LinkedIn flows.
+
+- `backend/api/research_config.py`
+  - Exposes feature flags such as `exa_available`, suggested categories,
+    and other metadata needed by the frontend to decide provider options.
+
+---
+
+## Subscription & Pre-flight Validation
+
+- `backend/services/subscription/preflight_validator.py`
+  - `validate_research_operations(pricing_service, user_id, gpt_provider)`
+    – Blocks research runs if Gemini/HF token budgets would be exceeded
+    (covers Google Grounding + analyzer passes).
+  - `validate_exa_research_operations(...)` – Same for Exa workflows;
+    validates Exa call count plus follow-up LLM usage.
+  - `validate_image_generation_operations(...)`,
+    `validate_image_upscale_operations(...)`,
+    `validate_image_editing_operations(...)` – templates for validating
+    other expensive steps (useful for render queue and avatar creation).
+
+- `backend/services/subscription/pricing_service.py`
+  - Provides `check_usage_limits`, `check_comprehensive_limits`, and
+    plan metadata (limits per provider) used across validators.
+
+Frontends must call these validators (via thin API wrappers) before
+initiating script generation, research, or rendering to surface tier
+errors without wasting API calls.
+
+---
+
+## REST Routes to Reuse
+
+### Story Writer (`backend/api/story_writer/router.py`)
+
+- `POST /api/story/generate-setup` – Generate initial story setups from
+  an idea (`story_setup.py::generate_story_setup`).
+- `POST /api/story/generate-outline` – Structured outline generation via
+  Gemini with persona/settings context.
+- `POST /api/story/generate-images` – Batch scene image creation backed
+  by WaveSpeed (WAN 2.5 / Ideogram). Returns per-scene URLs + metadata.
+- `POST /api/story/generate-ai-audio` – Minimax Speech 02 HD render for
+  a single scene with knob controls (voice, speed, pitch, emotion).
+- `POST /api/story/optimize-prompt` – WaveSpeed prompt optimization API
+  for cleaning up image/video prompts before rendering.
+- `POST /api/story/generate-audio` – Legacy multi-scene TTS (gTTS) if a
+  lower-cost fallback is needed.
+- `GET /api/story/images/{filename}` & `/audio/{filename}` – Authenticated
+  asset delivery for generated media.
+
+These endpoints already enforce auth, asset tracking, and subscription
+limits; the podcast UI should simply adopt their payloads.
+
+### Blog Writer (`backend/api/blog_writer/router.py`)
+
+- `POST /api/blog/research` (defined earlier in the router file) – Executes
+  grounded research via Google or Exa depending on `provider`.
+- `POST /api/blog/flow-analysis/basic|advanced` – Example of long-running
+  job orchestration with task IDs (pattern for script/performance analysis).
+- `POST /api/blog/seo/analyze` & `/seo/metadata` – Illustrate how to pass
+  authenticated user IDs into PricingService checks, useful for podcast
+  metadata generation.
+- Cache endpoints (`GET/DELETE /api/blog/cache/*`) – Provide research
+  cache stats/clear operations that podcast flows can reuse.
+
+### Image Studio (`backend/api/images.py`)
+
+- `POST /api/images/generate` – Subscription-aware image creation with
+  asset tracking (pattern for cost estimates + upload paths).
+- `GET /api/images/image-studio/images/{file}` – Serves generated images;
+  demonstrates query-token auth used by `<img>` tags.
+
+Reuse these routes for avatar defaults or background art inside the
+podcast builder instead of writing bespoke services.
+
+---
+
+## Key Data Flow Hooks
+
+- Research job polling: `backend/api/story_writer/routes/story_tasks.py`
+  plus `task_manager.py` define consistent job IDs and status payloads.
+- Media job polling: `StoryImageGenerationService` and `StoryAudioGenerationService`
+  already write artifacts to disk/CDN with tracked filenames; the
+  podcast render queue can follow the same patterns.
+- Persona assets: onboarding routes in `backend/api/onboarding_endpoints.py`
+  expose upload endpoints for voice/avatars; pass resulting asset IDs to
+  the podcast APIs instead of raw files.
+
+Use this reference to swap out the mock podcast helpers with production
+APIs while staying inside existing authentication, subscription, and
+asset storage conventions.
+
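
The added reference asks callers to run a pre-flight validator before any billable work and then to submit a render job and poll for its result. A minimal, self-contained sketch of that ordering follows. Everything in it is hypothetical and for illustration only: `FakeValidator`, `FakeRenderBackend`, `TierLimitError`, `poll_until_done`, `render_scene`, and the `"processing"` / `"completed"` / `"failed"` status strings are stand-ins, not the signatures or payloads of `preflight_validator.py` or `WaveSpeedClient`.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


class TierLimitError(Exception):
    """Hypothetical error for a failed pre-flight (subscription) check."""


@dataclass
class FakeValidator:
    """Stand-in for a thin wrapper around a pre-flight validator."""
    remaining_calls: int

    def validate(self, user_id: str) -> None:
        # Reject the request before any external API is touched.
        if self.remaining_calls <= 0:
            raise TierLimitError(f"user {user_id} has no remaining quota")


class FakeRenderBackend:
    """Stand-in for a render client that hands out prediction IDs."""

    def __init__(self, polls_before_done: int = 2) -> None:
        self.polls_left = polls_before_done
        self.submitted = 0

    def submit(self, payload: Dict[str, Any]) -> str:
        self.submitted += 1
        return f"pred-{self.submitted}"

    def get_status(self, prediction_id: str) -> Dict[str, Any]:
        # Report "processing" a fixed number of times, then "completed".
        if self.polls_left > 0:
            self.polls_left -= 1
            return {"id": prediction_id, "status": "processing"}
        return {"id": prediction_id, "status": "completed", "output": "scene.mp4"}


def poll_until_done(
    get_status: Callable[[str], Dict[str, Any]],
    prediction_id: str,
    interval_s: float = 0.0,
    max_attempts: int = 10,
) -> Dict[str, Any]:
    """Poll a status callable until the job completes, fails, or times out."""
    for _ in range(max_attempts):
        result = get_status(prediction_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"render {prediction_id} failed")
        time.sleep(interval_s)
    raise TimeoutError(f"render {prediction_id} did not finish in time")


def render_scene(
    validator: FakeValidator,
    backend: FakeRenderBackend,
    user_id: str,
    payload: Dict[str, Any],
) -> Dict[str, Any]:
    """Validate first, then submit and poll, so tier errors cost no API calls."""
    validator.validate(user_id)
    prediction_id = backend.submit(payload)
    return poll_until_done(backend.get_status, prediction_id)


if __name__ == "__main__":
    backend = FakeRenderBackend(polls_before_done=2)
    print(render_scene(FakeValidator(1), backend, "user-1", {"image": "a.png"}))
```

Because the check runs before `submit`, a user who is over their limit never triggers a render request. Swapping the fakes for real service calls would keep the same control flow.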