Files
ALwrity/docs/AI_BLOG_WRITER_IMPLEMENTATION_SPEC.md

428 lines
19 KiB
Markdown

## AI Blog Writer — Implementation Specification (Copilot-first, Research-led)
### Overview
- **Goal**: Build a SOTA AI blog writer that guides non-technical users end-to-end: research → outline → section generation → quality/SEO → publishing.
- **Approach**: Copilot-first UX using CopilotKit. Reuse LinkedIn assistive writing patterns: Google Search grounding, Exa research, hallucination detector, quality analysis, citations.
- **User Interaction Model**: The user only talks to the Copilot; the editor reflects all state and changes via generative UI and HITL confirmations.
### 🚀 **Current Implementation Status** (Updated: December 2024)
**✅ COMPLETED PHASES:**
- **Stage 1: Research & Strategy** - ✅ FULLY IMPLEMENTED
- **Stage 2: Content Planning (Outline)** - ✅ FULLY IMPLEMENTED
- **Backend Architecture** - ✅ MODULAR & PRODUCTION-READY
- **Frontend UI Components** - ✅ COMPREHENSIVE EDITOR
- **CopilotKit Integration** - ✅ FULLY FUNCTIONAL
**🔄 IN PROGRESS:**
- **Stage 3: Content Generation** - 🔄 PARTIALLY IMPLEMENTED
- **Stage 4: SEO & Publishing** - 🔄 PARTIALLY IMPLEMENTED
**📋 TODO:**
- Section-by-section content generation
- Full SEO optimization pipeline
- Publishing integrations (Wix/WordPress)
- Advanced quality checks
### Key Principles
- **AI-first, HITL**: The assistant leads with intelligent suggestions; the user approves via render-and-wait HITL components where appropriate.
- **Research fidelity**: Google grounding + Exa researcher; hallucination detection with claim verification; pervasive citations.
- **Persona-aware**: Import blog writing persona from DB and apply it across planning/generation/optimizations.
- **SEO-excellent**: Real-time SEO analysis, metadata generation, schema, and image alt handling.
- **Publish-ready**: Smooth handoff to Wix/WordPress; preview and scheduling.
---
## 1) Workflow (4 Stages)
### Stage 1: Research & Strategy (AI Orchestration) ✅ **FULLY IMPLEMENTED**
**✅ IMPLEMENTED FEATURES:**
- **Google Search Grounding**: Single Gemini API call with native Google Search integration
- **Intelligent Caching**: Exact keyword match caching to reduce API costs
- **AI-Powered Analysis**: Keyword analysis, competitor analysis, content angle generation
- **Robust Error Handling**: No fallback data - only real AI-generated insights or graceful failures
- **Progress Tracking**: Real-time progress messages during research operations
**✅ IMPLEMENTED INPUTS:**
- `keywords: string[]`, `industry: string`, `targetAudience: string`, `wordCountTarget: number`
- Persona support (basic implementation)
**✅ IMPLEMENTED BACKEND/SERVICES:**
- **Modular Architecture**: `ResearchService`, `KeywordAnalyzer`, `CompetitorAnalyzer`, `ContentAngleGenerator`
- **Google Grounding**: Native Gemini Google Search integration (no Exa dependency)
- **Caching System**: Intelligent research result caching with TTL and LRU eviction
- **Error Handling**: Graceful failure with specific error messages
**✅ IMPLEMENTED COPILOTKIT ACTIONS:**
- `researchTopic(keywords, industry, target_audience, blogLength)` → comprehensive research with sources
- `chatWithResearchData(question)` → interactive research data exploration
- `getResearchKeywords()` → HITL keyword collection form
- `performResearch(formData)` → research execution with form data
**✅ IMPLEMENTED GENERATIVE UI:**
- **ResearchResults Component**: Sources, credibility scores, keyword analysis, content angles
- **KeywordInputForm**: HITL form for keyword collection with blog length selection
- **Progress Messages**: Real-time loading states with CopilotKit status system
**✅ IMPLEMENTED SUGGESTIONS:**
- "I want to research a topic for my blog" (initial)
- "Let's proceed to create an Outline" (post-research)
- "Chat with Research Data" (exploration)
- "Create outline with custom inputs" (advanced)
---
### Stage 2: Content Planning (AI + Human) ✅ **FULLY IMPLEMENTED**
**✅ IMPLEMENTED DELIVERABLES:**
- **Structured Outline**: H1/H2/H3 hierarchy with per-section key points and target word counts
- **AI-Generated Titles**: Multiple title options with SEO optimization
- **Research Integration**: Outline sections linked to research sources and keywords
- **Word Count Distribution**: Intelligent word allocation across sections
**✅ IMPLEMENTED COPILOTKIT ACTIONS:**
- `generateOutline()` → AI-powered outline generation from research data
- `createOutlineWithCustomInputs(customInstructions)` → custom outline with user instructions
- `refineOutline(operation, sectionId, payload)` → add/remove/move/merge/rename sections
- `enhanceSection(sectionId, focus)` → AI enhancement of individual sections
- `optimizeOutline(focus)` → AI optimization of entire outline
- `rebalanceOutline(targetWords)` → word count rebalancing across sections
**✅ IMPLEMENTED GENERATIVE UI:**
- **EnhancedOutlineEditor**: Interactive outline editor with expandable sections
- **TitleSelector**: AI-generated title options with custom title creation
- **CustomOutlineForm**: HITL form for custom outline instructions
- **Section Management**: Add, edit, reorder, merge sections with visual feedback
- **Research Integration**: Source references and keyword suggestions per section
**✅ IMPLEMENTED SUGGESTIONS:**
- "Generate outline" (standard)
- "Create outline with custom inputs" (advanced)
- "Enhance section [X]" (section-specific)
- "Optimize entire outline" (global)
- "Rebalance word counts" (distribution)
---
### Stage 3: Content Generation (CopilotKit-only, no multi-agent) 🔄 **PARTIALLY IMPLEMENTED**
**🔄 PARTIALLY IMPLEMENTED DELIVERABLES:**
- **Section Generation**: Basic section generation with markdown output
- **Content Structure**: Sectioned markdown with inline citations support
- **Quality Checks**: Hallucination detection integration
**✅ IMPLEMENTED COPILOTKIT ACTIONS:**
- `generateSection(sectionId)` → generates content for specific section
- `generateAllSections()` → placeholder for bulk generation
- `runHallucinationCheck()` → integrates with hallucination detector service
**🔄 PARTIALLY IMPLEMENTED UI:**
- **Section Editors**: Basic markdown editing per section
- **DiffPreview Component**: Exists but needs integration
- **Citation System**: Basic structure in place
**📋 TODO:**
- Full section-by-section content generation
- Advanced content optimization
- Inline citation management
- Content quality improvements
- Progress tracking for bulk generation
---
### Stage 4: Optimization & Publishing (AI + Human) 🔄 **PARTIALLY IMPLEMENTED**
**🔄 PARTIALLY IMPLEMENTED SEO OPTIMIZATION:**
- **SEO Analysis**: Basic SEO analysis with keyword density and structure
- **Metadata Generation**: Title options and meta description generation
- **SEO Integration**: Wraps existing SEO tools services
**✅ IMPLEMENTED COPILOTKIT ACTIONS:**
- `runSEOAnalyze(keywords)` → SEO analysis with scores and recommendations
- `generateSEOMetadata(title)` → metadata generation for titles and descriptions
- `publishToPlatform(platform, schedule)` → placeholder for publishing
**🔄 PARTIALLY IMPLEMENTED UI:**
- **SEOMiniPanel**: Basic SEO analysis display
- **Metadata Management**: Title and description editing
**📋 TODO:**
- Full SEO optimization pipeline
- Advanced SEO recommendations
- Publishing integrations (Wix/WordPress)
- Content optimization with diff preview
- Image alt text and media management
- Schema markup generation
---
## 2) SEO Tools Integration & Metadata
Existing Services to Wrap
- Meta Description, OpenGraph, Image Alt, On-Page SEO, Technical SEO, Content Strategy (see `backend/services/seo_tools/*` and docs).
Unified Endpoints
- `POST /api/blog/seo/analyze` → { seoScore, density, structure, readability, link suggestions, image alt status, recs }
- `POST /api/blog/seo/metadata` → { titleOptions, metaDescriptionOptions, openGraph, twitterCard, schema: { Article, FAQ?, Breadcrumb, Org/Person } }
Editor SEO Panel
- Live density and distribution, readability (Flesch-Kincaid), heading hierarchy, internal/external link suggestions.
- One-click “Apply Fix” with diff preview.
Schema
- Default Article schema; optional FAQ when Q&A snippets exist; Breadcrumb, Organization/Person as applicable.
---
## 3) Dedicated Blog Editor Design (Copilot-first)
Layout
- Left: Markdown Editor (per-section tabs), word count, persona cues, inline citation chips.
- Right: Live Preview (desktop/mobile), SEO SERP snippet preview, social preview (OG/Twitter).
- Sidebar Panels: Research (sources, claims), SEO (scores/fixes), Media (AI images + alt text), History (versions).
Core Components
- `BlogResearchCard` (render-only): sources, credibility scores, add-to-outline.
- `OutlineEditor` (HITL): drag-drop H2/H3, per-section refs and target words.
- `SectionEditor`: markdown area with persona/tone badges; per-section SEO mini-score.
- `DiffPreview` (HITL): apply/reject AI edits.
- `SEOPanel`: density/structure/readability + apply fix.
- `MediaPanel`: AI images, compression, automatic alt-text.
CopilotKit Integrations
- Suggestions: set programmatically (`useCopilotChatHeadless_c`) or via `CopilotSidebar` props.
- Generative UI: `useCopilotAction({ render })` for research cards, outline editor, diff preview, publish dialog.
- HITL: `renderAndWaitForResponse` for approvals at outline, diff apply, and publish steps.
- References: CopilotKit docs — Frontend Actions, Generative UI, Suggestions, HITL.
Persistence
- Persist outline, per-section content, references, persona snapshot, SEO state, metadata drafts.
- Auto-save every 30s; version history for undo.
---
## 4) Backend APIs ✅ **FULLY IMPLEMENTED**
**✅ IMPLEMENTED BLOG ENDPOINTS:**
- `POST /api/blog/research/start` → async research with progress tracking
- `GET /api/blog/research/status/{task_id}` → research progress status
- `POST /api/blog/outline/start` → async outline generation with progress
- `GET /api/blog/outline/status/{task_id}` → outline progress status
- `POST /api/blog/outline/refine` → outline refinement operations
- `POST /api/blog/outline/rebalance` → word count rebalancing
- `POST /api/blog/section/generate` → section content generation
- `POST /api/blog/section/optimize` → content optimization
- `POST /api/blog/quality/hallucination-check` → hallucination detection
- `POST /api/blog/seo/analyze` → SEO analysis and recommendations
- `POST /api/blog/seo/metadata` → metadata generation
- `POST /api/blog/publish` → publishing to platforms
- `GET /api/blog/health` → service health check
**✅ IMPLEMENTED MODULAR ARCHITECTURE:**
- **Core Service**: `BlogWriterService` - main orchestrator
- **Research Module**: `ResearchService`, `KeywordAnalyzer`, `CompetitorAnalyzer`, `ContentAngleGenerator`
- **Outline Module**: `OutlineService`, `OutlineGenerator`, `OutlineOptimizer`, `SectionEnhancer`
- **Caching System**: Intelligent research result caching with TTL and LRU eviction
- **Error Handling**: Graceful failure with specific error messages
**✅ IMPLEMENTED MODELS:**
- `BlogResearchRequest`, `BlogResearchResponse`
- `BlogOutlineRequest`, `BlogOutlineResponse`, `BlogOutlineRefineRequest`
- `BlogSectionRequest`, `BlogSectionResponse`
- `BlogOptimizeRequest`, `BlogOptimizeResponse`
- `BlogSEOAnalyzeRequest`, `BlogSEOAnalyzeResponse`
- `BlogSEOMetadataRequest`, `BlogSEOMetadataResponse`
- `BlogPublishRequest`, `BlogPublishResponse`
- `HallucinationCheckRequest`, `HallucinationCheckResponse`
**✅ REUSED SERVICES:**
- `/api/hallucination-detector/*` - hallucination detection integration
- SEO tools services - wrapped for blog-specific analysis
---
## 5) CopilotKit Action Inventory ✅ **COMPREHENSIVE IMPLEMENTATION**
**✅ RESEARCH ACTIONS (FULLY IMPLEMENTED):**
- `researchTopic(keywords, industry, target_audience, blogLength)` → comprehensive research
- `chatWithResearchData(question)` → interactive research exploration
- `getResearchKeywords()` → HITL keyword collection form
- `performResearch(formData)` → research execution with form data
**✅ PLANNING ACTIONS (FULLY IMPLEMENTED):**
- `generateOutline()` → AI-powered outline generation
- `createOutlineWithCustomInputs(customInstructions)` → custom outline creation
- `refineOutline(operation, sectionId, payload)` → outline refinement operations
- `enhanceSection(sectionId, focus)` → section enhancement
- `optimizeOutline(focus)` → outline optimization
- `rebalanceOutline(targetWords)` → word count rebalancing
**🔄 GENERATION ACTIONS (PARTIALLY IMPLEMENTED):**
- `generateSection(sectionId)` → section content generation ✅
- `generateAllSections()` → bulk generation (placeholder) 🔄
- `runHallucinationCheck()` → hallucination detection ✅
**🔄 SEO ACTIONS (PARTIALLY IMPLEMENTED):**
- `runSEOAnalyze(keywords)` → SEO analysis ✅
- `generateSEOMetadata(title)` → metadata generation ✅
**🔄 PUBLISHING ACTIONS (PARTIALLY IMPLEMENTED):**
- `publishToPlatform(platform, schedule)` → publishing (placeholder) 🔄
**✅ UX/RENDER-ONLY/HITL (FULLY IMPLEMENTED):**
- `ResearchResults` → research data visualization
- `EnhancedOutlineEditor` → interactive outline management
- `KeywordInputForm` → HITL keyword collection
- `CustomOutlineForm` → HITL custom outline creation
- `TitleSelector` → title selection and creation
- `DiffPreview` → content diff visualization
- `SEOMiniPanel` → SEO analysis display
---
## 6) Intelligent Suggestions (states)
Before research
- “Load persona”, “Analyze keywords”, “Research topic”
After research
- “Generate outline”, “Add competitor H2s”, “Attach sources”
Outline ready
- “Generate [Section 1]”, “…”, “Generate all sections”
Draft ready
- “Run fact-check”, “Run SEO analysis”, “Generate metadata”
Final
- “Publish to WordPress”, “Schedule on Wix”
---
## 7) Delivery Plan / Milestones ✅ **UPDATED STATUS**
**✅ MILESTONE 1: Research + Outline (COMPLETED)**
- ✅ Actions: research topic, generate outline, outline editor (HITL)
- ✅ Google Search grounding integration
- ✅ AI-powered keyword and competitor analysis
- ✅ Interactive outline editor with refinement capabilities
- ✅ Research data visualization and exploration
**🔄 MILESTONE 2: Section Generation + Quality (IN PROGRESS)**
- ✅ generateSection (basic implementation)
- 🔄 generateAllSections (needs full implementation)
- 🔄 optimizeSection with diff preview (needs integration)
- ✅ hallucination check integration
- 📋 Content quality improvements and optimization
**🔄 MILESTONE 3: SEO & Metadata (IN PROGRESS)**
- ✅ analyzeSEO panel (basic implementation)
- ✅ generateSEOMetadata (title/meta generation)
- 📋 Advanced SEO recommendations and fixes
- 📋 Schema markup and social media optimization
**📋 MILESTONE 4: Publishing (TODO)**
- 📋 prepareForPublish functionality
- 📋 publishToPlatform (Wix/WordPress integration)
- 📋 Scheduling and publishing workflow
- 📋 Success URL and status tracking
**📋 MILESTONE 5: Polish (TODO)**
- 📋 Advanced readability aids
- 📋 Version history and auto-save
- 📋 Performance optimization
- 📋 Accessibility improvements
---
## 8) Current Architecture & Implementation Details
### 🏗️ **Backend Architecture (Modular & Production-Ready)**
**Core Service Structure:**
```
backend/services/blog_writer/
├── core/
│ └── blog_writer_service.py # Main orchestrator
├── research/
│ ├── research_service.py # Research orchestration
│ ├── keyword_analyzer.py # AI keyword analysis
│ ├── competitor_analyzer.py # Competitor intelligence
│ └── content_angle_generator.py # Content angle discovery
├── outline/
│ ├── outline_service.py # Outline orchestration
│ ├── outline_generator.py # AI outline generation
│ ├── outline_optimizer.py # Outline optimization
│ └── section_enhancer.py # Section enhancement
└── blog_service.py # Entry point (thin wrapper)
```
**Key Features:**
- **No Fallback Data**: Only real AI-generated insights or graceful failures
- **Intelligent Caching**: Research result caching with TTL and LRU eviction
- **Error Handling**: Specific error messages and retry logic
- **Progress Tracking**: Real-time progress updates for long-running operations
### 🎨 **Frontend Architecture (CopilotKit-First)**
**Component Structure:**
```
frontend/src/components/BlogWriter/
├── BlogWriter.tsx # Main orchestrator component
├── ResearchAction.tsx # Research CopilotKit actions
├── ResearchResults.tsx # Research data visualization
├── KeywordInputForm.tsx # HITL keyword collection
├── EnhancedOutlineEditor.tsx # Interactive outline editor
├── TitleSelector.tsx # Title selection and creation
├── CustomOutlineForm.tsx # HITL custom outline creation
├── ResearchDataActions.tsx # Research data interaction
├── EnhancedOutlineActions.tsx # Outline management actions
├── DiffPreview.tsx # Content diff visualization
└── SEOMiniPanel.tsx # SEO analysis display
```
**Key Features:**
- **CopilotKit Integration**: Full action system with HITL components
- **Real-time Updates**: Progress messages and status tracking
- **Interactive UI**: Drag-and-drop, expandable sections, visual feedback
- **Error Handling**: User-friendly error messages and recovery
### 🔧 **Technical Implementation Highlights**
**Research Phase:**
- Single Gemini API call with Google Search grounding
- AI-powered analysis of keywords, competitors, and content angles
- Intelligent caching to reduce API costs
- No fallback data - only real AI insights
**Outline Phase:**
- Research-driven outline generation
- Interactive outline editor with full CRUD operations
- AI-powered section enhancement and optimization
- Word count rebalancing and distribution
**Quality Assurance:**
- Robust error handling with specific messages
- Progress tracking for long-running operations
- Graceful failure without misleading data
- Real-time user feedback and guidance
---
## 9) References
- CopilotKit Quickstart, Frontend Actions, Generative UI, HITL, Suggestions
- Quickstart: https://docs.copilotkit.ai/direct-to-llm/guides/quickstart
- Frontend Actions: https://docs.copilotkit.ai/frontend-actions
- Generative UI: https://docs.copilotkit.ai/direct-to-llm/guides/generative-ui
- Headless + Suggestions + HITL: https://docs.copilotkit.ai/premium/headless-ui
---
## 9) Notes on Reuse from LinkedIn Writer
- Research handler; Gemini grounded provider; citation manager; quality analyzer.
- Hallucination detector + Exa verification endpoints.
- CopilotKit integration patterns: actions, suggestions, render/HITL, state persistence.