# Step 2 (Website Analysis) - Complete Data Flow Analysis ## Overview Step 2 performs comprehensive website analysis including crawling, style detection, pattern analysis, and guideline generation. This document maps the complete data flow from frontend to database. ## API Endpoints Called ### 1. `/api/onboarding/style-detection/complete` (PRIMARY) **Purpose**: Main analysis endpoint that performs the complete workflow **Request** (`POST`): ```typescript { url: string, include_patterns: true, include_guidelines: true } ``` **Response**: ```typescript { success: boolean, crawl_result: { content: string, success: boolean, timestamp: string }, style_analysis: { writing_style: {...}, content_characteristics: {...}, target_audience: {...}, content_type: {...}, recommended_settings: {...}, brand_analysis: {...}, // ← Rich brand insights content_strategy_insights: {...} // ← SWOT analysis }, style_patterns: { style_consistency: {...}, unique_elements: {...} }, style_guidelines: { guidelines: [...], best_practices: [...], avoid_elements: [...], content_strategy: [...], ai_generation_tips: [...], competitive_advantages: [...], content_calendar_suggestions: [...] }, analysis_id: number, warning?: string } ``` ### 2. `/api/onboarding/style-detection/check-existing/{url}` (OPTIONAL) **Purpose**: Check if analysis already exists for this URL **Response**: ```typescript { exists: boolean, analysis_id?: number, analysis?: {...} // Full analysis data if exists } ``` ### 3. `/api/onboarding/style-detection/analysis/{id}` (OPTIONAL) **Purpose**: Load existing analysis by ID ### 4. `/api/onboarding/style-detection/session-analyses` (OPTIONAL) **Purpose**: Get last analysis from session for pre-filling ## Complete Data Structure Collected ### 1. **Writing Style** (`writing_style`) ```json { "tone": "Professional, Informative", "voice": "Active, Direct", "complexity": "Moderate", "engagement_level": "High", "brand_personality": "Trustworthy, Expert", "formality_level": "Semi-formal", "emotional_appeal": "Rational with emotional hooks" } ``` ### 2. **Content Characteristics** (`content_characteristics`) ```json { "sentence_structure": "Mix of short and medium sentences", "vocabulary_level": "Professional/Business", "paragraph_organization": "Clear topic sentences", "content_flow": "Logical progression", "readability_score": "8th-10th grade", "content_density": "Information-rich", "visual_elements_usage": "Moderate" } ``` ### 3. **Target Audience** (`target_audience`) ```json { "demographics": ["B2B", "Enterprise clients", "IT professionals"], "expertise_level": "Intermediate to Advanced", "industry_focus": "Technology/SaaS", "geographic_focus": "Global, US-focused", "psychographic_profile": "Innovation-driven, ROI-focused", "pain_points": ["Efficiency", "Scalability"], "motivations": ["Business growth", "Competitive advantage"] } ``` ### 4. **Content Type** (`content_type`) ```json { "primary_type": "Educational/Thought Leadership", "secondary_types": ["Case Studies", "Product Descriptions"], "purpose": "Inform and convert", "call_to_action": "Demo request, Free trial", "conversion_focus": "Lead generation", "educational_value": "High" } ``` ### 5. **Brand Analysis** (`brand_analysis`) ⭐ **IMPORTANT** ```json { "brand_voice": "Authoritative yet approachable", "brand_values": ["Innovation", "Reliability", "Customer success"], "brand_positioning": "Premium solution provider", "competitive_differentiation": "AI-powered automation", "trust_signals": ["Case studies", "Testimonials", "Security badges"], "authority_indicators": ["Industry certifications", "Expert team"] } ``` ### 6. **Content Strategy Insights** (`content_strategy_insights`) ⭐ **IMPORTANT** ```json { "strengths": [ "Clear value proposition", "Strong technical authority", "Engaging storytelling" ], "weaknesses": [ "Limited social proof", "Technical jargon overuse" ], "opportunities": [ "Video content", "Interactive demos", "Industry thought leadership" ], "threats": [ "Competitor content marketing", "Market saturation" ], "recommended_improvements": [ "Add more case studies", "Simplify technical explanations", "Increase content frequency" ], "content_gaps": [ "Beginner-level tutorials", "Comparison guides", "Industry trend analysis" ] } ``` ### 7. **Recommended Settings** (`recommended_settings`) ```json { "writing_tone": "Professional yet conversational", "target_audience": "B2B decision makers", "content_type": "Educational with conversion focus", "creativity_level": "Balanced", "geographic_location": "US/Global", "industry_context": "B2B SaaS" } ``` ### 8. **Crawl Result** (`crawl_result`) ```json { "content": "Full crawled text content...", "success": true, "timestamp": "2025-10-11T12:00:00Z" } ``` ### 9. **Style Patterns** (`style_patterns`) ```json { "style_consistency": { "consistency_score": 0.85, "common_patterns": ["Data-driven claims", "Action-oriented CTAs"], "variations": ["Blog vs landing page tone"] }, "unique_elements": [ "Custom terminology", "Brand-specific phrases", "Signature formatting" ] } ``` ### 10. **Style Guidelines** (`style_guidelines`) ```json { "guidelines": [ "Use active voice", "Start with benefit statements", "Support claims with data" ], "best_practices": [ "Lead with customer pain points", "Include social proof", "Clear CTAs" ], "avoid_elements": [ "Passive voice", "Overly technical jargon", "Generic claims" ], "content_strategy": [ "Focus on thought leadership", "Build trust through expertise", "Address buyer journey stages" ], "ai_generation_tips": [ "Emphasize ROI and metrics", "Use industry-specific examples", "Balance technical depth with clarity" ], "competitive_advantages": [ "Unique positioning statement", "Differentiating features", "Customer success stories" ], "content_calendar_suggestions": [ "Weekly blog posts", "Monthly case studies", "Quarterly industry reports" ] } ``` ## Current Database Storage (OnboardingDatabaseService) ### What's Saved to `onboarding_sessions.website_analyses` Table: **File**: `backend/services/onboarding_database_service.py` (Line 173) ```python WebsiteAnalysis( session_id=session.id, website_url=analysis_data.get('website_url'), writing_style=analysis_data.get('writing_style'), # ✅ content_characteristics=analysis_data.get('content_characteristics'), # ✅ target_audience=analysis_data.get('target_audience'), # ✅ content_type=analysis_data.get('content_type'), # ✅ recommended_settings=analysis_data.get('recommended_settings'),# ✅ crawl_result=analysis_data.get('crawl_result'), # ✅ style_patterns=analysis_data.get('style_patterns'), # ✅ style_guidelines=analysis_data.get('style_guidelines'), # ✅ status='completed' ) ``` ### ❌ What's MISSING from Database Storage: 1. **brand_analysis** - NOT saved to `onboarding_database_service` 2. **content_strategy_insights** - NOT saved to `onboarding_database_service` ### ✅ What's Saved to `website_analyses` Table (via WebsiteAnalysisService): **File**: `backend/services/website_analysis_service.py` (Lines 44-87) This service saves to a DIFFERENT table (`website_analyses` not `onboarding_sessions.website_analyses`). ```python # Saves to: website_analyses table WebsiteAnalysis( session_id=session_id, # Integer session ID website_url=website_url, writing_style=style_analysis.get('writing_style'), content_characteristics=style_analysis.get('content_characteristics'), target_audience=style_analysis.get('target_audience'), content_type=style_analysis.get('content_type'), recommended_settings=style_analysis.get('recommended_settings'), brand_analysis=style_analysis.get('brand_analysis'), # ✅ SAVED HERE! content_strategy_insights=style_analysis.get('content_strategy_insights'), # ✅ SAVED HERE! crawl_result=analysis_data.get('crawl_result'), style_patterns=analysis_data.get('style_patterns'), style_guidelines=analysis_data.get('style_guidelines'), status='completed' ) ``` ## The Problem: Dual Database Persistence We have **TWO separate database save operations** happening: ### 1. `/style-detection/complete` endpoint (component_logic.py) - Saves to `website_analyses` table via `WebsiteAnalysisService` - Uses **Integer session_id** (converted from Clerk ID via SHA256) - Saves **ALL fields** including `brand_analysis` and `content_strategy_insights` ### 2. `OnboardingProgress.save_progress()` (api_key_manager.py) - Saves to `onboarding_sessions.website_analyses` table via `OnboardingDatabaseService` - Uses **String user_id** (Clerk ID) - **MISSING** `brand_analysis` and `content_strategy_insights` ## Current Frontend Data Structure **File**: `frontend/src/components/OnboardingWizard/WebsiteStep.tsx` (Line 386) ```typescript const stepData = { website: fixedUrl, // ← Should be "website_url" domainName: domainName, analysis: { // ← Nested structure writing_style: {...}, content_characteristics: {...}, target_audience: {...}, content_type: {...}, brand_analysis: {...}, // ✅ Present content_strategy_insights: {...}, // ✅ Present recommended_settings: {...}, // ... ALL the fields from API response guidelines: [...], best_practices: [...], avoid_elements: [...], style_patterns: {...}, // etc. }, useAnalysisForGenAI: true }; ``` ## Solution Required ### 1. Fix Data Transformation (COMPLETED ✅) **File**: `backend/services/api_key_manager.py` (Line 278) Already fixed to flatten the structure: ```python elif step.step_number == 2: # Website Analysis # Transform frontend data structure to match database schema analysis_for_db = { 'website_url': step.data.get('website', ''), 'status': 'completed' } # Merge analysis fields if they exist if 'analysis' in step.data and step.data['analysis']: analysis_for_db.update(step.data['analysis']) self.db_service.save_website_analysis(self.user_id, analysis_for_db, db) ``` ### 2. Update OnboardingDatabaseService to Save ALL Fields **File**: `backend/services/onboarding_database_service.py` **NEEDED**: Add `brand_analysis` and `content_strategy_insights` to the save operation. Check if `WebsiteAnalysis` model has these columns: ```python # Line 206-213 (existing code) website_url=analysis_data.get('website_url', ''), writing_style=analysis_data.get('writing_style'), content_characteristics=analysis_data.get('content_characteristics'), target_audience=analysis_data.get('target_audience'), content_type=analysis_data.get('content_type'), recommended_settings=analysis_data.get('recommended_settings'), brand_analysis=analysis_data.get('brand_analysis'), # ← ADD THIS content_strategy_insights=analysis_data.get('content_strategy_insights'), # ← ADD THIS crawl_result=analysis_data.get('crawl_result'), style_patterns=analysis_data.get('style_patterns'), style_guidelines=analysis_data.get('style_guidelines'), ``` ### 3. Verify Database Model Supports These Fields **File**: `backend/models/onboarding.py` Check `WebsiteAnalysis` model for: - `brand_analysis` column (JSON) - `content_strategy_insights` column (JSON) If missing, add migration. ## Recommendation 1. ✅ **Data transformation fix is complete** (api_key_manager.py updated) 2. ⏳ **Check WebsiteAnalysis model** for brand_analysis and content_strategy_insights columns 3. ⏳ **Update OnboardingDatabaseService.save_website_analysis()** to include these fields 4. ⏳ **Restart backend** to apply changes 5. ⏳ **Re-run Step 2** to save complete data 6. ⏳ **Verify Step 6** displays all fields ## Benefits of Complete Data Storage With `brand_analysis` and `content_strategy_insights` saved: 1. **Better Content Generation**: AI can align with brand values 2. **Strategic Insights**: SWOT analysis informs content strategy 3. **Competitive Intelligence**: Differentiation factors for positioning 4. **Content Planning**: Recommendations and calendar suggestions 5. **Quality Assurance**: Consistency checking against brand guidelines ## Status - ✅ API endpoint returns complete data - ✅ Frontend receives and displays complete data - ✅ Data transformation fix applied (flattening structure) - ⏳ Database model verification needed - ⏳ OnboardingDatabaseService update needed - ⏳ Testing required --- **Next Action**: Check `WebsiteAnalysis` model and update `OnboardingDatabaseService` to save ALL fields.