
Step 2 (Website Analysis) - Complete Data Flow Analysis

Overview

Step 2 performs a comprehensive website analysis, including crawling, style detection, pattern analysis, and guideline generation. This document maps the complete data flow from the frontend to the database.

API Endpoints Called

1. /api/onboarding/style-detection/complete (PRIMARY)

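Purpose: Main analysis endpoint. It runs the complete workflow (crawling, style analysis, pattern analysis, and guideline generation) and returns all results in one response.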

Request (POST):

{
  url: string,
  include_patterns: true,
  include_guidelines: true
}
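As a minimal client-side sketch, the request body above can be built with the Python standard library. The base URL and the bearer-token header are assumptions for illustration only. This document does not state how the endpoint is hosted or authenticated.

```python
import json
import urllib.request
from typing import Optional

# Assumption: local development base URL; not taken from this document.
BASE_URL = "http://localhost:8000"
COMPLETE_PATH = "/api/onboarding/style-detection/complete"


def build_complete_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) the POST request for the primary Step 2 endpoint."""
    payload = {"url": url, "include_patterns": True, "include_guidelines": True}
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the API accepts a bearer token (for example a Clerk session token).
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + COMPLETE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`.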

Response:

{
  success: boolean,
  crawl_result: {
    content: string,
    success: boolean,
    timestamp: string
  },
  style_analysis: {
    writing_style: {...},
    content_characteristics: {...},
    target_audience: {...},
    content_type: {...},
    recommended_settings: {...},
    brand_analysis: {...},              // ← Rich brand insights
    content_strategy_insights: {...}    // ← SWOT analysis
  },
  style_patterns: {
    style_consistency: {...},
    unique_elements: {...}
  },
  style_guidelines: {
    guidelines: [...],
    best_practices: [...],
    avoid_elements: [...],
    content_strategy: [...],
    ai_generation_tips: [...],
    competitive_advantages: [...],
    content_calendar_suggestions: [...]
  },
  analysis_id: number,
  warning?: string
}
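Because later steps depend on every part of this response, a caller can check for missing keys before persisting anything. The following sketch takes its key lists from the response shape above. It assumes a successful response, since a failed one may legitimately omit `analysis_id`.

```python
from typing import Any, Dict, List

# Top-level keys of a successful /complete response (from the shape above).
REQUIRED_TOP_LEVEL = (
    "success", "crawl_result", "style_analysis",
    "style_patterns", "style_guidelines", "analysis_id",
)
# Sub-objects of style_analysis; the last two are the ones at risk downstream.
REQUIRED_STYLE_ANALYSIS = (
    "writing_style", "content_characteristics", "target_audience", "content_type",
    "recommended_settings", "brand_analysis", "content_strategy_insights",
)


def missing_response_fields(response: Dict[str, Any]) -> List[str]:
    """Return dotted names of expected keys that are absent from the response."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in response]
    style_analysis = response.get("style_analysis") or {}
    missing += [
        f"style_analysis.{key}" for key in REQUIRED_STYLE_ANALYSIS if key not in style_analysis
    ]
    return missing
```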

2. /api/onboarding/style-detection/check-existing/{url} (OPTIONAL)

Purpose: Checks whether an analysis already exists for this URL

Response:

{
  exists: boolean,
  analysis_id?: number,
  analysis?: {...}  // Full analysis data if exists
}

3. /api/onboarding/style-detection/analysis/{id} (OPTIONAL)

Purpose: Loads an existing analysis by its ID

4. /api/onboarding/style-detection/session-analyses (OPTIONAL)

Purpose: Returns the most recent analysis in the current session so Step 2 can be pre-filled
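The optional endpoints let a client reuse earlier work instead of re-crawling. The sketch below shows one possible flow. It assumes that the site URL in `check-existing/{url}` must be percent-encoded as a single path segment, that the backend decodes it, and that `analysis/{id}` returns the analysis object directly. `fetch_json` and `run_complete` are hypothetical injected callables, not functions from the codebase.

```python
from typing import Any, Callable, Dict
from urllib.parse import quote

PREFIX = "/api/onboarding/style-detection"

JsonFetcher = Callable[[str], Dict[str, Any]]


def check_existing_path(url: str) -> str:
    # The site URL travels as one path segment, so ':', '/', '?' and '=' are percent-encoded.
    return f"{PREFIX}/check-existing/{quote(url, safe='')}"


def analysis_path(analysis_id: int) -> str:
    return f"{PREFIX}/analysis/{analysis_id}"


def get_or_run_analysis(
    url: str,
    fetch_json: JsonFetcher,
    run_complete: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Reuse a stored analysis when one exists; otherwise run the full workflow."""
    existing = fetch_json(check_existing_path(url))
    if existing.get("exists"):
        if existing.get("analysis"):
            return existing["analysis"]  # full analysis embedded in the check response
        if existing.get("analysis_id") is not None:
            return fetch_json(analysis_path(existing["analysis_id"]))
    return run_complete(url)
```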

Complete Data Structure Collected

1. Writing Style (writing_style)

{
  "tone": "Professional, Informative",
  "voice": "Active, Direct",
  "complexity": "Moderate",
  "engagement_level": "High",
  "brand_personality": "Trustworthy, Expert",
  "formality_level": "Semi-formal",
  "emotional_appeal": "Rational with emotional hooks"
}

2. Content Characteristics (content_characteristics)

{
  "sentence_structure": "Mix of short and medium sentences",
  "vocabulary_level": "Professional/Business",
  "paragraph_organization": "Clear topic sentences",
  "content_flow": "Logical progression",
  "readability_score": "8th-10th grade",
  "content_density": "Information-rich",
  "visual_elements_usage": "Moderate"
}

3. Target Audience (target_audience)

{
  "demographics": ["B2B", "Enterprise clients", "IT professionals"],
  "expertise_level": "Intermediate to Advanced",
  "industry_focus": "Technology/SaaS",
  "geographic_focus": "Global, US-focused",
  "psychographic_profile": "Innovation-driven, ROI-focused",
  "pain_points": ["Efficiency", "Scalability"],
  "motivations": ["Business growth", "Competitive advantage"]
}

4. Content Type (content_type)

{
  "primary_type": "Educational/Thought Leadership",
  "secondary_types": ["Case Studies", "Product Descriptions"],
  "purpose": "Inform and convert",
  "call_to_action": "Demo request, Free trial",
  "conversion_focus": "Lead generation",
  "educational_value": "High"
}

5. Brand Analysis (brand_analysis) IMPORTANT

{
  "brand_voice": "Authoritative yet approachable",
  "brand_values": ["Innovation", "Reliability", "Customer success"],
  "brand_positioning": "Premium solution provider",
  "competitive_differentiation": "AI-powered automation",
  "trust_signals": ["Case studies", "Testimonials", "Security badges"],
  "authority_indicators": ["Industry certifications", "Expert team"]
}

6. Content Strategy Insights (content_strategy_insights) IMPORTANT

{
  "strengths": [
    "Clear value proposition",
    "Strong technical authority",
    "Engaging storytelling"
  ],
  "weaknesses": [
    "Limited social proof",
    "Technical jargon overuse"
  ],
  "opportunities": [
    "Video content",
    "Interactive demos",
    "Industry thought leadership"
  ],
  "threats": [
    "Competitor content marketing",
    "Market saturation"
  ],
  "recommended_improvements": [
    "Add more case studies",
    "Simplify technical explanations",
    "Increase content frequency"
  ],
  "content_gaps": [
    "Beginner-level tutorials",
    "Comparison guides",
    "Industry trend analysis"
  ]
}

7. Recommended Settings (recommended_settings)

{
  "writing_tone": "Professional yet conversational",
  "target_audience": "B2B decision makers",
  "content_type": "Educational with conversion focus",
  "creativity_level": "Balanced",
  "geographic_location": "US/Global",
  "industry_context": "B2B SaaS"
}
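The two objects flagged IMPORTANT above, brand_analysis and content_strategy_insights, can be described as typed dictionaries. The field types below are inferred from the example payloads only and are an assumption; the backend may not enforce this schema. The small runtime check makes it easy to catch a string where a list was expected before saving.

```python
from typing import Any, Dict, List, TypedDict, get_type_hints


class BrandAnalysis(TypedDict):
    brand_voice: str
    brand_values: List[str]
    brand_positioning: str
    competitive_differentiation: str
    trust_signals: List[str]
    authority_indicators: List[str]


class ContentStrategyInsights(TypedDict):
    strengths: List[str]
    weaknesses: List[str]
    opportunities: List[str]
    threats: List[str]
    recommended_improvements: List[str]
    content_gaps: List[str]


def conforms(obj: Dict[str, Any], schema: type) -> bool:
    """Check that every field declared in `schema` is present with the expected shape."""
    for name, hint in get_type_hints(schema).items():
        value = obj.get(name)
        if hint is str:
            if not isinstance(value, str):
                return False
        elif hint == List[str]:
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                return False
    return True
```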

8. Crawl Result (crawl_result)

{
  "content": "Full crawled text content...",
  "success": true,
  "timestamp": "2025-10-11T12:00:00Z"
}

9. Style Patterns (style_patterns)

{
  "style_consistency": {
    "consistency_score": 0.85,
    "common_patterns": ["Data-driven claims", "Action-oriented CTAs"],
    "variations": ["Blog vs landing page tone"]
  },
  "unique_elements": [
    "Custom terminology",
    "Brand-specific phrases",
    "Signature formatting"
  ]
}

10. Style Guidelines (style_guidelines)

{
  "guidelines": [
    "Use active voice",
    "Start with benefit statements",
    "Support claims with data"
  ],
  "best_practices": [
    "Lead with customer pain points",
    "Include social proof",
    "Clear CTAs"
  ],
  "avoid_elements": [
    "Passive voice",
    "Overly technical jargon",
    "Generic claims"
  ],
  "content_strategy": [
    "Focus on thought leadership",
    "Build trust through expertise",
    "Address buyer journey stages"
  ],
  "ai_generation_tips": [
    "Emphasize ROI and metrics",
    "Use industry-specific examples",
    "Balance technical depth with clarity"
  ],
  "competitive_advantages": [
    "Unique positioning statement",
    "Differentiating features",
    "Customer success stories"
  ],
  "content_calendar_suggestions": [
    "Weekly blog posts",
    "Monthly case studies",
    "Quarterly industry reports"
  ]
}

Current Database Storage (OnboardingDatabaseService)

What's Saved to onboarding_sessions.website_analyses Table:

File: backend/services/onboarding_database_service.py (Line 173)

WebsiteAnalysis(
    session_id=session.id,
    website_url=analysis_data.get('website_url'),
    writing_style=analysis_data.get('writing_style'),              # ✅
    content_characteristics=analysis_data.get('content_characteristics'),  # ✅
    target_audience=analysis_data.get('target_audience'),          # ✅
    content_type=analysis_data.get('content_type'),                # ✅
    recommended_settings=analysis_data.get('recommended_settings'),  # ✅
    crawl_result=analysis_data.get('crawl_result'),                # ✅
    style_patterns=analysis_data.get('style_patterns'),            # ✅
    style_guidelines=analysis_data.get('style_guidelines'),        # ✅
    status='completed'
)

What's MISSING from Database Storage:

  1. brand_analysis - NOT saved by OnboardingDatabaseService
  2. content_strategy_insights - NOT saved by OnboardingDatabaseService
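The gap can be confirmed with a set difference. The first set lists every analysis field Step 2 produces, taken from the response shape earlier in this document. The second set lists the JSON fields passed to WebsiteAnalysis(...) in the OnboardingDatabaseService listing above.

```python
# Analysis fields produced by Step 2 (style_analysis sub-objects plus top-level results).
PRODUCED_FIELDS = {
    "writing_style", "content_characteristics", "target_audience", "content_type",
    "recommended_settings", "brand_analysis", "content_strategy_insights",
    "crawl_result", "style_patterns", "style_guidelines",
}

# JSON fields passed to WebsiteAnalysis(...) by OnboardingDatabaseService.
SAVED_BY_ONBOARDING_SERVICE = {
    "writing_style", "content_characteristics", "target_audience", "content_type",
    "recommended_settings", "crawl_result", "style_patterns", "style_guidelines",
}

missing = sorted(PRODUCED_FIELDS - SAVED_BY_ONBOARDING_SERVICE)
print(missing)  # ['brand_analysis', 'content_strategy_insights']
```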

What's Saved to website_analyses Table (via WebsiteAnalysisService):

File: backend/services/website_analysis_service.py (Lines 44-87)

This service saves to a DIFFERENT table: website_analyses, not onboarding_sessions.website_analyses.

# Saves to: website_analyses table
WebsiteAnalysis(
    session_id=session_id,                    # Integer session ID
    website_url=website_url,
    writing_style=style_analysis.get('writing_style'),
    content_characteristics=style_analysis.get('content_characteristics'),
    target_audience=style_analysis.get('target_audience'),
    content_type=style_analysis.get('content_type'),
    recommended_settings=style_analysis.get('recommended_settings'),
    brand_analysis=style_analysis.get('brand_analysis'),           # ✅ SAVED HERE!
    content_strategy_insights=style_analysis.get('content_strategy_insights'),  # ✅ SAVED HERE!
    crawl_result=analysis_data.get('crawl_result'),
    style_patterns=analysis_data.get('style_patterns'),
    style_guidelines=analysis_data.get('style_guidelines'),
    status='completed'
)
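The listing above reads fields from two levels of the /complete response. The seven style sub-objects come from style_analysis, and crawl_result, style_patterns, and style_guidelines come from the top level. The sketch below makes that mapping explicit as a pure function. It assumes `style_analysis` in the listing is `analysis_data["style_analysis"]`. It omits `session_id` because the exact Clerk-ID conversion is not shown here.

```python
from typing import Any, Dict

# Fields read from response["style_analysis"].
STYLE_ANALYSIS_FIELDS = (
    "writing_style", "content_characteristics", "target_audience", "content_type",
    "recommended_settings", "brand_analysis", "content_strategy_insights",
)
# Fields read from the top level of the response.
TOP_LEVEL_FIELDS = ("crawl_result", "style_patterns", "style_guidelines")


def record_from_complete_response(website_url: str, analysis_data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a /complete response into the column layout used by WebsiteAnalysisService."""
    style_analysis = analysis_data.get("style_analysis") or {}
    record: Dict[str, Any] = {"website_url": website_url, "status": "completed"}
    record.update({field: style_analysis.get(field) for field in STYLE_ANALYSIS_FIELDS})
    record.update({field: analysis_data.get(field) for field in TOP_LEVEL_FIELDS})
    return record
```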

The Problem: Dual Database Persistence

TWO separate database save operations run for the same Step 2 data:

1. /style-detection/complete endpoint (component_logic.py)

  • Saves to website_analyses table via WebsiteAnalysisService
  • Uses Integer session_id (converted from Clerk ID via SHA256)
  • Saves ALL fields including brand_analysis and content_strategy_insights

2. OnboardingProgress.save_progress() (api_key_manager.py)

  • Saves to onboarding_sessions.website_analyses table via OnboardingDatabaseService
  • Uses String user_id (Clerk ID)
  • MISSING brand_analysis and content_strategy_insights
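The two paths also key their rows differently: one by an integer derived from the Clerk ID, the other by the Clerk ID string itself. The exact derivation in component_logic.py is not shown in this document. The sketch below is a hypothetical version that hashes the ID with SHA-256 and truncates it to fit a signed 32-bit INTEGER column. Rows saved by one path can only be matched to the other if both sides apply the identical derivation, and any truncated hash can collide.

```python
import hashlib


def clerk_id_to_session_id(clerk_user_id: str, bits: int = 31) -> int:
    """Hypothetical: derive a stable non-negative integer from a Clerk user ID via SHA-256."""
    digest = hashlib.sha256(clerk_user_id.encode("utf-8")).digest()
    # Read the first 8 bytes as a big-endian integer, then keep only the low `bits` bits
    # so the value fits a signed 32-bit INTEGER column.
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)
```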

Current Frontend Data Structure

File: frontend/src/components/OnboardingWizard/WebsiteStep.tsx (Line 386)

const stepData = {
  website: fixedUrl,              // ← Should be "website_url"
  domainName: domainName,
  analysis: {                     // ← Nested structure
    writing_style: {...},
    content_characteristics: {...},
    target_audience: {...},
    content_type: {...},
    brand_analysis: {...},        // ✅ Present
    content_strategy_insights: {...},  // ✅ Present
    recommended_settings: {...},
    // ... ALL the fields from API response
    guidelines: [...],
    best_practices: [...],
    avoid_elements: [...],
    style_patterns: {...},
    // etc.
  },
  useAnalysisForGenAI: true
};

Solution Required

1. Fix Data Transformation (COMPLETED)

File: backend/services/api_key_manager.py (Line 278)

This is already fixed: the nested frontend structure is now flattened before saving.

elif step.step_number == 2:  # Website Analysis
    # Transform frontend data structure to match database schema
    analysis_for_db = {
        'website_url': step.data.get('website', ''),
        'status': 'completed'
    }
    # Merge analysis fields if they exist
    if 'analysis' in step.data and step.data['analysis']:
        analysis_for_db.update(step.data['analysis'])
    
    self.db_service.save_website_analysis(self.user_id, analysis_for_db, db)
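The same transformation can be lifted into a pure function, which makes it easy to unit-test without a database. The sketch mirrors the snippet above. `flatten_step2_data` is a hypothetical name, not a function in api_key_manager.py.

```python
from typing import Any, Dict


def flatten_step2_data(step_data: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of the step 2 branch above: frontend stepData -> flat dict for the database."""
    analysis_for_db: Dict[str, Any] = {
        "website_url": step_data.get("website", ""),  # frontend sends "website"
        "status": "completed",
    }
    analysis = step_data.get("analysis")
    if analysis:
        # Later keys win: a "website_url" or "status" inside analysis overrides the values above.
        analysis_for_db.update(analysis)
    return analysis_for_db
```

Because of `update`, keys the frontend keeps at the top level of `analysis` (for example `guidelines` and `best_practices` in the structure shown earlier) land as top-level keys of the result. They do not appear under `style_guidelines`.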

2. Update OnboardingDatabaseService to Save ALL Fields

File: backend/services/onboarding_database_service.py

NEEDED: Add brand_analysis and content_strategy_insights to the save operation.

Proposed change (this assumes the WebsiteAnalysis model has these columns; see step 3):

# Lines 206-213 (existing code), with the two missing fields added
WebsiteAnalysis(
    session_id=session.id,
    website_url=analysis_data.get('website_url', ''),
    writing_style=analysis_data.get('writing_style'),
    content_characteristics=analysis_data.get('content_characteristics'),
    target_audience=analysis_data.get('target_audience'),
    content_type=analysis_data.get('content_type'),
    recommended_settings=analysis_data.get('recommended_settings'),
    brand_analysis=analysis_data.get('brand_analysis'),              # ← ADD THIS
    content_strategy_insights=analysis_data.get('content_strategy_insights'),  # ← ADD THIS
    crawl_result=analysis_data.get('crawl_result'),
    style_patterns=analysis_data.get('style_patterns'),
    style_guidelines=analysis_data.get('style_guidelines'),
    status='completed'
)
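One way to keep the two save paths from drifting apart again is to list the persisted fields once and build the constructor keyword arguments from that list. This is a sketch of that design choice. `WEBSITE_ANALYSIS_FIELDS` and `website_analysis_kwargs` are hypothetical names, and the final comment shows how the result could be passed to the existing model.

```python
from typing import Any, Dict

# Hypothetical single source of truth for the JSON fields both services persist.
WEBSITE_ANALYSIS_FIELDS = (
    "writing_style", "content_characteristics", "target_audience", "content_type",
    "recommended_settings", "brand_analysis", "content_strategy_insights",
    "crawl_result", "style_patterns", "style_guidelines",
)


def website_analysis_kwargs(analysis_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for WebsiteAnalysis(...) from a flat analysis dict."""
    kwargs: Dict[str, Any] = {"website_url": analysis_data.get("website_url", "")}
    kwargs.update({field: analysis_data.get(field) for field in WEBSITE_ANALYSIS_FIELDS})
    kwargs["status"] = "completed"
    return kwargs

# Sketch of use inside save_website_analysis:
# WebsiteAnalysis(session_id=session.id, **website_analysis_kwargs(analysis_data))
```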

3. Verify Database Model Supports These Fields

File: backend/models/onboarding.py

Check WebsiteAnalysis model for:

  • brand_analysis column (JSON)
  • content_strategy_insights column (JSON)

If either column is missing, add a migration before updating the service.
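For a development database, the check and fix can be sketched with the standard sqlite3 module. This assumes a SQLite backend. The table name is a parameter because this document does not give the onboarding model's table name. If the project uses a migration tool such as Alembic, a proper migration is the better route.

```python
import sqlite3
from typing import List

REQUIRED_JSON_COLUMNS = ("brand_analysis", "content_strategy_insights")


def add_missing_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Add any missing JSON columns to `table` and return the names that were added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # `table` is interpolated directly, so it must be a trusted identifier.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added: List[str] = []
    for column in REQUIRED_JSON_COLUMNS:
        if column not in existing:
            # SQLite accepts JSON as a declared type name; the values are stored as serialized text.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} JSON")
            added.append(column)
    conn.commit()
    return added
```

Running it twice is safe: the second call finds both columns and adds nothing.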

Recommendation

  1. Data transformation fix is complete (api_key_manager.py updated)
  2. Check WebsiteAnalysis model for brand_analysis and content_strategy_insights columns
  3. Update OnboardingDatabaseService.save_website_analysis() to include these fields
  4. Restart the backend to apply the changes
  5. Re-run Step 2 to save the complete data
  6. Verify that Step 6 displays all fields

Benefits of Complete Data Storage

With brand_analysis and content_strategy_insights saved:

  1. Better Content Generation: AI can align with brand values
  2. Strategic Insights: SWOT analysis informs content strategy
  3. Competitive Intelligence: Differentiation factors for positioning
  4. Content Planning: Recommendations and calendar suggestions
  5. Quality Assurance: Consistency checking against brand guidelines

Status

  • API endpoint returns complete data
  • Frontend receives and displays complete data
  • Data transformation fix applied (flattening structure)
  • Database model verification needed
  • OnboardingDatabaseService update needed
  • Testing required

Next Action: Check the WebsiteAnalysis model and update OnboardingDatabaseService to save ALL fields.