Step 2 (Website Analysis) - Complete Data Flow Analysis
Overview
Step 2 performs comprehensive website analysis including crawling, style detection, pattern analysis, and guideline generation. This document maps the complete data flow from frontend to database.
API Endpoints Called
1. /api/onboarding/style-detection/complete (PRIMARY)
Purpose: Main analysis endpoint that performs the complete workflow
Request (POST):
{
url: string,
include_patterns: true,
include_guidelines: true
}
Response:
{
success: boolean,
crawl_result: {
content: string,
success: boolean,
timestamp: string
},
style_analysis: {
writing_style: {...},
content_characteristics: {...},
target_audience: {...},
content_type: {...},
recommended_settings: {...},
brand_analysis: {...}, // ← Rich brand insights
content_strategy_insights: {...} // ← SWOT analysis
},
style_patterns: {
style_consistency: {...},
unique_elements: {...}
},
style_guidelines: {
guidelines: [...],
best_practices: [...],
avoid_elements: [...],
content_strategy: [...],
ai_generation_tips: [...],
competitive_advantages: [...],
content_calendar_suggestions: [...]
},
analysis_id: number,
warning?: string
}
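A quick way to confirm that a response carries every section listed above is to check its top-level keys and the keys under style_analysis. The sketch below is illustrative only: missing_sections is a hypothetical helper that does not exist in the codebase, and the key lists are copied from the response shape shown above.

```python
# Hypothetical helper; key names are taken from the documented response shape.
TOP_LEVEL_KEYS = ("crawl_result", "style_analysis", "style_patterns", "style_guidelines")
STYLE_ANALYSIS_KEYS = (
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "brand_analysis",
    "content_strategy_insights",
)


def missing_sections(response: dict) -> list[str]:
    """Return dotted names of documented sections that are absent from the response."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in response]
    style_analysis = response.get("style_analysis") or {}
    missing += [
        f"style_analysis.{key}"
        for key in STYLE_ANALYSIS_KEYS
        if key not in style_analysis
    ]
    return missing


# Made-up sample response with only two style_analysis sections present.
sample = {
    "success": True,
    "crawl_result": {},
    "style_analysis": {"writing_style": {}, "brand_analysis": {}},
    "style_patterns": {},
    "style_guidelines": {},
}
print(missing_sections(sample))
```

An empty list means every documented section is present; anything else names the sections that would be lost before the data reaches the database layer.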
2. /api/onboarding/style-detection/check-existing/{url} (OPTIONAL)
Purpose: Check if analysis already exists for this URL
Response:
{
exists: boolean,
analysis_id?: number,
analysis?: {...} // Full analysis data if exists
}
3. /api/onboarding/style-detection/analysis/{id} (OPTIONAL)
Purpose: Load existing analysis by ID
4. /api/onboarding/style-detection/session-analyses (OPTIONAL)
Purpose: Get last analysis from session for pre-filling
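Endpoint 2 carries the site URL as a path parameter, so the client presumably has to percent-encode it before building the request path. A minimal sketch, assuming the backend accepts a fully encoded path segment; check_existing_path is a hypothetical helper, not a function from the codebase.

```python
from urllib.parse import quote


def check_existing_path(url: str) -> str:
    """Build the check-existing request path for a website URL (hypothetical helper)."""
    # safe="" also encodes "/" and ":", so the URL stays a single path segment.
    return "/api/onboarding/style-detection/check-existing/" + quote(url, safe="")


print(check_existing_path("https://example.com/blog"))
# /api/onboarding/style-detection/check-existing/https%3A%2F%2Fexample.com%2Fblog
```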
Complete Data Structure Collected
1. Writing Style (writing_style)
{
"tone": "Professional, Informative",
"voice": "Active, Direct",
"complexity": "Moderate",
"engagement_level": "High",
"brand_personality": "Trustworthy, Expert",
"formality_level": "Semi-formal",
"emotional_appeal": "Rational with emotional hooks"
}
2. Content Characteristics (content_characteristics)
{
"sentence_structure": "Mix of short and medium sentences",
"vocabulary_level": "Professional/Business",
"paragraph_organization": "Clear topic sentences",
"content_flow": "Logical progression",
"readability_score": "8th-10th grade",
"content_density": "Information-rich",
"visual_elements_usage": "Moderate"
}
3. Target Audience (target_audience)
{
"demographics": ["B2B", "Enterprise clients", "IT professionals"],
"expertise_level": "Intermediate to Advanced",
"industry_focus": "Technology/SaaS",
"geographic_focus": "Global, US-focused",
"psychographic_profile": "Innovation-driven, ROI-focused",
"pain_points": ["Efficiency", "Scalability"],
"motivations": ["Business growth", "Competitive advantage"]
}
4. Content Type (content_type)
{
"primary_type": "Educational/Thought Leadership",
"secondary_types": ["Case Studies", "Product Descriptions"],
"purpose": "Inform and convert",
"call_to_action": "Demo request, Free trial",
"conversion_focus": "Lead generation",
"educational_value": "High"
}
5. Brand Analysis (brand_analysis) ⭐ IMPORTANT
{
"brand_voice": "Authoritative yet approachable",
"brand_values": ["Innovation", "Reliability", "Customer success"],
"brand_positioning": "Premium solution provider",
"competitive_differentiation": "AI-powered automation",
"trust_signals": ["Case studies", "Testimonials", "Security badges"],
"authority_indicators": ["Industry certifications", "Expert team"]
}
6. Content Strategy Insights (content_strategy_insights) ⭐ IMPORTANT
{
"strengths": [
"Clear value proposition",
"Strong technical authority",
"Engaging storytelling"
],
"weaknesses": [
"Limited social proof",
"Technical jargon overuse"
],
"opportunities": [
"Video content",
"Interactive demos",
"Industry thought leadership"
],
"threats": [
"Competitor content marketing",
"Market saturation"
],
"recommended_improvements": [
"Add more case studies",
"Simplify technical explanations",
"Increase content frequency"
],
"content_gaps": [
"Beginner-level tutorials",
"Comparison guides",
"Industry trend analysis"
]
}
7. Recommended Settings (recommended_settings)
{
"writing_tone": "Professional yet conversational",
"target_audience": "B2B decision makers",
"content_type": "Educational with conversion focus",
"creativity_level": "Balanced",
"geographic_location": "US/Global",
"industry_context": "B2B SaaS"
}
8. Crawl Result (crawl_result)
{
"content": "Full crawled text content...",
"success": true,
"timestamp": "2025-10-11T12:00:00Z"
}
9. Style Patterns (style_patterns)
{
"style_consistency": {
"consistency_score": 0.85,
"common_patterns": ["Data-driven claims", "Action-oriented CTAs"],
"variations": ["Blog vs landing page tone"]
},
"unique_elements": [
"Custom terminology",
"Brand-specific phrases",
"Signature formatting"
]
}
10. Style Guidelines (style_guidelines)
{
"guidelines": [
"Use active voice",
"Start with benefit statements",
"Support claims with data"
],
"best_practices": [
"Lead with customer pain points",
"Include social proof",
"Clear CTAs"
],
"avoid_elements": [
"Passive voice",
"Overly technical jargon",
"Generic claims"
],
"content_strategy": [
"Focus on thought leadership",
"Build trust through expertise",
"Address buyer journey stages"
],
"ai_generation_tips": [
"Emphasize ROI and metrics",
"Use industry-specific examples",
"Balance technical depth with clarity"
],
"competitive_advantages": [
"Unique positioning statement",
"Differentiating features",
"Customer success stories"
],
"content_calendar_suggestions": [
"Weekly blog posts",
"Monthly case studies",
"Quarterly industry reports"
]
}
Current Database Storage (OnboardingDatabaseService)
What's Saved to onboarding_sessions.website_analyses Table:
File: backend/services/onboarding_database_service.py (Line 173)
WebsiteAnalysis(
session_id=session.id,
website_url=analysis_data.get('website_url'),
writing_style=analysis_data.get('writing_style'), # ✅
content_characteristics=analysis_data.get('content_characteristics'), # ✅
target_audience=analysis_data.get('target_audience'), # ✅
content_type=analysis_data.get('content_type'), # ✅
recommended_settings=analysis_data.get('recommended_settings'),# ✅
crawl_result=analysis_data.get('crawl_result'), # ✅
style_patterns=analysis_data.get('style_patterns'), # ✅
style_guidelines=analysis_data.get('style_guidelines'), # ✅
status='completed'
)
❌ What's MISSING from Database Storage:
- brand_analysis - NOT saved to onboarding_database_service
- content_strategy_insights - NOT saved to onboarding_database_service
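The gap can be reproduced mechanically by comparing the analysis fields the API returns with the keyword arguments passed to WebsiteAnalysis(...) in the OnboardingDatabaseService snippet above. This is only a sketch: the two sets are transcribed by hand from this document, not read from the code.

```python
# Analysis fields returned by /style-detection/complete (style_analysis keys
# plus the top-level crawl_result, style_patterns, and style_guidelines).
API_FIELDS = {
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "brand_analysis",
    "content_strategy_insights",
    "crawl_result",
    "style_patterns",
    "style_guidelines",
}

# Analysis fields passed to WebsiteAnalysis(...) by OnboardingDatabaseService.
ONBOARDING_SAVED_FIELDS = {
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "crawl_result",
    "style_patterns",
    "style_guidelines",
}

dropped = sorted(API_FIELDS - ONBOARDING_SAVED_FIELDS)
print(dropped)  # ['brand_analysis', 'content_strategy_insights']
```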
✅ What's Saved to website_analyses Table (via WebsiteAnalysisService):
File: backend/services/website_analysis_service.py (Lines 44-87)
This service saves to a DIFFERENT table: website_analyses, not onboarding_sessions.website_analyses.
# Saves to: website_analyses table
WebsiteAnalysis(
session_id=session_id, # Integer session ID
website_url=website_url,
writing_style=style_analysis.get('writing_style'),
content_characteristics=style_analysis.get('content_characteristics'),
target_audience=style_analysis.get('target_audience'),
content_type=style_analysis.get('content_type'),
recommended_settings=style_analysis.get('recommended_settings'),
brand_analysis=style_analysis.get('brand_analysis'), # ✅ SAVED HERE!
content_strategy_insights=style_analysis.get('content_strategy_insights'), # ✅ SAVED HERE!
crawl_result=analysis_data.get('crawl_result'),
style_patterns=analysis_data.get('style_patterns'),
style_guidelines=analysis_data.get('style_guidelines'),
status='completed'
)
The Problem: Dual Database Persistence
We have TWO separate database save operations happening:
1. /style-detection/complete endpoint (component_logic.py)
- Saves to the website_analyses table via WebsiteAnalysisService
- Uses Integer session_id (converted from Clerk ID via SHA256)
- Saves ALL fields including brand_analysis and content_strategy_insights
2. OnboardingProgress.save_progress() (api_key_manager.py)
- Saves to the onboarding_sessions.website_analyses table via OnboardingDatabaseService
- Uses String user_id (Clerk ID)
- MISSING brand_analysis and content_strategy_insights
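The integer session_id is described as derived from the Clerk ID via SHA256, but the exact conversion is not shown in this document. One plausible derivation is sketched below purely as an assumption: the function name and the 8-hex-digit truncation are illustrative and may not match what component_logic.py actually does.

```python
import hashlib


def clerk_id_to_session_id(clerk_id: str) -> int:
    """Map a Clerk user ID string to a stable integer (illustrative only)."""
    digest = hashlib.sha256(clerk_id.encode("utf-8")).hexdigest()
    # ASSUMPTION: keep the first 8 hex digits (32 bits); the real code may differ.
    return int(digest[:8], 16)


# "user_example123" is a made-up ID for demonstration.
session_id = clerk_id_to_session_id("user_example123")
print(session_id)
```

Whatever the exact formula, the two save paths key their rows differently (a hashed integer versus the raw Clerk ID string), so records written by one service cannot be looked up with the other's key without repeating the conversion.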
Current Frontend Data Structure
File: frontend/src/components/OnboardingWizard/WebsiteStep.tsx (Line 386)
const stepData = {
website: fixedUrl, // ← Should be "website_url"
domainName: domainName,
analysis: { // ← Nested structure
writing_style: {...},
content_characteristics: {...},
target_audience: {...},
content_type: {...},
brand_analysis: {...}, // ✅ Present
content_strategy_insights: {...}, // ✅ Present
recommended_settings: {...},
// ... ALL the fields from API response
guidelines: [...],
best_practices: [...],
avoid_elements: [...],
style_patterns: {...},
// etc.
},
useAnalysisForGenAI: true
};
Solution Required
1. Fix Data Transformation (COMPLETED ✅)
File: backend/services/api_key_manager.py (Line 278)
Already fixed to flatten the structure:
elif step.step_number == 2: # Website Analysis
# Transform frontend data structure to match database schema
analysis_for_db = {
'website_url': step.data.get('website', ''),
'status': 'completed'
}
# Merge analysis fields if they exist
if 'analysis' in step.data and step.data['analysis']:
analysis_for_db.update(step.data['analysis'])
self.db_service.save_website_analysis(self.user_id, analysis_for_db, db)
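The effect of the flattening step is easier to see in isolation. The sketch below lifts the same logic into a standalone function; flatten_step2_data is a hypothetical name used for illustration, and the sample payload is made up to mirror the frontend stepData structure.

```python
def flatten_step2_data(step_data: dict) -> dict:
    """Standalone copy of the Step 2 transformation, for illustration."""
    analysis_for_db = {
        "website_url": step_data.get("website", ""),
        "status": "completed",
    }
    # Merge the nested analysis fields into the top level if they exist.
    analysis = step_data.get("analysis")
    if analysis:
        analysis_for_db.update(analysis)
    return analysis_for_db


step_data = {
    "website": "https://example.com",
    "domainName": "example.com",
    "analysis": {
        "writing_style": {"tone": "Professional"},
        "brand_analysis": {"brand_voice": "Authoritative yet approachable"},
    },
    "useAnalysisForGenAI": True,
}
flat = flatten_step2_data(step_data)
print(sorted(flat))
# ['brand_analysis', 'status', 'website_url', 'writing_style']
```

Because update() runs after the defaults are set, a website_url or status key inside analysis would overwrite them. Keys outside analysis, such as domainName and useAnalysisForGenAI, are not carried over.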
2. Update OnboardingDatabaseService to Save ALL Fields
File: backend/services/onboarding_database_service.py
NEEDED: Add brand_analysis and content_strategy_insights to the save operation.
Check if WebsiteAnalysis model has these columns:
# Line 206-213 (existing code)
website_url=analysis_data.get('website_url', ''),
writing_style=analysis_data.get('writing_style'),
content_characteristics=analysis_data.get('content_characteristics'),
target_audience=analysis_data.get('target_audience'),
content_type=analysis_data.get('content_type'),
recommended_settings=analysis_data.get('recommended_settings'),
brand_analysis=analysis_data.get('brand_analysis'), # ← ADD THIS
content_strategy_insights=analysis_data.get('content_strategy_insights'), # ← ADD THIS
crawl_result=analysis_data.get('crawl_result'),
style_patterns=analysis_data.get('style_patterns'),
style_guidelines=analysis_data.get('style_guidelines'),
3. Verify Database Model Supports These Fields
File: backend/models/onboarding.py
Check WebsiteAnalysis model for:
- brand_analysis column (JSON)
- content_strategy_insights column (JSON)
If missing, add migration.
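The column check can also be run against the database itself rather than the model file. The sketch below assumes a SQLite database, which this document does not confirm; the table layout is a stand-in built in memory, and missing_columns is a hypothetical helper. A real migration would go through whatever migration tooling the project uses.

```python
import sqlite3

REQUIRED_JSON_COLUMNS = ("brand_analysis", "content_strategy_insights")


def missing_columns(conn: sqlite3.Connection, table: str, required) -> list:
    """Return the required column names that the table does not have yet."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [name for name in required if name not in existing]


# ASSUMPTION: an in-memory stand-in for the real website_analyses table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE website_analyses ("
    "id INTEGER PRIMARY KEY, website_url TEXT, writing_style JSON)"
)

missing = missing_columns(conn, "website_analyses", REQUIRED_JSON_COLUMNS)
print(missing)  # ['brand_analysis', 'content_strategy_insights']

# Minimal stand-in for a migration: add each missing column.
for name in missing:
    conn.execute(f"ALTER TABLE website_analyses ADD COLUMN {name} JSON")

remaining = missing_columns(conn, "website_analyses", REQUIRED_JSON_COLUMNS)
print(remaining)  # []
```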
Recommendation
- ✅ Data transformation fix is complete (api_key_manager.py updated)
- ⏳ Check WebsiteAnalysis model for brand_analysis and content_strategy_insights columns
- ⏳ Update OnboardingDatabaseService.save_website_analysis() to include these fields
- ⏳ Restart backend to apply changes
- ⏳ Re-run Step 2 to save complete data
- ⏳ Verify Step 6 displays all fields
Benefits of Complete Data Storage
With brand_analysis and content_strategy_insights saved:
- Better Content Generation: AI can align with brand values
- Strategic Insights: SWOT analysis informs content strategy
- Competitive Intelligence: Differentiation factors for positioning
- Content Planning: Recommendations and calendar suggestions
- Quality Assurance: Consistency checking against brand guidelines
Status
- ✅ API endpoint returns complete data
- ✅ Frontend receives and displays complete data
- ✅ Data transformation fix applied (flattening structure)
- ⏳ Database model verification needed
- ⏳ OnboardingDatabaseService update needed
- ⏳ Testing required
Next Action: Check WebsiteAnalysis model and update OnboardingDatabaseService to save ALL fields.