Add brand analysis columns to onboarding database and migration scripts
docs/STEP_2_COMPLETE_DATA_FLOW_ANALYSIS.md
@@ -0,0 +1,435 @@
# Step 2 (Website Analysis) - Complete Data Flow Analysis

## Overview

Step 2 performs comprehensive website analysis, including crawling, style detection, pattern analysis, and guideline generation. This document maps the complete data flow from the frontend to the database.

## API Endpoints Called

### 1. `/api/onboarding/style-detection/complete` (PRIMARY)

**Purpose**: Main analysis endpoint that performs the complete workflow

**Request** (`POST`):
```typescript
{
  url: string,
  include_patterns: true,
  include_guidelines: true
}
```

**Response**:
```typescript
{
  success: boolean,
  crawl_result: {
    content: string,
    success: boolean,
    timestamp: string
  },
  style_analysis: {
    writing_style: {...},
    content_characteristics: {...},
    target_audience: {...},
    content_type: {...},
    recommended_settings: {...},
    brand_analysis: {...},              // ← Rich brand insights
    content_strategy_insights: {...}    // ← SWOT analysis
  },
  style_patterns: {
    style_consistency: {...},
    unique_elements: {...}
  },
  style_guidelines: {
    guidelines: [...],
    best_practices: [...],
    avoid_elements: [...],
    content_strategy: [...],
    ai_generation_tips: [...],
    competitive_advantages: [...],
    content_calendar_suggestions: [...]
  },
  analysis_id: number,
  warning?: string
}
```
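
A consumer of this response may want to confirm which documented sections actually came back before forwarding them. The following is a minimal sketch, not code from the repository: `missing_sections` is a hypothetical helper, and the key names are simply copied from the response shape above.

```python
from typing import Any

# Key names copied from the documented response shape.
TOP_LEVEL_KEYS = ("crawl_result", "style_analysis", "style_patterns", "style_guidelines")
STYLE_ANALYSIS_KEYS = (
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "brand_analysis",
    "content_strategy_insights",
)


def missing_sections(response: dict[str, Any]) -> list[str]:
    """Return dotted names of documented sections that are absent or empty."""
    missing = [key for key in TOP_LEVEL_KEYS if not response.get(key)]
    style_analysis = response.get("style_analysis") or {}
    missing += [
        f"style_analysis.{key}"
        for key in STYLE_ANALYSIS_KEYS
        if not style_analysis.get(key)
    ]
    return missing
```

An empty return value means every documented section is present; any other result names the sections to investigate.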

### 2. `/api/onboarding/style-detection/check-existing/{url}` (OPTIONAL)

**Purpose**: Check if analysis already exists for this URL

**Response**:
```typescript
{
  exists: boolean,
  analysis_id?: number,
  analysis?: {...}  // Full analysis data if exists
}
```
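
Because `{url}` is itself a URL placed inside a path, a client will likely need to percent-encode it. The sketch below shows one way to build the request URL with the standard library. It assumes the backend accepts a percent-encoded value in that segment, which this document does not confirm; `check_existing_url` and the base URL are hypothetical.

```python
from urllib.parse import quote

# Path taken from the endpoint heading above; the host is supplied by the caller.
CHECK_EXISTING_PATH = "/api/onboarding/style-detection/check-existing/"


def check_existing_url(base_url: str, website_url: str) -> str:
    """Build the check-existing URL with the website URL percent-encoded."""
    # safe="" also encodes "/" and ":", so the value stays a single path segment.
    return base_url.rstrip("/") + CHECK_EXISTING_PATH + quote(website_url, safe="")
```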

### 3. `/api/onboarding/style-detection/analysis/{id}` (OPTIONAL)

**Purpose**: Load existing analysis by ID

### 4. `/api/onboarding/style-detection/session-analyses` (OPTIONAL)

**Purpose**: Get last analysis from session for pre-filling

## Complete Data Structure Collected

### 1. **Writing Style** (`writing_style`)
```json
{
  "tone": "Professional, Informative",
  "voice": "Active, Direct",
  "complexity": "Moderate",
  "engagement_level": "High",
  "brand_personality": "Trustworthy, Expert",
  "formality_level": "Semi-formal",
  "emotional_appeal": "Rational with emotional hooks"
}
```

### 2. **Content Characteristics** (`content_characteristics`)
```json
{
  "sentence_structure": "Mix of short and medium sentences",
  "vocabulary_level": "Professional/Business",
  "paragraph_organization": "Clear topic sentences",
  "content_flow": "Logical progression",
  "readability_score": "8th-10th grade",
  "content_density": "Information-rich",
  "visual_elements_usage": "Moderate"
}
```

### 3. **Target Audience** (`target_audience`)
```json
{
  "demographics": ["B2B", "Enterprise clients", "IT professionals"],
  "expertise_level": "Intermediate to Advanced",
  "industry_focus": "Technology/SaaS",
  "geographic_focus": "Global, US-focused",
  "psychographic_profile": "Innovation-driven, ROI-focused",
  "pain_points": ["Efficiency", "Scalability"],
  "motivations": ["Business growth", "Competitive advantage"]
}
```

### 4. **Content Type** (`content_type`)
```json
{
  "primary_type": "Educational/Thought Leadership",
  "secondary_types": ["Case Studies", "Product Descriptions"],
  "purpose": "Inform and convert",
  "call_to_action": "Demo request, Free trial",
  "conversion_focus": "Lead generation",
  "educational_value": "High"
}
```

### 5. **Brand Analysis** (`brand_analysis`) ⭐ **IMPORTANT**
```json
{
  "brand_voice": "Authoritative yet approachable",
  "brand_values": ["Innovation", "Reliability", "Customer success"],
  "brand_positioning": "Premium solution provider",
  "competitive_differentiation": "AI-powered automation",
  "trust_signals": ["Case studies", "Testimonials", "Security badges"],
  "authority_indicators": ["Industry certifications", "Expert team"]
}
```

### 6. **Content Strategy Insights** (`content_strategy_insights`) ⭐ **IMPORTANT**
```json
{
  "strengths": [
    "Clear value proposition",
    "Strong technical authority",
    "Engaging storytelling"
  ],
  "weaknesses": [
    "Limited social proof",
    "Technical jargon overuse"
  ],
  "opportunities": [
    "Video content",
    "Interactive demos",
    "Industry thought leadership"
  ],
  "threats": [
    "Competitor content marketing",
    "Market saturation"
  ],
  "recommended_improvements": [
    "Add more case studies",
    "Simplify technical explanations",
    "Increase content frequency"
  ],
  "content_gaps": [
    "Beginner-level tutorials",
    "Comparison guides",
    "Industry trend analysis"
  ]
}
```
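
Since these two sections are the ones at risk of being dropped, it can help to pin their shape down in code. The sketch below mirrors the two JSON examples above as `TypedDict` classes. The class names and the `key_diff` helper are hypothetical, and the field types are inferred from the sample values rather than from the backend models.

```python
from typing import TypedDict


class BrandAnalysis(TypedDict):
    brand_voice: str
    brand_values: list[str]
    brand_positioning: str
    competitive_differentiation: str
    trust_signals: list[str]
    authority_indicators: list[str]


class ContentStrategyInsights(TypedDict):
    strengths: list[str]
    weaknesses: list[str]
    opportunities: list[str]
    threats: list[str]
    recommended_improvements: list[str]
    content_gaps: list[str]


def key_diff(payload: dict, schema: type) -> tuple[set[str], set[str]]:
    """Compare payload keys with a TypedDict; return (missing, unexpected)."""
    expected = set(schema.__annotations__)
    actual = set(payload)
    return expected - actual, actual - expected
```

`key_diff` only compares key names; it does not validate value types.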

### 7. **Recommended Settings** (`recommended_settings`)
```json
{
  "writing_tone": "Professional yet conversational",
  "target_audience": "B2B decision makers",
  "content_type": "Educational with conversion focus",
  "creativity_level": "Balanced",
  "geographic_location": "US/Global",
  "industry_context": "B2B SaaS"
}
```

### 8. **Crawl Result** (`crawl_result`)
```json
{
  "content": "Full crawled text content...",
  "success": true,
  "timestamp": "2025-10-11T12:00:00Z"
}
```

### 9. **Style Patterns** (`style_patterns`)
```json
{
  "style_consistency": {
    "consistency_score": 0.85,
    "common_patterns": ["Data-driven claims", "Action-oriented CTAs"],
    "variations": ["Blog vs landing page tone"]
  },
  "unique_elements": [
    "Custom terminology",
    "Brand-specific phrases",
    "Signature formatting"
  ]
}
```

### 10. **Style Guidelines** (`style_guidelines`)
```json
{
  "guidelines": [
    "Use active voice",
    "Start with benefit statements",
    "Support claims with data"
  ],
  "best_practices": [
    "Lead with customer pain points",
    "Include social proof",
    "Clear CTAs"
  ],
  "avoid_elements": [
    "Passive voice",
    "Overly technical jargon",
    "Generic claims"
  ],
  "content_strategy": [
    "Focus on thought leadership",
    "Build trust through expertise",
    "Address buyer journey stages"
  ],
  "ai_generation_tips": [
    "Emphasize ROI and metrics",
    "Use industry-specific examples",
    "Balance technical depth with clarity"
  ],
  "competitive_advantages": [
    "Unique positioning statement",
    "Differentiating features",
    "Customer success stories"
  ],
  "content_calendar_suggestions": [
    "Weekly blog posts",
    "Monthly case studies",
    "Quarterly industry reports"
  ]
}
```

## Current Database Storage (OnboardingDatabaseService)

### What's Saved to `onboarding_sessions.website_analyses` Table:

**File**: `backend/services/onboarding_database_service.py` (Line 173)

```python
WebsiteAnalysis(
    session_id=session.id,
    website_url=analysis_data.get('website_url'),
    writing_style=analysis_data.get('writing_style'),  # ✅
    content_characteristics=analysis_data.get('content_characteristics'),  # ✅
    target_audience=analysis_data.get('target_audience'),  # ✅
    content_type=analysis_data.get('content_type'),  # ✅
    recommended_settings=analysis_data.get('recommended_settings'),  # ✅
    crawl_result=analysis_data.get('crawl_result'),  # ✅
    style_patterns=analysis_data.get('style_patterns'),  # ✅
    style_guidelines=analysis_data.get('style_guidelines'),  # ✅
    status='completed'
)
```

### ❌ What's MISSING from Database Storage:

1. **brand_analysis** - NOT saved by `OnboardingDatabaseService`
2. **content_strategy_insights** - NOT saved by `OnboardingDatabaseService`
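
The gap can be expressed as a set difference between what the API returns and what the constructor call receives. This is an illustrative sketch only: both sets are transcribed by hand from this document, and `unsaved_fields` is a hypothetical helper that could back a regression test.

```python
# Sections returned by the /complete endpoint, as listed in this document.
RETURNED_FIELDS = {
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "brand_analysis",
    "content_strategy_insights",
    "crawl_result",
    "style_patterns",
    "style_guidelines",
}

# Keyword arguments passed to WebsiteAnalysis(...) in the excerpt above.
SAVED_FIELDS = {
    "writing_style",
    "content_characteristics",
    "target_audience",
    "content_type",
    "recommended_settings",
    "crawl_result",
    "style_patterns",
    "style_guidelines",
}


def unsaved_fields(returned: set[str], saved: set[str]) -> list[str]:
    """Return the fields that are returned by the API but never persisted."""
    return sorted(returned - saved)


print(unsaved_fields(RETURNED_FIELDS, SAVED_FIELDS))
```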

### ✅ What's Saved to `website_analyses` Table (via WebsiteAnalysisService):

**File**: `backend/services/website_analysis_service.py` (Lines 44-87)

This service saves to a DIFFERENT table (`website_analyses`, not `onboarding_sessions.website_analyses`).

```python
# Saves to: website_analyses table
WebsiteAnalysis(
    session_id=session_id,  # Integer session ID
    website_url=website_url,
    writing_style=style_analysis.get('writing_style'),
    content_characteristics=style_analysis.get('content_characteristics'),
    target_audience=style_analysis.get('target_audience'),
    content_type=style_analysis.get('content_type'),
    recommended_settings=style_analysis.get('recommended_settings'),
    brand_analysis=style_analysis.get('brand_analysis'),  # ✅ SAVED HERE!
    content_strategy_insights=style_analysis.get('content_strategy_insights'),  # ✅ SAVED HERE!
    crawl_result=analysis_data.get('crawl_result'),
    style_patterns=analysis_data.get('style_patterns'),
    style_guidelines=analysis_data.get('style_guidelines'),
    status='completed'
)
```

## The Problem: Dual Database Persistence

We have **TWO separate database save operations** happening:

### 1. `/style-detection/complete` endpoint (component_logic.py)
- Saves to `website_analyses` table via `WebsiteAnalysisService`
- Uses **Integer session_id** (converted from Clerk ID via SHA256)
- Saves **ALL fields** including `brand_analysis` and `content_strategy_insights`

### 2. `OnboardingProgress.save_progress()` (api_key_manager.py)
- Saves to `onboarding_sessions.website_analyses` table via `OnboardingDatabaseService`
- Uses **String user_id** (Clerk ID)
- **MISSING** `brand_analysis` and `content_strategy_insights`
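
The two paths also key their rows differently, which is part of why they are hard to reconcile. This document does not show how the Clerk ID is converted, so the sketch below is only one plausible SHA-256 based scheme. The function name, the modulus, and the choice to use the full digest are all assumptions, not the code in `component_logic.py`.

```python
import hashlib


def clerk_id_to_session_id(clerk_id: str, modulus: int = 2**31 - 1) -> int:
    """Map a string ID to a stable non-negative integer (illustrative scheme only)."""
    digest = hashlib.sha256(clerk_id.encode("utf-8")).hexdigest()
    # Reduce the 256-bit digest so the result fits a signed 32-bit INTEGER column.
    return int(digest, 16) % modulus
```

Any scheme of this kind is one-way: the integer cannot be turned back into the Clerk ID, so rows written by the two paths can only be matched by recomputing the hash from the string ID.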

## Current Frontend Data Structure

**File**: `frontend/src/components/OnboardingWizard/WebsiteStep.tsx` (Line 386)

```typescript
const stepData = {
  website: fixedUrl,  // ← Should be "website_url"
  domainName: domainName,
  analysis: {         // ← Nested structure
    writing_style: {...},
    content_characteristics: {...},
    target_audience: {...},
    content_type: {...},
    brand_analysis: {...},             // ✅ Present
    content_strategy_insights: {...},  // ✅ Present
    recommended_settings: {...},
    // ... ALL the fields from API response
    guidelines: [...],
    best_practices: [...],
    avoid_elements: [...],
    style_patterns: {...},
    // etc.
  },
  useAnalysisForGenAI: true
};
```

## Solution Required

### 1. Fix Data Transformation (COMPLETED ✅)

**File**: `backend/services/api_key_manager.py` (Line 278)

Already fixed to flatten the structure:

```python
elif step.step_number == 2:  # Website Analysis
    # Transform frontend data structure to match database schema
    analysis_for_db = {
        'website_url': step.data.get('website', ''),
        'status': 'completed'
    }
    # Merge analysis fields if they exist
    if 'analysis' in step.data and step.data['analysis']:
        analysis_for_db.update(step.data['analysis'])

    self.db_service.save_website_analysis(self.user_id, analysis_for_db, db)
```
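
The same flattening can be exercised in isolation. The sketch below restates the transformation as a standalone function so its behaviour can be checked without the surrounding class; `flatten_step_data` is a hypothetical name, and the input mimics the frontend `stepData` shape shown earlier.

```python
from typing import Any


def flatten_step_data(step_data: dict[str, Any]) -> dict[str, Any]:
    """Flatten the frontend Step 2 payload into the keys the save call reads."""
    analysis_for_db: dict[str, Any] = {
        "website_url": step_data.get("website", ""),
        "status": "completed",
    }
    # Copy nested analysis sections to the top level when present.
    analysis = step_data.get("analysis")
    if analysis:
        analysis_for_db.update(analysis)
    return analysis_for_db


example = flatten_step_data(
    {
        "website": "https://example.com",
        "analysis": {"brand_analysis": {"brand_voice": "Authoritative yet approachable"}},
    }
)
print(sorted(example))
```

Note that the frontend places `guidelines`, `best_practices`, and `avoid_elements` directly under `analysis`, while the save excerpts read a `style_guidelines` key. Whether an additional mapping is needed for those fields is not established by this document.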

### 2. Update OnboardingDatabaseService to Save ALL Fields

**File**: `backend/services/onboarding_database_service.py`

**NEEDED**: Add `brand_analysis` and `content_strategy_insights` to the save operation.

Once the `WebsiteAnalysis` model is confirmed to have these columns, pass them in the constructor call:

```python
# Line 206-213 (existing code)
website_url=analysis_data.get('website_url', ''),
writing_style=analysis_data.get('writing_style'),
content_characteristics=analysis_data.get('content_characteristics'),
target_audience=analysis_data.get('target_audience'),
content_type=analysis_data.get('content_type'),
recommended_settings=analysis_data.get('recommended_settings'),
brand_analysis=analysis_data.get('brand_analysis'),  # ← ADD THIS
content_strategy_insights=analysis_data.get('content_strategy_insights'),  # ← ADD THIS
crawl_result=analysis_data.get('crawl_result'),
style_patterns=analysis_data.get('style_patterns'),
style_guidelines=analysis_data.get('style_guidelines'),
```

### 3. Verify Database Model Supports These Fields

**File**: `backend/models/onboarding.py`

Check `WebsiteAnalysis` model for:
- `brand_analysis` column (JSON)
- `content_strategy_insights` column (JSON)

If missing, add migration.
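
If a migration turns out to be necessary, an idempotent "add column if absent" script is one simple shape for it. The sketch below assumes a SQLite database and a table named `website_analyses`; neither the engine nor the exact table name is confirmed by this document, and `add_missing_columns` is a hypothetical helper rather than the project's migration script.

```python
import sqlite3

NEW_COLUMNS = ("brand_analysis", "content_strategy_insights")


def add_missing_columns(conn: sqlite3.Connection, table: str = "website_analyses") -> list[str]:
    """Add the two JSON columns if absent; return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for column in NEW_COLUMNS:
        if column not in existing:
            # Identifiers cannot be bound as SQL parameters; pass trusted names only.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} JSON")
            added.append(column)
    conn.commit()
    return added
```

Running it twice is safe: the second call finds both columns and adds nothing. Existing rows get `NULL` in the new columns until Step 2 is re-run.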

## Recommendation

1. ✅ **Data transformation fix is complete** (api_key_manager.py updated)
2. ⏳ **Check WebsiteAnalysis model** for brand_analysis and content_strategy_insights columns
3. ⏳ **Update OnboardingDatabaseService.save_website_analysis()** to include these fields
4. ⏳ **Restart backend** to apply changes
5. ⏳ **Re-run Step 2** to save complete data
6. ⏳ **Verify Step 6** displays all fields

## Benefits of Complete Data Storage

With `brand_analysis` and `content_strategy_insights` saved:

1. **Better Content Generation**: AI can align with brand values
2. **Strategic Insights**: SWOT analysis informs content strategy
3. **Competitive Intelligence**: Differentiation factors for positioning
4. **Content Planning**: Recommendations and calendar suggestions
5. **Quality Assurance**: Consistency checking against brand guidelines

## Status

- ✅ API endpoint returns complete data
- ✅ Frontend receives and displays complete data
- ✅ Data transformation fix applied (flattening structure)
- ⏳ Database model verification needed
- ⏳ OnboardingDatabaseService update needed
- ⏳ Testing required

---

**Next Action**: Check `WebsiteAnalysis` model and update `OnboardingDatabaseService` to save ALL fields.