Added documentation for the auto-population feature and the analytics integration.
This commit is contained in:
@@ -0,0 +1,486 @@
|
||||
# Auto-Population Code Walkthrough
|
||||
|
||||
## Overview
|
||||
|
||||
This document provides a comprehensive code walkthrough of the auto-population feature that fills 30 strategy input fields using onboarding data and AI insights.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Flow Overview](#flow-overview)
|
||||
2. [Frontend Flow](#frontend-flow)
|
||||
3. [Backend Flow](#backend-flow)
|
||||
4. [Database Tables Used](#database-tables-used)
|
||||
5. [Field Mapping](#field-mapping)
|
||||
6. [AI Integration](#ai-integration)
|
||||
7. [API Calls and Subscription Checks](#api-calls-and-subscription-checks)
|
||||
|
||||
## Flow Overview
|
||||
|
||||
### High-Level Flow
|
||||
|
||||
```
|
||||
User Clicks "Auto-Populate Fields"
|
||||
↓
|
||||
Frontend: AutoPopulationConsentModal (User Consent)
|
||||
↓
|
||||
Frontend: strategyBuilderStore.autoPopulateFromOnboarding()
|
||||
↓
|
||||
Frontend: API Call to /api/content-planning/enhanced-strategies/onboarding-data
|
||||
↓
|
||||
Backend: utility_endpoints.py → get_onboarding_data()
|
||||
↓
|
||||
Backend: EnhancedStrategyService._get_onboarding_data()
|
||||
↓
|
||||
Backend: DataProcessorService.get_onboarding_data()
|
||||
↓
|
||||
Backend: AutoFillService.get_autofill()
|
||||
↓
|
||||
Backend: OnboardingDataIntegrationService.process_onboarding_data() (Database Queries)
|
||||
↓
|
||||
Backend: AutoFillService.get_autofill() → Normalizers + Transformers
|
||||
↓
|
||||
Backend: AIStructuredAutofillService.generate_autofill_fields() (AI Generation)
|
||||
↓
|
||||
Backend: AIServiceManager.execute_structured_json_call() (AI API Call)
|
||||
↓
|
||||
Backend: Response with 30 fields
|
||||
↓
|
||||
Frontend: Store fields in strategyBuilderStore
|
||||
↓
|
||||
Frontend: Display fields in ContentStrategyBuilder
|
||||
```
|
||||
|
||||
## Frontend Flow
|
||||
|
||||
### 1. User Consent Modal
|
||||
|
||||
**File**: `frontend/src/components/ContentPlanningDashboard/components/AutoPopulationConsentModal.tsx`
|
||||
|
||||
- **Purpose**: Explains auto-population to non-technical users (content creators, digital marketers, solopreneurs)
|
||||
- **Features**:
|
||||
- Clear explanation of what auto-population does
|
||||
- Benefits (Instant Setup, AI-Powered Insights, Your Data Your Control, Always Editable)
|
||||
- Data sources used (Website Analysis, Research Preferences, Business Details, AI Analysis)
|
||||
- Two buttons: "Skip Auto-Population" (Cancel) and "Auto-Populate Fields" (Confirm)
|
||||
|
||||
### 2. ContentStrategyBuilder Component
|
||||
|
||||
**File**: `frontend/src/components/ContentPlanningDashboard/components/ContentStrategyBuilder.tsx`
|
||||
|
||||
**Key Changes**:
|
||||
- Removed automatic `useEffect` that triggered auto-population on mount
|
||||
- Added consent modal state: `showAutoPopulationConsentModal`
|
||||
- Added consent tracking: `autoPopulateConsentAsked` (persisted in sessionStorage)
|
||||
- Modal shows on first mount (with 500ms delay for rendering)
|
||||
- Auto-population only triggers after user clicks "Auto-Populate Fields"
|
||||
|
||||
**State Management**:
|
||||
```typescript
|
||||
const [showAutoPopulationConsentModal, setShowAutoPopulationConsentModal] = useState(false);
|
||||
const [autoPopulateConsentAsked, setAutoPopulateConsentAsked] = useState(() => {
|
||||
return sessionStorage.getItem('autoPopulateConsentAsked') === 'true';
|
||||
});
|
||||
const [autoPopulateAttempted, setAutoPopulateAttempted] = useState(false);
|
||||
```
|
||||
|
||||
**Consent Handlers**:
|
||||
- `handleAutoPopulationConsent()`: Triggers auto-population, saves consent to sessionStorage
|
||||
- `handleAutoPopulationCancel()`: Skips auto-population, saves consent to sessionStorage
|
||||
|
||||
### 3. Strategy Builder Store
|
||||
|
||||
**File**: `frontend/src/stores/strategyBuilderStore.ts`
|
||||
|
||||
**Function**: `autoPopulateFromOnboarding(forceRefresh?: boolean)`
|
||||
|
||||
**Steps**:
|
||||
1. **Global Protection**: Checks `isAutoPopulating` flag to prevent multiple simultaneous calls
|
||||
2. **Validation**: Checks if already populated (unless `forceRefresh`)
|
||||
3. **API Call**: Calls `contentPlanningApi.getOnboardingData()`
|
||||
4. **Response Processing**:
|
||||
- Extracts `fields`, `sources`, `input_data_points` from response
|
||||
- Validates AI generation success (`meta.ai_used` and `meta.ai_overrides_count > 0`)
|
||||
- Transforms field values and stores in:
|
||||
- `fieldValues`: Form data
|
||||
- `autoPopulatedFields`: Tracking which fields were auto-populated
|
||||
- `personalizationData`: User data used
|
||||
- `confidenceScores`: AI confidence scores
|
||||
5. **State Update**: Updates store with populated fields
|
||||
|
||||
**API Endpoint**: `GET /api/content-planning/enhanced-strategies/onboarding-data`
|
||||
|
||||
## Backend Flow
|
||||
|
||||
### 1. API Endpoint
|
||||
|
||||
**File**: `backend/api/content_planning/api/content_strategy/endpoints/utility_endpoints.py`
|
||||
|
||||
**Endpoint**: `GET /onboarding-data`
|
||||
|
||||
**Authentication**: Required (`get_current_user`)
|
||||
|
||||
**Flow**:
|
||||
1. Extracts `user_id` from authenticated token
|
||||
2. Creates `EnhancedStrategyDBService` and `EnhancedStrategyService`
|
||||
3. Calls `enhanced_service._get_onboarding_data(user_id)`
|
||||
4. Returns response via `ResponseBuilder.create_success_response()`
|
||||
|
||||
### 2. Enhanced Strategy Service
|
||||
|
||||
**File**: `backend/api/content_planning/services/enhanced_strategy_service.py`
|
||||
|
||||
**Method**: `_get_onboarding_data(user_id: int)`
|
||||
|
||||
**Flow**:
|
||||
1. Calls `core_service.data_processor_service.get_onboarding_data(user_id)`
|
||||
2. Returns processed onboarding data
|
||||
|
||||
### 3. Data Processor Service
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/utils/data_processors.py`
|
||||
|
||||
**Class**: `DataProcessorService`
|
||||
|
||||
**Method**: `async def get_onboarding_data(user_id: int)`
|
||||
|
||||
**Flow**:
|
||||
1. Creates `AutoFillService(db)` instance
|
||||
2. Calls `service.get_autofill(user_id)`
|
||||
3. Returns comprehensive onboarding data payload
|
||||
|
||||
### 4. AutoFill Service
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/autofill/autofill_service.py`
|
||||
|
||||
**Class**: `AutoFillService`
|
||||
|
||||
**Method**: `async def get_autofill(user_id: int)`
|
||||
|
||||
**Steps**:
|
||||
1. **Integration**: Calls `integration.process_onboarding_data(user_id, db)` to collect raw data
|
||||
2. **Normalization**:
|
||||
- `normalize_website_analysis(website_raw)`
|
||||
- `normalize_research_preferences(research_raw)`
|
||||
- `normalize_api_keys(api_raw)`
|
||||
3. **Quality Assessment**:
|
||||
- `calculate_quality_scores_from_raw()`
|
||||
- `calculate_confidence_from_raw()`
|
||||
- `calculate_data_freshness()`
|
||||
4. **Transformation**: Calls `transform_to_fields()` to map to 30 frontend fields
|
||||
5. **Transparency**:
|
||||
- `build_data_sources_map()` (field → data source mapping)
|
||||
- `build_input_data_points()` (detailed input data points)
|
||||
6. **Validation**: Validates output structure
|
||||
7. **Return**: Returns payload with fields, sources, quality scores, confidence levels, data freshness, input data points
|
||||
|
||||
**Note**: This service does NOT use AI. It only transforms existing onboarding data.
|
||||
|
||||
### 5. Onboarding Data Integration Service
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/onboarding/data_integration.py`
|
||||
|
||||
**Class**: `OnboardingDataIntegrationService`
|
||||
|
||||
**Method**: `async def process_onboarding_data(user_id: int, db: Session)`
|
||||
|
||||
**Database Queries**:
|
||||
1. **Website Analysis**:
|
||||
- Queries `OnboardingSession` for latest session
|
||||
- Queries `WebsiteAnalysis` for latest analysis
|
||||
- Returns: `website_url`, `content_goals`, `target_metrics`, `performance_metrics`, `competitors`, `target_audience`, `writing_style`, etc.
|
||||
|
||||
2. **Research Preferences**:
|
||||
- Queries `ResearchPreferences` for session
|
||||
- Returns: `research_depth`, `content_types`, `target_audience`, `audience_research`, `content_preferences`, etc.
|
||||
|
||||
3. **API Keys**:
|
||||
- Queries `APIKey` for user
|
||||
- Returns: `providers`, `total_keys`, available services
|
||||
|
||||
4. **Onboarding Session**:
|
||||
- Queries `OnboardingSession` for user
|
||||
- Returns: `business_size`, `budget`, `team_size`, `timeline`, `region`, etc.
|
||||
|
||||
**Returns**: Integrated data dictionary with all sources
|
||||
|
||||
## Database Tables Used
|
||||
|
||||
### 1. `onboarding_sessions`
|
||||
|
||||
**Columns Used**:
|
||||
- `user_id` (filter)
|
||||
- `id` (join key)
|
||||
- `updated_at` (ordering)
|
||||
- `business_size`, `budget`, `team_size`, `timeline`, `region`, `progress`
|
||||
|
||||
### 2. `website_analyses`
|
||||
|
||||
**Columns Used**:
|
||||
- `session_id` (join key)
|
||||
- `updated_at` (ordering)
|
||||
- `website_url`, `status`, `content_goals`, `target_metrics`, `performance_metrics`, `competitors`, `target_audience`, `writing_style`, `content_type`, `content_characteristics`, `recommended_settings`, `style_guidelines`
|
||||
|
||||
### 3. `research_preferences`
|
||||
|
||||
**Columns Used**:
|
||||
- `session_id` (join key)
|
||||
- `research_depth`, `content_types`, `target_audience`, `audience_research`, `content_preferences`, `auto_research`, `factual_content`
|
||||
|
||||
### 4. `api_keys`
|
||||
|
||||
**Columns Used**:
|
||||
- `user_id` (filter)
|
||||
- `provider` (aggregation)
|
||||
- `is_active` (filter)
|
||||
|
||||
## Field Mapping
|
||||
|
||||
### 30 Fields Mapped to Onboarding Data
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/autofill/transformer.py`
|
||||
|
||||
**Function**: `transform_to_fields()`
|
||||
|
||||
#### Business Context (8 fields)
|
||||
1. **business_objectives** → `website.content_goals`
|
||||
2. **target_metrics** → `website.target_metrics` or `website.performance_metrics`
|
||||
3. **content_budget** → `website.content_budget` or `session.budget`
|
||||
4. **team_size** → `website.team_size` or `session.team_size`
|
||||
5. **implementation_timeline** → `website.implementation_timeline` or `session.timeline`
|
||||
6. **market_share** → `website.market_share` or derived from `performance_metrics`
|
||||
7. **competitive_position** → `website.competitors` (derived)
|
||||
8. **performance_metrics** → `website.performance_metrics`
|
||||
|
||||
#### Audience Intelligence (6 fields)
|
||||
9. **content_preferences** → `research.content_preferences`
|
||||
10. **consumption_patterns** → `research.audience_intelligence.consumption_patterns`
|
||||
11. **audience_pain_points** → `research.audience_intelligence.pain_points`
|
||||
12. **buying_journey** → `research.audience_intelligence.buying_journey`
|
||||
13. **seasonal_trends** → Default: `['Q1: Planning', 'Q2: Execution', 'Q3: Optimization', 'Q4: Review']`
|
||||
14. **engagement_metrics** → Derived from `website.performance_metrics`
|
||||
|
||||
#### Competitive Intelligence (5 fields)
|
||||
15. **top_competitors** → `website.competitors`
|
||||
16. **competitor_content_strategies** → Default: `['Educational content', 'Case studies', 'Thought leadership']`
|
||||
17. **market_gaps** → `website.content_gaps`
|
||||
18. **industry_trends** → `research.industry_focus`
|
||||
19. **emerging_trends** → `research.trend_analysis`
|
||||
|
||||
#### Content Strategy (7 fields)
|
||||
20. **preferred_formats** → `research.content_types`
|
||||
21. **content_mix** → Derived from `research.content_types` and `website.content_goals`
|
||||
22. **content_frequency** → `research.content_calendar.frequency`
|
||||
23. **optimal_timing** → `research.content_calendar.timing`
|
||||
24. **quality_metrics** → Derived from `website.performance_metrics`
|
||||
25. **editorial_guidelines** → `website.style_guidelines`
|
||||
26. **brand_voice** → `website.writing_style.tone` or `session.brand_voice`
|
||||
|
||||
#### Performance & Analytics (4 fields)
|
||||
27. **traffic_sources** → Derived from `website.performance_metrics`
|
||||
28. **conversion_rates** → `website.performance_metrics.conversion_rate`
|
||||
29. **content_roi_targets** → Derived from `session.budget` and `performance_metrics`
|
||||
30. **ab_testing_capabilities** → Derived from `session.team_size`
|
||||
|
||||
## AI Integration
|
||||
|
||||
### When AI is Used
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_refresh.py`
|
||||
|
||||
**Class**: `AutoFillRefreshService`
|
||||
|
||||
**Critical Clarification**: The standard `AutoFillService.get_autofill()` does **NOT use AI**. It only transforms existing onboarding data using database queries and simple mappings.
|
||||
|
||||
**Standard Autofill (Default)**:
|
||||
- Uses `AutoFillService.get_autofill()` (NO AI)
|
||||
- Database queries only (0 tokens)
|
||||
- Direct mappings and simple derivations (~80%+ fields)
|
||||
- Fast (~100-200ms)
|
||||
- Used in standard "Auto-Populate Fields" flow
|
||||
|
||||
**AI Autofill (Optional - Refresh Flow)**:
|
||||
- Uses `AIStructuredAutofillService.generate_autofill_fields()` (WITH AI)
|
||||
- AI generation (3500-5000 tokens per call, up to 15,000 with retries)
|
||||
- Personalized values for missing/incomplete fields
|
||||
- Slower (~2-5 seconds per call)
|
||||
- Used in "Refresh Data (AI)" flow only
|
||||
|
||||
**AI is used in**:
|
||||
- `AutoFillRefreshService.build_fresh_payload()` (for refresh flows)
|
||||
- `AIStructuredAutofillService.generate_autofill_fields()` (for AI-only generation)
|
||||
|
||||
### AI Service
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_structured_autofill.py`
|
||||
|
||||
**Class**: `AIStructuredAutofillService`
|
||||
|
||||
**Method**: `async def generate_autofill_fields(user_id: int, context: Dict[str, Any])`
|
||||
|
||||
**Flow**:
|
||||
1. **Context Summary**: Builds personalized context from onboarding data
|
||||
2. **Schema**: Builds JSON schema for 30 fields
|
||||
3. **Prompt**: Builds personalized prompt with user's website URL, industry, business size, writing tone, target audience, etc.
|
||||
4. **AI Call**: Calls `self.ai.execute_structured_json_call()`
|
||||
- **Service Type**: `AIServiceType.STRATEGIC_INTELLIGENCE`
|
||||
- **Prompt**: Personalized prompt with user context
|
||||
- **Schema**: JSON schema with 30 field definitions
|
||||
5. **Retry Logic**: Up to 2 retries if success rate < 80% or missing fields > 6
|
||||
6. **Normalization**: Normalizes values (numbers, booleans, select options, arrays)
|
||||
7. **Validation**: Ensures all 30 fields are populated
|
||||
8. **Return**: Returns fields with metadata (ai_used, ai_overrides_count, success_rate, attempts)
|
||||
|
||||
### AI Service Manager
|
||||
|
||||
**File**: `backend/services/ai_service_manager.py` (referenced but not in content_planning)
|
||||
|
||||
**Method**: `execute_structured_json_call()`
|
||||
|
||||
**Flow**:
|
||||
1. Gets AI service (via `get_service_manager()`)
|
||||
2. Calls `main_text_generation()` with:
|
||||
- Prompt
|
||||
- Schema (JSON structure)
|
||||
- User ID (for subscription checks)
|
||||
3. **Subscription Check**: Uses `user_id` for pre-flight subscription validation
|
||||
4. **Pre-flight Check**: Validates subscription limits before API call
|
||||
5. **API Call**: Makes structured JSON call to AI provider (Gemini)
|
||||
6. **Response**: Returns structured JSON with 30 fields
|
||||
|
||||
### AI Prompts
|
||||
|
||||
**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_structured_autofill.py`
|
||||
|
||||
**Method**: `_build_prompt(context_summary: Dict[str, Any])`
|
||||
|
||||
**Prompt Structure**:
|
||||
1. **Personalized Context**:
|
||||
- User profile (website URL, business size, region)
|
||||
- Content analysis (writing tone, content type, target demographics)
|
||||
- Audience insights (pain points, preferences, industry focus)
|
||||
- AI recommendations (recommended tone, content type, style guidelines)
|
||||
- Research configuration (research depth, content types, auto research)
|
||||
- API capabilities (available services, providers)
|
||||
|
||||
2. **Instructions**:
|
||||
- Generate 30 fields personalized for user's website
|
||||
- Avoid generic placeholder values
|
||||
- Use real insights from website analysis
|
||||
- Make each field specific to user's business
|
||||
|
||||
3. **Field Examples**: Shows example format for all 30 fields
|
||||
|
||||
**Prompt Length**: ~3000-4000 characters (includes context + instructions + examples)
|
||||
|
||||
### AI Schema
|
||||
|
||||
**Method**: `_build_schema()`
|
||||
|
||||
**Schema Structure**:
|
||||
- **Type**: OBJECT
|
||||
- **Properties**: 30 field definitions
|
||||
- Each field has: `type` (STRING/NUMBER/BOOLEAN), `description`
|
||||
- **Required**: All 30 fields
|
||||
- **Property Ordering**: `CORE_FIELDS` order (critical for consistent JSON output)
|
||||
|
||||
## API Calls and Subscription Checks
|
||||
|
||||
### API Call Flow
|
||||
|
||||
1. **Frontend → Backend**: `GET /api/content-planning/enhanced-strategies/onboarding-data`
|
||||
- **Authentication**: Required (Bearer token)
|
||||
- **User ID**: Extracted from token
|
||||
|
||||
2. **Backend → Database**: Multiple queries (see Database Tables section)
|
||||
- No API calls, only database queries
|
||||
|
||||
3. **Backend → AI Service** (if using AI):
|
||||
- **Service**: `AIServiceManager.execute_structured_json_call()`
|
||||
- **Provider**: Gemini (via `gemini_provider`)
|
||||
- **Method**: `main_text_generation()`
|
||||
- **Subscription Check**: Pre-flight validation using `user_id`
|
||||
- **Pre-flight Check**: Validates subscription limits before API call
|
||||
|
||||
### Subscription and Pre-flight Checks
|
||||
|
||||
**File**: `backend/services/ai_service_manager.py` (referenced)
|
||||
|
||||
**Checks Performed**:
|
||||
1. **Subscription Validation**:
|
||||
- Checks user's subscription tier
|
||||
- Validates API usage limits
|
||||
- Uses `user_id` for subscription lookup
|
||||
|
||||
2. **Pre-flight Check**:
|
||||
- Validates request before making API call
|
||||
- Checks rate limits
|
||||
- Validates token usage estimate
|
||||
|
||||
3. **Post-call Tracking**:
|
||||
- Tracks token usage
|
||||
- Updates subscription usage stats
|
||||
- Records API calls
|
||||
|
||||
### Number of API Calls
|
||||
|
||||
**Standard Flow** (default - NO AI):
|
||||
- **AI Calls**: 0 (NO AI USED)
|
||||
- **API Calls**: 0 (only database queries)
|
||||
- **Database Queries**: 4-5 (OnboardingSession, WebsiteAnalysis, ResearchPreferences, APIKey)
|
||||
- **Token Usage**: 0 tokens
|
||||
- **Speed**: ~100-200ms
|
||||
- **Used in**: Standard "Auto-Populate Fields" flow
|
||||
|
||||
**AI-Enhanced Flow** (optional - WITH AI - refresh flow only):
|
||||
- **AI Calls**: 1-3 (depending on retries)
|
||||
- Initial call: 1
|
||||
- Retries (if success rate < 80%): up to 2 more
|
||||
- **Database Queries**: 4-5 (same as standard flow)
|
||||
- **AI Provider**: Gemini (via `gemini_provider`)
|
||||
- **Token Usage**: 3500-5000 tokens per call (up to 15,000 with retries)
|
||||
- **Speed**: ~2-5 seconds per call
|
||||
- **Used in**: "Refresh Data (AI)" flow only (optional)
|
||||
|
||||
### Token Usage
|
||||
|
||||
**Estimated Tokens per Call**:
|
||||
- **Input**: ~2000-3000 tokens (prompt + context)
|
||||
- **Output**: ~1500-2000 tokens (30 fields JSON)
|
||||
- **Total**: ~3500-5000 tokens per call
|
||||
|
||||
**With Retries** (max 2 retries):
|
||||
- **Best Case**: 3500-5000 tokens (1 call, 100% success)
|
||||
- **Worst Case**: 10500-15000 tokens (3 calls, <80% success each time)
|
||||
|
||||
## Summary
|
||||
|
||||
### Key Points
|
||||
|
||||
1. **User Consent**: Auto-population now requires explicit user consent via modal
|
||||
2. **No Auto-Trigger**: Removed automatic `useEffect` that triggered on mount
|
||||
3. **Database First**: Standard autofill uses only database queries (NO AI - 0 tokens)
|
||||
4. **AI Optional**: AI is only used in refresh flows (NOT standard auto-population)
|
||||
5. **30 Fields**: All 30 strategic input fields are mapped from onboarding data
|
||||
- **80%+ are direct database mappings** (no AI needed)
|
||||
- **Standard autofill can fill most fields** from database queries
|
||||
- **AI autofill is optional** (only for personalization in refresh flows)
|
||||
6. **Subscription Checks**: All AI calls use `user_id` for subscription and pre-flight checks
|
||||
7. **Token Usage**:
|
||||
- **Standard autofill**: 0 tokens (database queries only)
|
||||
- **AI autofill (refresh)**: 3500-5000 tokens per call (up to 15,000 with retries)
|
||||
8. **Architecture**: Standard autofill is the default (fast, free). AI autofill is optional (personalized, costs tokens).
|
||||
|
||||
### Data Sources Priority
|
||||
|
||||
1. **Website Analysis** (highest priority)
|
||||
2. **Research Preferences**
|
||||
3. **Onboarding Session**
|
||||
4. **API Keys** (for capabilities only)
|
||||
5. **AI Generation** (only in refresh flows)
|
||||
|
||||
### Performance Considerations
|
||||
|
||||
- **Standard Flow**: Fast (database queries only, ~100-200ms)
|
||||
- **AI-Enhanced Flow**: Slower (AI API calls, ~2-5 seconds per call)
|
||||
- **Retries**: Can add up to 2x-3x latency if retries are needed
|
||||
- **Caching**: Onboarding data is cached (TTL: 30 minutes)
|
||||
Reference in New Issue
Block a user