Added documentation for the auto-population feature and the analytics integration.

2026-01-17 11:01:10 +05:30
parent 8193cdba67
commit 1db10ccd0f
61 changed files with 6773 additions and 579 deletions
--- a/backend/api/content_planning/docs/ARCHITECTURE_REVIEW_30_INPUTS_AND_AI_AUTOFILL.md
+++ b/backend/api/content_planning/docs/ARCHITECTURE_REVIEW_30_INPUTS_AND_AI_AUTOFILL.md
@@ -0,0 +1,471 @@
+# Architecture Review: 30 Inputs and AI Autofill
+
+## Executive Summary
+
+This document reviews the architectural decisions around the 30 strategic input fields and the AI autofill feature, addressing critical questions about redundancy, necessity, and optimization.
+
+## Key Questions Addressed
+
+1. **Why are 30 inputs needed?** Are they required for content strategy generation?
+2. **Are the 30 inputs direct database mappings, or are they personalized for strategy generation?**
+3. **Is AI autofill redundant**, given that strategy generation already uses AI to analyze onboarding data?
+4. **Should AI autofill be removed** if database queries can do the same job?
+
+---
+
+## 1. Why 30 Inputs Are Needed
+
+### Database Schema Requirement
+
+The 30 fields are **stored as columns** in the `EnhancedContentStrategy` model:
+
+```python
+class EnhancedContentStrategy(Base):
+    # Business Context (8 fields)
+    business_objectives = Column(JSON, nullable=True)
+    target_metrics = Column(JSON, nullable=True)
+    content_budget = Column(Float, nullable=True)
+    team_size = Column(Integer, nullable=True)
+    implementation_timeline = Column(String, nullable=True)
+    market_share = Column(Float, nullable=True)
+    competitive_position = Column(String, nullable=True)
+    performance_metrics = Column(JSON, nullable=True)
+    
+    # Audience Intelligence (6 fields)
+    content_preferences = Column(JSON, nullable=True)
+    consumption_patterns = Column(JSON, nullable=True)
+    audience_pain_points = Column(JSON, nullable=True)
+    buying_journey = Column(JSON, nullable=True)
+    seasonal_trends = Column(JSON, nullable=True)
+    engagement_metrics = Column(JSON, nullable=True)
+    
+    # ... (16 more fields)
+```
+
+### Strategy Generation Flow
+
+**Critical Finding**: The 30 fields are the **INPUT schema** for strategy generation, not the output:
+
+```
+User Fills 30 Fields (Frontend)
+  ↓
+Strategy Created with 30 Fields (Database)
+  ↓
+AI Recommendations Generated FROM 30 Fields (Not from onboarding data)
+  ↓
+Strategy Object Stored (with 30 fields + AI recommendations)
+```
+
+**Code Evidence**: `backend/api/content_planning/services/content_strategy/core/strategy_service.py`
+
+```python
+async def create_enhanced_strategy(self, strategy_data: Dict[str, Any], db: Session):
+    # Creates strategy with 30 fields from strategy_data
+    enhanced_strategy = EnhancedContentStrategy(
+        business_objectives=strategy_data.get('business_objectives'),
+        target_metrics=strategy_data.get('target_metrics'),
+        # ... all 30 fields
+    )
+    
+    # Save to database
+    db.add(enhanced_strategy)
+    db.commit()
+    
+    # THEN generate AI recommendations FROM the strategy object
+    await self.strategy_analyzer.generate_comprehensive_ai_recommendations(
+        enhanced_strategy,  # ← Uses the strategy object (30 fields), not onboarding data
+        db,
+        user_id=str(user_id)
+    )
+```
+
+**AI Recommendations Use Strategy Fields**: `backend/api/content_planning/services/content_strategy/ai_analysis/strategy_analyzer.py`
+
+```python
+def create_specialized_prompt(self, strategy: EnhancedContentStrategy, analysis_type: str):
+    base_context = f"""
+    Business Context:
+    - Industry: {strategy.industry}
+    - Business Objectives: {strategy.business_objectives}  # ← From strategy object
+    - Target Metrics: {strategy.target_metrics}            # ← From strategy object
+    # ... all 30 fields from strategy object
+    """
+```
+
+### Conclusion: 30 Fields ARE Required
+
+**Yes, the 30 fields are required** because:
+1. They are the **database schema** for storing strategies
+2. They are the **input structure** for AI recommendations
+3. AI recommendations are generated **FROM these 30 fields**, not from onboarding data directly
+4. They provide a **structured interface** for users to define their strategy
+
+---
+
+## 2. Are 30 Inputs Direct Database Mappings or Personalized?
+
+### Field Mapping Analysis
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/transformer.py`
+
+#### Direct Mappings (No Transformation)
+
+Most fields are **direct mappings** from onboarding data:
+
+```
+# Business Context - Direct Mappings
+business_objectives → website.content_goals                    # Direct
+target_metrics → website.target_metrics                        # Direct
+content_budget → session.budget                                # Direct
+team_size → session.team_size                                  # Direct
+implementation_timeline → session.timeline                     # Direct
+performance_metrics → website.performance_metrics              # Direct
+
+# Audience Intelligence - Direct Mappings
+content_preferences → research.content_preferences             # Direct
+consumption_patterns → research.audience_intelligence.consumption_patterns  # Direct
+audience_pain_points → research.audience_intelligence.pain_points  # Direct
+buying_journey → research.audience_intelligence.buying_journey  # Direct
+
+# Competitive Intelligence - Direct Mappings
+top_competitors → website.competitors                          # Direct
+market_gaps → website.content_gaps                            # Direct
+industry_trends → research.industry_focus                      # Direct
+emerging_trends → research.trend_analysis                      # Direct
+
+# Content Strategy - Direct Mappings
+preferred_formats → research.content_types                     # Direct
+content_frequency → research.content_calendar.frequency        # Direct
+optimal_timing → research.content_calendar.timing              # Direct
+editorial_guidelines → website.style_guidelines                # Direct
+brand_voice → website.writing_style.tone                       # Direct
+```
+
+#### Simple Derivations (Minimal Transformation)
+
+Some fields require **simple derivations**:
+
+```
+# Derived from existing data (no AI needed)
+market_share → derived from performance_metrics                # Simple calculation
+competitive_position → derived from competitors                # Simple categorization
+engagement_metrics → derived from performance_metrics          # Simple extraction
+traffic_sources → derived from performance_metrics             # Simple extraction
+conversion_rates → performance_metrics.conversion_rate        # Simple extraction
+content_roi_targets → derived from budget + performance_metrics  # Simple calculation
+ab_testing_capabilities → derived from team_size               # Simple boolean logic
+content_mix → derived from content_types + content_goals       # Simple mapping
+quality_metrics → derived from performance_metrics             # Simple extraction
+```
+
+#### Hardcoded Defaults (No Personalization)
+
+Some fields use **hardcoded defaults** (not personalized):
+
+```
+seasonal_trends → ['Q1: Planning', 'Q2: Execution', 'Q3: Optimization', 'Q4: Review']  # Hardcoded
+competitor_content_strategies → ['Educational content', 'Case studies', 'Thought leadership']  # Hardcoded
+```
+
+### Standard Flow Does NOT Use AI
+
+**Critical Finding**: The standard `AutoFillService.get_autofill()` does **NOT use AI**:
+
+```python
+# backend/api/content_planning/services/content_strategy/autofill/autofill_service.py
+
+async def get_autofill(self, user_id: int):
+    # Step 1: Get raw onboarding data (database queries only)
+    raw_data = await self.integration.process_onboarding_data(user_id, db)
+    
+    # Step 2: Normalize data (no AI)
+    normalized_data = self._normalize_data(raw_data)
+    
+    # Step 3: Transform to fields (no AI - just mapping)
+    fields = self._transform_to_fields(normalized_data)
+    
+    # Step 4: Return fields
+    return {
+        'fields': fields,
+        'sources': sources,
+        'meta': {
+            'ai_used': False,  # ← Standard flow does NOT use AI
+            'ai_overrides_count': 0
+        }
+    }
+```
+
+### Conclusion: Fields Are Mostly Direct Mappings
+
+**Most fields (80%+) are direct database mappings or simple derivations.** Counting the lists above:
+- **Direct mappings**: 19 fields (~63%)
+- **Simple derivations**: 9 fields (30%)
+- **Hardcoded defaults**: 2 fields (~7%)
+- **AI-generated**: 0 fields in standard flow
+
+**AI is only used in "refresh" flows** (`AIStructuredAutofillService`), not in standard autofill.
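The three categories above (direct mapping, simple derivation, hardcoded default) can be illustrated with a minimal sketch. It is not the code in `transformer.py`: the helper names `lookup` and `transform_subset`, the `kind` label, and the `team_size >= 3` rule for `ab_testing_capabilities` are assumptions made for illustration. Only the field names, source paths, and default lists are taken from the mappings listed earlier.

```python
from typing import Any, Dict, Optional

# Hardcoded defaults quoted from the review above.
DEFAULTS: Dict[str, Any] = {
    "seasonal_trends": ["Q1: Planning", "Q2: Execution", "Q3: Optimization", "Q4: Review"],
    "competitor_content_strategies": ["Educational content", "Case studies", "Thought leadership"],
}

# A small subset of the direct mappings: field name -> (source, dotted path).
DIRECT_MAPPINGS = {
    "business_objectives": ("website", "content_goals"),
    "content_budget": ("session", "budget"),
    "team_size": ("session", "team_size"),
    "buying_journey": ("research", "audience_intelligence.buying_journey"),
    "brand_voice": ("website", "writing_style.tone"),
}


def lookup(data: Dict[str, Any], dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    current: Any = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def transform_subset(onboarding: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Fill a subset of fields and record how each value was obtained."""
    fields: Dict[str, Dict[str, Any]] = {}

    # 1. Direct mappings: copy the value if it exists in the onboarding data.
    for name, (source, path) in DIRECT_MAPPINGS.items():
        value = lookup(onboarding.get(source, {}), path)
        if value is not None:
            fields[name] = {"value": value, "kind": "direct"}

    # 2. Simple derivation (hypothetical rule): teams of 3+ can run A/B tests.
    team_size = lookup(onboarding.get("session", {}), "team_size")
    if team_size is not None:
        fields["ab_testing_capabilities"] = {"value": team_size >= 3, "kind": "derived"}

    # 3. Hardcoded defaults: identical for every user, no personalization.
    for name, default in DEFAULTS.items():
        fields[name] = {"value": default, "kind": "default"}

    return fields
```

Fields whose source data is absent (here `buying_journey` when the research record has no `audience_intelligence`) are simply left out, which is the gap the AI refresh flow is meant to cover.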
+
+---
+
+## 3. Is AI Autofill Redundant?
+
+### Current Architecture
+
+**Standard Autofill Flow** (No AI):
+```
+Onboarding Data (Database)
+  ↓
+AutoFillService.get_autofill()
+  ↓
+Transform to 30 Fields (Mapping/Transformation)
+  ↓
+Return Fields to Frontend
+```
+
+**AI Autofill Flow** (Refresh Only):
+```
+Onboarding Data (Database)
+  ↓
+AIStructuredAutofillService.generate_autofill_fields()
+  ↓
+AI Call (Gemini) - 3500-5000 tokens
+  ↓
+Generate 30 Fields (AI-generated)
+  ↓
+Return Fields to Frontend
+```
+
+**Strategy Generation Flow** (After 30 Fields Are Filled):
+```
+30 Fields (From User Input)
+  ↓
+Create EnhancedContentStrategy (Database)
+  ↓
+generate_comprehensive_ai_recommendations()
+  ↓
+AI Call (Gemini) - Analyzes 30 Fields
+  ↓
+Generate AI Recommendations
+```
+
+### Redundancy Analysis
+
+#### Question: Is AI autofill redundant?
+
+**Argument FOR redundancy:**
+1. ✅ Standard autofill can fill 80%+ of the fields from database queries
+2. ✅ AI autofill uses the same onboarding data that standard autofill uses
+3. ✅ Strategy generation already uses AI to analyze the 30 fields
+4. ✅ AI autofill costs 3500-5000 tokens per call (with retries: up to 15,000 tokens)
+
+**Argument AGAINST redundancy:**
+1. ⚠️ AI autofill can **personalize** fields that are missing or generic
+2. ⚠️ AI autofill can **infer** fields from context (e.g., market_gaps from competitors)
+3. ⚠️ AI autofill can **transform** unstructured onboarding data into structured fields
+4. ⚠️ AI autofill is only used in "refresh" flows (not standard flow)
+
+### Key Distinction
+
+**Standard autofill (database queries):**
+- Fills fields that **exist** in onboarding data
+- Uses **direct mappings** and simple derivations
+- **No AI calls** (0 tokens)
+- **Fast** (~100-200ms)
+
+**AI autofill (refresh flow):**
+- Fills fields that **don't exist** in onboarding data
+- **Personalizes** generic/default values
+- **Uses AI** (3500-5000 tokens per call)
+- **Slower** (~2-5 seconds per call)
+
+### Conclusion: AI Autofill is Partially Redundant
+
+**AI autofill is redundant IF:**
+- Standard autofill can fill all 30 fields from database queries
+- Users are okay with generic/default values for missing fields
+- Cost optimization is prioritized over personalization
+
+**AI autofill is NOT redundant IF:**
+- Onboarding data is incomplete (missing fields)
+- Users want personalized values (not generic defaults)
+- Personalization improves user experience
+
+---
+
+## 4. Recommendation: Should AI Autofill Be Removed?
+
+### Option 1: Keep Both (Current Architecture) ✅ **RECOMMENDED**
+
+**Pros:**
+- Standard autofill: Fast, free, works for complete onboarding data
+- AI autofill: Personalized, works for incomplete onboarding data
+- User choice: Standard autofill by default, AI autofill for refresh
+
+**Cons:**
+- More complexity (two flows)
+- AI autofill costs tokens (only in refresh flows)
+
+**Implementation:**
+- Keep standard autofill as default (database queries only)
+- Keep AI autofill as "Refresh with AI" option (optional)
+- Make it clear to users when AI is used vs. database queries
+
+### Option 2: Remove AI Autofill (Database Queries Only) ⚠️ **NOT RECOMMENDED**
+
+**Pros:**
+- Simpler architecture (one flow)
+- No AI costs for autofill
+- Faster (database queries only)
+
+**Cons:**
+- Less personalization (generic defaults for missing fields)
+- Poor user experience if onboarding data is incomplete
+- Users may need to manually fill missing fields
+
+**When to consider:**
+- If onboarding data is always complete
+- If personalization is not a priority
+- If cost optimization is critical
+
+### Option 3: Remove Standard Autofill (AI Only) ❌ **NOT RECOMMENDED**
+
+**Pros:**
+- Maximum personalization
+- Consistent AI-generated values
+
+**Cons:**
+- High cost (AI call for every autofill)
+- Slower (2-5 seconds per call)
+- Unnecessary if onboarding data is complete
+
+**When to consider:**
+- If onboarding data is always incomplete
+- If personalization is critical
+- If cost is not a concern
+
+---
+
+## 5. Final Recommendations
+
+### Recommended Architecture
+
+**Keep current architecture with clarifications:**
+
+1. **Standard Autofill (Default)** - Database queries only:
+   - Use `AutoFillService.get_autofill()` (no AI)
+   - Fill fields from onboarding data (direct mappings + derivations)
+   - Use generic defaults for missing fields
+   - **Cost**: 0 tokens, **Speed**: ~100-200ms
+
+2. **AI Autofill (Optional - Refresh Flow)** - AI generation:
+   - Use `AIStructuredAutofillService.generate_autofill_fields()` (with AI)
+   - Personalize fields that are missing or generic
+   - **Cost**: 3500-5000 tokens (up to 15,000 with retries), **Speed**: ~2-5 seconds
+
+3. **Strategy Generation (After 30 Fields)** - AI recommendations:
+   - Uses 30 fields (from user input or autofill)
+   - Generates AI recommendations FROM 30 fields
+   - **Cost**: Separate AI call, **Speed**: ~2-5 seconds
+
+### Key Insights
+
+1. **30 fields ARE required** - They're the database schema and input for AI recommendations
+2. **Most fields (80%+) are direct mappings** - Standard autofill can fill them from database queries
+3. **AI autofill is optional** - Only used in "refresh" flows, not standard autofill
+4. **Strategy generation uses 30 fields** - Not onboarding data directly
+5. **AI autofill is partially redundant** - But provides personalization value when onboarding data is incomplete
+
+### Action Items
+
+1. ✅ **Keep current architecture** (standard autofill + optional AI autofill)
+2. ✅ **Clarify documentation** - Make it clear when AI is used vs. database queries
+3. ✅ **Update walkthrough document** - Clarify that standard autofill does NOT use AI
+4. ✅ **Consider cost optimization** - Only use AI autofill when necessary (incomplete data)
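Action item 4 could be approximated by counting the fields that standard autofill left empty or filled with a hardcoded default, and suggesting the AI refresh only above some threshold. The sketch below is hypothetical: `unfilled_fields`, `should_offer_ai_refresh`, and the `max_unfilled=6` threshold are assumptions, not part of the reviewed code.

```python
from typing import Any, Dict, Iterable, List


def is_empty(value: Any) -> bool:
    """Treat None and empty strings/lists/dicts as 'not filled'."""
    return value is None or value == "" or value == [] or value == {}


def unfilled_fields(
    fields: Dict[str, Any],
    expected: Iterable[str],
    default_fields: Iterable[str] = (),
) -> List[str]:
    """Return expected fields that are empty or only carry a generic default."""
    defaults = set(default_fields)
    return [
        name
        for name in expected
        if name in defaults or is_empty(fields.get(name))
    ]


def should_offer_ai_refresh(
    fields: Dict[str, Any],
    expected: Iterable[str],
    default_fields: Iterable[str] = (),
    max_unfilled: int = 6,  # assumed threshold, tune per product needs
) -> bool:
    """Suggest the token-consuming AI refresh only when enough fields are weak."""
    return len(unfilled_fields(fields, expected, default_fields)) > max_unfilled
```

With a check like this, complete onboarding data stays on the zero-token path and the AI call is reserved for users whose data leaves many gaps.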
+
+---
+
+## 6. Updated Flow Diagrams
+
+### Standard Autofill Flow (No AI)
+
+```
+User Clicks "Auto-Populate Fields"
+  ↓
+Frontend: API Call to /onboarding-data
+  ↓
+Backend: AutoFillService.get_autofill()
+  ↓
+OnboardingDataIntegrationService.process_onboarding_data() (Database Queries)
+  ↓
+Transform to 30 Fields (Mapping/Transformation - NO AI)
+  ↓
+Return Fields to Frontend (Database queries only, 0 tokens)
+```
+
+### AI Autofill Flow (Refresh Only)
+
+```
+User Clicks "Refresh Data (AI)"
+  ↓
+Frontend: API Call to /autofill-refresh
+  ↓
+Backend: AIStructuredAutofillService.generate_autofill_fields()
+  ↓
+OnboardingDataIntegrationService.process_onboarding_data() (Database Queries)
+  ↓
+AI Call (Gemini) - Generate 30 Fields (3500-5000 tokens)
+  ↓
+Return Fields to Frontend (AI-generated, personalized)
+```
+
+### Strategy Generation Flow (After 30 Fields)
+
+```
+User Fills 30 Fields (From autofill or manual input)
+  ↓
+Frontend: POST /create with strategy_data (30 fields)
+  ↓
+Backend: create_enhanced_strategy()
+  ↓
+Create EnhancedContentStrategy (Database - 30 fields stored)
+  ↓
+generate_comprehensive_ai_recommendations()
+  ↓
+AI Call (Gemini) - Analyze 30 Fields, Generate Recommendations
+  ↓
+Store AI Recommendations (Separate from 30 fields)
+```
+
+---
+
+## Summary
+
+### Answers to Key Questions
+
+1. **Why are 30 inputs needed?**
+   - ✅ They are the database schema for storing strategies
+   - ✅ They are the input structure for AI recommendations
+   - ✅ AI recommendations are generated FROM these 30 fields
+
+2. **Are 30 inputs direct mappings or personalized?**
+   - ✅ 80%+ are direct database mappings or simple derivations
+   - ✅ Standard autofill does NOT use AI (database queries only)
+   - ✅ AI autofill is only used in "refresh" flows (optional)
+
+3. **Is AI autofill redundant?**
+   - ⚠️ Partially redundant (standard autofill can fill 80%+ of the fields)
+   - ⚠️ But provides personalization value when onboarding data is incomplete
+   - ⚠️ Only used in "refresh" flows, not standard autofill
+
+4. **Should AI autofill be removed?**
+   - ✅ **NO** - Keep both standard autofill (default) and AI autofill (optional)
+   - ✅ Standard autofill: Fast, free, works for complete data
+   - ✅ AI autofill: Personalized, works for incomplete data
+   - ✅ User choice: Standard autofill by default, AI autofill for refresh
+
+### Final Recommendation
+
+**Keep current architecture** with better documentation:
+- Standard autofill (database queries) - Default, fast, free
+- AI autofill (refresh flow) - Optional, personalized, costs tokens
+- Strategy generation (AI recommendations) - Uses 30 fields, separate AI call
--- a/backend/api/content_planning/docs/AUTO_POPULATION_CODE_WALKTHROUGH.md
+++ b/backend/api/content_planning/docs/AUTO_POPULATION_CODE_WALKTHROUGH.md
@@ -0,0 +1,486 @@
+# Auto-Population Code Walkthrough
+
+## Overview
+
+This document walks through the code behind the auto-population feature, which fills the 30 strategy input fields from onboarding data and, in the optional refresh flow, from AI-generated values.
+
+## Table of Contents
+
+1. [Flow Overview](#flow-overview)
+2. [Frontend Flow](#frontend-flow)
+3. [Backend Flow](#backend-flow)
+4. [Database Tables Used](#database-tables-used)
+5. [Field Mapping](#field-mapping)
+6. [AI Integration](#ai-integration)
+7. [API Calls and Subscription Checks](#api-calls-and-subscription-checks)
+
+## Flow Overview
+
+### High-Level Flow
+
+```
+User Clicks "Auto-Populate Fields" 
+  ↓
+Frontend: AutoPopulationConsentModal (User Consent)
+  ↓
+Frontend: strategyBuilderStore.autoPopulateFromOnboarding()
+  ↓
+Frontend: API Call to /api/content-planning/enhanced-strategies/onboarding-data
+  ↓
+Backend: utility_endpoints.py → get_onboarding_data()
+  ↓
+Backend: EnhancedStrategyService._get_onboarding_data()
+  ↓
+Backend: DataProcessorService.get_onboarding_data()
+  ↓
+Backend: AutoFillService.get_autofill()
+  ↓
+Backend: OnboardingDataIntegrationService.process_onboarding_data() (Database Queries)
+  ↓
+Backend: AutoFillService.get_autofill() → Normalizers + Transformers (no AI)
+  ↓
+Backend: AIStructuredAutofillService.generate_autofill_fields() (AI Generation; "Refresh Data (AI)" flow only, skipped in the standard flow)
+  ↓
+Backend: AIServiceManager.execute_structured_json_call() (AI API Call; refresh flow only)
+  ↓
+Backend: Response with 30 fields
+  ↓
+Frontend: Store fields in strategyBuilderStore
+  ↓
+Frontend: Display fields in ContentStrategyBuilder
+```
+
+## Frontend Flow
+
+### 1. User Consent Modal
+
+**File**: `frontend/src/components/ContentPlanningDashboard/components/AutoPopulationConsentModal.tsx`
+
+- **Purpose**: Explains auto-population to non-technical users (content creators, digital marketers, solopreneurs)
+- **Features**:
+  - Clear explanation of what auto-population does
+  - Benefits (Instant Setup, AI-Powered Insights, Your Data Your Control, Always Editable)
+  - Data sources used (Website Analysis, Research Preferences, Business Details, AI Analysis)
+  - Two buttons: "Skip Auto-Population" (Cancel) and "Auto-Populate Fields" (Confirm)
+
+### 2. ContentStrategyBuilder Component
+
+**File**: `frontend/src/components/ContentPlanningDashboard/components/ContentStrategyBuilder.tsx`
+
+**Key Changes**:
+- Removed automatic `useEffect` that triggered auto-population on mount
+- Added consent modal state: `showAutoPopulationConsentModal`
+- Added consent tracking: `autoPopulateConsentAsked` (persisted in sessionStorage)
+- Modal shows on first mount (with 500ms delay for rendering)
+- Auto-population only triggers after user clicks "Auto-Populate Fields"
+
+**State Management**:
+```typescript
+const [showAutoPopulationConsentModal, setShowAutoPopulationConsentModal] = useState(false);
+const [autoPopulateConsentAsked, setAutoPopulateConsentAsked] = useState(() => {
+  return sessionStorage.getItem('autoPopulateConsentAsked') === 'true';
+});
+const [autoPopulateAttempted, setAutoPopulateAttempted] = useState(false);
+```
+
+**Consent Handlers**:
+- `handleAutoPopulationConsent()`: Triggers auto-population, saves consent to sessionStorage
+- `handleAutoPopulationCancel()`: Skips auto-population, saves consent to sessionStorage
+
+### 3. Strategy Builder Store
+
+**File**: `frontend/src/stores/strategyBuilderStore.ts`
+
+**Function**: `autoPopulateFromOnboarding(forceRefresh?: boolean)`
+
+**Steps**:
+1. **Global Protection**: Checks `isAutoPopulating` flag to prevent multiple simultaneous calls
+2. **Validation**: Checks if already populated (unless `forceRefresh`)
+3. **API Call**: Calls `contentPlanningApi.getOnboardingData()`
+4. **Response Processing**:
+   - Extracts `fields`, `sources`, `input_data_points` from response
+   - Validates AI generation success (`meta.ai_used` and `meta.ai_overrides_count > 0`)
+   - Transforms field values and stores in:
+     - `fieldValues`: Form data
+     - `autoPopulatedFields`: Tracking which fields were auto-populated
+     - `personalizationData`: User data used
+     - `confidenceScores`: AI confidence scores
+5. **State Update**: Updates store with populated fields
+
+**API Endpoint**: `GET /api/content-planning/enhanced-strategies/onboarding-data`
+
+## Backend Flow
+
+### 1. API Endpoint
+
+**File**: `backend/api/content_planning/api/content_strategy/endpoints/utility_endpoints.py`
+
+**Endpoint**: `GET /onboarding-data`
+
+**Authentication**: Required (`get_current_user`)
+
+**Flow**:
+1. Extracts `user_id` from authenticated token
+2. Creates `EnhancedStrategyDBService` and `EnhancedStrategyService`
+3. Calls `enhanced_service._get_onboarding_data(user_id)`
+4. Returns response via `ResponseBuilder.create_success_response()`
+
+### 2. Enhanced Strategy Service
+
+**File**: `backend/api/content_planning/services/enhanced_strategy_service.py`
+
+**Method**: `_get_onboarding_data(user_id: int)`
+
+**Flow**:
+1. Calls `core_service.data_processor_service.get_onboarding_data(user_id)`
+2. Returns processed onboarding data
+
+### 3. Data Processor Service
+
+**File**: `backend/api/content_planning/services/content_strategy/utils/data_processors.py`
+
+**Class**: `DataProcessorService`
+
+**Method**: `async def get_onboarding_data(user_id: int)`
+
+**Flow**:
+1. Creates `AutoFillService(db)` instance
+2. Calls `service.get_autofill(user_id)`
+3. Returns comprehensive onboarding data payload
+
+### 4. AutoFill Service
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/autofill_service.py`
+
+**Class**: `AutoFillService`
+
+**Method**: `async def get_autofill(user_id: int)`
+
+**Steps**:
+1. **Integration**: Calls `integration.process_onboarding_data(user_id, db)` to collect raw data
+2. **Normalization**: 
+   - `normalize_website_analysis(website_raw)`
+   - `normalize_research_preferences(research_raw)`
+   - `normalize_api_keys(api_raw)`
+3. **Quality Assessment**:
+   - `calculate_quality_scores_from_raw()`
+   - `calculate_confidence_from_raw()`
+   - `calculate_data_freshness()`
+4. **Transformation**: Calls `transform_to_fields()` to map to 30 frontend fields
+5. **Transparency**: 
+   - `build_data_sources_map()` (field → data source mapping)
+   - `build_input_data_points()` (detailed input data points)
+6. **Validation**: Validates output structure
+7. **Return**: Returns payload with fields, sources, quality scores, confidence levels, data freshness, input data points
+
+**Note**: This service does NOT use AI. It only transforms existing onboarding data.
+
+### 5. Onboarding Data Integration Service
+
+**File**: `backend/api/content_planning/services/content_strategy/onboarding/data_integration.py`
+
+**Class**: `OnboardingDataIntegrationService`
+
+**Method**: `async def process_onboarding_data(user_id: int, db: Session)`
+
+**Database Queries**:
+1. **Website Analysis**:
+   - Queries `OnboardingSession` for latest session
+   - Queries `WebsiteAnalysis` for latest analysis
+   - Returns: `website_url`, `content_goals`, `target_metrics`, `performance_metrics`, `competitors`, `target_audience`, `writing_style`, etc.
+
+2. **Research Preferences**:
+   - Queries `ResearchPreferences` for session
+   - Returns: `research_depth`, `content_types`, `target_audience`, `audience_research`, `content_preferences`, etc.
+
+3. **API Keys**:
+   - Queries `APIKey` for user
+   - Returns: `providers`, `total_keys`, available services
+
+4. **Onboarding Session**:
+   - Queries `OnboardingSession` for user
+   - Returns: `business_size`, `budget`, `team_size`, `timeline`, `region`, etc.
+
+**Returns**: Integrated data dictionary with all sources
+
+## Database Tables Used
+
+### 1. `onboarding_sessions`
+
+**Columns Used**:
+- `user_id` (filter)
+- `id` (join key)
+- `updated_at` (ordering)
+- `business_size`, `budget`, `team_size`, `timeline`, `region`, `progress`
+
+### 2. `website_analyses`
+
+**Columns Used**:
+- `session_id` (join key)
+- `updated_at` (ordering)
+- `website_url`, `status`, `content_goals`, `target_metrics`, `performance_metrics`, `competitors`, `target_audience`, `writing_style`, `content_type`, `content_characteristics`, `recommended_settings`, `style_guidelines`
+
+### 3. `research_preferences`
+
+**Columns Used**:
+- `session_id` (join key)
+- `research_depth`, `content_types`, `target_audience`, `audience_research`, `content_preferences`, `auto_research`, `factual_content`
+
+### 4. `api_keys`
+
+**Columns Used**:
+- `user_id` (filter)
+- `provider` (aggregation)
+- `is_active` (filter)
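The "latest session, then latest analysis" lookup described above can be sketched with the standard-library `sqlite3` module. This is a sketch under stated assumptions: the real service presumably goes through the project's ORM models, the two tables are reduced to the columns needed here, and `build_demo_db` and `latest_website_analysis` are hypothetical names.

```python
import sqlite3
from typing import Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with simplified versions of two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE onboarding_sessions (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            updated_at TEXT NOT NULL
        );
        CREATE TABLE website_analyses (
            id INTEGER PRIMARY KEY,
            session_id INTEGER NOT NULL REFERENCES onboarding_sessions(id),
            website_url TEXT,
            updated_at TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO onboarding_sessions (id, user_id, updated_at) VALUES (?, ?, ?)",
        [(1, 42, "2026-01-01T10:00:00"), (2, 42, "2026-01-10T10:00:00"), (3, 7, "2026-01-12T10:00:00")],
    )
    conn.executemany(
        "INSERT INTO website_analyses (session_id, website_url, updated_at) VALUES (?, ?, ?)",
        [
            (1, "https://old.example.com", "2026-01-02T09:00:00"),
            (2, "https://example.com/v1", "2026-01-10T11:00:00"),
            (2, "https://example.com/v2", "2026-01-11T09:00:00"),
        ],
    )
    conn.commit()
    return conn


def latest_website_analysis(conn: sqlite3.Connection, user_id: int) -> Optional[Tuple[str, str]]:
    """Return (website_url, updated_at) for the newest analysis of the user's newest session."""
    return conn.execute(
        """
        SELECT wa.website_url, wa.updated_at
        FROM website_analyses AS wa
        WHERE wa.session_id = (
            SELECT id FROM onboarding_sessions
            WHERE user_id = ?
            ORDER BY updated_at DESC
            LIMIT 1
        )
        ORDER BY wa.updated_at DESC
        LIMIT 1
        """,
        (user_id,),
    ).fetchone()
```

Here `user_id` acts as the filter, `session_id` as the join key, and `updated_at` as the ordering column, matching the roles listed for those columns above. A user with no session, or a session with no analysis, yields `None`.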
+
+## Field Mapping
+
+### 30 Fields Mapped to Onboarding Data
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/transformer.py`
+
+**Function**: `transform_to_fields()`
+
+#### Business Context (8 fields)
+1. **business_objectives** → `website.content_goals`
+2. **target_metrics** → `website.target_metrics` or `website.performance_metrics`
+3. **content_budget** → `website.content_budget` or `session.budget`
+4. **team_size** → `website.team_size` or `session.team_size`
+5. **implementation_timeline** → `website.implementation_timeline` or `session.timeline`
+6. **market_share** → `website.market_share` or derived from `performance_metrics`
+7. **competitive_position** → `website.competitors` (derived)
+8. **performance_metrics** → `website.performance_metrics`
+
+#### Audience Intelligence (6 fields)
+9. **content_preferences** → `research.content_preferences`
+10. **consumption_patterns** → `research.audience_intelligence.consumption_patterns`
+11. **audience_pain_points** → `research.audience_intelligence.pain_points`
+12. **buying_journey** → `research.audience_intelligence.buying_journey`
+13. **seasonal_trends** → Default: `['Q1: Planning', 'Q2: Execution', 'Q3: Optimization', 'Q4: Review']`
+14. **engagement_metrics** → Derived from `website.performance_metrics`
+
+#### Competitive Intelligence (5 fields)
+15. **top_competitors** → `website.competitors`
+16. **competitor_content_strategies** → Default: `['Educational content', 'Case studies', 'Thought leadership']`
+17. **market_gaps** → `website.content_gaps`
+18. **industry_trends** → `research.industry_focus`
+19. **emerging_trends** → `research.trend_analysis`
+
+#### Content Strategy (7 fields)
+20. **preferred_formats** → `research.content_types`
+21. **content_mix** → Derived from `research.content_types` and `website.content_goals`
+22. **content_frequency** → `research.content_calendar.frequency`
+23. **optimal_timing** → `research.content_calendar.timing`
+24. **quality_metrics** → Derived from `website.performance_metrics`
+25. **editorial_guidelines** → `website.style_guidelines`
+26. **brand_voice** → `website.writing_style.tone` or `session.brand_voice`
+
+#### Performance & Analytics (4 fields)
+27. **traffic_sources** → Derived from `website.performance_metrics`
+28. **conversion_rates** → `website.performance_metrics.conversion_rate`
+29. **content_roi_targets** → Derived from `session.budget` and `performance_metrics`
+30. **ab_testing_capabilities** → Derived from `session.team_size`
+
+## AI Integration
+
+### When AI is Used
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_refresh.py`
+
+**Class**: `AutoFillRefreshService`
+
+**Critical Clarification**: The standard `AutoFillService.get_autofill()` does **NOT use AI**. It only transforms existing onboarding data using database queries and simple mappings.
+
+**Standard Autofill (Default)**: 
+- Uses `AutoFillService.get_autofill()` (NO AI)
+- Database queries only (0 tokens)
+- Direct mappings and simple derivations (~80%+ fields)
+- Fast (~100-200ms)
+- Used in standard "Auto-Populate Fields" flow
+
+**AI Autofill (Optional - Refresh Flow)**:
+- Uses `AIStructuredAutofillService.generate_autofill_fields()` (WITH AI)
+- AI generation (3500-5000 tokens per call, up to 15,000 with retries)
+- Personalized values for missing/incomplete fields
+- Slower (~2-5 seconds per call)
+- Used in "Refresh Data (AI)" flow only
+
+**AI is used in**:
+- `AutoFillRefreshService.build_fresh_payload()` (for refresh flows)
+- `AIStructuredAutofillService.generate_autofill_fields()` (for AI-only generation)
+
+### AI Service
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_structured_autofill.py`
+
+**Class**: `AIStructuredAutofillService`
+
+**Method**: `async def generate_autofill_fields(user_id: int, context: Dict[str, Any])`
+
+**Flow**:
+1. **Context Summary**: Builds personalized context from onboarding data
+2. **Schema**: Builds JSON schema for 30 fields
+3. **Prompt**: Builds personalized prompt with user's website URL, industry, business size, writing tone, target audience, etc.
+4. **AI Call**: Calls `self.ai.execute_structured_json_call()`
+   - **Service Type**: `AIServiceType.STRATEGIC_INTELLIGENCE`
+   - **Prompt**: Personalized prompt with user context
+   - **Schema**: JSON schema with 30 field definitions
+5. **Retry Logic**: Up to 2 retries if success rate < 80% or missing fields > 6
+6. **Normalization**: Normalizes values (numbers, booleans, select options, arrays)
+7. **Validation**: Ensures all 30 fields are populated
+8. **Return**: Returns fields with metadata (ai_used, ai_overrides_count, success_rate, attempts)
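The retry rule in step 5 (up to 2 retries when the success rate is below 80% or more than 6 fields are missing) can be sketched as a small loop. This is an illustration only: `generate_with_retries`, `is_filled`, and the choice to keep the best attempt are assumptions, and `call_model` stands in for the real structured AI call. `expected_fields` is assumed to be non-empty.

```python
from typing import Any, Callable, Dict, List, Tuple


def is_filled(value: Any) -> bool:
    """Treat None and empty strings/lists/dicts as missing."""
    return value is not None and value != "" and value != [] and value != {}


def generate_with_retries(
    call_model: Callable[[], Dict[str, Any]],
    expected_fields: List[str],
    max_retries: int = 2,
    min_success_rate: float = 0.8,
    max_missing: int = 6,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Call the model until enough fields are filled or retries run out."""
    best: Dict[str, Any] = {}
    attempts = 0
    for attempts in range(1, max_retries + 2):  # first call plus retries
        raw = call_model()
        filled = {name: raw[name] for name in expected_fields if is_filled(raw.get(name))}
        if len(filled) > len(best):
            best = filled  # keep the most complete attempt seen so far
        success_rate = len(filled) / len(expected_fields)
        missing = len(expected_fields) - len(filled)
        if success_rate >= min_success_rate and missing <= max_missing:
            break  # good enough, stop spending tokens
    meta = {"attempts": attempts, "success_rate": len(best) / len(expected_fields)}
    return best, meta
```

With 30 fields the two conditions coincide at 24 filled fields. Three attempts at up to 5,000 tokens each account for the 15,000-token worst case quoted for the refresh flow.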
+
+### AI Service Manager
+
+**File**: `backend/services/ai_service_manager.py` (referenced but not in content_planning)
+
+**Method**: `execute_structured_json_call()`
+
+**Flow**:
+1. Gets AI service (via `get_service_manager()`)
+2. Calls `main_text_generation()` with:
+   - Prompt
+   - Schema (JSON structure)
+   - User ID (for subscription checks)
+3. **Subscription Check**: Uses `user_id` for pre-flight subscription validation
+4. **Pre-flight Check**: Validates subscription limits before API call
+5. **API Call**: Makes structured JSON call to AI provider (Gemini)
+6. **Response**: Returns structured JSON with 30 fields
+
+### AI Prompts
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_structured_autofill.py`
+
+**Method**: `_build_prompt(context_summary: Dict[str, Any])`
+
+**Prompt Structure**:
+1. **Personalized Context**: 
+   - User profile (website URL, business size, region)
+   - Content analysis (writing tone, content type, target demographics)
+   - Audience insights (pain points, preferences, industry focus)
+   - AI recommendations (recommended tone, content type, style guidelines)
+   - Research configuration (research depth, content types, auto research)
+   - API capabilities (available services, providers)
+
+2. **Instructions**:
+   - Generate 30 fields personalized for user's website
+   - Avoid generic placeholder values
+   - Use real insights from website analysis
+   - Make each field specific to user's business
+
+3. **Field Examples**: Shows example format for all 30 fields
+
+**Prompt Length**: ~3000-4000 characters (includes context + instructions + examples)
+
+### AI Schema
+
+**Method**: `_build_schema()`
+
+**Schema Structure**:
+- **Type**: OBJECT
+- **Properties**: 30 field definitions
+  - Each field has: `type` (STRING/NUMBER/BOOLEAN), `description`
+- **Required**: All 30 fields
+- **Property Ordering**: `CORE_FIELDS` order (critical for consistent JSON output)
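+
+A schema of this shape might be assembled as in the sketch below. It is not the actual `_build_schema()` body: the three-entry `CORE_FIELDS` list, its types and descriptions, and the `propertyOrdering` key name are assumptions for illustration (the real list has 30 entries).
+
+```python
+from typing import Any, Dict, List, Tuple
+
+# Hypothetical subset of the 30 fields: (name, type, description).
+CORE_FIELDS: List[Tuple[str, str, str]] = [
+    ("business_objectives", "STRING", "Primary business goals for content"),
+    ("content_budget", "NUMBER", "Content budget as a number"),
+    ("team_size", "NUMBER", "Number of people on the content team"),
+]
+
+
+def build_schema(fields: List[Tuple[str, str, str]]) -> Dict[str, Any]:
+    """Build an OBJECT schema whose property order follows `fields`."""
+    names = [name for name, _, _ in fields]
+    return {
+        "type": "OBJECT",
+        "properties": {
+            name: {"type": field_type, "description": description}
+            for name, field_type, description in fields
+        },
+        "required": names,          # every field is required
+        "propertyOrdering": names,  # keeps the output order stable
+    }
+```
+
+Deriving `required` and the ordering from the same list means a field cannot be added to one and forgotten in the other.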
+
+## API Calls and Subscription Checks
+
+### API Call Flow
+
+1. **Frontend → Backend**: `GET /api/content-planning/enhanced-strategies/onboarding-data`
+   - **Authentication**: Required (Bearer token)
+   - **User ID**: Extracted from token
+
+2. **Backend → Database**: Multiple queries (see Database Tables section)
+   - No API calls, only database queries
+
+3. **Backend → AI Service** (if using AI):
+   - **Service**: `AIServiceManager.execute_structured_json_call()`
+   - **Provider**: Gemini by default (via `gemini_provider`); selectable through `GPT_PROVIDER`
+   - **Method**: `llm_text_gen()` from `main_text_generation.py`
+   - **Subscription Check**: Looks up subscription tier and usage limits by `user_id`
+   - **Pre-flight Check**: Validates rate limits and the token usage estimate before the API call
+
+### Subscription and Pre-flight Checks
+
+**File**: `backend/services/ai_service_manager.py` (referenced)
+
+**Checks Performed**:
+1. **Subscription Validation**: 
+   - Checks user's subscription tier
+   - Validates API usage limits
+   - Uses `user_id` for subscription lookup
+
+2. **Pre-flight Check**:
+   - Validates request before making API call
+   - Checks rate limits
+   - Validates token usage estimate
+
+3. **Post-call Tracking**:
+   - Tracks token usage
+   - Updates subscription usage stats
+   - Records API calls
+
+### Number of API Calls
+
+**Standard Flow** (default - NO AI):
+- **AI Calls**: 0 (NO AI USED)
+- **API Calls**: 0 (only database queries)
+- **Database Queries**: 4-5 (OnboardingSession, WebsiteAnalysis, ResearchPreferences, APIKey)
+- **Token Usage**: 0 tokens
+- **Speed**: ~100-200ms
+- **Used in**: Standard "Auto-Populate Fields" flow
+
+**AI-Enhanced Flow** (optional - WITH AI - refresh flow only):
+- **AI Calls**: 1-3 (depending on retries)
+  - Initial call: 1
+  - Retries (if success rate < 80% or more than 6 fields are missing): up to 2 more
+- **Database Queries**: 4-5 (same as standard flow)
+- **AI Provider**: Gemini by default (via `gemini_provider`)
+- **Token Usage**: 3500-5000 tokens per call (up to 15,000 with retries)
+- **Speed**: ~2-5 seconds per call
+- **Used in**: "Refresh Data (AI)" flow only (optional)
+
+### Token Usage
+
+**Estimated Tokens per Call**:
+- **Input**: ~2000-3000 tokens (prompt + context)
+- **Output**: ~1500-2000 tokens (30 fields JSON)
+- **Total**: ~3500-5000 tokens per call
+
+**With Retries** (max 2 retries):
+- **Best Case**: 3500-5000 tokens (1 call, 100% success)
+- **Worst Case**: 10500-15000 tokens (3 calls, <80% success each time)
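+
+The ranges above are the per-call estimate multiplied by the number of calls. A small worked check, using only the figures from this section (`token_range` is a hypothetical helper):
+
+```python
+from typing import Tuple
+
+INPUT_TOKENS = (2000, 3000)   # prompt + context
+OUTPUT_TOKENS = (1500, 2000)  # 30-field JSON response
+MAX_RETRIES = 2
+
+
+def token_range(calls: int) -> Tuple[int, int]:
+    """Return the (low, high) total token estimate for `calls` AI calls."""
+    low = (INPUT_TOKENS[0] + OUTPUT_TOKENS[0]) * calls
+    high = (INPUT_TOKENS[1] + OUTPUT_TOKENS[1]) * calls
+    return low, high
+
+
+best_case = token_range(1)                 # single successful call
+worst_case = token_range(1 + MAX_RETRIES)  # initial call plus both retries
+```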
+
+## Summary
+
+### Key Points
+
+1. **User Consent**: Auto-population now requires explicit user consent via a modal
+2. **No Auto-Trigger**: Removed the automatic `useEffect` that triggered auto-population on mount
+3. **Database First**: Standard autofill uses only database queries (NO AI - 0 tokens)
+4. **AI Optional**: AI is only used in refresh flows (NOT standard auto-population)
+5. **30 Fields**: All 30 strategic input fields are mapped from onboarding data
+   - **80%+ are direct database mappings** (no AI needed)
+   - **Standard autofill can fill most fields** from database queries
+   - **AI autofill is optional** (only for personalization in refresh flows)
+6. **Subscription Checks**: All AI calls use `user_id` for subscription and pre-flight checks
+7. **Token Usage**: 
+   - **Standard autofill**: 0 tokens (database queries only)
+   - **AI autofill (refresh)**: 3500-5000 tokens per call (up to 15,000 with retries)
+8. **Architecture**: Standard autofill is the default (fast, free). AI autofill is optional (personalized, costs tokens).
+
+### Data Sources Priority
+
+1. **Website Analysis** (highest priority)
+2. **Research Preferences**
+3. **Onboarding Session**
+4. **API Keys** (for capabilities only)
+5. **AI Generation** (only in refresh flows)
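+
+One way to read this ordering is "the first source with a value wins". The sketch below illustrates that reading only; the source keys, `pick_value`, and the sample data in the usage note are hypothetical, and the real mapping logic may differ per field.
+
+```python
+from typing import Any, Dict, Optional
+
+# Highest priority first, mirroring the list above.
+SOURCE_PRIORITY = [
+    "website_analysis",
+    "research_preferences",
+    "onboarding_session",
+    "api_keys",
+    "ai_generation",
+]
+
+
+def pick_value(field: str, sources: Dict[str, Dict[str, Any]]) -> Optional[Any]:
+    """Return `field` from the highest-priority source that has a value."""
+    for source in SOURCE_PRIORITY:
+        value = sources.get(source, {}).get(field)
+        if value is not None:
+            return value
+    return None
+```
+
+With `{"onboarding_session": {"industry": "retail"}, "website_analysis": {"industry": "e-commerce"}}`, `pick_value("industry", ...)` would take the website analysis value.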
+
+### Performance Considerations
+
+- **Standard Flow**: Fast (database queries only, ~100-200ms)
+- **AI-Enhanced Flow**: Slower (AI API calls, ~2-5 seconds per call)
+- **Retries**: Each retry is another full AI call, so the maximum of 2 retries can bring total latency to roughly 3x a single call (~6-15 seconds)
+- **Caching**: Onboarding data is cached (TTL: 30 minutes)
--- a/backend/api/content_planning/docs/PROVIDER_SWITCHING_AI_AUTOFILL.md
+++ b/backend/api/content_planning/docs/PROVIDER_SWITCHING_AI_AUTOFILL.md
@@ -0,0 +1,210 @@
+# Provider Switching for AI Autofill
+
+## Overview
+
+This document clarifies that AI autofill **already supports provider switching** via the `GPT_PROVIDER` environment variable, similar to how blog writer and story writer handle provider selection.
+
+## Current Architecture
+
+### AI Autofill Flow
+
+```
+AIStructuredAutofillService.generate_autofill_fields()
+  ↓
+AIServiceManager.execute_structured_json_call()
+  ↓
+AIServiceManager._call_llm_with_checks()
+  ↓
+llm_text_gen() from main_text_generation.py
+  ↓
+Provider Selection (based on GPT_PROVIDER env var)
+  ↓
+gemini_provider OR huggingface_provider
+```
+
+### Provider Switching Pattern
+
+**File**: `backend/services/ai_service_manager.py`
+
+The `AIServiceManager.execute_structured_json_call()` method already uses `llm_text_gen()` from `main_text_generation.py`, which supports provider switching:
+
+```python
+def _call_llm_with_checks(self, prompt: str, schema: Dict[str, Any], user_id: str):
+    """Call LLM through main_text_generation with subscription checks."""
+    from services.llm_providers.main_text_generation import llm_text_gen
+    
+    # Call through main_text_generation for subscription checks
+    result = llm_text_gen(
+        prompt=prompt,
+        json_struct=schema,
+        user_id=user_id  # Pass user_id for subscription checks
+    )
+    return result
+```
+
+**File**: `backend/services/llm_providers/main_text_generation.py`
+
+The `llm_text_gen()` function already supports provider switching via the `GPT_PROVIDER` environment variable:
+
+```python
+def llm_text_gen(prompt: str, system_prompt: Optional[str] = None, json_struct: Optional[Dict[str, Any]] = None, user_id: str = None):
+    # Check for GPT_PROVIDER environment variable
+    env_provider = os.getenv('GPT_PROVIDER', '').lower()
+    if env_provider in ['gemini', 'google']:
+        gpt_provider = "google"
+        model = "gemini-2.0-flash-001"
+    elif env_provider in ['hf_response_api', 'huggingface', 'hf']:
+        gpt_provider = "huggingface"
+        model = "openai/gpt-oss-120b:groq"
+    
+    # Auto-detect based on available API keys if no env var
+    if not env_provider:
+        api_key_manager = APIKeyManager()
+        if api_key_manager.get_api_key("gemini"):
+            gpt_provider = "google"
+        elif api_key_manager.get_api_key("hf_token"):
+            gpt_provider = "huggingface"
+    
+    # Route to appropriate provider
+    if gpt_provider == "google":
+        if json_struct:
+            response_text = gemini_structured_json_response(...)
+        else:
+            response_text = gemini_text_response(...)
+    elif gpt_provider == "huggingface":
+        if json_struct:
+            response_text = huggingface_structured_json_response(...)
+        else:
+            response_text = huggingface_text_response(...)
+```
+
+## Comparison with Blog Writer and Story Writer
+
+### Blog Writer Pattern
+
+**File**: `backend/api/blog_writer/content/enhanced_content_generator.py`
+
+```python
+from services.llm_providers.main_text_generation import llm_text_gen
+
+async def generate_section(self, section: Any, research: Any, mode: str = "polished"):
+    # Provider-agnostic text generation (respect GPT_PROVIDER & circuit-breaker)
+    ai_resp = llm_text_gen(
+        prompt=prompt,
+        json_struct=None,
+        system_prompt=None,
+    )
+```
+
+### Story Writer Pattern
+
+Story writer follows the same pattern: it uses `llm_text_gen()` from `main_text_generation.py`, which respects `GPT_PROVIDER`.
+
+### AI Autofill Pattern
+
+**File**: `backend/api/content_planning/services/content_strategy/autofill/ai_structured_autofill.py`
+
+```python
+from services.ai_service_manager import AIServiceManager, AIServiceType
+
+class AIStructuredAutofillService:
+    def __init__(self):
+        self.ai = AIServiceManager()  # Uses AIServiceManager, not direct provider
+    
+    async def generate_autofill_fields(self, user_id: int, context: Dict[str, Any]):
+        result = await self.ai.execute_structured_json_call(
+            service_type=AIServiceType.STRATEGIC_INTELLIGENCE,
+            prompt=prompt,
+            schema=schema
+        )
+        # AIServiceManager routes to llm_text_gen() which respects GPT_PROVIDER
+```
+
+## Supported Providers
+
+### Google Gemini (Default)
+
+- **Environment Variable**: `GPT_PROVIDER=gemini` or `GPT_PROVIDER=google`
+- **Model**: `gemini-2.0-flash-001`
+- **Structured JSON**: `gemini_structured_json_response()`
+- **Text Generation**: `gemini_text_response()`
+
+### HuggingFace
+
+- **Environment Variable**: `GPT_PROVIDER=huggingface` or `GPT_PROVIDER=hf` or `GPT_PROVIDER=hf_response_api`
+- **Model**: `openai/gpt-oss-120b:groq`
+- **Structured JSON**: `huggingface_structured_json_response()`
+- **Text Generation**: `huggingface_text_response()`
+
+## Configuration
+
+### Environment Variable
+
+Set the `GPT_PROVIDER` environment variable to control provider selection:
+
+```bash
+# Use Google Gemini
+export GPT_PROVIDER=gemini
+
+# Use HuggingFace
+export GPT_PROVIDER=huggingface
+```
+
+### Auto-Detection
+
+If `GPT_PROVIDER` is not set, the system auto-detects based on available API keys:
+
+1. **Gemini**: If `GEMINI_API_KEY` is configured, uses Gemini
+2. **HuggingFace**: If `HF_TOKEN` is configured and Gemini is not available, uses HuggingFace
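+
+The selection order described in this document (explicit `GPT_PROVIDER` first, then key-based auto-detection) can be condensed into one pure function. This is a sketch, not the code in `main_text_generation.py`: `resolve_provider` is a hypothetical name, it reads a plain mapping instead of `APIKeyManager`, and raising `ValueError` for unknown or missing configuration is an assumption.
+
+```python
+from typing import Mapping
+
+GEMINI_ALIASES = {"gemini", "google"}
+HF_ALIASES = {"hf_response_api", "huggingface", "hf"}
+
+
+def resolve_provider(env: Mapping[str, str]) -> str:
+    """Return "google" or "huggingface" for the given environment mapping."""
+    requested = env.get("GPT_PROVIDER", "").lower()
+    if requested in GEMINI_ALIASES:
+        return "google"
+    if requested in HF_ALIASES:
+        return "huggingface"
+    if requested:
+        # Assumption: fail loudly on a value that matches no known alias.
+        raise ValueError(f"Unsupported GPT_PROVIDER: {requested!r}")
+    # No explicit choice: auto-detect, preferring Gemini when both keys exist.
+    if env.get("GEMINI_API_KEY"):
+        return "google"
+    if env.get("HF_TOKEN"):
+        return "huggingface"
+    raise ValueError("No provider configured: set GPT_PROVIDER or an API key")
+```
+
+Passing the environment in as an argument keeps the function easy to test; in a running process it could be called with `os.environ`.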
+
+### API Key Configuration
+
+Ensure API keys are configured in the environment:
+
+```bash
+# For Gemini
+export GEMINI_API_KEY=your_gemini_api_key
+
+# For HuggingFace
+export HF_TOKEN=your_huggingface_token
+```
+
+## Key Points
+
+### ✅ Already Supported
+
+1. **Provider Switching**: AI autofill already supports provider switching via `GPT_PROVIDER` env var
+2. **Consistent Pattern**: Uses the same pattern as blog writer and story writer (`llm_text_gen()`)
+3. **No Hardcoding**: Not hardcoded to `gemini_provider` - routes through `main_text_generation.py`
+4. **HuggingFace Support**: Already supports HuggingFace provider
+
+### Architecture Benefits
+
+1. **Consistent Provider Selection**: All AI features use the same provider selection logic
+2. **Subscription Checks**: All AI calls go through `llm_text_gen()` which includes subscription checks
+3. **Usage Tracking**: All AI calls are tracked through the same usage tracking system
+4. **Provider Abstraction**: AI autofill doesn't need to know about specific providers
+
+## Migration Notes
+
+### No Changes Required
+
+The AI autofill code **does not need any changes** - it already uses the correct pattern:
+
+- ✅ Uses `AIServiceManager.execute_structured_json_call()`
+- ✅ Routes through `llm_text_gen()` from `main_text_generation.py`
+- ✅ Respects `GPT_PROVIDER` environment variable
+- ✅ Supports both Gemini and HuggingFace
+
+### Verification
+
+To verify provider switching works:
+
+1. Set `GPT_PROVIDER=huggingface` in environment
+2. Call AI autofill endpoint
+3. Check logs for provider used (should show "huggingface")
+4. Verify structured JSON response format
+
+## Summary
+
+**AI autofill already supports provider switching** - no code changes are required. The system uses the same provider selection pattern as blog writer and story writer, routing through `llm_text_gen()` from `main_text_generation.py`, which respects the `GPT_PROVIDER` environment variable and supports both Gemini and HuggingFace providers.