Files
ALwrity/docs/Onboarding/ONBOARDING_DATA_PERSISTENCE_REVIEW.md

7.0 KiB

Onboarding Data Persistence - Critical Review

Fixes Applied

1. Step Completion Data Saving (step_management_service.py)

Status: CORRECTLY IMPLEMENTED

All steps now save data to database:

  • Step 1 (API Keys): Saves via save_api_key() for each provider
  • Step 2 (Website Analysis): Saves via save_website_analysis()
  • Step 3 (Research Preferences): Saves via save_research_preferences()
  • Step 4 (Persona Data): Saves via save_persona_data()

Data Structure Handling:

  • Correctly handles both { data: {...} } wrapper and flat structures
  • Uses request_data.get('data') or request_data pattern
  • Non-blocking: Step completion continues even if save fails (with warnings)

Error Tracking:

  • save_errors list tracks all failures
  • Warnings included in response for frontend visibility
  • Detailed logging with / indicators

2. Error Handling Improvements (database_service.py)

Status: CORRECTLY IMPLEMENTED

All save methods now have:

  • Detailed error logging with data keys
  • Full traceback logging
  • Catches both SQLAlchemyError and general Exception
  • Proper rollback on errors
  • Returns False on failure (non-blocking)

Methods Updated:

  • save_website_analysis()
  • save_research_preferences()
  • save_persona_data()
  • save_api_key()

3. Competitor Analysis Data Flow

Status: ⚠️ IMPLEMENTED BUT CURRENTLY FAILING IN SOME SESSIONS

Saving Flow:

  1. When: During Step 3, when /api/onboarding/step3/discover-competitors is called
  2. Where: step3_research_service.pystore_research_data() method (lines 427-469)
  3. How: Saves each competitor to CompetitorAnalysis table with:
    • session_id (links to user's onboarding session)
    • competitor_url and competitor_domain
    • analysis_data (JSON with title, summary, insights, etc.)
    • status (completed/failed/in_progress)

Fetching Flow:

  1. Where: data_integration.py_get_competitor_analysis() method (lines 450-484)
  2. How:
    • Gets latest onboarding session for user
    • Queries CompetitorAnalysis table filtered by session_id
    • Converts records to dictionaries with to_dict()
    • Adds data_freshness and confidence_level metadata
  3. Returns: List of competitor dictionaries

Usage Flow:

  1. Integration: process_onboarding_data() calls _get_competitor_analysis() (line 51)
  2. Normalization: autofill_service.py calls normalize_competitor_analysis() (line 74)
  3. Transformation: Normalized data passed to transform_to_fields() for field mapping
  4. Fields Populated:
    • top_competitors
    • competitor_content_strategies
    • market_gaps
    • industry_trends
    • emerging_trends

🔍 Verification Checklist

Step Completion Data Saving

  • Step 1 saves API keys
  • Step 2 saves website analysis
  • Step 3 saves research preferences
  • Step 4 saves persona data
  • Handles { data: {...} } wrapper structure
  • Handles flat structure (backward compatibility)
  • Non-blocking error handling
  • Warnings returned in response

Error Handling

  • Detailed error logging
  • Traceback included
  • Data keys logged for debugging
  • Proper rollback on errors
  • Non-blocking (returns False, doesn't raise)

Competitor Analysis

  • Competitors saved during discovery (Step 3)
  • Competitors fetched by user_id and session_id
  • Competitors normalized correctly
  • Competitors used in transformer for field mapping
  • Data flow: Save → Fetch → Normalize → Transform

⚠️ Potential Issues & Notes

1. Step 3 Data Structure

Note: Step 3 completion saves research_preferences, but competitor data is saved separately via the /discover-competitors endpoint. This is intentional and correct:

  • Competitor discovery happens asynchronously during Step 3
  • Research preferences (content_types, target_audience, etc.) are saved on step completion
  • Both are needed and work together

2. Data Structure Handling

Verified: The code correctly handles:

# Frontend sends: { data: { website: "...", analysis: {...} } }
# Code extracts: request_data.get('data') or request_data
# This works for both wrapped and flat structures

3. Competitor Analysis Timing

Note: Competitor analysis is saved when /discover-competitors is called, which may happen:

  • Before step 3 completion (user discovers competitors first)
  • After step 3 completion (user completes step then discovers)

Both scenarios work because:

  • Competitors are linked by session_id (not step completion)
  • Fetching uses session_id to get all competitors for the user

Confirmation (Updated)

Partial confirmation based on current logs:

  1. Step 2, 3, 4 data saving: Implemented, but real data still appears sparse for some users
  2. Error handling: Implemented and non-blocking
  3. ⚠️ Competitor analysis: Save flow exists, but no competitor records found for the current session in logs
  4. Data structure handling: Handles both wrapped and flat structures
  5. Logging: Detailed logging for debugging

🔍 Current Findings From Logs (Jan 15)

  1. Competitor records missing:
    • Session found, but 0 competitor records for session
    • Indicates either discover step not called or save did not persist
  2. Session timestamp logging error:
    • OnboardingSession does not have created_at field (logging bug)
    • Fix applied: Log now uses started_at or updated_at
  3. Input data points crash:
    • build_input_data_points() signature mismatch caused 500 errors
    • Fix applied: Signature now includes gsc_raw and bing_raw
  4. GSC/Bing analytics init errors:
    • SEODashboardService.__init__() requires db argument but called without it
    • Fix applied: Service is now instantiated with a DB session

🧪 Testing Recommendations

  1. Test Step 2: Complete website analysis → Verify data persists → Check autofill uses real data
  2. Test Step 3: Complete research preferences → Discover competitors → Verify both save → Check autofill uses both
  3. Test Step 4: Complete persona generation → Verify data persists → Check autofill uses real data
  4. Test Error Handling: Simulate database error → Verify step still completes with warnings
  5. Test Data Refresh: Complete steps → Refresh page → Verify data persists
  6. Test Competitor Discovery: Call /api/onboarding/step3/discover-competitors → verify DB rows
  7. Test Content Strategy Autofill: Verify meta.missing_optional_sources does not include competitor_analysis

📊 Expected Impact

Before Fixes:

  • Steps 2, 3, 4 completed but data not saved
  • Content strategy autofill used placeholders/fallbacks
  • Silent failures

After Fixes:

  • All step data persisted to database
  • Content strategy autofill uses real user data
  • Better error visibility and debugging
  • Warnings returned to frontend if saves fail