
Step 2 Website Analysis Data Transformation Fix

Problem

Step 6 (FinalStep) was not displaying website analysis data. The other onboarding steps were persisting correctly:

  • API Keys were successfully saved and retrieved
  • Research Preferences were successfully saved and retrieved
  • Persona Data was successfully saved and retrieved

Website Analysis, however, was NOT being saved to the database.

Root Cause

Data Structure Mismatch between frontend and backend:

Frontend Data Structure (WebsiteStep.tsx)

const stepData = {
  website: "https://example.com",  // ← Note: "website", not "website_url"
  domainName: "example.com",
  analysis: {                      // ← Nested object
    writing_style: { ... },
    content_characteristics: { ... },
    target_audience: { ... },
    content_type: { ... },
    // etc.
  },
  useAnalysisForGenAI: true
};

What the Database Schema Expects (Flat Structure)

{
  'website_url': 'https://example.com',  # ← "website_url" at root level
  'writing_style': { ... },              # ← All fields at root level
  'content_characteristics': { ... },
  'target_audience': { ... },
  'content_type': { ... },
  'recommended_settings': { ... },
  'crawl_result': { ... },
  'style_patterns': { ... },
  'style_guidelines': { ... },
  'status': 'completed'
}

The Issue

In backend/services/api_key_manager.py (lines 278-280), the code was passing step.data directly to save_website_analysis():

elif step.step_number == 2:  # Website Analysis
    self.db_service.save_website_analysis(self.user_id, step.data, db)

But step.data had this structure:

{
  'website': 'https://example.com',
  'analysis': {
    'writing_style': { ... },
    # ...
  }
}

The database service expects website_url and every analysis field at the root level. Because step.data kept the URL under website and nested everything else under analysis, none of those keys were found at the root, so the service saved an empty record (or did not save at all).
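The failure can be reproduced in isolation. This sketch assumes, hypothetically, that save_website_analysis() reads each field with dict.get() at the root of the payload; the real method in onboarding_database_service.py is not shown in this document, so read_columns and EXPECTED_FIELDS are illustrative stand-ins only.

```python
from typing import Any, Dict, Optional

# Hypothetical subset of the root-level keys the database service looks up.
EXPECTED_FIELDS = ("website_url", "writing_style", "target_audience")


def read_columns(data: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Illustrative stand-in for the root-level lookups done before saving.

    Anything nested under another key (such as 'analysis') is invisible here.
    """
    return {field: data.get(field) for field in EXPECTED_FIELDS}


# Shape of step.data as sent by WebsiteStep.tsx
frontend_payload = {
    "website": "https://example.com",
    "analysis": {"writing_style": {"tone": "Professional"}},
}

print(read_columns(frontend_payload))
# {'website_url': None, 'writing_style': None, 'target_audience': None}
```

Every lookup returns None because the URL lives under "website" and the analysis fields live one level down.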

Solution

Transform the frontend data structure to match the database schema before saving:

File: backend/services/api_key_manager.py (lines 278-289)

elif step.step_number == 2:  # Website Analysis
    # Transform frontend data structure to match database schema
    analysis_for_db = {
        'website_url': step.data.get('website', ''),
        'status': 'completed'
    }
    # Merge analysis fields if they exist
    if 'analysis' in step.data and step.data['analysis']:
        analysis_for_db.update(step.data['analysis'])
    
    self.db_service.save_website_analysis(self.user_id, analysis_for_db, db)
    logger.info(f"✅ DATABASE: Website analysis saved to database for user {self.user_id}")

What This Does:

  1. Creates base structure: {'website_url': '...', 'status': 'completed'}
  2. Flattens nested analysis object: Uses .update() to merge all analysis fields to root level
  3. Result: Data matches database schema exactly
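One detail of the merge in item 2: because .update() runs after the base dictionary is built, a key inside analysis with the same name as a base key overwrites it. This document does not say whether the analysis payload can ever contain status or website_url, so the sketch below only demonstrates the merge order with made-up values.

```python
base = {"website_url": "https://example.com", "status": "completed"}
# Hypothetical analysis payload that happens to carry its own 'status' key
analysis = {"writing_style": {"tone": "Professional"}, "status": "partial"}

merged = dict(base)      # copy so `base` itself is left unchanged
merged.update(analysis)  # keys from `analysis` overwrite matching keys in `merged`

print(merged["status"])  # partial
```

If the base values should always win, building the dictionary as {**analysis, **base} reverses the precedence.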

Example Transformation:

Before (frontend format):

{
  'website': 'https://example.com',
  'analysis': {
    'writing_style': {'tone': 'Professional'},
    'target_audience': {'demographics': ['B2B']}
  }
}

After (database format):

{
  'website_url': 'https://example.com',
  'status': 'completed',
  'writing_style': {'tone': 'Professional'},
  'target_audience': {'demographics': ['B2B']}
}
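The transformation above can be checked as a standalone function. to_db_format is a hypothetical name; its body mirrors the fix in api_key_manager.py, except that it takes the step data dictionary directly instead of a step object.

```python
from typing import Any, Dict


def to_db_format(step_data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten Step 2 frontend data into the database shape (mirrors the fix)."""
    analysis_for_db: Dict[str, Any] = {
        "website_url": step_data.get("website", ""),
        "status": "completed",
    }
    # Merge only when 'analysis' is present and truthy (None and {} are skipped)
    if step_data.get("analysis"):
        analysis_for_db.update(step_data["analysis"])
    return analysis_for_db


before = {
    "website": "https://example.com",
    "analysis": {
        "writing_style": {"tone": "Professional"},
        "target_audience": {"demographics": ["B2B"]},
    },
}

after = to_db_format(before)
```

A missing or null analysis still yields the base record: to_db_format({"website": "https://example.com", "analysis": None}) returns {"website_url": "https://example.com", "status": "completed"}.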

Testing

To verify the fix:

  1. Restart the backend server to load the updated code
  2. Complete Step 2 (Website Analysis) in the onboarding flow
  3. Check backend logs for:
    ✅ DATABASE: Website analysis saved to database for user {user_id}
    
  4. Navigate to Step 6 (FinalStep)
  5. Verify website URL and style analysis are displayed
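The manual check can also be approximated by a unit test that never touches a real database. This sketch assumes a hypothetical helper, save_website_step, that wraps the Step 2 branch so it can be called outside api_key_manager.py; db_service is replaced with unittest.mock.MagicMock and the step object with types.SimpleNamespace.

```python
import types
import unittest
from unittest import mock


def save_website_step(db_service, user_id, step, db):
    """Hypothetical wrapper around the Step 2 branch of the fix."""
    analysis_for_db = {
        "website_url": step.data.get("website", ""),
        "status": "completed",
    }
    if "analysis" in step.data and step.data["analysis"]:
        analysis_for_db.update(step.data["analysis"])
    db_service.save_website_analysis(user_id, analysis_for_db, db)


class SaveWebsiteStepTest(unittest.TestCase):
    def test_payload_is_flattened(self):
        db_service = mock.MagicMock()  # records calls instead of writing to a DB
        step = types.SimpleNamespace(
            data={
                "website": "https://example.com",
                "analysis": {"writing_style": {"tone": "Professional"}},
            },
        )
        save_website_step(db_service, "user-123", step, db=None)

        # The service must receive the flattened payload, not step.data
        db_service.save_website_analysis.assert_called_once_with(
            "user-123",
            {
                "website_url": "https://example.com",
                "status": "completed",
                "writing_style": {"tone": "Professional"},
            },
            None,
        )
```

Running python -m unittest with the module that holds this code executes the test; it complements, but does not replace, checking the real database in Step 6.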

Expected Backend Logs After Fix:

INFO|api_key_manager.py:289|✅ DATABASE: Website analysis saved to database for user {user_id}
INFO|onboarding_summary_service.py:85|Retrieved website analysis from database for user {user_id}

Related Files

  • frontend/src/components/OnboardingWizard/WebsiteStep.tsx - Frontend data structure
  • backend/services/api_key_manager.py - Data transformation logic
  • backend/services/onboarding_database_service.py - Database save/retrieve methods
  • backend/models/onboarding.py - WebsiteAnalysis model schema

Why This Pattern?

This is a common issue in full-stack applications where:

  1. Frontend optimizes for UI structure (nested for component organization)
  2. Database schema is flat (each top-level key corresponds to a field on the WebsiteAnalysis model)
  3. Backend middleware transforms between the two

Alternative Solutions Considered

Option 1: Change Frontend Structure

Rejected: Would break all existing Step 2 components and localStorage caching

Option 2: Change Database Schema

Rejected: Would require complex JSON queries and lose type safety

Option 3: Transform in Middleware (Selected)

Best: Minimal code change, maintains backward compatibility, clear separation of concerns

Future Improvements

Consider adding a data transformation layer for all onboarding steps to handle similar mismatches proactively:

from typing import Any, Dict

class OnboardingDataTransformer:
    @staticmethod
    def transform_step_2(frontend_data: Dict[str, Any]) -> Dict[str, Any]:
        """Transform Step 2 data from frontend to database format."""
        return {
            'website_url': frontend_data.get('website', ''),
            'status': 'completed',
            # `or {}` also covers an explicit None, matching the falsy check in the fix
            **(frontend_data.get('analysis') or {}),
        }

This would centralize all data transformations and make the codebase more maintainable.
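Taken one step further, the transformer could dispatch on step number so that the branch in api_key_manager.py becomes a single lookup. This is a sketch only: the registry, the transform classmethod, and the pass-through fallback for steps without a transformer are assumptions; only the Step 2 mapping comes from this document.

```python
from typing import Any, Callable, Dict, Optional

Transform = Callable[[Dict[str, Any]], Dict[str, Any]]


class OnboardingDataTransformer:
    @staticmethod
    def transform_step_2(frontend_data: Dict[str, Any]) -> Dict[str, Any]:
        """Transform Step 2 data from frontend to database format."""
        return {
            "website_url": frontend_data.get("website", ""),
            "status": "completed",
            **(frontend_data.get("analysis") or {}),
        }

    @classmethod
    def transform(cls, step_number: int, frontend_data: Dict[str, Any]) -> Dict[str, Any]:
        """Apply the registered transformer for a step, or return a copy unchanged."""
        # Hypothetical registry; other steps would be added as mismatches are found
        registry: Dict[int, Transform] = {2: cls.transform_step_2}
        transformer: Optional[Transform] = registry.get(step_number)
        return transformer(frontend_data) if transformer else dict(frontend_data)
```

For example, OnboardingDataTransformer.transform(2, {"website": "https://example.com"}) returns {"website_url": "https://example.com", "status": "completed"}, while data for a step with no registered transformer is returned as a shallow copy.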

Status

Fixed: Website analysis data now saves correctly to the database
Pending: Restart backend and test with actual user flow