# Onboarding Data Persistence - Critical Review
## ✅ Fixes Applied
### 1. Step Completion Data Saving (`step_management_service.py`)
**Status**: ✅ **CORRECTLY IMPLEMENTED**
All steps now save their data to the database:
- **Step 1 (API Keys)**: ✅ Saves via `save_api_key()` for each provider
- **Step 2 (Website Analysis)**: ✅ Saves via `save_website_analysis()`
- **Step 3 (Research Preferences)**: ✅ Saves via `save_research_preferences()`
- **Step 4 (Persona Data)**: ✅ Saves via `save_persona_data()`
**Data Structure Handling**:
- Correctly handles both `{ data: {...} }` wrapper and flat structures
- Uses `request_data.get('data') or request_data` pattern
- Non-blocking: Step completion continues even if save fails (with warnings)
**Error Tracking**:
- `save_errors` list tracks all failures
- Warnings included in response for frontend visibility
- Detailed logging with ✅/❌ indicators
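
The non-blocking pattern described above can be sketched as follows. This is a minimal illustration, not the code in `step_management_service.py`: the `complete_step()` helper, its response shape, and the `save_fn` callable (standing in for `save_website_analysis()` and its siblings) are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


def complete_step(
    step_number: int,
    request_data: Dict[str, Any],
    save_fn: Callable[[Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Hypothetical helper: complete a step and record, not raise, save failures."""
    # Accept both {"data": {...}} and a flat payload.
    step_data = request_data.get("data") or request_data
    save_errors: List[str] = []

    try:
        if not save_fn(step_data):
            save_errors.append(f"Step {step_number}: save returned False")
    except Exception as exc:  # non-blocking: a save error never fails the step
        save_errors.append(f"Step {step_number}: {exc}")

    response: Dict[str, Any] = {"step": step_number, "completed": True}
    if save_errors:
        # Surface failures to the frontend instead of failing silently.
        response["warnings"] = save_errors
    return response
```

With this shape, a failed save still yields `completed: True`, and the frontend can inspect `warnings` to decide whether to prompt the user.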
### 2. Error Handling Improvements (`database_service.py`)
**Status**: ✅ **CORRECTLY IMPLEMENTED**
All save methods now have:
- ✅ Detailed error logging with data keys
- ✅ Full traceback logging
- ✅ Catches both `SQLAlchemyError` and general `Exception`
- ✅ Proper rollback on errors
- ✅ Returns `False` on failure (non-blocking)
**Methods Updated**:
- `save_website_analysis()`
- `save_research_preferences()`
- `save_persona_data()`
- `save_api_key()`
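
The error-handling contract listed above (log the data keys, log the traceback, roll back, return `False`) can be sketched as follows. To keep the example self-contained it uses the standard-library `sqlite3` module, with `sqlite3.Error` standing in for `SQLAlchemyError`; the `save_record()` function and the `website_analysis` table layout are hypothetical and are not taken from `database_service.py`.

```python
import logging
import sqlite3
import traceback
from typing import Any, Dict

logger = logging.getLogger("onboarding.persistence")


def save_record(conn: sqlite3.Connection, session_id: int, data: Dict[str, Any]) -> bool:
    """Hypothetical save method: insert one row, never raise to the caller."""
    try:
        conn.execute(
            "INSERT INTO website_analysis (session_id, website_url) VALUES (?, ?)",
            (session_id, data["website_url"]),
        )
        conn.commit()
        return True
    except sqlite3.Error as exc:  # stand-in for SQLAlchemyError
        logger.error("DB error: %s | data keys: %s", exc, list(data.keys()))
        logger.error(traceback.format_exc())
        conn.rollback()
        return False
    except Exception as exc:  # anything else, e.g. a malformed payload
        logger.error("Unexpected error: %s | data keys: %s", exc, list(data.keys()))
        logger.error(traceback.format_exc())
        conn.rollback()
        return False
```

Because the function returns `False` rather than raising, the caller decides whether a failed save should become a warning or a hard error.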
### 3. Competitor Analysis Data Flow
**Status**: ⚠️ **IMPLEMENTED BUT CURRENTLY FAILING IN SOME SESSIONS**
#### Saving Flow:
1. **When**: During Step 3, when `/api/onboarding/step3/discover-competitors` is called
2. **Where**: `step3_research_service.py` → `store_research_data()` method (lines 427-469)
3. **How**: Saves each competitor to `CompetitorAnalysis` table with:
   - `session_id` (links to user's onboarding session)
   - `competitor_url` and `competitor_domain`
   - `analysis_data` (JSON with title, summary, insights, etc.)
   - `status` (completed/failed/in_progress)
#### Fetching Flow:
1. **Where**: `data_integration.py` → `_get_competitor_analysis()` method (lines 450-484)
2. **How**:
   - Gets latest onboarding session for user
   - Queries `CompetitorAnalysis` table filtered by `session_id`
   - Converts records to dictionaries with `to_dict()`
   - Adds `data_freshness` and `confidence_level` metadata
3. **Returns**: List of competitor dictionaries
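
The fetching flow above can be sketched in memory as follows. `CompetitorRecord` is a hypothetical stand-in for a `CompetitorAnalysis` row, and the `data_freshness` and `confidence_level` values are passed in by the caller because this review does not say how the real method computes them.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class CompetitorRecord:
    """Hypothetical stand-in for a CompetitorAnalysis row."""
    session_id: int
    competitor_url: str
    competitor_domain: str
    analysis_data: Dict[str, Any] = field(default_factory=dict)
    status: str = "completed"

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def get_competitor_analysis(
    records: List[CompetitorRecord],
    session_id: int,
    data_freshness: str = "unknown",   # placeholder value (assumption)
    confidence_level: float = 0.0,     # placeholder value (assumption)
) -> List[Dict[str, Any]]:
    """Filter by session, convert to dicts, and attach metadata."""
    results: List[Dict[str, Any]] = []
    for record in records:
        if record.session_id != session_id:
            continue
        item = record.to_dict()
        item["data_freshness"] = data_freshness
        item["confidence_level"] = confidence_level
        results.append(item)
    return results
```

An empty list here means no rows match the session, which is the symptom the log findings later in this review describe.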
#### Usage Flow:
1. **Integration**: `process_onboarding_data()` calls `_get_competitor_analysis()` (line 51)
2. **Normalization**: `autofill_service.py` calls `normalize_competitor_analysis()` (line 74)
3. **Transformation**: Normalized data passed to `transform_to_fields()` for field mapping
4. **Fields Populated**:
   - `top_competitors`
   - `competitor_content_strategies`
   - `market_gaps`
   - `industry_trends`
   - `emerging_trends`
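
A loose sketch of the normalize and transform stages follows, to show the shape of the pipeline rather than its real logic. The functions `sketch_normalize()` and `sketch_transform()` are hypothetical, and the assumption that `analysis_data` carries keys named after the target fields is made only for this example; the actual `normalize_competitor_analysis()` and `transform_to_fields()` implementations are not shown in this review.

```python
from typing import Any, Dict, List

TARGET_FIELDS = [
    "top_competitors",
    "competitor_content_strategies",
    "market_gaps",
    "industry_trends",
    "emerging_trends",
]


def sketch_normalize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Keep completed records and flatten each to a domain plus analysis payload."""
    return {
        "competitors": [
            {
                "domain": r.get("competitor_domain"),
                "analysis": r.get("analysis_data") or {},
            }
            for r in records
            if r.get("status") == "completed"
        ]
    }


def sketch_transform(normalized: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Map normalized competitors onto the five target fields."""
    competitors = normalized.get("competitors", [])
    fields: Dict[str, List[Any]] = {name: [] for name in TARGET_FIELDS}
    fields["top_competitors"] = [c["domain"] for c in competitors]
    for c in competitors:
        # Assumption: analysis payload uses the same key names as the fields.
        for name in TARGET_FIELDS[1:]:
            fields[name].extend(c["analysis"].get(name, []))
    return fields
```

When no competitor records reach this stage, every field stays empty, which is one way autofill ends up on placeholders or fallbacks.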
## 🔍 Verification Checklist
### Step Completion Data Saving
- [x] Step 1 saves API keys
- [x] Step 2 saves website analysis
- [x] Step 3 saves research preferences
- [x] Step 4 saves persona data
- [x] Handles `{ data: {...} }` wrapper structure
- [x] Handles flat structure (backward compatibility)
- [x] Non-blocking error handling
- [x] Warnings returned in response
### Error Handling
- [x] Detailed error logging
- [x] Traceback included
- [x] Data keys logged for debugging
- [x] Proper rollback on errors
- [x] Non-blocking (returns False, doesn't raise)
### Competitor Analysis
- [x] Competitors saved during discovery (Step 3)
- [x] Competitors fetched by user_id and session_id
- [x] Competitors normalized correctly
- [x] Competitors used in transformer for field mapping
- [x] Data flow: Save → Fetch → Normalize → Transform
## ⚠️ Potential Issues & Notes
### 1. Step 3 Data Structure
**Note**: Step 3 completion saves `research_preferences`, but competitor data is saved separately via the `/discover-competitors` endpoint. This is **intentional** and **correct**:
- Competitor discovery happens asynchronously during Step 3
- Research preferences (content_types, target_audience, etc.) are saved on step completion
- Both are needed and work together
### 2. Data Structure Handling
**Verified**: The code correctly handles:
```python
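# Frontend sends: { data: { website: "...", analysis: {...} } }
request_data = {"data": {"website": "...", "analysis": {}}}
# Unwrap the 'data' key when it is present and non-empty; otherwise use the payload as-is
step_data = request_data.get('data') or request_data
# This works for both wrapped and flat structures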
```
### 3. Competitor Analysis Timing
**Note**: Competitor analysis is saved when `/discover-competitors` is called, which may happen:
- Before step 3 completion (user discovers competitors first)
- After step 3 completion (user completes step then discovers)
Both scenarios work because:
- Competitors are linked by `session_id` (not step completion)
- Fetching uses `session_id` to get all competitors for the user
## ✅ Confirmation (Updated)
**Partial confirmation based on current logs:**
1. **Step 2, 3, 4 data saving**: Implemented, but real data still appears sparse for some users
2. **Error handling**: Implemented and non-blocking
3. ⚠️ **Competitor analysis**: Save flow exists, but **no competitor records found** for the current session in logs
4. **Data structure handling**: Handles both wrapped and flat structures
5. **Logging**: Detailed logging for debugging
## 🔍 Current Findings From Logs (Jan 15)
1. **Competitor records missing**:
   - Session found, but **0 competitor records** for that session
   - Indicates that either the discover step was never called or the save did not persist
2. **Session timestamp logging error**:
   - `OnboardingSession` does **not** have a `created_at` field (logging bug)
   - **Fix applied**: Log now uses `started_at` or `updated_at`
3. **Input data points crash**:
   - A `build_input_data_points()` signature mismatch caused 500 errors
   - **Fix applied**: Signature now includes `gsc_raw` and `bing_raw`
4. **GSC/Bing analytics init errors**:
   - `SEODashboardService.__init__()` requires a `db` argument but was called without one
   - **Fix applied**: Service is now instantiated with a DB session
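
The timestamp fix in finding 2 amounts to a fallback lookup. A minimal sketch, assuming a hypothetical `session_timestamp()` helper (the real logging code is not shown in this review):

```python
from datetime import datetime
from typing import Any, Optional


def session_timestamp(session: Any) -> Optional[datetime]:
    """Return started_at, falling back to updated_at; never read created_at."""
    return getattr(session, "started_at", None) or getattr(session, "updated_at", None)
```

Using `getattr` with a default keeps the log statement from raising `AttributeError` if a field is missing from the model.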
## 🧪 Testing Recommendations
1. **Test Step 2**: Complete website analysis → Verify data persists → Check autofill uses real data
2. **Test Step 3**: Complete research preferences → Discover competitors → Verify both save → Check autofill uses both
3. **Test Step 4**: Complete persona generation → Verify data persists → Check autofill uses real data
4. **Test Error Handling**: Simulate database error → Verify step still completes with warnings
5. **Test Data Refresh**: Complete steps → Refresh page → Verify data persists
6. **Test Competitor Discovery**: Call `/api/onboarding/step3/discover-competitors` → verify DB rows
7. **Test Content Strategy Autofill**: Verify `meta.missing_optional_sources` does **not** include `competitor_analysis`
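
Recommendation 7 can be automated with a small check like the one below. Only the `meta.missing_optional_sources` key comes from this review; the `competitor_data_used()` helper and the rest of the response shape are assumptions.

```python
from typing import Any, Dict


def competitor_data_used(autofill_response: Dict[str, Any]) -> bool:
    """True when competitor analysis is not reported as a missing source."""
    meta = autofill_response.get("meta") or {}
    missing = meta.get("missing_optional_sources") or []
    return "competitor_analysis" not in missing
```

A test can call the autofill endpoint after competitor discovery and assert that this helper returns `True`.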
## 📊 Expected Impact
**Before Fixes**:
- Steps 2, 3, 4 completed but data not saved
- Content strategy autofill used placeholders/fallbacks
- Silent failures
**After Fixes**:
- All step data persisted to database
- Content strategy autofill uses real user data
- Better error visibility and debugging
- Warnings returned to frontend if saves fail