5.4 KiB
Step 2 Dual Persistence Issue and Fix
Problem Discovery
User reported that after our database migration changes, they cannot see previous analysis in Step 2's cache/existing analysis feature.
Root Cause Analysis
Two Competing Systems Writing to Same Table
Both systems write to website_analyses table but with different session_id strategies:
1. Style Detection System (Original)
Endpoints: /api/onboarding/style-detection/*
Service: WebsiteAnalysisService
Session ID Type: INTEGER (SHA256 hash of Clerk user_id)
# component_logic.py line 523
user_id_int = clerk_user_id_to_int(user_id) # SHA256 hash → 724716666
# Saves to website_analyses table
analysis_service.save_analysis(user_id_int, request.url, response_data)
# Result: session_id = 724716666
2. Onboarding System (New)
Service: OnboardingDatabaseService
Session ID Type: Auto-increment integer from OnboardingSession.id
# OnboardingDatabaseService
session = self.get_or_create_session(user_id, session_db) # user_id is Clerk string
# session.id = 1, 2, 3, etc. (auto-increment)
# Saves to website_analyses table
analysis = WebsiteAnalysis(session_id=session.id, ...) # session_id = 1, 2, 3...
The Conflict
When a user analyzes their website:
- Analysis happens →
/style-detection/completesaves withsession_id = 724716666 - Check existing → Queries for
session_id = 724716666✅ FINDS IT - User clicks Continue →
OnboardingProgress.save_progress()saves withsession_id = 3(fromOnboardingSession.id) - Result: TWO records in
website_analysesfor same URL but differentsession_idvalues!
-- Table: website_analyses
id | session_id | website_url | writing_style | ...
----|-------------|-----------------------|---------------|----
42 | 724716666 | https://example.com | {...} | ... (from /style-detection/complete)
43 | 3 | https://example.com | {...} | ... (from OnboardingProgress.save_progress)
Why User Can't See Previous Analysis
After our migration:
OnboardingSession.user_idchanged to STRING (Clerk ID)OnboardingSession.idis auto-increment (1, 2, 3...)- Step 2 queries using SHA256 hash approach (724716666)
- Onboarding system saves using auto-increment ID (3)
- They never match!
Solutions
Option 1: Unified Session ID Strategy (RECOMMENDED)
Make both systems use the same session_id approach: the OnboardingSession.id.
Changes Required:
- Update
/style-detection/completeendpoint to useOnboardingSession:
# backend/api/component_logic.py
@router.post("/style-detection/complete")
async def complete_style_detection(request, current_user):
user_id = str(current_user.get('id'))
# Get or create OnboardingSession (not SHA256 hash)
from services.onboarding_database_service import OnboardingDatabaseService
onboarding_service = OnboardingDatabaseService()
db = next(get_db())
session = onboarding_service.get_or_create_session(user_id, db)
session_id = session.id # Use OnboardingSession.id instead of hash
# Save using this session_id
analysis_service.save_analysis(session_id, request.url, response_data)
- Update
check-existingendpoint similarly:
@router.get("/style-detection/check-existing/{website_url:path}")
async def check_existing_analysis(website_url, current_user):
user_id = str(current_user.get('id'))
# Get OnboardingSession (not SHA256 hash)
onboarding_service = OnboardingDatabaseService()
db = next(get_db())
session = onboarding_service.get_session_by_user(user_id, db)
if not session:
return {"exists": False}
# Query using OnboardingSession.id
existing = analysis_service.check_existing_analysis(session.id, website_url)
return existing
- Update
get-analysis/:idendpoint similarly.
Option 2: Keep Dual System, Sync Both Records
Keep both approaches but ensure both records are created/updated together.
❌ Not recommended - More complexity, potential for sync issues.
Option 3: Query Both Ways
Query by both session_id types and merge results.
❌ Not recommended - Hacky, doesn't solve root cause.
Implementation Plan
Phase 1: Update Style Detection Endpoints ✅
- Update
/style-detection/completeto useOnboardingSession.id - Update
/style-detection/check-existing/{url}to useOnboardingSession.id - Update
/style-detection/analysis/{id}to useOnboardingSession.id - Update
/style-detection/session-analysesto useOnboardingSession.id
Phase 2: Data Migration
Clean up duplicate records:
-- Keep only OnboardingSession-based records
DELETE FROM website_analyses
WHERE session_id NOT IN (
SELECT id FROM onboarding_sessions
);
Phase 3: Remove SHA256 Hash Approach
Remove clerk_user_id_to_int() function as it's no longer needed.
Benefits of Unified Approach
- ✅ Single source of truth for session_id
- ✅ No duplicate records
- ✅ Consistent user isolation
- ✅ Simpler codebase
- ✅ Cache/existing analysis works correctly
- ✅ Step 6 can retrieve data
Status
- ⏳ Pending: Update style detection endpoints
- ⏳ Pending: Test existing analysis feature
- ⏳ Pending: Data migration script
Next Action: Update /style-detection/* endpoints to use OnboardingSession.id instead of SHA256 hash.