🚨 CRITICAL: Onboarding Data Must Use Database
Issue Summary
Severity: 🔴 CRITICAL
Impact: User isolation, data persistence, security
Status: ⚠️ NEEDS IMMEDIATE FIX AFTER DEPLOYMENT STABILIZES
Problem Description
The onboarding system currently saves all user data to a JSON file (.onboarding_progress.json) instead of using the database. This causes multiple critical issues:
1. No User Isolation 🔴
- All users share the same JSON file
- User data can be overwritten by other users
- Privacy violation - users can see each other's data
- Line: backend/services/api_key_manager.py:45
- Code: self.progress_file = progress_file or ".onboarding_progress.json"
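The overwrite problem can be reproduced in a few lines. This is a minimal sketch, not the project's code: the save_progress and load_progress helpers here are simplified stand-ins, and the "saved_by" and "current_step" keys are made up for illustration.

```python
import json
import os
import tempfile


def save_progress(progress_file: str, progress_data: dict) -> None:
    """Overwrite the whole progress file, as a single shared JSON file would be."""
    with open(progress_file, "w", encoding="utf-8") as f:
        json.dump(progress_data, f)


def load_progress(progress_file: str) -> dict:
    """Read back whatever was written last, regardless of who wrote it."""
    with open(progress_file, encoding="utf-8") as f:
        return json.load(f)


def demo_shared_file_overwrite() -> dict:
    """Two users save in turn; only the second user's data survives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".onboarding_progress.json")
        save_progress(path, {"saved_by": "user_a", "current_step": 3})
        save_progress(path, {"saved_by": "user_b", "current_step": 1})
        # User A now loads "their" progress and gets user B's data instead.
        return load_progress(path)
```

Because the file path does not depend on the user, the last writer wins and every reader sees that writer's data.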
2. Data Loss on Deployment 🔴
- Render uses ephemeral filesystem
- File is deleted on every deployment/restart
- Users lose all onboarding progress
- Users must restart onboarding after each deployment
3. No Scalability 🔴
- Won't work with multiple backend instances
- File locking issues
- Race conditions
- Performance bottleneck
4. Security Risk 🔴
- API keys stored in plain text JSON file
- No encryption
- File is readable by anyone with filesystem access
- Should be in database with proper security
Current Architecture
User completes step → OnboardingProgress.mark_step_completed()
→ save_progress() (line 214)
→ json.dump(progress_data, ".onboarding_progress.json")
File Location: backend/.onboarding_progress.json
Affected Code:
- backend/services/api_key_manager.py (OnboardingProgress class)
- backend/api/onboarding_utils/endpoints_core.py
- backend/api/onboarding_utils/step_management_service.py
Database Models Available
✅ Good News: Proper database models already exist!
File: backend/models/onboarding.py
- OnboardingSession (user_id, current_step, progress, started_at, updated_at)
- APIKey (session_id, provider, key, created_at, updated_at)
- WebsiteAnalysis (session_id, website_url, analysis_date, writing_style, etc.)
- ResearchPreferences (session_id, research_depth, content_types, etc.)
Database Schema:
- ✅ User isolation via user_id and session_id
- ✅ Proper relationships and foreign keys
- ✅ Timestamps for audit trail
- ✅ JSON fields for complex data
- ✅ Cascade deletes for cleanup
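To make the schema properties above concrete, here is a rough sketch using Python's built-in sqlite3 module. It is not the contents of backend/models/onboarding.py: the table names, column types, and the UNIQUE constraint on user_id are assumptions, and only two of the four models are shown.

```python
import sqlite3

# Assumed table layout, loosely following the OnboardingSession and APIKey fields.
SCHEMA = """
CREATE TABLE onboarding_sessions (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL UNIQUE,          -- one session per user (assumption)
    current_step INTEGER NOT NULL DEFAULT 1,
    progress REAL NOT NULL DEFAULT 0,      -- column type is a guess
    started_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE api_keys (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL
        REFERENCES onboarding_sessions(id) ON DELETE CASCADE,
    provider TEXT NOT NULL,
    "key" TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def demo_cascade_delete() -> int:
    """Delete a session and return how many of its API key rows remain."""
    conn = open_demo_db()
    cur = conn.execute(
        "INSERT INTO onboarding_sessions (user_id) VALUES (?)", ("user_a",)
    )
    session_id = cur.lastrowid
    conn.execute(
        'INSERT INTO api_keys (session_id, provider, "key") VALUES (?, ?, ?)',
        (session_id, "gemini", "placeholder-not-a-real-key"),
    )
    conn.execute("DELETE FROM onboarding_sessions WHERE id = ?", (session_id,))
    remaining = conn.execute("SELECT COUNT(*) FROM api_keys").fetchone()[0]
    conn.close()
    return remaining
```

Deleting the session removes its API key rows through the foreign key, which is the cleanup behavior the schema list refers to.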
Required Changes
Phase 1: Database Layer (Priority 1)
File: backend/services/onboarding_database_service.py (NEW)
from __future__ import annotations  # model names in annotations stay unevaluated in this outline

from typing import Dict


class OnboardingDatabaseService:
    """Database-backed onboarding service replacing JSON file storage."""

    def get_or_create_session(self, user_id: str) -> OnboardingSession:
        """Get existing session or create new one."""

    def get_progress(self, user_id: str) -> OnboardingProgress:
        """Load progress from database."""

    def save_step_data(self, user_id: str, step_number: int, data: Dict):
        """Save step data to database."""

    def mark_step_completed(self, user_id: str, step_number: int):
        """Mark step as completed in database."""

    def get_step_data(self, user_id: str, step_number: int) -> Dict:
        """Retrieve step data from database."""
Phase 2: Refactor API Key Manager (Priority 1)
File: backend/services/api_key_manager.py
Changes:
- Remove JSON file operations (lines 214-242)
- Add database dependency injection
- Replace save_progress() with database calls
- Replace load_progress() with database queries
- Add user_id parameter to all methods
Before:
def mark_step_completed(self, step_number: int, data: Optional[Dict] = None):
    # ... update in-memory state ...
    self.save_progress()  # Saves to JSON file
After:
def mark_step_completed(self, user_id: str, step_number: int, data: Optional[Dict] = None):
    # ... update database through the injected service ...
    self.db_service.save_step_data(user_id, step_number, data)
    self.db_service.mark_step_completed(user_id, step_number)
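One way the dependency injection could look is sketched here. The StepStore protocol and InMemoryStepStore class are hypothetical names introduced for illustration, and the guard that skips saving when data is None is a design assumption rather than something the plan specifies.

```python
from typing import Any, Dict, Optional, Protocol, Set, Tuple


class StepStore(Protocol):
    """Hypothetical interface covering the two calls the refactor needs."""

    def save_step_data(self, user_id: str, step_number: int, data: Dict[str, Any]) -> None: ...

    def mark_step_completed(self, user_id: str, step_number: int) -> None: ...


class InMemoryStepStore:
    """Test double standing in for the database-backed service."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, int], Dict[str, Any]] = {}
        self.completed: Set[Tuple[str, int]] = set()

    def save_step_data(self, user_id: str, step_number: int, data: Dict[str, Any]) -> None:
        self.data[(user_id, step_number)] = dict(data)

    def mark_step_completed(self, user_id: str, step_number: int) -> None:
        self.completed.add((user_id, step_number))


class OnboardingProgress:
    """Holds no per-user state; every call is keyed by user_id."""

    def __init__(self, db_service: StepStore) -> None:
        self.db_service = db_service  # injected, so tests can pass a fake

    def mark_step_completed(
        self, user_id: str, step_number: int, data: Optional[Dict[str, Any]] = None
    ) -> None:
        if data is not None:  # avoid writing an empty payload over saved data
            self.db_service.save_step_data(user_id, step_number, data)
        self.db_service.mark_step_completed(user_id, step_number)
```

Passing the store into the constructor keeps OnboardingProgress free of file paths and lets unit tests run without a real database.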
Phase 3: Update Endpoints (Priority 2)
Files to Update:
- backend/api/onboarding_utils/endpoints_core.py
- backend/api/onboarding_utils/step_management_service.py
- backend/api/onboarding_utils/step3_routes.py
- backend/api/onboarding_utils/step4_persona_routes.py
Changes:
- Pass user_id from get_current_user to all service calls
- Remove file-based caching
- Use database queries for progress retrieval
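The endpoint changes above might reduce to a handler body like the following. This is a framework-free sketch: the shape of the object returned by get_current_user (a dict with an "id" key) is an assumption, and InMemoryProgressStore and get_onboarding_status are hypothetical names.

```python
from typing import Any, Dict


class InMemoryProgressStore:
    """Hypothetical stand-in for the database service used by the endpoints."""

    def __init__(self) -> None:
        self._progress: Dict[str, Dict[str, Any]] = {}

    def get_progress(self, user_id: str) -> Dict[str, Any]:
        # Each user gets a separate record, created on first access.
        return self._progress.setdefault(user_id, {"current_step": 1})


def get_onboarding_status(
    current_user: Dict[str, Any], store: InMemoryProgressStore
) -> Dict[str, Any]:
    """Handler body: the user ID comes from the authenticated user, not from client input."""
    user_id = str(current_user["id"])  # the "id" key is an assumption
    return {"user_id": user_id, **store.get_progress(user_id)}
```

Taking user_id only from the authenticated user means a client cannot request another user's progress by changing a request parameter.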
Phase 4: Migration Script (Priority 3)
File: backend/scripts/migrate_onboarding_to_database.py (NEW)
def migrate_json_to_database():
    """
    Migrate existing .onboarding_progress.json to database.
    Only needed if production has existing data in JSON files.
    """
    # Read JSON file
    # Create database records
    # Backup JSON file
    # Delete JSON file
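The four steps in the outline above could be filled in roughly as follows. This sketch uses sqlite3 and a hypothetical migrated_steps table instead of the project's models. It also assumes the caller supplies the user_id to assign the data to, since the sample JSON format in this document carries no user identifier.

```python
import json
import shutil
import sqlite3
import tempfile
from pathlib import Path


def migrate_json_to_database(json_path: Path, conn: sqlite3.Connection, user_id: str) -> int:
    """Copy steps from the legacy JSON file into the database; return the row count."""
    if not json_path.exists():
        return 0  # nothing to migrate

    # Read JSON file
    progress = json.loads(json_path.read_text(encoding="utf-8"))

    # Create database records (hypothetical table, for illustration only)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrated_steps ("
        "user_id TEXT NOT NULL, step_number INTEGER NOT NULL, data TEXT NOT NULL, "
        "PRIMARY KEY (user_id, step_number))"
    )
    rows = [
        (user_id, step["step_number"], json.dumps(step.get("data") or {}))
        for step in progress.get("steps", [])
    ]
    with conn:  # commit all rows together, or roll back on error
        conn.executemany("INSERT OR REPLACE INTO migrated_steps VALUES (?, ?, ?)", rows)

    # Backup JSON file, then delete the original only after the commit succeeded
    shutil.copy2(json_path, json_path.with_name(json_path.name + ".bak"))
    json_path.unlink()
    return len(rows)


def demo_migration() -> tuple:
    """Run the migration against a throwaway file and an in-memory database."""
    with tempfile.TemporaryDirectory() as tmp:
        json_path = Path(tmp) / ".onboarding_progress.json"
        legacy = {"steps": [{"step_number": 1, "data": {"note": "example"}}]}
        json_path.write_text(json.dumps(legacy), encoding="utf-8")
        conn = sqlite3.connect(":memory:")
        migrated = migrate_json_to_database(json_path, conn, user_id="user_a")
        stored = conn.execute(
            "SELECT COUNT(*) FROM migrated_steps WHERE user_id = ?", ("user_a",)
        ).fetchone()[0]
        conn.close()
        backup_exists = json_path.with_name(json_path.name + ".bak").exists()
        return migrated, stored, json_path.exists(), backup_exists
```

Deleting the file only after the database commit means a failed run leaves the original data in place and the script can be retried.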
Implementation Plan
Step 1: Create Database Service (1-2 hours)
- Create onboarding_database_service.py
- Implement CRUD operations
- Add user isolation checks
- Write unit tests
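As a starting point for Step 1, the CRUD operations and the isolation check could be prototyped as below. This is a sketch under stated assumptions, not the planned implementation: it uses sqlite3 in place of the project's database layer, SqliteOnboardingStore and the onboarding_steps table are hypothetical, and advancing current_step to the step after the completed one is a guess at the intended behavior.

```python
import json
import sqlite3
from typing import Any, Dict


class SqliteOnboardingStore:
    """Illustrative sqlite3 stand-in for the planned database service."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS onboarding_sessions (
                id INTEGER PRIMARY KEY,
                user_id TEXT NOT NULL UNIQUE,
                current_step INTEGER NOT NULL DEFAULT 1
            );
            -- Hypothetical table: the source does not say where step data lives.
            CREATE TABLE IF NOT EXISTS onboarding_steps (
                session_id INTEGER NOT NULL REFERENCES onboarding_sessions(id),
                step_number INTEGER NOT NULL,
                data TEXT NOT NULL DEFAULT '{}',
                completed INTEGER NOT NULL DEFAULT 0,
                PRIMARY KEY (session_id, step_number)
            );
            """
        )

    def get_or_create_session(self, user_id: str) -> int:
        """Return the session id for user_id, creating the row if needed."""
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO onboarding_sessions (user_id) VALUES (?)",
                (user_id,),
            )
        row = self.conn.execute(
            "SELECT id FROM onboarding_sessions WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0]

    def _ensure_step(self, session_id: int, step_number: int) -> None:
        self.conn.execute(
            "INSERT OR IGNORE INTO onboarding_steps (session_id, step_number) VALUES (?, ?)",
            (session_id, step_number),
        )

    def save_step_data(self, user_id: str, step_number: int, data: Dict[str, Any]) -> None:
        session_id = self.get_or_create_session(user_id)
        with self.conn:
            self._ensure_step(session_id, step_number)
            self.conn.execute(
                "UPDATE onboarding_steps SET data = ? WHERE session_id = ? AND step_number = ?",
                (json.dumps(data), session_id, step_number),
            )

    def mark_step_completed(self, user_id: str, step_number: int) -> None:
        session_id = self.get_or_create_session(user_id)
        with self.conn:
            self._ensure_step(session_id, step_number)
            self.conn.execute(
                "UPDATE onboarding_steps SET completed = 1 "
                "WHERE session_id = ? AND step_number = ?",
                (session_id, step_number),
            )
            # Assumption: completing step N moves the user on to step N + 1.
            self.conn.execute(
                "UPDATE onboarding_sessions SET current_step = MAX(current_step, ?) WHERE id = ?",
                (step_number + 1, session_id),
            )

    def get_step_data(self, user_id: str, step_number: int) -> Dict[str, Any]:
        """Isolation check: rows are reachable only through the caller's user_id."""
        row = self.conn.execute(
            "SELECT s.data FROM onboarding_steps AS s "
            "JOIN onboarding_sessions AS o ON o.id = s.session_id "
            "WHERE o.user_id = ? AND s.step_number = ?",
            (user_id, step_number),
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def get_current_step(self, user_id: str) -> int:
        row = self.conn.execute(
            "SELECT current_step FROM onboarding_sessions WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else 1
```

Every query filters on user_id, either directly or through the session join, so no method can return another user's rows.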
Step 2: Refactor API Key Manager (2-3 hours)
- Remove JSON file operations
- Add database calls
- Update method signatures with user_id
- Test with database
Step 3: Update Endpoints (1-2 hours)
- Pass user_id to service calls
- Remove file-based logic
- Test each endpoint
Step 4: Testing (2-3 hours)
- Test user isolation
- Test data persistence across deployments
- Test concurrent users
- Test error handling
Step 5: Deployment (1 hour)
- Deploy to staging
- Run migration script if needed
- Deploy to production
- Monitor for issues
Total Estimated Time: 7-11 hours (sum of the step estimates above)
Temporary Mitigation
Until this is fixed, we must:
- ✅ Add .onboarding_progress.json to .gitignore
- ✅ Document that onboarding data will be lost on deployment
- ⚠️ Warn users that onboarding must be completed in one session
- ⚠️ Consider using Render's persistent disk (expensive workaround)
Testing Checklist
After migration:
- User A completes onboarding
- User B completes onboarding
- Verify User A and User B data are separate
- Redeploy backend
- Verify both users' data persists
- User C starts onboarding
- Verify User C doesn't see User A or B data
- Test concurrent onboarding (multiple users at once)
- Verify API keys are stored securely
- Test onboarding restart (partial completion)
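The concurrent-onboarding item on this checklist could be automated along these lines. This is a rough sketch only: a temporary SQLite file stands in for the real database, threads stand in for simultaneous requests, and the completed_steps table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading


def record_step(db_path: str, user_id: str, step_number: int) -> None:
    """Each simulated request opens its own connection, as a separate worker would."""
    conn = sqlite3.connect(db_path, timeout=30)  # wait for locks instead of failing fast
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO completed_steps (user_id, step_number) VALUES (?, ?)",
                (user_id, step_number),
            )
    finally:
        conn.close()


def run_concurrent_onboarding(user_count: int = 8) -> dict:
    """Have user_count users complete step 1 at once; return rows stored per user."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "onboarding_test.db")
        setup = sqlite3.connect(db_path)
        setup.execute(
            "CREATE TABLE completed_steps ("
            "user_id TEXT NOT NULL, step_number INTEGER NOT NULL, "
            "PRIMARY KEY (user_id, step_number))"
        )
        setup.commit()
        setup.close()

        threads = [
            threading.Thread(target=record_step, args=(db_path, f"user_{i}", 1))
            for i in range(user_count)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

        check = sqlite3.connect(db_path)
        rows = check.execute(
            "SELECT user_id, COUNT(*) FROM completed_steps GROUP BY user_id"
        ).fetchall()
        check.close()
        return dict(rows)
```

The check passes when every user ends up with exactly one row, meaning no write was lost or overwritten by another user's write.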
Security Considerations
Current (Insecure):
{
  "steps": [
    {
      "step_number": 1,
      "data": {
        "api_keys": {
          "gemini": "ACTUAL_API_KEY_HERE",
          "exa": "ACTUAL_API_KEY_HERE"
        }
      }
    }
  ]
}
After Migration (Secure):
- API keys in database with user isolation
- Encrypted at rest (if database supports it)
- Access controlled by user_id
- Audit trail via timestamps
References
- Database Models: backend/models/onboarding.py
- Current Implementation: backend/services/api_key_manager.py
- Endpoints: backend/api/onboarding_utils/
- Issue tracking: GitHub Issue #XXX (to be created)
Priority
This must be fixed before:
- ❌ Going to production with real users
- ❌ Accepting paying customers
- ❌ Handling sensitive data
- ❌ Scaling to multiple instances
Acceptable to delay if:
- ✅ Still in alpha/beta with limited users
- ✅ Users aware of data loss on deployment
- ✅ Not handling production workloads yet
Conclusion
This is a critical architectural flaw that violates basic principles:
- User data isolation
- Data persistence
- Security best practices
- Scalability
This must be fixed as soon as the current deployment stabilizes.