Session ID Cleanup Summary

Date: October 1, 2025
Issue: The frontend tracked its own session ID even though the backend identifies users by their Clerk user ID


Problem Statement

The frontend maintained a separate sessionId state and passed it to the backend, but the value served no purpose:

  • The backend authenticates via Clerk JWT tokens
  • User identity comes from current_user (derived from the auth token)
  • The session ID was never used for session management
  • Carrying it around only added confusion and unnecessary complexity

Solution Implemented

Frontend Changes

File: frontend/src/components/OnboardingWizard/Wizard.tsx

Removed:

const [sessionId, setSessionId] = useState<string>('');  // ❌ DELETED

Updated initialization:

// Before: setSessionId(session.session_id);
// After: Just log for debugging
console.log('Wizard: Initialized from cache:', {
  step: onboarding.current_step,
  progress: onboarding.completion_percentage,
  userId: session.session_id  // Just for logging
});

Updated component props:

// Before:
<CompetitorAnalysisStep 
  sessionId={sessionId}  // ❌ REMOVED
  userUrl={stepData?.website || ''}
  industryContext={stepData?.industryContext}
/>

// After:
<CompetitorAnalysisStep 
  userUrl={stepData?.website || ''}
  industryContext={stepData?.industryContext}
/>

File: frontend/src/components/OnboardingWizard/CompetitorAnalysisStep.tsx

Updated interface:

// Before:
interface CompetitorAnalysisStepProps {
  onContinue: (researchData?: any) => void;
  onBack: () => void;
  sessionId: string;  // ❌ REMOVED
  userUrl: string;
  industryContext?: string;
}

// After:
interface CompetitorAnalysisStepProps {
  onContinue: (researchData?: any) => void;
  onBack: () => void;
  // sessionId removed - backend uses authenticated user from Clerk token
  userUrl: string;
  industryContext?: string;
}

Updated API call:

// Before:
body: JSON.stringify({
  session_id: sessionId,  // ❌ REMOVED
  user_url: userUrl,
  industry_context: industryContext,
  num_results: 25,
  website_analysis_data: websiteAnalysisData
})

// After:
body: JSON.stringify({
  // session_id removed - backend gets user from auth token
  user_url: userUrl,
  industry_context: industryContext,
  num_results: 25,
  website_analysis_data: websiteAnalysisData
})

Updated dependencies:

// Before:
}, [sessionId, userUrl, industryContext]);

// After:
}, [userUrl, industryContext]);  // sessionId removed

Backend Changes

File: backend/api/onboarding_utils/step3_routes.py

Made session_id optional:

# Before:
class CompetitorDiscoveryRequest(BaseModel):
    session_id: str = Field(..., description="Onboarding session ID")

# After:
class CompetitorDiscoveryRequest(BaseModel):
    session_id: Optional[str] = Field(
        None, 
        description="Deprecated - user identification comes from auth token"
    )

Updated endpoint logic:

# Before:
logger.info(f"Starting competitor discovery for session {request.session_id}")
session_id = request.session_id if request.session_id else "default_session"

# After:
# Session ID is deprecated - we use authenticated user from token instead
session_id = request.session_id if request.session_id else "user_authenticated"
logger.info(f"Starting competitor discovery for URL: {request.user_url}")
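To illustrate what the optional field means for callers, here is a minimal, framework-free sketch. It uses a plain dataclass as a stand-in for the Pydantic model; the class name, the helper parse_discovery_request, the field types, and the default of 25 for num_results are assumptions made for illustration (only the field names come from the request body shown earlier).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CompetitorDiscoveryRequestSketch:
    """Stand-in for the real Pydantic model (hypothetical name and types)."""

    user_url: str
    industry_context: Optional[str] = None
    num_results: int = 25  # assumed default; the frontend currently sends 25
    website_analysis_data: Optional[Dict[str, Any]] = None
    session_id: Optional[str] = None  # deprecated: accepted, never required


def parse_discovery_request(raw_body: str) -> CompetitorDiscoveryRequestSketch:
    """Parse a JSON body whether or not it carries a legacy session_id."""
    data = json.loads(raw_body)
    return CompetitorDiscoveryRequestSketch(
        user_url=data["user_url"],
        industry_context=data.get("industry_context"),
        num_results=data.get("num_results", 25),
        website_analysis_data=data.get("website_analysis_data"),
        session_id=data.get("session_id"),
    )
```

Both a new-style body without session_id and an old-style body with it parse successfully, which is the behavior the Optional field is meant to provide.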

How Authentication Actually Works

Request Flow:

1. Frontend makes API call with Clerk JWT token
   ↓
2. Backend middleware extracts token from Authorization header
   ↓
3. Token verified via JWKS (with 60s leeway for clock skew)
   ↓
4. User ID extracted from token claims (sub field)
   ↓
5. User object passed to endpoint via Depends(get_current_user)
   ↓
6. Backend uses Clerk user ID for all user-specific operations
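Steps 3 and 4 of the flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the backend's actual code: it deliberately skips signature verification (which the real backend performs via JWKS), and the function names are hypothetical. It only shows how the sub claim is read and how a 60-second leeway applies to the exp claim.

```python
import base64
import json
import time
from typing import Any, Dict, Optional

LEEWAY_SECONDS = 60  # clock-skew leeway mentioned in step 3


def decode_claims_unverified(token: str) -> Dict[str, Any]:
    """Decode the JWT payload WITHOUT verifying the signature.

    Illustration only: a real backend must verify the signature first.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def user_id_from_claims(claims: Dict[str, Any], now: Optional[float] = None) -> str:
    """Return the sub claim, rejecting tokens expired beyond the leeway."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is not None and now > exp + LEEWAY_SECONDS:
        raise ValueError("token expired")
    sub = claims.get("sub")
    if not sub:
        raise ValueError("token has no sub claim")
    return sub
```

The returned value plays the role of the Clerk user ID in step 6; no identifier from the request body is consulted.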

User Session Management:

# backend/services/api_key_manager.py
def get_onboarding_progress_for_user(user_id: str) -> OnboardingProgress:
    """
    Uses Clerk user_id (from auth token) as the session identifier.
    No separate session ID needed!
    """
    progress_file = f".onboarding_progress_{safe_user_id}.json"
    return OnboardingProgress(progress_file=progress_file)
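The excerpt above references safe_user_id without showing how it is derived from user_id. One plausible approach, sketched here purely as an assumption, is to replace every character that is not safe in a filename before building the path. The helper names make_safe_user_id and progress_file_for_user are hypothetical and do not come from the codebase.

```python
import re


def make_safe_user_id(user_id: str) -> str:
    """Hypothetical helper: keep only characters that are safe in a filename."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)
    if not safe:
        raise ValueError("user_id must not be empty")
    return safe


def progress_file_for_user(user_id: str) -> str:
    """Build the per-user progress filename in the format shown above."""
    return f".onboarding_progress_{make_safe_user_id(user_id)}.json"
```

Note that a replacement scheme like this can map two different IDs to the same filename if they differ only in replaced characters, so a real implementation may prefer hashing or stricter validation.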

What Was Removed

Unnecessary Code:

  1. Frontend session state:

    • const [sessionId, setSessionId] = useState<string>('')
    • setSessionId(...) calls
    • sessionId prop passing

  2. localStorage session tracking:

    • localStorage.setItem('onboarding_session_id', ...) calls
    • localStorage.getItem('onboarding_session_id') calls

  3. API request session_id:

    • Removed from the frontend request body
    • Still accepted by the backend, but now optional

Benefits

Code Quality:

  • Simpler: Less state to manage
  • Clearer: No confusion about what "session" means
  • Aligned: Matches actual backend architecture

Maintainability:

  • Fewer moving parts
  • Less chance of session tracking bugs
  • Clear authentication flow

Security:

  • Single source of truth (Clerk token)
  • No parallel session tracking
  • Reduced attack surface

Testing Checklist

  • Frontend compiles without errors
  • Onboarding wizard loads successfully
  • Step 3 (Competitor Analysis) works without sessionId
  • Backend accepts requests without session_id
  • Backend still accepts requests with session_id (backwards compat)
  • User progress persists correctly
  • No console errors about missing sessionId

Migration Notes

For Other Developers:

If you have code that uses sessionId:

DON'T:

// Don't pass sessionId anymore
<CompetitorAnalysisStep sessionId={someId} ... />

// Don't send session_id in API calls
fetch('/api/...', {
  body: JSON.stringify({ session_id: someId })
})

DO:

// Pass only the remaining props (no sessionId)
<CompetitorAnalysisStep userUrl={url} industryContext={context} ... />

// Let backend get user from auth token
fetch('/api/...', {
  headers: { 'Authorization': `Bearer ${token}` },
  body: JSON.stringify({ /* no session_id */ })
})

Backwards Compatibility

Old Frontend Code:

If an older frontend build still sends session_id, the request will:

  • Still work (the backend accepts the field as Optional)
  • Not use it to identify the user (the backend relies on the auth token instead)
  • Not log a warning yet (a deprecation warning can be added if needed)
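If a deprecation warning is wanted, it could look roughly like the sketch below. The function name resolve_user_id, the logger name, and the shape of current_user (a mapping with an "id" key) are assumptions for illustration; the real endpoint receives its user object from Depends(get_current_user).

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("onboarding.step3")  # hypothetical logger name


def resolve_user_id(body: Mapping[str, Any], current_user: Mapping[str, Any]) -> str:
    """Return the authenticated user's ID, warning if a legacy session_id was sent."""
    if body.get("session_id") is not None:
        # The value is never used for identification; only its presence is noted.
        logger.warning("session_id in request body is deprecated and ignored")
    return str(current_user["id"])
```

Old clients keep working because the function never fails on a present session_id; the log line simply makes remaining callers visible before the field is removed.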

API Contract:

  • Request: session_id is now optional
  • Response: session_id still included for compatibility
  • No breaking changes

This cleanup builds on:

  1. Batch API Endpoint - Reduced API calls (see: BATCH_API_IMPLEMENTATION_SUMMARY.md)
  2. Auth Fix - Clock skew resolution (see: CLOCK_SKEW_FIX.md)
  3. Code Review - Identified this issue (see: END_USER_FLOW_CODE_REVIEW.md)

Files Modified

Frontend (2 files):

  • frontend/src/components/OnboardingWizard/Wizard.tsx
  • frontend/src/components/OnboardingWizard/CompetitorAnalysisStep.tsx

Backend (1 file):

  • backend/api/onboarding_utils/step3_routes.py

Conclusion

  • Session ID successfully removed from frontend
  • Backend made backwards compatible
  • Code now aligns with actual architecture
  • User authentication via Clerk token only

The codebase is now simpler and easier to maintain. The "session" is effectively the authenticated Clerk user, so no separate tracking is needed.


Next Steps

  1. Test the changes end-to-end
  2. Monitor for any session-related errors
  3. Eventually remove session_id from backend responses (breaking change - schedule for v2.0)
  4. Update API documentation to reflect changes