Polling Timeout Issues - Fixed

🚨 Problem Identified

The research endpoint was timing out even with polling because:

  1. Frontend polling used a 60-second timeout for each status check
  2. Research operations were taking longer than 60 seconds
  3. Polling continued indefinitely after a timeout instead of stopping
  4. The backend had no timeout protection for long-running operations

Solutions Implemented

1. Frontend Timeout Fixes

New Polling API Client:

  • Created pollingApiClient with a 10-second timeout for status checks
  • Status checks only read task state, so 10 seconds is sufficient
  • Updated pollResearchStatus and pollOutlineStatus to use the polling client

Enhanced Error Handling:

  • Improved timeout error messages in usePolling hook
  • Better distinction between timeout and other errors
  • Clear user messaging: "Request timeout - the research operation may still be running"

2. Backend Timeout Protection

Research Operation Timeout:

  • Added 5-minute timeout to research operations using asyncio.wait_for
  • Graceful timeout handling with clear error messages
  • Task status properly set to "failed" on timeout

Outline Generation Timeout:

  • Added 3-minute timeout to outline generation operations
  • Consistent timeout handling across all async operations

3. Improved User Experience

Better Error Messages:

  • Clear timeout messages: "Research operation timed out after 5 minutes"
  • Helpful suggestions: "Please try again with a simpler query"
  • Distinction between a request timeout (a single status check exceeded 10 seconds) and an operation timeout (the backend task exceeded its limit)

Proper Polling Behavior:

  • Polling stops immediately on timeout
  • No more infinite polling loops
  • Clean error state management

🔧 Technical Implementation

Frontend Changes:

New API Client:

// pollingApiClient with 10-second timeout
export const pollingApiClient = axios.create({
  baseURL: 'http://localhost:8000',
  timeout: 10000, // 10 seconds for status checks
  headers: { 'Content-Type': 'application/json' }
});

Updated Polling Methods:

async pollResearchStatus(taskId: string): Promise<TaskStatusResponse> {
  const { data } = await pollingApiClient.get(`/api/blog/research/status/${taskId}`);
  return data;
}

Enhanced Error Handling:

// Match "timeout" regardless of case (e.g. "timeout", "Timeout", "TIMEOUT")
if (errorMessage.toLowerCase().includes('timeout')) {
  const timeoutMessage = 'Request timeout - the research operation may still be running. Please try again later.';
  setError(timeoutMessage);
  onError?.(timeoutMessage);
}

Backend Changes:

Research Operation Timeout:

try:
  # Add a timeout to the research operation (5 minutes)
  result = await asyncio.wait_for(
    service.research_with_progress(request, task_id),
    timeout=300  # 5 minutes timeout
  )
except asyncio.TimeoutError:
  await _update_progress(task_id, "⏰ Research operation timed out after 5 minutes. Please try again with a simpler query.")
  task_storage[task_id]["status"] = "failed"
  task_storage[task_id]["error"] = "Research operation timed out after 5 minutes"
  return

Outline Generation Timeout:

try:
  # Add a timeout to the outline generation operation (3 minutes)
  result = await asyncio.wait_for(
    service.generate_outline_with_progress(request, task_id),
    timeout=180  # 3 minutes timeout
  )
except asyncio.TimeoutError:
  await _update_progress(task_id, "⏰ Outline generation timed out after 3 minutes. Please try again.")
  task_storage[task_id]["status"] = "failed"
  task_storage[task_id]["error"] = "Outline generation timed out after 3 minutes"
  return
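
The two backend handlers above share the same shape, so the pattern can be factored into one helper. The following is a minimal, self-contained sketch rather than the project's actual code: `run_with_timeout`, `slow_operation`, the `"running"` and `"completed"` status values, and the local `task_storage` dict are all assumptions made for illustration, and the timeouts are shortened so the demo finishes quickly.

```python
import asyncio
from typing import Any, Awaitable, Dict

# Hypothetical in-memory store standing in for the router's task_storage.
task_storage: Dict[str, Dict[str, Any]] = {}


async def run_with_timeout(
    task_id: str, operation: Awaitable[Any], timeout_s: float, label: str
) -> None:
    """Run an operation under a deadline and record the outcome for pollers."""
    task_storage[task_id] = {"status": "running", "error": None, "result": None}
    try:
        result = await asyncio.wait_for(operation, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner operation before raising.
        task_storage[task_id]["status"] = "failed"
        task_storage[task_id]["error"] = f"{label} timed out after {timeout_s:g} seconds"
        return
    task_storage[task_id]["status"] = "completed"
    task_storage[task_id]["result"] = result


async def slow_operation(delay_s: float) -> str:
    # Stand-in for research_with_progress / generate_outline_with_progress.
    await asyncio.sleep(delay_s)
    return "done"


async def demo() -> None:
    # Tiny limits for the demo; the documented values are 300 and 180 seconds.
    await run_with_timeout("fast", slow_operation(0.01), 0.5, "Research operation")
    await run_with_timeout("slow", slow_operation(0.5), 0.05, "Outline generation")


asyncio.run(demo())
print(task_storage["fast"]["status"], task_storage["slow"]["status"])
```

Because `asyncio.wait_for` cancels the wrapped coroutine when the deadline passes, the operation does not keep running in the background after the task is marked as failed.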

📊 Timeout Configuration

Frontend Timeouts:

  • Status Polling: 10 seconds (should be quick)
  • Regular API: 60 seconds (for normal operations)
  • AI Operations: 3 minutes (for AI processing)
  • Long Operations: 5 minutes (for SEO analysis)

Backend Timeouts:

  • Research Operations: 5 minutes (comprehensive research)
  • Outline Generation: 3 minutes (outline creation)
  • Task Cleanup: 1 hour (memory management)
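
The 1-hour task cleanup is not shown in this document. One way it could work is sketched below, assuming each task entry records a `created_at` timestamp in seconds; that field, the `cleanup_old_tasks` name, and the `TASK_TTL_SECONDS` constant are hypothetical.

```python
import time
from typing import Any, Dict, List, Optional

TASK_TTL_SECONDS = 60 * 60  # 1 hour, matching the Task Cleanup entry above


def cleanup_old_tasks(
    storage: Dict[str, Dict[str, Any]], now: Optional[float] = None
) -> List[str]:
    """Delete tasks older than the TTL and return the removed task IDs."""
    current = time.time() if now is None else now
    # Collect IDs first so the dict is not mutated while iterating over it.
    expired = [
        task_id
        for task_id, task in storage.items()
        if current - task["created_at"] > TASK_TTL_SECONDS
    ]
    for task_id in expired:
        del storage[task_id]
    return expired


# Example with a fixed clock: "old" is 3601 s old, "new" is 601 s old.
storage = {
    "old": {"status": "failed", "created_at": 0.0},
    "new": {"status": "running", "created_at": 3000.0},
}
removed = cleanup_old_tasks(storage, now=3601.0)
```

Passing `now` explicitly keeps the function easy to test; in a running service it would typically be called periodically with the default clock.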

🎯 Expected Behavior Now

Before (Broken):

  • Polling timed out after 60 seconds
  • Polling continued indefinitely
  • No backend timeout protection
  • Poor error messages

After (Fixed):

  • Status checks time out after 10 seconds (quick response)
  • Research operations time out after 5 minutes (reasonable limit)
  • Polling stops immediately on timeout
  • Clear error messages with helpful suggestions
  • Backend prevents runaway operations

🚀 User Experience

Normal Flow:

  1. User starts research → Task ID returned
  2. Frontend polls every 2 seconds with a 10-second timeout per status check
  3. Backend completes research within 5 minutes
  4. User sees progress messages and final results

Timeout Flow:

  1. User starts research → Task ID returned
  2. Research takes longer than 5 minutes
  3. Backend times out and sets task to "failed"
  4. Frontend receives the "failed" status with the timeout error on its next status check and stops polling
  5. User sees clear message: "Research operation timed out after 5 minutes. Please try again with a simpler query."
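
The timeout flow above can be simulated end to end. The real hook is written in TypeScript; the sketch below uses Python only to stay consistent with the backend examples, and everything in it (`poll_until_done`, `FakeBackend`, the `"running"`, `"completed"`, and `"unknown"` status values) is an illustrative assumption, not the project's API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

TERMINAL_STATUSES = {"completed", "failed"}  # assumed status values


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[Dict[str, Any]]],
    interval_s: float = 2.0,
    request_timeout_s: float = 10.0,
) -> Dict[str, Any]:
    """Poll until the task is terminal or a status check itself times out."""
    while True:
        try:
            status = await asyncio.wait_for(fetch_status(), timeout=request_timeout_s)
        except asyncio.TimeoutError:
            # Request timeout: the check hung, but the task may still be running.
            return {
                "status": "unknown",
                "error": "Request timeout - the research operation may still be running",
            }
        if status["status"] in TERMINAL_STATUSES:
            # Finished or failed (including a backend-side timeout): stop polling.
            return status
        await asyncio.sleep(interval_s)


class FakeBackend:
    """Replays canned status responses and counts how often it is polled."""

    def __init__(self, responses: List[Dict[str, Any]]) -> None:
        self._responses = responses
        self.calls = 0

    async def fetch(self) -> Dict[str, Any]:
        index = min(self.calls, len(self._responses) - 1)
        self.calls += 1
        return self._responses[index]


backend = FakeBackend([
    {"status": "running"},
    {"status": "running"},
    {"status": "failed", "error": "Research operation timed out after 5 minutes"},
])
final = asyncio.run(
    poll_until_done(backend.fetch, interval_s=0.01, request_timeout_s=0.5)
)
```

The loop has exactly two exits, one per timeout kind: a hung status check returns a request-timeout result, while a terminal task status (such as the backend's 5-minute failure) is returned as-is. Neither path schedules another poll.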

📁 Files Modified

Frontend:

  • frontend/src/api/client.ts - Added pollingApiClient
  • frontend/src/services/blogWriterApi.ts - Updated to use polling client
  • frontend/src/hooks/usePolling.ts - Enhanced error handling

Backend:

  • backend/api/blog_writer/router.py - Added operation timeouts

🎉 Result

The polling system now works correctly with:

  • Proper timeout handling at both frontend and backend levels
  • No more infinite polling loops
  • Clear error messages for users
  • Reasonable timeout limits for different operations
  • Graceful failure handling with helpful suggestions

Users will now have a much better experience with the research system! 🎉