kunthawat/ALwrity

ajaysi 0b63ae7fc1 AI Researcher and Video Studio implementation complete

2026-01-05 15:49:51 +05:30

Current Research Engine Architecture Overview

Date: 2025-01-29
Status: Authoritative Architecture Documentation

📋 Overview

This document describes the current Research Engine architecture, from the wizard UI through the API endpoints to the backend services. It is the single source of truth for how the research system works.

Note: For detailed implementation rules and patterns, see .cursor/rules/researcher-architecture.mdc

🏗️ High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                            │
├─────────────────────────────────────────────────────────────────┤
│  ResearchWizard (3 Steps)                                        │
│  ├── Step 1: ResearchInput (Input + Intent & Options)          │
│  ├── Step 2: StepProgress (Progress/Polling)                   │
│  └── Step 3: StepResults (Tabbed Results Display)              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      FRONTEND HOOKS                              │
├─────────────────────────────────────────────────────────────────┤
│  useIntentResearch                                              │
│  ├── analyzeIntent() → /api/research/intent/analyze            │
│  ├── confirmIntent() → Updates local state                      │
│  └── executeResearch() → /api/research/intent/research        │
│                                                                  │
│  useResearchExecution                                            │
│  ├── executeIntentResearch() → Intent-driven flow              │
│  └── executeTraditionalResearch() → Fallback flow              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      API ENDPOINTS                               │
├─────────────────────────────────────────────────────────────────┤
│  POST /api/research/intent/analyze                              │
│  └── UnifiedResearchAnalyzer.analyze()                         │
│                                                                  │
│  POST /api/research/intent/research                            │
│  ├── ResearchEngine.research()                                  │
│  └── IntentAwareAnalyzer.analyze()                             │
│                                                                  │
│  POST /api/research/execute (Traditional - Fallback)           │
│  POST /api/research/start (Traditional - Async)                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    BACKEND SERVICES                              │
├─────────────────────────────────────────────────────────────────┤
│  UnifiedResearchAnalyzer                                         │
│  ├── Intent Inference                                           │
│  ├── Query Generation                                           │
│  └── Parameter Optimization (Exa/Tavily)                        │
│                                                                  │
│  ResearchEngine                                                  │
│  ├── Provider Selection (Exa → Tavily → Google)               │
│  ├── ExaService                                                 │
│  ├── TavilyService                                              │
│  └── GoogleSearchService                                        │
│                                                                  │
│  IntentAwareAnalyzer                                            │
│  └── Intent-Based Result Analysis                               │
│                                                                  │
│  ResearchPersonaService                                         │
│  └── Persona Generation/Retrieval                              │
└─────────────────────────────────────────────────────────────────┘

🔄 Data Flow

Intent-Driven Research Flow

1. User Input
   │
   ▼
2. Frontend: useIntentResearch.analyzeIntent()
   │
   ▼
3. API: POST /api/research/intent/analyze
   │
   ▼
4. Backend: UnifiedResearchAnalyzer.analyze()
   ├── Fetches Research Persona (if enabled)
   ├── Fetches Competitor Data (if enabled)
   ├── Single LLM Call:
   │   ├── Intent Inference
   │   ├── Query Generation (4-8 queries)
   │   └── Parameter Optimization (Exa/Tavily)
   └── Returns: Intent + Queries + Optimized Config
   │
   ▼
5. Frontend: IntentConfirmationPanel
   ├── Displays inferred intent (editable)
   ├── Shows suggested queries (selectable)
   └── Shows AI-optimized settings with justifications
   │
   ▼
6. User Confirms Intent
   │
   ▼
7. Frontend: useIntentResearch.executeResearch()
   │
   ▼
8. API: POST /api/research/intent/research
   │
   ▼
9. Backend: ResearchEngine.research()
   ├── Executes queries via Exa/Tavily/Google
   └── Returns raw results
   │
   ▼
10. Backend: IntentAwareAnalyzer.analyze()
    ├── Analyzes raw results based on intent
    ├── Extracts specific deliverables:
    │   ├── Statistics
    │   ├── Expert Quotes
    │   ├── Case Studies
    │   ├── Trends
    │   ├── Comparisons
    │   └── More...
    └── Returns: IntentDrivenResearchResult
    │
    ▼
11. Frontend: IntentResultsDisplay
    ├── Summary Tab
    ├── Deliverables Tab
    ├── Sources Tab
    └── Analysis Tab
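
The flow above amounts to two backend round trips separated by a user confirmation. The following is a minimal sketch of that shape, not the project's implementation: `run_intent_flow` and the four injected callables are hypothetical stand-ins for the real hooks and services, and the dictionary keys are assumptions made for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical alias: every stage is an injected async callable, so the sketch
# does not depend on the real service signatures.
Stage = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def run_intent_flow(
    user_input: str,
    analyze: Stage,  # stands in for UnifiedResearchAnalyzer.analyze (steps 2-4)
    confirm: Stage,  # stands in for IntentConfirmationPanel (steps 5-6)
    search: Stage,   # stands in for ResearchEngine.research (steps 7-9)
    extract: Stage,  # stands in for IntentAwareAnalyzer.analyze (step 10)
) -> Dict[str, Any]:
    analysis = await analyze({"user_input": user_input})
    confirmed = await confirm(analysis)  # the user may edit intent and queries
    raw_results = await search(confirmed)
    return await extract({"intent": confirmed["intent"], "raw_results": raw_results})


async def _demo() -> Dict[str, Any]:
    # Stub stages with made-up data, only to show how values move between steps.
    async def analyze(req):
        return {"intent": {"primary_question": req["user_input"]}, "queries": ["q1", "q2"]}

    async def confirm(analysis):
        return {**analysis, "queries": analysis["queries"][:1]}  # user keeps one query

    async def search(confirmed):
        return {"results": [f"hit for {q}" for q in confirmed["queries"]]}

    async def extract(req):
        return {"primary_answer": req["raw_results"]["results"][0], "intent": req["intent"]}

    return await run_intent_flow("AI in healthcare", analyze, confirm, search, extract)


demo_result = asyncio.run(_demo())
```

Injecting the stages keeps the confirmation step explicit: nothing is searched until the confirmed intent and query selection come back from the user.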

📁 Component Structure

Backend Structure

backend/services/research/
├── core/
│   ├── research_engine.py           # Main orchestrator
│   ├── research_context.py          # Unified input schema
│   └── parameter_optimizer.py     # DEPRECATED (use unified analyzer)
│
├── intent/
│   ├── unified_research_analyzer.py # ⭐ Unified AI analyzer (intent + queries + params)
│   ├── research_intent_inference.py # Legacy (use unified)
│   ├── intent_query_generator.py    # Legacy (use unified)
│   ├── intent_aware_analyzer.py     # Result analysis based on intent
│   └── intent_prompt_builder.py     # LLM prompt builders
│
├── research_persona_service.py      # Research persona generation/retrieval
├── research_persona_prompt_builder.py # Persona generation prompts
├── exa_service.py                   # Exa API integration
├── tavily_service.py                 # Tavily API integration
└── google_search_service.py          # Google/Gemini grounding

Frontend Structure

frontend/src/components/Research/
├── ResearchWizard.tsx                # Main wizard orchestrator
├── steps/
│   ├── ResearchInput.tsx             # Step 1: Input + Intent & Options
│   ├── StepProgress.tsx              # Step 2: Progress/polling
│   ├── StepResults.tsx               # Step 3: Results display
│   ├── components/
│   │   ├── ResearchInputHeader.tsx   # Header with Advanced toggle
│   │   ├── ResearchInputContainer.tsx # Main input with Intent & Options button
│   │   ├── IntentConfirmationPanel.tsx # Intent display/edit panel
│   │   ├── IntentResultsDisplay.tsx # Tabbed results (Summary, Deliverables, Sources, Analysis)
│   │   ├── AdvancedOptionsSection.tsx # Exa/Tavily options
│   │   ├── ProviderChips.tsx         # Provider availability display
│   │   └── ... (other components)
│   ├── hooks/
│   │   ├── useResearchConfig.ts      # Config + persona loading
│   │   ├── useKeywordExpansion.ts    # Keyword expansion with persona
│   │   └── useResearchAngles.ts       # Research angles generation
│   └── utils/
│       ├── placeholders.ts           # Personalized placeholders
│       ├── industryDefaults.ts       # Industry-specific defaults
│       └── ...
└── hooks/
    ├── useResearchWizard.ts          # Wizard state management
    ├── useResearchExecution.ts       # Research execution orchestration
    └── useIntentResearch.ts          # Intent research flow

🔑 Key Components

1. UnifiedResearchAnalyzer

Purpose: Infers intent, generates queries, and optimizes provider parameters in a single LLM call

Location: backend/services/research/intent/unified_research_analyzer.py

Key Features:

Combines intent inference, query generation, and parameter optimization
Reduces LLM calls from 2-3 to 1 (a 50-67% reduction)
Provides justifications for all parameter decisions
Uses research persona for context

Input:

user_input: string
keywords: List[str]
research_persona: ResearchPersona (optional)
competitor_data: List[Dict] (optional)
industry: string (optional)
target_audience: string (optional)
user_id: string (required for subscription checks)

Output:

intent: ResearchIntent
queries: List[ResearchQuery] (4-8 queries)
exa_config: Dict with settings + justifications
tavily_config: Dict with settings + justifications
recommended_provider: str
provider_justification: str
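
A caller may want to sanity-check the analyzer output before showing it in the confirmation panel. The sketch below assumes a hypothetical `UnifiedAnalysis` container that mirrors the output fields listed above; the class, the `validate_analysis` helper, and the set of accepted provider names are illustrative assumptions, not code from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: the recommended provider is one of the three providers named in
# this document.
KNOWN_PROVIDERS = {"exa", "tavily", "google"}


@dataclass
class UnifiedAnalysis:
    """Hypothetical container mirroring the output fields listed above."""

    intent: Dict[str, Any]
    queries: List[Dict[str, Any]]
    exa_config: Dict[str, Any] = field(default_factory=dict)
    tavily_config: Dict[str, Any] = field(default_factory=dict)
    recommended_provider: str = "exa"
    provider_justification: str = ""


def validate_analysis(analysis: UnifiedAnalysis) -> List[str]:
    """Return human-readable problems; an empty list means the output looks sane."""
    problems: List[str] = []
    if not 4 <= len(analysis.queries) <= 8:
        problems.append(f"expected 4-8 queries, got {len(analysis.queries)}")
    if analysis.recommended_provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider: {analysis.recommended_provider}")
    if not analysis.provider_justification:
        problems.append("missing provider justification")
    return problems
```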

2. IntentAwareAnalyzer

Purpose: Analyzes results based on user intent

Location: backend/services/research/intent/intent_aware_analyzer.py

Key Features:

Extracts specific deliverables based on intent
Structures results by deliverable type
Provides credibility scores for sources
Identifies gaps and follow-up queries

Input:

raw_results: Dict (from Exa/Tavily/Google)
intent: ResearchIntent
research_persona: ResearchPersona (optional)
user_id: string (required for subscription checks)

Output:

IntentDrivenResearchResult with:
- Statistics, quotes, case studies, trends
- Comparisons, best practices, step-by-step guides
- Pros/cons, definitions, examples, predictions
- Executive summary, key takeaways, suggested outline
- Sources with credibility scores

3. ResearchEngine

Purpose: Orchestrates provider calls

Location: backend/services/research/core/research_engine.py

Key Features:

Provider priority: Exa → Tavily → Google
Handles provider availability
Manages async research tasks
Integrates with research persona

Provider Selection:

Exa (Primary): Semantic understanding, academic papers, competitor research
Tavily (Secondary): Real-time news, trending topics, quick facts
Google (Fallback): Basic factual queries via Gemini grounding

4. ResearchPersonaService

Purpose: Generates and retrieves research persona

Location: backend/services/research/research_persona_service.py

Key Features:

Generates persona from onboarding data (core persona, website analysis, competitor analysis)
Caches persona (7-day TTL)
Provides persona defaults for UI pre-filling

Persona Sources:

Core persona (onboarding step 1)
Website analysis (onboarding step 2)
Competitor analysis (onboarding step 3)
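
The 7-day cache TTL listed under Key Features comes down to a timestamp comparison. A minimal sketch, assuming the persona record stores a timezone-aware generation timestamp; `is_persona_fresh` and its parameters are hypothetical names, and the real service may track expiry differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PERSONA_TTL = timedelta(days=7)  # the 7-day TTL stated above


def is_persona_fresh(generated_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while the cached persona is younger than the TTL."""
    now = now or datetime.now(timezone.utc)
    return now - generated_at < PERSONA_TTL
```

A stale persona would then be regenerated from the three onboarding sources listed above before the next analysis call.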

🔌 API Endpoints

Intent-Driven Endpoints

POST /api/research/intent/analyze
- Analyzes user input to understand intent
- Generates queries and optimizes parameters
- Returns intent, queries, and optimized config
POST /api/research/intent/research
- Executes research based on confirmed intent
- Returns structured deliverables

Traditional Endpoints (Fallback)

POST /api/research/execute
- Synchronous research execution
- Returns traditional research results
POST /api/research/start
- Asynchronous research execution
- Returns task_id for polling
GET /api/research/status/{task_id}
- Polls async research status
- Returns progress and results
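
The asynchronous pair (`/api/research/start` plus `/api/research/status/{task_id}`) implies a polling loop on the client. The sketch below shows one way to write it with an injected status fetcher. The `status` field and its `completed` and `failed` values are assumptions about the response shape, and `poll_until_done` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the status payload carries a "status" field with these terminal values.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],  # e.g. wraps GET /api/research/status/{task_id}
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> Dict[str, Any]:
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(task_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```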

Configuration Endpoints

GET /api/research/config
- Returns provider availability + persona defaults
GET /api/research/providers/status
- Returns provider availability only
GET /api/research/persona-defaults
- Returns persona defaults only

🎯 Key Patterns

Pattern 1: Unified Analysis

Always use UnifiedResearchAnalyzer for new intent-driven research:

from services.research.intent.unified_research_analyzer import UnifiedResearchAnalyzer

analyzer = UnifiedResearchAnalyzer()
result = await analyzer.analyze(
    user_input=user_input,
    keywords=keywords,
    research_persona=research_persona,
    user_id=user_id,  # Required
)

Pattern 2: Intent-Aware Analysis

Always analyze results based on intent:

from services.research.intent.intent_aware_analyzer import IntentAwareAnalyzer

analyzer = IntentAwareAnalyzer()
result = await analyzer.analyze(
    raw_results=raw_results,
    intent=research_intent,
    research_persona=research_persona,
    user_id=user_id,  # Required
)

Pattern 3: Provider Selection

Priority order: Exa → Tavily → Google

if provider_availability.exa_available:
    provider = "exa"
elif provider_availability.tavily_available:
    provider = "tavily"
else:
    provider = "google"
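
The same priority chain can be wrapped in a small, testable function. This is a sketch under stated assumptions: `ProviderAvailability` here is a hypothetical dataclass with the two flags used above, and honoring an available `preferred` provider (for example the analyzer's `recommended_provider`) is an illustrative extension, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderAvailability:
    """Hypothetical availability flags, named after the attributes used above."""

    exa_available: bool = False
    tavily_available: bool = False


def select_provider(
    availability: ProviderAvailability, preferred: Optional[str] = None
) -> str:
    """Pick a provider in the order Exa, Tavily, Google."""
    available = {
        "exa": availability.exa_available,
        "tavily": availability.tavily_available,
    }
    # Assumption: a preferred provider wins only when it is actually available.
    if preferred is not None and available.get(preferred, False):
        return preferred
    for name in ("exa", "tavily"):
        if available[name]:
            return name
    return "google"  # fallback when neither Exa nor Tavily is available
```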

Pattern 4: Persona Integration

Always check for research persona:

from services.research.research_persona_service import ResearchPersonaService

persona_service = ResearchPersonaService(db)
research_persona = persona_service.get_or_generate(user_id)

Pattern 5: Subscription Checks

Always pass user_id to LLM calls:

result = llm_text_gen(
    prompt=prompt,
    json_struct=schema,
    user_id=user_id  # Required for subscription checks
)
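
One way to make this rule hard to forget is to fail fast when `user_id` is missing. The decorator below is a hypothetical guard, not something the codebase is known to contain, and `stub_llm_text_gen` is a stand-in used only to demonstrate it; the real `llm_text_gen` is not reproduced here.

```python
import functools
from typing import Any, Callable, Optional


def require_user_id(func: Callable[..., Any]) -> Callable[..., Any]:
    """Reject calls that omit user_id before any LLM request is made."""

    @functools.wraps(func)
    def wrapper(*args: Any, user_id: Optional[str] = None, **kwargs: Any) -> Any:
        if not user_id:
            raise ValueError(f"{func.__name__} requires user_id for subscription checks")
        return func(*args, user_id=user_id, **kwargs)

    return wrapper


@require_user_id
def stub_llm_text_gen(prompt: str, json_struct: Any = None, user_id: Optional[str] = None) -> dict:
    # Stand-in body: echoes its inputs instead of calling a model.
    return {"prompt": prompt, "user_id": user_id}
```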

🔄 Research Modes

Intent-Driven Research (Current - Recommended)

Flow: Intent Analysis → Confirmation → Execution → Intent-Aware Analysis

Benefits:

Understands user goals before searching
Delivers exactly what users need
Structured deliverables
50-67% fewer LLM calls during intent analysis (1 instead of 2-3)

Use When: User wants specific deliverables (statistics, quotes, case studies, etc.)

Traditional Research (Fallback)

Flow: Direct Execution → Generic Analysis

Benefits:

Faster for simple queries
No intent analysis overhead

Use When: Simple factual queries or when intent analysis fails
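
The "when intent analysis fails" case can be expressed as a try-then-fall-back wrapper. A minimal sketch, assuming two hypothetical async callables for the two modes; catching every `Exception` is a simplification, and real code would likely narrow it to the analyzer's own error types.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ResearchFn = Callable[[str], Awaitable[Any]]  # hypothetical signature for both modes


async def research_with_fallback(
    query: str, intent_driven: ResearchFn, traditional: ResearchFn
) -> Dict[str, Any]:
    """Try the intent-driven flow first; use the traditional flow if it fails."""
    try:
        return {"mode": "intent", "result": await intent_driven(query)}
    except Exception as exc:  # simplification: narrow this in real code
        result = await traditional(query)
        return {"mode": "traditional", "result": result, "fallback_reason": str(exc)}


async def _failing_intent(query: str) -> str:
    raise RuntimeError("intent analysis failed")  # simulated failure


async def _traditional(query: str) -> str:
    return f"traditional:{query}"


fallback_demo = asyncio.run(research_with_fallback("ev sales", _failing_intent, _traditional))
```

Recording `fallback_reason` keeps the downgrade visible to the caller instead of silently returning less structured results.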

📊 Data Models

ResearchIntent

class ResearchIntent:
    primary_question: str
    secondary_questions: List[str]
    purpose: ResearchPurpose  # learn, create_content, make_decision, etc.
    content_output: ContentOutput  # blog, podcast, video, etc.
    expected_deliverables: List[ExpectedDeliverable]
    depth: ResearchDepthLevel  # overview, detailed, expert
    focus_areas: List[str]
    perspective: Optional[str]
    time_sensitivity: str
    confidence: float
    confidence_reason: Optional[str]
    great_example: Optional[str]
    needs_clarification: bool
    clarifying_questions: List[str]
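
The `confidence`, `needs_clarification`, and `clarifying_questions` fields suggest a gate before research starts. The sketch below uses a hypothetical `IntentSummary` subset of the model, and the 0.7 threshold is an assumption chosen for illustration, not a value taken from the codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntentSummary:
    """Hypothetical subset of ResearchIntent with only the fields used here."""

    confidence: float
    needs_clarification: bool = False
    clarifying_questions: List[str] = field(default_factory=list)


def questions_to_ask(intent: IntentSummary, min_confidence: float = 0.7) -> List[str]:
    """Return the questions to show the user, or an empty list to proceed."""
    if intent.needs_clarification or intent.confidence < min_confidence:
        return list(intent.clarifying_questions)
    return []
```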

ResearchQuery

class ResearchQuery:
    query: str
    purpose: ExpectedDeliverable
    provider: str  # "exa" | "tavily"
    priority: int  # 1-5
    expected_results: str
    justification: Optional[str]
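
Because each query names its provider, an executor can batch queries per provider before dispatch. A minimal sketch, assuming a hypothetical `Query` subset of the model; the 1-5 range check follows the comment on `priority` above, and no assumption is made about which end of the range is most urgent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical subset of ResearchQuery with only the fields used here."""

    query: str
    provider: str
    priority: int


def group_by_provider(queries: List[Query]) -> Dict[str, List[str]]:
    """Group query strings by provider, preserving their original order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        if not 1 <= q.priority <= 5:
            raise ValueError(f"priority must be 1-5, got {q.priority}")
        grouped[q.provider].append(q.query)
    return dict(grouped)
```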

IntentDrivenResearchResult

class IntentDrivenResearchResult:
    primary_answer: str
    secondary_answers: Dict[str, str]
    statistics: List[StatisticWithCitation]
    expert_quotes: List[ExpertQuote]
    case_studies: List[CaseStudySummary]
    trends: List[TrendAnalysis]
    comparisons: List[ComparisonTable]
    best_practices: List[str]
    step_by_step: List[str]
    pros_cons: Optional[ProsCons]
    definitions: Dict[str, str]
    examples: List[str]
    predictions: List[str]
    executive_summary: str
    key_takeaways: List[str]
    suggested_outline: List[str]
    sources: List[SourceWithRelevance]
    confidence: float
    gaps_identified: List[str]
    follow_up_queries: List[str]
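
A consumer can compare the result against what the intent asked for by looking for deliverable fields that came back empty. The helper below is a hypothetical sketch that treats the result as a plain dictionary keyed by the field names above. How `ExpectedDeliverable` values map onto those field names is an assumption.

```python
from typing import Any, Dict, List


def missing_deliverables(result: Dict[str, Any], expected: List[str]) -> List[str]:
    """Return the expected deliverable fields that are absent or empty."""
    return [name for name in expected if not result.get(name)]
```

For example, a result with statistics but no expert quotes reports `expert_quotes` as missing, which could feed `gaps_identified` or trigger the `follow_up_queries`.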

🎨 UI Components

ResearchWizard

Purpose: Main wizard orchestrator

Steps:

ResearchInput: Input + Intent & Options button
StepProgress: Progress/polling for async research
StepResults: Tabbed results display

IntentConfirmationPanel

Purpose: Shows inferred intent and allows editing

Features:

Displays inferred intent (editable)
Shows suggested queries (selectable)
Displays AI-optimized settings with justifications
Advanced options for manual override

IntentResultsDisplay

Purpose: Tabbed results display

Tabs:

Summary: AI-generated overview
Deliverables: Extracted statistics, quotes, case studies, etc.
Sources: Citations with credibility scores
Analysis: Deep insights based on intent

🔐 Security & Subscription

Authentication

All endpoints require JWT authentication via the get_current_user dependency.

Subscription Checks

All LLM calls must pass user_id for subscription and pre-flight validation:

result = llm_text_gen(
    prompt=prompt,
    json_struct=schema,
    user_id=user_id  # Required
)

Rate Limiting

Research requests are subject to subscription-tier limits
Provider APIs (Exa/Tavily/Google) have their own rate limits

📈 Performance

Intent Analysis

Typical Time: 2-5 seconds
LLM Calls: 1 (unified analyzer)
Caching: Research persona cached (7-day TTL)

Research Execution

Typical Time: 10-30 seconds
Depends On: Provider, query count, result count
Async Support: Yes (via /api/research/start)

Result Analysis

Typical Time: 5-10 seconds
LLM Calls: 1 (intent-aware analyzer)
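
Adding the three typical ranges gives a rough end-to-end budget for one intent-driven run. The snippet only sums the figures quoted above; it excludes the time the user spends confirming the intent and is not a measured benchmark.

```python
# Typical per-stage durations in seconds, copied from the figures above.
STAGE_SECONDS = {
    "intent_analysis": (2, 5),
    "research_execution": (10, 30),
    "result_analysis": (5, 10),
}

best_case_s = sum(low for low, _ in STAGE_SECONDS.values())     # 2 + 10 + 5
worst_case_s = sum(high for _, high in STAGE_SECONDS.values())  # 5 + 30 + 10
```

That puts a typical run at roughly 17-45 seconds of backend time, with research execution the largest share.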

🔗 Integration Points

Blog Writer Integration

Research Engine can be imported by Blog Writer:

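from services.research.core.research_engine import ResearchEngine
# Assumption: ResearchGoal and ResearchDepth are defined alongside ResearchContext;
# adjust this import if they live in a different module.
from services.research.core.research_context import (
    ResearchContext,
    ResearchDepth,
    ResearchGoal,
)

# Describe what to research; blog_topic and blog_keywords come from the Blog Writer.
context = ResearchContext(
    query=blog_topic,
    keywords=blog_keywords,
    goal=ResearchGoal.FACTUAL,
    depth=ResearchDepth.COMPREHENSIVE,
)

# Must run inside an async function; user_id is required for subscription checks.
engine = ResearchEngine()
result = await engine.research(context, user_id=user_id)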

Frontend Integration

Research Wizard can be reused in other tools:

import { ResearchWizard } from '@/components/Research/ResearchWizard';

<ResearchWizard
  onComplete={(results) => {
    // Use results in blog/video generation
  }}
  initialKeywords={blogTopic}
  initialIndustry={userIndustry}
/>

📚 Related Documentation

Architecture Rules: .cursor/rules/researcher-architecture.mdc (Authoritative)
Intent-Driven Guide: INTENT_DRIVEN_RESEARCH_GUIDE.md
API Reference: INTENT_RESEARCH_API_REFERENCE.md
Documentation Review: DOCUMENTATION_REVIEW_AND_UPDATE_PLAN.md

✅ Best Practices

Always use UnifiedResearchAnalyzer for new intent-driven research
Always pass user_id to all LLM calls
Always use IntentAwareAnalyzer for result analysis
Check provider availability before using providers
Provide justifications for all AI-driven settings
Allow user overrides in Advanced Options
Never fall back to "General"; always use persona defaults

Status: Authoritative Architecture Documentation - Single Source of Truth
