kunthawat/ALwrity

ajaysi 8193cdba67 AI Analysis and Content Strategy fixes. Enhanced Strategy Routes refactoring.

2026-01-10 19:32:50 +05:30

Research Engine Codebase Review & Understanding

Date: 2025-01-29
Status: Comprehensive Codebase Review Summary

📋 Executive Summary

The ALwrity Research Engine is a fully functional, production-ready, intent-driven research system that has evolved from traditional keyword-based search into an AI-powered research assistant. A unified analyzer folds intent inference, query generation, and parameter optimization into a single LLM call, cutting LLM calls by at least 50%, and the system personalizes research using the user's onboarding data.

🏗️ Architecture Overview

Current Architecture (Intent-Driven)

User Input → UnifiedResearchAnalyzer (Single AI Call)
           ├── Intent Inference
           ├── Query Generation (4-8 queries)
           └── Parameter Optimization (Exa/Tavily)
           ↓
Research Execution (Exa → Tavily → Google)
           ↓
IntentAwareAnalyzer (Result Analysis)
           ↓
Structured Deliverables (Statistics, Quotes, Case Studies, etc.)

Key Architectural Principles

Unified Analysis: Single LLM call for intent + queries + params (50% reduction)
Intent-Driven: Understand user goals before searching
Hyper-Personalization: Leverage research persona from onboarding data
Provider Priority: Exa → Tavily → Google (semantic → real-time → fallback)
Subscription-Aware: All AI calls go through llm_text_gen with user_id

📁 Code Structure

Backend Structure

backend/services/research/
├── core/
│   ├── research_engine.py           # Main orchestrator (standalone)
│   ├── research_context.py          # Unified input schema
│   └── parameter_optimizer.py     # DEPRECATED (use unified analyzer)
│
├── intent/
│   ├── unified_research_analyzer.py # ⭐ Unified AI analyzer (intent + queries + params)
│   ├── intent_aware_analyzer.py     # Result analysis based on intent
│   ├── unified_prompt_builder.py   # LLM prompt builders
│   ├── unified_schema_builder.py   # JSON schema builders
│   ├── unified_result_parser.py    # Result parsing utilities
│   ├── query_deduplicator.py       # Query deduplication logic
│   ├── research_intent_inference.py # Legacy (use unified)
│   └── intent_query_generator.py   # Legacy (use unified)
│
├── trends/
│   ├── google_trends_service.py    # Google Trends integration
│   └── rate_limiter.py              # Rate limiting for Trends API
│
├── research_persona_service.py      # Research persona generation/retrieval
├── research_persona_prompt_builder.py # Persona generation prompts
├── exa_service.py                  # Exa API integration
├── tavily_service.py                # Tavily API integration
└── google_search_service.py         # Google/Gemini grounding

backend/api/research/
├── router.py                        # Main router
└── handlers/
    ├── providers.py                 # Provider status endpoints
    ├── research.py                  # Traditional research endpoints
    ├── intent.py                    # Intent-driven endpoints
    └── projects.py                  # My Projects endpoints

Frontend Structure

frontend/src/components/Research/
├── ResearchWizard.tsx               # Main wizard orchestrator (3 steps)
├── steps/
│   ├── ResearchInput.tsx            # Step 1: Input + Intent & Options
│   ├── StepProgress.tsx             # Step 2: Progress/polling
│   ├── StepResults.tsx              # Step 3: Results display
│   ├── components/
│   │   ├── ResearchInputHeader.tsx  # Header with Advanced toggle
│   │   ├── ResearchInputContainer.tsx # Main input with Intent & Options button
│   │   ├── IntentConfirmationPanel.tsx # Intent display/edit panel
│   │   ├── IntentResultsDisplay.tsx # Tabbed results (Summary, Deliverables, Sources, Analysis)
│   │   ├── AdvancedOptionsSection.tsx # Exa/Tavily options
│   │   ├── ProviderChips.tsx        # Provider availability display
│   │   ├── PersonalizationIndicator.tsx # UI indicator for personalization
│   │   ├── PersonalizationBadge.tsx # Badge-style indicator
│   │   └── ... (other components)
│   ├── hooks/
│   │   ├── useResearchConfig.ts     # Config + persona loading
│   │   ├── useKeywordExpansion.ts   # Keyword expansion with persona
│   │   └── useResearchAngles.ts     # Research angles generation
│   └── utils/
│       ├── placeholders.ts          # Personalized placeholders
│       └── industryDefaults.ts     # Industry-specific defaults
└── hooks/
    ├── useResearchWizard.ts        # Wizard state management
    ├── useResearchExecution.ts      # Research execution orchestration
    └── useIntentResearch.ts         # Intent research flow

🔑 Key Components

1. UnifiedResearchAnalyzer ⭐

Location: backend/services/research/intent/unified_research_analyzer.py

Purpose: Single AI call that performs:

Intent inference (what user wants)
Query generation (4-8 targeted queries)
Parameter optimization (Exa/Tavily settings with justifications)

Key Features:

Reduces LLM calls from 2-3 to 1 (a reduction of 50% or more)
Provides justifications for all parameter decisions
Uses research persona for context
Returns structured ResearchIntent, ResearchQuery[], and OptimizedConfig

Usage Pattern:

from services.research.intent.unified_research_analyzer import UnifiedResearchAnalyzer

analyzer = UnifiedResearchAnalyzer()
result = await analyzer.analyze(
    user_input=user_input,
    keywords=keywords,
    research_persona=research_persona,
    competitor_data=competitor_data,
    industry=industry,
    target_audience=target_audience,
    user_id=user_id,  # Required for subscription checks
)
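
To illustrate how one call can cover all three outputs, the request can carry a single JSON schema with one top-level key per output. This is a minimal sketch under stated assumptions, not the contents of unified_schema_builder.py: `build_unified_schema` and every field name inside it are hypothetical, chosen only to mirror the intent, queries, and config described above.

```python
from typing import Any, Dict


def build_unified_schema(min_queries: int = 4, max_queries: int = 8) -> Dict[str, Any]:
    """Hypothetical helper: one JSON schema covering intent, queries, and provider params."""
    # Each optimized setting carries its value plus a justification for UI display.
    justified_setting = {
        "type": "object",
        "properties": {
            "value": {},  # empty schema: any JSON value is accepted
            "justification": {"type": "string"},
        },
        "required": ["value", "justification"],
    }
    return {
        "type": "object",
        "properties": {
            "intent": {
                "type": "object",
                "properties": {
                    "primary_question": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["primary_question"],
            },
            "queries": {
                "type": "array",
                "minItems": min_queries,
                "maxItems": max_queries,
                "items": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "provider": {"type": "string", "enum": ["exa", "tavily"]},
                        "priority": {"type": "integer", "minimum": 1, "maximum": 5},
                    },
                    "required": ["query", "provider"],
                },
            },
            "optimized_config": {
                "type": "object",
                "properties": {
                    "exa": {"type": "object", "additionalProperties": justified_setting},
                    "tavily": {"type": "object", "additionalProperties": justified_setting},
                },
            },
        },
        "required": ["intent", "queries", "optimized_config"],
    }


schema = build_unified_schema()
```

Because all three keys are required in one response, the model cannot return queries without an intent or settings without justifications, which is the property the unified approach relies on.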

2. IntentAwareAnalyzer

Location: backend/services/research/intent/intent_aware_analyzer.py

Purpose: Analyzes raw research results based on user intent to extract specific deliverables

Key Features:

Extracts statistics, quotes, case studies, trends, comparisons
Structures results by deliverable type
Provides credibility scores for sources
Identifies gaps and follow-up queries

Usage Pattern:

from services.research.intent.intent_aware_analyzer import IntentAwareAnalyzer

analyzer = IntentAwareAnalyzer()
result = await analyzer.analyze(
    raw_results=exa_tavily_results,
    intent=research_intent,
    research_persona=research_persona,
    user_id=user_id,  # Required for subscription checks
)

3. ResearchEngine

Location: backend/services/research/core/research_engine.py

Purpose: Orchestrates provider calls with priority order

Provider Priority:

Exa (Primary): Semantic understanding, academic papers, competitor research
Tavily (Secondary): Real-time news, trending topics, quick facts
Google (Fallback): Basic factual queries via Gemini grounding
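
The priority order can be sketched as a simple fallback chain. This review does not spell out when the engine moves to the next provider, so the sketch assumes a provider is skipped when it raises or returns nothing. `search_with_fallback` and the `fake_*` providers are hypothetical stand-ins, not the real service classes.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence, Tuple

SearchFn = Callable[[str], Awaitable[List[Dict[str, str]]]]


async def search_with_fallback(
    query: str,
    providers: Sequence[Tuple[str, SearchFn]],
) -> Tuple[str, List[Dict[str, str]]]:
    """Try each provider in priority order; return the first non-empty result set."""
    errors: List[str] = []
    for name, search in providers:
        try:
            results = await search(query)
        except Exception as exc:  # a failing provider should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


# Stand-in providers for demonstration only.
async def fake_exa(query: str) -> List[Dict[str, str]]:
    raise TimeoutError("simulated timeout")


async def fake_tavily(query: str) -> List[Dict[str, str]]:
    return [{"title": f"Result for {query}", "url": "https://example.com"}]


async def fake_google(query: str) -> List[Dict[str, str]]:
    return []


used, results = asyncio.run(
    search_with_fallback(
        "AI marketing tools",
        [("exa", fake_exa), ("tavily", fake_tavily), ("google", fake_google)],
    )
)
```

Here the simulated Exa timeout is recorded and the chain continues, so `used` ends up as `"tavily"` and Google is never called.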

4. ResearchPersonaService

Location: backend/services/research/research_persona_service.py

Purpose: Generates and retrieves research persona from onboarding data

Persona Sources:

Core persona (onboarding step 1)
Website analysis (onboarding step 2): writing_style, content_characteristics, content_type, style_patterns, crawl_result
Competitor analysis (onboarding step 3)

Features:

Caches persona (7-day TTL)
Provides persona defaults for UI pre-filling
Generates personalized presets, keywords, and research angles
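
A 7-day TTL cache can be sketched as follows. This is an in-memory illustration only; where the real service stores personas is not covered by this review, and `PersonaCache` is a hypothetical name.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


class PersonaCache:
    """Hypothetical in-memory cache; the real service may persist personas elsewhere."""

    def __init__(
        self,
        ttl_seconds: int = SEVEN_DAYS_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, user_id: str) -> Optional[Any]:
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        stored_at, persona = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the caller regenerates the persona.
            del self._entries[user_id]
            return None
        return persona

    def set(self, user_id: str, persona: Any) -> None:
        self._entries[user_id] = (self._clock(), persona)
```

A caller would check `get(user_id)` first and only regenerate the persona (an LLM call) when it returns `None`.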

🔌 API Endpoints

Intent-Driven Endpoints (Current - Recommended)

POST /api/research/intent/analyze
- Analyzes user input to understand intent
- Generates queries and optimizes parameters
- Returns intent, queries, and optimized config
- Performance: 2-5 seconds (single LLM call)

POST /api/research/intent/research
- Executes research based on confirmed intent
- Returns structured deliverables
- Performance: 10-30 seconds (depends on provider and query count)

Traditional Endpoints (Fallback)

POST /api/research/execute - Synchronous research execution
POST /api/research/start - Asynchronous research execution
GET /api/research/status/{task_id} - Poll async research status
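
The start/status pair implies a client-side polling loop, which might look like the sketch below. The response shape is an assumption: a `status` key with `"completed"` or `"failed"` as terminal values is hypothetical, and `get_status` stands in for whatever function wraps GET /api/research/status/{task_id}.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_seconds: float = 2.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status(task_id) until a terminal state or until the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status(task_id)
        if status.get("status") in ("completed", "failed"):  # assumed status values
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Task {task_id} still running after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds


# Demonstration with canned responses instead of real HTTP calls.
responses = iter(
    [
        {"status": "running"},
        {"status": "running"},
        {"status": "completed", "result": {}},
    ]
)
final = poll_until_done(lambda _task_id: next(responses), "task-123", sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop testable; production code would leave the default `time.sleep` in place.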

Configuration Endpoints

GET /api/research/config - Provider availability + persona defaults
GET /api/research/providers/status - Provider availability only
GET /api/research/persona-defaults - Persona defaults only

🔄 Research Flow

Intent-Driven Research Flow (Current)

1. User Input
   User enters: "AI marketing tools"
   ↓

2. Intent Analysis (UnifiedResearchAnalyzer)
   POST /api/research/intent/analyze
   ├── Fetches Research Persona (if enabled)
   ├── Fetches Competitor Data (if enabled)
   └── Single LLM Call:
       ├── Intent Inference
       ├── Query Generation (4-8 queries)
       └── Parameter Optimization (Exa/Tavily)
   ↓

3. Intent Confirmation (Frontend)
   IntentConfirmationPanel displays:
   ├── Inferred intent (editable)
   ├── Suggested queries (selectable)
   └── AI-optimized settings with justifications
   ↓

4. Research Execution
   POST /api/research/intent/research
   ├── ResearchEngine executes queries (Exa → Tavily → Google)
   └── Returns raw results
   ↓

5. Intent-Aware Analysis
   IntentAwareAnalyzer analyzes results:
   ├── Extracts statistics, quotes, case studies
   ├── Structures by deliverable type
   └── Returns IntentDrivenResearchResult
   ↓

6. Results Display
   IntentResultsDisplay shows:
   ├── Summary Tab
   ├── Deliverables Tab
   ├── Sources Tab
   └── Analysis Tab
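
Seen from a client, steps 2 through 5 above reduce to two POST calls with a confirmation step in between. The sketch below is illustrative only: the request and response bodies are hypothetical (INTENT_RESEARCH_API_REFERENCE.md is the place to check the real shapes), and `post` is an injected stand-in for an authenticated HTTP client.

```python
from typing import Any, Callable, Dict, List

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_intent_research(
    post: PostFn,
    user_input: str,
    confirm: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Analyze, let the user confirm or edit, then execute the research."""
    # Step 2: one analysis call yields intent, queries, and optimized settings.
    analysis = post("/api/research/intent/analyze", {"user_input": user_input})
    # Step 3: the user reviews the inferred intent and selects queries.
    confirmed = confirm(analysis)
    # Steps 4-5: execution and intent-aware analysis happen server-side.
    return post("/api/research/intent/research", confirmed)


# Demonstration with a fake transport; payload fields are assumptions.
calls: List[str] = []


def fake_post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(path)
    if path.endswith("/analyze"):
        return {
            "intent": {"primary_question": body["user_input"]},
            "queries": ["q1", "q2"],
            "optimized_config": {},
        }
    return {"executive_summary": "stub", "received_queries": body["queries"]}


def keep_first_query(analysis: Dict[str, Any]) -> Dict[str, Any]:
    # Simulates a user deselecting all but the first suggested query.
    return {**analysis, "queries": analysis["queries"][:1]}


result = run_intent_research(fake_post, "AI marketing tools", keep_first_query)
```

Keeping confirmation as a separate callable mirrors the role of IntentConfirmationPanel: nothing is executed until the user has had a chance to edit the analysis.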

🎯 Key Features Implemented

✅ Completed Features

Intent-Driven Research Architecture
- UnifiedResearchAnalyzer (single AI call)
- IntentAwareAnalyzer (result analysis)
- 3-Step Wizard (ResearchInput → StepProgress → StepResults)
- IntentConfirmationPanel (review/edit intent)

Google Trends Integration
- Phase 1: Core Google Trends service
- Phase 2: Hybrid approach (automatic + on-demand)
- Phase 3: Enhanced UI with charts, export functionality
- Integrated into intent-driven research flow

Research Persona System
- Persona generation from onboarding data
- Persona defaults for UI pre-filling
- Caching (7-day TTL)
- UI indicators showing personalization

My Projects Feature
- Auto-save research projects upon completion
- Asset Library integration
- Restore functionality with full state persistence

UI/UX Enhancements
- QueryEditor redesign
- Google Trends keywords with chip-based UI
- Industry-specific placeholders
- Time-sensitive query handling
- Personalization indicators

📊 Data Models

ResearchIntent

class ResearchIntent:
    primary_question: str
    secondary_questions: List[str]
    purpose: ResearchPurpose  # learn, create_content, make_decision, etc.
    content_output: ContentOutput  # blog, podcast, video, etc.
    expected_deliverables: List[ExpectedDeliverable]
    depth: ResearchDepthLevel  # overview, detailed, expert
    focus_areas: List[str]
    perspective: Optional[str]
    time_sensitivity: str
    confidence: float
    confidence_reason: Optional[str]
    great_example: Optional[str]
    needs_clarification: bool
    clarifying_questions: List[str]

ResearchQuery

class ResearchQuery:
    query: str
    purpose: ExpectedDeliverable
    provider: str  # "exa" | "tavily"
    priority: int  # 1-5
    expected_results: str
    justification: Optional[str]
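
The field comments above imply two constraints (provider is "exa" or "tavily", priority is 1-5) that could be enforced at construction time. The sketch below is a hypothetical variant, not the project's model: `ResearchQuerySketch` uses a plain string for `purpose` instead of ExpectedDeliverable, and `group_by_provider` is an illustrative helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

ALLOWED_PROVIDERS = ("exa", "tavily")


@dataclass
class ResearchQuerySketch:
    query: str
    purpose: str  # plain string here; the real model uses ExpectedDeliverable
    provider: str
    priority: int
    expected_results: str
    justification: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the ranges documented in the field comments.
        if self.provider not in ALLOWED_PROVIDERS:
            raise ValueError(f"Unknown provider: {self.provider!r}")
        if not 1 <= self.priority <= 5:
            raise ValueError(f"Priority must be 1-5, got {self.priority}")


def group_by_provider(
    queries: Iterable[ResearchQuerySketch],
) -> Dict[str, List[ResearchQuerySketch]]:
    """Bucket queries so each provider can be called with its own batch."""
    grouped: Dict[str, List[ResearchQuerySketch]] = defaultdict(list)
    for q in queries:
        grouped[q.provider].append(q)
    return dict(grouped)


queries = [
    ResearchQuerySketch("AI marketing tools market size", "statistics", "exa", 1, "market figures"),
    ResearchQuerySketch("AI marketing tools news", "trends", "tavily", 2, "recent articles"),
    ResearchQuerySketch("AI marketing case studies", "case_studies", "exa", 3, "case studies"),
]
batches = group_by_provider(queries)
```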

IntentDrivenResearchResult

class IntentDrivenResearchResult:
    primary_answer: str
    secondary_answers: Dict[str, str]
    statistics: List[StatisticWithCitation]
    expert_quotes: List[ExpertQuote]
    case_studies: List[CaseStudySummary]
    trends: List[TrendAnalysis]
    comparisons: List[ComparisonTable]
    best_practices: List[str]
    step_by_step: List[str]
    pros_cons: Optional[ProsCons]
    definitions: Dict[str, str]
    examples: List[str]
    predictions: List[str]
    executive_summary: str
    key_takeaways: List[str]
    suggested_outline: List[str]
    sources: List[SourceWithRelevance]
    confidence: float
    gaps_identified: List[str]
    follow_up_queries: List[str]

🎨 UI Components

ResearchWizard

Purpose: Main wizard orchestrator

Steps:

ResearchInput: Input + Intent & Options button
StepProgress: Progress/polling for async research
StepResults: Tabbed results display

IntentConfirmationPanel

Purpose: Shows inferred intent and allows editing

Features:

Displays inferred intent (editable)
Shows suggested queries (selectable)
Displays AI-optimized settings with justifications
Advanced options for manual override

IntentResultsDisplay

Purpose: Tabbed results display

Tabs:

Summary: AI-generated overview
Deliverables: Extracted statistics, quotes, case studies, etc.
Sources: Citations with credibility scores
Analysis: Deep insights based on intent

🔐 Security & Subscription

Authentication

All endpoints require JWT authentication via get_current_user dependency.

Subscription Checks

All LLM calls must pass user_id for subscription and pre-flight validation:

result = llm_text_gen(
    prompt=prompt,
    json_struct=schema,
    user_id=user_id  # Required
)
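
One way to make the rule hard to violate is a thin guard that refuses to call the LLM without a user_id. This is a sketch only: `guarded_llm_call` is hypothetical, and `llm_call` stands in for llm_text_gen, whose keyword arguments are taken from the example above.

```python
from typing import Any, Callable, Dict, List, Optional


def guarded_llm_call(
    llm_call: Callable[..., Any],
    prompt: str,
    json_struct: Optional[Dict[str, Any]],
    user_id: Optional[str],
) -> Any:
    """Fail fast when user_id is missing, before any LLM request is made."""
    if not user_id:
        raise ValueError("user_id is required for subscription checks")
    return llm_call(prompt=prompt, json_struct=json_struct, user_id=user_id)


# Demonstration with a fake LLM function that records who called it.
seen_users: List[str] = []


def fake_llm(prompt: str, json_struct: Optional[Dict[str, Any]], user_id: str) -> Dict[str, Any]:
    seen_users.append(user_id)
    return {"echo": prompt}


response = guarded_llm_call(fake_llm, "Summarize AI marketing tools", None, "user-42")
```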

Rate Limiting

Subject to subscription tier limits
Provider APIs (Exa/Tavily/Google) have their own rate limits

📈 Performance

Intent Analysis

Typical Time: 2-5 seconds
LLM Calls: 1 (unified analyzer)
Caching: Research persona cached (7-day TTL)

Research Execution

Typical Time: 10-30 seconds
Depends On: Provider, query count, result count
Async Support: Yes (via /api/research/start)

Result Analysis

Typical Time: 5-10 seconds
LLM Calls: 1 (intent-aware analyzer)

🔗 Integration Points

Blog Writer Integration

Research Engine can be imported by Blog Writer:

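from services.research.core.research_engine import ResearchEngine
# ResearchGoal and ResearchDepth are assumed to be defined alongside ResearchContext;
# adjust this import if they live in a different module.
from services.research.core.research_context import (
    ResearchContext,
    ResearchDepth,
    ResearchGoal,
)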

context = ResearchContext(
    query=blog_topic,
    keywords=blog_keywords,
    goal=ResearchGoal.FACTUAL,
    depth=ResearchDepth.COMPREHENSIVE,
)

engine = ResearchEngine()
result = await engine.research(context, user_id=user_id)

Frontend Integration

Research Wizard can be reused in other tools:

import { ResearchWizard } from '@/components/Research/ResearchWizard';

<ResearchWizard
  onComplete={(results) => {
    // Use results in blog/video generation
  }}
  initialKeywords={blogTopic}
  initialIndustry={userIndustry}
/>

✅ Best Practices

Always use UnifiedResearchAnalyzer for new intent-driven research
Always pass user_id to all LLM calls
Always use IntentAwareAnalyzer for result analysis
Check provider availability before using providers
Provide justifications for all AI-driven settings
Allow user overrides in Advanced Options
Never fall back to "General" - always use persona defaults
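
The last rule can be sketched as a resolver that prefers explicit user input, then persona defaults, and never returns "General". The `industry` key and the choice to raise when nothing is available are assumptions made for illustration; this review does not describe how the real code handles a missing persona.

```python
from typing import Dict, Optional


def resolve_industry(user_value: Optional[str], persona_defaults: Dict[str, str]) -> str:
    """Prefer explicit user input, then persona defaults; never return 'General'."""
    for candidate in (user_value, persona_defaults.get("industry")):
        if candidate is None:
            continue
        cleaned = candidate.strip()
        if cleaned and cleaned.lower() != "general":
            return cleaned
    # Assumed behavior: surface the gap instead of silently using a generic value.
    raise LookupError("No industry available; load persona defaults first")


industry = resolve_industry(None, {"industry": "B2B SaaS"})
```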

🚫 Common Pitfalls to Avoid

❌ Rule-Based Parameter Optimization: Always use AI-driven optimization via UnifiedResearchAnalyzer
❌ Missing user_id: Always pass user_id to llm_text_gen for subscription checks
❌ Breaking Changes: Never modify Research Engine in a way that breaks existing tools (Blog Writer, etc.)
❌ Hardcoded Defaults: Always use persona defaults, never hardcode "General" values
❌ Multiple LLM Calls: Use unified analyzer instead of separate intent + query + params calls
❌ Ignoring Provider Availability: Always check provider availability before using
❌ Missing Justifications: Every AI-driven setting must have a justification for UI display

📋 Pending Items & TODOs

From Code Review

File Upload Logic (ResearchInput.tsx:396)
- TODO: Implement file upload logic for research input
- Status: Not started (low priority)

Documentation Gaps

Intent-Driven Research Documentation
- ✅ Comprehensive guide created (INTENT_DRIVEN_RESEARCH_GUIDE.md)
- ✅ API reference created (INTENT_RESEARCH_API_REFERENCE.md)
- ✅ Architecture overview created (CURRENT_ARCHITECTURE_OVERVIEW.md)

Outdated Documentation
- ⚠️ Some docs still reference old 4-step wizard
- ⚠️ Need to update implementation guides
- See DOCUMENTATION_REVIEW_AND_UPDATE_PLAN.md for details

🎯 Suggested Next Steps

Priority 1: Documentation Updates (High Value, Low Effort)

Update outdated implementation documentation
Create integration examples
Update component documentation

Priority 2: Dashboard Alert System Integration (Medium Value, Medium Effort)

Research cost alerts
Research efficiency alerts
Integration with billing dashboard alerts

Priority 3: Feature Enhancements (Variable Value, Variable Effort)

File upload for research input
Research templates
Research comparison
Advanced export options

Priority 4: Performance & Optimization (Low Value, High Effort)

Research result caching
Batch research operations

Current & Accurate

✅ CURRENT_ARCHITECTURE_OVERVIEW.md - Single source of truth
✅ INTENT_DRIVEN_RESEARCH_GUIDE.md - Comprehensive guide
✅ INTENT_RESEARCH_API_REFERENCE.md - Complete API docs
✅ .cursor/rules/researcher-architecture.mdc - Authoritative rules
✅ PHASE2_IMPLEMENTATION_SUMMARY.md - Persona enhancements
✅ PHASE3_AND_UI_INDICATORS_IMPLEMENTATION.md - Phase 3 features
✅ RESEARCH_PERSONA_DATA_SOURCES.md - Persona data sources

Outdated (Historical Reference Only)

⚠️ RESEARCH_WIZARD_IMPLEMENTATION.md - Describes old 4-step wizard
⚠️ RESEARCH_COMPONENT_INTEGRATION.md - Mentions old architecture
⚠️ PHASE1_IMPLEMENTATION_REVIEW.md - Missing intent-driven research
⚠️ RESEARCH_IMPROVEMENTS_SUMMARY.md - Missing intent-driven research
⚠️ COMPLETE_IMPLEMENTATION_SUMMARY.md - Missing intent-driven research

✅ Conclusion

The Research Engine is fully functional and production-ready. The system has evolved from a traditional keyword-based search to an AI-powered intent-driven research assistant with:

50% reduction in LLM calls (unified analyzer)
Hyper-personalization based on onboarding data
Structured deliverables (statistics, quotes, case studies, etc.)
Provider optimization (Exa → Tavily → Google)
UI indicators showing personalization
My Projects integration with Asset Library

Main Gaps:

Documentation updates (some outdated docs)
Alert system integration (cost/efficiency alerts)
Feature enhancements (file upload, templates, etc.)

Recommended Focus: Start with documentation updates (high value, low effort) followed by alert system integration (improves user experience and cost transparency).

Status: Codebase Review Complete - System is Production-Ready 🚀
