# Intent-Driven Research Guide
**Date**: 2025-01-29

**Status**: Current Architecture Documentation

---
## 📋 Overview
Intent-driven research is the core innovation of the ALwrity Research Engine. Instead of running generic keyword-based searches, the system first **infers what the user wants to accomplish**, then targets its queries and result analysis at that goal.
### Key Innovation
**Traditional Research**:
```
User Input → Search → Generic Results → User filters/analyzes
```
**Intent-Driven Research**:
```
User Input → AI Understands Intent → Targeted Queries → Intent-Aware Analysis → Structured Deliverables
```
---
## 🎯 Core Concepts
### 1. **Intent Inference**
Before searching, the AI analyzes user input to understand:
- **What question** needs answering
- **What purpose** (learn, create content, make decision, etc.)
- **What deliverables** are expected (statistics, quotes, case studies, etc.)
- **What depth** is needed (overview, detailed, expert)
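
The four dimensions above can be pictured as a small data structure. The sketch below is illustrative only: `IntentSketch` is a hypothetical stand-in, not the actual `ResearchIntent` model, and the field names are taken from the worked example later in this guide.

```python
from dataclasses import dataclass, field
from typing import List

# Depth levels as listed in this guide.
DEPTH_LEVELS = ("overview", "detailed", "expert")


@dataclass
class IntentSketch:
    """Hypothetical stand-in for ResearchIntent (not the real model)."""

    primary_question: str
    purpose: str
    expected_deliverables: List[str] = field(default_factory=list)
    depth: str = "detailed"

    def __post_init__(self) -> None:
        # Reject depth values outside the three documented levels.
        if self.depth not in DEPTH_LEVELS:
            raise ValueError(f"unknown depth: {self.depth!r}")


intent = IntentSketch(
    primary_question="What are the best AI marketing tools available?",
    purpose="make_decision",
    expected_deliverables=["key_statistics", "comparisons"],
)
```

Validating `depth` at construction time is one way to keep user edits in the confirmation step from producing an intent the analyzer cannot interpret.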
### 2. **Unified Analysis**
A single AI call performs:
- Intent inference
- Query generation (4-8 targeted queries)
- Provider parameter optimization (Exa/Tavily settings with justifications)
### 3. **Intent-Aware Result Analysis**
Results are analyzed through the lens of user intent, extracting:
- Specific deliverables (statistics, quotes, case studies)
- Structured answers to user's questions
- Relevant sources with credibility scores
- Actionable insights
---
## 🔄 Research Flow
### Step 1: Intent Analysis
**User Action**: Enters keywords/topic and clicks "Intent & Options"
**What Happens**:
1. Frontend calls `/api/research/intent/analyze`
2. `UnifiedResearchAnalyzer` performs a single AI call:
   - Infers research intent
   - Generates 4-8 targeted queries
   - Optimizes Exa/Tavily parameters with justifications
   - Recommends the best provider
3. Returns `ResearchIntent`, `ResearchQuery[]`, and `OptimizedConfig`
**User Sees**:
- Inferred intent (editable)
- Suggested queries (selectable)
- AI-optimized provider settings with justifications
- Recommended provider
### Step 2: Intent Confirmation
**User Action**: Reviews and optionally edits intent, then confirms
**What Happens**:
- User can edit:
  - Primary question
  - Purpose
  - Expected deliverables
  - Depth level
  - Content output type
- User selects which queries to execute
- User can override AI-optimized settings in Advanced Options
### Step 3: Research Execution
**User Action**: Clicks "Research" button
**What Happens**:
1. Frontend calls `/api/research/intent/research`
2. Backend executes selected queries via Exa/Tavily/Google
3. `IntentAwareAnalyzer` analyzes raw results based on intent
4. Extracts specific deliverables:
   - Statistics with citations
   - Expert quotes
   - Case studies
   - Trends
   - Comparisons
   - Best practices
   - Step-by-step guides
   - Pros/cons
   - Definitions
   - Examples
   - Predictions
### Step 4: Results Display
**User Sees**: Tabbed results organized by deliverable type:
- **Summary**: AI-generated overview
- **Deliverables**: Extracted statistics, quotes, case studies, etc.
- **Sources**: Citations with credibility scores
- **Analysis**: Deep insights based on intent
---
## 🏗️ Architecture Components
### Backend Components
#### 1. UnifiedResearchAnalyzer
**Location**: `backend/services/research/intent/unified_research_analyzer.py`
**Purpose**: Single AI call for intent + queries + params
**Key Method**:
```python
async def analyze(
    user_input: str,
    keywords: Optional[List[str]] = None,
    research_persona: Optional[ResearchPersona] = None,
    competitor_data: Optional[List[Dict]] = None,
    industry: Optional[str] = None,
    target_audience: Optional[str] = None,
    user_id: Optional[str] = None,
) -> Dict[str, Any]
```
**Returns**:
- `intent`: ResearchIntent object
- `queries`: List[ResearchQuery] (4-8 queries)
- `exa_config`: Dict with settings + justifications
- `tavily_config`: Dict with settings + justifications
- `recommended_provider`: str ("exa" | "tavily" | "google")
- `provider_justification`: str
**Benefits**:
- 50-67% fewer LLM calls (from 2-3 calls down to 1)
- Coherent reasoning across intent, queries, and params
- User-friendly justifications for all settings
#### 2. IntentAwareAnalyzer
**Location**: `backend/services/research/intent/intent_aware_analyzer.py`
**Purpose**: Analyzes raw results based on user intent
**Key Method**:
```python
async def analyze(
    raw_results: Dict[str, Any],
    intent: ResearchIntent,
    research_persona: Optional[ResearchPersona] = None,
    user_id: Optional[str] = None,
) -> IntentDrivenResearchResult
```
**Returns**: `IntentDrivenResearchResult` with:
- `primary_answer`: str
- `secondary_answers`: Dict[str, str]
- `statistics`: List[StatisticWithCitation]
- `expert_quotes`: List[ExpertQuote]
- `case_studies`: List[CaseStudySummary]
- `trends`: List[TrendAnalysis]
- `comparisons`: List[ComparisonTable]
- `best_practices`: List[str]
- `step_by_step`: List[str]
- `pros_cons`: ProsCons
- `definitions`: Dict[str, str]
- `examples`: List[str]
- `predictions`: List[str]
- `executive_summary`: str
- `key_takeaways`: List[str]
- `suggested_outline`: List[str]
- `sources`: List[SourceWithRelevance]
- `confidence`: float
- `gaps_identified`: List[str]
- `follow_up_queries`: List[str]
#### 3. Research Engine
**Location**: `backend/services/research/core/research_engine.py`
**Purpose**: Orchestrates provider calls (Exa → Tavily → Google)
**Provider Priority**:
1. **Exa** (Primary) - Semantic understanding, academic papers, competitor research
2. **Tavily** (Secondary) - Real-time news, trending topics, quick facts
3. **Google** (Fallback) - Basic factual queries via Gemini grounding
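
The priority order above can be sketched as a simple fallback loop. This is a minimal illustration under stated assumptions, not the actual `ResearchEngine` logic: `search_with_fallback` and the stub providers are hypothetical, and whether the real engine falls through on empty results (rather than only on errors) is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A provider is modeled as a plain function from query to result list (assumption).
ProviderFn = Callable[[str], List[Dict[str, Any]]]


def search_with_fallback(
    query: str,
    providers: List[Tuple[str, ProviderFn]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Try each provider in priority order; return the first non-empty result."""
    for name, search in providers:
        try:
            results = search(query)
        except Exception:
            # Provider failed (network error, quota, missing key): try the next one.
            continue
        if results:
            return name, results
    return None, []


# Stub providers standing in for Exa, Tavily, and Google.
def failing_exa(query: str) -> List[Dict[str, Any]]:
    raise RuntimeError("exa unavailable")


def fake_tavily(query: str) -> List[Dict[str, Any]]:
    return [{"title": "stub result", "query": query}]


def fake_google(query: str) -> List[Dict[str, Any]]:
    return [{"title": "google stub", "query": query}]


provider, results = search_with_fallback(
    "AI marketing tools",
    [("exa", failing_exa), ("tavily", fake_tavily), ("google", fake_google)],
)
```

Here the Exa stub raises, so the call is served by the Tavily stub and the Google stub is never reached.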
### Frontend Components
#### 1. ResearchWizard
**Location**: `frontend/src/components/Research/ResearchWizard.tsx`
**Purpose**: Main wizard orchestrator (3 steps)
**Steps**:
1. `ResearchInput` - Input + Intent & Options button
2. `StepProgress` - Progress/polling
3. `StepResults` - Results display
#### 2. ResearchInput
**Location**: `frontend/src/components/Research/steps/ResearchInput.tsx`
**Features**:
- Keyword/topic input
- "Intent & Options" button (enabled after 2+ words)
- Industry and target audience selection
- Advanced options toggle
#### 3. IntentConfirmationPanel
**Location**: `frontend/src/components/Research/steps/components/IntentConfirmationPanel.tsx`
**Purpose**: Shows inferred intent and allows editing
**Features**:
- Displays inferred intent (editable)
- Shows suggested queries (selectable)
- Displays AI-optimized provider settings with justifications
- Advanced options for manual override
- "Research" button to execute
#### 4. IntentResultsDisplay
**Location**: `frontend/src/components/Research/steps/components/IntentResultsDisplay.tsx`
**Purpose**: Tabbed results display
**Tabs**:
- **Summary**: AI-generated overview
- **Deliverables**: Extracted statistics, quotes, case studies, etc.
- **Sources**: Citations with credibility scores
- **Analysis**: Deep insights based on intent
#### 5. AdvancedOptionsSection
**Location**: `frontend/src/components/Research/steps/components/AdvancedOptionsSection.tsx`
**Purpose**: Shows AI-optimized Exa/Tavily settings with justifications
**Features**:
- Exa options (type, category, domains, date filters, etc.)
- Tavily options (topic, search depth, time range, etc.)
- Each setting shows AI justification in tooltip
- User can override any setting
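
One way to think about the override behavior is a merge in which user-set values win over AI-optimized ones. The sketch below assumes a flat settings dictionary; `apply_overrides` is a hypothetical helper, and the `"basic"`/`"advanced"` depth values are assumptions used only for illustration.

```python
from typing import Any, Dict


def apply_overrides(
    optimized: Dict[str, Any],
    overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a new config where user-set values replace AI-optimized ones."""
    merged = dict(optimized)  # copy so the AI suggestion stays available for display
    for key, value in overrides.items():
        if value is not None:  # None is treated as "keep the AI suggestion"
            merged[key] = value
    return merged


optimized = {
    "exa_type": "neural",
    "exa_category": "company",
    "tavily_search_depth": "basic",
}
final = apply_overrides(
    optimized,
    {"exa_category": None, "tavily_search_depth": "advanced"},
)
```

Keeping the original `optimized` dictionary untouched means the UI can still show the AI's suggestion and its justification next to the user's choice.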
### Frontend Hooks
#### 1. useIntentResearch
**Location**: `frontend/src/components/Research/hooks/useIntentResearch.ts`
**Purpose**: Manages intent-driven research flow
**Key Methods**:
- `analyzeIntent(userInput: string)` - Analyzes user input
- `confirmIntent(intent: ResearchIntent)` - Confirms/modifies intent
- `executeResearch(selectedQueries?: ResearchQuery[])` - Executes research
- `reset()` - Resets state
**State**:
- `userInput`: string
- `intent`: ResearchIntent | null
- `suggestedQueries`: ResearchQuery[]
- `selectedQueries`: ResearchQuery[]
- `isAnalyzing`: boolean
- `isResearching`: boolean
- `result`: IntentDrivenResearchResponse | null
#### 2. useResearchExecution
**Location**: `frontend/src/components/Research/hooks/useResearchExecution.ts`
**Purpose**: Handles research execution and polling
**Key Methods**:
- `executeIntentResearch(state, queries)` - Executes intent-driven research
- `executeTraditionalResearch(state)` - Executes traditional research (fallback)
- `pollStatus(taskId)` - Polls async research status
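
The polling behavior can be sketched as a loop with an interval and a timeout. The actual hook is TypeScript; the Python sketch below only illustrates the pattern, and the `status` field, its `"completed"`/`"failed"` values, and the `poll_until_done` helper are all assumptions rather than the real API contract.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> Dict[str, Any]:
    """Call fetch_status until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(task_id)
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(interval_s)


# Stub status source: pending twice, then completed.
_responses = iter(
    [
        {"status": "pending"},
        {"status": "pending"},
        {"status": "completed", "result": "ok"},
    ]
)
final_status = poll_until_done(
    lambda task_id: next(_responses), "task-1", interval_s=0.0
)
```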
---
## 📡 API Endpoints
### 1. POST `/api/research/intent/analyze`
**Purpose**: Analyze user input to understand research intent
**Request**:
```typescript
{
  user_input: string;
  keywords?: string[];
  use_persona?: boolean; // Default: true
  use_competitor_data?: boolean; // Default: true
}
```
**Response**:
```typescript
{
  success: boolean;
  intent: ResearchIntent;
  analysis_summary: string;
  suggested_queries: ResearchQuery[];
  suggested_keywords: string[];
  suggested_angles: string[];
  confidence_reason?: string;
  great_example?: string;
  optimized_config: {
    provider: string;
    provider_justification: string;
    exa_type: string;
    exa_type_justification: string;
    exa_category?: string;
    exa_category_justification?: string;
    // ... more Exa settings with justifications
    tavily_topic: string;
    tavily_topic_justification: string;
    tavily_search_depth: string;
    tavily_search_depth_justification: string;
    // ... more Tavily settings with justifications
  };
  recommended_provider: string;
  error_message?: string;
}
```
**What It Does**:
1. Fetches research persona (if `use_persona: true`)
2. Fetches competitor data (if `use_competitor_data: true`)
3. Calls `UnifiedResearchAnalyzer.analyze()`
4. Returns intent, queries, and optimized config with justifications
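
A client-side view of the request can be sketched as a payload builder. The field names and defaults come from the request schema above; `build_analyze_request` is a hypothetical helper, and sending the payload over HTTP is left out because the host and authentication scheme are not specified in this guide.

```python
from typing import Any, Dict, List, Optional


def build_analyze_request(
    user_input: str,
    keywords: Optional[List[str]] = None,
    use_persona: bool = True,
    use_competitor_data: bool = True,
) -> Dict[str, Any]:
    """Build the JSON body for POST /api/research/intent/analyze."""
    text = user_input.strip()
    if not text:
        raise ValueError("user_input must not be empty")
    payload: Dict[str, Any] = {
        "user_input": text,
        "use_persona": use_persona,
        "use_competitor_data": use_competitor_data,
    }
    if keywords:
        # keywords is optional in the schema, so omit it when empty.
        payload["keywords"] = list(keywords)
    return payload


payload = build_analyze_request("  AI marketing tools ", keywords=["AI", "marketing"])
```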
### 2. POST `/api/research/intent/research`
**Purpose**: Execute research based on confirmed intent
**Request**:
```typescript
{
  user_input: string;
  confirmed_intent?: ResearchIntent; // If not provided, infers from user_input
  selected_queries?: ResearchQuery[]; // If not provided, generates from intent
  max_sources?: number; // Default: 10
  include_domains?: string[];
  exclude_domains?: string[];
  skip_inference?: boolean; // Skip intent inference if intent provided
}
```
**Response**:
```typescript
{
  success: boolean;
  primary_answer: string;
  secondary_answers: Record<string, string>;
  statistics: StatisticWithCitation[];
  expert_quotes: ExpertQuote[];
  case_studies: CaseStudySummary[];
  trends: TrendAnalysis[];
  comparisons: ComparisonTable[];
  best_practices: string[];
  step_by_step: string[];
  pros_cons?: ProsCons;
  definitions: Record<string, string>;
  examples: string[];
  predictions: string[];
  executive_summary: string;
  key_takeaways: string[];
  suggested_outline: string[];
  sources: SourceWithRelevance[];
  confidence: number;
  gaps_identified: string[];
  follow_up_queries: string[];
  intent?: ResearchIntent;
  error_message?: string;
}
```
**What It Does**:
1. Uses confirmed intent (or infers if not provided)
2. Uses selected queries (or generates if not provided)
3. Executes research via `ResearchEngine`
4. Analyzes results via `IntentAwareAnalyzer`
5. Returns structured deliverables
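
The first two steps (fall back to inference or generation when the request omits a field) can be sketched as follows. This is an assumption-laden illustration, not the endpoint's code: `plan_research` and both stub callables are hypothetical, intents and queries are modeled as plain dictionaries, and the `skip_inference` flag is deliberately not modeled because its exact semantics are not described here.

```python
from typing import Any, Callable, Dict, List, Tuple

Intent = Dict[str, Any]
Query = Dict[str, Any]


def plan_research(
    request: Dict[str, Any],
    infer_intent: Callable[[str], Intent],
    generate_queries: Callable[[Intent], List[Query]],
) -> Tuple[Intent, List[Query]]:
    """Resolve the intent and queries to run for a research request."""
    intent = request.get("confirmed_intent")
    if intent is None:
        # No confirmed intent supplied: infer one from the raw input.
        intent = infer_intent(request["user_input"])
    queries = request.get("selected_queries")
    if not queries:
        # No queries selected: generate them from the resolved intent.
        queries = generate_queries(intent)
    return intent, queries


# Stubs standing in for the AI-backed inference and query generation.
def stub_infer(user_input: str) -> Intent:
    return {"primary_question": user_input, "purpose": "learn"}


def stub_generate(intent: Intent) -> List[Query]:
    return [{"query": intent["primary_question"], "priority": 3}]


intent, queries = plan_research(
    {"user_input": "AI marketing tools"}, stub_infer, stub_generate
)
```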
---
## 🎨 User Experience Flow
### Example: User wants to research "AI marketing tools"
#### Step 1: User Input
```
User enters: "AI marketing tools"
Clicks: "Intent & Options" button
```
#### Step 2: Intent Analysis
```
AI infers:
- Primary Question: "What are the best AI marketing tools available?"
- Purpose: "make_decision"
- Expected Deliverables: ["key_statistics", "case_studies", "comparisons", "best_practices"]
- Depth: "detailed"
- Content Output: "blog"
AI generates queries:
1. "best AI marketing tools 2024 comparison" (priority: 5)
2. "AI marketing tools statistics adoption rates" (priority: 4)
3. "AI marketing tools case studies ROI" (priority: 4)
4. "AI marketing automation platforms features" (priority: 3)
AI optimizes settings:
- Provider: Exa (semantic understanding needed)
- Exa Type: "neural" (for semantic matching)
- Exa Category: "company" (tool providers)
- Justification: "Neural search best for finding similar tools and comparisons"
```
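
The priorities in the example suggest a simple way to pick which queries to run when the budget is limited. The sketch below assumes that a higher number means higher priority and models queries as plain dictionaries; `top_queries` is a hypothetical helper, not part of the codebase.

```python
from typing import Any, Dict, List

# Queries from the example above, deliberately listed out of priority order.
example_queries: List[Dict[str, Any]] = [
    {"query": "AI marketing automation platforms features", "priority": 3},
    {"query": "best AI marketing tools 2024 comparison", "priority": 5},
    {"query": "AI marketing tools statistics adoption rates", "priority": 4},
    {"query": "AI marketing tools case studies ROI", "priority": 4},
]


def top_queries(queries: List[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Return up to `limit` queries, highest priority first.

    sorted() is stable, so queries with equal priority keep their input order.
    """
    return sorted(queries, key=lambda q: q["priority"], reverse=True)[:limit]


selected = top_queries(example_queries, limit=2)
```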
#### Step 3: User Confirmation
```
User sees:
- Inferred intent (can edit)
- 4 suggested queries (can select/deselect)
- AI-optimized settings with justifications (can override)
User confirms and clicks "Research"
```
#### Step 4: Research Execution
```
Backend:
1. Executes 4 queries via Exa
2. Gets raw results (sources, content)
3. IntentAwareAnalyzer extracts:
- Statistics: "78% of marketers use AI tools"
- Case studies: "Company X increased ROI by 40%"
- Comparisons: Tool comparison table
- Best practices: "5 best practices for AI marketing"
```
#### Step 5: Results Display
```
User sees tabbed results:
- Summary: Overview of AI marketing tools landscape
- Deliverables: Statistics, quotes, case studies, comparisons
- Sources: Citations with credibility scores
- Analysis: Deep insights and recommendations
```
---
## 🔑 Key Patterns
### Pattern 1: Always Use UnifiedResearchAnalyzer
**✅ Correct**:
```python
from services.research.intent.unified_research_analyzer import UnifiedResearchAnalyzer

analyzer = UnifiedResearchAnalyzer()
result = await analyzer.analyze(
    user_input=user_input,
    keywords=keywords,
    research_persona=research_persona,
    user_id=user_id,
)
```
**❌ Incorrect** (Legacy - Don't Use):
```python
# Don't use separate intent inference + query generation
intent_service = ResearchIntentInference()
query_generator = IntentQueryGenerator()
# ... multiple LLM calls
```
### Pattern 2: Always Pass user_id
**✅ Correct**:
```python
result = llm_text_gen(
    prompt=prompt,
    json_struct=schema,
    user_id=user_id,  # Required for subscription checks
)
```
**❌ Incorrect**:
```python
result = llm_text_gen(prompt=prompt, json_struct=schema) # Missing user_id
```
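
One way to catch a missing `user_id` early is a guard around the LLM call. The decorator below is a hypothetical sketch, not part of the codebase, and `fake_llm_text_gen` is a stand-in whose signature only mirrors the three keyword arguments shown above.

```python
import functools
from typing import Any, Callable


def require_user_id(llm_call: Callable[..., Any]) -> Callable[..., Any]:
    """Reject calls that do not pass a non-empty user_id keyword argument."""

    @functools.wraps(llm_call)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if not kwargs.get("user_id"):
            raise ValueError("user_id is required for subscription checks")
        return llm_call(*args, **kwargs)

    return wrapper


@require_user_id
def fake_llm_text_gen(prompt: str, json_struct: Any = None, user_id: Any = None) -> str:
    # Stand-in for the real LLM call; it just echoes its inputs.
    return f"[{user_id}] {prompt}"


reply = fake_llm_text_gen(prompt="Summarize the results", user_id="user-123")
```

Note that the guard only inspects keyword arguments, so it also nudges callers toward passing `user_id=` explicitly.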
### Pattern 3: Intent-Aware Result Analysis
**✅ Correct**:
```python
from services.research.intent.intent_aware_analyzer import IntentAwareAnalyzer

analyzer = IntentAwareAnalyzer()
result = await analyzer.analyze(
    raw_results=raw_results,
    intent=research_intent,
    research_persona=research_persona,
    user_id=user_id,
)
```
**❌ Incorrect** (Generic Analysis):
```python
# Don't do generic analysis - always use intent
summary = analyze_generic(raw_results) # Wrong approach
```
---
## 🎯 Benefits
### 1. **50-67% Fewer LLM Calls**
- Old: 2-3 separate calls (intent + queries + params)
- New: 1 unified call
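
The size of the saving follows directly from the call counts above. A quick check, assuming each call costs roughly the same (`call_reduction` is an illustrative helper):

```python
def call_reduction(old_calls: int, new_calls: int = 1) -> float:
    """Percentage of LLM calls saved when going from old_calls to new_calls."""
    return (old_calls - new_calls) / old_calls * 100


from_two = call_reduction(2)            # two separate calls replaced by one
from_three = round(call_reduction(3), 1)  # three separate calls replaced by one
```

Going from two calls to one saves 50%; going from three to one saves about 67%.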
### 2. **Better Results**
- Intent-aware analysis extracts exactly what users need
- Structured deliverables instead of generic summaries
### 3. **User-Friendly**
- AI justifications explain why settings were chosen
- Users can understand and override AI decisions
### 4. **Coherent Reasoning**
- Single AI call ensures intent, queries, and params are aligned
- No inconsistencies between intent and search strategy
---
## 🚀 Integration Examples
### Frontend: Using useIntentResearch Hook
```typescript
import { useIntentResearch } from '../hooks/useIntentResearch';

const MyComponent = () => {
  const {
    state,
    analyzeIntent,
    confirmIntent,
    executeResearch,
    isAnalyzing,
    isResearching,
    result,
  } = useIntentResearch({
    usePersona: true,
    useCompetitorData: true,
    maxSources: 10,
  });

  const handleAnalyze = async () => {
    await analyzeIntent("AI marketing tools");
  };

  const handleResearch = async () => {
    await executeResearch(state.selectedQueries);
  };

  return (
    <div>
      <button onClick={handleAnalyze} disabled={isAnalyzing}>
        {isAnalyzing ? 'Analyzing...' : 'Intent & Options'}
      </button>
      {state.intent && (
        <IntentConfirmationPanel
          intentAnalysis={state.intent}
          onConfirm={confirmIntent}
          onExecute={handleResearch}
        />
      )}
      {result && <IntentResultsDisplay result={result} />}
    </div>
  );
};
```
### Backend: Using UnifiedResearchAnalyzer
```python
from services.research.intent.unified_research_analyzer import UnifiedResearchAnalyzer


async def analyze_user_request(user_input: str, user_id: str):
    analyzer = UnifiedResearchAnalyzer()
    # extract_keywords and get_research_persona are helpers assumed to be
    # defined elsewhere in the calling module.
    result = await analyzer.analyze(
        user_input=user_input,
        keywords=extract_keywords(user_input),
        research_persona=get_research_persona(user_id),
        user_id=user_id,
    )
    return {
        "intent": result["intent"],
        "queries": result["queries"],
        "exa_config": result["exa_config"],
        "tavily_config": result["tavily_config"],
        "recommended_provider": result["recommended_provider"],
    }
```
---
## 📚 Related Documentation
- **Architecture Rules**: `.cursor/rules/researcher-architecture.mdc` (Authoritative source)
- **API Reference**: `INTENT_RESEARCH_API_REFERENCE.md`
- **Architecture Overview**: `CURRENT_ARCHITECTURE_OVERVIEW.md`
---
## ✅ Best Practices
1. **Always use UnifiedResearchAnalyzer** for new intent-driven research
2. **Always pass user_id** to all LLM calls for subscription checks
3. **Always use IntentAwareAnalyzer** for result analysis
4. **Provide justifications** for all AI-driven settings
5. **Allow user overrides** in Advanced Options
6. **Check provider availability** before suggesting/using providers
---
**Status**: Current Architecture - Use this as reference for intent-driven research implementation.