kunthawat/ALwrity

Fork 0

Files

ajaysi 8193cdba67 AI Analysis and Content Strategy fixes. Enhanced Strategy Routes refactoring.

2026-01-10 19:32:50 +05:30

14 KiB

Raw Permalink Blame History

Exa & Tavily Options Inference Guide

Date: 2025-01-29
Status: Current Implementation Review

Overview

When a user clicks "Intent & Options" button, the system uses AI to infer optimal Exa and Tavily API settings based on the user's research intent. This document explains how these options are generated.

Flow: Intent & Options Button Click

User clicks "Intent & Options"
  ↓
Frontend: intentResearchApi.analyzeIntent()
  ↓
Backend: /api/research/intent/analyze
  ↓
UnifiedResearchAnalyzer.analyze()
  ↓
Single LLM Call with unified_prompt_builder.py
  ↓
LLM Returns:
  - ResearchIntent (with purpose, depth, focus_areas, also_answering, etc.)
  - ResearchQueries (4-8 diverse queries)
  - exa_config (optimized Exa settings with justifications)
  - tavily_config (optimized Tavily settings with justifications)
  - recommended_provider
  ↓
Backend maps to optimized_config
  ↓
Frontend receives AnalyzeIntentResponse with optimized_config
  ↓
Frontend applies optimized_config to ResearchConfig
  ↓
User sees optimized Exa/Tavily options in AdvancedProviderOptionsSection

How Options Are Inferred

1. Time Sensitivity Rules

Based on: intent.time_sensitivity field

Time Sensitivity	Exa Settings	Tavily Settings
real_time	`startPublishedDate = current year`, `type = "auto" or "fast"`	`time_range = "day" or "week"`, `topic = "news"`
recent	`startPublishedDate = current year or last 6 months`	`time_range = "month" or "week"`
historical	No date filters, `type = "deep" or "neural"`	`time_range = "year" or null`, `topic = "general"`
evergreen	No date filters, `type = "deep"`	`time_range = null`, `topic = "general"`

Example:

User input: "Latest AI trends in 2025"
Time sensitivity inferred: real_time
Exa: startPublishedDate = "2025-01-01", type = "fast"
Tavily: time_range = "week", topic = "news"

2. Content Type Based on Focus Areas

Based on: intent.focus_areas field

Focus Area Keywords	Exa Category	Exa Type	Tavily Topic
"academic", "research", "studies"	`"research paper"`	`"deep" or "neural"`	`"general"`
	`includeDomains = ["arxiv.org", "nature.com", "pubmed.ncbi.nlm.nih.gov"]`
"companies", "competitors", "business"	`"company"`	`"auto" or "deep"`	`"general"`
"news", "trends", "current events"	`"news"` (if using Exa)	`"auto"`	`"news"`
			`search_depth = "advanced"`
"social", "twitter", "social media"	`"tweet"`	`"auto"`	`"general"`
"github", "code", "technical"	`"github"`	`"auto" or "deep"`	`"general"`

Example:

User input: "AI research papers on transformer architectures"
Focus areas inferred: ["academic", "research"]
Exa: category = "research paper", type = "deep", includeDomains = ["arxiv.org", "nature.com"]
Tavily: topic = "general"

3. Depth-Based Settings

Based on: intent.depth field (overview, detailed, expert)

Depth Level	Exa Settings	Tavily Settings
expert	`type = "deep"`, `context = true`, `contextMaxCharacters = 15000+`, `numResults = 20-50`	`search_depth = "advanced"`, `chunks_per_source = 3`, `max_results = 15-20`
detailed	`type = "auto" or "deep"`, `context = true`, `contextMaxCharacters = 10000+`, `numResults = 10-20`	`search_depth = "advanced" or "basic"`, `chunks_per_source = 3`, `max_results = 10-15`
overview	`type = "auto" or "fast"`, `numResults = 5-10`	`search_depth = "basic" or "fast"`, `max_results = 5-10`

Example:

User input: "Comprehensive analysis of quantum computing"
Depth inferred: expert
Exa: type = "deep", context = true, contextMaxCharacters = 15000, numResults = 30
Tavily: search_depth = "advanced", chunks_per_source = 3, max_results = 15

4. Query-Specific Settings

Based on: Primary query characteristics

Query Type	Exa Settings	Tavily Settings
Comprehensive (addresses multiple secondary questions/focus areas)	`type = "deep"`, `context = true`, `contextMaxCharacters = 15000+`	`search_depth = "advanced"`, `chunks_per_source = 3`
Simple factual	`type = "fast"`, `numResults = 5-10`	`search_depth = "ultra-fast"`, `max_results = 5`
Time-sensitive	Apply time filters based on urgency	Apply time_range based on urgency
Content-specific	Match category to content type	Match topic to content type

Example:

Primary query: "What are the best practices for React performance optimization?"
Query type: Comprehensive (needs detailed analysis)
Exa: type = "deep", context = true, contextMaxCharacters = 12000
Tavily: search_depth = "advanced", chunks_per_source = 3

5. Also Answering Topics Considerations

Based on: intent.also_answering field

Rules:

If also_answering topics need different time ranges:
- Use broader time_range in Tavily (e.g., "year" instead of "month")
- Don't apply strict date filters in Exa
If also_answering topics need different sources:
- Consider including additional domains in includeDomains
- Use more comprehensive search (type = "deep" in Exa)

Example:

Primary: "Latest AI trends"
Also answering: ["Historical AI development", "Future predictions"]
Exa: No strict date filters, type = "deep" for comprehensive coverage
Tavily: time_range = "year" to cover historical and recent

6. Provider Selection Logic

Based on: Combined analysis of all intent fields

Use EXA when:

Primary query needs semantic understanding
Focus areas include "academic", "research", "companies"
Depth = "expert" or "detailed"
Need comprehensive context (context = true)
Query targets specific content types (research papers, companies, GitHub)

Use TAVILY when:

Time sensitivity = "real_time" or "recent"
Focus areas include "news", "trends", "current events"
Need quick AI-generated answers
Primary query is about recent developments
Query needs real-time information

Example:

User input: "Latest news about AI regulation"
Provider selected: Tavily (real-time news focus)
Tavily: topic = "news", search_depth = "advanced", time_range = "week"

Exa Config Options Generated

The AI generates these Exa options with justifications:

Core Options

type: "auto" | "fast" | "deep" | "neural" | "keyword"
- Justification references: query complexity, depth, time sensitivity
category: "company" | "research paper" | "news" | "linkedin profile" | "github" | "tweet" | "personal site" | "pdf" | "financial report"
- Justification references: focus_areas, content type needed
numResults: 1-100
- Justification references: depth, query complexity, secondary questions count
includeDomains: Array of domain strings
- Justification references: focus_areas, content type requirements
startPublishedDate: Date string (YYYY-MM-DD)
- Justification references: time_sensitivity, query time requirements

Content Options

highlights: true | false
- Justification: Whether snippets are needed for quick scanning
context: true | false (required for type = "deep")
- Justification: Whether full context needed for RAG/AI processing
contextMaxCharacters: Number (if context = true)
- Justification: Depth requirements, query complexity

Advanced Options (if applicable)

additionalQueries: Array of query strings (only for type = "deep")
- Justification: Query variations needed for comprehensive coverage
livecrawl: "never" | "fallback" | "preferred" | "always"
- Justification: Freshness requirements based on time_sensitivity

Tavily Config Options Generated

The AI generates these Tavily options with justifications:

Core Options

topic: "general" | "news" | "finance"
- Justification references: focus_areas, content type
search_depth: "basic" | "advanced" | "fast" | "ultra-fast"
- Justification references: depth, query complexity, speed requirements
include_answer: true | false | "basic" | "advanced"
- Justification: Whether AI-generated answer is needed
time_range: "day" | "week" | "month" | "year" | null
- Justification references: time_sensitivity, query time requirements
max_results: 0-20
- Justification references: depth, query complexity

Advanced Options

chunks_per_source: 1-3 (only for search_depth = "advanced")
- Justification: Depth requirements, comprehensive coverage needs
include_raw_content: true | false | "markdown" | "text"
- Justification: Whether full content needed for analysis
country: Country code (only for topic = "general")
- Justification: Geographic relevance based on target_audience

Example: Complete Inference Flow

User Input

Keywords: "AI marketing tools for small businesses"
Purpose: create_content (user-selected)
Content Output: blog_post (user-selected)
Depth: detailed (user-selected)

AI Inference

Intent:
  - primary_question: "What are the best AI marketing tools for small businesses?"
  - secondary_questions: ["What are the pricing models?", "What features do they offer?"]
  - focus_areas: ["tools", "small business", "marketing automation"]
  - also_answering: ["How to choose the right tool", "Implementation best practices"]
  - time_sensitivity: "recent"
  - depth: "detailed"

Recommended Provider: EXA (needs comprehensive analysis, not just news)

Exa Config:
  - type: "auto"
    justification: "Balanced speed and quality for comprehensive tool research"
  - category: null (general search)
    justification: "Tools can be found across multiple content types"
  - numResults: 15
    justification: "Detailed depth requires more sources to cover tools, pricing, and features"
  - includeDomains: []
    justification: "No specific domain restrictions needed"
  - startPublishedDate: "2024-01-01"
    justification: "Recent time sensitivity requires current year data"
  - highlights: true
    justification: "Snippets help quickly identify relevant tools"
  - context: true
    justification: "Detailed depth requires full context for comprehensive analysis"
  - contextMaxCharacters: 10000
    justification: "Detailed depth needs substantial context per source"

Tavily Config:
  - topic: "general"
    justification: "General topic covers tools and business content"
  - search_depth: "advanced"
    justification: "Detailed depth requires comprehensive search"
  - include_answer: true
    justification: "AI-generated answers provide quick insights"
  - time_range: "year"
    justification: "Recent time sensitivity with also_answering topics needing broader coverage"
  - max_results: 12
    justification: "Detailed depth requires multiple sources"
  - chunks_per_source: 3
    justification: "Detailed depth needs comprehensive content per source"

Key Files

Backend

backend/services/research/intent/unified_prompt_builder.py
- Contains all optimization rules (lines 155-275)
- Defines how intent fields map to Exa/Tavily settings
backend/services/research/intent/unified_schema_builder.py
- Defines JSON schema for exa_config and tavily_config (lines 67-124)
- Specifies all available options and their types
backend/services/research/intent/unified_result_parser.py
- Extracts exa_config and tavily_config from LLM response (lines 205-206)
backend/api/research/handlers/intent.py
- Maps exa_config/tavily_config to optimized_config (lines 124-155)
- Returns optimized_config in AnalyzeIntentResponse

Frontend

frontend/src/components/Research/types/intent.types.ts
- Defines OptimizedConfig interface (lines 224-280)
- Includes all Exa/Tavily options with justifications
frontend/src/components/Research/steps/components/IntentConfirmationPanel/AdvancedProviderOptionsSection.tsx
- Displays optimized Exa/Tavily options
- Shows AI justifications for each option
frontend/src/components/Research/steps/ResearchInput.tsx
- Applies optimized_config to ResearchConfig (lines 464-512)

Current Implementation Status

✅ Fully Implemented

Time sensitivity → Exa/Tavily date filters
Focus areas → Exa category / Tavily topic
Depth → Exa type / Tavily search_depth
Query characteristics → Provider selection
Also answering → Broader time ranges

⚠️ Partially Implemented

Some Exa options are inferred but not all are exposed in UI
Some Tavily options are inferred but not all are exposed in UI
Advanced options (livecrawl, additionalQueries) are in schema but rarely used

📋 Options Available in Schema (May Not All Be Used)

Exa Options:

✅ type, category, numResults, includeDomains, startPublishedDate, highlights, context
⚠️ excludeDomains, contextMaxCharacters, additionalQueries, livecrawl

Tavily Options:

✅ topic, search_depth, include_answer, time_range, max_results, chunks_per_source
⚠️ start_date, end_date, include_raw_content, country, include_images, include_image_descriptions, include_favicon, auto_parameters

References

docs/ALwrity Researcher/EXA_INTEGRATION_ENHANCEMENTS.md - Exa search types and latency
docs/ALwrity Researcher/EXA_API_OPTIONS_AUDIT.md - Complete Exa API options comparison
docs/ALwrity Researcher/EXA_TAVILY_OPTIONS_DISPLAY_REVIEW.md - UI display review
docs/ALwrity Researcher/INTENT_DRIVEN_RESEARCH_IMPLEMENTATION_STATUS.md - Implementation status

Status: Current implementation infers Exa and Tavily options based on comprehensive intent analysis with detailed justifications.

14 KiB Raw Permalink Blame History

Exa & Tavily Options Inference Guide

Overview

Flow: Intent & Options Button Click

How Options Are Inferred

1. Time Sensitivity Rules

2. Content Type Based on Focus Areas

3. Depth-Based Settings

4. Query-Specific Settings

5. Also Answering Topics Considerations

6. Provider Selection Logic

Exa Config Options Generated

Core Options

Content Options

Advanced Options (if applicable)

Tavily Config Options Generated

Core Options

Advanced Options

Example: Complete Inference Flow

User Input

AI Inference

Key Files

Backend

Frontend

Current Implementation Status

✅ Fully Implemented

⚠️ Partially Implemented

📋 Options Available in Schema (May Not All Be Used)

References

14 KiB

Raw Permalink Blame History