ALwrity/docs/ALwrity Researcher/RESEARCH_PERSONA_DATA_SOURCES.md
ajaysi b134e9dc7e Added video studio router and endpoints. Added research router and endpoints. Added youtube router and endpoints. Added onboarding utils router and endpoints. Added onboarding utils service. Added onboarding utils models. Added onboarding utils routes. Added onboarding utils utils.
2026-01-01 17:56:25 +05:30

Research Persona Data Sources & Generated Fields

Overview

The Research Persona is an AI-generated profile that provides hyper-personalized research defaults, suggestions, and configurations based on a user's onboarding data. This document details what data is used to generate the persona and what fields are produced.


Data Sources Used for Generation

1. Website Analysis (website_analysis)

Source: Onboarding Step 2 - Website Analysis
Location: WebsiteAnalysis table in database
Key Fields Used:

  • website_url: User's website URL
  • writing_style: Tone, voice, complexity, engagement level
  • content_characteristics: Sentence structure, vocabulary, paragraph organization
  • target_audience: Demographics, expertise level, industry focus
  • content_type: Primary type, secondary types, purpose
  • recommended_settings: Writing tone, target audience, content type
  • style_patterns: Writing patterns analysis
  • style_guidelines: Generated guidelines

Usage: Extracts industry focus, target audience, content preferences, and writing style patterns to inform research defaults.

2. Core Persona (core_persona)

Source: Onboarding Step 4 - Persona Generation
Location: PersonaData.core_persona JSON field
Key Fields Used:

  • industry: User's primary industry
  • target_audience: Detailed audience description
  • interests: User's content interests and focus areas
  • pain_points: Challenges and needs
  • content_goals: What the user wants to achieve with content

Usage: Primary source for industry, audience, and content strategy insights.

3. Research Preferences (research_preferences)

Source: Onboarding Step 3 - Research Preferences
Location: ResearchPreferences table
Key Fields Used:

  • research_depth: "standard", "comprehensive", "basic"
  • content_types: Array of content types (e.g., ["blog", "social", "video"])
  • auto_research: Whether to auto-enable research
  • factual_content: Preference for factual vs. opinion-based content
  • writing_style: Inherited from website analysis
  • content_characteristics: Inherited from website analysis
  • target_audience: Inherited from website analysis

Usage: Determines default research mode, provider preferences, and content type focus.

4. Business Information (business_info)

Source: Constructed from persona data and website analysis
Key Fields Used:

  • industry: Extracted from core_persona.industry or website_analysis.target_audience.industry_focus
  • target_audience: Extracted from core_persona.target_audience or website_analysis.target_audience.demographics

Usage: Fallback and inference source when core persona data is minimal.

5. Competitor Analysis (Future Enhancement)

Source: Onboarding Step 3 - Competitor Discovery
Location: CompetitorAnalysis table
Status: Currently not used in persona generation, but available for future enhancements

Potential Usage: Could inform industry context, competitive landscape insights, and domain suggestions.


Generated Research Persona Fields

1. Smart Defaults

| Field | Type | Description | Source Priority |
| --- | --- | --- | --- |
| default_industry | string | User's primary industry | 1. core_persona.industry; 2. business_info.industry; 3. website_analysis.target_audience.industry_focus; 4. Inferred from content_types |
| default_target_audience | string | Detailed audience description | 1. core_persona.target_audience; 2. website_analysis.target_audience; 3. business_info.target_audience; 4. Default: "Professionals and content consumers" |
| default_research_mode | string | "basic", "comprehensive", or "targeted" | Based on research_preferences.research_depth and content_type preferences |
| default_provider | string | "exa", "tavily", or "google" | Based on the user's typical research needs. Academic/research: "exa"; News/current events: "tavily"; General business: "exa"; Default: "exa" |
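
The source priority for default_industry reads as a first-non-empty fallback chain. The following is a minimal sketch of that idea, assuming the onboarding data is a plain dict keyed by source name; the helpers `first_non_empty` and `resolve_default_industry` are hypothetical names, not functions from the service, and the inference step (priority 4) is left as a caller-supplied callback.

```python
from typing import Any, Callable, Optional


def first_non_empty(*candidates: Any) -> Optional[Any]:
    """Return the first candidate that is not None, empty, or whitespace."""
    for value in candidates:
        if isinstance(value, str):
            value = value.strip()
        if value:
            return value
    return None


def resolve_default_industry(
    data: dict,
    infer: Callable[[dict], Optional[str]] = lambda d: None,
) -> Optional[str]:
    """Walk the documented priority order for default_industry."""
    core = data.get("core_persona") or {}
    business = data.get("business_info") or {}
    website = data.get("website_analysis") or {}
    audience = website.get("target_audience") or {}
    return first_non_empty(
        core.get("industry"),
        business.get("industry"),
        audience.get("industry_focus"),
        infer(data),  # priority 4: inferred from content_types (hypothetical hook)
    )
```

The same chain would apply to default_target_audience with its own ordering and the documented string default as the last candidate.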

2. Keyword Intelligence

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| suggested_keywords | string[] | 8-12 relevant keywords | Generated from the user's industry, core persona interests, content goals, and research preferences |
| keyword_expansion_patterns | Dict<string, string[]> | Mapping of keywords to expanded terms | 10-15 patterns focused on industry-specific terminology, such as {"AI": ["healthcare AI", "medical AI"], "tools": ["medical devices"]} |
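
How the application applies keyword_expansion_patterns to a query is not described here. As an illustration only, the sketch below assumes case-insensitive whole-word matching; `expand_keywords` is a hypothetical helper.

```python
from typing import Dict, List


def expand_keywords(query: str, patterns: Dict[str, List[str]]) -> List[str]:
    """Collect expanded terms for every pattern key that appears as a word in the query."""
    words = {w.strip(".,;:!?").lower() for w in query.split()}
    expanded: List[str] = []
    for key, terms in patterns.items():
        if key.lower() in words:
            for term in terms:
                if term not in expanded:  # keep order, skip duplicates
                    expanded.append(term)
    return expanded


patterns = {"AI": ["healthcare AI", "medical AI"], "tools": ["medical devices"]}
print(expand_keywords("Best AI tools for clinics", patterns))
# prints ['healthcare AI', 'medical AI', 'medical devices']
```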

3. Exa Provider Optimization

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| suggested_exa_domains | string[] | 4-6 authoritative domains | Industry-specific authoritative sources. Healthcare: ["pubmed.gov", "nejm.org"]; Finance: ["sec.gov", "bloomberg.com"]; Tech: ["github.com", "stackoverflow.com"] |
| suggested_exa_category | string? | Exa content category | Based on industry. Healthcare/Science: "research paper"; Finance: "financial report"; Tech/Business: "company" or "news"; Social/Marketing: "tweet" or "linkedin profile"; Default: null (all categories) |
| suggested_exa_search_type | string? | Exa search algorithm | Based on content needs. Academic/research: "neural"; Current news/trends: "fast"; General research: "auto"; Code/technical: "neural" |
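
One way the three Exa suggestions could be turned into search options is sketched below. The option keys (`include_domains`, `category`, `type`) and the helper `build_exa_options` are assumptions chosen for illustration; check them against the Exa client the backend actually uses.

```python
from typing import Any, Dict, List, Optional


def build_exa_options(
    domains: Optional[List[str]] = None,
    category: Optional[str] = None,
    search_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Turn persona suggestions into search options, dropping unset values."""
    options: Dict[str, Any] = {}
    if domains:
        options["include_domains"] = list(domains)
    if category is not None:  # null means "all categories", so send nothing
        options["category"] = category
    # Fall back to "auto", the documented choice for general research.
    options["type"] = search_type or "auto"
    return options
```

Omitting the category when it is null keeps the "all categories" default described in the table.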

4. Tavily Provider Optimization

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| suggested_tavily_topic | string? | "general", "news", or "finance" | Based on content type. Financial content: "finance"; News/current events: "news"; General research: "general" |
| suggested_tavily_search_depth | string? | "basic", "advanced", "fast", or "ultra-fast" | Based on research needs. Quick overview: "basic"; In-depth analysis: "advanced"; Breaking news: "fast" |
| suggested_tavily_include_answer | string? | "false", "basic", or "advanced" | Based on query type. Factual queries: "advanced"; Research summaries: "basic"; Custom content: "false" |
| suggested_tavily_time_range | string? | "day", "week", "month", "year", or null | Based on recency needs. Breaking news: "day"; Recent developments: "week"; Industry analysis: "month"; Historical: null |
| suggested_tavily_raw_content_format | string? | "false", "markdown", or "text" | Based on use case. Blog content: "markdown"; Text extraction: "text"; No raw content: "false" |
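
Several Tavily suggestions are stored as strings even when one of their values is the literal "false". A minimal sketch of converting them before a request is shown below; the option names on the right of the mapping and both helper names are assumptions, not the service's actual code.

```python
from typing import Any, Dict, Optional, Union


def parse_flag_or_mode(value: Optional[str]) -> Union[bool, str, None]:
    """Map the string "false" to boolean False; pass other modes through."""
    if value is None:
        return None
    return False if value.lower() == "false" else value


def build_tavily_options(persona: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Collect the suggested_tavily_* fields, skipping any that are null."""
    mapping = {
        "suggested_tavily_topic": "topic",
        "suggested_tavily_search_depth": "search_depth",
        "suggested_tavily_include_answer": "include_answer",
        "suggested_tavily_time_range": "time_range",
        "suggested_tavily_raw_content_format": "include_raw_content",
    }
    options: Dict[str, Any] = {}
    for field, option in mapping.items():
        value = parse_flag_or_mode(persona.get(field))
        if value is not None:
            options[option] = value
    return options
```

Skipping null values means a historical query (time range null) simply sends no time filter.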

5. Provider Selection Logic

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| provider_recommendations | Dict<string, string> | Use case → provider mapping | Example: {"trends": "tavily", "deep_research": "exa", "factual": "google", "news": "tavily", "academic": "exa"} |
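
Looking up a provider for a use case could be as simple as the sketch below, which falls back to default_provider when the use case is not in the mapping. `pick_provider` is a hypothetical helper, and the fallback behaviour is an assumption.

```python
from typing import Dict


def pick_provider(
    use_case: str,
    recommendations: Dict[str, str],
    default_provider: str = "exa",
) -> str:
    """Look up the provider for a use case, falling back to the default."""
    return recommendations.get(use_case, default_provider)


recommendations = {
    "trends": "tavily",
    "deep_research": "exa",
    "factual": "google",
    "news": "tavily",
    "academic": "exa",
}
print(pick_provider("news", recommendations))       # prints tavily
print(pick_provider("whitepaper", recommendations))  # prints exa
```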

6. Research Intelligence

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| research_angles | string[] | 5-8 alternative research angles | Generated from the user's pain points, industry trends, content goals, and audience interests. Examples: "Compare {topic} tools", "{topic} ROI analysis" |
| query_enhancement_rules | Dict<string, string> | Templates for improving vague queries | 5-8 enhancement patterns, such as {"vague_ai": "Research: AI applications in {industry} for {audience}", "vague_tools": "Compare top {industry} tools"} |
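
The rule values are templates with placeholders such as {industry} and {audience}. The sketch below fills a template with whatever values are available and leaves unknown placeholders untouched; `apply_rule` and `_KeepMissing` are hypothetical names, and how a vague query is matched to a rule key is not covered.

```python
from typing import Dict


class _KeepMissing(dict):
    """Leave unknown placeholders such as {topic} untouched."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def apply_rule(rules: Dict[str, str], rule_name: str, **context: str) -> str:
    """Fill a rule template with the values available in context."""
    return rules[rule_name].format_map(_KeepMissing(context))


rules = {
    "vague_ai": "Research: AI applications in {industry} for {audience}",
    "vague_tools": "Compare top {industry} tools",
}
print(apply_rule(rules, "vague_ai", industry="Healthcare", audience="clinicians"))
# prints Research: AI applications in Healthcare for clinicians
```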

7. Research Presets

| Field | Type | Description | Generation Logic |
| --- | --- | --- | --- |
| recommended_presets | ResearchPreset[] | 3-5 personalized preset templates | Each preset includes: name (descriptive name); keywords (research query); industry (user's industry); target_audience (user's audience); research_mode ("basic", "comprehensive", or "targeted"); config (complete ResearchConfig object); description (brief explanation) |
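
The preset fields listed above can be pictured as a small record type. The actual model is a Pydantic class in the backend; the sketch below uses a standard-library dataclass instead so it runs without dependencies, names the class `ResearchPresetSketch` to mark it as illustrative, and represents config as a plain dict because the ResearchConfig schema is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

RESEARCH_MODES = ("basic", "comprehensive", "targeted")


@dataclass
class ResearchPresetSketch:
    name: str
    keywords: str
    industry: str
    target_audience: str
    research_mode: str
    config: Dict[str, Any] = field(default_factory=dict)  # stands in for ResearchConfig
    description: str = ""

    def __post_init__(self) -> None:
        # Reject modes outside the three documented values.
        if self.research_mode not in RESEARCH_MODES:
            raise ValueError(f"unknown research_mode: {self.research_mode!r}")
```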

8. Research Preferences (Structured)

| Field | Type | Description | Source |
| --- | --- | --- | --- |
| research_preferences | Dict<string, any> | Structured research preferences | Extracted from onboarding: research_depth from research_preferences.research_depth; content_types from research_preferences.content_types; auto_research from research_preferences.auto_research; factual_content from research_preferences.factual_content |

9. Metadata

| Field | Type | Description |
| --- | --- | --- |
| generated_at | string? | ISO timestamp of generation |
| confidence_score | float? | Confidence score from 0 to 1 (higher = richer data) |
| version | string? | Schema version (e.g., "1.0") |

Data Collection Process

Step 1: Collect Onboarding Data

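# Fetch each source once so business_info can reuse the results.
website_analysis = get_website_analysis(user_id)
persona_data = get_persona_data(user_id)
research_preferences = get_research_preferences(user_id)

onboarding_data = {
    "website_analysis": website_analysis,
    "persona_data": persona_data,
    "research_preferences": research_preferences,
    "business_info": construct_business_info(persona_data, website_analysis),
}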

Step 2: Build AI Prompt

The prompt includes:

  • All onboarding data (JSON formatted)
  • Detailed instructions for each field
  • Examples and use cases
  • Rules for handling minimal data scenarios
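
As a rough picture of this step, the sketch below serialises the onboarding data as JSON and prepends instructions. The instruction wording is a placeholder written for this example, and `build_persona_prompt` is a hypothetical name; the real text lives in the prompt builder module.

```python
import json
from typing import Any, Dict

# Placeholder wording: the real instructions live in the prompt builder.
INSTRUCTIONS = (
    "Generate a research persona as JSON. "
    'Never use "General" as an industry or audience; infer a specific value.'
)


def build_persona_prompt(onboarding_data: Dict[str, Any]) -> str:
    """Combine instructions with the onboarding data serialised as JSON."""
    # default=str keeps non-JSON values such as datetimes from raising.
    data_json = json.dumps(onboarding_data, indent=2, default=str)
    return f"{INSTRUCTIONS}\n\nONBOARDING DATA:\n{data_json}"
```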

Step 3: LLM Generation

  • Uses structured JSON response format
  • Validates against ResearchPersona Pydantic model
  • Adds metadata (generated_at, confidence_score)
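
The validation and metadata step might look like the sketch below. The backend validates against a Pydantic model; this version uses only the standard library, and the list of required fields, the helper name `finalize_persona`, and the choice to pass confidence_score in from outside are all assumptions, since the scoring method is not documented here.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Assumed minimum set of fields; the real model may require more.
REQUIRED_FIELDS = (
    "default_industry",
    "default_target_audience",
    "default_research_mode",
    "default_provider",
)


def finalize_persona(raw_response: str, confidence_score: float) -> Dict[str, Any]:
    """Parse the LLM's JSON, check required fields, then attach metadata."""
    persona = json.loads(raw_response)
    missing = [name for name in REQUIRED_FIELDS if not persona.get(name)]
    if missing:
        raise ValueError(f"persona is missing fields: {missing}")
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError("confidence_score must be between 0 and 1")
    persona["generated_at"] = datetime.now(timezone.utc).isoformat()
    persona["confidence_score"] = confidence_score
    return persona
```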

Step 4: Save to Database

  • Stored in PersonaData.research_persona JSON field
  • Cached with 7-day TTL
  • Timestamp stored in PersonaData.research_persona_generated_at
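
A freshness check against the 7-day TTL can be sketched as follows. It assumes research_persona_generated_at is available as a timezone-aware datetime; `is_persona_fresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(days=7)


def is_persona_fresh(
    generated_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """True when a persona exists and is younger than the 7-day TTL."""
    if generated_at is None:
        return False  # nothing cached yet
    now = now or datetime.now(timezone.utc)
    return now - generated_at < CACHE_TTL
```

Passing `now` explicitly keeps the check easy to test; callers would normally omit it.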

Handling Minimal Data Scenarios

When onboarding data is incomplete, the prompt instructs the LLM to infer missing values from whatever context is available:

  1. Industry Inference:

    • From content_types: "blog" → "Content Marketing", "video" → "Video Content Creation"
    • From website_analysis.content_characteristics: Patterns suggest industry
    • Default: "Technology" or "Business Consulting"
  2. Target Audience Inference:

    • From writing_style: Complexity level suggests audience
    • From content_goals: Purpose suggests audience
    • Default: "Professionals and content consumers"
  3. Provider Defaults:

    • Defaults to "exa" for content creators
    • Uses "tavily" only when the focus is news or current events
  4. Never Uses "General":

    • The prompt explicitly instructs to never use "General"
    • Always infers specific categories based on available context
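
In the service this inference is delegated to the LLM through the prompt. Purely for illustration, the industry rule can be written deterministically as below; the two mappings come from the examples in this section, while `infer_industry` and the choice of "Technology" over "Business Consulting" as the fallback are assumptions.

```python
from typing import List, Optional

# The two mappings below come from the examples in this section.
CONTENT_TYPE_TO_INDUSTRY = {
    "blog": "Content Marketing",
    "video": "Video Content Creation",
}
FALLBACK_INDUSTRY = "Technology"  # the section also allows "Business Consulting"


def infer_industry(content_types: Optional[List[str]]) -> str:
    """Return a specific industry for the first recognised content type."""
    for content_type in content_types or []:
        industry = CONTENT_TYPE_TO_INDUSTRY.get(content_type.lower())
        if industry:
            return industry
    return FALLBACK_INDUSTRY  # never "General"
```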

Frontend Display

Currently Displayed Fields:

  • Default Settings (industry, audience, mode, provider)
  • Suggested Keywords
  • Research Angles
  • Recommended Presets
  • Metadata (generated_at, confidence_score, version)

Recently Added Fields (Enhanced Display):

  • Keyword Expansion Patterns
  • Exa Provider Settings (domains, category, search_type)
  • Tavily Provider Settings (topic, depth, answer, time_range, format)
  • Provider Recommendations
  • Query Enhancement Rules
  • Research Preferences (structured)


Future Enhancements

  1. Competitor Analysis Integration: Use competitor data to inform industry context and domain suggestions
  2. Research History: Learn from past research queries to improve suggestions
  3. A/B Testing: Test different persona generation strategies
  4. User Feedback Loop: Allow users to rate and improve persona suggestions
  5. Multi-Industry Support: Handle users with multiple industries/niches

API Endpoints

  • GET /api/research/persona-defaults: Get persona defaults (cached only)
  • GET /api/research/research-persona: Get or generate research persona
  • POST /api/research/research-persona?force_refresh=true: Force regenerate persona
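
The sketch below builds, but does not send, requests for the get-or-generate and force-regenerate endpoints using only the standard library. The base URL is an assumption, authentication is omitted, and `persona_request` is a hypothetical helper.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def persona_request(base_url: str, force_refresh: bool = False) -> Request:
    """Build (but do not send) a request for the research persona."""
    url = urljoin(base_url, "/api/research/research-persona")
    if force_refresh:
        # Regeneration is a POST with force_refresh=true in the query string.
        url += "?" + urlencode({"force_refresh": "true"})
        return Request(url, method="POST")
    return Request(url, method="GET")


req = persona_request("http://localhost:8000", force_refresh=True)
print(req.get_method(), req.full_url)
# prints POST http://localhost:8000/api/research/research-persona?force_refresh=true
```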

Related Files

  • Backend: backend/services/research/research_persona_service.py
  • Prompt Builder: backend/services/research/research_persona_prompt_builder.py
  • Models: backend/models/research_persona_models.py
  • API: backend/api/research_config.py
  • Frontend: frontend/src/pages/ResearchTest.tsx (Persona Details Modal)