Google Trends Integration Analysis

Date: 2025-01-29
Status: Analysis Complete - Ready for Implementation


📋 Executive Summary

A review of the legacy Google Trends implementation and the current Research Engine codebase found:

  • No Google Trends migration found in the new codebase
  • ⚠️ Legacy implementation has significant issues (not production-ready)
  • Pytrends offers comprehensive capabilities that align with user needs
  • 🎯 Integration points identified in the current researcher flow

🔍 Legacy Implementation Review

Current Legacy Code Issues

File: ToBeMigrated/ai_web_researcher/google_trends_researcher.py

Problems Identified:

  1. Visualization Issues:

    • Uses matplotlib.pyplot.show() - not suitable for web/API
    • No way to return chart data for frontend rendering
    • Hardcoded visualization that blocks execution
  2. Error Handling:

    • Basic try/except blocks
    • Returns empty DataFrames on error (silent failures)
    • No retry logic for rate limiting
  3. Rate Limiting:

    • Random sleeps (time.sleep(random.uniform(0.1, 0.6)))
    • No proper rate limiting strategy
    • Risk of getting blocked by Google
  4. Code Quality:

    • Mixed concerns (keyword clustering + trends in same file)
    • Hardcoded timeframes ('today 1-y', 'today 12-m')
    • No configuration management
    • FIXME comments indicating incomplete features
  5. Data Structure:

    • Returns pandas DataFrames directly
    • Not serializable for API responses
    • No standardized response format
  6. Missing Features:

    • No caching strategy
    • No async support
    • No integration with subscription system
    • No user_id tracking

What Works (Can Reuse):

Core pytrends usage patterns:

  • TrendReq() initialization
  • build_payload() method
  • interest_over_time() method
  • interest_by_region() method
  • related_topics() method
  • related_queries() method
  • trending_searches() method

Keyword expansion logic:

  • Google auto-suggestions fetching
  • Prefix/suffix expansion
  • Relevance scoring

Keyword clustering approach:

  • TF-IDF vectorization
  • K-means clustering
  • Silhouette scoring

📚 Pytrends Capabilities Review

Available Methods (from pytrends library):

  1. interest_over_time()

    • Historical indexed data
    • Shows when keyword was most searched
    • Returns time series data
  2. multirange_interest_over_time()

    • Similar to interest_over_time
    • Allows analysis across multiple date ranges
    • Better for comparing different time periods
  3. historical_hourly_interest()

    • Historical hourly data
    • Sends multiple requests (one week at a time)
    • More granular than daily data
  4. interest_by_region()

    • Geographic interest data
    • Shows where keyword is most searched
    • Returns data by country/region
  5. related_topics()

    • Related topics to keyword
    • Returns 'top' and 'rising' topics
    • Useful for content expansion
  6. related_queries()

    • Related search queries
    • Returns 'top' and 'rising' queries
    • Great for keyword research
  7. trending_searches()

    • Latest trending searches
    • Country-specific
    • Real-time trending topics
  8. top_charts()

    • Top charts for a given topic
    • Yearly charts
    • Category-specific
  9. suggestions()

    • Additional suggested keywords
    • Refines trend search
    • Auto-complete suggestions

Key Parameters:

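  • timeframe: e.g. 'today 5-y' (the pytrends default), 'today 12-m', 'now 7-d', 'all', or a custom range 'YYYY-MM-DD YYYY-MM-DD'
  • geo: Two-letter country code (e.g., 'US', 'GB', 'IN'); an empty string means worldwide
  • hl: Host language for the Trends request (e.g., 'en-US')
  • tz: Timezone offset in minutes (e.g., 360 for US Central, UTC-6)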
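
To make these parameters concrete, here is a minimal sketch of how a request might be validated and issued. The helper names `build_payload_kwargs` and `fetch_interest_over_time` are hypothetical, the five-keyword cap reflects the Google Trends comparison limit, and the pytrends import is deferred so that the validation helper works without the library installed.

```python
from typing import Any, Dict, List

MAX_KEYWORDS = 5  # Google Trends compares at most five terms per request


def build_payload_kwargs(
    keywords: List[str], timeframe: str = "today 12-m", geo: str = "US"
) -> Dict[str, Any]:
    """Validate inputs and return keyword arguments for pytrends' build_payload()."""
    cleaned = [kw.strip() for kw in keywords if kw and kw.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    if len(cleaned) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords per request")
    return {"kw_list": cleaned, "timeframe": timeframe, "geo": geo.upper()}


def fetch_interest_over_time(keywords: List[str], **options: str):
    """Blocking call that returns a pandas DataFrame (requires pytrends)."""
    from pytrends.request import TrendReq  # imported lazily: optional dependency

    client = TrendReq(hl="en-US", tz=360)
    client.build_payload(**build_payload_kwargs(keywords, **options))
    return client.interest_over_time()
```

Keeping validation separate from the network call means bad input can be rejected before any request counts against the rate limit.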

🔍 Migration Status Check

Search Results:

No Google Trends implementation found in:

  • backend/services/research/ - No trends service
  • backend/api/research/ - No trends endpoints
  • Current codebase only mentions "trends" as a deliverable type, not actual Google Trends API

The codebase has:

  • ExpectedDeliverable.TRENDS enum value
  • TrendAnalysis model in research_intent_models.py
  • Intent-aware analyzer that can extract trends from research results
  • But NO actual Google Trends API integration

Conclusion: Google Trends has NOT been migrated to the new codebase. The current "trends" feature only extracts trend information from general research results, not from Google Trends API.

🎯 Integration Points


Current Researcher Flow:

Step 1: ResearchInput
  ├── User enters keywords/topic
  ├── Clicks "Intent & Options" button
  └── Intent analysis performed

Step 2: IntentConfirmationPanel
  ├── Shows inferred intent (editable)
  ├── Shows suggested queries
  ├── Shows AI-optimized settings
  └── User confirms and clicks "Research"

Step 3: Research Execution
  └── Research runs via Exa/Tavily/Google

Step 4: StepResults (IntentResultsDisplay)
  ├── Summary tab
  ├── Statistics tab
  ├── Expert Quotes tab
  ├── Case Studies tab
  ├── Trends tab (currently shows AI-extracted trends)
  └── Sources tab

Option 1: Automatic Integration (Recommended)

When: During research execution, if intent includes trends

Flow:

  1. User enters keywords → Intent analysis
  2. If intent includes EXPLORE_TRENDS purpose OR TRENDS deliverable:
    • Automatically fetch Google Trends data in parallel
    • Merge with research results
  3. Display in "Trends" tab with Google Trends data

Pros:

  • Seamless user experience
  • No extra clicks
  • Trends data always available when relevant

Cons:

  • Additional API call (but can be cached)
  • Slightly longer execution time

Implementation:

  • Add to IntentAwareAnalyzer.analyze() method
  • Call Google Trends service if trends in expected_deliverables
  • Merge Google Trends data with AI-extracted trends
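
The automatic flow above could be sketched roughly as follows. Both fetch functions are stand-ins rather than the real service calls, and the string values `"explore_trends"` and `"trends"` are assumed spellings of the `EXPLORE_TRENDS` purpose and `TRENDS` deliverable.

```python
import asyncio
from typing import Any, Dict, List


async def run_web_research(keywords: List[str]) -> Dict[str, Any]:
    """Stand-in for the existing Exa/Tavily/Google research call."""
    await asyncio.sleep(0)
    return {"summary": f"Findings for {', '.join(keywords)}", "trends": []}


async def fetch_google_trends(keywords: List[str]) -> Dict[str, Any]:
    """Stand-in for the proposed Google Trends service."""
    await asyncio.sleep(0)
    return {"interest_over_time": [], "keywords": keywords}


async def execute_research(
    keywords: List[str], purpose: str, expected_deliverables: List[str]
) -> Dict[str, Any]:
    """Run research, adding Google Trends in parallel when the intent asks for trends."""
    wants_trends = purpose == "explore_trends" or "trends" in expected_deliverables
    tasks = [run_web_research(keywords)]
    if wants_trends:
        tasks.append(fetch_google_trends(keywords))
    results = await asyncio.gather(*tasks, return_exceptions=True)

    research = results[0]
    if isinstance(research, BaseException):
        raise research  # the main research call failing is still fatal
    trends = results[1] if wants_trends else None
    # A Trends failure must not fail the whole run; the tab then shows
    # AI-extracted trends only.
    research["google_trends"] = None if isinstance(trends, BaseException) else trends
    return research
```

Because both calls run concurrently, the added latency is bounded by the slower of the two rather than their sum.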

Option 2: On-Demand Button (Alternative)

When: After intent analysis, show "Analyze Trends" button

Flow:

  1. User enters keywords → Intent analysis
  2. IntentConfirmationPanel shows "Analyze Trends" button
  3. User clicks → Fetches Google Trends data
  4. Shows trends preview in panel
  5. User proceeds with research

Pros:

  • User control
  • Faster initial intent analysis
  • Can preview trends before research

Cons:

  • Extra user action
  • Trends not integrated with research results

Implementation:

  • Add button to IntentConfirmationPanel
  • Create endpoint: POST /api/research/trends/analyze
  • Show trends preview in panel

Option 3: Standalone Trends Button

When: Always available as separate action

Flow:

  1. User enters keywords
  2. "Trends" button always visible
  3. Click → Opens trends analysis
  4. Separate from main research flow

Pros:

  • Clear separation
  • Can use independently
  • Simple UX

Cons:

  • Not integrated with research
  • Extra navigation
  • Less discoverable

Recommended Approach: Hybrid (Automatic + On-Demand)

Primary: Automatic Integration

For intent-driven research:

  • If purpose == EXPLORE_TRENDS OR TRENDS in expected_deliverables:
    • Automatically fetch Google Trends data
    • Include in research results
    • Display in "Trends" tab

Secondary: On-Demand Button

For all research:

  • Show "Analyze Trends" button in IntentConfirmationPanel
  • User can click to get trends even if not in intent
  • Preview trends before research execution

User Experience:

┌─────────────────────────────────────────────────────────┐
│  ResearchInput                                          │
│  ┌───────────────────────────────────────────────────┐ │
│  │ Keywords: "AI marketing tools"                   │ │
│  │ [Intent & Options]                                │ │
│  └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│  IntentConfirmationPanel                                │
│  ┌───────────────────────────────────────────────────┐ │
│  │ Intent: make_decision                            │ │
│  │ Deliverables: [comparisons, trends, statistics]  │ │
│  │                                                   │ │
│  │ [Analyze Trends] ← Always available              │ │
│  │ [Research] ← Will auto-include trends            │ │
│  └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│  Research Execution                                      │
│  ├── Exa/Tavily/Google search                           │
│  └── Google Trends (if trends in deliverables) ← AUTO  │
└─────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│  IntentResultsDisplay                                    │
│  ┌───────────────────────────────────────────────────┐ │
│  │ [Summary] [Statistics] [Quotes] [Trends] [Sources]│ │
│  │                                                   │ │
│  │ Trends Tab:                                      │ │
│  │ ├── Interest Over Time (Chart)                   │ │
│  │ ├── Interest by Region (Map/Table)               │ │
│  │ ├── Related Topics (Top & Rising)                │ │
│  │ ├── Related Queries (Top & Rising)               │ │
│  │ └── AI-Extracted Trends (from research)          │ │
│  └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

🏗️ Implementation Plan

Phase 1: Core Service (Week 1)

Create: backend/services/research/trends/google_trends_service.py

Features:

  • Interest over time
  • Interest by region
  • Related topics
  • Related queries
  • Proper error handling
  • Rate limiting
  • Caching (24-hour TTL)
  • Async support

Phase 2: Integration (Week 1-2)

Enhance: IntentAwareAnalyzer

Changes:

  • Check if trends in expected_deliverables
  • Call Google Trends service
  • Merge with AI-extracted trends
  • Return enhanced trends data

Phase 3: API Endpoint (Week 2)

Create: POST /api/research/trends/analyze

Purpose: On-demand trends analysis

Request:

{
  "keywords": ["AI marketing tools"],
  "timeframe": "today 12-m",
  "geo": "US"
}

Response:

{
  "interest_over_time": [...],
  "interest_by_region": [...],
  "related_topics": {
    "top": [...],
    "rising": [...]
  },
  "related_queries": {
    "top": [...],
    "rising": [...]
  }
}
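
Since pytrends returns pandas DataFrames, the response lists would need a conversion step along these lines. This is a sketch under the assumption that rows arrive as dicts from `df.reset_index().to_dict(orient="records")`; the function name `to_json_records` is hypothetical.

```python
from datetime import date, datetime
from typing import Any, Dict, Iterable, List


def to_json_records(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert row dicts into JSON-safe records.

    Rows are expected in the shape produced by
    ``df.reset_index().to_dict(orient="records")`` on a pandas DataFrame.
    """
    records = []
    for row in rows:
        record = {}
        for key, value in row.items():
            if key == "isPartial":  # pytrends bookkeeping column, not chart data
                continue
            if isinstance(value, (datetime, date)):
                value = value.isoformat()  # pandas Timestamp subclasses datetime
            elif hasattr(value, "item"):  # NumPy scalar -> plain Python number
                value = value.item()
            record[str(key)] = value
        records.append(record)
    return records
```

The resulting list can be placed directly under `interest_over_time` or `interest_by_region` and rendered as a chart on the frontend.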

Phase 4: Frontend Integration (Week 2-3)

Enhance: IntentConfirmationPanel

  • Add "Analyze Trends" button
  • Show trends preview

Enhance: IntentResultsDisplay

  • Enhance "Trends" tab with Google Trends data
  • Add charts (interest over time)
  • Add regional map/table
  • Show related topics/queries

📊 Data Structure Design

from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class GoogleTrendsData(BaseModel):
    """Structured Google Trends data."""
    interest_over_time: List[Dict[str, Any]]  # Time series data
    interest_by_region: List[Dict[str, Any]]  # Geographic data
    related_topics: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    related_queries: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    trending_searches: Optional[List[str]] = None
    timeframe: str
    geo: str
    keywords: List[str]

Enhanced TrendAnalysis Model

class TrendAnalysis(BaseModel):
    """Enhanced trend analysis with Google Trends data."""
    trend: str
    direction: str
    evidence: List[str]
    impact: Optional[str]
    timeline: Optional[str]
    sources: List[str]
    
    # Google Trends specific
    google_trends_data: Optional[GoogleTrendsData] = None
    interest_score: Optional[float] = None  # 0-100 from Google Trends
    regional_interest: Optional[Dict[str, float]] = None
    related_topics: Optional[List[str]] = None
    related_queries: Optional[List[str]] = None
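
One way the `interest_score` and `direction` fields might be derived from the time series is sketched below. The heuristic, the `tolerance` threshold, and the direction labels are all assumptions made for illustration; the source does not define them.

```python
from statistics import mean
from typing import List, Tuple


def summarize_interest(values: List[float], tolerance: float = 5.0) -> Tuple[float, str]:
    """Return (interest_score, direction) for a 0-100 Google Trends series.

    Heuristic (an assumption, not part of the source design): the score is the
    mean of the most recent quarter of points, and direction compares the mean
    of the second half of the series against the first half.
    """
    if not values:
        raise ValueError("values must not be empty")
    recent = values[-max(1, len(values) // 4):]
    score = round(mean(recent), 1)
    if len(values) < 2:
        return score, "stable"
    half = len(values) // 2
    change = mean(values[half:]) - mean(values[:half])
    if change > tolerance:
        return score, "growing"
    if change < -tolerance:
        return score, "declining"
    return score, "stable"
```

Google Trends values are relative (scaled to the peak within the request), so a score computed this way is only comparable within a single keyword, timeframe, and region.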

🔧 Technical Considerations

Rate Limiting

Pytrends Limitations:

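  • pytrends is an unofficial client for Google Trends; the endpoints it calls are rate-limited and respond with HTTP 429 when the limit is exceeded
  • Recommended: at most about 1 request per second (a conservative working figure; Google does not publish a limit)
  • TrendReq accepts retries and backoff_factor arguments, which retry failed requests but do not pace outgoing ones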

Our Strategy:

  • Cache all trends data (24-hour TTL)
  • Use async requests with delays
  • Batch multiple keywords in a single request when possible (up to five per build_payload() call; note that values are then scaled relative to each other)
  • Implement retry logic with exponential backoff
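
The retry item could look roughly like the sketch below. `RateLimitedError` is a hypothetical exception that the service would raise when it sees a 429 from pytrends, and the `sleep` parameter is injectable purely so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when Google answers with HTTP 429."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitedError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 1s, 2s, 4s, ... plus up to 10% jitter so retries do not synchronize
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Unlike the legacy random sleeps, the delay here grows only after an actual rate-limit response, so successful requests are not slowed down.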

Caching Strategy

# Cache key: f"google_trends:{keyword}:{timeframe}:{geo}"
# TTL: 24 hours (trends don't change frequently)
# Store: Interest over time, related topics/queries
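
A minimal in-memory sketch of this cache is shown below. The class name `TTLCache`, the injectable `clock`, and the key normalization (lowercased keyword, uppercased geo) are assumptions added for illustration; a shared store would be needed if the backend runs more than one process.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL


def make_cache_key(keyword: str, timeframe: str, geo: str) -> str:
    """Build the cache key, normalizing case and whitespace to avoid duplicates."""
    return f"google_trends:{keyword.strip().lower()}:{timeframe}:{geo.upper()}"


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
```

Checking the cache before calling the rate-limited helper keeps repeated research runs on the same topic from generating new Google requests.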

Error Handling

  • Handle Google blocking (429 errors)
  • Handle invalid keywords
  • Handle missing data
  • Graceful degradation (return partial data if available)
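
Graceful degradation might be implemented by fetching each section independently, as in this sketch. The function name `collect_sections` and the shape of the returned error map are assumptions.

```python
from typing import Any, Callable, Dict, Tuple


def collect_sections(
    fetchers: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run each fetcher, keeping successful sections and recording failures."""
    data: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
        except Exception as exc:  # one failed section must not discard the rest
            errors[name] = f"{type(exc).__name__}: {exc}"
    return data, errors
```

Returning the error map alongside the data avoids the silent failures of the legacy code: the caller can show partial results and still log or display what went wrong.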

Async Support

  • Use asyncio for non-blocking requests
  • Parallel requests for multiple keywords
  • Timeout handling (30 seconds max)
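
pytrends itself is synchronous, so one plausible way to meet these points is to run each call in a worker thread with a timeout. In this sketch `run_blocking` is a hypothetical helper and `slow_lookup` stands in for a real pytrends request; `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time
from typing import Any, Callable, Optional


async def run_blocking(
    func: Callable[..., Any], *args: Any, timeout: float = 30.0
) -> Optional[Any]:
    """Run a blocking call in a worker thread; return None if it exceeds timeout."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout)
    except asyncio.TimeoutError:
        # The worker thread cannot be cancelled and keeps running in the
        # background; only the awaiting coroutine stops waiting.
        return None


def slow_lookup(keyword: str, seconds: float) -> str:
    """Stand-in for a synchronous pytrends request."""
    time.sleep(seconds)
    return f"trends for {keyword}"


async def demo() -> list:
    # Two keywords are fetched concurrently; the second exceeds its timeout.
    return await asyncio.gather(
        run_blocking(slow_lookup, "seo", 0.01, timeout=1.0),
        run_blocking(slow_lookup, "ai", 0.5, timeout=0.05),
    )
```

Parallel requests still count against the same rate limit, so in practice the concurrency level would need to be capped alongside the backoff strategy.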

📈 User Value

For Content Creators:

  1. Timing Optimization:

    • See interest over time to time publication
    • Identify peak interest periods
    • Avoid publishing during low-interest periods
  2. Regional Targeting:

    • See which regions have highest interest
    • Tailor content for specific markets
    • Discover new audience opportunities
  3. Content Expansion:

    • Related topics → new article ideas
    • Related queries → FAQ sections
    • Rising topics → timely content opportunities

For Digital Marketers:

  1. Campaign Planning:

    • Trending searches → campaign topics
    • Interest by region → geo-targeting
    • Related queries → ad keywords
  2. SEO Strategy:

    • Related queries → long-tail keywords
    • Rising topics → content opportunities
    • Interest trends → content calendar

For Solopreneurs:

  1. Market Research:
    • Interest trends → market validation
    • Regional data → market expansion
    • Related topics → competitive landscape

Success Criteria

  • Google Trends service created and tested
  • Automatic integration working (when trends in intent)
  • On-demand button working in IntentConfirmationPanel
  • Trends tab enhanced with Google Trends data
  • Charts displaying correctly (interest over time)
  • Regional data displaying correctly
  • Caching working (24-hour TTL)
  • Rate limiting preventing blocks
  • Error handling graceful
  • User satisfaction with trends feature

🚀 Quick Start Implementation

Step 1: Create Service (2-3 days)

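# backend/services/research/trends/google_trends_service.py
from typing import Any, Dict, List


class GoogleTrendsService:
    """Planned interface; method bodies are intentionally left unimplemented."""

    async def get_interest_over_time(self, keywords: List[str], timeframe: str, geo: str) -> List[Dict[str, Any]]: ...
    async def get_interest_by_region(self, keywords: List[str], geo: str) -> List[Dict[str, Any]]: ...
    async def get_related_topics(self, keywords: List[str], timeframe: str) -> Dict[str, Any]: ...
    async def get_related_queries(self, keywords: List[str], timeframe: str) -> Dict[str, Any]: ...
    async def get_trending_searches(self, country: str) -> List[str]: ...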

Step 2: Integrate with IntentAwareAnalyzer (1-2 days)

  • Check for trends in deliverables
  • Call Google Trends service
  • Merge with AI-extracted trends

Step 3: Add API Endpoint (1 day)

  • POST /api/research/trends/analyze
  • Return structured trends data

Step 4: Frontend Integration (2-3 days)

  • Add "Analyze Trends" button
  • Enhance Trends tab
  • Add charts/visualizations

Total Estimate: 6-9 days for full implementation


📝 Next Steps

  1. Approve Approach: Confirm hybrid approach (automatic + on-demand)
  2. Set Up Dependencies: Add pytrends>=4.9.2 to requirements.txt
  3. Create Service: Start with google_trends_service.py
  4. Test Integration: Test with sample keywords
  5. Frontend Integration: Add UI components

Recommended Action: Start with Phase 1 (Core Service) - create google_trends_service.py with proper error handling, caching, and async support.