# Google Trends Integration Analysis
**Date**: 2025-01-29
**Status**: Analysis Complete - Ready for Implementation
---
## 📋 Executive Summary
After reviewing the legacy Google Trends implementation and the current Research Engine codebase:
- **No Google Trends migration found** in the new codebase
- ⚠️ **Legacy implementation has significant issues** (not production-ready)
- **Pytrends offers comprehensive capabilities** that align with user needs
- 🎯 **Integration points identified** in the current researcher flow
---
## 🔍 Legacy Implementation Review
### Current Legacy Code Issues
**File**: `ToBeMigrated/ai_web_researcher/google_trends_researcher.py`
#### Problems Identified:
1. **Visualization Issues**:
   - Uses `matplotlib.pyplot.show()` - not suitable for web/API
   - No way to return chart data for frontend rendering
   - Hardcoded visualization that blocks execution
2. **Error Handling**:
   - Basic try/except blocks
   - Returns empty DataFrames on error (silent failures)
   - No retry logic for rate limiting
3. **Rate Limiting**:
   - Random sleeps (`time.sleep(random.uniform(0.1, 0.6))`)
   - No proper rate limiting strategy
   - Risk of getting blocked by Google
4. **Code Quality**:
   - Mixed concerns (keyword clustering + trends in same file)
   - Hardcoded timeframes (`'today 1-y'`, `'today 12-m'`)
   - No configuration management
   - FIXME comments indicating incomplete features
5. **Data Structure**:
   - Returns pandas DataFrames directly
   - Not serializable for API responses
   - No standardized response format
6. **Missing Features**:
   - No caching strategy
   - No async support
   - No integration with subscription system
   - No user_id tracking
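
For the rate-limiting problem in item 3, a deterministic throttle could replace the random sleeps. The sketch below is only an illustration: `MinIntervalThrottle` is a hypothetical helper (not part of pytrends or the legacy code), and the one-second default interval is an assumption to be tuned.

```python
import threading
import time
from typing import Callable


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between outgoing requests."""

    def __init__(
        self,
        min_interval: float = 1.0,  # assumed default, not a documented Google limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        with self._lock:  # holding the lock while sleeping serializes callers
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            if delay:
                self._sleep(delay)
            # Reserve the next slot relative to when this request actually goes out.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval
            return delay
```

Calling `throttle.wait()` immediately before each Trends request spaces requests evenly instead of relying on random pauses.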
#### What Works (Can Reuse):
**Core pytrends usage patterns**:
- `TrendReq()` initialization
- `build_payload()` method
- `interest_over_time()` method
- `interest_by_region()` method
- `related_topics()` method
- `related_queries()` method
- `trending_searches()` method
**Keyword expansion logic**:
- Google auto-suggestions fetching
- Prefix/suffix expansion
- Relevance scoring
**Keyword clustering approach**:
- TF-IDF vectorization
- K-means clustering
- Silhouette scoring
---
## 📚 Pytrends Capabilities Review
### Available Methods (from pytrends library):
1. **`interest_over_time()`**
   - Historical indexed data
   - Shows when keyword was most searched
   - Returns time series data
2. **`multirange_interest_over_time()`**
   - Similar to `interest_over_time()`
   - Allows analysis across multiple date ranges
   - Better for comparing different time periods
3. **`get_historical_interest()`** (listed as "Historical Hourly Interest" in the pytrends README)
   - Historical hourly data
   - Sends multiple requests (one week at a time)
   - More granular than daily data
   - Availability should be verified against the pinned pytrends version before relying on it
4. **`interest_by_region()`**
   - Geographic interest data
   - Shows where keyword is most searched
   - Returns data by country/region
5. **`related_topics()`**
   - Related topics to keyword
   - Returns 'top' and 'rising' topics
   - Useful for content expansion
6. **`related_queries()`**
   - Related search queries
   - Returns 'top' and 'rising' queries
   - Great for keyword research
7. **`trending_searches()`**
   - Latest daily trending searches
   - Country-specific
   - For real-time trending topics, pytrends provides a separate `realtime_trending_searches()` method
8. **`top_charts()`**
   - Top charts for a given topic
   - Yearly charts
   - Category-specific
9. **`suggestions()`**
   - Additional suggested keywords
   - Refines trend search
   - Auto-complete suggestions
### Key Parameters:
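- **`timeframe`**: e.g. `'today 5-y'`, `'today 12-m'`, `'now 7-d'`, `'all'`, or a custom range such as `'2024-01-01 2024-12-31'`. The legacy code's `'today 1-y'` is not among the formats documented in the pytrends README and should be verified before reuse.
- **`geo`**: Two-letter country code (e.g., 'US', 'GB', 'IN'); an empty string means worldwide
- **`hl`**: Host language (e.g., 'en-US')
- **`tz`**: Timezone offset in minutes (e.g., 360 for US Central, UTC-6)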
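
The parameters above could be bundled into one validated request object before they reach pytrends. This is a minimal sketch under assumptions: `TrendsQuery` and its method names are hypothetical, and the defaults simply mirror the examples in this document. The five-keyword cap reflects the limit on terms Google Trends compares in one payload.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

MAX_KEYWORDS = 5  # Google Trends compares at most five terms per request


@dataclass(frozen=True)
class TrendsQuery:
    """Hypothetical container for the pytrends parameters listed above."""

    keywords: List[str]
    timeframe: str = "today 12-m"
    geo: str = "US"
    hl: str = "en-US"
    tz: int = 360  # timezone offset in minutes

    def __post_init__(self) -> None:
        cleaned = [k.strip() for k in self.keywords if k.strip()]
        if not 1 <= len(cleaned) <= MAX_KEYWORDS:
            raise ValueError(f"expected 1 to {MAX_KEYWORDS} keywords, got {len(cleaned)}")
        if not self.timeframe.strip():
            raise ValueError("timeframe must not be empty")
        # The dataclass is frozen, so store the cleaned list via object.__setattr__.
        object.__setattr__(self, "keywords", cleaned)

    def client_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments intended for TrendReq(...)."""
        return {"hl": self.hl, "tz": self.tz}

    def payload_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments intended for build_payload(...)."""
        return {"kw_list": list(self.keywords), "timeframe": self.timeframe, "geo": self.geo}
```

A service method could then call `TrendReq(**query.client_kwargs())` followed by `build_payload(**query.payload_kwargs())`, keeping validation in one place.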
---
## 🔍 Migration Status Check
### Search Results:
**No Google Trends implementation found** in:
- `backend/services/research/` - No trends service
- `backend/api/research/` - No trends endpoints
- Current codebase only mentions "trends" as a deliverable type, not actual Google Trends API
### Current "Trends" References:
The codebase has:
- `ExpectedDeliverable.TRENDS` enum value
- `TrendAnalysis` model in `research_intent_models.py`
- Intent-aware analyzer that can extract trends from research results
- But **NO actual Google Trends API integration**
**Conclusion**: Google Trends has **NOT been migrated** to the new codebase. The current "trends" feature only extracts trend information from general research results, not from Google Trends API.
---
## 🎯 Where to Integrate Google Trends in User Flow
### Current Researcher Flow:
```
Step 1: ResearchInput
  ├── User enters keywords/topic
  ├── Clicks "Intent & Options" button
  └── Intent analysis performed
Step 2: IntentConfirmationPanel
  ├── Shows inferred intent (editable)
  ├── Shows suggested queries
  ├── Shows AI-optimized settings
  └── User confirms and clicks "Research"
Step 3: Research Execution
  └── Research runs via Exa/Tavily/Google
Step 4: StepResults (IntentResultsDisplay)
  ├── Summary tab
  ├── Statistics tab
  ├── Expert Quotes tab
  ├── Case Studies tab
  ├── Trends tab (currently shows AI-extracted trends)
  └── Sources tab
```
### Recommended Integration Points:
#### Option 1: Automatic Integration (Recommended) ⭐⭐⭐⭐⭐
**When**: During research execution, if intent includes trends
**Flow**:
1. User enters keywords → Intent analysis
2. If intent includes `EXPLORE_TRENDS` purpose OR `TRENDS` deliverable:
   - Automatically fetch Google Trends data in parallel
   - Merge with research results
3. Display in "Trends" tab with Google Trends data
**Pros**:
- Seamless user experience
- No extra clicks
- Trends data always available when relevant
**Cons**:
- Additional API call (but can be cached)
- Slightly longer execution time
**Implementation**:
- Add to `IntentAwareAnalyzer.analyze()` method
- Call Google Trends service if trends in expected_deliverables
- Merge Google Trends data with AI-extracted trends
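
The automatic flow described in Option 1 could look roughly like the sketch below. Everything here is illustrative: `run_web_research` and `fetch_google_trends` are stand-in coroutines, and the string values `"explore_trends"` and `"trends"` are assumed representations of the `EXPLORE_TRENDS` purpose and `TRENDS` deliverable. A failed trends fetch is treated as optional so that it never fails the main research.

```python
import asyncio
from typing import Any, Dict, List, Optional


async def run_web_research(keywords: List[str]) -> Dict[str, Any]:
    """Stand-in for the existing Exa/Tavily/Google research step."""
    await asyncio.sleep(0)
    return {"summary": f"results for {', '.join(keywords)}"}


async def fetch_google_trends(keywords: List[str]) -> Dict[str, Any]:
    """Stand-in for the planned Google Trends service call."""
    await asyncio.sleep(0)
    return {"keywords": keywords, "interest_over_time": []}


def wants_trends(purpose: str, deliverables: List[str]) -> bool:
    # Assumed string values for the EXPLORE_TRENDS purpose and TRENDS deliverable.
    return purpose == "explore_trends" or "trends" in deliverables


async def execute_research(
    keywords: List[str], purpose: str, deliverables: List[str]
) -> Dict[str, Optional[Dict[str, Any]]]:
    tasks = [run_web_research(keywords)]
    if wants_trends(purpose, deliverables):
        tasks.append(fetch_google_trends(keywords))
    # Run both calls concurrently; exceptions come back as values.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    research = results[0]
    if isinstance(research, BaseException):
        raise research  # the main research step is mandatory
    trends = None
    if len(results) > 1 and not isinstance(results[1], BaseException):
        trends = results[1]  # trends are optional: drop them on failure
    return {"research": research, "google_trends": trends}
```

Because the two calls run concurrently, the added latency is bounded by the slower of the two rather than their sum.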
#### Option 2: On-Demand Button (Alternative) ⭐⭐⭐⭐
**When**: After intent analysis, show "Analyze Trends" button
**Flow**:
1. User enters keywords → Intent analysis
2. `IntentConfirmationPanel` shows "Analyze Trends" button
3. User clicks → Fetches Google Trends data
4. Shows trends preview in panel
5. User proceeds with research
**Pros**:
- User control
- Faster initial intent analysis
- Can preview trends before research
**Cons**:
- Extra user action
- Trends not integrated with research results
**Implementation**:
- Add button to `IntentConfirmationPanel`
- Create endpoint: `POST /api/research/trends/analyze`
- Show trends preview in panel
#### Option 3: Separate Trends Tab (Alternative) ⭐⭐⭐
**When**: Always available as separate action
**Flow**:
1. User enters keywords
2. "Trends" button always visible
3. Click → Opens trends analysis
4. Separate from main research flow
**Pros**:
- Clear separation
- Can use independently
- Simple UX
**Cons**:
- Not integrated with research
- Extra navigation
- Less discoverable
---
## ✅ Recommended Approach: Hybrid (Option 1 + Option 2)
### Primary: Automatic Integration
**For intent-driven research**:
- If `purpose == EXPLORE_TRENDS` OR `TRENDS in expected_deliverables`:
  - Automatically fetch Google Trends data
  - Include in research results
  - Display in "Trends" tab
### Secondary: On-Demand Button
**For all research**:
- Show "Analyze Trends" button in `IntentConfirmationPanel`
- User can click to get trends even if not in intent
- Preview trends before research execution
### User Experience:
```
┌─────────────────────────────────────────────────────────┐
│  ResearchInput                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Keywords: "AI marketing tools"                    │  │
│  │ [Intent & Options]                                │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  IntentConfirmationPanel                                │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Intent: make_decision                             │  │
│  │ Deliverables: [comparisons, trends, statistics]   │  │
│  │                                                   │  │
│  │ [Analyze Trends] ← Always available               │  │
│  │ [Research]       ← Will auto-include trends       │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  Research Execution                                     │
│  ├── Exa/Tavily/Google search                           │
│  └── Google Trends (if trends in deliverables) ← AUTO   │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  IntentResultsDisplay                                   │
│  ┌───────────────────────────────────────────────────┐  │
│  │ [Summary] [Statistics] [Quotes] [Trends] [Sources]│  │
│  │                                                   │  │
│  │ Trends Tab:                                       │  │
│  │ ├── Interest Over Time (Chart)                    │  │
│  │ ├── Interest by Region (Map/Table)                │  │
│  │ ├── Related Topics (Top & Rising)                 │  │
│  │ ├── Related Queries (Top & Rising)                │  │
│  │ └── AI-Extracted Trends (from research)           │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
---
## 🏗️ Implementation Plan
### Phase 1: Core Service (Week 1)
**Create**: `backend/services/research/trends/google_trends_service.py`
**Features**:
- Interest over time
- Interest by region
- Related topics
- Related queries
- Proper error handling
- Rate limiting
- Caching (24-hour TTL)
- Async support
### Phase 2: Integration (Week 1-2)
**Enhance**: `IntentAwareAnalyzer`
**Changes**:
- Check if trends in expected_deliverables
- Call Google Trends service
- Merge with AI-extracted trends
- Return enhanced trends data
### Phase 3: API Endpoint (Week 2)
**Create**: `POST /api/research/trends/analyze`
**Purpose**: On-demand trends analysis
**Request**:
```json
{
  "keywords": ["AI marketing tools"],
  "timeframe": "today 12-m",
  "geo": "US"
}
```
**Response**:
```json
{
  "interest_over_time": [...],
  "interest_by_region": [...],
  "related_topics": {
    "top": [...],
    "rising": [...]
  },
  "related_queries": {
    "top": [...],
    "rising": [...]
  }
}
```
### Phase 4: Frontend Integration (Week 2-3)
**Enhance**: `IntentConfirmationPanel`
- Add "Analyze Trends" button
- Show trends preview
**Enhance**: `IntentResultsDisplay`
- Enhance "Trends" tab with Google Trends data
- Add charts (interest over time)
- Add regional map/table
- Show related topics/queries
---
## 📊 Data Structure Design
### Google Trends Response Model
```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class GoogleTrendsData(BaseModel):
    """Structured Google Trends data."""

    interest_over_time: List[Dict[str, Any]]  # Time series data
    interest_by_region: List[Dict[str, Any]]  # Geographic data
    related_topics: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    related_queries: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    trending_searches: Optional[List[str]] = None
    timeframe: str
    geo: str
    keywords: List[str]
```
### Enhanced TrendAnalysis Model
```python
class TrendAnalysis(BaseModel):
    """Enhanced trend analysis with Google Trends data."""

    trend: str
    direction: str
    evidence: List[str]
    impact: Optional[str]
    timeline: Optional[str]
    sources: List[str]

    # Google Trends specific
    google_trends_data: Optional[GoogleTrendsData] = None
    interest_score: Optional[float] = None  # 0-100 from Google Trends
    regional_interest: Optional[Dict[str, float]] = None
    related_topics: Optional[List[str]] = None
    related_queries: Optional[List[str]] = None
```
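
The document does not specify how `interest_score` and `direction` would be derived from the Google Trends time series. One possible approach, shown purely as an assumption, is to average the 0-100 values for the score and compare the earlier half of the window with the recent half for the direction. The `tolerance` threshold and the labels `"rising"`, `"falling"`, and `"stable"` are illustrative choices.

```python
from statistics import mean
from typing import List, Tuple


def summarize_interest(values: List[float], tolerance: float = 5.0) -> Tuple[float, str]:
    """Return (average interest, direction) for a series of 0-100 interest values."""
    if not values:
        raise ValueError("values must not be empty")
    score = round(float(mean(values)), 1)
    half = len(values) // 2
    if half == 0:
        return score, "stable"  # a single point carries no direction
    # Compare the earlier half with the recent half; a middle point is ignored.
    delta = mean(values[-half:]) - mean(values[:half])
    if delta > tolerance:
        direction = "rising"
    elif delta < -tolerance:
        direction = "falling"
    else:
        direction = "stable"
    return score, direction
```

For example, the series `[10, 20, 30, 40]` averages 25.0 and its recent half (35) exceeds its earlier half (15) by more than the tolerance, so it is labelled rising.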
---
## 🔧 Technical Considerations
### Rate Limiting
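**Pytrends Limitations**:
- pytrends is an unofficial client, and Google rate-limits the underlying Trends endpoints without publishing a quota
- Starting assumption: about 1 request per second, to be tuned against observed 429 responses
- `TrendReq` accepts `retries` and `backoff_factor` arguments for retrying failed requests; this covers retries only, not pacing between requests
**Our Strategy**:
- Cache all trends data (24-hour TTL)
- Use async requests with delays
- Batch multiple keywords in a single request when possible (Google Trends compares up to five terms per payload)
- Implement retry logic with exponential backoff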
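
The retry item in the strategy above could be sketched as follows. This is an illustration rather than the planned implementation: `RateLimitedError` is a hypothetical exception that a service wrapper would raise when Google answers with HTTP 429, and the attempt count and delays are assumed defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error a service wrapper would raise on an HTTP 429."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on RateLimitedError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0.0, delay * 0.1))
    raise AssertionError("unreachable")
```

Re-raising after the last attempt avoids the silent-failure behaviour noted in the legacy review, where errors were turned into empty DataFrames.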
### Caching Strategy
```python
# Cache key: f"google_trends:{keyword}:{timeframe}:{geo}"
# TTL: 24 hours (trends don't change frequently)
# Store: Interest over time, related topics/queries
```
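
A minimal in-memory version of this caching strategy might look like the sketch below. It is an assumption-laden illustration: `make_cache_key` and `TTLCache` are hypothetical names, the key joins all keywords (sorted and lower-cased) so multi-keyword requests share an entry regardless of order, and a real deployment might use a shared store rather than a process-local dict.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


def make_cache_key(keywords: Iterable[str], timeframe: str, geo: str) -> str:
    """Build a key in the format above, normalizing keyword order and case."""
    normalized = ",".join(sorted(k.strip().lower() for k in keywords))
    return f"google_trends:{normalized}:{timeframe}:{geo}"


class TTLCache:
    """Hypothetical process-local cache with a fixed time-to-live per entry."""

    def __init__(
        self,
        ttl_seconds: float = DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict so the caller refetches
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

The service would check `cache.get(key)` before calling Google and call `cache.set(key, data)` after a successful fetch.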
### Error Handling
- Handle Google blocking (429 errors)
- Handle invalid keywords
- Handle missing data
- Graceful degradation (return partial data if available)
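
Graceful degradation could be implemented by fetching each section independently and recording failures instead of aborting. In this sketch, `collect_sections` and the shape of its return value are assumptions made for illustration; the fetcher callables stand in for the individual service methods.

```python
from typing import Any, Callable, Dict


def collect_sections(fetchers: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run each fetcher, keeping successful sections and recording failures."""
    data: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
        except Exception as exc:  # one failed section must not sink the rest
            errors[name] = f"{type(exc).__name__}: {exc}"
    return {
        "data": data,
        "errors": errors,
        # True only when some sections succeeded and some failed.
        "partial": bool(data) and bool(errors),
    }
```

Returning the error map alongside the data lets the frontend show what is available and flag what is missing, rather than presenting an empty result as if it were valid.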
### Async Support
- Use `asyncio` for non-blocking requests
- Parallel requests for multiple keywords
- Timeout handling (30 seconds max)
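
Since pytrends calls are blocking, one way to meet these async goals is to run them in worker threads with a timeout. The sketch below assumes Python 3.9+ (for `asyncio.to_thread`); `run_blocking`, `fetch_for_keywords`, and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for a real blocking Trends call.

```python
import asyncio
import time
from typing import Any, Callable, Dict, List


async def run_blocking(func: Callable[..., Any], *args: Any, timeout: float = 30.0) -> Any:
    """Run a blocking call in a worker thread and give up after `timeout` seconds."""
    return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout=timeout)


async def fetch_for_keywords(
    fetch_one: Callable[[str], Any], keywords: List[str], timeout: float = 30.0
) -> Dict[str, Any]:
    """Fetch all keywords in parallel; failures are returned as exception values."""
    results = await asyncio.gather(
        *(run_blocking(fetch_one, kw, timeout=timeout) for kw in keywords),
        return_exceptions=True,
    )
    return dict(zip(keywords, results))


def fake_fetch(keyword: str) -> str:
    """Stand-in for a blocking pytrends call; 'slow' simulates a hung request."""
    if keyword == "slow":
        time.sleep(1.0)
    return keyword.upper()


if __name__ == "__main__":
    print(asyncio.run(fetch_for_keywords(fake_fetch, ["ai", "slow"], timeout=0.3)))
```

One caveat of this design: a timed-out call stops being awaited, but its worker thread keeps running until the blocking call returns, so the timeout bounds latency rather than resource use.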
---
## 📈 User Value
### For Content Creators:
1. **Timing Optimization**:
   - See interest over time to time publication
   - Identify peak interest periods
   - Avoid publishing during low-interest periods
2. **Regional Targeting**:
   - See which regions have highest interest
   - Tailor content for specific markets
   - Discover new audience opportunities
3. **Content Expansion**:
   - Related topics → new article ideas
   - Related queries → FAQ sections
   - Rising topics → timely content opportunities
### For Digital Marketers:
1. **Campaign Planning**:
   - Trending searches → campaign topics
   - Interest by region → geo-targeting
   - Related queries → ad keywords
2. **SEO Strategy**:
   - Related queries → long-tail keywords
   - Rising topics → content opportunities
   - Interest trends → content calendar
### For Solopreneurs:
1. **Market Research**:
   - Interest trends → market validation
   - Regional data → market expansion
   - Related topics → competitive landscape
---
## ✅ Success Criteria
- [ ] Google Trends service created and tested
- [ ] Automatic integration working (when trends in intent)
- [ ] On-demand button working in IntentConfirmationPanel
- [ ] Trends tab enhanced with Google Trends data
- [ ] Charts displaying correctly (interest over time)
- [ ] Regional data displaying correctly
- [ ] Caching working (24-hour TTL)
- [ ] Rate limiting preventing blocks
- [ ] Error handling graceful
- [ ] User satisfaction with trends feature
---
## 🚀 Quick Start Implementation
### Step 1: Create Service (2-3 days)
```python
# backend/services/research/trends/google_trends_service.py
class GoogleTrendsService:
    """Interface outline: method bodies are still to be implemented."""

    async def get_interest_over_time(self, keywords, timeframe, geo): ...
    async def get_interest_by_region(self, keywords, geo): ...
    async def get_related_topics(self, keywords, timeframe): ...
    async def get_related_queries(self, keywords, timeframe): ...
    async def get_trending_searches(self, country): ...
```
### Step 2: Integrate with IntentAwareAnalyzer (1-2 days)
- Check for trends in deliverables
- Call Google Trends service
- Merge with AI-extracted trends
### Step 3: Add API Endpoint (1 day)
- `POST /api/research/trends/analyze`
- Return structured trends data
### Step 4: Frontend Integration (2-3 days)
- Add "Analyze Trends" button
- Enhance Trends tab
- Add charts/visualizations
**Total Estimate**: 6-9 days for full implementation
---
## 📝 Next Steps
1. **Approve Approach**: Confirm hybrid approach (automatic + on-demand)
2. **Set Up Dependencies**: Add `pytrends>=4.9.2` to requirements.txt
3. **Create Service**: Start with `google_trends_service.py`
4. **Test Integration**: Test with sample keywords
5. **Frontend Integration**: Add UI components
---
**Status**: Analysis Complete - Ready for Implementation
**Recommended Action**: Start with Phase 1 (Core Service) - create `google_trends_service.py` with proper error handling, caching, and async support.