# Google Trends Integration Analysis

**Date**: 2025-01-29
**Status**: Analysis Complete - Ready for Implementation

---

## 📋 Executive Summary

After reviewing the legacy Google Trends implementation and the current Research Engine codebase:

- ❌ **No Google Trends migration found** in the new codebase
- ⚠️ **Legacy implementation has significant issues** (not production-ready)
- ✅ **Pytrends offers comprehensive capabilities** that align with user needs
- 🎯 **Integration points identified** in the current researcher flow

---

## 🔍 Legacy Implementation Review

### Current Legacy Code Issues

**File**: `ToBeMigrated/ai_web_researcher/google_trends_researcher.py`

#### Problems Identified:

1. **Visualization Issues**:
   - Uses `matplotlib.pyplot.show()` - not suitable for web/API
   - No way to return chart data for frontend rendering
   - Hardcoded visualization that blocks execution

2. **Error Handling**:
   - Basic try/except blocks
   - Returns empty DataFrames on error (silent failures)
   - No retry logic for rate limiting

3. **Rate Limiting**:
   - Random sleeps (`time.sleep(random.uniform(0.1, 0.6))`)
   - No proper rate limiting strategy
   - Risk of getting blocked by Google

4. **Code Quality**:
   - Mixed concerns (keyword clustering + trends in same file)
   - Hardcoded timeframes (`'today 1-y'`, `'today 12-m'`)
   - No configuration management
   - FIXME comments indicating incomplete features

5. **Data Structure**:
   - Returns pandas DataFrames directly
   - Not serializable for API responses
   - No standardized response format

6. **Missing Features**:
   - No caching strategy
   - No async support
   - No integration with subscription system
   - No user_id tracking

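Issues 2 and 5 above (silent empty results, no standardized response format) could be addressed with a small response envelope. The following is only a sketch: `TrendsResult` and `wrap_rows` are hypothetical names, not part of the legacy code or of pytrends, and the three status values are an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TrendsResult:
    """Hypothetical response envelope; the name and fields are assumptions."""
    status: str  # "ok", "empty", or "error"
    data: List[Dict[str, Any]] = field(default_factory=list)
    error: Optional[str] = None

    def to_json(self) -> str:
        # Plain dicts and lists serialize directly, unlike a DataFrame.
        return json.dumps(asdict(self))


def wrap_rows(
    rows: Optional[List[Dict[str, Any]]],
    error: Optional[Exception] = None,
) -> TrendsResult:
    """Distinguish a failed call from a call that legitimately returned no rows."""
    if error is not None:
        return TrendsResult(status="error", error=str(error))
    if not rows:
        return TrendsResult(status="empty")
    return TrendsResult(status="ok", data=rows)
```

With pandas, the `rows` argument could come from `df.reset_index().to_dict(orient="records")` once timestamps have been converted to ISO strings.
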
#### What Works (Can Reuse):

✅ **Core pytrends usage patterns**:
- `TrendReq()` initialization
- `build_payload()` method
- `interest_over_time()` method
- `interest_by_region()` method
- `related_topics()` method
- `related_queries()` method
- `trending_searches()` method

✅ **Keyword expansion logic**:
- Google auto-suggestions fetching
- Prefix/suffix expansion
- Relevance scoring

✅ **Keyword clustering approach**:
- TF-IDF vectorization
- K-means clustering
- Silhouette scoring

---

## 📚 Pytrends Capabilities Review

### Available Methods (from pytrends library):

1. **`interest_over_time()`**
   - Historical indexed data
   - Shows when keyword was most searched
   - Returns time series data

2. **`multirange_interest_over_time()`**
   - Similar to `interest_over_time()`
   - Allows analysis across multiple date ranges
   - Better for comparing different time periods

3. **`get_historical_interest()`** (historical hourly interest)
   - Historical hourly data
   - Sends multiple requests (one week at a time)
   - More granular than daily data

4. **`interest_by_region()`**
   - Geographic interest data
   - Shows where keyword is most searched
   - Returns data by country/region

5. **`related_topics()`**
   - Related topics to keyword
   - Returns 'top' and 'rising' topics
   - Useful for content expansion

6. **`related_queries()`**
   - Related search queries
   - Returns 'top' and 'rising' queries
   - Great for keyword research

7. **`trending_searches()`**
   - Latest trending searches
   - Country-specific
   - Real-time trending topics

8. **`top_charts()`**
   - Top charts for a given topic
   - Yearly charts
   - Category-specific

9. **`suggestions()`**
   - Additional suggested keywords
   - Refines trend search
   - Auto-complete suggestions

### Key Parameters:

- **`timeframe`**: `'today 12-m'`, `'today 5-y'`, `'all'`, or a custom date range such as `'2024-01-01 2024-12-31'`
- **`geo`**: Country code (e.g., `'US'`, `'GB'`, `'IN'`)
- **`hl`**: Host language (e.g., `'en-US'`)
- **`tz`**: Timezone offset in minutes (e.g., `360` for UTC-6)

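The custom-date form of `timeframe` and the sign convention of `tz` are easy to get wrong, so a pair of small helpers may be useful. This is a sketch only: `build_timeframe` and `tz_offset_minutes` are hypothetical helpers, and the sign handling simply follows the "360 for UTC-6" example above.

```python
from datetime import date


def build_timeframe(start: date, end: date) -> str:
    """Format a custom date range as 'YYYY-MM-DD YYYY-MM-DD'."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()} {end.isoformat()}"


def tz_offset_minutes(utc_offset_hours: float) -> int:
    """Convert a UTC offset in hours to the minutes value used for tz.

    Assumption: the sign is inverted relative to the UTC offset,
    so UTC-6 becomes 360.
    """
    return int(round(-utc_offset_hours * 60))
```

For example, `build_timeframe(date(2024, 1, 1), date(2024, 12, 31))` yields `'2024-01-01 2024-12-31'`.
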

---

## 🔍 Migration Status Check

### Search Results:

✅ **No Google Trends implementation found** in:
- `backend/services/research/` - No trends service
- `backend/api/research/` - No trends endpoints
- Current codebase only mentions "trends" as a deliverable type, not actual Google Trends API

### Current "Trends" References:

The codebase has:
- `ExpectedDeliverable.TRENDS` enum value
- `TrendAnalysis` model in `research_intent_models.py`
- Intent-aware analyzer that can extract trends from research results
- But **NO actual Google Trends API integration**

**Conclusion**: Google Trends has **NOT been migrated** to the new codebase. The current "trends" feature only extracts trend information from general research results, not from Google Trends API.

---

## 🎯 Where to Integrate Google Trends in User Flow

### Current Researcher Flow:

```
Step 1: ResearchInput
  ├── User enters keywords/topic
  ├── Clicks "Intent & Options" button
  └── Intent analysis performed

Step 2: IntentConfirmationPanel
  ├── Shows inferred intent (editable)
  ├── Shows suggested queries
  ├── Shows AI-optimized settings
  └── User confirms and clicks "Research"

Step 3: Research Execution
  └── Research runs via Exa/Tavily/Google

Step 4: StepResults (IntentResultsDisplay)
  ├── Summary tab
  ├── Statistics tab
  ├── Expert Quotes tab
  ├── Case Studies tab
  ├── Trends tab (currently shows AI-extracted trends)
  └── Sources tab
```

### Recommended Integration Points:

#### Option 1: Automatic Integration (Recommended) ⭐⭐⭐⭐⭐

**When**: During research execution, if intent includes trends

**Flow**:
1. User enters keywords → Intent analysis
2. If intent includes `EXPLORE_TRENDS` purpose OR `TRENDS` deliverable:
   - Automatically fetch Google Trends data in parallel
   - Merge with research results
3. Display in "Trends" tab with Google Trends data

**Pros**:
- Seamless user experience
- No extra clicks
- Trends data always available when relevant

**Cons**:
- Additional API call (but can be cached)
- Slightly longer execution time

**Implementation**:
- Add to `IntentAwareAnalyzer.analyze()` method
- Call Google Trends service if trends in expected_deliverables
- Merge Google Trends data with AI-extracted trends

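The parallel fetch and merge in Option 1 might look roughly like the sketch below. `run_search` and `fetch_trends` are stand-ins invented for illustration (they are not existing functions in the codebase), and the merged dictionary shape is an assumption.

```python
import asyncio
from typing import Any, Dict, List


async def run_search(query: str) -> List[str]:
    """Stand-in for the existing Exa/Tavily/Google research call."""
    await asyncio.sleep(0.01)
    return [f"result for {query}"]


async def fetch_trends(keywords: List[str]) -> Dict[str, Any]:
    """Stand-in for the proposed Google Trends service."""
    await asyncio.sleep(0.01)
    return {"keywords": keywords, "interest_over_time": []}


async def research_with_trends(query: str, wants_trends: bool) -> Dict[str, Any]:
    """Run research and, when requested, the trends fetch concurrently."""
    tasks = [run_search(query)]
    if wants_trends:
        tasks.append(fetch_trends([query]))
    # return_exceptions=True keeps a trends failure from sinking the research.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    search = results[0]
    if isinstance(search, BaseException):
        raise search
    merged: Dict[str, Any] = {"sources": search, "google_trends": None}
    if wants_trends and not isinstance(results[1], BaseException):
        merged["google_trends"] = results[1]
    return merged
```

Because both coroutines run under one `asyncio.gather`, the added latency is bounded by the slower of the two calls rather than their sum, which is the reason the "slightly longer execution time" con stays small.
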
#### Option 2: On-Demand Button (Alternative) ⭐⭐⭐⭐

**When**: After intent analysis, show "Analyze Trends" button

**Flow**:
1. User enters keywords → Intent analysis
2. `IntentConfirmationPanel` shows "Analyze Trends" button
3. User clicks → Fetches Google Trends data
4. Shows trends preview in panel
5. User proceeds with research

**Pros**:
- User control
- Faster initial intent analysis
- Can preview trends before research

**Cons**:
- Extra user action
- Trends not integrated with research results

**Implementation**:
- Add button to `IntentConfirmationPanel`
- Create endpoint: `POST /api/research/trends/analyze`
- Show trends preview in panel

#### Option 3: Separate Trends Tab (Alternative) ⭐⭐⭐

**When**: Always available as separate action

**Flow**:
1. User enters keywords
2. "Trends" button always visible
3. Click → Opens trends analysis
4. Separate from main research flow

**Pros**:
- Clear separation
- Can use independently
- Simple UX

**Cons**:
- Not integrated with research
- Extra navigation
- Less discoverable

---

## ✅ Recommended Approach: Hybrid (Option 1 + Option 2)

### Primary: Automatic Integration

**For intent-driven research**:
- If `purpose == EXPLORE_TRENDS` OR `TRENDS in expected_deliverables`:
  - Automatically fetch Google Trends data
  - Include in research results
  - Display in "Trends" tab

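The trigger condition for the automatic path is a one-line predicate. In this sketch, plain strings stand in for the real `EXPLORE_TRENDS` and `TRENDS` enum members, and `should_fetch_trends` is a hypothetical helper name.

```python
from typing import Iterable


def should_fetch_trends(purpose: str, expected_deliverables: Iterable[str]) -> bool:
    """True when the intent asks for trends via purpose or deliverables."""
    # Plain strings stand in for the real enum members here (an assumption).
    return purpose == "explore_trends" or "trends" in set(expected_deliverables)
```

With the example intent used later in this document (`make_decision` with deliverables `comparisons`, `trends`, `statistics`), the predicate is true because of the deliverable, not the purpose.
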
### Secondary: On-Demand Button

**For all research**:
- Show "Analyze Trends" button in `IntentConfirmationPanel`
- User can click to get trends even if not in intent
- Preview trends before research execution

### User Experience:

```
┌─────────────────────────────────────────────────────────┐
│  ResearchInput                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Keywords: "AI marketing tools"                    │  │
│  │ [Intent & Options]                                │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  IntentConfirmationPanel                                │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Intent: make_decision                             │  │
│  │ Deliverables: [comparisons, trends, statistics]   │  │
│  │                                                   │  │
│  │ [Analyze Trends] ← Always available               │  │
│  │ [Research] ← Will auto-include trends             │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  Research Execution                                     │
│  ├── Exa/Tavily/Google search                           │
│  └── Google Trends (if trends in deliverables) ← AUTO   │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│  IntentResultsDisplay                                   │
│  ┌───────────────────────────────────────────────────┐  │
│  │ [Summary] [Statistics] [Quotes] [Trends] [Sources]│  │
│  │                                                   │  │
│  │ Trends Tab:                                       │  │
│  │ ├── Interest Over Time (Chart)                    │  │
│  │ ├── Interest by Region (Map/Table)                │  │
│  │ ├── Related Topics (Top & Rising)                 │  │
│  │ ├── Related Queries (Top & Rising)                │  │
│  │ └── AI-Extracted Trends (from research)           │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```

---

## 🏗️ Implementation Plan

### Phase 1: Core Service (Week 1)

**Create**: `backend/services/research/trends/google_trends_service.py`

**Features**:
- Interest over time
- Interest by region
- Related topics
- Related queries
- Proper error handling
- Rate limiting
- Caching (24-hour TTL)
- Async support

### Phase 2: Integration (Week 1-2)

**Enhance**: `IntentAwareAnalyzer`

**Changes**:
- Check if trends in expected_deliverables
- Call Google Trends service
- Merge with AI-extracted trends
- Return enhanced trends data

### Phase 3: API Endpoint (Week 2)

**Create**: `POST /api/research/trends/analyze`

**Purpose**: On-demand trends analysis

**Request**:
```json
{
  "keywords": ["AI marketing tools"],
  "timeframe": "today 12-m",
  "geo": "US"
}
```

**Response**:
```json
{
  "interest_over_time": [...],
  "interest_by_region": [...],
  "related_topics": {
    "top": [...],
    "rising": [...]
  },
  "related_queries": {
    "top": [...],
    "rising": [...]
  }
}
```

### Phase 4: Frontend Integration (Week 2-3)

**Enhance**: `IntentConfirmationPanel`
- Add "Analyze Trends" button
- Show trends preview

**Enhance**: `IntentResultsDisplay`
- Enhance "Trends" tab with Google Trends data
- Add charts (interest over time)
- Add regional map/table
- Show related topics/queries

---

## 📊 Data Structure Design

### Google Trends Response Model

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class GoogleTrendsData(BaseModel):
    """Structured Google Trends data."""
    interest_over_time: List[Dict[str, Any]]  # Time series data
    interest_by_region: List[Dict[str, Any]]  # Geographic data
    related_topics: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    related_queries: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    trending_searches: Optional[List[str]] = None
    timeframe: str
    geo: str
    keywords: List[str]
```

### Enhanced TrendAnalysis Model

```python
# Uses the same imports as GoogleTrendsData, which must be defined first.
class TrendAnalysis(BaseModel):
    """Enhanced trend analysis with Google Trends data."""
    trend: str
    direction: str
    evidence: List[str]
    impact: Optional[str] = None  # Explicit default keeps the field optional
    timeline: Optional[str] = None
    sources: List[str]

    # Google Trends specific
    google_trends_data: Optional[GoogleTrendsData] = None
    interest_score: Optional[float] = None  # 0-100 from Google Trends
    regional_interest: Optional[Dict[str, float]] = None
    related_topics: Optional[List[str]] = None
    related_queries: Optional[List[str]] = None
```

---

## 🔧 Technical Considerations

### Rate Limiting

**Pytrends Limitations**:
- pytrends is an unofficial client for the Google Trends web endpoints, and those endpoints are rate-limited
- Recommended starting point: 1 request per second (Google does not publish the actual limit)
- Pytrends handles some retrying internally through the `retries` and `backoff_factor` arguments of `TrendReq`

**Our Strategy**:
- Cache all trends data (24-hour TTL)
- Use async requests with delays
- Batch multiple keywords into a single request when possible (a payload accepts up to five keywords)
- Implement retry logic with exponential backoff

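Retry with exponential backoff, the last item in the strategy above, can be sketched as follows. `RateLimitedError` is a hypothetical exception the service would raise when it sees an HTTP 429; the attempt count, base delay, and 10% jitter are illustrative defaults rather than tuned values.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error the service would raise on an HTTP 429."""


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> T:
    """Retry `call` on RateLimitedError, doubling the delay each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Delay grows 1x, 2x, 4x ... of base_delay, plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

Taking a zero-argument callable keeps the helper independent of the service method signatures, so the same wrapper can guard every trends call.
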
### Caching Strategy

```python
# Cache key: f"google_trends:{keyword}:{timeframe}:{geo}"
# TTL: 24 hours (trends don't change frequently)
# Store: Interest over time, related topics/queries
```

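The key format and TTL above can be illustrated with a tiny in-memory cache. This is a sketch under assumptions: `TTLCache` and `trends_cache_key` are hypothetical names, normalizing the keyword to lowercase is an added assumption, and a real deployment would more likely use whatever shared cache the backend already has.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def trends_cache_key(keyword: str, timeframe: str, geo: str) -> str:
    """Build the cache key; lowercasing the keyword is an assumption."""
    return f"google_trends:{keyword.strip().lower()}:{timeframe}:{geo}"


class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so expiry can be tested without sleeping.
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # Drop the stale entry.
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
```
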
### Error Handling

- Handle Google blocking (429 errors)
- Handle invalid keywords
- Handle missing data
- Graceful degradation (return partial data if available)

### Async Support

- Use `asyncio` for non-blocking requests
- Parallel requests for multiple keywords
- Timeout handling (30 seconds max)

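Graceful degradation and the 30-second timeout can be combined in one helper that awaits each section independently. This is a rough sketch: `gather_partial` is a hypothetical name, and mapping a failed or timed-out section to `None` is one possible convention, not a decided one.

```python
import asyncio
from typing import Any, Awaitable, Dict


async def gather_partial(
    sections: Dict[str, Awaitable[Any]], timeout: float = 30.0
) -> Dict[str, Any]:
    """Await every section concurrently; failed or slow sections become None."""

    async def guarded(aw: Awaitable[Any]) -> Any:
        try:
            return await asyncio.wait_for(aw, timeout=timeout)
        except Exception:  # Includes the timeout raised by wait_for.
            return None

    names = list(sections)
    values = await asyncio.gather(*(guarded(sections[n]) for n in names))
    return dict(zip(names, values))
```

Since each section gets its own `wait_for`, one slow call (for example related topics) does not prevent interest-over-time data from being returned.
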

---

## 📈 User Value

### For Content Creators:

1. **Timing Optimization**:
   - See interest over time to time publication
   - Identify peak interest periods
   - Avoid publishing during low-interest periods

2. **Regional Targeting**:
   - See which regions have highest interest
   - Tailor content for specific markets
   - Discover new audience opportunities

3. **Content Expansion**:
   - Related topics → new article ideas
   - Related queries → FAQ sections
   - Rising topics → timely content opportunities

### For Digital Marketers:

1. **Campaign Planning**:
   - Trending searches → campaign topics
   - Interest by region → geo-targeting
   - Related queries → ad keywords

2. **SEO Strategy**:
   - Related queries → long-tail keywords
   - Rising topics → content opportunities
   - Interest trends → content calendar

### For Solopreneurs:

1. **Market Research**:
   - Interest trends → market validation
   - Regional data → market expansion
   - Related topics → competitive landscape

---

## ✅ Success Criteria

- [ ] Google Trends service created and tested
- [ ] Automatic integration working (when trends in intent)
- [ ] On-demand button working in IntentConfirmationPanel
- [ ] Trends tab enhanced with Google Trends data
- [ ] Charts displaying correctly (interest over time)
- [ ] Regional data displaying correctly
- [ ] Caching working (24-hour TTL)
- [ ] Rate limiting preventing blocks
- [ ] Error handling graceful
- [ ] User satisfaction with trends feature

---

## 🚀 Quick Start Implementation

### Step 1: Create Service (2-3 days)

```python
# backend/services/research/trends/google_trends_service.py
from typing import List


class GoogleTrendsService:
    """Method stubs for the planned service; bodies are left empty on purpose."""

    async def get_interest_over_time(self, keywords: List[str], timeframe: str, geo: str): ...
    async def get_interest_by_region(self, keywords: List[str], geo: str): ...
    async def get_related_topics(self, keywords: List[str], timeframe: str): ...
    async def get_related_queries(self, keywords: List[str], timeframe: str): ...
    async def get_trending_searches(self, country: str): ...
```

### Step 2: Integrate with IntentAwareAnalyzer (1-2 days)

- Check for trends in deliverables
- Call Google Trends service
- Merge with AI-extracted trends

### Step 3: Add API Endpoint (1 day)

- `POST /api/research/trends/analyze`
- Return structured trends data

### Step 4: Frontend Integration (2-3 days)

- Add "Analyze Trends" button
- Enhance Trends tab
- Add charts/visualizations

**Total Estimate**: 6-9 days for full implementation

---

## 📝 Next Steps

1. **Approve Approach**: Confirm hybrid approach (automatic + on-demand)
2. **Set Up Dependencies**: Add `pytrends>=4.9.2` to requirements.txt
3. **Create Service**: Start with `google_trends_service.py`
4. **Test Integration**: Test with sample keywords
5. **Frontend Integration**: Add UI components

---

**Status**: Analysis Complete - Ready for Implementation

**Recommended Action**: Start with Phase 1 (Core Service) - create `google_trends_service.py` with proper error handling, caching, and async support.