AI Researcher and Video Studio implementation complete
docs/ALwrity Researcher/GOOGLE_TRENDS_INTEGRATION_ANALYSIS.md
@@ -0,0 +1,578 @@
|
||||
# Google Trends Integration Analysis
|
||||
|
||||
**Date**: 2025-01-29
|
||||
**Status**: Analysis Complete - Ready for Implementation
|
||||
|
||||
---
|
||||
|
||||
## 📋 Executive Summary
|
||||
|
||||
After reviewing the legacy Google Trends implementation and the current Research Engine codebase:
|
||||
|
||||
- ❌ **No Google Trends migration found** in the new codebase
|
||||
- ⚠️ **Legacy implementation has significant issues** (not production-ready)
|
||||
- ✅ **Pytrends offers comprehensive capabilities** that align with user needs
|
||||
- 🎯 **Integration points identified** in the current researcher flow
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Legacy Implementation Review
|
||||
|
||||
### Current Legacy Code Issues
|
||||
|
||||
**File**: `ToBeMigrated/ai_web_researcher/google_trends_researcher.py`
|
||||
|
||||
#### Problems Identified:
|
||||
|
||||
1. **Visualization Issues**:
|
||||
- Uses `matplotlib.pyplot.show()` - not suitable for web/API
|
||||
- No way to return chart data for frontend rendering
|
||||
- Hardcoded visualization that blocks execution
|
||||
|
||||
2. **Error Handling**:
|
||||
- Basic try/except blocks
|
||||
- Returns empty DataFrames on error (silent failures)
|
||||
- No retry logic for rate limiting
|
||||
|
||||
3. **Rate Limiting**:
|
||||
- Random sleeps (`time.sleep(random.uniform(0.1, 0.6))`)
|
||||
- No proper rate limiting strategy
|
||||
- Risk of getting blocked by Google
|
||||
|
||||
4. **Code Quality**:
|
||||
- Mixed concerns (keyword clustering + trends in same file)
|
||||
- Hardcoded timeframes (`'today 1-y'`, `'today 12-m'`)
|
||||
- No configuration management
|
||||
- FIXME comments indicating incomplete features
|
||||
|
||||
5. **Data Structure**:
|
||||
- Returns pandas DataFrames directly
|
||||
- Not serializable for API responses
|
||||
- No standardized response format
|
||||
|
||||
6. **Missing Features**:
|
||||
- No caching strategy
|
||||
- No async support
|
||||
- No integration with subscription system
|
||||
- No user_id tracking
|
||||
|
||||
#### What Works (Can Reuse):
|
||||
|
||||
✅ **Core pytrends usage patterns**:
|
||||
- `TrendReq()` initialization
|
||||
- `build_payload()` method
|
||||
- `interest_over_time()` method
|
||||
- `interest_by_region()` method
|
||||
- `related_topics()` method
|
||||
- `related_queries()` method
|
||||
- `trending_searches()` method
|
||||
|
||||
✅ **Keyword expansion logic**:
|
||||
- Google auto-suggestions fetching
|
||||
- Prefix/suffix expansion
|
||||
- Relevance scoring
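
The prefix/suffix expansion listed above can be sketched without any network access. This is a minimal illustration, not the legacy code: the function name `expand_keyword` and the modifier lists are assumptions chosen for the example.

```python
from typing import Iterable, List


def expand_keyword(seed: str, prefixes: Iterable[str], suffixes: Iterable[str]) -> List[str]:
    """Return the seed plus its prefix/suffix variants, de-duplicated in order."""
    seed = " ".join(seed.split())  # collapse stray whitespace in the seed
    candidates = [seed]
    candidates += [f"{prefix} {seed}" for prefix in prefixes]
    candidates += [f"{seed} {suffix}" for suffix in suffixes]

    seen = set()
    unique: List[str] = []
    for candidate in candidates:
        key = candidate.lower()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(candidate)
    return unique
```

The expanded list could then be scored for relevance before any of it is sent to Google Trends, which keeps the number of rate-limited requests down.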
|
||||
|
||||
✅ **Keyword clustering approach**:
|
||||
- TF-IDF vectorization
|
||||
- K-means clustering
|
||||
- Silhouette scoring
|
||||
|
||||
---
|
||||
|
||||
## 📚 Pytrends Capabilities Review
|
||||
|
||||
### Available Methods (from pytrends library):
|
||||
|
||||
1. **`interest_over_time()`**
|
||||
- Historical indexed data
|
||||
- Shows when keyword was most searched
|
||||
- Returns time series data
|
||||
|
||||
2. **`multirange_interest_over_time()`**
|
||||
- Similar to interest_over_time
|
||||
- Allows analysis across multiple date ranges
|
||||
- Better for comparing different time periods
|
||||
|
||||
3. **`get_historical_interest()`**
|
||||
- Historical hourly data
|
||||
- Sends multiple requests (one week at a time)
|
||||
- More granular than daily data
|
||||
|
||||
4. **`interest_by_region()`**
|
||||
- Geographic interest data
|
||||
- Shows where keyword is most searched
|
||||
- Returns data by country/region
|
||||
|
||||
5. **`related_topics()`**
|
||||
- Related topics to keyword
|
||||
- Returns 'top' and 'rising' topics
|
||||
- Useful for content expansion
|
||||
|
||||
6. **`related_queries()`**
|
||||
- Related search queries
|
||||
- Returns 'top' and 'rising' queries
|
||||
- Great for keyword research
|
||||
|
||||
7. **`trending_searches()`**
|
||||
- Latest trending searches
|
||||
- Country-specific
|
||||
- Real-time trending topics
|
||||
|
||||
8. **`top_charts()`**
|
||||
- Top charts for a given topic
|
||||
- Yearly charts
|
||||
- Category-specific
|
||||
|
||||
9. **`suggestions()`**
|
||||
- Additional suggested keywords
|
||||
- Refines trend search
|
||||
- Auto-complete suggestions
|
||||
|
||||
### Key Parameters:
|
||||
|
||||
- **`timeframe`**: `'today 5-y'` (the pytrends default), `'today 12-m'`, `'all'`, or an explicit range such as `'2024-01-01 2024-12-31'`
|
||||
- **`geo`**: Country code (e.g., 'US', 'GB', 'IN')
|
||||
- **`hl`**: Language (e.g., 'en-US')
|
||||
- **`tz`**: Timezone offset in minutes (e.g., 360 for US Central, UTC-6)
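
To show how these parameters fit together, here is a hedged sketch of a single interest-over-time request. It assumes a client object exposing the two pytrends methods `build_payload()` and `interest_over_time()`. The `TrendsClient` protocol, the function name, and the default values are illustrative, not part of the existing codebase. Google Trends compares at most five terms per payload, so the sketch rejects longer lists.

```python
from typing import Any, List, Protocol

MAX_KEYWORDS = 5  # Google Trends compares at most five terms per request


class TrendsClient(Protocol):
    """The two pytrends.request.TrendReq methods this sketch relies on."""

    def build_payload(self, kw_list: List[str], cat: int = 0,
                      timeframe: str = "today 5-y", geo: str = "",
                      gprop: str = "") -> None: ...

    def interest_over_time(self) -> Any: ...


def fetch_interest_over_time(client: TrendsClient, keywords: List[str],
                             timeframe: str = "today 12-m", geo: str = "US") -> Any:
    """Build the payload, then request the time series (a DataFrame in pytrends)."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not 1 <= len(cleaned) <= MAX_KEYWORDS:
        raise ValueError(f"expected 1-{MAX_KEYWORDS} keywords, got {len(cleaned)}")
    client.build_payload(kw_list=cleaned, timeframe=timeframe, geo=geo)
    return client.interest_over_time()
```

With pytrends installed, the client would typically be created as `TrendReq(hl="en-US", tz=360)` (imported from `pytrends.request`) and passed in. Keeping it as a parameter also lets tests substitute a fake client.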
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Migration Status Check
|
||||
|
||||
### Search Results:
|
||||
|
||||
✅ **No Google Trends implementation found** in:
|
||||
- `backend/services/research/` - No trends service
|
||||
- `backend/api/research/` - No trends endpoints
|
||||
- Current codebase only mentions "trends" as a deliverable type, not actual Google Trends API
|
||||
|
||||
### Current "Trends" References:
|
||||
|
||||
The codebase has:
|
||||
- `ExpectedDeliverable.TRENDS` enum value
|
||||
- `TrendAnalysis` model in `research_intent_models.py`
|
||||
- Intent-aware analyzer that can extract trends from research results
|
||||
- But **NO actual Google Trends API integration**
|
||||
|
||||
**Conclusion**: Google Trends has **NOT been migrated** to the new codebase. The current "trends" feature only extracts trend information from general research results, not from Google Trends API.
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Where to Integrate Google Trends in User Flow
|
||||
|
||||
### Current Researcher Flow:
|
||||
|
||||
```
Step 1: ResearchInput
  ├── User enters keywords/topic
  ├── Clicks "Intent & Options" button
  └── Intent analysis performed

Step 2: IntentConfirmationPanel
  ├── Shows inferred intent (editable)
  ├── Shows suggested queries
  ├── Shows AI-optimized settings
  └── User confirms and clicks "Research"

Step 3: Research Execution
  └── Research runs via Exa/Tavily/Google

Step 4: StepResults (IntentResultsDisplay)
  ├── Summary tab
  ├── Statistics tab
  ├── Expert Quotes tab
  ├── Case Studies tab
  ├── Trends tab (currently shows AI-extracted trends)
  └── Sources tab
```
|
||||
|
||||
### Recommended Integration Points:
|
||||
|
||||
#### Option 1: Automatic Integration (Recommended) ⭐⭐⭐⭐⭐
|
||||
|
||||
**When**: During research execution, if intent includes trends
|
||||
|
||||
**Flow**:
|
||||
1. User enters keywords → Intent analysis
|
||||
2. If intent includes `EXPLORE_TRENDS` purpose OR `TRENDS` deliverable:
|
||||
- Automatically fetch Google Trends data in parallel
|
||||
- Merge with research results
|
||||
3. Display in "Trends" tab with Google Trends data
|
||||
|
||||
**Pros**:
|
||||
- Seamless user experience
|
||||
- No extra clicks
|
||||
- Trends data always available when relevant
|
||||
|
||||
**Cons**:
|
||||
- Additional API call (but can be cached)
|
||||
- Slightly longer execution time
|
||||
|
||||
**Implementation**:
|
||||
- Add to `IntentAwareAnalyzer.analyze()` method
|
||||
- Call Google Trends service if trends in expected_deliverables
|
||||
- Merge Google Trends data with AI-extracted trends
|
||||
|
||||
#### Option 2: On-Demand Button (Alternative) ⭐⭐⭐⭐
|
||||
|
||||
**When**: After intent analysis, show "Analyze Trends" button
|
||||
|
||||
**Flow**:
|
||||
1. User enters keywords → Intent analysis
|
||||
2. `IntentConfirmationPanel` shows "Analyze Trends" button
|
||||
3. User clicks → Fetches Google Trends data
|
||||
4. Shows trends preview in panel
|
||||
5. User proceeds with research
|
||||
|
||||
**Pros**:
|
||||
- User control
|
||||
- Faster initial intent analysis
|
||||
- Can preview trends before research
|
||||
|
||||
**Cons**:
|
||||
- Extra user action
|
||||
- Trends not integrated with research results
|
||||
|
||||
**Implementation**:
|
||||
- Add button to `IntentConfirmationPanel`
|
||||
- Create endpoint: `POST /api/research/trends/analyze`
|
||||
- Show trends preview in panel
|
||||
|
||||
#### Option 3: Separate Trends Tab (Alternative) ⭐⭐⭐
|
||||
|
||||
**When**: Always available as separate action
|
||||
|
||||
**Flow**:
|
||||
1. User enters keywords
|
||||
2. "Trends" button always visible
|
||||
3. Click → Opens trends analysis
|
||||
4. Separate from main research flow
|
||||
|
||||
**Pros**:
|
||||
- Clear separation
|
||||
- Can use independently
|
||||
- Simple UX
|
||||
|
||||
**Cons**:
|
||||
- Not integrated with research
|
||||
- Extra navigation
|
||||
- Less discoverable
|
||||
|
||||
---
|
||||
|
||||
## ✅ Recommended Approach: Hybrid (Option 1 + Option 2)
|
||||
|
||||
### Primary: Automatic Integration
|
||||
|
||||
**For intent-driven research**:
|
||||
- If `purpose == EXPLORE_TRENDS` OR `TRENDS in expected_deliverables`:
|
||||
- Automatically fetch Google Trends data
|
||||
- Include in research results
|
||||
- Display in "Trends" tab
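
The trigger condition above can be expressed as a small predicate. This is a sketch under assumptions: `ExpectedDeliverable.TRENDS` is named in the codebase, but the `ResearchPurpose` enum name, the string values, and the other members shown here are hypothetical stand-ins for whatever the intent models actually define.

```python
from enum import Enum
from typing import Iterable


class ResearchPurpose(str, Enum):  # hypothetical name and values
    EXPLORE_TRENDS = "explore_trends"
    MAKE_DECISION = "make_decision"


class ExpectedDeliverable(str, Enum):  # values are assumed
    TRENDS = "trends"
    STATISTICS = "statistics"
    COMPARISONS = "comparisons"


def should_fetch_google_trends(purpose: ResearchPurpose,
                               deliverables: Iterable[ExpectedDeliverable]) -> bool:
    """True when the intent asks for trends, either as purpose or as deliverable."""
    return (purpose is ResearchPurpose.EXPLORE_TRENDS
            or ExpectedDeliverable.TRENDS in set(deliverables))
```

Keeping the check in one function means the automatic path and the on-demand button can share it, with the button simply bypassing the predicate.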
|
||||
|
||||
### Secondary: On-Demand Button
|
||||
|
||||
**For all research**:
|
||||
- Show "Analyze Trends" button in `IntentConfirmationPanel`
|
||||
- User can click to get trends even if not in intent
|
||||
- Preview trends before research execution
|
||||
|
||||
### User Experience:
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────┐
│ ResearchInput                                           │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Keywords: "AI marketing tools"                    │  │
│  │ [Intent & Options]                                │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ IntentConfirmationPanel                                 │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Intent: make_decision                             │  │
│  │ Deliverables: [comparisons, trends, statistics]   │  │
│  │                                                   │  │
│  │ [Analyze Trends] ← Always available               │  │
│  │ [Research] ← Will auto-include trends             │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ Research Execution                                      │
│ ├── Exa/Tavily/Google search                            │
│ └── Google Trends (if trends in deliverables) ← AUTO    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ IntentResultsDisplay                                    │
│  ┌───────────────────────────────────────────────────┐  │
│  │ [Summary] [Statistics] [Quotes] [Trends] [Sources]│  │
│  │                                                   │  │
│  │ Trends Tab:                                       │  │
│  │ ├── Interest Over Time (Chart)                    │  │
│  │ ├── Interest by Region (Map/Table)                │  │
│  │ ├── Related Topics (Top & Rising)                 │  │
│  │ ├── Related Queries (Top & Rising)                │  │
│  │ └── AI-Extracted Trends (from research)           │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Implementation Plan
|
||||
|
||||
### Phase 1: Core Service (Week 1)
|
||||
|
||||
**Create**: `backend/services/research/trends/google_trends_service.py`
|
||||
|
||||
**Features**:
|
||||
- Interest over time
|
||||
- Interest by region
|
||||
- Related topics
|
||||
- Related queries
|
||||
- Proper error handling
|
||||
- Rate limiting
|
||||
- Caching (24-hour TTL)
|
||||
- Async support
|
||||
|
||||
### Phase 2: Integration (Week 1-2)
|
||||
|
||||
**Enhance**: `IntentAwareAnalyzer`
|
||||
|
||||
**Changes**:
|
||||
- Check if trends in expected_deliverables
|
||||
- Call Google Trends service
|
||||
- Merge with AI-extracted trends
|
||||
- Return enhanced trends data
|
||||
|
||||
### Phase 3: API Endpoint (Week 2)
|
||||
|
||||
**Create**: `POST /api/research/trends/analyze`
|
||||
|
||||
**Purpose**: On-demand trends analysis
|
||||
|
||||
**Request**:
|
||||
```json
{
  "keywords": ["AI marketing tools"],
  "timeframe": "today 12-m",
  "geo": "US"
}
```
|
||||
|
||||
**Response**:
|
||||
```json
{
  "interest_over_time": [...],
  "interest_by_region": [...],
  "related_topics": {
    "top": [...],
    "rising": [...]
  },
  "related_queries": {
    "top": [...],
    "rising": [...]
  }
}
```
|
||||
|
||||
### Phase 4: Frontend Integration (Week 2-3)
|
||||
|
||||
**Enhance**: `IntentConfirmationPanel`
|
||||
- Add "Analyze Trends" button
|
||||
- Show trends preview
|
||||
|
||||
**Enhance**: `IntentResultsDisplay`
|
||||
- Enhance "Trends" tab with Google Trends data
|
||||
- Add charts (interest over time)
|
||||
- Add regional map/table
|
||||
- Show related topics/queries
|
||||
|
||||
---
|
||||
|
||||
## 📊 Data Structure Design
|
||||
|
||||
### Google Trends Response Model
|
||||
|
||||
```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class GoogleTrendsData(BaseModel):
    """Structured Google Trends data."""
    interest_over_time: List[Dict[str, Any]]  # Time series data
    interest_by_region: List[Dict[str, Any]]  # Geographic data
    related_topics: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    related_queries: Dict[str, List[Dict[str, Any]]]  # {top: [...], rising: [...]}
    trending_searches: Optional[List[str]] = None
    timeframe: str
    geo: str
    keywords: List[str]
```
|
||||
|
||||
### Enhanced TrendAnalysis Model
|
||||
|
||||
```python
from typing import Dict, List, Optional

from pydantic import BaseModel

# GoogleTrendsData is the response model defined in the previous block.


class TrendAnalysis(BaseModel):
    """Enhanced trend analysis with Google Trends data."""
    trend: str
    direction: str
    evidence: List[str]
    impact: Optional[str]
    timeline: Optional[str]
    sources: List[str]

    # Google Trends specific
    google_trends_data: Optional[GoogleTrendsData] = None
    interest_score: Optional[float] = None  # 0-100 from Google Trends
    regional_interest: Optional[Dict[str, float]] = None
    related_topics: Optional[List[str]] = None
    related_queries: Optional[List[str]] = None
```
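
One way the new `interest_score` and the existing `direction` field could be filled from the time series is sketched below. The averaging rule, the half-versus-half comparison, the `tolerance` threshold, and the labels `"rising"`, `"declining"`, `"stable"`, and `"unknown"` are all assumptions made for illustration. The document does not specify how these values are derived.

```python
from statistics import mean
from typing import Any, Dict, List, Optional, Tuple


def summarize_interest(points: List[Dict[str, Any]], keyword: str,
                       tolerance: float = 5.0) -> Tuple[Optional[float], str]:
    """Return (average interest, direction) for one keyword's time series.

    Each point is a record such as {"date": "2024-01-07", "<keyword>": 42}.
    """
    values = [float(p[keyword]) for p in points if keyword in p]
    if not values:
        return None, "unknown"

    score = round(mean(values), 1)
    if len(values) < 2:
        return score, "stable"

    # Compare the average of the later half against the earlier half.
    half = len(values) // 2
    delta = mean(values[half:]) - mean(values[:half])
    if delta > tolerance:
        return score, "rising"
    if delta < -tolerance:
        return score, "declining"
    return score, "stable"
```

Because Google Trends values are indexed from 0 to 100 relative to the peak in the requested window, a score computed this way is only comparable between requests that share the same timeframe and geography.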
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Technical Considerations
|
||||
|
||||
### Rate Limiting
|
||||
|
||||
**Pytrends Limitations**:
|
||||
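- Pytrends is an unofficial client for the Google Trends web endpoints, and Google rate-limits those requests
- Target: at most 1 request per second (a conservative working figure; Google does not publish the limit)
- Pytrends only retries failed requests when `retries` and `backoff_factor` are passed to `TrendReq`; it does not throttle requests by default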
|
||||
|
||||
**Our Strategy**:
|
||||
- Cache all trends data (24-hour TTL)
|
||||
- Use async requests with delays
|
||||
- Batch multiple keywords in single request when possible
|
||||
- Implement retry logic with exponential backoff
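
The retry step in this strategy could look like the following sketch. It is a generic helper, not pytrends-specific: the function name, the default delays, and the 10% jitter are assumptions, and the caller decides which exception types count as retryable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], *,
                      retry_on: Tuple[Type[BaseException], ...],
                      max_attempts: int = 4,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

For Google Trends, `retry_on` would hold whatever exception the client raises on an HTTP 429 response. The `sleep` parameter exists so tests can record delays instead of waiting.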
|
||||
|
||||
### Caching Strategy
|
||||
|
||||
```python
# Cache key: f"google_trends:{keyword}:{timeframe}:{geo}"
# TTL: 24 hours (trends don't change frequently)
# Store: Interest over time, related topics/queries
```
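
A minimal in-memory version of this caching scheme is sketched below. It is an assumption-laden illustration: a production service would more likely use a shared store, and the names `trends_cache_key` and `TTLCache` are hypothetical. The key builder extends the single-keyword key above to keyword lists by normalising and sorting them, so equivalent requests share one entry.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


def trends_cache_key(keywords: Iterable[str], timeframe: str, geo: str) -> str:
    """Build a key so that ['B', 'a'] and ['a', 'b'] map to the same entry."""
    normalized = ",".join(sorted({k.strip().lower() for k in keywords if k.strip()}))
    return f"google_trends:{normalized}:{timeframe}:{geo.upper()}"


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = DAY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the stale entry
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
```

The injectable `clock` is there for testing expiry without waiting. Values returned by `get` should be treated as read-only, since the cache hands back the stored object itself.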
|
||||
|
||||
### Error Handling
|
||||
|
||||
- Handle Google blocking (429 errors)
|
||||
- Handle invalid keywords
|
||||
- Handle missing data
|
||||
- Graceful degradation (return partial data if available)
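
Graceful degradation can be sketched as running each section fetch independently and recording failures instead of aborting. The function name and the shape of the error map are assumptions. The section names match the response fields proposed earlier.

```python
from typing import Any, Callable, Dict, Tuple


def collect_sections(
    fetchers: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run every fetcher; return (successful sections, error message per failed section)."""
    data: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
        except Exception as exc:  # one failed section must not sink the rest
            errors[name] = f"{type(exc).__name__}: {exc}"
    return data, errors
```

The caller can then return whatever is in `data` and surface `errors` to logs or to the UI, so a blocked related-queries call does not hide a successful interest-over-time result.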
|
||||
|
||||
### Async Support
|
||||
|
||||
- Run the synchronous pytrends calls in worker threads (for example via `asyncio.to_thread`) so they do not block the event loop
|
||||
- Parallel requests for multiple keywords
|
||||
- Timeout handling (30 seconds max)
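
Since pytrends is built on blocking HTTP calls, the async points above could be approached as in this sketch, which assumes Python 3.9+ for `asyncio.to_thread`. The helper names are hypothetical, and `fetch_one` stands in for any blocking per-keyword lookup.

```python
import asyncio
from typing import Any, Callable, Dict, List


async def run_blocking(fn: Callable[..., Any], *args: Any, timeout: float = 30.0) -> Any:
    """Run a blocking call in a worker thread and stop waiting after `timeout` seconds."""
    return await asyncio.wait_for(asyncio.to_thread(fn, *args), timeout=timeout)


async def fetch_for_keywords(fetch_one: Callable[[str], Any], keywords: List[str],
                             timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch every keyword concurrently; failures are returned as exception objects."""
    results = await asyncio.gather(
        *(run_blocking(fetch_one, keyword, timeout=timeout) for keyword in keywords),
        return_exceptions=True,
    )
    return dict(zip(keywords, results))
```

Two caveats apply. A timeout only stops the coroutine from waiting, and the worker thread still runs to completion in the background. Fully parallel requests also work against the rate-limiting strategy, so in practice concurrency would need to be capped.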
|
||||
|
||||
---
|
||||
|
||||
## 📈 User Value
|
||||
|
||||
### For Content Creators:
|
||||
|
||||
1. **Timing Optimization**:
|
||||
- See interest over time to time publication
|
||||
- Identify peak interest periods
|
||||
- Avoid publishing during low-interest periods
|
||||
|
||||
2. **Regional Targeting**:
|
||||
- See which regions have highest interest
|
||||
- Tailor content for specific markets
|
||||
- Discover new audience opportunities
|
||||
|
||||
3. **Content Expansion**:
|
||||
- Related topics → new article ideas
|
||||
- Related queries → FAQ sections
|
||||
- Rising topics → timely content opportunities
|
||||
|
||||
### For Digital Marketers:
|
||||
|
||||
1. **Campaign Planning**:
|
||||
- Trending searches → campaign topics
|
||||
- Interest by region → geo-targeting
|
||||
- Related queries → ad keywords
|
||||
|
||||
2. **SEO Strategy**:
|
||||
- Related queries → long-tail keywords
|
||||
- Rising topics → content opportunities
|
||||
- Interest trends → content calendar
|
||||
|
||||
### For Solopreneurs:
|
||||
|
||||
1. **Market Research**:
|
||||
- Interest trends → market validation
|
||||
- Regional data → market expansion
|
||||
- Related topics → competitive landscape
|
||||
|
||||
---
|
||||
|
||||
## ✅ Success Criteria
|
||||
|
||||
- [ ] Google Trends service created and tested
|
||||
- [ ] Automatic integration working (when trends in intent)
|
||||
- [ ] On-demand button working in IntentConfirmationPanel
|
||||
- [ ] Trends tab enhanced with Google Trends data
|
||||
- [ ] Charts displaying correctly (interest over time)
|
||||
- [ ] Regional data displaying correctly
|
||||
- [ ] Caching working (24-hour TTL)
|
||||
- [ ] Rate limiting preventing blocks
|
||||
- [ ] Error handling graceful
|
||||
- [ ] User satisfaction with trends feature
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Quick Start Implementation
|
||||
|
||||
### Step 1: Create Service (2-3 days)
|
||||
|
||||
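```python
# backend/services/research/trends/google_trends_service.py
from typing import Any, Dict, List


class GoogleTrendsService:
    """Proposed interface; the method bodies are still to be implemented."""

    async def get_interest_over_time(self, keywords: List[str], timeframe: str, geo: str) -> List[Dict[str, Any]]: ...
    async def get_interest_by_region(self, keywords: List[str], geo: str) -> List[Dict[str, Any]]: ...
    async def get_related_topics(self, keywords: List[str], timeframe: str) -> Dict[str, Any]: ...
    async def get_related_queries(self, keywords: List[str], timeframe: str) -> Dict[str, Any]: ...
    async def get_trending_searches(self, country: str) -> List[str]: ...
```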
|
||||
|
||||
### Step 2: Integrate with IntentAwareAnalyzer (1-2 days)
|
||||
|
||||
- Check for trends in deliverables
|
||||
- Call Google Trends service
|
||||
- Merge with AI-extracted trends
|
||||
|
||||
### Step 3: Add API Endpoint (1 day)
|
||||
|
||||
- `POST /api/research/trends/analyze`
|
||||
- Return structured trends data
|
||||
|
||||
### Step 4: Frontend Integration (2-3 days)
|
||||
|
||||
- Add "Analyze Trends" button
|
||||
- Enhance Trends tab
|
||||
- Add charts/visualizations
|
||||
|
||||
**Total Estimate**: 6-9 days for full implementation
|
||||
|
||||
---
|
||||
|
||||
## 📝 Next Steps
|
||||
|
||||
1. **Approve Approach**: Confirm hybrid approach (automatic + on-demand)
|
||||
2. **Set Up Dependencies**: Add `pytrends>=4.9.2` to requirements.txt
|
||||
3. **Create Service**: Start with `google_trends_service.py`
|
||||
4. **Test Integration**: Test with sample keywords
|
||||
5. **Frontend Integration**: Add UI components
|
||||
|
||||
---
|
||||
|
||||
**Status**: Analysis Complete - Ready for Implementation
|
||||
|
||||
**Recommended Action**: Start with Phase 1 (Core Service) - create `google_trends_service.py` with proper error handling, caching, and async support.
|
||||