# Legacy Features Migration Analysis

**Date**: 2025-01-29

**Status**: Analysis Complete - Ready for Implementation Planning

---

## 📋 Executive Summary

After reviewing the legacy `ai_web_researcher` folder, I've identified **high-value features** that would significantly enhance the Research Engine for content creators, digital marketing professionals, and solopreneurs. This document provides a prioritized migration plan.

**Key Finding**: Several legacy features address critical gaps in the current Research Engine, particularly around **trend analysis**, **keyword research**, and **competitive intelligence**.

---

## 🎯 User Value Assessment

### Content Creators Need:

- ✅ **Trending topics** to create timely content
- ✅ **Keyword research** to optimize for SEO
- ✅ **Related queries** to expand content ideas
- ✅ **Interest over time** to schedule content publication
- ✅ **Regional insights** to target specific audiences

### Digital Marketing Professionals Need:

- ✅ **SERP analysis** to understand competition
- ✅ **People Also Ask** to optimize content structure
- ✅ **Trending searches** for campaign planning
- ✅ **Keyword clustering** for content strategy
- ✅ **Competitor analysis** via web crawling

### Solopreneurs Need:

- ✅ **Quick trend insights** without expensive tools
- ✅ **Keyword suggestions** for content planning
- ✅ **Market research** for business decisions
- ✅ **Academic research** for thought leadership
- ✅ **Financial data** for business content

---

## 🔍 Legacy Features Analysis

### 1. Google Trends Researcher ⭐⭐⭐⭐⭐ (HIGHEST PRIORITY)

**File**: `google_trends_researcher.py`

**Features**:

- Interest over time analysis
- Interest by region
- Related topics (top & rising)
- Related queries (top & rising)
- Trending searches (country-specific)
- Realtime trends
- Keyword auto-suggestions expansion
- Keyword clustering (K-means with TF-IDF)
- Google auto-suggestions with relevance scores

**Value for Users**:

- **Content Creators**: Identify trending topics, optimal publication timing, regional targeting
- **Marketers**: Campaign planning, audience insights, keyword opportunities
- **Solopreneurs**: Market research, content calendar planning, audience discovery

**Migration Priority**: **P0 - Critical**

**Integration Points**:

- Add to `IntentAwareAnalyzer` as a deliverable type: `trends_analysis`
- Create new service: `backend/services/research/trends/google_trends_service.py`
- Add endpoint: `POST /api/research/trends/analyze`
- Add to `IntentResultsDisplay` as new tab: "Trends"

**Implementation Complexity**: Medium (requires pytrends integration and rate limiting)

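A minimal sketch of what `google_trends_service.py` could start from, assuming the pytrends `TrendReq` client. `InterestSummary`, `summarize_interest`, `fetch_interest_over_time` and the 10% rising/falling threshold are hypothetical. pytrends is imported lazily inside the fetch function so the summary logic can be unit-tested without network access. The summary turns an interest-over-time series into the "optimal publication timing" signal described above: it reports the peak and compares the last quarter of the series with the first.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class InterestSummary:
    keyword: str
    peak_date: date
    peak_value: int
    direction: str  # "rising", "falling" or "flat"


def summarize_interest(keyword: str, points: List[Tuple[date, int]]) -> InterestSummary:
    """Report the peak and compare the mean of the last quarter of the series with the first quarter."""
    if not points:
        raise ValueError("no interest-over-time data")
    peak_date, peak_value = max(points, key=lambda p: p[1])
    n = max(1, len(points) // 4)
    first = sum(v for _, v in points[:n]) / n
    last = sum(v for _, v in points[-n:]) / n
    if last > first * 1.1:
        direction = "rising"
    elif last < first * 0.9:
        direction = "falling"
    else:
        direction = "flat"
    return InterestSummary(keyword, peak_date, peak_value, direction)


def fetch_interest_over_time(keyword: str, timeframe: str = "today 12-m", geo: str = "") -> List[Tuple[date, int]]:
    """Fetch relative interest (0-100) via pytrends; imported lazily to keep the summary logic dependency-free."""
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=360)
    pytrends.build_payload([keyword], timeframe=timeframe, geo=geo)
    df = pytrends.interest_over_time()  # pandas DataFrame indexed by date, one column per keyword
    if df.empty:
        return []
    return [(ts.date(), int(value)) for ts, value in df[keyword].items()]
```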

---

### 2. Google SERP Search ⭐⭐⭐⭐ (HIGH PRIORITY)

**File**: `google_serp_search.py`

**Features**:

- Organic search results with position tracking
- People Also Ask (PAA) extraction
- Related Searches extraction
- Serper.dev integration (fallback to SerpApi)

**Value for Users**:

- **Content Creators**: Understand search competition, find content gaps, optimize for featured snippets
- **Marketers**: SEO analysis, content gap identification, competitor research
- **Solopreneurs**: Understand search landscape, find opportunities

**Migration Priority**: **P1 - High**

**Integration Points**:

- Enhance `ResearchEngine` with SERP analysis
- Add to `IntentAwareAnalyzer` deliverables: `serp_analysis`, `people_also_ask`, `related_searches`
- Create service: `backend/services/research/serp/google_serp_service.py`
- Add to results: SERP insights section

**Implementation Complexity**: Low (Serper.dev API is straightforward)

**Note**: The current system uses Google/Gemini grounding, whereas SERP data provides structured competitive information (organic positions, People Also Ask, related searches).

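A sketch of how `google_serp_service.py` might normalise a Serper.dev response into the PAA, Related Searches and organic-position data listed above. The endpoint URL, the `X-API-KEY` header and the response keys (`organic`, `peopleAlsoAsk`, `relatedSearches`) reflect our understanding of the Serper.dev API and should be checked against its documentation; `SerpInsights`, `parse_serper_response` and `fetch_serp` are hypothetical names. Parsing is kept separate from the HTTP call so it can be unit-tested against a stored response.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Any, Dict, List

SERPER_URL = "https://google.serper.dev/search"  # assumption: verify against the Serper.dev docs


@dataclass
class SerpInsights:
    organic: List[Dict[str, Any]] = field(default_factory=list)
    people_also_ask: List[str] = field(default_factory=list)
    related_searches: List[str] = field(default_factory=list)


def parse_serper_response(payload: Dict[str, Any]) -> SerpInsights:
    """Map the assumed Serper keys (organic, peopleAlsoAsk, relatedSearches) onto our own model."""
    organic = [
        # Fall back to list order if a result carries no explicit position.
        {"position": r.get("position", i + 1), "title": r.get("title", ""), "link": r.get("link", "")}
        for i, r in enumerate(payload.get("organic", []))
    ]
    paa = [q["question"] for q in payload.get("peopleAlsoAsk", []) if q.get("question")]
    related = [r["query"] for r in payload.get("relatedSearches", []) if r.get("query")]
    return SerpInsights(organic, paa, related)


def fetch_serp(query: str, api_key: str, num: int = 10) -> SerpInsights:
    """POST the query to Serper.dev and parse the JSON reply (network call; not exercised in tests)."""
    body = json.dumps({"q": query, "num": num}).encode("utf-8")
    req = urllib.request.Request(
        SERPER_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return parse_serper_response(json.load(resp))
```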

---

### 3. Keyword Research & Clustering ⭐⭐⭐⭐ (HIGH PRIORITY)

**File**: `google_trends_researcher.py` (keyword functions)

**Features**:

- Google auto-suggestions expansion (prefixes & suffixes)
- Keyword clustering using K-means + TF-IDF
- Relevance scoring
- Keyword grouping by themes

**Value for Users**:

- **Content Creators**: Content cluster strategy, keyword expansion, topic grouping
- **Marketers**: SEO keyword research, content pillar planning, keyword mapping
- **Solopreneurs**: Content planning, SEO optimization

**Migration Priority**: **P1 - High**

**Integration Points**:

- Enhance `UnifiedResearchAnalyzer` to include keyword expansion
- Add to `IntentAwareAnalyzer`: `keyword_clusters`, `expanded_keywords`
- Create service: `backend/services/research/keywords/keyword_research_service.py`
- Add to `ResearchInput`: "Expand Keywords" button
- Display in results: Keyword clusters visualization

**Implementation Complexity**: Medium (requires scikit-learn for TF-IDF vectorization and K-means clustering)

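To make the K-means + TF-IDF approach concrete, the sketch below clusters keyword phrases in pure Python. It is illustrative only: the service itself would use scikit-learn's `TfidfVectorizer` and `KMeans` as planned. The IDF formula mirrors scikit-learn's smoothed default, but initialisation here is a simple deterministic farthest-point choice rather than k-means++, and `tfidf_vectors`, `kmeans` and `cluster_keywords` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(keywords: List[str]) -> List[Dict[str, float]]:
    """Return one sparse, L2-normalised TF-IDF vector per keyword phrase."""
    docs = [kw.lower().split() for kw in keywords]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF in the same form as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def _sq_dist(a: Dict[str, float], b: Dict[str, float]) -> float:
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors: List[Dict[str, float]], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: _sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                terms = set().union(*members)
                centroids[j] = {t: sum(m.get(t, 0.0) for m in members) / len(members) for t in terms}
    return labels


def cluster_keywords(keywords: List[str], k: int) -> Dict[int, List[str]]:
    labels = kmeans(tfidf_vectors(keywords), k)
    clusters: Dict[int, List[str]] = {}
    for kw, lab in zip(keywords, labels):
        clusters.setdefault(lab, []).append(kw)
    return clusters
```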

---

### 4. ArXiv Scholarly Research ⭐⭐⭐ (MEDIUM PRIORITY)

**File**: `arxiv_schlorly_research.py`

**Features**:

- Academic paper search
- Citation network analysis
- Paper clustering by topic
- Research paper metadata extraction
- AI-powered query expansion for academic searches

**Value for Users**:

- **Content Creators**: Thought leadership content, data-backed articles, research citations
- **Marketers**: B2B content, whitepapers, authoritative sources
- **Solopreneurs**: Expert positioning, research-backed content

**Migration Priority**: **P2 - Medium**

**Integration Points**:

- Add as new provider option: "Academic" mode
- Create service: `backend/services/research/academic/arxiv_service.py`
- Add to `ResearchContext`: `include_academic: bool`
- Add to results: Academic sources section

**Implementation Complexity**: Medium (arXiv API integration and citation parsing; the arXiv API returns paper metadata, not citation data)

**Note**: Valuable for B2B and technical content creators

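A sketch of the core of `arxiv_service.py`, using the public arXiv API (an Atom feed served from `export.arxiv.org/api/query`) and only the standard library. `Paper`, `build_query_url` and `parse_feed` are hypothetical names, and the caller is expected to pass a query in arXiv's field syntax (for example `all:transformers`). The `arxiv` package listed under dependencies wraps the same API and would be the likelier production choice.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

ARXIV_API = "https://export.arxiv.org/api/query"
NS = {"atom": "http://www.w3.org/2005/Atom"}


@dataclass
class Paper:
    arxiv_id: str
    title: str
    authors: List[str]
    published: str
    summary: str


def build_query_url(search_query: str, max_results: int = 10, start: int = 0) -> str:
    """Build a query URL; search_query uses arXiv field syntax, e.g. 'all:transformers'."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(element: ET.Element, tag: str) -> str:
    """Return a child's text with internal whitespace (e.g. line breaks in titles) collapsed."""
    return " ".join(element.findtext(f"atom:{tag}", default="", namespaces=NS).split())


def parse_feed(xml_text: str) -> List[Paper]:
    """Turn the Atom feed returned by the API into Paper records."""
    root = ET.fromstring(xml_text)
    return [
        Paper(
            arxiv_id=_text(entry, "id").rsplit("/abs/", 1)[-1],  # id looks like http://arxiv.org/abs/<id>
            title=_text(entry, "title"),
            authors=[_text(a, "name") for a in entry.findall("atom:author", NS)],
            published=_text(entry, "published"),
            summary=_text(entry, "summary"),
        )
        for entry in root.findall("atom:entry", NS)
    ]
```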

---

### 5. Finance Data Researcher ⭐⭐⭐ (MEDIUM PRIORITY - NICHE)

**File**: `finance_data_researcher.py`

**Features**:

- Stock data analysis (yfinance)
- Technical indicators (MACD, RSI, Bollinger Bands, etc.)
- Market trend analysis
- Financial data visualization

**Value for Users**:

- **Content Creators**: Finance/business content, market analysis articles
- **Marketers**: Financial services content, market insights
- **Solopreneurs**: Business research, market analysis

**Migration Priority**: **P2 - Medium (Niche)**

**Integration Points**:

- Create specialized service: `backend/services/research/finance/finance_data_service.py`
- Add as optional deliverable: `financial_analysis`
- Enable only for the finance/business industry

**Implementation Complexity**: Low (yfinance is straightforward)

**Note**: Very niche - only valuable for finance content creators

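The indicators named above are simple to compute once yfinance has returned closing prices. The helpers below are an illustrative pure-Python sketch with hypothetical names (they are not yfinance functions). Bollinger Bands use the population standard deviation over the window and RSI uses Wilder's smoothing; both conventions vary between charting tools, so values may differ slightly from a given platform. Returning 50 for a perfectly flat series is a choice made here, not a standard.

```python
import statistics
from typing import List, Optional, Tuple

Band = Tuple[float, float, float]  # (lower, middle, upper)


def bollinger_bands(prices: List[float], window: int = 20, k: float = 2.0) -> List[Optional[Band]]:
    """Middle band = simple moving average; outer bands = middle +/- k * population std-dev of the window."""
    bands: List[Optional[Band]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            bands.append(None)  # not enough history yet
            continue
        win = prices[i + 1 - window:i + 1]
        mid = sum(win) / window
        sd = statistics.pstdev(win)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


def rsi(prices: List[float], period: int = 14) -> Optional[float]:
    """Latest Wilder RSI (0-100), or None if there are not enough prices."""
    if len(prices) <= period:
        return None
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, loss in zip(gains[period:], losses[period:]):
        # Wilder's smoothing: an exponential average with alpha = 1 / period
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat series: a convention, not a standard
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
```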

---

### 6. Firecrawl Web Crawler ⭐⭐⭐ (MEDIUM PRIORITY)

**File**: `firecrawl_web_crawler.py`

**Features**:

- Website crawling (depth-based)
- URL scraping
- Structured data extraction (schema-based)
- Multi-page scraping

**Value for Users**:

- **Content Creators**: Competitor content analysis, inspiration gathering
- **Marketers**: Competitive intelligence, content gap analysis
- **Solopreneurs**: Market research, competitor analysis

**Migration Priority**: **P2 - Medium**

**Integration Points**:

- Enhance competitor analysis in `ResearchEngine`
- Create service: `backend/services/research/crawler/firecrawl_service.py`
- Add to research persona: competitor website analysis
- Use in the onboarding competitor analysis step

**Implementation Complexity**: Low (Firecrawl API is simple)

**Note**: Could enhance the existing competitor analysis feature

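Firecrawl performs depth-based crawling on its side, so the service would mainly forward a start URL and a depth limit. To pin down what that depth limit means, here is a generic breadth-first crawl restricted to one domain. It is not Firecrawl's API: `crawl` and the injected `get_links` fetcher are hypothetical, and a real crawler would also need robots.txt handling and politeness delays.

```python
from collections import deque
from typing import Callable, Dict, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str, get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2, max_pages: int = 50) -> Dict[str, int]:
    """Breadth-first, same-domain crawl; returns {url: depth at which it was first found}."""
    domain = urlparse(start_url).netloc
    seen: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # at the depth limit: keep the page, but don't expand its links
        for href in get_links(url):
            link = urldefrag(urljoin(url, href)).url  # resolve relative links, drop #fragments
            if urlparse(link).netloc != domain or link in seen:
                continue
            seen[link] = depth + 1
            queue.append(link)
            if len(seen) >= max_pages:
                break
    return seen
```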

---

### 7. Metaphor AI Integration ⭐⭐ (LOW PRIORITY)

**File**: `metaphor_basic_neural_web_search.py`

**Features**:

- Semantic search via Metaphor AI
- Related article discovery

**Value for Users**:

- Same semantic-search capability as Exa (Metaphor rebranded as Exa in early 2024)
- Could only serve as an alternative client for a provider that is already integrated

**Migration Priority**: **P3 - Low**

**Note**: The current system already uses Exa for semantic search. Because Metaphor is Exa's former name, migrating this module would duplicate the existing integration.

---

## 📊 Migration Priority Matrix

| Feature | User Value | Implementation Effort | Priority | Timeline |
|---------|------------|----------------------|----------|----------|
| **Google Trends** | ⭐⭐⭐⭐⭐ | Medium | **P0** | Phase 1 |
| **SERP Analysis** | ⭐⭐⭐⭐ | Low | **P1** | Phase 1 |
| **Keyword Research** | ⭐⭐⭐⭐ | Medium | **P1** | Phase 1 |
| **ArXiv Research** | ⭐⭐⭐ | Medium | **P2** | Phase 2 |
| **Firecrawl** | ⭐⭐⭐ | Low | **P2** | Phase 2 |
| **Finance Data** | ⭐⭐⭐ | Low | **P2** | Phase 3 (Niche) |
| **Metaphor AI** | ⭐⭐ | Low | **P3** | Future |

---

## 🎯 Recommended Migration Plan

### Phase 1: High-Impact Features (Weeks 1-4)

#### 1.1 Google Trends Integration

**Goal**: Enable trend analysis for all research queries

**Tasks**:

- [ ] Create `backend/services/research/trends/google_trends_service.py`
- [ ] Integrate pytrends library
- [ ] Add trend analysis to `IntentAwareAnalyzer`
- [ ] Create API endpoint: `POST /api/research/trends/analyze`
- [ ] Add "Trends" tab to `IntentResultsDisplay`
- [ ] Add trend visualizations (interest over time, by region)
- [ ] Add related topics/queries to results

**Deliverables**:

- Interest over time charts
- Regional interest data
- Related topics (top & rising)
- Related queries (top & rising)
- Trending searches integration

#### 1.2 SERP Analysis Enhancement

**Goal**: Provide competitive search insights

**Tasks**:

- [ ] Create `backend/services/research/serp/google_serp_service.py`
- [ ] Integrate Serper.dev API
- [ ] Add SERP analysis to `IntentAwareAnalyzer`
- [ ] Extract People Also Ask questions
- [ ] Extract Related Searches
- [ ] Add SERP insights to results display

**Deliverables**:

- People Also Ask questions
- Related Searches
- Top organic results analysis
- SERP position insights

#### 1.3 Keyword Research & Clustering

**Goal**: Enhance keyword expansion and clustering

**Tasks**:

- [ ] Create `backend/services/research/keywords/keyword_research_service.py`
- [ ] Implement Google auto-suggestions expansion
- [ ] Implement keyword clustering (K-means + TF-IDF)
- [ ] Add keyword expansion to `UnifiedResearchAnalyzer`
- [ ] Add keyword clusters to results
- [ ] Create keyword visualization component

**Deliverables**:

- Expanded keyword suggestions
- Keyword clusters with themes
- Relevance scores
- Keyword grouping visualization

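The prefix/suffix expansion task can be sketched as follows. The modifier lists are illustrative, `suggest` stands in for whatever suggestion source the service ends up calling (Google's suggestion endpoint is unofficial, so it is deliberately not hard-coded here), and `expansion_queries` and `expand_keywords` are hypothetical names. Relevance scoring is left out because its source is not specified in this plan.

```python
import string
from typing import Callable, Iterable, List

# Illustrative modifier lists; tune per language and market.
PREFIXES = ["how to", "what is", "best", "why"]
SUFFIXES = ["for beginners", "vs", "tools"] + list(string.ascii_lowercase)


def expansion_queries(seed: str) -> List[str]:
    """The normalised seed, then prefix variants, then suffix variants ('seed a' ... 'seed z' included)."""
    seed = " ".join(seed.lower().split())
    return [seed] + [f"{p} {seed}" for p in PREFIXES] + [f"{seed} {s}" for s in SUFFIXES]


def expand_keywords(seed: str, suggest: Callable[[str], Iterable[str]], limit: int = 100) -> List[str]:
    """Query an injected suggestion source for each variant; dedupe case-insensitively, keep first-seen order."""
    seen = set()
    results: List[str] = []
    for query in expansion_queries(seed):
        for suggestion in suggest(query):
            key = " ".join(suggestion.lower().split())
            if key and key not in seen:
                seen.add(key)
                results.append(key)
                if len(results) >= limit:
                    return results
    return results
```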
### Phase 2: Specialized Features (Weeks 5-8)

#### 2.1 ArXiv Academic Research

**Tasks**:

- [ ] Create `backend/services/research/academic/arxiv_service.py`
- [ ] Integrate arXiv API
- [ ] Add academic mode to research options
- [ ] Add citation network analysis
- [ ] Show academic sources in results

#### 2.2 Firecrawl Integration

**Tasks**:

- [ ] Create `backend/services/research/crawler/firecrawl_service.py`
- [ ] Enhance competitor analysis
- [ ] Add website crawling to research persona generation
- [ ] Add structured data extraction

### Phase 3: Niche Features (Weeks 9-12)

#### 3.1 Finance Data Research

**Tasks**:

- [ ] Create `backend/services/research/finance/finance_data_service.py`
- [ ] Add finance mode (industry-specific)
- [ ] Add financial analysis deliverables
- [ ] Add market trend visualizations

---

## 🏗️ Architecture Integration

### New Service Structure

```
backend/services/research/
├── trends/
│   └── google_trends_service.py      # NEW
├── serp/
│   └── google_serp_service.py        # NEW
├── keywords/
│   └── keyword_research_service.py   # NEW
├── academic/
│   └── arxiv_service.py              # NEW
├── crawler/
│   └── firecrawl_service.py          # NEW
└── finance/
    └── finance_data_service.py       # NEW
```

### Enhanced IntentAwareAnalyzer

Add new deliverable types:

- `trends_analysis`: Google Trends data
- `serp_analysis`: SERP insights
- `keyword_clusters`: Clustered keywords
- `academic_sources`: ArXiv papers
- `financial_analysis`: Market data

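One way to wire the new deliverable types into the analyzer is an enum plus a handler registry, so a deliverable can be declared before its service exists and disabled ones degrade gracefully. This is a hypothetical sketch: `DeliverableType`, `register` and `run_deliverables` are not existing `IntentAwareAnalyzer` APIs, and only a placeholder keyword handler is registered.

```python
from enum import Enum
from typing import Any, Callable, Dict, Iterable


class DeliverableType(str, Enum):
    TRENDS_ANALYSIS = "trends_analysis"
    SERP_ANALYSIS = "serp_analysis"
    KEYWORD_CLUSTERS = "keyword_clusters"
    ACADEMIC_SOURCES = "academic_sources"
    FINANCIAL_ANALYSIS = "financial_analysis"


Handler = Callable[[str], Any]
_HANDLERS: Dict[DeliverableType, Handler] = {}


def register(kind: DeliverableType) -> Callable[[Handler], Handler]:
    """Decorator that binds a service function to a deliverable type."""
    def decorator(fn: Handler) -> Handler:
        _HANDLERS[kind] = fn
        return fn
    return decorator


def run_deliverables(query: str, requested: Iterable[str]) -> Dict[str, Any]:
    """Run each requested deliverable; unknown or unregistered types are reported instead of raising."""
    results: Dict[str, Any] = {}
    for name in requested:
        try:
            kind = DeliverableType(name)
        except ValueError:
            results[name] = {"error": "unknown deliverable"}
            continue
        handler = _HANDLERS.get(kind)
        results[name] = handler(query) if handler else {"error": "not enabled"}
    return results


@register(DeliverableType.KEYWORD_CLUSTERS)
def _keyword_clusters(query: str) -> Dict[str, Any]:
    return {"seed": query, "clusters": []}  # placeholder for keyword_research_service
```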
### New API Endpoints

```
POST /api/research/trends/analyze     # Google Trends analysis
POST /api/research/keywords/expand    # Keyword expansion
POST /api/research/keywords/cluster   # Keyword clustering
POST /api/research/serp/analyze       # SERP analysis
POST /api/research/academic/search    # Academic search
```

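Each endpoint will need input validation regardless of the web framework. Below is a framework-agnostic sketch for `POST /api/research/trends/analyze`, assuming a JSON body with `keywords`, `timeframe` and `geo` fields; the field names and `TrendsAnalyzeRequest` are hypothetical. The five-keyword cap reflects Google Trends' comparison limit, the timeframe strings are values pytrends accepts, and malformed JSON also surfaces as `ValueError` because `json.JSONDecodeError` subclasses it.

```python
import json
from dataclasses import dataclass
from typing import List

ALLOWED_TIMEFRAMES = {"now 7-d", "today 1-m", "today 3-m", "today 12-m", "today 5-y"}
MAX_KEYWORDS = 5  # Google Trends compares at most five terms at once


@dataclass
class TrendsAnalyzeRequest:
    keywords: List[str]
    timeframe: str = "today 12-m"
    geo: str = ""  # "" means worldwide; otherwise a country code such as "US"

    @classmethod
    def from_json(cls, raw: str) -> "TrendsAnalyzeRequest":
        """Parse and validate a request body; raises ValueError on any bad input."""
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("request body must be a JSON object")
        keywords = [k.strip() for k in data.get("keywords", []) if isinstance(k, str) and k.strip()]
        if not 1 <= len(keywords) <= MAX_KEYWORDS:
            raise ValueError(f"provide between 1 and {MAX_KEYWORDS} keywords")
        timeframe = data.get("timeframe", "today 12-m")
        if not isinstance(timeframe, str) or timeframe not in ALLOWED_TIMEFRAMES:
            raise ValueError(f"unsupported timeframe: {timeframe!r}")
        return cls(keywords, timeframe, str(data.get("geo", "")).strip().upper())
```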

---

## 💡 User Experience Enhancements

### Research Input Enhancements

1. **"Analyze Trends" Button**: Shown after intent analysis completes
2. **"Expand Keywords" Button**: Generates keyword clusters
3. **"SERP Insights" Toggle**: Includes SERP analysis in research
4. **Research Mode Selector**:
   - Standard (current)
   - Academic (ArXiv)
   - Finance (Market data)
   - Competitive (SERP + Firecrawl)

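A hypothetical sketch of how the mode selector and the "SERP Insights" toggle could resolve to the extra services a request calls. `ResearchMode`, `ModeSelection` and the service identifiers are illustrative names; `include_academic` is derived from the selected mode here, which is one possible way to populate the `include_academic: bool` flag proposed for `ResearchContext`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class ResearchMode(str, Enum):
    STANDARD = "standard"
    ACADEMIC = "academic"
    FINANCE = "finance"
    COMPETITIVE = "competitive"


# Extra services each mode adds on top of the standard research pipeline (illustrative identifiers).
MODE_SERVICES: Dict[ResearchMode, Tuple[str, ...]] = {
    ResearchMode.STANDARD: (),
    ResearchMode.ACADEMIC: ("arxiv",),
    ResearchMode.FINANCE: ("finance_data",),
    ResearchMode.COMPETITIVE: ("serp", "firecrawl"),
}


@dataclass(frozen=True)
class ModeSelection:
    query: str
    mode: ResearchMode = ResearchMode.STANDARD
    include_serp: bool = False  # the "SERP Insights" toggle

    @property
    def include_academic(self) -> bool:
        return self.mode is ResearchMode.ACADEMIC

    def extra_services(self) -> Tuple[str, ...]:
        """Services implied by the mode, plus SERP if the toggle is on and the mode lacks it."""
        services = MODE_SERVICES[self.mode]
        if self.include_serp and "serp" not in services:
            services += ("serp",)
        return services
```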
### Results Display Enhancements

1. **New Tab: "Trends"**
   - Interest over time chart
   - Regional interest map
   - Related topics/queries
   - Trending searches
2. **Enhanced "Sources" Tab**
   - SERP position indicators
   - Academic source badges
   - Source credibility scores
3. **New Section: "Keyword Clusters"**
   - Visual keyword grouping
   - Cluster themes
   - Keyword relevance scores
4. **New Section: "SERP Insights"**
   - People Also Ask questions
   - Related Searches
   - Top competitor analysis

---

## 📈 Expected User Value

### For Content Creators:

- ✅ **Faster content planning** with trend insights (estimated at 50% faster; not yet measured)
- ✅ **Better SEO** with keyword clusters and SERP analysis
- ✅ **Timely content** with interest over time data
- ✅ **Regional targeting** with geographic insights

### For Digital Marketers:

- ✅ **Competitive intelligence** via SERP analysis
- ✅ **Content gap identification** via People Also Ask
- ✅ **Campaign planning** with trending searches
- ✅ **Keyword strategy** with clustering

### For Solopreneurs:

- ✅ **Market research** without expensive tools
- ✅ **Content ideas** from related queries
- ✅ **Audience insights** from regional data
- ✅ **SEO optimization** with keyword research

---

## 🔧 Implementation Considerations

### Dependencies to Add

```text
# requirements.txt additions
pytrends>=4.9.2        # Google Trends (unofficial client)
serper>=1.0.0          # SERP API (verify this package; Serper.dev can also be called over plain HTTPS)
scikit-learn>=1.3.0    # Keyword clustering
arxiv>=2.1.0           # Academic research
yfinance>=0.2.0        # Finance data
firecrawl-py>=0.0.1    # Web crawling (pin to the SDK version the service is written against)
```

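### Rate Limiting

- **Google Trends**: No published limit, and bursts of requests are rejected with HTTP 429. pytrends offers `retries` and `backoff_factor` options but does not space out calls by itself, so throttle on our side (about 1 request per second as a starting point).
- **Serper.dev**: Check the API limits of the chosen plan
- **arXiv**: At most 1 request every 3 seconds (per the arXiv API terms of use)
- **Firecrawl**: Check the API limits of the chosen plan
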
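Because the limits differ per provider, a small per-provider throttle can enforce a minimum gap between calls. This sketch is hypothetical and single-threaded (concurrent workers would need a lock or a shared limiter); the clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import time
from typing import Callable, Dict, Optional


class Throttle:
    """Enforce a minimum interval between calls to one provider (single-threaded sketch)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now


THROTTLES: Dict[str, Throttle] = {
    "google_trends": Throttle(1.0),  # no published limit; 1 s is a conservative starting point
    "arxiv": Throttle(3.0),          # arXiv asks for at most one request every 3 seconds
}
```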
### Caching Strategy

- Cache Google Trends data (24-hour TTL)
- Cache SERP results (1-hour TTL)
- Cache keyword clusters (7-day TTL)
- Cache academic searches (30-day TTL)

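The TTLs above can be applied with a small cache keyed by data type and query. This in-process sketch (hypothetical names) is enough for a single worker; with several backend workers, a shared store would keep them from re-fetching the same data independently.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

DAY = 24 * 3600
TTL_SECONDS = {"trends": DAY, "serp": 3600, "keyword_clusters": 7 * DAY, "academic": 30 * DAY}


class TTLCache:
    """In-process cache where each entry expires after the TTL of its data type."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[Tuple[str, Hashable], Tuple[float, Any]] = {}

    def get_or_compute(self, kind: str, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for (kind, key) while fresh; otherwise call compute() and cache it."""
        now = self._clock()
        entry = self._store.get((kind, key))
        if entry is not None and now < entry[0]:
            return entry[1]
        value = compute()
        self._store[(kind, key)] = (now + TTL_SECONDS[kind], value)
        return value
```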

---

## ✅ Success Metrics

### Phase 1 Success Criteria:

- [ ] Google Trends integrated and working
- [ ] SERP analysis providing insights
- [ ] Keyword clustering generating useful groups
- [ ] Users can access trends in research results
- [ ] 80%+ user satisfaction with new features

### Phase 2 Success Criteria:

- [ ] Academic research mode available
- [ ] Firecrawl enhancing competitor analysis
- [ ] Niche users (B2B, finance) finding value

---

## 🚀 Quick Wins (Can Start Immediately)

1. **Google Trends Basic Integration** (2-3 days)
   - Interest over time
   - Related queries
   - Add to results display
2. **SERP People Also Ask** (1-2 days)
   - Extract PAA questions
   - Add to deliverables
   - Display in results
3. **Keyword Auto-Suggestions** (1-2 days)
   - Google auto-suggestions
   - Add to keyword expansion
   - Display in research input

---

## 📝 Next Steps

1. **Review & Approve**: Get stakeholder approval on priority features
2. **Phase 1 Planning**: Detailed task breakdown for Phase 1
3. **API Keys**: Set up Serper.dev and Firecrawl accounts
4. **Dependencies**: Add required libraries to requirements.txt
5. **Start Implementation**: Begin with Google Trends (highest value)

---

**Status**: Analysis Complete - Ready for Implementation Planning

**Recommended Action**: Start with Phase 1 (Google Trends + SERP + Keywords) for maximum user value.