Legacy Features Migration Analysis
Date: 2025-01-29
Status: Analysis Complete - Ready for Implementation Planning
📋 Executive Summary
After reviewing the legacy ai_web_researcher folder, I've identified high-value features that would significantly enhance the Research Engine for content creators, digital marketing professionals, and solopreneurs. This document provides a prioritized migration plan.
Key Finding: Several legacy features address critical gaps in the current Research Engine, particularly around trend analysis, keyword research, and competitive intelligence.
🎯 User Value Assessment
Content Creators Need:
- ✅ Trending topics to create timely content
- ✅ Keyword research to optimize for SEO
- ✅ Related queries to expand content ideas
- ✅ Interest-over-time data to schedule content publication
- ✅ Regional insights to target specific audiences
Digital Marketing Professionals Need:
- ✅ SERP analysis to understand competition
- ✅ People Also Ask to optimize content structure
- ✅ Trending searches for campaign planning
- ✅ Keyword clustering for content strategy
- ✅ Competitor analysis via web crawling
Solopreneurs Need:
- ✅ Quick trend insights without expensive tools
- ✅ Keyword suggestions for content planning
- ✅ Market research for business decisions
- ✅ Academic research for thought leadership
- ✅ Financial data for business content
🔍 Legacy Features Analysis
1. Google Trends Researcher ⭐⭐⭐⭐⭐ (HIGHEST PRIORITY)
File: google_trends_researcher.py
Features:
- Interest over time analysis
- Interest by region
- Related topics (top & rising)
- Related queries (top & rising)
- Trending searches (country-specific)
- Realtime trends
- Keyword auto-suggestions expansion
- Keyword clustering (K-means with TF-IDF)
- Google auto-suggestions with relevance scores
Value for Users:
- Content Creators: Identify trending topics, optimal publication timing, regional targeting
- Marketers: Campaign planning, audience insights, keyword opportunities
- Solopreneurs: Market research, content calendar planning, audience discovery
Migration Priority: P0 - Critical
Integration Points:
- Add to IntentAwareAnalyzer as a deliverable type: trends_analysis
- Create new service: backend/services/research/trends/google_trends_service.py
- Add endpoint: POST /api/research/trends/analyze
- Add to IntentResultsDisplay as new tab: "Trends"
Implementation Complexity: Medium (requires pytrends integration, rate limiting)
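A minimal sketch of how the new service might gather these deliverables. It assumes the caller injects a client that behaves like pytrends' TrendReq (build_payload, interest_over_time, interest_by_region, related_queries); the function name collect_trends and the keys of the returned dictionary are hypothetical, not taken from the legacy module.

```python
from typing import Any, Dict, List


def collect_trends(client: Any, keywords: List[str],
                   timeframe: str = "today 12-m", geo: str = "") -> Dict[str, Any]:
    """Gather core trends deliverables from a TrendReq-like client.

    `client` is assumed to expose the pytrends TrendReq methods used below.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Google Trends compares at most five terms per request.
    kw_list = keywords[:5]
    client.build_payload(kw_list, timeframe=timeframe, geo=geo)
    return {
        "keywords": kw_list,
        "interest_over_time": client.interest_over_time(),
        "interest_by_region": client.interest_by_region(),
        "related_queries": client.related_queries(),
    }
```

With the real library the client would be built as `TrendReq(hl="en-US", tz=360)` after `from pytrends.request import TrendReq`. Injecting the client keeps the service testable without network access.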
2. Google SERP Search ⭐⭐⭐⭐ (HIGH PRIORITY)
File: google_serp_search.py
Features:
- Organic search results with position tracking
- People Also Ask (PAA) extraction
- Related Searches extraction
- Serper.dev integration (fallback to SerpApi)
Value for Users:
- Content Creators: Understand search competition, find content gaps, optimize for featured snippets
- Marketers: SEO analysis, content gap identification, competitor research
- Solopreneurs: Understand search landscape, find opportunities
Migration Priority: P1 - High
Integration Points:
- Enhance ResearchEngine with SERP analysis
- Add to IntentAwareAnalyzer deliverables: serp_analysis, people_also_ask, related_searches
- Create service: backend/services/research/serp/google_serp_service.py
- Add to results: SERP insights section
Implementation Complexity: Low (Serper.dev API is straightforward)
Note: Current system uses Google/Gemini grounding, but SERP provides structured competitive data
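As a rough illustration of the Serper.dev integration, the sketch below builds a search request and reduces a response to the three deliverables named above. The endpoint, the X-API-KEY header, and the response fields (organic, peopleAlsoAsk, relatedSearches) reflect Serper.dev's public API as I understand it and should be checked against the current documentation; the function names and output keys are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

SERPER_URL = "https://google.serper.dev/search"  # assumed endpoint


def build_serper_request(query: str, api_key: str, country: str = "us") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for one search query."""
    body = json.dumps({"q": query, "gl": country}).encode("utf-8")
    return urllib.request.Request(
        SERPER_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_serp_insights(payload: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Pull organic results, People Also Ask, and related searches from a response."""
    organic = [
        {"position": r.get("position"), "title": r.get("title"), "link": r.get("link")}
        for r in payload.get("organic", [])
    ]
    paa = [q["question"] for q in payload.get("peopleAlsoAsk", []) if "question" in q]
    related = [r["query"] for r in payload.get("relatedSearches", []) if "query" in r]
    return {"organic": organic, "people_also_ask": paa, "related_searches": related}
```

Sending the request is a matter of `urllib.request.urlopen(req)` followed by `json.load`. Keeping the parsing separate makes the SerpApi fallback easier to add later, since only the extraction step depends on the provider's response shape.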
3. Keyword Research & Clustering ⭐⭐⭐⭐ (HIGH PRIORITY)
File: google_trends_researcher.py (keyword functions)
Features:
- Google auto-suggestions expansion (prefixes & suffixes)
- Keyword clustering using K-means + TF-IDF
- Relevance scoring
- Keyword grouping by themes
Value for Users:
- Content Creators: Content cluster strategy, keyword expansion, topic grouping
- Marketers: SEO keyword research, content pillar planning, keyword mapping
- Solopreneurs: Content planning, SEO optimization
Migration Priority: P1 - High
Integration Points:
- Enhance UnifiedResearchAnalyzer to include keyword expansion
- Add to IntentAwareAnalyzer: keyword_clusters, expanded_keywords
- Create service: backend/services/research/keywords/keyword_research_service.py
- Add to ResearchInput: "Expand Keywords" button
- Display in results: Keyword clusters visualization
Implementation Complexity: Medium (requires ML libraries: sklearn, TF-IDF vectorization)
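The K-means + TF-IDF clustering mentioned above could look roughly like this with scikit-learn. This is a sketch under assumptions, not the legacy implementation: the function name cluster_keywords, the default of three clusters, and the fixed random seed are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_keywords(keywords: List[str], n_clusters: int = 3, seed: int = 0) -> Dict[int, List[str]]:
    """Group keywords by TF-IDF similarity using K-means."""
    # Normalise and de-duplicate so repeated suggestions do not skew clusters.
    unique = sorted({k.strip().lower() for k in keywords if k.strip()})
    if not unique:
        return {}
    # K-means cannot form more clusters than there are samples.
    n_clusters = max(1, min(n_clusters, len(unique)))
    matrix = TfidfVectorizer().fit_transform(unique)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(matrix)
    clusters: Dict[int, List[str]] = defaultdict(list)
    for keyword, label in zip(unique, labels):
        clusters[int(label)].append(keyword)
    return dict(clusters)
```

Cluster labels are arbitrary integers, so a theme name for each group (for example its highest-weighted term) would have to be derived in a separate step.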
4. ArXiv Scholarly Research ⭐⭐⭐ (MEDIUM PRIORITY)
File: arxiv_schlorly_research.py
Features:
- Academic paper search
- Citation network analysis
- Paper clustering by topic
- Research paper metadata extraction
- AI-powered query expansion for academic searches
Value for Users:
- Content Creators: Thought leadership content, data-backed articles, research citations
- Marketers: B2B content, whitepapers, authoritative sources
- Solopreneurs: Expert positioning, research-backed content
Migration Priority: P2 - Medium
Integration Points:
- Add as new provider option: "Academic" mode
- Create service: backend/services/research/academic/arxiv_service.py
- Add to ResearchContext: include_academic: bool
- Add to results: Academic sources section
Implementation Complexity: Medium (arXiv API integration, citation parsing)
Note: Valuable for B2B and technical content creators
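For orientation, a paper search against the public arXiv API can be sketched with the standard library alone: the API takes a search_query parameter and returns an Atom feed. The helper names and the selection of fields below are hypothetical; the legacy module may use the arxiv package instead.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(query: str, max_results: int = 10) -> str:
    """Build a query URL that searches all fields for `query`."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, object]]:
    """Extract basic paper metadata from an arXiv Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            # Titles are often wrapped across lines in the feed; collapse whitespace.
            "title": " ".join(title.split()),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM).strip(),
            "authors": [
                a.findtext("atom:name", default="", namespaces=ATOM).strip()
                for a in entry.findall("atom:author", ATOM)
            ],
        })
    return papers
```

Citation network analysis and topic clustering would build on this metadata; they are not covered by the sketch.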
5. Finance Data Researcher ⭐⭐⭐ (MEDIUM PRIORITY - NICHE)
File: finance_data_researcher.py
Features:
- Stock data analysis (yfinance)
- Technical indicators (MACD, RSI, Bollinger Bands, etc.)
- Market trend analysis
- Financial data visualization
Value for Users:
- Content Creators: Finance/business content, market analysis articles
- Marketers: Financial services content, market insights
- Solopreneurs: Business research, market analysis
Migration Priority: P2 - Medium (Niche)
Integration Points:
- Create specialized service: backend/services/research/finance/finance_data_service.py
- Add as optional deliverable: financial_analysis
- Only enable for finance/business industry
Implementation Complexity: Low (yfinance is straightforward)
Note: Niche feature, mainly valuable for finance and business content
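To make the technical-indicator deliverable concrete, here is a small sketch of Bollinger Bands computed over a list of closing prices (which yfinance would supply in practice). The 20-period window and 2-standard-deviation width are the conventional defaults; the function name and the tuple layout are hypothetical and not taken from the legacy module.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Band = Tuple[float, float, float]  # (lower, middle, upper)


def bollinger_bands(closes: List[float], window: int = 20,
                    num_std: float = 2.0) -> List[Optional[Band]]:
    """Return one band per price; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    bands: List[Optional[Band]] = []
    for i in range(len(closes)):
        if i + 1 < window:
            bands.append(None)
            continue
        chunk = closes[i + 1 - window: i + 1]
        middle = mean(chunk)                 # simple moving average
        spread = num_std * pstdev(chunk)     # population standard deviation
        bands.append((middle - spread, middle, middle + spread))
    return bands
```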
6. Firecrawl Web Crawler ⭐⭐⭐ (MEDIUM PRIORITY)
File: firecrawl_web_crawler.py
Features:
- Website crawling (depth-based)
- URL scraping
- Structured data extraction (schema-based)
- Multi-page scraping
Value for Users:
- Content Creators: Competitor content analysis, inspiration gathering
- Marketers: Competitive intelligence, content gap analysis
- Solopreneurs: Market research, competitor analysis
Migration Priority: P2 - Medium
Integration Points:
- Enhance competitor analysis in ResearchEngine
- Create service: backend/services/research/crawler/firecrawl_service.py
- Add to research persona: competitor website analysis
- Use for onboarding competitor analysis step
Implementation Complexity: Low (Firecrawl API is simple)
Note: Could enhance existing competitor analysis feature
7. Metaphor AI Integration ⭐⭐ (LOW PRIORITY)
File: metaphor_basic_neural_web_search.py
Features:
- Semantic search via Metaphor AI
- Related article discovery
Value for Users:
- Similar to Exa (semantic search)
- Could be alternative provider
Migration Priority: P3 - Low
Note: Current system already has Exa for semantic search. Metaphor would be redundant unless Exa has limitations.
📊 Migration Priority Matrix
| Feature | User Value | Implementation Effort | Priority | Timeline |
|---|---|---|---|---|
| Google Trends | ⭐⭐⭐⭐⭐ | Medium | P0 | Phase 1 |
| SERP Analysis | ⭐⭐⭐⭐ | Low | P1 | Phase 1 |
| Keyword Research | ⭐⭐⭐⭐ | Medium | P1 | Phase 1 |
| ArXiv Research | ⭐⭐⭐ | Medium | P2 | Phase 2 |
| Firecrawl | ⭐⭐⭐ | Low | P2 | Phase 2 |
| Finance Data | ⭐⭐⭐ | Low | P2 | Phase 3 (Niche) |
| Metaphor AI | ⭐⭐ | Low | P3 | Future |
🎯 Recommended Migration Plan
Phase 1: High-Impact Features (Weeks 1-4)
1.1 Google Trends Integration
Goal: Enable trend analysis for all research queries
Tasks:
- Create backend/services/research/trends/google_trends_service.py
- Integrate pytrends library
- Add trend analysis to IntentAwareAnalyzer
- Create API endpoint: POST /api/research/trends/analyze
- Add "Trends" tab to IntentResultsDisplay
- Add trend visualizations (interest over time, by region)
- Add related topics/queries to results
Deliverables:
- Interest over time charts
- Regional interest data
- Related topics (top & rising)
- Related queries (top & rising)
- Trending searches integration
1.2 SERP Analysis Enhancement
Goal: Provide competitive search insights
Tasks:
- Create backend/services/research/serp/google_serp_service.py
- Integrate Serper.dev API
- Add SERP analysis to IntentAwareAnalyzer
- Extract People Also Ask questions
- Extract Related Searches
- Add SERP insights to results display
Deliverables:
- People Also Ask questions
- Related Searches
- Top organic results analysis
- SERP position insights
1.3 Keyword Research & Clustering
Goal: Enhanced keyword expansion and clustering
Tasks:
- Create backend/services/research/keywords/keyword_research_service.py
- Implement Google auto-suggestions expansion
- Implement keyword clustering (K-means + TF-IDF)
- Add keyword expansion to UnifiedResearchAnalyzer
- Add keyword clusters to results
- Create keyword visualization component
Deliverables:
- Expanded keyword suggestions
- Keyword clusters with themes
- Relevance scores
- Keyword grouping visualization
Phase 2: Specialized Features (Weeks 5-8)
2.1 ArXiv Academic Research
Tasks:
- Create backend/services/research/academic/arxiv_service.py
- Integrate arXiv API
- Add academic mode to research options
- Citation network analysis
- Academic sources in results
2.2 Firecrawl Integration
Tasks:
- Create backend/services/research/crawler/firecrawl_service.py
- Enhance competitor analysis
- Add website crawling to research persona generation
- Structured data extraction
Phase 3: Niche Features (Weeks 9-12)
3.1 Finance Data Research
Tasks:
- Create backend/services/research/finance/finance_data_service.py
- Add finance mode (industry-specific)
- Financial analysis deliverables
- Market trend visualizations
🏗️ Architecture Integration
New Service Structure
backend/services/research/
├── trends/
│ └── google_trends_service.py # NEW
├── serp/
│ └── google_serp_service.py # NEW
├── keywords/
│ └── keyword_research_service.py # NEW
├── academic/
│ └── arxiv_service.py # NEW
├── crawler/
│ └── firecrawl_service.py # NEW
└── finance/
└── finance_data_service.py # NEW
Enhanced IntentAwareAnalyzer
Add new deliverable types:
- trends_analysis: Google Trends data
- serp_analysis: SERP insights
- keyword_clusters: Clustered keywords
- academic_sources: ArXiv papers
- financial_analysis: Market data
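One way to register these deliverable types is a string-valued enum, so that unknown names fail early. This is only a sketch: how IntentAwareAnalyzer actually represents deliverables is not shown in this document, and the names DeliverableType and parse_deliverables are hypothetical.

```python
from enum import Enum
from typing import Iterable, List


class DeliverableType(str, Enum):
    """New deliverable identifiers proposed in this plan."""
    TRENDS_ANALYSIS = "trends_analysis"
    SERP_ANALYSIS = "serp_analysis"
    KEYWORD_CLUSTERS = "keyword_clusters"
    ACADEMIC_SOURCES = "academic_sources"
    FINANCIAL_ANALYSIS = "financial_analysis"


def parse_deliverables(names: Iterable[str]) -> List[DeliverableType]:
    """Convert raw names to enum members; raises ValueError on unknown names."""
    return [DeliverableType(name) for name in names]
```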
New API Endpoints
POST /api/research/trends/analyze # Google Trends analysis
POST /api/research/keywords/expand # Keyword expansion
POST /api/research/keywords/cluster # Keyword clustering
POST /api/research/serp/analyze # SERP analysis
POST /api/research/academic/search # Academic search
💡 User Experience Enhancements
Research Input Enhancements
- "Analyze Trends" Button: After intent analysis, show trends button
- "Expand Keywords" Button: Generate keyword clusters
- "SERP Insights" Toggle: Include SERP analysis in research
- Research Mode Selector:
  - Standard (current)
  - Academic (ArXiv)
  - Finance (Market data)
  - Competitive (SERP + Firecrawl)
Results Display Enhancements
- New Tab: "Trends"
  - Interest over time chart
  - Regional interest map
  - Related topics/queries
  - Trending searches
- Enhanced "Sources" Tab
  - SERP position indicators
  - Academic source badges
  - Source credibility scores
- New Section: "Keyword Clusters"
  - Visual keyword grouping
  - Cluster themes
  - Keyword relevance scores
- New Section: "SERP Insights"
  - People Also Ask questions
  - Related Searches
  - Top competitor analysis
📈 Expected User Value
For Content Creators:
- ✅ Up to 50% faster content planning with trend insights (estimate, to be validated)
- ✅ Better SEO with keyword clusters and SERP analysis
- ✅ Timely content with interest over time data
- ✅ Regional targeting with geographic insights
For Digital Marketers:
- ✅ Competitive intelligence via SERP analysis
- ✅ Content gap identification via People Also Ask
- ✅ Campaign planning with trending searches
- ✅ Keyword strategy with clustering
For Solopreneurs:
- ✅ Market research without expensive tools
- ✅ Content ideas from related queries
- ✅ Audience insights from regional data
- ✅ SEO optimization with keyword research
🔧 Implementation Considerations
Dependencies to Add
# requirements.txt additions
pytrends>=4.9.2 # Google Trends
requests>=2.31.0 # Serper.dev SERP API (plain HTTPS calls; no SDK package assumed)
scikit-learn>=1.3.0 # Keyword clustering
arxiv>=2.1.0 # Academic research
yfinance>=0.2.0 # Finance data
firecrawl-py>=0.0.1 # Web crawling
Rate Limiting
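- Google Trends: no published quota; throttle on our side to about 1 request per second (pytrends can retry with backoff through its retries and backoff_factor options, but it does not pace requests itself)
- Serper.dev: Check API limits for the chosen plan
- ArXiv: at most 1 request every 3 seconds (per the arXiv API terms of use)
- Firecrawl: Check API limits for the chosen plan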
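Per-provider pacing can be enforced with a small minimum-interval throttle, sketched below. The class name and the injectable clock and sleep parameters are hypothetical conveniences for testing; the interval for each provider should come from that provider's documented limit.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds between consecutive calls.

    Not thread-safe; a lock would be needed if services share an instance.
    """

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        # Record when this call is considered to have started.
        self._last = now + delay
        return delay
```

A trends service would create `MinIntervalThrottle(1.0)` once and call `wait()` before each outbound request.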
Caching Strategy
- Cache Google Trends data (24-hour TTL)
- Cache SERP results (1-hour TTL)
- Cache keyword clusters (7-day TTL)
- Cache academic searches (30-day TTL)
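The TTLs above could be applied with a simple in-memory cache keyed by data kind, as in the sketch below. The class name, the kind labels, and the injectable clock are hypothetical; a production deployment would more likely use a shared store, which this document does not specify.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs in seconds, taken from the caching strategy listed above.
CACHE_TTL_SECONDS = {
    "trends": 24 * 3600,
    "serp": 3600,
    "keyword_clusters": 7 * 24 * 3600,
    "academic": 30 * 24 * 3600,
}


class TTLCache:
    """In-memory cache whose entries expire according to their kind."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def set(self, kind: str, key: str, value: Any) -> None:
        expires_at = self._clock() + CACHE_TTL_SECONDS[kind]
        self._store[(kind, key)] = (expires_at, value)

    def get(self, kind: str, key: str) -> Optional[Any]:
        item = self._store.get((kind, key))
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # Drop the stale entry so the caller refetches.
            del self._store[(kind, key)]
            return None
        return value
```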
✅ Success Metrics
Phase 1 Success Criteria:
- Google Trends integrated and working
- SERP analysis providing insights
- Keyword clustering generating useful groups
- Users can access trends in research results
- 80%+ user satisfaction with new features
Phase 2 Success Criteria:
- Academic research mode available
- Firecrawl enhancing competitor analysis
- Niche users (B2B, finance) finding value
🚀 Quick Wins (Can Start Immediately)
- Google Trends Basic Integration (2-3 days)
  - Interest over time
  - Related queries
  - Add to results display
- SERP People Also Ask (1-2 days)
  - Extract PAA questions
  - Add to deliverables
  - Display in results
- Keyword Auto-Suggestions (1-2 days)
  - Google auto-suggestions
  - Add to keyword expansion
  - Display in research input
📝 Next Steps
- Review & Approve: Get stakeholder approval on priority features
- Phase 1 Planning: Detailed task breakdown for Phase 1
- API Keys: Set up Serper.dev, Firecrawl accounts
- Dependencies: Add required libraries to requirements.txt
- Start Implementation: Begin with Google Trends (highest value)
Recommended Action: Start with Phase 1 (Google Trends + SERP + Keywords) for maximum user value.