# Codebase Organization & Service Reusability Analysis

**Date**: 2025-01-29
**Status**: Comprehensive Codebase Structure Analysis

---

## 📋 Overview

This document provides a comprehensive analysis of:

1. **Codebase Organization**: How features are organized across folders
2. **Service Architecture**: How Exa, Tavily, and Google Search services are structured
3. **Reusability Analysis**: Whether these services are reusable or tightly integrated

---

## 🏗️ Codebase Organization

### High-Level Structure

```
AI-Writer/
├── backend/
│   ├── api/              # API endpoints (FastAPI routers)
│   ├── services/         # Business logic & service layer
│   ├── models/           # Database models & schemas
│   ├── middleware/       # Request/response middleware
│   ├── utils/            # Utility functions
│   └── database/         # Database migrations
│
├── frontend/
│   └── src/
│       ├── components/   # React components
│       ├── services/     # Frontend API clients
│       ├── hooks/        # React hooks
│       └── utils/        # Frontend utilities
│
└── docs/                 # Documentation
```

---

## 📁 Feature Organization by Folder

### Backend Services (`backend/services/`)

#### **Research Services** (`backend/services/research/`)
**Purpose**: Core research engine and provider services

```
research/
├── core/                              # Core research engine (standalone)
│   ├── research_engine.py             # Main orchestrator
│   ├── research_context.py            # Unified input schema
│   └── parameter_optimizer.py         # AI-driven parameter optimization
│
├── intent/                            # Intent-driven research
│   ├── unified_research_analyzer.py   # Single AI call for intent+queries+params
│   ├── intent_aware_analyzer.py       # Result analysis based on intent
│   └── ...
│
├── trends/                            # Google Trends integration
│   └── google_trends_service.py
│
├── exa_service.py                     # ⭐ Reusable Exa API service
├── tavily_service.py                  # ⭐ Reusable Tavily API service
├── google_search_service.py           # ⭐ Reusable Google Search service
│
├── research_persona_service.py        # Persona generation/retrieval
└── research_persona_prompt_builder.py
```

**Key Features**:
- Standalone research engine (`ResearchEngine`)
- Provider services (Exa, Tavily, Google)
- Intent-driven research system
- Research persona system

---

#### **Blog Writer Services** (`backend/services/blog_writer/`)
**Purpose**: Blog content generation

```
blog_writer/
├── core/
│   └── blog_writer_service.py   # Main blog generation service
│
├── research/                    # Blog-specific research providers
│   ├── research_service.py      # Blog research orchestrator
│   ├── exa_provider.py          # Blog-specific Exa wrapper
│   ├── tavily_provider.py       # Blog-specific Tavily wrapper
│   ├── google_provider.py       # Blog-specific Google wrapper
│   └── research_strategies.py   # Research strategies per mode
│
├── outline/                     # Outline generation
├── content/                     # Content generation
└── seo/                         # SEO optimization
```

**Key Features**:
- Uses `services.research` services (reusable)
- Has blog-specific wrappers for providers
- Research strategies for different blog modes

---

#### **Other Feature Services**

| Service Folder | Purpose | Research Integration |
|---------------|---------|---------------------|
| `podcast/` | Podcast generation | Can use Research Engine |
| `story_writer/` | Story generation | Can use Research Engine |
| `youtube/` | YouTube content | Can use Research Engine |
| `linkedin/` | LinkedIn content | Uses GoogleSearchService |
| `onboarding/` | User onboarding | Uses ExaService for competitor discovery |
| `content_planning/` | Content planning | Can use Research Engine |
| `scheduler/` | Task scheduling | Can use Research Engine |

---

### Backend API (`backend/api/`)

#### **Research API** (`backend/api/research/`)
**Purpose**: Research endpoints

```
api/research/
├── router.py             # Main router
└── handlers/
    ├── providers.py      # Provider status endpoints
    ├── research.py       # Traditional research endpoints
    ├── intent.py         # Intent-driven endpoints
    └── projects.py       # My Projects endpoints
```

**Endpoints**:
- `POST /api/research/intent/analyze` - Intent analysis
- `POST /api/research/intent/research` - Intent-driven research
- `POST /api/research/execute` - Traditional research
- `GET /api/research/config` - Configuration

---

#### **Other API Modules**

| API Folder | Purpose | Research Integration |
|-----------|---------|---------------------|
| `blog_writer/` | Blog endpoints | Uses blog_writer services |
| `podcast/` | Podcast endpoints | Can use Research Engine |
| `story_writer/` | Story endpoints | Can use Research Engine |
| `onboarding_utils/` | Onboarding utilities | Uses ExaService for competitor discovery |

---

### Frontend Components (`frontend/src/components/`)

#### **Research Components** (`frontend/src/components/Research/`)
**Purpose**: Research UI components

```
Research/
├── ResearchWizard.tsx              # Main wizard orchestrator
├── steps/
│   ├── ResearchInput.tsx           # Step 1: Input + Intent & Options
│   ├── StepProgress.tsx            # Step 2: Progress/polling
│   ├── StepResults.tsx             # Step 3: Results display
│   └── components/                 # Sub-components
│       ├── IntentConfirmationPanel.tsx
│       ├── IntentResultsDisplay.tsx
│       └── ...
├── hooks/
│   ├── useResearchWizard.ts        # Wizard state management
│   ├── useResearchExecution.ts     # Research execution
│   └── useIntentResearch.ts        # Intent research flow
└── types/
    ├── research.types.ts           # Research types
    └── intent.types.ts             # Intent types
```

---

## 🔌 Service Architecture: Exa, Tavily, Google Search

### Service Design Pattern

All three services follow a **similar design pattern**:

1. **Standalone Service Classes**: Each service is a self-contained class
2. **Initialization-Time Key Check**: Each service reads its API key from the environment in `__init__` and tracks availability in an `enabled` flag (an eager check at construction rather than true lazy initialization)
3. **Error Handling**: Graceful degradation when API keys are missing
4. **Standardized Interface**: Similar method signatures across services
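
The four points above can be reduced to a small runnable sketch. Everything in it is illustrative: `ExampleSearchService`, the `EXAMPLE_SEARCH_API_KEY` variable, and the `success`/`error`/`results` return shape are assumptions made for this example and are not taken from the actual services. It shows a key check at construction, an `enabled` flag, and a structured error result in place of an exception when the key is missing.

```python
import asyncio
import os
from typing import Any, Dict, Optional


class ExampleSearchService:
    """Hypothetical provider service illustrating the shared pattern."""

    ENV_VAR = "EXAMPLE_SEARCH_API_KEY"  # assumed name, not a real provider key

    def __init__(self, api_key: Optional[str] = None) -> None:
        # Read credentials from the environment unless a key is passed explicitly.
        self.api_key = api_key or os.getenv(self.ENV_VAR)
        self.enabled = False
        self._try_initialize()

    def _try_initialize(self) -> None:
        # Enable the service only when a key is present; never raise here.
        self.enabled = bool(self.api_key)

    async def search(self, query: str, max_results: int = 5) -> Dict[str, Any]:
        # Graceful degradation: return a structured error instead of raising.
        if not self.enabled:
            return {
                "success": False,
                "error": f"{self.ENV_VAR} is not configured",
                "results": [],
            }
        # A real service would call the provider API here; this sketch echoes the query.
        hits = [{"query": query}]
        return {"success": True, "error": None, "results": hits[:max_results]}


def demo_service_pattern() -> Dict[str, Any]:
    """Run a search against a service that has no key configured."""
    os.environ.pop(ExampleSearchService.ENV_VAR, None)
    return asyncio.run(ExampleSearchService().search("ai writing tools"))
```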
---

### 1. ExaService (`backend/services/research/exa_service.py`)

**Design**: ✅ **Reusable Service**

```python
class ExaService:
    """
    Service for competitor discovery and analysis using the Exa API.
    Uses neural search to find semantically similar websites and content.
    """

    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("EXA_API_KEY")
        self.exa = None
        self.enabled = False
        self._try_initialize()

    async def discover_competitors(...) -> Dict[str, Any]:
        """Discover competitors for a given website."""

    async def discover_social_media_accounts(...) -> Dict[str, Any]:
        """Discover social media accounts."""

    async def analyze_competitor_content(...) -> Dict[str, Any]:
        """Analyze competitor content."""
```

**Key Features**:
- ✅ **Standalone**: No dependencies on Research Engine
- ✅ **Reusable**: Can be imported by any module
- ✅ **Focused**: Primarily for competitor discovery
- ✅ **Flexible**: Supports various search parameters

**Current Usage**:
1. **Research Engine**: Uses for research queries
2. **Onboarding**: Uses for competitor discovery (Step 3)
3. **Blog Writer**: Uses via blog-specific wrapper (`exa_provider.py`)

---

### 2. TavilyService (`backend/services/research/tavily_service.py`)

**Design**: ✅ **Reusable Service**

```python
class TavilyService:
    """
    Service for web search and research using the Tavily API.
    Provides AI-powered search with real-time information retrieval.
    """

    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("TAVILY_API_KEY")
        self.base_url = "https://api.tavily.com"
        self.enabled = False
        self._try_initialize()

    async def search(...) -> Dict[str, Any]:
        """Execute a search query using Tavily API."""

    async def search_industry_trends(...) -> Dict[str, Any]:
        """Search for current industry trends."""

    async def discover_competitors(...) -> Dict[str, Any]:
        """Discover competitors using Tavily search."""
```

**Key Features**:
- ✅ **Standalone**: No dependencies on Research Engine
- ✅ **Reusable**: Can be imported by any module
- ✅ **Flexible**: Supports various search parameters (topic, depth, time_range, etc.)
- ✅ **Real-time**: Optimized for current information

**Current Usage**:
1. **Research Engine**: Uses for research queries
2. **Blog Writer**: Uses via blog-specific wrapper (`tavily_provider.py`)

---

### 3. GoogleSearchService (`backend/services/research/google_search_service.py`)

**Design**: ✅ **Reusable Service**

```python
class GoogleSearchService:
    """
    Service for conducting real industry research using Google Custom Search API.
    Provides current, relevant industry information for content grounding.
    """

    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("GOOGLE_SEARCH_API_KEY")
        self.search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
        self.enabled = False

    async def search_industry_trends(...) -> List[Dict[str, Any]]:
        """Search for current industry trends and insights."""
```

**Key Features**:
- ✅ **Standalone**: No dependencies on Research Engine
- ✅ **Reusable**: Can be imported by any module
- ✅ **Focused**: Industry trend research
- ✅ **Credibility Scoring**: Built-in source credibility assessment

**Current Usage**:
1. **Research Engine**: Uses as fallback provider
2. **LinkedIn Service**: Uses for industry research

---

## 🔄 Reusability Analysis

### ✅ **Services ARE Reusable**

All three services (Exa, Tavily, Google Search) are **designed to be reusable**:

#### **Evidence of Reusability**:

1. **Standalone Design**:
   - No dependencies on Research Engine
   - Self-contained initialization
   - Independent error handling

2. **Multiple Usage Points**:
   ```python
   # Used in Research Engine
   from services.research.exa_service import ExaService

   # Used in Onboarding
   from services.research.exa_service import ExaService

   # Used in Blog Writer (via wrapper)
   from services.research.tavily_service import TavilyService

   # Used in LinkedIn Service
   from services.research import GoogleSearchService
   ```

3. **Standardized Interface**:
   - Similar method signatures
   - Consistent return formats
   - Environment-based configuration

4. **Export Structure**:
   ```python
   # backend/services/research/__init__.py
   from .google_search_service import GoogleSearchService
   from .exa_service import ExaService
   from .tavily_service import TavilyService

   __all__ = [
       "GoogleSearchService",
       "ExaService",
       "TavilyService",
       # ... other exports
   ]
   ```
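
The "standardized interface" point could be made explicit with a `typing.Protocol`. The sketch below is an assumption, not a description of the codebase: `SearchProvider`, `FakeProvider`, and `run_search` are hypothetical names, and the real services expose differently named methods (for example `discover_competitors` and `search_industry_trends`), so adopting such a contract would likely need thin adapters. It only illustrates how callers could depend on a shared shape instead of a concrete class.

```python
import asyncio
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class SearchProvider(Protocol):
    """Hypothetical shared contract; not defined in the codebase."""

    enabled: bool

    async def search(self, query: str, max_results: int = 10) -> Dict[str, Any]:
        ...


class FakeProvider:
    """In-memory stand-in that satisfies the protocol structurally."""

    def __init__(self, name: str, enabled: bool = True) -> None:
        self.name = name
        self.enabled = enabled

    async def search(self, query: str, max_results: int = 10) -> Dict[str, Any]:
        return {"provider": self.name, "query": query, "results": []}


async def run_search(provider: SearchProvider, query: str) -> Dict[str, Any]:
    # Callers depend only on the protocol, not on a concrete service class.
    if not provider.enabled:
        return {"provider": None, "query": query, "results": []}
    return await provider.search(query)
```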
---

### ⚠️ **Integration Patterns**

While services are reusable, they are used in different ways:

#### **1. Direct Usage** (Most Reusable)
```python
# Direct import and use
from services.research.exa_service import ExaService

exa = ExaService()
result = await exa.discover_competitors(user_url)
```

**Used By**:
- Onboarding (competitor discovery)
- Research Engine (research queries)

---

#### **2. Wrapper Pattern** (Blog Writer)
```python
# Blog Writer uses wrappers for blog-specific logic
from services.research.tavily_service import TavilyService

class TavilyResearchProvider:
    def __init__(self):
        self.tavily = TavilyService()  # Reuses service

    async def search(self, prompt, topic, ...):
        # Blog-specific logic + TavilyService
        return await self.tavily.search(...)
```
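
A runnable variant of the wrapper idea is sketched here with a stub in place of the real service. `StubSearchService` and `BlogResearchProvider` are hypothetical names, and the query shaping and result fields are invented for illustration; the real wrapper's logic is not shown in this document. One design choice differs from the snippet above: the service is passed into the constructor instead of being created inside it, which is an assumption that makes the wrapper testable without an API key.

```python
import asyncio
from typing import Any, Dict, List


class StubSearchService:
    """Stand-in for a provider service such as TavilyService (a stub, not the real class)."""

    async def search(self, query: str, max_results: int = 5) -> Dict[str, Any]:
        hits = [
            {"title": f"Result {i}", "url": f"https://example.com/{i}"}
            for i in range(1, 4)
        ]
        return {"query": query, "results": hits[:max_results]}


class BlogResearchProvider:
    """Wrapper that layers blog-specific logic over a generic search service."""

    def __init__(self, service: Any) -> None:
        self._service = service  # injected, so tests can pass a stub

    def _build_blog_query(self, topic: str, industry: str) -> str:
        # Hypothetical query shaping for blog research.
        return f"{topic} {industry} statistics and trends"

    def _format_blog_results(self, raw: Dict[str, Any]) -> List[Dict[str, str]]:
        # Reduce provider output to the fields a blog outline needs.
        return [{"title": r["title"], "source": r["url"]} for r in raw.get("results", [])]

    async def search(self, topic: str, industry: str, max_sources: int = 5) -> List[Dict[str, str]]:
        query = self._build_blog_query(topic, industry)
        raw = await self._service.search(query, max_results=max_sources)
        return self._format_blog_results(raw)
```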

**Why Wrappers?**:
- Blog-specific research strategies
- Blog-specific result formatting
- Blog-specific error handling
- Maintains compatibility with existing blog writer code

**Location**: `backend/services/blog_writer/research/tavily_provider.py`

---

#### **3. Engine Orchestration** (Research Engine)
```python
# Research Engine orchestrates providers
from services.research.exa_service import ExaService
from services.research.tavily_service import TavilyService
from services.research.google_search_service import GoogleSearchService

class ResearchEngine:
    def __init__(self):
        self._exa_provider = ExaService()
        self._tavily_provider = TavilyService()
        self._google_provider = GoogleSearchService()

    async def research(self, context: ResearchContext):
        # Orchestrates providers based on priority
        if self.exa_available:
            return await self._exa_provider.search(...)
        elif self.tavily_available:
            return await self._tavily_provider.search(...)
        else:
            return await self._google_provider.search_industry_trends(...)
```

**Why Orchestration?**:
- Provider priority management
- Fallback logic
- Unified interface for all tools
- Research persona integration
---

## 📊 Service Reusability Matrix

| Service | Standalone | Reusable | Current Usage | Integration Pattern |
|---------|-----------|----------|---------------|-------------------|
| **ExaService** | ✅ Yes | ✅ Yes | Research Engine, Onboarding, Blog Writer | Direct + Wrapper |
| **TavilyService** | ✅ Yes | ✅ Yes | Research Engine, Blog Writer | Direct + Wrapper |
| **GoogleSearchService** | ✅ Yes | ✅ Yes | Research Engine, LinkedIn Service | Direct |

---

## 🎯 Key Insights

### ✅ **Services Are Reusable**

1. **No Tight Coupling**: Services don't depend on Research Engine
2. **Standardized Interface**: Consistent method signatures
3. **Multiple Usage Points**: Used across different modules
4. **Environment-Based Config**: API keys are read from environment variables, not hardcoded

### ⚠️ **Integration Patterns Vary**

1. **Direct Usage**: Simple import and use (most reusable)
2. **Wrapper Pattern**: Blog-specific wrappers (maintains compatibility)
3. **Engine Orchestration**: Research Engine coordinates providers (unified interface)

### 🔄 **Architecture Evolution**

**Current State**:
- Services are reusable ✅
- Research Engine provides unified interface ✅
- Blog Writer uses wrappers for compatibility ✅

**Future Recommendations**:
- Consider migrating Blog Writer to use Research Engine directly
- Standardize on Research Engine for all tools
- Keep services as low-level building blocks

---

## 📝 Usage Examples

### Example 1: Direct Usage (Onboarding)

```python
# backend/api/onboarding_utils/step3_research_service.py
from services.research.exa_service import ExaService

exa_service = ExaService()
result = await exa_service.discover_competitors(
    user_url=user_url,
    num_results=10,
    industry_context=industry
)
```

### Example 2: Wrapper Pattern (Blog Writer)

```python
# backend/services/blog_writer/research/tavily_provider.py
from services.research.tavily_service import TavilyService

class TavilyResearchProvider:
    def __init__(self):
        self.tavily = TavilyService()  # Reuses service

    async def search(self, research_prompt, topic, industry, ...):
        # Blog-specific query building
        query = self._build_blog_query(research_prompt, topic, industry)

        # Use TavilyService
        result = await self.tavily.search(
            query=query,
            topic="general",
            search_depth="advanced",
            max_results=config.max_sources
        )

        # Blog-specific result formatting
        return self._format_blog_results(result)
```

### Example 3: Engine Orchestration (Research Engine)

```python
# backend/services/research/core/research_engine.py
from services.research.exa_service import ExaService
from services.research.tavily_service import TavilyService

class ResearchEngine:
    def __init__(self):
        self._exa_provider = ExaService()
        self._tavily_provider = TavilyService()

    async def research(self, context: ResearchContext, user_id: str):
        # Get optimized config
        config = self.optimizer.optimize(context)

        # Execute based on provider priority
        if config.provider == ResearchProvider.EXA:
            return await self._execute_exa_research(context, config, user_id)
        elif config.provider == ResearchProvider.TAVILY:
            return await self._execute_tavily_research(context, config, user_id)
        else:
            return await self._execute_google_research(context, config, user_id)
```

---

## ✅ Conclusion

### **Services ARE Reusable** ✅

- **ExaService**: ✅ Reusable, used in Research Engine, Onboarding, Blog Writer
- **TavilyService**: ✅ Reusable, used in Research Engine, Blog Writer
- **GoogleSearchService**: ✅ Reusable, used in Research Engine, LinkedIn Service

### **Integration Patterns**:

1. **Direct Usage**: Simple import and use (most reusable)
2. **Wrapper Pattern**: Blog-specific wrappers (maintains compatibility)
3. **Engine Orchestration**: Research Engine coordinates providers (unified interface)

### **Architecture Benefits**:

- ✅ **Modularity**: Services are independent building blocks
- ✅ **Reusability**: Can be used by any module
- ✅ **Flexibility**: Different integration patterns for different needs
- ✅ **Maintainability**: Internal changes to a service do not affect consumers as long as its public method signatures and return formats stay stable

---

**Status**: Services are well-designed for reusability with flexible integration patterns 🚀