Codebase Organization & Service Reusability Analysis

Date: 2025-01-29
Status: Comprehensive Codebase Structure Analysis


📋 Overview

This document provides a comprehensive analysis of:

  1. Codebase Organization: How features are organized across folders
  2. Service Architecture: How Exa, Tavily, and Google Search services are structured
  3. Reusability Analysis: Whether these services are reusable or tightly integrated

🏗️ Codebase Organization

High-Level Structure

AI-Writer/
├── backend/
│   ├── api/                    # API endpoints (FastAPI routers)
│   ├── services/               # Business logic & service layer
│   ├── models/                 # Database models & schemas
│   ├── middleware/             # Request/response middleware
│   ├── utils/                  # Utility functions
│   └── database/               # Database migrations
│
├── frontend/
│   └── src/
│       ├── components/         # React components
│       ├── services/            # Frontend API clients
│       ├── hooks/               # React hooks
│       └── utils/               # Frontend utilities
│
└── docs/                        # Documentation

📁 Feature Organization by Folder

Backend Services (backend/services/)

Research Services (backend/services/research/)

Purpose: Core research engine and provider services

research/
├── core/                        # Core research engine (standalone)
│   ├── research_engine.py       # Main orchestrator
│   ├── research_context.py      # Unified input schema
│   └── parameter_optimizer.py  # AI-driven parameter optimization
│
├── intent/                      # Intent-driven research
│   ├── unified_research_analyzer.py  # Single AI call for intent+queries+params
│   ├── intent_aware_analyzer.py      # Result analysis based on intent
│   └── ...
│
├── trends/                      # Google Trends integration
│   └── google_trends_service.py
│
├── exa_service.py               # ⭐ Reusable Exa API service
├── tavily_service.py             # ⭐ Reusable Tavily API service
├── google_search_service.py     # ⭐ Reusable Google Search service
│
├── research_persona_service.py  # Persona generation/retrieval
└── research_persona_prompt_builder.py

Key Features:

  • Standalone research engine (ResearchEngine)
  • Provider services (Exa, Tavily, Google)
  • Intent-driven research system
  • Research persona system

Blog Writer Services (backend/services/blog_writer/)

Purpose: Blog content generation

blog_writer/
├── core/
│   └── blog_writer_service.py   # Main blog generation service
│
├── research/                    # Blog-specific research providers
│   ├── research_service.py      # Blog research orchestrator
│   ├── exa_provider.py          # Blog-specific Exa wrapper
│   ├── tavily_provider.py       # Blog-specific Tavily wrapper
│   ├── google_provider.py       # Blog-specific Google wrapper
│   └── research_strategies.py   # Research strategies per mode
│
├── outline/                     # Outline generation
├── content/                     # Content generation
└── seo/                         # SEO optimization

Key Features:

  • Reuses the provider services from services.research
  • Has blog-specific wrappers for providers
  • Research strategies for different blog modes

Other Feature Services

| Service Folder | Purpose | Research Integration |
| --- | --- | --- |
| podcast/ | Podcast generation | Can use Research Engine |
| story_writer/ | Story generation | Can use Research Engine |
| youtube/ | YouTube content | Can use Research Engine |
| linkedin/ | LinkedIn content | Uses GoogleSearchService |
| onboarding/ | User onboarding | Uses ExaService for competitor discovery |
| content_planning/ | Content planning | Can use Research Engine |
| scheduler/ | Task scheduling | Can use Research Engine |

Backend API (backend/api/)

Research API (backend/api/research/)

Purpose: Research endpoints

api/research/
├── router.py                    # Main router
└── handlers/
    ├── providers.py             # Provider status endpoints
    ├── research.py               # Traditional research endpoints
    ├── intent.py                 # Intent-driven endpoints
    └── projects.py               # My Projects endpoints

Endpoints:

  • POST /api/research/intent/analyze - Intent analysis
  • POST /api/research/intent/research - Intent-driven research
  • POST /api/research/execute - Traditional research
  • GET /api/research/config - Configuration

Other API Modules

| API Folder | Purpose | Research Integration |
| --- | --- | --- |
| blog_writer/ | Blog endpoints | Uses blog_writer services |
| podcast/ | Podcast endpoints | Can use Research Engine |
| story_writer/ | Story endpoints | Can use Research Engine |
| onboarding_utils/ | Onboarding utilities | Uses ExaService for competitor discovery |

Frontend Components (frontend/src/components/)

Research Components (frontend/src/components/Research/)

Purpose: Research UI components

Research/
├── ResearchWizard.tsx           # Main wizard orchestrator
├── steps/
│   ├── ResearchInput.tsx        # Step 1: Input + Intent & Options
│   ├── StepProgress.tsx         # Step 2: Progress/polling
│   ├── StepResults.tsx          # Step 3: Results display
│   └── components/              # Sub-components
│       ├── IntentConfirmationPanel.tsx
│       ├── IntentResultsDisplay.tsx
│       └── ...
├── hooks/
│   ├── useResearchWizard.ts     # Wizard state management
│   ├── useResearchExecution.ts  # Research execution
│   └── useIntentResearch.ts     # Intent research flow
└── types/
    ├── research.types.ts        # Research types
    └── intent.types.ts          # Intent types

Service Design Pattern

All three services follow a similar design pattern:

  1. Standalone Service Classes: Each service is a self-contained class
  2. Guarded Initialization: Each service reads its API key from the environment in its constructor and enables itself only if the key is present
  3. Error Handling: Graceful degradation when API keys are missing
  4. Standardized Interface: Similar method signatures across services
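
The four points above can be sketched in a few lines. This is a minimal illustration, not code from the repository: the class name `ExampleSearchService`, the `EXAMPLE_API_KEY` variable, and the shape of the returned dictionary are all assumptions made for the example. Only the `api_key` / `enabled` / `_try_initialize()` structure mirrors the service excerpts shown in this document.

```python
import asyncio
import os
from typing import Any, Dict, Optional


class ExampleSearchService:
    """Hypothetical service illustrating the shared pattern (not ALwrity code)."""

    def __init__(self, env_var: str = "EXAMPLE_API_KEY") -> None:
        # Read credentials from the environment, as the real services do.
        self.api_key: Optional[str] = os.getenv(env_var)
        self.enabled = False
        self._try_initialize()

    def _try_initialize(self) -> None:
        # Enable the service only when a key is present. Nothing is raised
        # here, so modules that import the service still load without a key.
        self.enabled = bool(self.api_key)

    async def search(self, query: str) -> Dict[str, Any]:
        if not self.enabled:
            # Graceful degradation (assumed shape): report instead of raising.
            return {"success": False, "error": "API key not configured", "results": []}
        # A real service would call the provider's API at this point.
        return {"success": True, "error": None, "results": [f"stub result for {query}"]}
```

A caller can check `service.enabled` before searching, or simply inspect the `success` field of the result. Whether the real services return an error dictionary or raise is not stated in this document, so treat that part as one possible design.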

1. ExaService (backend/services/research/exa_service.py)

Design: Reusable Service

class ExaService:
    """
    Service for competitor discovery and analysis using the Exa API.
    Uses neural search to find semantically similar websites and content.
    """
    
    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("EXA_API_KEY")
        self.exa = None
        self.enabled = False
        self._try_initialize()
    
    async def discover_competitors(...) -> Dict[str, Any]:
        """Discover competitors for a given website."""
    
    async def discover_social_media_accounts(...) -> Dict[str, Any]:
        """Discover social media accounts."""
    
    async def analyze_competitor_content(...) -> Dict[str, Any]:
        """Analyze competitor content."""

Key Features:

  • Standalone: No dependencies on Research Engine
  • Reusable: Can be imported by any module
  • Focused: Primarily for competitor discovery
  • Flexible: Supports various search parameters

Current Usage:

  1. Research Engine: Uses it for research queries
  2. Onboarding: Uses it for competitor discovery (Step 3)
  3. Blog Writer: Uses it through a blog-specific wrapper (exa_provider.py)

2. TavilyService (backend/services/research/tavily_service.py)

Design: Reusable Service

class TavilyService:
    """
    Service for web search and research using the Tavily API.
    Provides AI-powered search with real-time information retrieval.
    """
    
    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("TAVILY_API_KEY")
        self.base_url = "https://api.tavily.com"
        self.enabled = False
        self._try_initialize()
    
    async def search(...) -> Dict[str, Any]:
        """Execute a search query using Tavily API."""
    
    async def search_industry_trends(...) -> Dict[str, Any]:
        """Search for current industry trends."""
    
    async def discover_competitors(...) -> Dict[str, Any]:
        """Discover competitors using Tavily search."""

Key Features:

  • Standalone: No dependencies on Research Engine
  • Reusable: Can be imported by any module
  • Flexible: Supports various search parameters (topic, depth, time_range, etc.)
  • Real-time: Optimized for current information

Current Usage:

  1. Research Engine: Uses it for research queries
  2. Blog Writer: Uses it through a blog-specific wrapper (tavily_provider.py)

3. GoogleSearchService (backend/services/research/google_search_service.py)

Design: Reusable Service

class GoogleSearchService:
    """
    Service for conducting real industry research using Google Custom Search API.
    Provides current, relevant industry information for content grounding.
    """
    
    def __init__(self):
        """Initialize with API credentials from environment."""
        self.api_key = os.getenv("GOOGLE_SEARCH_API_KEY")
        self.search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
        self.enabled = False
    
    async def search_industry_trends(...) -> List[Dict[str, Any]]:
        """Search for current industry trends and insights."""

Key Features:

  • Standalone: No dependencies on Research Engine
  • Reusable: Can be imported by any module
  • Focused: Industry trend research
  • Credibility Scoring: Built-in source credibility assessment

Current Usage:

  1. Research Engine: Uses it as the fallback provider
  2. LinkedIn Service: Uses it for industry research

🔄 Reusability Analysis

Services ARE Reusable

All three services (Exa, Tavily, Google Search) are designed to be reusable:

Evidence of Reusability:

  1. Standalone Design:

    • No dependencies on Research Engine
    • Self-contained initialization
    • Independent error handling
  2. Multiple Usage Points:

    # Used in Research Engine
    from services.research.exa_service import ExaService
    
    # Used in Onboarding
    from services.research.exa_service import ExaService
    
    # Used in Blog Writer (via wrapper)
    from services.research.tavily_service import TavilyService
    
    # Used in LinkedIn Service
    from services.research import GoogleSearchService
    
  3. Standardized Interface:

    • Similar method signatures
    • Similar return formats (dictionaries, or a list of dictionaries for GoogleSearchService)
    • Environment-based configuration
  4. Export Structure:

    # backend/services/research/__init__.py
    from .google_search_service import GoogleSearchService
    from .exa_service import ExaService
    from .tavily_service import TavilyService
    
    __all__ = [
        "GoogleSearchService",
        "ExaService",
        "TavilyService",
        # ... other exports
    ]
    

⚠️ Integration Patterns

While services are reusable, they are used in different ways:

1. Direct Usage (Most Reusable)

# Direct import and use
from services.research.exa_service import ExaService

exa = ExaService()
result = await exa.discover_competitors(user_url)

Used By:

  • Onboarding (competitor discovery)
  • Research Engine (research queries)

2. Wrapper Pattern (Blog Writer)

# Blog Writer uses wrappers for blog-specific logic
from services.research.tavily_service import TavilyService

class TavilyResearchProvider:
    def __init__(self):
        self.tavily = TavilyService()  # Reuses service
    
    async def search(self, prompt, topic, ...):
        # Blog-specific logic + TavilyService
        return await self.tavily.search(...)

Why Wrappers?

  • Blog-specific research strategies
  • Blog-specific result formatting
  • Blog-specific error handling
  • Maintains compatibility with existing blog writer code

Location: backend/services/blog_writer/research/tavily_provider.py


3. Engine Orchestration (Research Engine)

# Research Engine orchestrates providers
from services.research.exa_service import ExaService
from services.research.tavily_service import TavilyService
from services.research.google_search_service import GoogleSearchService

class ResearchEngine:
    def __init__(self):
        self._exa_provider = ExaService()
        self._tavily_provider = TavilyService()
        self._google_provider = GoogleSearchService()
    
    async def research(self, context: ResearchContext):
        # Orchestrates providers based on priority
        if self.exa_available:
            return await self._exa_provider.search(...)
        elif self.tavily_available:
            return await self._tavily_provider.search(...)
        else:
            return await self._google_provider.search_industry_trends(...)

Why Orchestration?

  • Provider priority management
  • Fallback logic
  • Unified interface for all tools
  • Research persona integration
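
Provider priority and fallback can be expressed as a small loop over an ordered list. The sketch below is an assumption-laden illustration rather than the Research Engine's actual logic: `search_with_fallback` and the `(name, enabled, search)` tuple layout are hypothetical. It also goes one step further than the excerpt above by falling through to the next provider when a call raises, not only when a provider is unavailable.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# A provider's search coroutine: takes a query, returns a result dictionary.
SearchFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def search_with_fallback(
    query: str,
    providers: List[Tuple[str, bool, SearchFn]],
) -> Dict[str, Any]:
    """Try each enabled provider in priority order; return the first success."""
    skipped: Dict[str, str] = {}
    for name, enabled, search in providers:
        if not enabled:
            # Provider has no credentials configured; move to the next one.
            skipped[name] = "disabled"
            continue
        try:
            result = await search(query)
        except Exception as exc:
            # Runtime failure (timeout, rate limit, ...): record it and fall back.
            skipped[name] = str(exc)
            continue
        return {"provider": name, "result": result, "skipped": skipped}
    # Every provider was disabled or failed.
    return {"provider": None, "result": None, "skipped": skipped}
```

With this shape, the priority order is just the order of the list, for example `[("exa", exa.enabled, exa_search), ("tavily", tavily.enabled, tavily_search), ...]`, where the search callables are thin hypothetical adapters around each service.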

📊 Service Reusability Matrix

| Service | Standalone | Reusable | Current Usage | Integration Pattern |
| --- | --- | --- | --- | --- |
| ExaService | Yes | Yes | Research Engine, Onboarding, Blog Writer | Direct + Wrapper |
| TavilyService | Yes | Yes | Research Engine, Blog Writer | Direct + Wrapper |
| GoogleSearchService | Yes | Yes | Research Engine, LinkedIn Service | Direct |

🎯 Key Insights

Services Are Reusable

  1. No Tight Coupling: Services don't depend on Research Engine
  2. Standardized Interface: Consistent method signatures
  3. Multiple Usage Points: Used across different modules
  4. Environment-Based Config: API keys come from environment variables, not from code

⚠️ Integration Patterns Vary

  1. Direct Usage: Simple import and use (most reusable)
  2. Wrapper Pattern: Blog-specific wrappers (maintains compatibility)
  3. Engine Orchestration: Research Engine coordinates providers (unified interface)

🔄 Architecture Evolution

Current State:

  • Services are reusable
  • Research Engine provides unified interface
  • Blog Writer uses wrappers for compatibility

Future Recommendations:

  • Consider migrating Blog Writer to use Research Engine directly
  • Standardize on Research Engine for all tools
  • Keep services as low-level building blocks
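
One way to keep the services as low-level building blocks while standardizing what sits above them is a shared provider interface plus thin adapters. The following is only a sketch of that idea: `SearchProvider` and `ListResultAdapter` are hypothetical names that do not exist in the codebase, and the normalized `{"results", "count"}` shape is an assumption. The adapter addresses the one difference visible in the excerpts above, namely that `GoogleSearchService.search_industry_trends` is shown returning a list while the other services return dictionaries.

```python
import asyncio
from typing import Any, Dict, List, Protocol


class SearchProvider(Protocol):
    """Hypothetical common interface a unified engine could depend on."""

    async def search(self, query: str) -> Dict[str, Any]: ...


class ListResultAdapter:
    """Adapts a service method that returns a list to the SearchProvider shape."""

    def __init__(self, service: Any, method_name: str) -> None:
        # Look up the bound coroutine method once, e.g. "search_industry_trends".
        self._call = getattr(service, method_name)

    async def search(self, query: str) -> Dict[str, Any]:
        items: List[Dict[str, Any]] = await self._call(query)
        # Wrap the list in a dictionary so callers see one result shape.
        return {"results": items, "count": len(items)}
```

Because `SearchProvider` is a structural `Protocol`, existing classes would not need to inherit from it; any object with a matching `search` coroutine qualifies.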

📝 Usage Examples

Example 1: Direct Usage (Onboarding)

# backend/api/onboarding_utils/step3_research_service.py
from services.research.exa_service import ExaService

exa_service = ExaService()
result = await exa_service.discover_competitors(
    user_url=user_url,
    num_results=10,
    industry_context=industry
)

Example 2: Wrapper Pattern (Blog Writer)

# backend/services/blog_writer/research/tavily_provider.py
from services.research.tavily_service import TavilyService

class TavilyResearchProvider:
    def __init__(self):
        self.tavily = TavilyService()  # Reuses service
    
    async def search(self, research_prompt, topic, industry, ...):
        # Blog-specific query building
        query = self._build_blog_query(research_prompt, topic, industry)
        
        # Use TavilyService
        result = await self.tavily.search(
            query=query,
            topic="general",
            search_depth="advanced",
            max_results=config.max_sources
        )
        
        # Blog-specific result formatting
        return self._format_blog_results(result)

Example 3: Engine Orchestration (Research Engine)

# backend/services/research/core/research_engine.py
from services.research.exa_service import ExaService
from services.research.tavily_service import TavilyService

class ResearchEngine:
    def __init__(self):
        self._exa_provider = ExaService()
        self._tavily_provider = TavilyService()
    
    async def research(self, context: ResearchContext, user_id: str):
        # Get optimized config
        config = self.optimizer.optimize(context)
        
        # Execute based on provider priority
        if config.provider == ResearchProvider.EXA:
            return await self._execute_exa_research(context, config, user_id)
        elif config.provider == ResearchProvider.TAVILY:
            return await self._execute_tavily_research(context, config, user_id)
        else:
            return await self._execute_google_research(context, config, user_id)

Conclusion

Services ARE Reusable

  • ExaService: Reusable, used in Research Engine, Onboarding, Blog Writer
  • TavilyService: Reusable, used in Research Engine, Blog Writer
  • GoogleSearchService: Reusable, used in Research Engine, LinkedIn Service

Integration Patterns:

  1. Direct Usage: Simple import and use (most reusable)
  2. Wrapper Pattern: Blog-specific wrappers (maintains compatibility)
  3. Engine Orchestration: Research Engine coordinates providers (unified interface)

Architecture Benefits:

  • Modularity: Services are independent building blocks
  • Reusability: Can be used by any module
  • Flexibility: Different integration patterns for different needs
  • Maintainability: Internal changes to a service don't affect consumers as long as its public methods stay the same

Status: Services are well-designed for reusability with flexible integration patterns 🚀