ALwrity Version 0.5.0 (FastAPI + React)

This commit is contained in:
ajaysi
2025-08-06 12:48:02 +05:30
parent f28a919caa
commit 32f97fa6b3
476 changed files with 115544 additions and 28747 deletions


@@ -0,0 +1,857 @@
# 🏗️ Content Planning Services Modularity & Optimization Plan
## 📋 Executive Summary
This document outlines a plan to reorganize and optimize the content planning services for better modularity, reusability, and maintainability. The current structure has grown organically and needs systematic reorganization before it can scale further.
## 🎯 Objectives
### Primary Goals
1. **Modular Architecture**: Create a well-organized folder structure for content planning services
2. **Code Reusability**: Implement shared utilities and common patterns across modules
3. **Maintainability**: Reduce code duplication and improve code organization
4. **Extensibility**: Design for easy addition of new content planning features
5. **Testing**: Ensure all functionalities are preserved during reorganization
### Secondary Goals
1. **Performance Optimization**: Optimize large modules for better performance
2. **Dependency Management**: Clean up and organize service dependencies
3. **Documentation**: Improve code documentation and API documentation
4. **Error Handling**: Standardize error handling across all modules
## 🏗️ Current Structure Analysis
### Current Services Directory
```
backend/services/
├── content_planning_service.py (21KB, 505 lines)
├── content_planning_db.py (17KB, 388 lines)
├── ai_service_manager.py (30KB, 716 lines)
├── ai_analytics_service.py (43KB, 974 lines)
├── ai_prompt_optimizer.py (23KB, 529 lines)
├── content_gap_analyzer/
│   ├── content_gap_analyzer.py (39KB, 853 lines)
│   ├── competitor_analyzer.py (51KB, 1208 lines)
│   ├── keyword_researcher.py (63KB, 1479 lines)
│   ├── ai_engine_service.py (35KB, 836 lines)
│   └── website_analyzer.py (20KB, 558 lines)
└── [other services...]
```
### Issues Identified
1. **Large Monolithic Files**: Two files exceed 1,000 lines (`competitor_analyzer.py` at 1208 and `keyword_researcher.py` at 1479), and three more exceed 800
2. **Scattered Dependencies**: Related services are not grouped together
3. **Code Duplication**: Similar patterns repeated across modules
4. **Mixed Responsibilities**: Single files handling multiple concerns
5. **Inconsistent Structure**: No standardized organization pattern
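The file sizes behind issue 1 can be measured rather than estimated. The following is a minimal sketch, assuming the services live under a directory you pass in; `find_large_files` and the 500-line threshold are illustrative choices, not part of the existing codebase.

```python
from pathlib import Path
from typing import List, Tuple


def find_large_files(root: Path, max_lines: int = 500) -> List[Tuple[str, int]]:
    """Return (relative path, line count) for .py files longer than max_lines."""
    oversized = []
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            oversized.append((path.relative_to(root).as_posix(), line_count))
    # Largest files first, so the worst offenders are listed at the top.
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(Path("backend/services"))` before and after each migration phase gives a simple way to track progress against the file-size goal.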
## 🎯 Proposed New Structure
### Target Directory Structure
```
backend/services/content_planning/
├── __init__.py
├── core/
│   ├── __init__.py
│   ├── base_service.py
│   ├── database_service.py
│   ├── ai_service.py
│   └── validation_service.py
├── modules/
│   ├── __init__.py
│   ├── content_gap_analyzer/
│   │   ├── __init__.py
│   │   ├── analyzer.py
│   │   ├── competitor_analyzer.py
│   │   ├── keyword_researcher.py
│   │   ├── website_analyzer.py
│   │   └── ai_engine_service.py
│   ├── content_strategy/
│   │   ├── __init__.py
│   │   ├── strategy_service.py
│   │   ├── industry_analyzer.py
│   │   ├── audience_analyzer.py
│   │   └── pillar_developer.py
│   ├── calendar_management/
│   │   ├── __init__.py
│   │   ├── calendar_service.py
│   │   ├── scheduler_service.py
│   │   ├── event_manager.py
│   │   └── repurposer.py
│   ├── ai_analytics/
│   │   ├── __init__.py
│   │   ├── analytics_service.py
│   │   ├── predictive_analytics.py
│   │   ├── performance_tracker.py
│   │   └── trend_analyzer.py
│   └── recommendations/
│       ├── __init__.py
│       ├── recommendation_engine.py
│       ├── content_recommender.py
│       ├── optimization_service.py
│       └── priority_scorer.py
├── shared/
│   ├── __init__.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── text_processor.py
│   │   ├── data_validator.py
│   │   ├── url_processor.py
│   │   └── metrics_calculator.py
│   ├── constants/
│   │   ├── __init__.py
│   │   ├── content_types.py
│   │   ├── ai_prompts.py
│   │   ├── error_codes.py
│   │   └── config.py
│   └── interfaces/
│       ├── __init__.py
│       ├── service_interface.py
│       ├── data_models.py
│       └── response_models.py
└── main_service.py
```
## 🔄 Migration Strategy
### Phase 1: Core Infrastructure Setup (Week 1)
#### 1.1 Create New Directory Structure
```bash
# Create new content_planning directory
mkdir -p backend/services/content_planning
mkdir -p backend/services/content_planning/core
mkdir -p backend/services/content_planning/modules
mkdir -p backend/services/content_planning/shared
mkdir -p backend/services/content_planning/shared/utils
mkdir -p backend/services/content_planning/shared/constants
mkdir -p backend/services/content_planning/shared/interfaces
```
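The `mkdir` commands create directories only, while the target structure also expects an `__init__.py` in each package. A minimal sketch of one way to create both in a single pass follows; `scaffold_packages` and `PACKAGE_DIRS` are hypothetical names, and the per-module folders under `modules/` are left out for brevity.

```python
from pathlib import Path
from typing import List

# Package directories taken from the mkdir commands above.
PACKAGE_DIRS = [
    "core",
    "modules",
    "shared",
    "shared/utils",
    "shared/constants",
    "shared/interfaces",
]


def scaffold_packages(base: Path) -> List[Path]:
    """Create each package directory and an empty __init__.py inside it."""
    created = []
    # The empty string stands for the base package itself.
    for relative in ["", *PACKAGE_DIRS]:
        package_dir = base / relative
        package_dir.mkdir(parents=True, exist_ok=True)
        init_file = package_dir / "__init__.py"
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

The function is idempotent: running it a second time creates nothing and returns an empty list.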
#### 1.2 Create Base Classes and Interfaces
```python
# backend/services/content_planning/core/base_service.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional

from loguru import logger  # same logger the existing backend services use
from sqlalchemy.orm import Session


class BaseContentService(ABC):
    """Base class for all content planning services."""

    def __init__(self, db_session: Optional[Session] = None):
        self.db_session = db_session
        self.logger = logger

    @abstractmethod
    async def initialize(self) -> bool:
        """Initialize the service."""
        pass

    @abstractmethod
    async def validate_input(self, data: Dict[str, Any]) -> bool:
        """Validate input data."""
        pass

    @abstractmethod
    async def process(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Process the main service logic."""
        pass
```
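Because all three methods are abstract, a subclass can only be instantiated once it implements every one of them. The sketch below illustrates this with a stand-in base class so it runs without SQLAlchemy; `WordCountService` and `run` are hypothetical and exist only for this example.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseContentService(ABC):
    """Stand-in for the base class above, trimmed to the abstract interface."""

    @abstractmethod
    async def initialize(self) -> bool: ...

    @abstractmethod
    async def validate_input(self, data: Dict[str, Any]) -> bool: ...

    @abstractmethod
    async def process(self, data: Dict[str, Any]) -> Dict[str, Any]: ...


class WordCountService(BaseContentService):
    """Hypothetical service: counts the words in data["text"]."""

    async def initialize(self) -> bool:
        return True

    async def validate_input(self, data: Dict[str, Any]) -> bool:
        return isinstance(data.get("text"), str)

    async def process(self, data: Dict[str, Any]) -> Dict[str, Any]:
        if not await self.validate_input(data):
            raise ValueError("expected a 'text' string")
        return {"word_count": len(data["text"].split())}


async def run(service: BaseContentService, data: Dict[str, Any]) -> Dict[str, Any]:
    """Typical call sequence: initialize once, then process."""
    await service.initialize()
    return await service.process(data)
```

Instantiating `BaseContentService()` directly raises `TypeError`, which is worth keeping in mind for the sub-service stubs in later phases: each needs these three methods before it can be constructed.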
#### 1.3 Create Shared Utilities
```python
# backend/services/content_planning/shared/utils/text_processor.py
from typing import List


class TextProcessor:
    """Shared text processing utilities."""

    @staticmethod
    def clean_text(text: str) -> str:
        """Clean and normalize text."""
        pass

    @staticmethod
    def extract_keywords(text: str) -> List[str]:
        """Extract keywords from text."""
        pass

    @staticmethod
    def calculate_readability(text: str) -> float:
        """Calculate text readability score."""
        pass
```
### Phase 2: Content Gap Analyzer Modularization (Week 2)
#### 2.1 Break Down Large Files
**Current**: `content_gap_analyzer.py` (853 lines)
**Target**: Split into focused modules
```python
# backend/services/content_planning/modules/content_gap_analyzer/analyzer.py
class ContentGapAnalyzer(BaseContentService):
    """Main content gap analysis orchestrator."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.competitor_analyzer = CompetitorAnalyzer(db_session)
        self.keyword_researcher = KeywordResearcher(db_session)
        self.website_analyzer = WebsiteAnalyzer(db_session)
        self.ai_engine = AIEngineService(db_session)

    async def analyze_comprehensive_gap(self, target_url: str, competitor_urls: List[str],
                                        target_keywords: List[str], industry: str) -> Dict[str, Any]:
        """Orchestrate comprehensive content gap analysis."""
        # Orchestrate analysis using sub-services
        pass
```
#### 2.2 Optimize Competitor Analyzer
**Current**: `competitor_analyzer.py` (1208 lines)
**Target**: Split into focused components
```python
# backend/services/content_planning/modules/content_gap_analyzer/competitor_analyzer.py
class CompetitorAnalyzer(BaseContentService):
    """Competitor analysis service."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.market_analyzer = MarketPositionAnalyzer()
        self.content_analyzer = ContentStructureAnalyzer()
        self.seo_analyzer = SEOAnalyzer()

    async def analyze_competitors(self, competitor_urls: List[str], industry: str) -> Dict[str, Any]:
        """Analyze competitors comprehensively."""
        # Use sub-components for specific analysis
        pass
```
#### 2.3 Optimize Keyword Researcher
**Current**: `keyword_researcher.py` (1479 lines)
**Target**: Split into focused components
```python
# backend/services/content_planning/modules/content_gap_analyzer/keyword_researcher.py
class KeywordResearcher(BaseContentService):
    """Keyword research service."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.trend_analyzer = KeywordTrendAnalyzer()
        self.intent_analyzer = SearchIntentAnalyzer()
        self.opportunity_finder = KeywordOpportunityFinder()

    async def research_keywords(self, industry: str, target_keywords: List[str]) -> Dict[str, Any]:
        """Research keywords comprehensively."""
        # Use sub-components for specific analysis
        pass
```
### Phase 3: Content Strategy Module Creation (Week 3)
#### 3.1 Create Content Strategy Services
```python
# backend/services/content_planning/modules/content_strategy/strategy_service.py
class ContentStrategyService(BaseContentService):
    """Content strategy development service."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.industry_analyzer = IndustryAnalyzer()
        self.audience_analyzer = AudienceAnalyzer()
        self.pillar_developer = ContentPillarDeveloper()

    async def develop_strategy(self, industry: str, target_audience: Dict[str, Any],
                               business_goals: List[str]) -> Dict[str, Any]:
        """Develop comprehensive content strategy."""
        pass
```
#### 3.2 Create Industry Analyzer
```python
# backend/services/content_planning/modules/content_strategy/industry_analyzer.py
class IndustryAnalyzer(BaseContentService):
    """Industry analysis service."""

    async def analyze_industry_trends(self, industry: str) -> Dict[str, Any]:
        """Analyze industry trends and opportunities."""
        pass

    async def identify_market_opportunities(self, industry: str) -> List[Dict[str, Any]]:
        """Identify market opportunities in the industry."""
        pass
```
#### 3.3 Create Audience Analyzer
```python
# backend/services/content_planning/modules/content_strategy/audience_analyzer.py
class AudienceAnalyzer(BaseContentService):
    """Audience analysis service."""

    async def analyze_audience_demographics(self, audience_data: Dict[str, Any]) -> Dict[str, Any]:
        """Analyze audience demographics."""
        pass

    async def develop_personas(self, audience_data: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Develop audience personas."""
        pass
```
### Phase 4: Calendar Management Module Creation (Week 4)
#### 4.1 Create Calendar Services
```python
# backend/services/content_planning/modules/calendar_management/calendar_service.py
class CalendarService(BaseContentService):
    """Calendar management service."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.scheduler = SchedulerService()
        self.event_manager = EventManager()
        self.repurposer = ContentRepurposer()

    async def create_event(self, event_data: Dict[str, Any]) -> Dict[str, Any]:
        """Create calendar event."""
        pass

    async def optimize_schedule(self, events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Optimize event schedule."""
        pass
```
#### 4.2 Create Scheduler Service
```python
# backend/services/content_planning/modules/calendar_management/scheduler_service.py
class SchedulerService(BaseContentService):
    """Smart scheduling service."""

    async def optimize_posting_times(self, content_type: str, audience_data: Dict[str, Any]) -> List[str]:
        """Optimize posting times for content."""
        pass

    async def coordinate_cross_platform(self, events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Coordinate events across platforms."""
        pass
```
### Phase 5: AI Analytics Module Optimization (Week 5)
#### 5.1 Optimize AI Analytics Service
**Current**: `ai_analytics_service.py` (974 lines)
**Target**: Split into focused components
```python
# backend/services/content_planning/modules/ai_analytics/analytics_service.py
class AIAnalyticsService(BaseContentService):
    """AI analytics service."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.predictive_analytics = PredictiveAnalytics()
        self.performance_tracker = PerformanceTracker()
        self.trend_analyzer = TrendAnalyzer()

    async def analyze_content_evolution(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
        """Analyze content evolution over time."""
        pass
```
#### 5.2 Create Predictive Analytics
```python
# backend/services/content_planning/modules/ai_analytics/predictive_analytics.py
class PredictiveAnalytics(BaseContentService):
    """Predictive analytics service."""

    async def predict_content_performance(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
        """Predict content performance."""
        pass

    async def forecast_trends(self, historical_data: Dict[str, Any]) -> Dict[str, Any]:
        """Forecast content trends."""
        pass
```
### Phase 6: Recommendations Module Creation (Week 6)
#### 6.1 Create Recommendation Engine
```python
# backend/services/content_planning/modules/recommendations/recommendation_engine.py
class RecommendationEngine(BaseContentService):
    """Content recommendation engine."""

    def __init__(self, db_session: Optional[Session] = None):
        super().__init__(db_session)
        self.content_recommender = ContentRecommender()
        self.optimization_service = OptimizationService()
        self.priority_scorer = PriorityScorer()

    async def generate_recommendations(self, analysis_data: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Generate content recommendations."""
        pass
```
#### 6.2 Create Content Recommender
```python
# backend/services/content_planning/modules/recommendations/content_recommender.py
class ContentRecommender(BaseContentService):
    """Content recommendation service."""

    async def recommend_topics(self, industry: str, audience_data: Dict[str, Any]) -> List[str]:
        """Recommend content topics."""
        pass

    async def recommend_formats(self, topic: str, audience_data: Dict[str, Any]) -> List[str]:
        """Recommend content formats."""
        pass
```
## 🔧 Code Optimization Strategies
### 1. Extract Common Patterns
#### 1.1 Database Operations Pattern
```python
# backend/services/content_planning/core/database_service.py
from typing import Any, Dict

from loguru import logger
from sqlalchemy.orm import Session


class DatabaseService:
    """Centralized database operations."""

    def __init__(self, session: Session):
        self.session = session

    async def create_record(self, model_class, data: Dict[str, Any]):
        """Create database record with error handling."""
        try:
            record = model_class(**data)
            self.session.add(record)
            self.session.commit()
            return record
        except Exception as e:
            # Undo the partial transaction before propagating the error
            self.session.rollback()
            logger.error(f"Database creation error: {str(e)}")
            raise

    async def update_record(self, record, data: Dict[str, Any]):
        """Update database record with error handling."""
        try:
            for key, value in data.items():
                setattr(record, key, value)
            self.session.commit()
            return record
        except Exception as e:
            self.session.rollback()
            logger.error(f"Database update error: {str(e)}")
            raise
```
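Both methods above repeat the same commit-or-rollback wrapper, which is the kind of duplication this section aims to remove. One possible way to factor it out is a context manager, sketched below. `transaction` and `FakeSession` are hypothetical names; the fake session only records calls so the example runs without a database.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeSession:
    """Hypothetical stand-in that records calls instead of talking to a database."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


@contextmanager
def transaction(session) -> Iterator[None]:
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield
        session.commit()
    except Exception:
        session.rollback()
        raise
```

With a helper like this, `create_record` could reduce to building the record and calling `session.add(record)` inside `with transaction(self.session):`.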
#### 1.2 AI Service Pattern
```python
# backend/services/content_planning/core/ai_service.py
from typing import Any, Dict

from loguru import logger

# AIServiceManager comes from the existing ai_service_manager module (import omitted in this sketch)


class AIService:
    """Centralized AI service operations."""

    def __init__(self):
        self.ai_manager = AIServiceManager()

    async def generate_ai_insights(self, service_type: str, data: Dict[str, Any]) -> Dict[str, Any]:
        """Generate AI insights with error handling."""
        try:
            return await self.ai_manager.generate_analysis(service_type, data)
        except Exception as e:
            # Failures are logged and reported to the caller as an empty result
            logger.error(f"AI service error: {str(e)}")
            return {}
```
### 2. Implement Shared Utilities
#### 2.1 Text Processing Utilities
```python
# backend/services/content_planning/shared/utils/text_processor.py
import re
from collections import Counter
from typing import List


class TextProcessor:
    """Shared text processing utilities."""

    @staticmethod
    def clean_text(text: str) -> str:
        """Clean and normalize text."""
        # Remove punctuation and other non-word characters
        text = re.sub(r'[^\w\s]', '', text)
        # Collapse whitespace last, so removed punctuation leaves no double spaces
        return re.sub(r'\s+', ' ', text.strip())

    @staticmethod
    def extract_keywords(text: str, max_keywords: int = 10) -> List[str]:
        """Extract the most frequent non-stop-words from text."""
        # Tokenize and lowercase
        words = re.findall(r'\b\w+\b', text.lower())
        # Remove common stop words and tokens of two characters or fewer
        stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'}
        words = [word for word in words if word not in stop_words and len(word) > 2]
        # Count and return top keywords
        word_counts = Counter(words)
        return [word for word, _ in word_counts.most_common(max_keywords)]

    @staticmethod
    def calculate_readability(text: str) -> float:
        """Approximate the Flesch Reading Ease score."""
        # Skip the empty string that re.split leaves after the final terminator
        sentences = len([s for s in re.split(r'[.!?]+', text) if s.strip()])
        words = len(text.split())
        # Rough syllable estimate: count groups of consecutive vowels
        syllables = len(re.findall(r'[aeiouy]+', text.lower()))
        if words == 0 or sentences == 0:
            return 0.0
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```
#### 2.2 Data Validation Utilities
```python
# backend/services/content_planning/shared/utils/data_validator.py
import re
from typing import Any, Dict, List


class DataValidator:
    """Shared data validation utilities."""

    @staticmethod
    def validate_url(url: str) -> bool:
        """Validate URL format (http and https only)."""
        # Hyphens are allowed in path, query, and fragment so slugs like /my-post pass
        pattern = r'^https?://[-\w.]+(?::\d+)?(?:/[-\w/.]*(?:\?[-\w&=%.]*)?(?:#[-\w.]*)?)?$'
        return bool(re.match(pattern, url))

    @staticmethod
    def validate_email(email: str) -> bool:
        """Validate email format."""
        pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        return bool(re.match(pattern, email))

    @staticmethod
    def validate_required_fields(data: Dict[str, Any], required_fields: List[str]) -> bool:
        """Validate required fields are present and not empty."""
        for field in required_fields:
            if field not in data or not data[field]:
                return False
        return True
```
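Regex-based URL checks tend to reject valid URLs as edge cases accumulate. As an alternative to consider, the standard library's `urllib.parse` can do the structural part of the check. This is a minimal sketch; `is_http_url` is a hypothetical helper, and it checks structure only, not whether the host resolves.

```python
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Accept absolute http(s) URLs that have a host component."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. unbalanced IPv6 brackets
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)
```

The trade-off is that this is more permissive than a strict pattern: anything with an http(s) scheme and a host is accepted.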
### 3. Create Shared Constants
#### 3.1 Content Types Constants
```python
# backend/services/content_planning/shared/constants/content_types.py
from enum import Enum


class ContentType(Enum):
    """Content type enumeration."""
    BLOG_POST = "blog_post"
    ARTICLE = "article"
    VIDEO = "video"
    PODCAST = "podcast"
    INFOGRAPHIC = "infographic"
    WHITEPAPER = "whitepaper"
    CASE_STUDY = "case_study"
    WEBINAR = "webinar"
    SOCIAL_MEDIA_POST = "social_media_post"
    EMAIL_NEWSLETTER = "email_newsletter"


class ContentFormat(Enum):
    """Content format enumeration."""
    TEXT = "text"
    VIDEO = "video"
    AUDIO = "audio"
    IMAGE = "image"
    INTERACTIVE = "interactive"
    MIXED = "mixed"


class ContentPriority(Enum):
    """Content priority enumeration."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
```
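Enum members are constructed from their values, so strings stored in the database or received from the API can be mapped back to members. A minimal sketch follows; `parse_priority` is a hypothetical helper, and `ContentPriority` is repeated here only so the example runs on its own.

```python
from enum import Enum
from typing import Optional


class ContentPriority(Enum):
    """Copied from the listing above so this sketch is self-contained."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def parse_priority(raw: str) -> Optional[ContentPriority]:
    """Map a stored string back to its enum member, or None if unknown."""
    try:
        # Enum lookup by value raises ValueError for unknown strings
        return ContentPriority(raw.strip().lower())
    except ValueError:
        return None
```

Returning `None` for unknown values is one design choice; raising a validation error at the API boundary would be equally reasonable.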
#### 3.2 AI Prompts Constants
```python
# backend/services/content_planning/shared/constants/ai_prompts.py
class AIPrompts:
    """Centralized AI prompts."""

    CONTENT_GAP_ANALYSIS = """
    As an expert SEO content strategist, analyze this content gap analysis data:
    TARGET: {target_url}
    INDUSTRY: {industry}
    COMPETITORS: {competitor_urls}
    KEYWORDS: {target_keywords}
    Provide:
    1. Strategic content gap analysis
    2. Priority content recommendations
    3. Keyword strategy insights
    4. Implementation timeline
    Format as structured JSON.
    """

    CONTENT_STRATEGY = """
    As a content strategy expert, develop a comprehensive content strategy:
    INDUSTRY: {industry}
    AUDIENCE: {target_audience}
    GOALS: {business_goals}
    Provide:
    1. Content pillars and themes
    2. Content calendar structure
    3. Distribution strategy
    4. Success metrics
    Format as structured JSON.
    """
```
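The `{placeholder}` fields in these templates suggest they are filled with `str.format`. The sketch below shows one way that could look, using a shortened copy of the strategy prompt; `render_strategy_prompt` is a hypothetical helper, not an existing function.

```python
import json
from typing import Any, Dict, List

# Shortened copy of the CONTENT_STRATEGY template, for illustration only.
CONTENT_STRATEGY = """
As a content strategy expert, develop a comprehensive content strategy:
INDUSTRY: {industry}
AUDIENCE: {target_audience}
GOALS: {business_goals}
"""


def render_strategy_prompt(industry: str, target_audience: Dict[str, Any],
                           business_goals: List[str]) -> str:
    """Fill the template; structured values are serialized before insertion."""
    return CONTENT_STRATEGY.format(
        industry=industry,
        target_audience=json.dumps(target_audience, sort_keys=True),
        business_goals=", ".join(business_goals),
    )
```

Two details matter with this approach: a missing field raises `KeyError`, and any literal braces added to a template later (for example a JSON sample) must be doubled as `{{` and `}}`.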
## 🧪 Testing Strategy
### Phase 1: Unit Testing (Week 7)
#### 1.1 Create Test Structure
```
tests/
├── content_planning/
│   ├── __init__.py
│   ├── test_core/
│   │   ├── test_base_service.py
│   │   ├── test_database_service.py
│   │   └── test_ai_service.py
│   ├── test_modules/
│   │   ├── test_content_gap_analyzer/
│   │   ├── test_content_strategy/
│   │   ├── test_calendar_management/
│   │   ├── test_ai_analytics/
│   │   └── test_recommendations/
│   └── test_shared/
│       ├── test_utils/
│       └── test_constants/
```
#### 1.2 Test Base Services
```python
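# tests/content_planning/test_core/test_base_service.py
from typing import Any, Dict

import pytest
from services.content_planning.core.base_service import BaseContentService


class DummyService(BaseContentService):
    """Minimal concrete subclass for testing (hypothetical); the abstract base cannot be instantiated."""

    async def initialize(self) -> bool:
        return True

    async def validate_input(self, data: Dict[str, Any]) -> bool:
        return bool(data)

    async def process(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return data


class TestBaseService:
    """Test base service functionality."""

    def test_abstract_base_cannot_be_instantiated(self):
        """The base class is abstract, so direct construction must fail."""
        with pytest.raises(TypeError):
            BaseContentService()

    @pytest.mark.asyncio
    async def test_input_validation(self):
        """validate_input is a coroutine and must be awaited."""
        service = DummyService()
        assert await service.validate_input({"test": "data"}) is True
        assert await service.validate_input({}) is False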
```
### Phase 2: Integration Testing (Week 8)
#### 2.1 Test Module Integration
```python
# tests/content_planning/test_modules/test_content_gap_analyzer/test_analyzer.py
import pytest
from services.content_planning.modules.content_gap_analyzer.analyzer import ContentGapAnalyzer


class TestContentGapAnalyzer:
    """Test content gap analyzer integration."""

    @pytest.mark.asyncio
    async def test_comprehensive_analysis(self):
        """Test comprehensive gap analysis."""
        analyzer = ContentGapAnalyzer()
        result = await analyzer.analyze_comprehensive_gap(
            target_url="https://example.com",
            competitor_urls=["https://competitor1.com", "https://competitor2.com"],
            target_keywords=["test", "example"],
            industry="technology"
        )
        assert result is not None
        assert "recommendations" in result
        assert "gaps" in result
```
#### 2.2 Test Database Integration
```python
# tests/content_planning/test_core/test_database_service.py
import pytest
from services.content_planning.core.database_service import DatabaseService


class TestDatabaseService:
    """Test database service integration."""

    @pytest.mark.asyncio
    async def test_create_record(self):
        """Test record creation."""
        # Test database operations
        pass

    @pytest.mark.asyncio
    async def test_update_record(self):
        """Test record update."""
        # Test database operations
        pass
```
### Phase 3: Performance Testing (Week 9)
#### 3.1 Load Testing
```python
# tests/content_planning/test_performance/test_load.py
import asyncio
import time

import pytest
from services.content_planning.main_service import ContentPlanningService


class TestPerformance:
    """Test service performance."""

    @pytest.mark.asyncio
    async def test_concurrent_requests(self):
        """Test concurrent request handling."""
        service = ContentPlanningService()
        # Create multiple concurrent requests
        tasks = []
        for i in range(10):
            task = service.analyze_content_gaps_with_ai(
                website_url=f"https://example{i}.com",
                competitor_urls=["https://competitor.com"],
                user_id=1
            )
            tasks.append(task)
        # Execute concurrently
        start_time = time.time()
        results = await asyncio.gather(*tasks)
        end_time = time.time()
        # Verify performance
        assert end_time - start_time < 30  # Should complete within 30 seconds
        assert len(results) == 10  # All requests should complete
```
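The concurrency pattern in this test can be tried in isolation before the real service exists. The sketch below replaces the service call with a stand-in coroutine; `fake_analysis` and `run_concurrently` are hypothetical names, and the sleep only simulates I/O latency.

```python
import asyncio
import time
from typing import Any, Dict, List, Tuple


async def fake_analysis(site: str, delay: float = 0.1) -> Dict[str, Any]:
    """Hypothetical stand-in for a service call; sleeps instead of doing work."""
    await asyncio.sleep(delay)
    return {"site": site, "gaps": []}


async def run_concurrently(sites: List[str]) -> Tuple[List[Any], float]:
    """Run all analyses at once and report the wall-clock time taken."""
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    # return_exceptions=True keeps one failing request from discarding the other results
    results = await asyncio.gather(
        *(fake_analysis(site) for site in sites), return_exceptions=True
    )
    return list(results), time.perf_counter() - start
```

Ten simulated requests of 0.1 s each would take about a second if run one after another; when they actually run concurrently, the total stays close to a single request's latency.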
## 🔄 Migration Implementation Plan
### Week 1: Infrastructure Setup
- [ ] Create new directory structure
- [ ] Implement base classes and interfaces
- [ ] Create shared utilities
- [ ] Set up testing framework
### Week 2: Content Gap Analyzer Migration
- [ ] Break down large files into modules
- [ ] Implement focused components
- [ ] Test individual components
- [ ] Update imports and dependencies
### Week 3: Content Strategy Module
- [ ] Create content strategy services
- [ ] Implement industry analyzer
- [ ] Implement audience analyzer
- [ ] Test strategy components
### Week 4: Calendar Management Module
- [ ] Create calendar services
- [ ] Implement scheduler service
- [ ] Implement event manager
- [ ] Test calendar components
### Week 5: AI Analytics Optimization
- [ ] Optimize AI analytics service
- [ ] Create predictive analytics
- [ ] Implement performance tracker
- [ ] Test AI analytics components
### Week 6: Recommendations Module
- [ ] Create recommendation engine
- [ ] Implement content recommender
- [ ] Implement optimization service
- [ ] Test recommendation components
### Week 7: Unit Testing
- [ ] Test all core services
- [ ] Test all modules
- [ ] Test shared utilities
- [ ] Fix any issues found
### Week 8: Integration Testing
- [ ] Test module integration
- [ ] Test database integration
- [ ] Test AI service integration
- [ ] Fix any issues found
### Week 9: Performance Testing
- [ ] Load testing
- [ ] Performance optimization
- [ ] Memory usage optimization
- [ ] Final validation
## 📊 Success Metrics
### Code Quality Metrics
- [ ] Reduce file size so that no module exceeds 500 lines (the ten files listed above currently average about 800 lines, and the largest has 1479)
- [ ] Achieve 90%+ code coverage
- [ ] Reduce code duplication by 60%
- [ ] Improve maintainability index by 40%
### Performance Metrics
- [ ] API response time < 200ms (maintain current performance)
- [ ] Memory usage reduction by 20%
- [ ] CPU usage optimization by 15%
- [ ] Database query optimization by 25%
### Functionality Metrics
- [ ] 100% feature preservation
- [ ] Zero breaking changes
- [ ] Improved error handling
- [ ] Enhanced logging and monitoring
## 🚀 Next Steps
### Immediate Actions (This Week)
1. **Create Migration Plan**: Finalize this document
2. **Set Up Infrastructure**: Create new directory structure
3. **Implement Base Classes**: Create core service infrastructure
4. **Start Testing Framework**: Set up comprehensive testing
### Week 2 Goals
1. **Begin Content Gap Analyzer Migration**: Start with largest files
2. **Implement Shared Utilities**: Create reusable components
3. **Test Individual Components**: Ensure functionality preservation
4. **Update Dependencies**: Fix import paths
### Week 3-4 Goals
1. **Complete Module Migration**: Finish all module reorganization
2. **Optimize Performance**: Implement performance improvements
3. **Comprehensive Testing**: Test all functionality
4. **Documentation Update**: Update all documentation
---
**Document Version**: 1.0
**Last Updated**: 2024-08-01
**Status**: Planning Complete - Ready for Implementation
**Next Steps**: Begin Phase 1 Infrastructure Setup


@@ -0,0 +1,19 @@
"""Services package for ALwrity backend."""
from .api_key_manager import (
    APIKeyManager,
    OnboardingProgress,
    get_onboarding_progress,
    StepStatus,
    StepData
)
from .validation import check_all_api_keys

__all__ = [
    'APIKeyManager',
    'OnboardingProgress',
    'get_onboarding_progress',
    'StepStatus',
    'StepData',
    'check_all_api_keys'
]


@@ -0,0 +1,286 @@
"""
AI Analysis Database Service
Handles database operations for AI analysis results including storage and retrieval.
"""
from typing import Dict, Any, List, Optional
from sqlalchemy.orm import Session
from sqlalchemy import and_, desc
from datetime import datetime, timedelta
from loguru import logger
from models.content_planning import AIAnalysisResult, ContentStrategy
from services.database import get_db_session
class AIAnalysisDBService:
    """Service for managing AI analysis results in the database."""

    def __init__(self, db_session: Optional[Session] = None):
        # Fall back to a fresh session when the caller does not supply one
        self.db = db_session or get_db_session()
    async def store_ai_analysis_result(
        self,
        user_id: int,
        analysis_type: str,
        insights: List[Dict[str, Any]],
        recommendations: List[Dict[str, Any]],
        performance_metrics: Optional[Dict[str, Any]] = None,
        personalized_data: Optional[Dict[str, Any]] = None,
        processing_time: Optional[float] = None,
        strategy_id: Optional[int] = None,
        ai_service_status: str = "operational"
    ) -> AIAnalysisResult:
        """Store AI analysis result in the database."""
        try:
            logger.info(f"Storing AI analysis result for user {user_id}, type: {analysis_type}")
            # Create new AI analysis result
            ai_result = AIAnalysisResult(
                user_id=user_id,
                strategy_id=strategy_id,
                analysis_type=analysis_type,
                insights=insights,
                recommendations=recommendations,
                performance_metrics=performance_metrics,
                personalized_data_used=personalized_data,
                processing_time=processing_time,
                ai_service_status=ai_service_status,
                created_at=datetime.utcnow(),
                updated_at=datetime.utcnow()
            )
            self.db.add(ai_result)
            self.db.commit()
            self.db.refresh(ai_result)
            logger.info(f"✅ AI analysis result stored successfully: {ai_result.id}")
            return ai_result
        except Exception as e:
            logger.error(f"❌ Error storing AI analysis result: {str(e)}")
            self.db.rollback()
            raise
    async def get_latest_ai_analysis(
        self,
        user_id: int,
        analysis_type: str,
        strategy_id: Optional[int] = None,
        max_age_hours: int = 24
    ) -> Optional[Dict[str, Any]]:
        """
        Get the latest AI analysis result no older than max_age_hours, with detailed logging.
        """
        try:
            logger.info(f"🔍 Retrieving latest AI analysis for user {user_id}, type: {analysis_type}")
            # Build query; results older than max_age_hours are treated as stale
            cutoff_date = datetime.utcnow() - timedelta(hours=max_age_hours)
            query = self.db.query(AIAnalysisResult).filter(
                AIAnalysisResult.user_id == user_id,
                AIAnalysisResult.analysis_type == analysis_type,
                AIAnalysisResult.created_at >= cutoff_date
            )
            if strategy_id:
                query = query.filter(AIAnalysisResult.strategy_id == strategy_id)
            # Get the most recent result
            latest_result = query.order_by(AIAnalysisResult.created_at.desc()).first()
            if latest_result:
                logger.info(f"✅ Found recent AI analysis result: {latest_result.id}")
                # Convert to dictionary and log details
                result_dict = {
                    "id": latest_result.id,
                    "user_id": latest_result.user_id,
                    "strategy_id": latest_result.strategy_id,
                    "analysis_type": latest_result.analysis_type,
                    "analysis_date": latest_result.created_at.isoformat(),
                    "results": latest_result.insights or {},
                    "recommendations": latest_result.recommendations or [],
                    "personalized_data_used": latest_result.personalized_data_used,
                    "ai_service_status": latest_result.ai_service_status
                }
                # Log the detailed structure
                logger.info("📊 AI Analysis Result Details:")
                logger.info(f"  - Result ID: {result_dict['id']}")
                logger.info(f"  - User ID: {result_dict['user_id']}")
                logger.info(f"  - Strategy ID: {result_dict['strategy_id']}")
                logger.info(f"  - Analysis Type: {result_dict['analysis_type']}")
                logger.info(f"  - Analysis Date: {result_dict['analysis_date']}")
                logger.info(f"  - Personalized Data Used: {result_dict['personalized_data_used']}")
                logger.info(f"  - AI Service Status: {result_dict['ai_service_status']}")
                # Log results structure; insights are stored as a list, so handle both shapes
                results = result_dict["results"]
                if isinstance(results, dict):
                    logger.info(f"  - Results Keys: {list(results.keys())}")
                else:
                    logger.info(f"  - Results Count: {len(results)}")
                logger.info(f"  - Results Type: {type(results)}")
                # Log recommendations
                recommendations = result_dict["recommendations"]
                logger.info(f"  - Recommendations Count: {len(recommendations)}")
                logger.info(f"  - Recommendations Type: {type(recommendations)}")
                # Log specific data if available
                if isinstance(results, dict) and results:
                    logger.info("🔍 RESULTS DATA BREAKDOWN:")
                    for key, value in results.items():
                        if isinstance(value, list):
                            logger.info(f"  {key}: {len(value)} items")
                        elif isinstance(value, dict):
                            logger.info(f"  {key}: {len(value)} keys")
                        else:
                            logger.info(f"  {key}: {value}")
                if recommendations:
                    logger.info("🔍 RECOMMENDATIONS DATA BREAKDOWN:")
                    for i, rec in enumerate(recommendations[:3]):  # Log first 3
                        if isinstance(rec, dict):
                            logger.info(f"  Recommendation {i+1}: {rec.get('title', 'N/A')}")
                            logger.info(f"    Type: {rec.get('type', 'N/A')}")
                            logger.info(f"    Priority: {rec.get('priority', 'N/A')}")
                        else:
                            logger.info(f"  Recommendation {i+1}: {rec}")
                return result_dict
            else:
                logger.warning(f"⚠️ No AI analysis result found for user {user_id}, type: {analysis_type}")
                return None
        except Exception as e:
            logger.error(f"❌ Error retrieving latest AI analysis: {str(e)}")
            logger.error(f"Exception type: {type(e)}")
            import traceback
            logger.error(f"Traceback: {traceback.format_exc()}")
            return None
    async def get_user_ai_analyses(
        self,
        user_id: int,
        analysis_types: Optional[List[str]] = None,
        limit: int = 10
    ) -> List[AIAnalysisResult]:
        """Get the most recent AI analysis results for a user, up to limit."""
        try:
            logger.info(f"Retrieving AI analyses for user {user_id}")
            query = self.db.query(AIAnalysisResult).filter(
                AIAnalysisResult.user_id == user_id
            )
            # Filter by analysis types if provided
            if analysis_types:
                query = query.filter(AIAnalysisResult.analysis_type.in_(analysis_types))
            results = query.order_by(desc(AIAnalysisResult.created_at)).limit(limit).all()
            logger.info(f"✅ Retrieved {len(results)} AI analysis results for user {user_id}")
            return results
        except Exception as e:
            logger.error(f"❌ Error retrieving user AI analyses: {str(e)}")
            return []
    async def update_ai_analysis_result(
        self,
        result_id: int,
        updates: Dict[str, Any]
    ) -> Optional[AIAnalysisResult]:
        """Update an existing AI analysis result."""
        try:
            logger.info(f"Updating AI analysis result: {result_id}")
            result = self.db.query(AIAnalysisResult).filter(
                AIAnalysisResult.id == result_id
            ).first()
            if not result:
                logger.warning(f"AI analysis result not found: {result_id}")
                return None
            # Update only the fields that exist on the model
            for key, value in updates.items():
                if hasattr(result, key):
                    setattr(result, key, value)
            result.updated_at = datetime.utcnow()
            self.db.commit()
            self.db.refresh(result)
            logger.info(f"✅ AI analysis result updated successfully: {result_id}")
            return result
        except Exception as e:
            logger.error(f"❌ Error updating AI analysis result: {str(e)}")
            self.db.rollback()
            return None
    async def delete_old_ai_analyses(
        self,
        days_old: int = 30
    ) -> int:
        """Delete AI analysis results older than specified days."""
        try:
            logger.info(f"Cleaning up AI analysis results older than {days_old} days")
            cutoff_date = datetime.utcnow() - timedelta(days=days_old)
            deleted_count = self.db.query(AIAnalysisResult).filter(
                AIAnalysisResult.created_at < cutoff_date
            ).delete()
            self.db.commit()
            logger.info(f"✅ Deleted {deleted_count} old AI analysis results")
            return deleted_count
        except Exception as e:
            logger.error(f"❌ Error deleting old AI analyses: {str(e)}")
            self.db.rollback()
            return 0
    async def get_analysis_statistics(
        self,
        user_id: Optional[int] = None
    ) -> Dict[str, Any]:
        """Get statistics about AI analysis results, optionally scoped to one user."""
        try:
            # Local import in case `func` is not already imported at module level.
            from sqlalchemy import func
            logger.info("Retrieving AI analysis statistics")
            query = self.db.query(AIAnalysisResult)
            if user_id is not None:
                query = query.filter(AIAnalysisResult.user_id == user_id)
            total_analyses = query.count()
            # Get counts by analysis type
            type_counts = {}
            for analysis_type in ['performance_trends', 'strategic_intelligence', 'content_evolution', 'gap_analysis']:
                count = query.filter(AIAnalysisResult.analysis_type == analysis_type).count()
                type_counts[analysis_type] = count
            # Get average processing time, using the same user filter as the counts.
            # `func` comes from SQLAlchemy itself; a Session has no `func` attribute.
            avg_query = self.db.query(func.avg(AIAnalysisResult.processing_time))
            if user_id is not None:
                avg_query = avg_query.filter(AIAnalysisResult.user_id == user_id)
            avg_processing_time = avg_query.scalar() or 0
            stats = {
                'total_analyses': total_analyses,
                'analysis_type_counts': type_counts,
                'average_processing_time': float(avg_processing_time),
                'user_id': user_id
            }
            logger.info(f"✅ Retrieved AI analysis statistics: {stats}")
            return stats
        except Exception as e:
            logger.error(f"❌ Error retrieving AI analysis statistics: {str(e)}")
            return {
                'total_analyses': 0,
                'analysis_type_counts': {},
                'average_processing_time': 0,
                'user_id': user_id
            }
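
A minimal sketch of computing an average processing time that is scoped to one user, written against the standard-library `sqlite3` module rather than the project's SQLAlchemy session. The table name `ai_analysis_results`, the helper name `average_processing_time`, and the sample rows are assumptions for illustration; only the column names `user_id` and `processing_time` come from the service code above. The point it illustrates is that the average needs the same `user_id` filter as the counts, otherwise a per-user report would mix in other users' timings.

```python
import sqlite3
from typing import Optional


def average_processing_time(conn: sqlite3.Connection, user_id: Optional[int] = None) -> float:
    """Return the mean processing_time, optionally restricted to one user."""
    sql = "SELECT AVG(processing_time) FROM ai_analysis_results"
    params: tuple = ()
    if user_id is not None:
        sql += " WHERE user_id = ?"
        params = (user_id,)
    value = conn.execute(sql, params).fetchone()[0]
    # AVG over zero rows yields NULL (None in Python); report 0.0 instead.
    return float(value) if value is not None else 0.0


# Hypothetical in-memory table standing in for the real model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ai_analysis_results (user_id INTEGER, processing_time REAL)")
conn.executemany(
    "INSERT INTO ai_analysis_results VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 10.0)],
)

print(average_processing_time(conn))             # average over all users
print(average_processing_time(conn, user_id=1))  # average for user 1 only
print(average_processing_time(conn, user_id=3))  # user with no rows
```

The empty-result guard mirrors the `or 0` fallback in the service method: an aggregate over no rows returns no value, so the caller still gets a number.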

@@ -0,0 +1,974 @@
"""
AI Analytics Service
Advanced AI-powered analytics for content planning and performance prediction.
"""
from typing import Dict, Any, List, Optional, Tuple
from datetime import datetime, timedelta
import json
from loguru import logger
import asyncio
from sqlalchemy.orm import Session
from services.database import get_db_session
from models.content_planning import ContentAnalytics, ContentStrategy, CalendarEvent
from services.content_gap_analyzer.ai_engine_service import AIEngineService
class AIAnalyticsService:
"""Advanced AI analytics service for content planning."""
def __init__(self):
self.ai_engine = AIEngineService()
self.db_session = None
def _get_db_session(self) -> Session:
"""Get database session."""
if not self.db_session:
self.db_session = get_db_session()
return self.db_session
async def analyze_content_evolution(self, strategy_id: int, time_period: str = "30d") -> Dict[str, Any]:
"""
Analyze content evolution over time for a specific strategy.
Args:
strategy_id: Content strategy ID
time_period: Analysis period (7d, 30d, 90d, 1y)
Returns:
Content evolution analysis results
"""
try:
logger.info(f"Analyzing content evolution for strategy {strategy_id}")
# Get analytics data for the strategy
analytics_data = await self._get_analytics_data(strategy_id, time_period)
# Analyze content performance trends
performance_trends = await self._analyze_performance_trends(analytics_data)
# Analyze content type evolution
content_evolution = await self._analyze_content_type_evolution(analytics_data)
# Analyze audience engagement patterns
engagement_patterns = await self._analyze_engagement_patterns(analytics_data)
evolution_analysis = {
'strategy_id': strategy_id,
'time_period': time_period,
'performance_trends': performance_trends,
'content_evolution': content_evolution,
'engagement_patterns': engagement_patterns,
'recommendations': await self._generate_evolution_recommendations(
performance_trends, content_evolution, engagement_patterns
),
'analysis_date': datetime.utcnow().isoformat()
}
logger.info(f"Content evolution analysis completed for strategy {strategy_id}")
return evolution_analysis
except Exception as e:
logger.error(f"Error analyzing content evolution: {str(e)}")
raise
async def analyze_performance_trends(self, strategy_id: int, metrics: Optional[List[str]] = None) -> Dict[str, Any]:
"""
Analyze performance trends for content strategy.
Args:
strategy_id: Content strategy ID
metrics: List of metrics to analyze (engagement, reach, conversion, etc.)
Returns:
Performance trend analysis results
"""
try:
logger.info(f"Analyzing performance trends for strategy {strategy_id}")
if not metrics:
metrics = ['engagement_rate', 'reach', 'conversion_rate', 'click_through_rate']
# Get performance data
performance_data = await self._get_performance_data(strategy_id, metrics)
# Analyze trends for each metric
trend_analysis = {}
for metric in metrics:
trend_analysis[metric] = await self._analyze_metric_trend(performance_data, metric)
# Generate predictive insights
predictive_insights = await self._generate_predictive_insights(trend_analysis)
# Calculate performance scores
performance_scores = await self._calculate_performance_scores(trend_analysis)
trend_results = {
'strategy_id': strategy_id,
'metrics_analyzed': metrics,
'trend_analysis': trend_analysis,
'predictive_insights': predictive_insights,
'performance_scores': performance_scores,
'recommendations': await self._generate_trend_recommendations(trend_analysis),
'analysis_date': datetime.utcnow().isoformat()
}
logger.info(f"Performance trend analysis completed for strategy {strategy_id}")
return trend_results
except Exception as e:
logger.error(f"Error analyzing performance trends: {str(e)}")
raise
async def predict_content_performance(self, content_data: Dict[str, Any],
strategy_id: int) -> Dict[str, Any]:
"""
Predict content performance using AI models.
Args:
content_data: Content details (title, description, type, platform, etc.)
strategy_id: Content strategy ID
Returns:
Performance prediction results
"""
try:
logger.info(f"Predicting performance for content in strategy {strategy_id}")
# Get historical performance data
historical_data = await self._get_historical_performance_data(strategy_id)
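# Analyze content characteristics
# NOTE: content_analysis is computed but not yet used by the steps below
content_analysis = await self._analyze_content_characteristics(content_data)
# Calculate success probability (no prediction model yet, so an empty prediction is passed)
success_probability = await self._calculate_success_probability({}, historical_data)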
# Generate optimization recommendations
optimization_recommendations = await self._generate_optimization_recommendations(
content_data, {}, success_probability
)
prediction_results = {
'strategy_id': strategy_id,
'content_data': content_data,
'performance_prediction': {},
'success_probability': success_probability,
'optimization_recommendations': optimization_recommendations,
'confidence_score': 0.7,
'prediction_date': datetime.utcnow().isoformat()
}
logger.info("Content performance prediction completed")
return prediction_results
except Exception as e:
logger.error(f"Error predicting content performance: {str(e)}")
raise
async def generate_strategic_intelligence(self, strategy_id: int,
market_data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
"""
Generate strategic intelligence for content planning.
Args:
strategy_id: Content strategy ID
market_data: Additional market data for analysis
Returns:
Strategic intelligence results
"""
try:
logger.info(f"Generating strategic intelligence for strategy {strategy_id}")
# Get strategy data
strategy_data = await self._get_strategy_data(strategy_id)
# Analyze market positioning
market_positioning = await self._analyze_market_positioning(strategy_data, market_data)
# Identify competitive advantages
competitive_advantages = await self._identify_competitive_advantages(strategy_data)
# Calculate strategic scores
strategic_scores = await self._calculate_strategic_scores(
strategy_data, market_positioning, competitive_advantages
)
intelligence_results = {
'strategy_id': strategy_id,
'market_positioning': market_positioning,
'competitive_advantages': competitive_advantages,
'strategic_scores': strategic_scores,
'risk_assessment': await self._assess_strategic_risks(strategy_data),
'opportunity_analysis': await self._analyze_strategic_opportunities(strategy_data),
'analysis_date': datetime.utcnow().isoformat()
}
logger.info("Strategic intelligence generation completed")
return intelligence_results
except Exception as e:
logger.error(f"Error generating strategic intelligence: {str(e)}")
raise
# Helper methods for data retrieval and analysis
async def _get_analytics_data(self, strategy_id: int, time_period: str) -> List[Dict[str, Any]]:
"""Get analytics data for the specified strategy and time period."""
try:
session = self._get_db_session()
# Calculate date range
end_date = datetime.utcnow()
if time_period == "7d":
start_date = end_date - timedelta(days=7)
elif time_period == "30d":
start_date = end_date - timedelta(days=30)
elif time_period == "90d":
start_date = end_date - timedelta(days=90)
elif time_period == "1y":
start_date = end_date - timedelta(days=365)
else:
start_date = end_date - timedelta(days=30)
# Query analytics data
analytics = session.query(ContentAnalytics).filter(
ContentAnalytics.strategy_id == strategy_id,
ContentAnalytics.recorded_at >= start_date,
ContentAnalytics.recorded_at <= end_date
).all()
return [record.to_dict() for record in analytics]
except Exception as e:
logger.error(f"Error getting analytics data: {str(e)}")
return []
async def _analyze_performance_trends(self, analytics_data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze performance trends from analytics data."""
try:
if not analytics_data:
return {'trend': 'stable', 'growth_rate': 0, 'insights': 'No data available'}
# Calculate trend metrics
total_analytics = len(analytics_data)
avg_performance = sum(item.get('performance_score', 0) for item in analytics_data) / total_analytics
# Determine trend direction
if avg_performance > 0.7:
trend = 'increasing'
elif avg_performance < 0.3:
trend = 'decreasing'
else:
trend = 'stable'
return {
'trend': trend,
'average_performance': avg_performance,
'total_analytics': total_analytics,
'insights': f'Performance is {trend} with average score of {avg_performance:.2f}'
}
except Exception as e:
logger.error(f"Error analyzing performance trends: {str(e)}")
return {'trend': 'unknown', 'error': str(e)}
async def _analyze_content_type_evolution(self, analytics_data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze how content types have evolved over time."""
try:
content_types = {}
for data in analytics_data:
content_type = data.get('content_type', 'unknown')
if content_type not in content_types:
content_types[content_type] = {
'count': 0,
'total_performance': 0,
'avg_performance': 0
}
content_types[content_type]['count'] += 1
content_types[content_type]['total_performance'] += data.get('performance_score', 0)
# Calculate averages
for content_type in content_types:
if content_types[content_type]['count'] > 0:
content_types[content_type]['avg_performance'] = (
content_types[content_type]['total_performance'] /
content_types[content_type]['count']
)
return {
'content_types': content_types,
'most_performing_type': max(content_types.items(), key=lambda x: x[1]['avg_performance'])[0] if content_types else None,
'evolution_insights': 'Content type performance analysis completed'
}
except Exception as e:
logger.error(f"Error analyzing content type evolution: {str(e)}")
return {'error': str(e)}
async def _analyze_engagement_patterns(self, analytics_data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze audience engagement patterns."""
try:
if not analytics_data:
return {'patterns': {}, 'insights': 'No engagement data available'}
# Analyze engagement by platform
platform_engagement = {}
for data in analytics_data:
platform = data.get('platform', 'unknown')
if platform not in platform_engagement:
platform_engagement[platform] = {
'total_engagement': 0,
'count': 0,
'avg_engagement': 0
}
metrics = data.get('metrics', {})
engagement = metrics.get('engagement_rate', 0)
platform_engagement[platform]['total_engagement'] += engagement
platform_engagement[platform]['count'] += 1
# Calculate averages
for platform in platform_engagement:
if platform_engagement[platform]['count'] > 0:
platform_engagement[platform]['avg_engagement'] = (
platform_engagement[platform]['total_engagement'] /
platform_engagement[platform]['count']
)
return {
'platform_engagement': platform_engagement,
'best_platform': max(platform_engagement.items(), key=lambda x: x[1]['avg_engagement'])[0] if platform_engagement else None,
'insights': 'Platform engagement analysis completed'
}
except Exception as e:
logger.error(f"Error analyzing engagement patterns: {str(e)}")
return {'error': str(e)}
async def _generate_evolution_recommendations(self, performance_trends: Dict[str, Any],
content_evolution: Dict[str, Any],
engagement_patterns: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate recommendations based on evolution analysis."""
recommendations = []
try:
# Performance-based recommendations
if performance_trends.get('trend') == 'decreasing':
recommendations.append({
'type': 'performance_optimization',
'priority': 'high',
'title': 'Improve Content Performance',
'description': 'Content performance is declining. Focus on quality and engagement.',
'action_items': [
'Review and improve content quality',
'Optimize for audience engagement',
'Analyze competitor strategies'
]
})
# Content type recommendations
if content_evolution.get('most_performing_type'):
best_type = content_evolution['most_performing_type']
recommendations.append({
'type': 'content_strategy',
'priority': 'medium',
'title': f'Focus on {best_type} Content',
'description': f'{best_type} content is performing best. Increase focus on this type.',
'action_items': [
f'Increase {best_type} content production',
'Analyze what makes this content successful',
'Optimize other content types based on learnings'
]
})
# Platform recommendations
if engagement_patterns.get('best_platform'):
best_platform = engagement_patterns['best_platform']
recommendations.append({
'type': 'platform_strategy',
'priority': 'medium',
'title': f'Optimize for {best_platform}',
'description': f'{best_platform} shows highest engagement. Focus optimization efforts here.',
'action_items': [
f'Increase content for {best_platform}',
'Optimize content format for platform',
'Use platform-specific features'
]
})
return recommendations
except Exception as e:
logger.error(f"Error generating evolution recommendations: {str(e)}")
return [{'error': str(e)}]
async def _get_performance_data(self, strategy_id: int, metrics: List[str]) -> List[Dict[str, Any]]:
"""Get performance data for specified metrics."""
try:
session = self._get_db_session()
# Get analytics data for the strategy
analytics = session.query(ContentAnalytics).filter(
ContentAnalytics.strategy_id == strategy_id
).all()
return [record.to_dict() for record in analytics]
except Exception as e:
logger.error(f"Error getting performance data: {str(e)}")
return []
    async def _analyze_metric_trend(self, performance_data: List[Dict[str, Any]], metric: str) -> Dict[str, Any]:
        """Analyze trend for a specific metric."""
        try:
            if not performance_data:
                return {'trend': 'no_data', 'value': 0, 'change_percent': 0}
            # Extract metric values
            metric_values = []
            for data in performance_data:
                metrics = data.get('metrics', {})
                if metric in metrics:
                    metric_values.append(metrics[metric])
            if not metric_values:
                return {'trend': 'no_data', 'value': 0, 'change_percent': 0}
            # Calculate trend
            avg_value = sum(metric_values) / len(metric_values)
            # Simple trend calculation: compare the newer half with the older half.
            # Assumes performance_data is in chronological order. Both halves use
            # the same size, so the middle point of an odd-length series is skipped.
            half = len(metric_values) // 2
            if half >= 1:
                recent_avg = sum(metric_values[-half:]) / half
                older_avg = sum(metric_values[:half]) / half
                change = ((recent_avg - older_avg) / older_avg * 100) if older_avg > 0 else 0
            else:
                change = 0
            # Determine trend direction
            if change > 5:
                trend = 'increasing'
            elif change < -5:
                trend = 'decreasing'
            else:
                trend = 'stable'
            return {
                'trend': trend,
                'value': avg_value,
                'change_percent': change,
                'data_points': len(metric_values)
            }
        except Exception as e:
            logger.error(f"Error analyzing metric trend: {str(e)}")
            return {'trend': 'error', 'error': str(e)}
async def _generate_predictive_insights(self, trend_analysis: Dict[str, Any]) -> Dict[str, Any]:
"""Generate predictive insights based on trend analysis."""
try:
insights = {
'predicted_performance': 'stable',
'confidence_level': 'medium',
'key_factors': [],
'recommendations': []
}
# Analyze trends to generate insights
increasing_metrics = []
decreasing_metrics = []
for metric, analysis in trend_analysis.items():
if analysis.get('trend') == 'increasing':
increasing_metrics.append(metric)
elif analysis.get('trend') == 'decreasing':
decreasing_metrics.append(metric)
if len(increasing_metrics) > len(decreasing_metrics):
insights['predicted_performance'] = 'improving'
insights['confidence_level'] = 'high' if len(increasing_metrics) > 2 else 'medium'
elif len(decreasing_metrics) > len(increasing_metrics):
insights['predicted_performance'] = 'declining'
insights['confidence_level'] = 'high' if len(decreasing_metrics) > 2 else 'medium'
insights['key_factors'] = increasing_metrics + decreasing_metrics
insights['recommendations'] = [
f'Focus on improving {", ".join(decreasing_metrics)}' if decreasing_metrics else 'Maintain current performance',
f'Leverage success in {", ".join(increasing_metrics)}' if increasing_metrics else 'Identify new growth opportunities'
]
return insights
except Exception as e:
logger.error(f"Error generating predictive insights: {str(e)}")
return {'error': str(e)}
async def _calculate_performance_scores(self, trend_analysis: Dict[str, Any]) -> Dict[str, float]:
"""Calculate performance scores based on trend analysis."""
try:
scores = {}
for metric, analysis in trend_analysis.items():
base_score = analysis.get('value', 0)
change = analysis.get('change_percent', 0)
# Adjust score based on trend
if analysis.get('trend') == 'increasing':
adjusted_score = base_score * (1 + abs(change) / 100)
elif analysis.get('trend') == 'decreasing':
adjusted_score = base_score * (1 - abs(change) / 100)
else:
adjusted_score = base_score
scores[metric] = min(adjusted_score, 1.0) # Cap at 1.0
return scores
except Exception as e:
logger.error(f"Error calculating performance scores: {str(e)}")
return {}
async def _generate_trend_recommendations(self, trend_analysis: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate recommendations based on trend analysis."""
recommendations = []
try:
for metric, analysis in trend_analysis.items():
if analysis.get('trend') == 'decreasing':
recommendations.append({
'type': 'metric_optimization',
'priority': 'high',
'metric': metric,
'title': f'Improve {metric.replace("_", " ").title()}',
'description': f'{metric} is declining. Focus on optimization.',
'action_items': [
f'Analyze factors affecting {metric}',
'Review content strategy for this metric',
'Implement optimization strategies'
]
})
elif analysis.get('trend') == 'increasing':
recommendations.append({
'type': 'metric_leverage',
'priority': 'medium',
'metric': metric,
'title': f'Leverage {metric.replace("_", " ").title()} Success',
'description': f'{metric} is improving. Build on this success.',
'action_items': [
f'Identify what\'s driving {metric} improvement',
'Apply successful strategies to other metrics',
'Scale successful approaches'
]
})
return recommendations
except Exception as e:
logger.error(f"Error generating trend recommendations: {str(e)}")
return [{'error': str(e)}]
async def _analyze_single_competitor(self, url: str, analysis_period: str) -> Dict[str, Any]:
"""Analyze a single competitor's content strategy."""
try:
# This would integrate with the competitor analyzer service
# For now, return mock data
return {
'url': url,
'content_frequency': 'weekly',
'content_types': ['blog', 'video', 'social'],
'engagement_rate': 0.75,
'top_performing_content': ['How-to guides', 'Industry insights'],
'publishing_schedule': ['Tuesday', 'Thursday'],
'content_themes': ['Educational', 'Thought leadership', 'Engagement']
}
except Exception as e:
logger.error(f"Error analyzing competitor {url}: {str(e)}")
return {'url': url, 'error': str(e)}
async def _compare_competitor_strategies(self, competitor_analyses: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Compare strategies across competitors."""
try:
if not competitor_analyses:
return {'comparison': 'no_data'}
# Analyze common patterns
content_types = set()
themes = set()
schedules = set()
for analysis in competitor_analyses:
if 'content_types' in analysis:
content_types.update(analysis['content_types'])
if 'content_themes' in analysis:
themes.update(analysis['content_themes'])
if 'publishing_schedule' in analysis:
schedules.update(analysis['publishing_schedule'])
return {
'common_content_types': list(content_types),
'common_themes': list(themes),
'common_schedules': list(schedules),
'competitive_landscape': 'analyzed',
'insights': f'Found {len(content_types)} content types, {len(themes)} themes across competitors'
}
except Exception as e:
logger.error(f"Error comparing competitor strategies: {str(e)}")
return {'error': str(e)}
async def _identify_market_trends(self, competitor_analyses: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Identify market trends from competitor analysis."""
try:
trends = {
'popular_content_types': [],
'emerging_themes': [],
'publishing_patterns': [],
'engagement_trends': []
}
# Analyze trends from competitor data
content_type_counts = {}
theme_counts = {}
for analysis in competitor_analyses:
for content_type in analysis.get('content_types', []):
content_type_counts[content_type] = content_type_counts.get(content_type, 0) + 1
for theme in analysis.get('content_themes', []):
theme_counts[theme] = theme_counts.get(theme, 0) + 1
trends['popular_content_types'] = sorted(content_type_counts.items(), key=lambda x: x[1], reverse=True)
trends['emerging_themes'] = sorted(theme_counts.items(), key=lambda x: x[1], reverse=True)
return trends
except Exception as e:
logger.error(f"Error identifying market trends: {str(e)}")
return {'error': str(e)}
async def _generate_competitor_recommendations(self, competitor_analyses: List[Dict[str, Any]],
strategy_comparison: Dict[str, Any],
market_trends: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate recommendations based on competitor analysis."""
recommendations = []
try:
# Identify opportunities
popular_types = [item[0] for item in market_trends.get('popular_content_types', [])]
if popular_types:
recommendations.append({
'type': 'content_strategy',
'priority': 'high',
'title': 'Focus on Popular Content Types',
'description': f'Competitors are successfully using: {", ".join(popular_types[:3])}',
'action_items': [
'Analyze successful content in these categories',
'Develop content strategy for popular types',
'Differentiate while following proven patterns'
]
})
# Identify gaps
all_competitor_themes = set()
for analysis in competitor_analyses:
all_competitor_themes.update(analysis.get('content_themes', []))
if all_competitor_themes:
recommendations.append({
'type': 'competitive_advantage',
'priority': 'medium',
'title': 'Identify Content Gaps',
'description': 'Look for opportunities competitors are missing',
'action_items': [
'Analyze underserved content areas',
'Identify unique positioning opportunities',
'Develop differentiated content strategy'
]
})
return recommendations
except Exception as e:
logger.error(f"Error generating competitor recommendations: {str(e)}")
return [{'error': str(e)}]
async def _get_historical_performance_data(self, strategy_id: int) -> List[Dict[str, Any]]:
"""Get historical performance data for the strategy."""
try:
session = self._get_db_session()
analytics = session.query(ContentAnalytics).filter(
ContentAnalytics.strategy_id == strategy_id
).all()
return [record.to_dict() for record in analytics]
except Exception as e:
logger.error(f"Error getting historical performance data: {str(e)}")
return []
async def _analyze_content_characteristics(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
"""Analyze content characteristics for performance prediction."""
try:
characteristics = {
'content_type': content_data.get('content_type', 'unknown'),
'platform': content_data.get('platform', 'unknown'),
'estimated_length': content_data.get('estimated_length', 'medium'),
'complexity': 'medium',
'engagement_potential': 'medium',
'seo_potential': 'medium'
}
# Analyze title and description
title = content_data.get('title', '')
description = content_data.get('description', '')
if title and description:
characteristics['content_richness'] = 'high' if len(description) > 200 else 'medium'
characteristics['title_optimization'] = 'good' if 20 < len(title) < 60 else 'needs_improvement'
return characteristics
except Exception as e:
logger.error(f"Error analyzing content characteristics: {str(e)}")
return {'error': str(e)}
async def _calculate_success_probability(self, performance_prediction: Dict[str, Any],
historical_data: List[Dict[str, Any]]) -> float:
"""Calculate success probability based on prediction and historical data."""
try:
base_probability = 0.5
# Adjust based on historical performance
if historical_data:
avg_historical_performance = sum(
data.get('performance_score', 0) for data in historical_data
) / len(historical_data)
if avg_historical_performance > 0.7:
base_probability += 0.1
elif avg_historical_performance < 0.3:
base_probability -= 0.1
return min(max(base_probability, 0.0), 1.0)
except Exception as e:
logger.error(f"Error calculating success probability: {str(e)}")
return 0.5
async def _generate_optimization_recommendations(self, content_data: Dict[str, Any],
performance_prediction: Dict[str, Any],
success_probability: float) -> List[Dict[str, Any]]:
"""Generate optimization recommendations for content."""
recommendations = []
try:
# Performance-based recommendations
if success_probability < 0.5:
recommendations.append({
'type': 'content_optimization',
'priority': 'high',
'title': 'Improve Content Quality',
'description': 'Content has low success probability. Focus on quality improvements.',
'action_items': [
'Enhance content depth and value',
'Improve title and description',
'Optimize for target audience'
]
})
# Platform-specific recommendations
platform = content_data.get('platform', '')
if platform:
recommendations.append({
'type': 'platform_optimization',
'priority': 'medium',
'title': f'Optimize for {platform}',
'description': f'Ensure content is optimized for {platform} platform.',
'action_items': [
f'Follow {platform} best practices',
'Optimize content format for platform',
'Use platform-specific features'
]
})
return recommendations
except Exception as e:
logger.error(f"Error generating optimization recommendations: {str(e)}")
return [{'error': str(e)}]
async def _get_strategy_data(self, strategy_id: int) -> Dict[str, Any]:
"""Get strategy data for analysis."""
try:
session = self._get_db_session()
strategy = session.query(ContentStrategy).filter(
ContentStrategy.id == strategy_id
).first()
if strategy:
return strategy.to_dict()
else:
return {}
except Exception as e:
logger.error(f"Error getting strategy data: {str(e)}")
return {}
async def _analyze_market_positioning(self, strategy_data: Dict[str, Any],
market_data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
"""Analyze market positioning for the strategy."""
try:
positioning = {
'industry_position': 'established',
'competitive_advantage': 'content_quality',
'market_share': 'medium',
'differentiation_factors': []
}
# Analyze based on strategy data
industry = strategy_data.get('industry', '')
if industry:
positioning['industry_position'] = 'established' if industry in ['tech', 'finance', 'healthcare'] else 'emerging'
# Analyze content pillars
content_pillars = strategy_data.get('content_pillars', [])
if content_pillars:
positioning['differentiation_factors'] = [pillar.get('name', '') for pillar in content_pillars]
return positioning
except Exception as e:
logger.error(f"Error analyzing market positioning: {str(e)}")
return {'error': str(e)}
async def _identify_competitive_advantages(self, strategy_data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Identify competitive advantages for the strategy."""
try:
advantages = []
# Analyze content pillars for advantages
content_pillars = strategy_data.get('content_pillars', [])
for pillar in content_pillars:
advantages.append({
'type': 'content_pillar',
'name': pillar.get('name', ''),
'description': pillar.get('description', ''),
'strength': 'high' if pillar.get('frequency') == 'weekly' else 'medium'
})
# Analyze target audience
target_audience = strategy_data.get('target_audience', {})
if target_audience:
advantages.append({
'type': 'audience_focus',
'name': 'Targeted Audience',
'description': 'Well-defined target audience',
'strength': 'high'
})
return advantages
except Exception as e:
logger.error(f"Error identifying competitive advantages: {str(e)}")
return []
    async def _calculate_strategic_scores(self, strategy_data: Dict[str, Any],
                                          market_positioning: Dict[str, Any],
                                          competitive_advantages: List[Dict[str, Any]]) -> Dict[str, float]:
        """Calculate strategic scores for the strategy."""
        try:
            # Baseline component scores (placeholder values, adjusted below)
            scores = {
                'market_positioning_score': 0.7,
                'competitive_advantage_score': 0.8,
                'content_strategy_score': 0.75
            }
            # Adjust scores based on analysis
            if market_positioning.get('industry_position') == 'established':
                scores['market_positioning_score'] += 0.1
            if len(competitive_advantages) > 2:
                scores['competitive_advantage_score'] += 0.1
            # Calculate overall score as the mean of the three component scores.
            # It is computed before the key is added so it is not averaged into itself.
            scores['overall_strategic_score'] = sum(scores.values()) / len(scores)
            return scores
        except Exception as e:
            logger.error(f"Error calculating strategic scores: {str(e)}")
            return {'error': str(e)}
async def _assess_strategic_risks(self, strategy_data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Assess strategic risks for the strategy."""
try:
risks = []
# Analyze potential risks
content_pillars = strategy_data.get('content_pillars', [])
if len(content_pillars) < 2:
risks.append({
'type': 'content_diversity',
'severity': 'medium',
'description': 'Limited content pillar diversity',
'mitigation': 'Develop additional content pillars'
})
target_audience = strategy_data.get('target_audience', {})
if not target_audience:
risks.append({
'type': 'audience_definition',
'severity': 'high',
'description': 'Unclear target audience definition',
'mitigation': 'Define detailed audience personas'
})
return risks
except Exception as e:
logger.error(f"Error assessing strategic risks: {str(e)}")
return []
async def _analyze_strategic_opportunities(self, strategy_data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Analyze strategic opportunities for the strategy."""
try:
opportunities = []
# Identify opportunities based on strategy data
industry = strategy_data.get('industry', '')
if industry:
opportunities.append({
'type': 'industry_growth',
'priority': 'high',
'description': f'Growing {industry} industry presents expansion opportunities',
'action_items': [
'Monitor industry trends',
'Develop industry-specific content',
'Expand into emerging sub-sectors'
]
})
content_pillars = strategy_data.get('content_pillars', [])
if content_pillars:
opportunities.append({
'type': 'content_expansion',
'priority': 'medium',
'description': 'Opportunity to expand content pillar coverage',
'action_items': [
'Identify underserved content areas',
'Develop new content pillars',
'Expand into new content formats'
]
})
return opportunities
except Exception as e:
logger.error(f"Error analyzing strategic opportunities: {str(e)}")
return []
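
The half-split comparison behind `_analyze_metric_trend` can be sketched as a small pure function. This is an illustration under stated assumptions, not the service's own code: the name `half_split_change` is hypothetical, the values are assumed to be in chronological order, and the default threshold reuses the 5% cut-off from the method above. For an odd-length series the middle value is left out so that both halves contain the same number of points.

```python
from typing import List, Tuple


def half_split_change(values: List[float], threshold: float = 5.0) -> Tuple[str, float]:
    """Compare the newer half of a series with the older half.

    Returns (trend, percent_change). `values` is assumed to be ordered
    from oldest to newest.
    """
    half = len(values) // 2
    if half == 0:
        # Zero or one data point: nothing to compare.
        return "stable", 0.0
    older_avg = sum(values[:half]) / half
    recent_avg = sum(values[-half:]) / half
    if older_avg <= 0:
        # Avoid dividing by zero (or a negative baseline); treat as no change.
        return "stable", 0.0
    change = (recent_avg - older_avg) / older_avg * 100
    if change > threshold:
        return "increasing", change
    if change < -threshold:
        return "decreasing", change
    return "stable", change


print(half_split_change([1.0, 2.0, 3.0]))       # odd length: middle value skipped
print(half_split_change([4.0, 4.0, 2.0, 2.0]))  # newer half is lower
print(half_split_change([0.5]))                 # single point
```

Writing the slice as `values[-half:]` with `half` computed first sidesteps an operator-precedence trap: `-len(values)//2` negates before it floor-divides, so for five values it evaluates to `-3` rather than `-2`.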

@@ -0,0 +1,529 @@
"""
AI Prompt Optimizer Service
Advanced AI prompt optimization and management for content planning system.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
from datetime import datetime
import json
import re
# Import AI providers
from llm_providers.main_text_generation import llm_text_gen
from llm_providers.gemini_provider import gemini_structured_json_response
class AIPromptOptimizer:
"""Advanced AI prompt optimization and management service."""
def __init__(self):
"""Initialize the AI prompt optimizer."""
self.logger = logger
self.prompts = self._load_advanced_prompts()
self.schemas = self._load_advanced_schemas()
logger.info("AIPromptOptimizer initialized")
def _load_advanced_prompts(self) -> Dict[str, str]:
"""Load advanced AI prompts from deep dive analysis."""
return {
# Strategic Content Gap Analysis Prompt
'strategic_content_gap_analysis': """
As an expert SEO content strategist with 15+ years of experience in content marketing and competitive analysis, analyze this comprehensive content gap analysis data and provide actionable strategic insights:
TARGET ANALYSIS:
- Website: {target_url}
- Industry: {industry}
- SERP Opportunities: {serp_opportunities} keywords not ranking
- Keyword Expansion: {expanded_keywords_count} additional keywords identified
- Competitors Analyzed: {competitors_analyzed} websites
- Content Quality Score: {content_quality_score}/10
- Market Competition Level: {competition_level}
DOMINANT CONTENT THEMES:
{dominant_themes}
COMPETITIVE LANDSCAPE:
{competitive_landscape}
PROVIDE COMPREHENSIVE ANALYSIS:
1. Strategic Content Gap Analysis (identify 3-5 major gaps with impact assessment)
2. Priority Content Recommendations (top 5 with ROI estimates)
3. Keyword Strategy Insights (trending, seasonal, long-tail opportunities)
4. Competitive Positioning Advice (differentiation strategies)
5. Content Format Recommendations (video, interactive, comprehensive guides)
6. Technical SEO Opportunities (structured data, schema markup)
7. Implementation Timeline (30/60/90 days with milestones)
8. Risk Assessment and Mitigation Strategies
9. Success Metrics and KPIs
10. Resource Allocation Recommendations
Consider user intent, search behavior patterns, and content consumption trends in your analysis.
Format as structured JSON with clear, actionable recommendations and confidence scores.
""",
# Market Position Analysis Prompt
'market_position_analysis': """
As a senior competitive intelligence analyst specializing in digital marketing and content strategy, analyze the market position of competitors in the {industry} industry:
COMPETITOR ANALYSES:
{competitor_analyses}
MARKET CONTEXT:
- Industry: {industry}
- Market Size: {market_size}
- Growth Rate: {growth_rate}
- Key Trends: {key_trends}
PROVIDE COMPREHENSIVE MARKET ANALYSIS:
1. Market Leader Identification (with reasoning)
2. Content Leader Analysis (content strategy assessment)
3. Quality Leader Assessment (content quality metrics)
4. Market Gaps Identification (3-5 major gaps)
5. Opportunities Analysis (high-impact opportunities)
6. Competitive Advantages (unique positioning)
7. Strategic Positioning Recommendations (differentiation)
8. Content Strategy Insights (format, frequency, quality)
9. Innovation Opportunities (emerging trends)
10. Risk Assessment (competitive threats)
Include market share estimates, competitive positioning matrix, and strategic recommendations with implementation timeline.
Format as structured JSON with detailed analysis and confidence levels.
""",
# Advanced Keyword Analysis Prompt
'advanced_keyword_analysis': """
As an expert keyword research specialist with a deep understanding of search algorithms and user behavior, analyze keyword opportunities for the {industry} industry:
KEYWORD DATA:
- Target Keywords: {target_keywords}
- Industry Context: {industry}
- Search Volume Data: {search_volume_data}
- Competition Analysis: {competition_analysis}
- Trend Analysis: {trend_analysis}
PROVIDE COMPREHENSIVE KEYWORD ANALYSIS:
1. Search Volume Estimates (with confidence intervals)
2. Competition Level Assessment (difficulty scoring)
3. Trend Analysis (seasonal, cyclical, emerging)
4. Opportunity Scoring (ROI potential)
5. Content Format Recommendations (based on intent)
6. Keyword Clustering (semantic relationships)
7. Long-tail Opportunities (specific, low-competition)
8. Seasonal Variations (trending patterns)
9. Search Intent Classification (informational, commercial, navigational, transactional)
10. Implementation Priority (quick wins vs long-term)
Consider search intent, user journey stages, and conversion potential in your analysis.
Format as structured JSON with detailed metrics and strategic recommendations.
"""
}
def _load_advanced_schemas(self) -> Dict[str, Dict[str, Any]]:
"""Load advanced JSON schemas for structured responses."""
return {
'strategic_content_gap_analysis': {
"type": "object",
"properties": {
"strategic_insights": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"insight": {"type": "string"},
"confidence": {"type": "number"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"risk_level": {"type": "string"}
}
}
},
"content_recommendations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"recommendation": {"type": "string"},
"priority": {"type": "string"},
"estimated_traffic": {"type": "string"},
"implementation_time": {"type": "string"},
"roi_estimate": {"type": "string"},
"success_metrics": {
"type": "array",
"items": {"type": "string"}
}
}
}
},
"keyword_strategy": {
"type": "object",
"properties": {
"trending_keywords": {
"type": "array",
"items": {"type": "string"}
},
"seasonal_opportunities": {
"type": "array",
"items": {"type": "string"}
},
"long_tail_opportunities": {
"type": "array",
"items": {"type": "string"}
},
"intent_classification": {
"type": "object",
"properties": {
"informational": {"type": "number"},
"commercial": {"type": "number"},
"navigational": {"type": "number"},
"transactional": {"type": "number"}
}
}
}
}
}
},
'market_position_analysis': {
"type": "object",
"properties": {
"market_leader": {"type": "string"},
"content_leader": {"type": "string"},
"quality_leader": {"type": "string"},
"market_gaps": {
"type": "array",
"items": {"type": "string"}
},
"opportunities": {
"type": "array",
"items": {"type": "string"}
},
"competitive_advantages": {
"type": "array",
"items": {"type": "string"}
},
"strategic_recommendations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"recommendation": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"confidence_level": {"type": "string"}
}
}
}
}
},
'advanced_keyword_analysis': {
"type": "object",
"properties": {
"keyword_opportunities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"keyword": {"type": "string"},
"search_volume": {"type": "number"},
"competition_level": {"type": "string"},
"difficulty_score": {"type": "number"},
"trend": {"type": "string"},
"intent": {"type": "string"},
"opportunity_score": {"type": "number"},
"recommended_format": {"type": "string"},
"estimated_traffic": {"type": "string"},
"implementation_priority": {"type": "string"}
}
}
},
"keyword_clusters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"cluster_name": {"type": "string"},
"main_keyword": {"type": "string"},
"related_keywords": {
"type": "array",
"items": {"type": "string"}
},
"search_volume": {"type": "number"},
"competition_level": {"type": "string"},
"content_suggestions": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
}
}
async def generate_strategic_content_gap_analysis(self, analysis_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate strategic content gap analysis using advanced AI prompts.
Args:
analysis_data: Comprehensive analysis data
Returns:
Strategic content gap analysis results
"""
try:
logger.info("🤖 Generating strategic content gap analysis using advanced AI")
# Format the advanced prompt
prompt = self.prompts['strategic_content_gap_analysis'].format(
target_url=analysis_data.get('target_url', 'N/A'),
industry=analysis_data.get('industry', 'N/A'),
serp_opportunities=analysis_data.get('serp_opportunities', 0),
expanded_keywords_count=analysis_data.get('expanded_keywords_count', 0),
competitors_analyzed=analysis_data.get('competitors_analyzed', 0),
content_quality_score=analysis_data.get('content_quality_score', 7.0),
competition_level=analysis_data.get('competition_level', 'medium'),
dominant_themes=json.dumps(analysis_data.get('dominant_themes', {}), indent=2),
competitive_landscape=json.dumps(analysis_data.get('competitive_landscape', {}), indent=2)
)
# Use advanced schema for structured response
response = gemini_structured_json_response(
prompt=prompt,
schema=self.schemas['strategic_content_gap_analysis']
)
# Parse and return the AI response
result = json.loads(response)
logger.info("✅ Advanced strategic content gap analysis completed")
return result
except Exception as e:
logger.error(f"Error generating strategic content gap analysis: {str(e)}")
return self._get_fallback_content_gap_analysis()
async def generate_advanced_market_position_analysis(self, market_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate advanced market position analysis using optimized AI prompts.
Args:
market_data: Market analysis data
Returns:
Advanced market position analysis results
"""
try:
logger.info("🤖 Generating advanced market position analysis using optimized AI")
# Format the advanced prompt
prompt = self.prompts['market_position_analysis'].format(
industry=market_data.get('industry', 'N/A'),
competitor_analyses=json.dumps(market_data.get('competitors', []), indent=2),
market_size=market_data.get('market_size', 'N/A'),
growth_rate=market_data.get('growth_rate', 'N/A'),
key_trends=json.dumps(market_data.get('key_trends', []), indent=2)
)
# Use advanced schema for structured response
response = gemini_structured_json_response(
prompt=prompt,
schema=self.schemas['market_position_analysis']
)
# Parse and return the AI response
result = json.loads(response)
logger.info("✅ Advanced market position analysis completed")
return result
except Exception as e:
logger.error(f"Error generating advanced market position analysis: {str(e)}")
return self._get_fallback_market_position_analysis()
async def generate_advanced_keyword_analysis(self, keyword_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate advanced keyword analysis using optimized AI prompts.
Args:
keyword_data: Keyword analysis data
Returns:
Advanced keyword analysis results
"""
try:
logger.info("🤖 Generating advanced keyword analysis using optimized AI")
# Format the advanced prompt
prompt = self.prompts['advanced_keyword_analysis'].format(
industry=keyword_data.get('industry', 'N/A'),
target_keywords=json.dumps(keyword_data.get('target_keywords', []), indent=2),
search_volume_data=json.dumps(keyword_data.get('search_volume_data', {}), indent=2),
competition_analysis=json.dumps(keyword_data.get('competition_analysis', {}), indent=2),
trend_analysis=json.dumps(keyword_data.get('trend_analysis', {}), indent=2)
)
# Use advanced schema for structured response
response = gemini_structured_json_response(
prompt=prompt,
schema=self.schemas['advanced_keyword_analysis']
)
# Parse and return the AI response
result = json.loads(response)
logger.info("✅ Advanced keyword analysis completed")
return result
except Exception as e:
logger.error(f"Error generating advanced keyword analysis: {str(e)}")
return self._get_fallback_keyword_analysis()
# Fallback methods for error handling
def _get_fallback_content_gap_analysis(self) -> Dict[str, Any]:
"""Fallback content gap analysis when AI fails."""
return {
'strategic_insights': [
{
'type': 'content_strategy',
'insight': 'Focus on educational content to build authority',
'confidence': 0.85,
'priority': 'high',
'estimated_impact': 'Authority building',
'implementation_time': '3-6 months',
'risk_level': 'low'
}
],
'content_recommendations': [
{
'type': 'content_creation',
'recommendation': 'Create comprehensive guides for high-opportunity keywords',
'priority': 'high',
'estimated_traffic': '5K+ monthly',
'implementation_time': '2-3 weeks',
'roi_estimate': 'High ROI potential',
'success_metrics': ['Traffic increase', 'Authority building', 'Lead generation']
}
],
'keyword_strategy': {
'trending_keywords': ['industry trends', 'best practices'],
'seasonal_opportunities': ['holiday content', 'seasonal guides'],
'long_tail_opportunities': ['specific tutorials', 'detailed guides'],
'intent_classification': {
'informational': 0.6,
'commercial': 0.2,
'navigational': 0.1,
'transactional': 0.1
}
}
}
def _get_fallback_market_position_analysis(self) -> Dict[str, Any]:
"""Fallback market position analysis when AI fails."""
return {
'market_leader': 'competitor1.com',
'content_leader': 'competitor2.com',
'quality_leader': 'competitor3.com',
'market_gaps': [
'Video content',
'Interactive content',
'Expert interviews'
],
'opportunities': [
'Niche content development',
'Expert interviews',
'Industry reports'
],
'competitive_advantages': [
'Technical expertise',
'Comprehensive guides',
'Industry insights'
],
'strategic_recommendations': [
{
'type': 'differentiation',
'recommendation': 'Focus on unique content angles',
'priority': 'high',
'estimated_impact': 'Brand differentiation',
'implementation_time': '2-4 months',
'confidence_level': '85%'
}
]
}
def _get_fallback_keyword_analysis(self) -> Dict[str, Any]:
"""Fallback keyword analysis when AI fails."""
return {
'keyword_opportunities': [
{
'keyword': 'industry best practices',
'search_volume': 3000,
'competition_level': 'low',
'difficulty_score': 35,
'trend': 'rising',
'intent': 'informational',
'opportunity_score': 85,
'recommended_format': 'comprehensive_guide',
'estimated_traffic': '2K+ monthly',
'implementation_priority': 'high'
}
],
'keyword_clusters': [
{
'cluster_name': 'Industry Fundamentals',
'main_keyword': 'industry basics',
'related_keywords': ['fundamentals', 'introduction', 'basics'],
'search_volume': 5000,
'competition_level': 'medium',
'content_suggestions': ['Beginner guide', 'Overview article']
}
]
}
    async def health_check(self) -> Dict[str, Any]:
        """
        Health check for the AI prompt optimizer service.

        Returns:
            Health status information
        """
        try:
            logger.info("Performing health check for AIPromptOptimizer")
            # Test AI functionality with a simple prompt
            test_prompt = "Hello, this is a health check test."
            try:
                test_response = llm_text_gen(test_prompt)
                ai_status = "operational" if test_response else "degraded"
            except Exception as e:
                ai_status = "error"
                logger.warning(f"AI health check failed: {str(e)}")
            # Report the service as healthy only when the AI provider responded;
            # previously the status was 'healthy' even if the test call failed.
            overall_status = 'healthy' if ai_status == 'operational' else 'degraded'
            health_status = {
                'service': 'AIPromptOptimizer',
                'status': overall_status,
                'capabilities': {
                    'strategic_content_gap_analysis': 'operational',
                    'advanced_market_position_analysis': 'operational',
                    'advanced_keyword_analysis': 'operational',
                    'ai_integration': ai_status
                },
                'prompts_loaded': len(self.prompts),
                'schemas_loaded': len(self.schemas),
                'timestamp': datetime.utcnow().isoformat()
            }
            logger.info(f"AIPromptOptimizer health check completed: {overall_status}")
            return health_status
        except Exception as e:
            logger.error(f"AIPromptOptimizer health check failed: {str(e)}")
            return {
                'service': 'AIPromptOptimizer',
                'status': 'unhealthy',
                'error': str(e),
                'timestamp': datetime.utcnow().isoformat()
            }
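
The prompt templates in this service are filled with `str.format`, which raises a bare `KeyError` when a placeholder has no matching argument. As a hedged sketch (not part of the committed code), the helper below lists a template's placeholders with the standard-library `string.Formatter` and fails early with a clearer message. The names `template_fields` and `format_prompt` are hypothetical.

```python
from string import Formatter
from typing import Any, Dict, Set


def template_fields(template: str) -> Set[str]:
    """Return the named placeholders that appear in a str.format template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so falsy names are skipped.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def format_prompt(template: str, values: Dict[str, Any]) -> str:
    """Format a template, raising a descriptive error if any value is missing."""
    missing = template_fields(template) - set(values)
    if missing:
        raise ValueError(f"Missing prompt values: {sorted(missing)}")
    return template.format(**values)


# Example with a shortened, made-up template
template = "Industry: {industry}\nKeywords: {target_keywords}"
print(format_prompt(template, {'industry': 'fintech', 'target_keywords': '[]'}))
```

A check like this could also run once at start-up against each entry in `self.prompts`, so a template and its `format` call cannot silently drift apart.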


@@ -0,0 +1,929 @@
"""
AI Service Manager
Centralized AI service management for content planning system.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
from datetime import datetime
import json
import asyncio
from dataclasses import dataclass
from enum import Enum
# Import AI providers
from llm_providers.main_text_generation import llm_text_gen
from llm_providers.gemini_provider import gemini_structured_json_response
class AIServiceType(Enum):
    """AI service types for monitoring."""
    CONTENT_GAP_ANALYSIS = "content_gap_analysis"
    MARKET_POSITION_ANALYSIS = "market_position_analysis"
    KEYWORD_ANALYSIS = "keyword_analysis"
    PERFORMANCE_PREDICTION = "performance_prediction"
    STRATEGIC_INTELLIGENCE = "strategic_intelligence"
    CONTENT_QUALITY_ASSESSMENT = "content_quality_assessment"
    CONTENT_SCHEDULE_GENERATION = "content_schedule_generation"


@dataclass
class AIServiceMetrics:
    """Metrics for AI service performance."""
    service_type: AIServiceType
    response_time: float
    success: bool
    error_message: Optional[str] = None
    timestamp: Optional[datetime] = None

    def __post_init__(self):
        # Default the timestamp to the creation time when none is supplied
        if self.timestamp is None:
            self.timestamp = datetime.utcnow()
class AIServiceManager:
"""Centralized AI service management for content planning system."""
def __init__(self):
"""Initialize AI service manager."""
self.logger = logger
self.metrics: List[AIServiceMetrics] = []
self.prompts = self._load_centralized_prompts()
self.schemas = self._load_centralized_schemas()
self.config = self._load_ai_configuration()
logger.info("AIServiceManager initialized")
def _load_ai_configuration(self) -> Dict[str, Any]:
"""Load AI configuration settings."""
return {
'max_retries': 3,
'timeout_seconds': 30,
'temperature': 0.7,
'max_tokens': 2048,
'enable_caching': True,
'cache_duration_minutes': 60,
'performance_monitoring': True,
'fallback_enabled': True
}
def _load_centralized_prompts(self) -> Dict[str, str]:
"""Load centralized AI prompts."""
return {
'content_gap_analysis': """
As an expert SEO content strategist with 15+ years of experience in content marketing and competitive analysis, analyze this comprehensive content gap analysis data and provide actionable strategic insights:
TARGET ANALYSIS:
- Website: {target_url}
- Industry: {industry}
- SERP Opportunities: {serp_opportunities} keywords not ranking
- Keyword Expansion: {expanded_keywords_count} additional keywords identified
- Competitors Analyzed: {competitors_analyzed} websites
- Content Quality Score: {content_quality_score}/10
- Market Competition Level: {competition_level}
DOMINANT CONTENT THEMES:
{dominant_themes}
COMPETITIVE LANDSCAPE:
{competitive_landscape}
PROVIDE COMPREHENSIVE ANALYSIS:
1. Strategic Content Gap Analysis (identify 3-5 major gaps with impact assessment)
2. Priority Content Recommendations (top 5 with ROI estimates)
3. Keyword Strategy Insights (trending, seasonal, long-tail opportunities)
4. Competitive Positioning Advice (differentiation strategies)
5. Content Format Recommendations (video, interactive, comprehensive guides)
6. Technical SEO Opportunities (structured data, schema markup)
7. Implementation Timeline (30/60/90 days with milestones)
8. Risk Assessment and Mitigation Strategies
9. Success Metrics and KPIs
10. Resource Allocation Recommendations
Consider user intent, search behavior patterns, and content consumption trends in your analysis.
Format as structured JSON with clear, actionable recommendations and confidence scores.
""",
'market_position_analysis': """
As a senior competitive intelligence analyst specializing in digital marketing and content strategy, analyze the market position of competitors in the {industry} industry:
COMPETITOR ANALYSES:
{competitor_analyses}
MARKET CONTEXT:
- Industry: {industry}
- Market Size: {market_size}
- Growth Rate: {growth_rate}
- Key Trends: {key_trends}
PROVIDE COMPREHENSIVE MARKET ANALYSIS:
1. Market Leader Identification (with reasoning)
2. Content Leader Analysis (content strategy assessment)
3. Quality Leader Assessment (content quality metrics)
4. Market Gaps Identification (3-5 major gaps)
5. Opportunities Analysis (high-impact opportunities)
6. Competitive Advantages (unique positioning)
7. Strategic Positioning Recommendations (differentiation)
8. Content Strategy Insights (format, frequency, quality)
9. Innovation Opportunities (emerging trends)
10. Risk Assessment (competitive threats)
Include market share estimates, competitive positioning matrix, and strategic recommendations with implementation timeline.
Format as structured JSON with detailed analysis and confidence levels.
""",
'keyword_analysis': """
As an expert keyword research specialist with a deep understanding of search algorithms and user behavior, analyze keyword opportunities for the {industry} industry:
KEYWORD DATA:
- Target Keywords: {target_keywords}
- Industry Context: {industry}
- Search Volume Data: {search_volume_data}
- Competition Analysis: {competition_analysis}
- Trend Analysis: {trend_analysis}
PROVIDE COMPREHENSIVE KEYWORD ANALYSIS:
1. Search Volume Estimates (with confidence intervals)
2. Competition Level Assessment (difficulty scoring)
3. Trend Analysis (seasonal, cyclical, emerging)
4. Opportunity Scoring (ROI potential)
5. Content Format Recommendations (based on intent)
6. Keyword Clustering (semantic relationships)
7. Long-tail Opportunities (specific, low-competition)
8. Seasonal Variations (trending patterns)
9. Search Intent Classification (informational, commercial, navigational, transactional)
10. Implementation Priority (quick wins vs long-term)
Consider search intent, user journey stages, and conversion potential in your analysis.
Format as structured JSON with detailed metrics and strategic recommendations.
""",
'performance_prediction': """
As a data-driven content strategist with expertise in predictive analytics and content performance optimization, predict content performance based on comprehensive analysis:
CONTENT DATA:
{content_data}
MARKET CONTEXT:
- Industry: {industry}
- Target Audience: {target_audience}
- Competition Level: {competition_level}
- Content Quality Score: {quality_score}
PROVIDE DETAILED PERFORMANCE PREDICTIONS:
1. Traffic Predictions (monthly, peak, growth rate)
2. Engagement Predictions (time on page, bounce rate, social shares)
3. Ranking Predictions (position, timeline, competition)
4. Conversion Predictions (CTR, conversion rate, leads)
5. Revenue Impact (estimated revenue, ROI)
6. Risk Factors (content saturation, algorithm changes)
7. Success Factors (quality indicators, optimization opportunities)
8. Competitive Response (market reaction)
9. Seasonal Variations (performance fluctuations)
10. Long-term Sustainability (content lifecycle)
Include confidence intervals, risk assessments, and optimization recommendations.
Format as structured JSON with detailed predictions and actionable insights.
""",
'strategic_intelligence': """
As a senior content strategy consultant with expertise in digital marketing, competitive intelligence, and strategic planning, generate comprehensive strategic insights:
ANALYSIS DATA:
{analysis_data}
STRATEGIC CONTEXT:
- Business Objectives: {business_objectives}
- Target Audience: {target_audience}
- Competitive Landscape: {competitive_landscape}
- Market Opportunities: {market_opportunities}
PROVIDE STRATEGIC INTELLIGENCE:
1. Content Strategy Recommendations (pillar content, topic clusters)
2. Competitive Positioning Advice (differentiation strategies)
3. Content Optimization Suggestions (quality, format, frequency)
4. Innovation Opportunities (emerging trends, new formats)
5. Risk Mitigation Strategies (competitive threats, algorithm changes)
6. Resource Allocation (budget, team, timeline)
7. Performance Optimization (KPIs, metrics, tracking)
8. Market Expansion Opportunities (new audiences, verticals)
9. Technology Integration (AI, automation, tools)
10. Long-term Strategic Vision (3-5 year roadmap)
Consider market dynamics, user behavior trends, and competitive landscape in your analysis.
Format as structured JSON with strategic insights and implementation guidance.
""",
'content_quality_assessment': """
As an expert content quality analyst with deep understanding of SEO, user experience, and content marketing best practices, assess content quality comprehensively:
CONTENT DATA:
{content_data}
QUALITY METRICS:
- Readability Score: {readability_score}
- SEO Optimization: {seo_score}
- User Engagement: {engagement_score}
- Content Depth: {depth_score}
PROVIDE COMPREHENSIVE QUALITY ASSESSMENT:
1. Overall Quality Score (comprehensive evaluation)
2. Readability Analysis (clarity, accessibility, flow)
3. SEO Optimization Analysis (technical, on-page, off-page)
4. Engagement Potential (user experience, interaction)
5. Content Depth Assessment (comprehensiveness, authority)
6. Improvement Suggestions (specific, actionable)
7. Competitive Benchmarking (industry standards)
8. Performance Optimization (conversion, retention)
9. Accessibility Assessment (inclusive design)
10. Future-Proofing (algorithm resilience)
Include specific recommendations with implementation steps and expected impact.
Format as structured JSON with detailed assessment and optimization guidance.
"""
}
def _load_centralized_schemas(self) -> Dict[str, Dict[str, Any]]:
"""Load centralized JSON schemas."""
return {
'content_gap_analysis': {
"type": "object",
"properties": {
"strategic_insights": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"insight": {"type": "string"},
"confidence": {"type": "number"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"risk_level": {"type": "string"}
}
}
},
"content_recommendations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"recommendation": {"type": "string"},
"priority": {"type": "string"},
"estimated_traffic": {"type": "string"},
"implementation_time": {"type": "string"},
"roi_estimate": {"type": "string"},
"success_metrics": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
},
'market_position_analysis': {
"type": "object",
"properties": {
"market_leader": {"type": "string"},
"content_leader": {"type": "string"},
"quality_leader": {"type": "string"},
"market_gaps": {
"type": "array",
"items": {"type": "string"}
},
"opportunities": {
"type": "array",
"items": {"type": "string"}
},
"competitive_advantages": {
"type": "array",
"items": {"type": "string"}
},
"strategic_recommendations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"recommendation": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"confidence_level": {"type": "string"}
}
}
}
}
},
'keyword_analysis': {
"type": "object",
"properties": {
"keyword_opportunities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"keyword": {"type": "string"},
"search_volume": {"type": "number"},
"competition_level": {"type": "string"},
"difficulty_score": {"type": "number"},
"trend": {"type": "string"},
"intent": {"type": "string"},
"opportunity_score": {"type": "number"},
"recommended_format": {"type": "string"},
"estimated_traffic": {"type": "string"},
"implementation_priority": {"type": "string"}
}
}
}
}
},
'performance_prediction': {
"type": "object",
"properties": {
"traffic_predictions": {
"type": "object",
"properties": {
"estimated_monthly_traffic": {"type": "string"},
"traffic_growth_rate": {"type": "string"},
"peak_traffic_month": {"type": "string"},
"confidence_level": {"type": "string"}
}
},
"engagement_predictions": {
"type": "object",
"properties": {
"estimated_time_on_page": {"type": "string"},
"estimated_bounce_rate": {"type": "string"},
"estimated_social_shares": {"type": "string"},
"estimated_comments": {"type": "string"},
"confidence_level": {"type": "string"}
}
}
}
},
'strategic_intelligence': {
"type": "object",
"properties": {
"strategic_insights": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"insight": {"type": "string"},
"reasoning": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"confidence_level": {"type": "string"}
}
}
}
}
},
'content_quality_assessment': {
"type": "object",
"properties": {
"overall_score": {"type": "number"},
"readability_score": {"type": "number"},
"seo_score": {"type": "number"},
"engagement_potential": {"type": "string"},
"improvement_suggestions": {
"type": "array",
"items": {"type": "string"}
},
"timestamp": {"type": "string"}
}
},
'content_schedule_generation': {
"type": "object",
"properties": {
"schedule": {
"type": "array",
"items": {
"type": "object",
"properties": {
"day": {"type": "number"},
"title": {"type": "string"},
"description": {"type": "string"},
"content_type": {"type": "string"},
"platform": {"type": "string"},
"pillar": {"type": "string"},
"priority": {"type": "string"},
"keywords": {
"type": "array",
"items": {"type": "string"}
},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"}
}
}
}
}
}
}
    async def _execute_ai_call(self, service_type: AIServiceType, prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
        """
        Execute AI call with performance monitoring.

        Args:
            service_type: Type of AI service
            prompt: AI prompt
            schema: JSON schema for response

        Returns:
            AI response, or an empty dict if the call fails
        """
        start_time = datetime.utcnow()
        success = False
        error_message = None
        result = {}
        try:
            logger.info(f"🤖 Executing AI call for {service_type.value}")
            # Execute AI call with timeout. gemini_structured_json_response is
            # called without await in AIPromptOptimizer, so it is assumed here to
            # be a blocking function; asyncio.wait_for needs an awaitable, so the
            # call runs in a worker thread (asyncio.to_thread, Python 3.9+).
            response = await asyncio.wait_for(
                asyncio.to_thread(
                    gemini_structured_json_response,
                    prompt=prompt,
                    schema=schema,
                    temperature=self.config['temperature'],
                    max_tokens=self.config['max_tokens']
                ),
                timeout=self.config['timeout_seconds']
            )
            # Parse response
            result = json.loads(response)
            success = True
            logger.info(f"✅ AI call for {service_type.value} completed successfully")
        except asyncio.TimeoutError:
            error_message = f"AI call timeout for {service_type.value}"
            logger.error(error_message)
        except json.JSONDecodeError as e:
            error_message = f"JSON decode error for {service_type.value}: {str(e)}"
            logger.error(error_message)
        except Exception as e:
            error_message = f"AI call error for {service_type.value}: {str(e)}"
            logger.error(error_message)
        # Calculate response time
        response_time = (datetime.utcnow() - start_time).total_seconds()
        # Record metrics for both successful and failed calls
        metrics = AIServiceMetrics(
            service_type=service_type,
            response_time=response_time,
            success=success,
            error_message=error_message
        )
        self.metrics.append(metrics)
        return result
async def generate_content_gap_analysis(self, analysis_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate content gap analysis using centralized AI service.
Args:
analysis_data: Analysis data
Returns:
Content gap analysis results
"""
try:
# Format prompt
prompt = self.prompts['content_gap_analysis'].format(
target_url=analysis_data.get('target_url', 'N/A'),
industry=analysis_data.get('industry', 'N/A'),
serp_opportunities=analysis_data.get('serp_opportunities', 0),
expanded_keywords_count=analysis_data.get('expanded_keywords_count', 0),
competitors_analyzed=analysis_data.get('competitors_analyzed', 0),
content_quality_score=analysis_data.get('content_quality_score', 7.0),
competition_level=analysis_data.get('competition_level', 'medium'),
dominant_themes=json.dumps(analysis_data.get('dominant_themes', {}), indent=2),
competitive_landscape=json.dumps(analysis_data.get('competitive_landscape', {}), indent=2)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.CONTENT_GAP_ANALYSIS,
prompt,
self.schemas['content_gap_analysis']
)
return result if result else self._get_fallback_content_gap_analysis()
except Exception as e:
logger.error(f"Error in content gap analysis: {str(e)}")
return self._get_fallback_content_gap_analysis()
async def generate_market_position_analysis(self, market_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate market position analysis using centralized AI service.
Args:
market_data: Market analysis data
Returns:
Market position analysis results
"""
try:
# Format prompt
prompt = self.prompts['market_position_analysis'].format(
industry=market_data.get('industry', 'N/A'),
competitor_analyses=json.dumps(market_data.get('competitors', []), indent=2),
market_size=market_data.get('market_size', 'N/A'),
growth_rate=market_data.get('growth_rate', 'N/A'),
key_trends=json.dumps(market_data.get('key_trends', []), indent=2)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.MARKET_POSITION_ANALYSIS,
prompt,
self.schemas['market_position_analysis']
)
return result if result else self._get_fallback_market_position_analysis()
except Exception as e:
logger.error(f"Error in market position analysis: {str(e)}")
return self._get_fallback_market_position_analysis()
async def generate_keyword_analysis(self, keyword_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate keyword analysis using centralized AI service.
Args:
keyword_data: Keyword analysis data
Returns:
Keyword analysis results
"""
try:
# Format prompt
prompt = self.prompts['keyword_analysis'].format(
industry=keyword_data.get('industry', 'N/A'),
target_keywords=json.dumps(keyword_data.get('target_keywords', []), indent=2),
search_volume_data=json.dumps(keyword_data.get('search_volume_data', {}), indent=2),
competition_analysis=json.dumps(keyword_data.get('competition_analysis', {}), indent=2),
trend_analysis=json.dumps(keyword_data.get('trend_analysis', {}), indent=2)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.KEYWORD_ANALYSIS,
prompt,
self.schemas['keyword_analysis']
)
return result if result else self._get_fallback_keyword_analysis()
except Exception as e:
logger.error(f"Error in keyword analysis: {str(e)}")
return self._get_fallback_keyword_analysis()
async def generate_performance_prediction(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate performance prediction using centralized AI service.
Args:
content_data: Content data for prediction
Returns:
Performance prediction results
"""
try:
# Format prompt
prompt = self.prompts['performance_prediction'].format(
industry=content_data.get('industry', 'N/A'),
target_audience=json.dumps(content_data.get('target_audience', {})),
competition_level=content_data.get('competition_level', 'medium'),
quality_score=content_data.get('quality_score', 7.0)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.PERFORMANCE_PREDICTION,
prompt,
self.schemas['performance_prediction']
)
return result if result else self._get_fallback_performance_prediction()
except Exception as e:
logger.error(f"Error in performance prediction: {str(e)}")
return self._get_fallback_performance_prediction()
async def generate_strategic_intelligence(self, analysis_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate strategic intelligence using centralized AI service.
Args:
analysis_data: Analysis data for strategic insights
Returns:
Strategic intelligence results
"""
try:
# Format prompt
prompt = self.prompts['strategic_intelligence'].format(
analysis_data=json.dumps(analysis_data, indent=2),
business_objectives=json.dumps(analysis_data.get('business_objectives', {})),
target_audience=json.dumps(analysis_data.get('target_audience', {})),
competitive_landscape=json.dumps(analysis_data.get('competitive_landscape', {}), indent=2),
market_opportunities=json.dumps(analysis_data.get('market_opportunities', []), indent=2)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.STRATEGIC_INTELLIGENCE,
prompt,
self.schemas['strategic_intelligence']
)
return result if result else self._get_fallback_strategic_intelligence()
except Exception as e:
logger.error(f"Error in strategic intelligence: {str(e)}")
return self._get_fallback_strategic_intelligence()
async def generate_content_quality_assessment(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate content quality assessment using centralized AI service.
Args:
content_data: Content data for assessment
Returns:
Content quality assessment results
"""
try:
# Format prompt
prompt = self.prompts['content_quality_assessment'].format(
content_data=json.dumps(content_data, indent=2),
readability_score=content_data.get('readability_score', 80.0),
seo_score=content_data.get('seo_score', 90.0),
engagement_score=content_data.get('engagement_score', 75.0),
depth_score=content_data.get('depth_score', 85.0)
)
# Execute AI call
result = await self._execute_ai_call(
AIServiceType.CONTENT_QUALITY_ASSESSMENT,
prompt,
self.schemas['content_quality_assessment']
)
return result if result else self._get_fallback_content_quality_assessment()
except Exception as e:
logger.error(f"Error in content quality assessment: {str(e)}")
return self._get_fallback_content_quality_assessment()
    async def generate_content_schedule(self, prompt: str) -> Dict[str, Any]:
        """
        Generate content schedule using AI.
        Args:
            prompt: Base prompt describing the schedule to generate
        Returns:
            Content schedule results, or an empty schedule on failure
        """
        try:
            logger.info("Generating content schedule using AI")
            # Append the expected JSON response format to the caller's prompt
            enhanced_prompt = f"""
{prompt}
Please return a structured JSON response with the following format:
{{
"schedule": [
{{
"day": 1,
"title": "Content Title",
"description": "Content description",
"content_type": "blog_post",
"platform": "website",
"pillar": "Educational Content",
"priority": "high",
"keywords": ["keyword1", "keyword2"],
"estimated_impact": "High",
"implementation_time": "2-4 weeks"
}}
]
}}
"""
            response = await self._execute_ai_call(
                AIServiceType.CONTENT_SCHEDULE_GENERATION,
                enhanced_prompt,
                self.schemas.get('content_schedule_generation', {})
            )
            # Mirror the other generators: fall back when the AI returns nothing
            if not response:
                logger.warning("AI returned no content schedule; using empty schedule")
                return {"schedule": []}
            logger.info("Content schedule generated successfully")
            return response
        except Exception as e:
            logger.error(f"Error generating content schedule: {str(e)}")
            return {"schedule": []}
# Fallback methods
def _get_fallback_content_gap_analysis(self) -> Dict[str, Any]:
"""Fallback content gap analysis."""
return {
'strategic_insights': [
{
'type': 'content_strategy',
'insight': 'Focus on educational content to build authority',
'confidence': 0.85,
'priority': 'high',
'estimated_impact': 'Authority building',
'implementation_time': '3-6 months',
'risk_level': 'low'
}
],
'content_recommendations': [
{
'type': 'content_creation',
'recommendation': 'Create comprehensive guides for high-opportunity keywords',
'priority': 'high',
'estimated_traffic': '5K+ monthly',
'implementation_time': '2-3 weeks',
'roi_estimate': 'High ROI potential',
'success_metrics': ['Traffic increase', 'Authority building', 'Lead generation']
}
]
}
def _get_fallback_market_position_analysis(self) -> Dict[str, Any]:
"""Fallback market position analysis."""
return {
'market_leader': 'competitor1.com',
'content_leader': 'competitor2.com',
'quality_leader': 'competitor3.com',
'market_gaps': ['Video content', 'Interactive content', 'Expert interviews'],
'opportunities': ['Niche content development', 'Expert interviews', 'Industry reports'],
'competitive_advantages': ['Technical expertise', 'Comprehensive guides', 'Industry insights']
}
def _get_fallback_keyword_analysis(self) -> Dict[str, Any]:
"""Fallback keyword analysis."""
return {
'keyword_opportunities': [
{
'keyword': 'industry best practices',
'search_volume': 3000,
'competition_level': 'low',
'difficulty_score': 35,
'trend': 'rising',
'intent': 'informational',
'opportunity_score': 85,
'recommended_format': 'comprehensive_guide',
'estimated_traffic': '2K+ monthly',
'implementation_priority': 'high'
}
]
}
def _get_fallback_performance_prediction(self) -> Dict[str, Any]:
"""Fallback performance prediction."""
return {
"traffic_predictions": {
"estimated_monthly_traffic": "10K+",
"traffic_growth_rate": "10%",
"peak_traffic_month": "June",
"confidence_level": "high"
},
"engagement_predictions": {
"estimated_time_on_page": "5 min",
"estimated_bounce_rate": "20%",
"estimated_social_shares": "100+",
"estimated_comments": "50+",
"confidence_level": "medium"
}
}
def _get_fallback_strategic_intelligence(self) -> Dict[str, Any]:
"""Fallback strategic intelligence."""
return {
"strategic_insights": [
{
"type": "content_strategy",
"insight": "Focus on educational content to build authority",
"reasoning": "Educational content is highly shareable and can attract a targeted audience.",
"priority": "high",
"estimated_impact": "Authority building",
"implementation_time": "3-6 months",
"confidence_level": "high"
}
]
}
def _get_fallback_content_quality_assessment(self) -> Dict[str, Any]:
"""Fallback content quality assessment."""
return {
"overall_score": 88.0,
"readability_score": 92.0,
"seo_score": 95.0,
"engagement_potential": "High engagement and retention",
"improvement_suggestions": ["Add more internal links", "Optimize images for SEO"],
"timestamp": datetime.utcnow().isoformat()
}
def get_performance_metrics(self) -> Dict[str, Any]:
"""
Get AI service performance metrics.
Returns:
Performance metrics
"""
if not self.metrics:
return {
'total_calls': 0,
'success_rate': 0,
'average_response_time': 0,
'service_breakdown': {}
}
total_calls = len(self.metrics)
successful_calls = len([m for m in self.metrics if m.success])
success_rate = (successful_calls / total_calls) * 100 if total_calls > 0 else 0
average_response_time = sum(m.response_time for m in self.metrics) / total_calls if total_calls > 0 else 0
# Service breakdown
service_breakdown = {}
for service_type in AIServiceType:
service_metrics = [m for m in self.metrics if m.service_type == service_type]
if service_metrics:
service_breakdown[service_type.value] = {
'total_calls': len(service_metrics),
'success_rate': (len([m for m in service_metrics if m.success]) / len(service_metrics)) * 100,
'average_response_time': sum(m.response_time for m in service_metrics) / len(service_metrics)
}
return {
'total_calls': total_calls,
'success_rate': success_rate,
'average_response_time': average_response_time,
'service_breakdown': service_breakdown,
'last_updated': datetime.utcnow().isoformat()
}
    async def health_check(self) -> Dict[str, Any]:
        """
        Health check for the AI service manager.
        Returns:
            Health status information
        """
        try:
            logger.info("Performing health check for AIServiceManager")
            # Test AI functionality with a simple prompt
            test_prompt = "Hello, this is a health check test."
            try:
                test_response = llm_text_gen(test_prompt)
                ai_status = "operational" if test_response else "degraded"
            except Exception as e:
                ai_status = "error"
                logger.warning(f"AI health check failed: {str(e)}")
            # Get performance metrics
            performance_metrics = self.get_performance_metrics()
            # Derive the overall status from the AI probe instead of always reporting "healthy"
            health_status = {
                'service': 'AIServiceManager',
                'status': 'healthy' if ai_status == 'operational' else 'degraded',
                'capabilities': {
                    'content_gap_analysis': 'operational',
                    'market_position_analysis': 'operational',
                    'keyword_analysis': 'operational',
                    'performance_prediction': 'operational',
                    'strategic_intelligence': 'operational',
                    'content_quality_assessment': 'operational',
                    'ai_integration': ai_status
                },
                'performance_metrics': performance_metrics,
                'prompts_loaded': len(self.prompts),
                'schemas_loaded': len(self.schemas),
                'configuration': self.config,
                'timestamp': datetime.utcnow().isoformat()
            }
            logger.info(f"AIServiceManager health check completed: ai_integration={ai_status}")
            return health_status
        except Exception as e:
            logger.error(f"AIServiceManager health check failed: {str(e)}")
            return {
                'service': 'AIServiceManager',
                'status': 'unhealthy',
                'error': str(e),
                'timestamp': datetime.utcnow().isoformat()
            }
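
The generator methods above all follow the same shape: await the AI call, return its result when it is non-empty, and return a canned fallback when the call raises or comes back empty. A minimal standalone sketch of that pattern follows. The helper name `call_with_fallback` and the three stub coroutines are hypothetical and exist only for illustration; they are not part of `AIServiceManager`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def call_with_fallback(
    ai_call: Callable[[], Awaitable[Dict[str, Any]]],
    fallback: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the AI result, or the fallback when the call fails or is empty."""
    try:
        result = await ai_call()
        # An empty dict or None counts as "no result", as in the service methods
        return result if result else fallback()
    except Exception:
        return fallback()


# Hypothetical stand-ins for a real AI call
async def _ok() -> Dict[str, Any]:
    return {"schedule": [{"day": 1}]}


async def _empty() -> Dict[str, Any]:
    return {}


async def _boom() -> Dict[str, Any]:
    raise RuntimeError("provider unavailable")


def _fallback() -> Dict[str, Any]:
    return {"schedule": []}


ok_result = asyncio.run(call_with_fallback(_ok, _fallback))
empty_result = asyncio.run(call_with_fallback(_empty, _fallback))
error_result = asyncio.run(call_with_fallback(_boom, _fallback))
```

Factoring the pattern into one helper like this is only one option; it would keep each generator down to prompt formatting plus a single call, at the cost of hiding the per-method error log message.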


@@ -0,0 +1,538 @@
"""Enhanced API Key Manager service for ALwrity backend."""
# This file contains the core business logic moved from lib/utils/api_key_manager/
# It includes the OnboardingProgress class and related functionality
import os
import json
from datetime import datetime
from typing import Dict, Any, List, Optional
from dataclasses import dataclass, asdict
from enum import Enum
from loguru import logger
from dotenv import load_dotenv
class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    SKIPPED = "skipped"


@dataclass
class StepData:
    step_number: int
    title: str
    description: str
    status: StepStatus
    completed_at: Optional[str] = None
    data: Optional[Dict[str, Any]] = None
    # The default is None, so the annotation must be Optional;
    # __post_init__ replaces None with a fresh list per instance.
    validation_errors: Optional[List[str]] = None

    def __post_init__(self):
        if self.validation_errors is None:
            self.validation_errors = []
class OnboardingProgress:
"""Manages onboarding progress with persistence and validation."""
def __init__(self):
self.steps = self._initialize_steps()
self.current_step = 1
self.started_at = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
self.is_completed = False
self.completed_at = None
self.progress_file = ".onboarding_progress.json"
# Load existing progress if available
self.load_progress()
def _initialize_steps(self) -> List[StepData]:
"""Initialize the 6-step onboarding process."""
return [
StepData(1, "AI LLM Providers", "Configure AI language model providers", StepStatus.PENDING),
StepData(2, "Website Analysis", "Set up website analysis and crawling", StepStatus.PENDING),
StepData(3, "AI Research", "Configure AI research capabilities", StepStatus.PENDING),
StepData(4, "Personalization", "Set up personalization features", StepStatus.PENDING),
StepData(5, "Integrations", "Configure ALwrity integrations", StepStatus.PENDING),
StepData(6, "Complete Setup", "Finalize and complete onboarding", StepStatus.PENDING)
]
def get_step_data(self, step_number: int) -> Optional[StepData]:
"""Get data for a specific step."""
for step in self.steps:
if step.step_number == step_number:
return step
return None
def mark_step_completed(self, step_number: int, data: Optional[Dict[str, Any]] = None):
"""Mark a step as completed."""
logger.info(f"[mark_step_completed] Marking step {step_number} as completed")
step = self.get_step_data(step_number)
if step:
step.status = StepStatus.COMPLETED
step.completed_at = datetime.now().isoformat()
step.data = data
self.last_updated = datetime.now().isoformat()
# Check if all steps are now completed
all_completed = all(s.status in [StepStatus.COMPLETED, StepStatus.SKIPPED] for s in self.steps)
if all_completed:
# If all steps are completed, mark onboarding as complete
self.is_completed = True
self.completed_at = datetime.now().isoformat()
self.current_step = len(self.steps) # Set to last step number
logger.info(f"[mark_step_completed] All steps completed, marking onboarding as complete")
else:
# Only increment current_step if there are more steps to go
self.current_step = step_number + 1
# Ensure current_step doesn't exceed total steps
if self.current_step > len(self.steps):
self.current_step = len(self.steps)
logger.info(f"[mark_step_completed] Step {step_number} completed, new current_step: {self.current_step}, is_completed: {self.is_completed}")
self.save_progress()
logger.info(f"Step {step_number} marked as completed")
else:
logger.error(f"[mark_step_completed] Step {step_number} not found")
def mark_step_in_progress(self, step_number: int):
"""Mark a step as in progress."""
step = self.get_step_data(step_number)
if step:
step.status = StepStatus.IN_PROGRESS
self.current_step = step_number
self.last_updated = datetime.now().isoformat()
self.save_progress()
logger.info(f"Step {step_number} marked as in progress")
def mark_step_skipped(self, step_number: int):
"""Mark a step as skipped."""
step = self.get_step_data(step_number)
if step:
step.status = StepStatus.SKIPPED
step.completed_at = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
# Check if all steps are now completed
all_completed = all(s.status in [StepStatus.COMPLETED, StepStatus.SKIPPED] for s in self.steps)
if all_completed:
# If all steps are completed, mark onboarding as complete
self.is_completed = True
self.completed_at = datetime.now().isoformat()
self.current_step = len(self.steps) # Set to last step number
logger.info(f"[mark_step_skipped] All steps completed, marking onboarding as complete")
else:
# Only increment current_step if there are more steps to go
self.current_step = step_number + 1
# Ensure current_step doesn't exceed total steps
if self.current_step > len(self.steps):
self.current_step = len(self.steps)
logger.info(f"[mark_step_skipped] Step {step_number} skipped, new current_step: {self.current_step}, is_completed: {self.is_completed}")
self.save_progress()
logger.info(f"Step {step_number} marked as skipped")
def can_proceed_to_step(self, step_number: int) -> bool:
"""Check if user can proceed to a specific step."""
if step_number == 1:
return True # First step is always accessible
# Check if all previous steps are completed
for step in self.steps:
if step.step_number < step_number:
if step.status not in [StepStatus.COMPLETED, StepStatus.SKIPPED]:
return False
return True
def can_complete_onboarding(self) -> bool:
"""Check if onboarding can be completed."""
required_steps = [1, 2, 3, 6] # Steps 1, 2, 3, and 6 are required
for step_num in required_steps:
step = self.get_step_data(step_num)
if step and step.status not in [StepStatus.COMPLETED, StepStatus.SKIPPED]:
return False
return True
def get_completion_percentage(self) -> float:
"""Get the completion percentage."""
completed_steps = sum(1 for step in self.steps if step.status in [StepStatus.COMPLETED, StepStatus.SKIPPED])
return (completed_steps / len(self.steps)) * 100
def get_next_incomplete_step(self) -> Optional[int]:
"""Get the next incomplete step number."""
for step in self.steps:
if step.status not in [StepStatus.COMPLETED, StepStatus.SKIPPED]:
return step.step_number
return None
def get_resume_step(self) -> int:
"""Get the step to resume from."""
logger.info(f"[get_resume_step] Checking resume step...")
logger.info(f"[get_resume_step] Current step: {self.current_step}")
logger.info(f"[get_resume_step] Steps status: {[f'{s.step_number}:{s.status.value}' for s in self.steps]}")
for step in self.steps:
if step.status not in [StepStatus.COMPLETED, StepStatus.SKIPPED]:
logger.info(f"[get_resume_step] Found incomplete step: {step.step_number}")
return step.step_number
logger.warning(f"[get_resume_step] No incomplete steps found, defaulting to step 1")
return 1 # Default to first step
def complete_onboarding(self):
"""Complete the onboarding process."""
self.is_completed = True
self.completed_at = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
self.save_progress()
logger.info("Onboarding completed successfully")
def save_progress(self):
"""Save progress to file."""
try:
progress_data = {
"steps": [{
"step_number": step.step_number,
"title": step.title,
"description": step.description,
"status": step.status.value, # Convert enum to string
"completed_at": step.completed_at,
"data": step.data,
"validation_errors": step.validation_errors
} for step in self.steps],
"current_step": self.current_step,
"started_at": self.started_at,
"last_updated": self.last_updated,
"is_completed": self.is_completed,
"completed_at": self.completed_at
}
with open(self.progress_file, 'w') as f:
json.dump(progress_data, f, indent=2)
logger.debug(f"Progress saved to {self.progress_file}")
except Exception as e:
logger.error(f"Error saving progress: {str(e)}")
def load_progress(self):
"""Load progress from file."""
try:
if os.path.exists(self.progress_file):
with open(self.progress_file, 'r') as f:
progress_data = json.load(f)
# Restore step data
for step_data in progress_data.get("steps", []):
step_num = step_data.get("step_number")
if step_num:
step = self.get_step_data(step_num)
if step:
step.status = StepStatus(step_data.get("status", "pending"))
step.completed_at = step_data.get("completed_at")
step.data = step_data.get("data")
step.validation_errors = step_data.get("validation_errors", [])
# Restore other data
self.current_step = progress_data.get("current_step", 1)
self.started_at = progress_data.get("started_at", self.started_at)
self.last_updated = progress_data.get("last_updated", self.last_updated)
self.is_completed = progress_data.get("is_completed", False)
self.completed_at = progress_data.get("completed_at")
# Fix any corrupted state
self._fix_corrupted_state()
logger.info("Progress loaded from file")
except Exception as e:
logger.error(f"Error loading progress: {str(e)}")
def _fix_corrupted_state(self):
"""Fix any corrupted progress state."""
# Check if all steps are completed
all_steps_completed = all(s.status in [StepStatus.COMPLETED, StepStatus.SKIPPED] for s in self.steps)
if all_steps_completed:
# If all steps are completed, ensure is_completed is True and current_step is valid
if not self.is_completed:
logger.info(f"[_fix_corrupted_state] All steps completed but is_completed was False, fixing...")
self.is_completed = True
self.completed_at = datetime.now().isoformat()
# Ensure current_step doesn't exceed total steps
if self.current_step > len(self.steps):
logger.info(f"[_fix_corrupted_state] Current step {self.current_step} exceeds total steps {len(self.steps)}, fixing...")
self.current_step = len(self.steps)
self.save_progress()
else:
# If not all steps are completed, ensure is_completed is False
if self.is_completed:
logger.info(f"[_fix_corrupted_state] Not all steps completed but is_completed was True, fixing...")
self.is_completed = False
self.completed_at = None
self.save_progress()
def reset_progress(self):
"""Reset all progress."""
self.steps = self._initialize_steps()
self.current_step = 1
self.started_at = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
self.is_completed = False
self.completed_at = None
self.save_progress()
logger.info("Progress reset successfully")
class APIKeyManager:
"""Enhanced manager for handling API keys with setup instructions."""
def __init__(self):
self.api_keys = {
"openai": None,
"gemini": None,
"anthropic": None,
"mistral": None,
"tavily": None,
"serper": None,
"metaphor": None,
"firecrawl": None,
"stability": None
}
self.load_api_keys()
# Enhanced provider setup instructions
self.api_key_groups = {
"Create": {
"GEMINI_API_KEY": {
"url": "https://makersuite.google.com/app/apikey",
"description": "Google's Gemini AI for content generation",
"setup_steps": [
"Visit Google AI Studio",
"Create a Google Cloud account",
"Enable Gemini API",
"Generate API key"
]
},
"OPENAI_API_KEY": {
"url": "https://platform.openai.com/api-keys",
"description": "OpenAI's GPT models for content creation",
"setup_steps": [
"Go to OpenAI platform",
"Create an account",
"Navigate to API keys",
"Create new API key"
]
},
"MISTRAL_API_KEY": {
"url": "https://console.mistral.ai/api-keys/",
"description": "Mistral AI for efficient content generation",
"setup_steps": [
"Visit Mistral AI website",
"Sign up for an account",
"Access API section",
"Generate API key"
]
},
"ANTHROPIC_API_KEY": {
"url": "https://console.anthropic.com/",
"description": "Anthropic's Claude models for content creation",
"setup_steps": [
"Visit Anthropic console",
"Create an account",
"Navigate to API keys",
"Generate API key"
]
}
},
"Research": {
"TAVILY_API_KEY": {
"url": "https://tavily.com/#api",
"description": "Powers intelligent web research features",
"setup_steps": [
"Go to Tavily's website",
"Create an account",
"Access your API dashboard",
"Generate a new API key"
]
},
"SERPER_API_KEY": {
"url": "https://serper.dev/signup",
"description": "Enables Google search functionality",
"setup_steps": [
"Visit Serper.dev",
"Sign up for an account",
"Go to API section",
"Create your API key"
]
}
},
"Deep Search": {
"METAPHOR_API_KEY": {
"url": "https://dashboard.exa.ai/login",
"description": "Enables advanced web search capabilities",
"setup_steps": [
"Visit the Exa AI dashboard",
"Sign up for a free account",
"Navigate to API Keys section",
"Create a new API key"
]
},
"FIRECRAWL_API_KEY": {
"url": "https://www.firecrawl.dev/account",
"description": "Enables web content extraction",
"setup_steps": [
"Visit Firecrawl website",
"Sign up for an account",
"Access API dashboard",
"Create your API key"
]
}
},
"Integrations": {
"STABILITY_API_KEY": {
"url": "https://platform.stability.ai/",
"description": "Enables AI image generation",
"setup_steps": [
"Access Stability AI platform",
"Create an account",
"Navigate to API settings",
"Generate your API key"
]
}
}
}
def save_api_key(self, provider: str, api_key: str) -> bool:
"""Save an API key for a provider."""
try:
if provider in self.api_keys:
self.api_keys[provider] = api_key
self._save_to_env_file(provider, api_key)
logger.info(f"API key saved for {provider}")
return True
else:
logger.error(f"Unknown provider: {provider}")
return False
except Exception as e:
logger.error(f"Error saving API key: {str(e)}")
return False
def get_api_key(self, provider: str) -> Optional[str]:
"""Get API key for a provider."""
return self.api_keys.get(provider)
def get_all_keys(self) -> Dict[str, str]:
"""Get all configured API keys."""
return {k: v for k, v in self.api_keys.items() if v is not None}
def load_api_keys(self):
"""Load API keys from environment variables."""
# Reload environment variables first
load_dotenv(override=True)
env_mapping = {
"OPENAI_API_KEY": "openai",
"GEMINI_API_KEY": "gemini",
"ANTHROPIC_API_KEY": "anthropic",
"MISTRAL_API_KEY": "mistral",
"TAVILY_API_KEY": "tavily",
"SERPER_API_KEY": "serper",
"METAPHOR_API_KEY": "metaphor",
"FIRECRAWL_API_KEY": "firecrawl",
"STABILITY_API_KEY": "stability"
}
for env_var, provider in env_mapping.items():
api_key = os.getenv(env_var)
if api_key:
self.api_keys[provider] = api_key
def get_provider_setup_info(self, provider: str) -> Optional[Dict[str, Any]]:
"""Get setup information for a specific provider."""
for group_name, providers in self.api_key_groups.items():
for env_var, info in providers.items():
if env_var.lower().replace('_api_key', '').replace('_key', '') == provider:
return {
"provider": provider,
"group": group_name,
"url": info["url"],
"description": info["description"],
"setup_steps": info["setup_steps"]
}
return None
def get_all_providers_info(self) -> Dict[str, Any]:
"""Get information for all providers."""
return {
"groups": self.api_key_groups,
"configured_providers": [k for k, v in self.api_keys.items() if v],
"total_providers": len(self.api_keys)
}
    def _save_to_env_file(self, provider: str, api_key: str):
        """Save API key to .env file."""
        try:
            env_mapping = {
                "openai": "OPENAI_API_KEY",
                "gemini": "GEMINI_API_KEY",
                "anthropic": "ANTHROPIC_API_KEY",
                "mistral": "MISTRAL_API_KEY",
                "tavily": "TAVILY_API_KEY",
                "serper": "SERPER_API_KEY",
                "metaphor": "METAPHOR_API_KEY",
                "firecrawl": "FIRECRAWL_API_KEY",
                "stability": "STABILITY_API_KEY"
            }
            env_var = env_mapping.get(provider)
            if env_var:
                # Update environment variable
                os.environ[env_var] = api_key
                # Update .env file
                env_path = ".env"
                if os.path.exists(env_path):
                    with open(env_path, 'r') as f:
                        lines = f.readlines()
                else:
                    lines = []
                key_found = False
                updated_lines = []
                for line in lines:
                    if line.startswith(f"{env_var}="):
                        updated_lines.append(f"{env_var}={api_key}\n")
                        key_found = True
                    else:
                        updated_lines.append(line)
                if not key_found:
                    # Terminate an unterminated last line so the new entry
                    # is not glued onto the end of an existing one
                    if updated_lines and not updated_lines[-1].endswith("\n"):
                        updated_lines[-1] += "\n"
                    updated_lines.append(f"{env_var}={api_key}\n")
                with open(env_path, 'w') as f:
                    f.writelines(updated_lines)
                # Reload environment variables
                load_dotenv(override=True)
                logger.debug(f"API key saved to .env file for {provider}")
        except Exception as e:
            logger.error(f"Error saving to .env file: {str(e)}")
# Global instance for the application
_onboarding_progress = None


def get_onboarding_progress() -> OnboardingProgress:
    """Get the global onboarding progress instance."""
    if not hasattr(get_onboarding_progress, '_instance'):
        get_onboarding_progress._instance = OnboardingProgress()
    return get_onboarding_progress._instance


def get_api_key_manager() -> APIKeyManager:
    """Get the global API key manager instance."""
    if not hasattr(get_api_key_manager, '_instance'):
        get_api_key_manager._instance = APIKeyManager()
    return get_api_key_manager._instance
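
The onboarding flow in this file is sequential: a step is reachable only once every earlier step is completed or skipped, and progress is the share of steps in either of those two states. The sketch below restates that gating logic as pure functions over a plain `{step_number: status}` mapping, assuming the same four status values. The function names mirror the methods above, but the dict-based signatures are an illustrative assumption, not the class's actual API.

```python
from enum import Enum
from typing import Dict, Optional


class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    SKIPPED = "skipped"


# Statuses that count as "done" for gating and progress purposes
DONE = {StepStatus.COMPLETED, StepStatus.SKIPPED}


def can_proceed_to_step(statuses: Dict[int, StepStatus], step_number: int) -> bool:
    """A step is reachable when every earlier step is completed or skipped."""
    return all(status in DONE for num, status in statuses.items() if num < step_number)


def next_incomplete_step(statuses: Dict[int, StepStatus]) -> Optional[int]:
    """Lowest-numbered step that is neither completed nor skipped, if any."""
    for num in sorted(statuses):
        if statuses[num] not in DONE:
            return num
    return None


def completion_percentage(statuses: Dict[int, StepStatus]) -> float:
    """Share of steps that are completed or skipped, as a percentage."""
    done = sum(1 for status in statuses.values() if status in DONE)
    return done / len(statuses) * 100


# Six steps, with step 1 completed and step 2 skipped
statuses = {n: StepStatus.PENDING for n in range(1, 7)}
statuses[1] = StepStatus.COMPLETED
statuses[2] = StepStatus.SKIPPED
```

With this state, step 3 is reachable and is also the next incomplete step, while step 4 stays blocked until step 3 is completed or skipped.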

File diff suppressed because it is too large


@@ -0,0 +1,19 @@
"""Component Logic Services for ALwrity Backend.
This module contains business logic extracted from legacy Streamlit components
and converted to reusable FastAPI services.
"""
from .ai_research_logic import AIResearchLogic
from .personalization_logic import PersonalizationLogic
from .research_utilities import ResearchUtilities
from .style_detection_logic import StyleDetectionLogic
from .web_crawler_logic import WebCrawlerLogic
__all__ = [
"AIResearchLogic",
"PersonalizationLogic",
"ResearchUtilities",
"StyleDetectionLogic",
"WebCrawlerLogic"
]


@@ -0,0 +1,268 @@
"""AI Research Logic Service for ALwrity Backend.
This service handles business logic for AI research configuration and user information
validation, extracted from the legacy Streamlit component.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
import re
from datetime import datetime
class AIResearchLogic:
"""Business logic for AI research configuration and user information."""
def __init__(self):
"""Initialize the AI Research Logic service."""
self.valid_roles = ["Content Creator", "Marketing Manager", "Business Owner", "Other"]
self.valid_research_depths = ["Basic", "Standard", "Deep", "Comprehensive"]
self.valid_content_types = ["Blog Posts", "Social Media", "Technical Articles", "News", "Academic Papers"]
def validate_user_info(self, user_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate user information for AI research configuration.
Args:
user_data: Dictionary containing user information
Returns:
Dict containing validation results
"""
try:
logger.info("Validating user information for AI research")
errors = []
validated_data = {}
# Validate full name
full_name = user_data.get('full_name', '').strip()
if not full_name or len(full_name) < 2:
errors.append("Full name must be at least 2 characters long")
else:
validated_data['full_name'] = full_name
# Validate email
email = user_data.get('email', '').strip().lower()
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
if not email_pattern.match(email):
errors.append("Invalid email format")
else:
validated_data['email'] = email
# Validate company
company = user_data.get('company', '').strip()
if not company:
errors.append("Company name is required")
else:
validated_data['company'] = company
# Validate role
role = user_data.get('role', '')
if role not in self.valid_roles:
errors.append(f"Role must be one of: {', '.join(self.valid_roles)}")
else:
validated_data['role'] = role
# Determine validation result
is_valid = len(errors) == 0
if is_valid:
logger.info("User information validation successful")
validated_data['validated_at'] = datetime.now().isoformat()
else:
logger.warning(f"User information validation failed: {errors}")
return {
'valid': is_valid,
'user_info': validated_data if is_valid else None,
'errors': errors
}
except Exception as e:
logger.error(f"Error validating user information: {str(e)}")
return {
'valid': False,
'user_info': None,
'errors': [f"Validation error: {str(e)}"]
}
def configure_research_preferences(self, preferences: Dict[str, Any]) -> Dict[str, Any]:
"""
Configure research preferences for AI research.
Args:
preferences: Dictionary containing research preferences
Returns:
Dict containing configuration results
"""
try:
logger.info("Configuring research preferences")
errors = []
configured_preferences = {}
# Validate research depth
research_depth = preferences.get('research_depth', '')
if research_depth not in self.valid_research_depths:
errors.append(f"Research depth must be one of: {', '.join(self.valid_research_depths)}")
else:
configured_preferences['research_depth'] = research_depth
# Validate content types
content_types = preferences.get('content_types', [])
if not content_types:
errors.append("At least one content type must be selected")
else:
invalid_types = [ct for ct in content_types if ct not in self.valid_content_types]
if invalid_types:
errors.append(f"Invalid content types: {', '.join(invalid_types)}")
else:
configured_preferences['content_types'] = content_types
# Validate auto research setting
auto_research = preferences.get('auto_research', False)
if not isinstance(auto_research, bool):
errors.append("Auto research must be a boolean value")
else:
configured_preferences['auto_research'] = auto_research
# Determine configuration result
is_valid = len(errors) == 0
if is_valid:
logger.info("Research preferences configuration successful")
configured_preferences['configured_at'] = datetime.now().isoformat()
else:
logger.warning(f"Research preferences configuration failed: {errors}")
return {
'valid': is_valid,
'preferences': configured_preferences if is_valid else None,
'errors': errors
}
except Exception as e:
logger.error(f"Error configuring research preferences: {str(e)}")
return {
'valid': False,
'preferences': None,
'errors': [f"Configuration error: {str(e)}"]
}
def process_research_request(self, topic: str, preferences: Dict[str, Any]) -> Dict[str, Any]:
"""
Process a research request with configured preferences.
Args:
topic: The research topic
preferences: Configured research preferences
Returns:
Dict containing research processing results
"""
try:
logger.info(f"Processing research request for topic: {topic}")
# Validate topic
if not topic or len(topic.strip()) < 3:
return {
'success': False,
'topic': topic,
'error': 'Topic must be at least 3 characters long'
}
# Validate preferences
if not preferences:
return {
'success': False,
'topic': topic,
'error': 'Research preferences are required'
}
# Process research based on preferences
research_depth = preferences.get('research_depth', 'Standard')
content_types = preferences.get('content_types', [])
auto_research = preferences.get('auto_research', False)
# Simulate research processing (in real implementation, this would call AI services)
research_results = {
'topic': topic,
'research_depth': research_depth,
'content_types': content_types,
'auto_research': auto_research,
'processed_at': datetime.now().isoformat(),
'status': 'processed'
}
logger.info(f"Research request processed successfully for topic: {topic}")
return {
'success': True,
'topic': topic,
'results': research_results
}
except Exception as e:
logger.error(f"Error processing research request: {str(e)}")
return {
'success': False,
'topic': topic,
'error': f"Processing error: {str(e)}"
}
    def get_research_configuration_options(self) -> Dict[str, Any]:
        """
        Get available configuration options for research.

        Returns:
            Dict containing all available options
        """
        return {
            'roles': self.valid_roles,
            'research_depths': self.valid_research_depths,
            'content_types': self.valid_content_types,
            'auto_research_options': [True, False]
        }
def validate_complete_research_setup(self, user_info: Dict[str, Any], preferences: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate complete research setup including user info and preferences.
Args:
user_info: User information dictionary
preferences: Research preferences dictionary
Returns:
Dict containing complete validation results
"""
try:
logger.info("Validating complete research setup")
# Validate user information
user_validation = self.validate_user_info(user_info)
# Validate research preferences
preferences_validation = self.configure_research_preferences(preferences)
# Combine results
all_errors = user_validation.get('errors', []) + preferences_validation.get('errors', [])
is_complete = user_validation.get('valid', False) and preferences_validation.get('valid', False)
return {
'complete': is_complete,
'user_info_valid': user_validation.get('valid', False),
'preferences_valid': preferences_validation.get('valid', False),
'errors': all_errors,
'user_info': user_validation.get('user_info'),
'preferences': preferences_validation.get('preferences')
}
except Exception as e:
logger.error(f"Error validating complete research setup: {str(e)}")
return {
'complete': False,
'user_info_valid': False,
'preferences_valid': False,
'errors': [f"Setup validation error: {str(e)}"]
}
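
The preference checks above follow a collect-all-errors pattern: every field is validated, errors are accumulated, and the configured values are returned only when nothing failed. The following is a minimal standalone sketch of that pattern, not the service itself. The function name `check_preferences` and the contents of `VALID_CONTENT_TYPES` are assumptions made for illustration; the real option list lives on the service instance and is not shown in this part of the commit.

```python
from typing import Any, Dict, List

# Hypothetical option list for illustration only; the actual values are
# defined on the service instance (self.valid_content_types).
VALID_CONTENT_TYPES: List[str] = ["Blog", "Article", "Social"]


def check_preferences(preferences: Dict[str, Any]) -> Dict[str, Any]:
    """Validate every field and collect all errors instead of stopping early."""
    errors: List[str] = []
    configured: Dict[str, Any] = {}

    # Reject the whole list if any entry is not a known content type.
    content_types = preferences.get("content_types", [])
    invalid = [ct for ct in content_types if ct not in VALID_CONTENT_TYPES]
    if invalid:
        errors.append(f"Invalid content types: {', '.join(invalid)}")
    else:
        configured["content_types"] = content_types

    # auto_research must be a real bool; strings such as "yes" are rejected.
    auto_research = preferences.get("auto_research", False)
    if not isinstance(auto_research, bool):
        errors.append("Auto research must be a boolean value")
    else:
        configured["auto_research"] = auto_research

    is_valid = not errors
    return {
        "valid": is_valid,
        "preferences": configured if is_valid else None,
        "errors": errors,
    }


ok = check_preferences({"content_types": ["Blog"], "auto_research": True})
bad = check_preferences({"content_types": ["Podcast"], "auto_research": "yes"})
```

Returning `None` for the preferences on failure means a caller cannot accidentally persist a partially validated configuration.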


@@ -0,0 +1,337 @@
"""Personalization Logic Service for ALwrity Backend.
This service handles business logic for content personalization settings,
extracted from the legacy Streamlit component.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
from datetime import datetime


class PersonalizationLogic:
    """Business logic for content personalization and brand voice configuration."""

    def __init__(self):
        """Initialize the Personalization Logic service."""
        self.valid_writing_styles = ["Professional", "Casual", "Technical", "Conversational", "Academic"]
        self.valid_tones = ["Formal", "Semi-Formal", "Neutral", "Friendly", "Humorous"]
        self.valid_content_lengths = ["Concise", "Standard", "Detailed", "Comprehensive"]
        self.valid_personality_traits = ["Professional", "Innovative", "Friendly", "Trustworthy", "Creative", "Expert"]
        self.valid_readability_levels = ["Simple", "Standard", "Advanced", "Expert"]
        self.valid_content_structures = ["Introduction", "Key Points", "Examples", "Conclusion", "Call-to-Action"]
def validate_content_style(self, style_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate content style configuration.
Args:
style_data: Dictionary containing content style settings
Returns:
Dict containing validation results
"""
try:
logger.info("Validating content style configuration")
errors = []
validated_style = {}
# Validate writing style
writing_style = style_data.get('writing_style', '')
if writing_style not in self.valid_writing_styles:
errors.append(f"Writing style must be one of: {', '.join(self.valid_writing_styles)}")
else:
validated_style['writing_style'] = writing_style
# Validate tone
tone = style_data.get('tone', '')
if tone not in self.valid_tones:
errors.append(f"Tone must be one of: {', '.join(self.valid_tones)}")
else:
validated_style['tone'] = tone
# Validate content length
content_length = style_data.get('content_length', '')
if content_length not in self.valid_content_lengths:
errors.append(f"Content length must be one of: {', '.join(self.valid_content_lengths)}")
else:
validated_style['content_length'] = content_length
# Determine validation result
is_valid = len(errors) == 0
if is_valid:
logger.info("Content style validation successful")
validated_style['validated_at'] = datetime.now().isoformat()
else:
logger.warning(f"Content style validation failed: {errors}")
return {
'valid': is_valid,
'style_config': validated_style if is_valid else None,
'errors': errors
}
except Exception as e:
logger.error(f"Error validating content style: {str(e)}")
return {
'valid': False,
'style_config': None,
'errors': [f"Style validation error: {str(e)}"]
}
def configure_brand_voice(self, brand_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Configure brand voice settings.
Args:
brand_data: Dictionary containing brand voice settings
Returns:
Dict containing configuration results
"""
try:
logger.info("Configuring brand voice settings")
errors = []
configured_brand = {}
# Validate personality traits
personality_traits = brand_data.get('personality_traits', [])
if not personality_traits:
errors.append("At least one personality trait must be selected")
else:
invalid_traits = [trait for trait in personality_traits if trait not in self.valid_personality_traits]
if invalid_traits:
errors.append(f"Invalid personality traits: {', '.join(invalid_traits)}")
else:
configured_brand['personality_traits'] = personality_traits
# Validate voice description (optional but if provided, must be valid)
voice_description = brand_data.get('voice_description', '').strip()
if voice_description and len(voice_description) < 10:
errors.append("Voice description must be at least 10 characters long")
elif voice_description:
configured_brand['voice_description'] = voice_description
# Validate keywords (optional)
keywords = brand_data.get('keywords', '').strip()
if keywords:
configured_brand['keywords'] = keywords
# Determine configuration result
is_valid = len(errors) == 0
if is_valid:
logger.info("Brand voice configuration successful")
configured_brand['configured_at'] = datetime.now().isoformat()
else:
logger.warning(f"Brand voice configuration failed: {errors}")
return {
'valid': is_valid,
'brand_config': configured_brand if is_valid else None,
'errors': errors
}
except Exception as e:
logger.error(f"Error configuring brand voice: {str(e)}")
return {
'valid': False,
'brand_config': None,
'errors': [f"Brand configuration error: {str(e)}"]
}
def process_advanced_settings(self, settings: Dict[str, Any]) -> Dict[str, Any]:
"""
Process advanced content generation settings.
Args:
settings: Dictionary containing advanced settings
Returns:
Dict containing processing results
"""
try:
logger.info("Processing advanced content generation settings")
errors = []
processed_settings = {}
# Validate SEO optimization (boolean)
seo_optimization = settings.get('seo_optimization', False)
if not isinstance(seo_optimization, bool):
errors.append("SEO optimization must be a boolean value")
else:
processed_settings['seo_optimization'] = seo_optimization
# Validate readability level
readability_level = settings.get('readability_level', '')
if readability_level not in self.valid_readability_levels:
errors.append(f"Readability level must be one of: {', '.join(self.valid_readability_levels)}")
else:
processed_settings['readability_level'] = readability_level
# Validate content structure
content_structure = settings.get('content_structure', [])
if not content_structure:
errors.append("At least one content structure element must be selected")
else:
invalid_structures = [struct for struct in content_structure if struct not in self.valid_content_structures]
if invalid_structures:
errors.append(f"Invalid content structure elements: {', '.join(invalid_structures)}")
else:
processed_settings['content_structure'] = content_structure
# Determine processing result
is_valid = len(errors) == 0
if is_valid:
logger.info("Advanced settings processing successful")
processed_settings['processed_at'] = datetime.now().isoformat()
else:
logger.warning(f"Advanced settings processing failed: {errors}")
return {
'valid': is_valid,
'advanced_settings': processed_settings if is_valid else None,
'errors': errors
}
except Exception as e:
logger.error(f"Error processing advanced settings: {str(e)}")
return {
'valid': False,
'advanced_settings': None,
'errors': [f"Advanced settings error: {str(e)}"]
}
def process_personalization_settings(self, settings: Dict[str, Any]) -> Dict[str, Any]:
"""
Process complete personalization settings including all components.
Args:
settings: Dictionary containing complete personalization settings
Returns:
Dict containing processing results
"""
try:
logger.info("Processing complete personalization settings")
# Validate content style
content_style = settings.get('content_style', {})
style_validation = self.validate_content_style(content_style)
# Configure brand voice
brand_voice = settings.get('brand_voice', {})
brand_validation = self.configure_brand_voice(brand_voice)
# Process advanced settings
advanced_settings = settings.get('advanced_settings', {})
advanced_validation = self.process_advanced_settings(advanced_settings)
# Combine results
all_errors = (
style_validation.get('errors', []) +
brand_validation.get('errors', []) +
advanced_validation.get('errors', [])
)
is_complete = (
style_validation.get('valid', False) and
brand_validation.get('valid', False) and
advanced_validation.get('valid', False)
)
if is_complete:
# Combine all valid settings
complete_settings = {
'content_style': style_validation.get('style_config'),
'brand_voice': brand_validation.get('brand_config'),
'advanced_settings': advanced_validation.get('advanced_settings'),
'processed_at': datetime.now().isoformat()
}
logger.info("Complete personalization settings processed successfully")
return {
'valid': True,
'settings': complete_settings,
'errors': []
}
else:
logger.warning(f"Personalization settings processing failed: {all_errors}")
return {
'valid': False,
'settings': None,
'errors': all_errors
}
except Exception as e:
logger.error(f"Error processing personalization settings: {str(e)}")
return {
'valid': False,
'settings': None,
'errors': [f"Personalization processing error: {str(e)}"]
}
    def get_personalization_configuration_options(self) -> Dict[str, Any]:
        """
        Get available configuration options for personalization.

        Returns:
            Dict containing all available options
        """
        return {
            'writing_styles': self.valid_writing_styles,
            'tones': self.valid_tones,
            'content_lengths': self.valid_content_lengths,
            'personality_traits': self.valid_personality_traits,
            'readability_levels': self.valid_readability_levels,
            'content_structures': self.valid_content_structures,
            'seo_optimization_options': [True, False]
        }
def generate_content_guidelines(self, settings: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate content guidelines based on personalization settings.
Args:
settings: Validated personalization settings
Returns:
Dict containing content guidelines
"""
try:
logger.info("Generating content guidelines from personalization settings")
content_style = settings.get('content_style', {})
brand_voice = settings.get('brand_voice', {})
advanced_settings = settings.get('advanced_settings', {})
guidelines = {
'writing_style': content_style.get('writing_style', 'Professional'),
'tone': content_style.get('tone', 'Neutral'),
'content_length': content_style.get('content_length', 'Standard'),
'brand_personality': brand_voice.get('personality_traits', []),
'seo_optimized': advanced_settings.get('seo_optimization', False),
'readability_level': advanced_settings.get('readability_level', 'Standard'),
'required_sections': advanced_settings.get('content_structure', []),
'generated_at': datetime.now().isoformat()
}
logger.info("Content guidelines generated successfully")
return {
'success': True,
'guidelines': guidelines
}
except Exception as e:
logger.error(f"Error generating content guidelines: {str(e)}")
return {
'success': False,
'error': f"Guidelines generation error: {str(e)}"
}
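
`process_personalization_settings` merges three independent validation results: the error lists are concatenated and the overall result is valid only when every part is valid. A minimal sketch of that merge step, assuming each partial result is a dict with `valid` and `errors` keys as in the methods above; the helper name `combine_validations` is hypothetical and does not exist in the service.

```python
from typing import Any, Dict, List


def combine_validations(*results: Dict[str, Any]) -> Dict[str, Any]:
    """Merge partial validation results into one overall verdict."""
    errors: List[str] = []
    for result in results:
        # Missing keys are treated as "no errors" and "not valid",
        # matching the .get(...) defaults used by the service.
        errors.extend(result.get("errors", []))
    valid = all(result.get("valid", False) for result in results)
    return {"valid": valid, "errors": errors}


style = {"valid": True, "errors": []}
brand = {"valid": False, "errors": ["At least one personality trait must be selected"]}
advanced = {"valid": True, "errors": []}

combined = combine_validations(style, brand, advanced)
```

Because every part is validated before the merge, a caller sees all problems in one response rather than fixing them one request at a time.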


@@ -0,0 +1,325 @@
"""Research Utilities Service for ALwrity Backend.
This service handles research functionality and result processing,
extracted from the legacy AI research utilities.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
import asyncio
from datetime import datetime


class ResearchUtilities:
    """Business logic for research functionality and result processing."""

    def __init__(self):
        """Initialize the Research Utilities service."""
        self.research_providers = {
            'tavily': 'TAVILY_API_KEY',
            'serper': 'SERPER_API_KEY',
            'metaphor': 'METAPHOR_API_KEY',
            'firecrawl': 'FIRECRAWL_API_KEY'
        }
async def research_topic(self, topic: str, api_keys: Dict[str, str]) -> Dict[str, Any]:
"""
Research a topic using available AI services.
Args:
topic: The topic to research
api_keys: Dictionary of API keys for different services
Returns:
Dict containing research results and metadata
"""
try:
logger.info(f"Starting research on topic: {topic}")
# Validate topic
if not topic or len(topic.strip()) < 3:
return {
'success': False,
'topic': topic,
'error': 'Topic must be at least 3 characters long'
}
# Check available API keys
available_providers = []
for provider, key_name in self.research_providers.items():
if api_keys.get(key_name):
available_providers.append(provider)
if not available_providers:
return {
'success': False,
'topic': topic,
'error': 'No research providers available. Please configure API keys.'
}
# Simulate research processing (in real implementation, this would call actual AI services)
research_results = await self._simulate_research(topic, available_providers)
logger.info(f"Research completed successfully for topic: {topic}")
return {
'success': True,
'topic': topic,
'results': research_results,
'metadata': {
'providers_used': available_providers,
'research_timestamp': datetime.now().isoformat(),
'topic_length': len(topic)
}
}
except Exception as e:
logger.error(f"Error during research: {str(e)}")
return {
'success': False,
'topic': topic,
'error': str(e)
}
async def _simulate_research(self, topic: str, providers: List[str]) -> Dict[str, Any]:
"""
Simulate research processing for demonstration purposes.
In real implementation, this would call actual AI research services.
Args:
topic: The research topic
providers: List of available research providers
Returns:
Dict containing simulated research results
"""
# Simulate async processing time
await asyncio.sleep(0.1)
# Generate simulated research results
results = {
'summary': f"Comprehensive research summary for '{topic}' based on multiple sources.",
'key_points': [
f"Key insight 1 about {topic}",
f"Important finding 2 related to {topic}",
f"Notable trend 3 in {topic}",
f"Critical observation 4 regarding {topic}"
],
'sources': [
f"Research source 1 for {topic}",
f"Academic paper on {topic}",
f"Industry report about {topic}",
f"Expert analysis of {topic}"
],
'trends': [
f"Emerging trend in {topic}",
f"Growing interest in {topic}",
f"Market shift related to {topic}"
],
'recommendations': [
f"Action item 1 for {topic}",
f"Strategic recommendation for {topic}",
f"Next steps regarding {topic}"
],
'providers_used': providers,
'research_depth': 'comprehensive',
'confidence_score': 0.85
}
return results
def process_research_results(self, results: Dict[str, Any]) -> Dict[str, Any]:
"""
Process and format research results for better presentation.
Args:
results: Raw research results
Returns:
Dict containing processed and formatted results
"""
try:
logger.info("Processing research results")
if not results or 'success' not in results:
return {
'success': False,
'error': 'Invalid research results format'
}
if not results.get('success', False):
return results # Return error results as-is
# Process successful results
raw_results = results.get('results', {})
metadata = results.get('metadata', {})
# Format and structure the results
processed_results = {
'topic': results.get('topic', ''),
'summary': raw_results.get('summary', ''),
'key_insights': raw_results.get('key_points', []),
'sources': raw_results.get('sources', []),
'trends': raw_results.get('trends', []),
'recommendations': raw_results.get('recommendations', []),
'metadata': {
'providers_used': raw_results.get('providers_used', []),
'research_depth': raw_results.get('research_depth', 'standard'),
'confidence_score': raw_results.get('confidence_score', 0.0),
'processed_at': datetime.now().isoformat(),
'original_timestamp': metadata.get('research_timestamp')
}
}
logger.info("Research results processed successfully")
return {
'success': True,
'processed_results': processed_results
}
except Exception as e:
logger.error(f"Error processing research results: {str(e)}")
return {
'success': False,
'error': f"Results processing error: {str(e)}"
}
def validate_research_request(self, topic: str, api_keys: Dict[str, str]) -> Dict[str, Any]:
"""
Validate a research request before processing.
Args:
topic: The research topic
api_keys: Available API keys
Returns:
Dict containing validation results
"""
try:
logger.info(f"Validating research request for topic: {topic}")
errors = []
warnings = []
# Validate topic
if not topic or len(topic.strip()) < 3:
errors.append("Topic must be at least 3 characters long")
elif len(topic.strip()) > 500:
errors.append("Topic is too long (maximum 500 characters)")
# Check API keys
available_providers = []
for provider, key_name in self.research_providers.items():
if api_keys.get(key_name):
available_providers.append(provider)
else:
warnings.append(f"No API key for {provider}")
if not available_providers:
errors.append("No research providers available. Please configure at least one API key.")
# Determine validation result
is_valid = len(errors) == 0
return {
'valid': is_valid,
'errors': errors,
'warnings': warnings,
'available_providers': available_providers,
'topic_length': len(topic.strip()) if topic else 0
}
except Exception as e:
logger.error(f"Error validating research request: {str(e)}")
return {
'valid': False,
'errors': [f"Validation error: {str(e)}"],
'warnings': [],
'available_providers': [],
'topic_length': 0
}
def get_research_providers_info(self) -> Dict[str, Any]:
"""
Get information about available research providers.
Returns:
Dict containing provider information
"""
return {
'providers': {
'tavily': {
'name': 'Tavily',
'description': 'Intelligent web research',
'api_key_name': 'TAVILY_API_KEY',
'url': 'https://tavily.com/#api'
},
'serper': {
'name': 'Serper',
'description': 'Google search functionality',
'api_key_name': 'SERPER_API_KEY',
'url': 'https://serper.dev/signup'
},
'metaphor': {
'name': 'Metaphor',
'description': 'Advanced web search',
'api_key_name': 'METAPHOR_API_KEY',
'url': 'https://dashboard.exa.ai/login'
},
'firecrawl': {
'name': 'Firecrawl',
'description': 'Web content extraction',
'api_key_name': 'FIRECRAWL_API_KEY',
'url': 'https://www.firecrawl.dev/account'
}
},
'total_providers': len(self.research_providers)
}
def generate_research_report(self, results: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate a formatted research report from processed results.
Args:
results: Processed research results
Returns:
Dict containing formatted research report
"""
try:
logger.info("Generating research report")
if not results.get('success', False):
return {
'success': False,
'error': 'Cannot generate report from failed research'
}
processed_results = results.get('processed_results', {})
# Generate formatted report
report = {
'title': f"Research Report: {processed_results.get('topic', 'Unknown Topic')}",
'executive_summary': processed_results.get('summary', ''),
'key_findings': processed_results.get('key_insights', []),
'trends_analysis': processed_results.get('trends', []),
'recommendations': processed_results.get('recommendations', []),
'sources': processed_results.get('sources', []),
'metadata': processed_results.get('metadata', {}),
'generated_at': datetime.now().isoformat(),
'report_format': 'structured'
}
logger.info("Research report generated successfully")
return {
'success': True,
'report': report
}
except Exception as e:
logger.error(f"Error generating research report: {str(e)}")
return {
'success': False,
'error': f"Report generation error: {str(e)}"
}
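
`research_topic` is a coroutine, so callers need an event loop, and a provider counts as available only when its API key is present and non-empty. The sketch below illustrates both points without importing the service. The provider-to-key mapping is copied from `ResearchUtilities.__init__`; the names `detect_providers` and `fake_research`, and the placeholder key value, are assumptions for illustration only, and no real provider is called.

```python
import asyncio
from typing import Any, Dict, List

# Provider name -> API key name, as declared in ResearchUtilities.__init__.
RESEARCH_PROVIDERS: Dict[str, str] = {
    "tavily": "TAVILY_API_KEY",
    "serper": "SERPER_API_KEY",
    "metaphor": "METAPHOR_API_KEY",
    "firecrawl": "FIRECRAWL_API_KEY",
}


def detect_providers(api_keys: Dict[str, str]) -> List[str]:
    """Return the providers whose key is present and non-empty."""
    return [
        provider
        for provider, key_name in RESEARCH_PROVIDERS.items()
        if api_keys.get(key_name)
    ]


async def fake_research(topic: str, api_keys: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical stand-in for research_topic; performs no network calls."""
    providers = detect_providers(api_keys)
    if not providers:
        return {"success": False, "topic": topic, "error": "No research providers available."}
    await asyncio.sleep(0)  # placeholder for the real asynchronous work
    return {"success": True, "topic": topic, "providers_used": providers}


# From synchronous code, drive the coroutine with asyncio.run.
result = asyncio.run(fake_research("content marketing", {"SERPER_API_KEY": "placeholder-key"}))
```

Inside an already running event loop (for example in a FastAPI route handler), the coroutine would be awaited directly instead of being passed to `asyncio.run`.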


@@ -0,0 +1,499 @@
"""Style Detection Logic Service for ALwrity Backend.
This service handles business logic for content style detection and analysis,
migrated from the legacy StyleAnalyzer functionality.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
from datetime import datetime
import json
import re
import sys
import os

# Add the backend directory to Python path for absolute imports
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))

# Import the new backend LLM providers from services
from ..llm_providers.main_text_generation import llm_text_gen


class StyleDetectionLogic:
    """Business logic for content style detection and analysis."""

    def __init__(self):
        """Initialize the Style Detection Logic service."""
        logger.info("[StyleDetectionLogic.__init__] Initializing style detection service")
def _clean_json_response(self, text: str) -> str:
"""
Clean the LLM response to extract valid JSON.
Args:
text (str): Raw response from LLM
Returns:
str: Cleaned JSON string
"""
try:
# Remove markdown code block markers
cleaned_string = text.replace("```json", "").replace("```", "").strip()
# Log the cleaned JSON for debugging
logger.debug(f"[StyleDetectionLogic._clean_json_response] Cleaned JSON: {cleaned_string}")
return cleaned_string
except Exception as e:
logger.error(f"[StyleDetectionLogic._clean_json_response] Error cleaning response: {str(e)}")
return ""
def analyze_content_style(self, content: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze the style of the provided content using AI with enhanced prompts.
Args:
content (Dict): Content to analyze, containing main_content, title, etc.
Returns:
Dict: Analysis results with writing style, characteristics, and recommendations
"""
try:
logger.info("[StyleDetectionLogic.analyze_content_style] Starting enhanced style analysis")
# Extract content components
title = content.get('title', '')
description = content.get('description', '')
main_content = content.get('main_content', '')
headings = content.get('headings', [])
domain_info = content.get('domain_info', {})
brand_info = content.get('brand_info', {})
social_media = content.get('social_media', {})
content_structure = content.get('content_structure', {})
# Construct the enhanced analysis prompt
prompt = f"""Analyze the following website content for comprehensive writing style, tone, and characteristics.
This is a detailed analysis for content personalization and AI-powered content generation.
WEBSITE INFORMATION:
- Domain: {domain_info.get('domain_name', 'Unknown')}
- Website Type: {self._determine_website_type(domain_info)}
- Brand Name: {brand_info.get('company_name', 'Not specified')}
- Tagline: {brand_info.get('tagline', 'Not specified')}
- Social Media Presence: {', '.join(social_media.keys()) if social_media else 'None detected'}
CONTENT STRUCTURE:
- Headings: {len(headings)} total ({content_structure.get('headings', {}).get('h1', 0)} H1, {content_structure.get('headings', {}).get('h2', 0)} H2)
- Paragraphs: {content_structure.get('paragraphs', 0)}
- Images: {content_structure.get('images', 0)}
- Links: {content_structure.get('links', 0)}
- Has Navigation: {content_structure.get('has_navigation', False)}
- Has Call-to-Action: {content_structure.get('has_call_to_action', False)}
CONTENT TO ANALYZE:
Title: {title}
Description: {description}
Main Content: {main_content[:5000]}
Key Headings: {headings[:10]}
ANALYSIS REQUIREMENTS:
1. Analyze the writing style, tone, and voice characteristics
2. Identify target audience demographics and expertise level
3. Determine content type and purpose
4. Assess content structure and organization patterns
5. Evaluate brand voice consistency and personality
6. Identify unique style elements and patterns
7. Consider the website type and industry context
8. Analyze social media presence impact on content style
IMPORTANT: Respond ONLY with a JSON object in the following format. Do not include any additional text, explanations, or markdown formatting:
{{
"writing_style": {{
"tone": "detailed tone description with context",
"voice": "active/passive with explanation",
"complexity": "simple/moderate/complex with reasoning",
"engagement_level": "low/medium/high with justification",
"brand_personality": "detailed brand personality analysis",
"formality_level": "casual/semi-formal/formal/professional",
"emotional_appeal": "rational/emotional/mixed with examples"
}},
"content_characteristics": {{
"sentence_structure": "detailed analysis of sentence patterns",
"vocabulary_level": "basic/intermediate/advanced with examples",
"paragraph_organization": "detailed structure analysis",
"content_flow": "detailed flow analysis",
"readability_score": "estimated readability level",
"content_density": "high/medium/low with reasoning",
"visual_elements_usage": "analysis of how visual elements complement text"
}},
"target_audience": {{
"demographics": ["detailed demographic analysis"],
"expertise_level": "beginner/intermediate/advanced with reasoning",
"industry_focus": "detailed industry analysis",
"geographic_focus": "detailed geographic analysis",
"psychographic_profile": "detailed psychographic analysis",
"pain_points": ["identified audience pain points"],
"motivations": ["identified audience motivations"]
}},
"content_type": {{
"primary_type": "detailed content type analysis",
"secondary_types": ["list of secondary content types"],
"purpose": "detailed content purpose analysis",
"call_to_action": "detailed CTA analysis",
"conversion_focus": "high/medium/low with reasoning",
"educational_value": "high/medium/low with reasoning"
}},
"brand_analysis": {{
"brand_voice": "detailed brand voice analysis",
"brand_values": ["identified brand values"],
"brand_positioning": "detailed positioning analysis",
"competitive_differentiation": "detailed differentiation analysis",
"trust_signals": ["identified trust elements"],
"authority_indicators": ["identified authority elements"]
}},
"content_strategy_insights": {{
"strengths": ["content strengths"],
"weaknesses": ["content weaknesses"],
"opportunities": ["content opportunities"],
"threats": ["content threats"],
"recommended_improvements": ["specific improvement suggestions"],
"content_gaps": ["identified content gaps"]
}},
"recommended_settings": {{
"writing_tone": "recommended tone for AI generation",
"target_audience": "recommended audience focus",
"content_type": "recommended content type",
"creativity_level": "low/medium/high with reasoning",
"geographic_location": "recommended geographic focus",
"industry_context": "recommended industry approach",
"brand_alignment": "recommended brand alignment strategy"
}}
}}
"""
# Call the LLM for analysis
logger.debug("[StyleDetectionLogic.analyze_content_style] Sending enhanced prompt to LLM")
analysis_text = llm_text_gen(prompt)
# Clean and parse the response
cleaned_json = self._clean_json_response(analysis_text)
try:
analysis_results = json.loads(cleaned_json)
logger.info("[StyleDetectionLogic.analyze_content_style] Successfully parsed enhanced analysis results")
return {
'success': True,
'analysis': analysis_results
}
except json.JSONDecodeError as e:
logger.error(f"[StyleDetectionLogic.analyze_content_style] Failed to parse JSON response: {e}")
logger.debug(f"[StyleDetectionLogic.analyze_content_style] Raw response: {analysis_text}")
return {
'success': False,
'error': 'Failed to parse analysis response'
}
except Exception as e:
logger.error(f"[StyleDetectionLogic.analyze_content_style] Error in enhanced analysis: {str(e)}")
return {
'success': False,
'error': str(e)
}
    def _determine_website_type(self, domain_info: Dict[str, Any]) -> str:
        """Determine the type of website based on domain and content analysis."""
        if domain_info.get('is_blog'):
            return 'Blog/Content Platform'
        elif domain_info.get('is_ecommerce'):
            return 'E-commerce/Online Store'
        elif domain_info.get('is_corporate'):
            return 'Corporate/Business Website'
        elif domain_info.get('has_blog_section'):
            return 'Business with Blog'
        elif domain_info.get('has_about_page') and domain_info.get('has_contact_page'):
            return 'Professional Services'
        else:
            return 'General Website'
    def _get_fallback_analysis(self, content: Dict[str, Any]) -> Dict[str, Any]:
        """Get fallback analysis when LLM analysis fails."""
        main_content = content.get("main_content", "")
        title = content.get("title", "")

        # Simple content analysis based on content characteristics
        content_length = len(main_content)
        word_count = len(main_content.split())

        # Determine tone based on content characteristics
        if any(word in main_content.lower() for word in ['professional', 'business', 'industry', 'company']):
            tone = "professional"
        elif any(word in main_content.lower() for word in ['casual', 'fun', 'enjoy', 'exciting']):
            tone = "casual"
        else:
            tone = "neutral"

        # Determine complexity based on sentence length and vocabulary
        avg_sentence_length = word_count / max(len([s for s in main_content.split('.') if s.strip()]), 1)
        if avg_sentence_length > 20:
            complexity = "complex"
        elif avg_sentence_length > 15:
            complexity = "moderate"
        else:
            complexity = "simple"

        return {
            "writing_style": {
                "tone": tone,
                "voice": "active",
                "complexity": complexity,
                "engagement_level": "medium"
            },
            "content_characteristics": {
                "sentence_structure": "standard",
                "vocabulary_level": "intermediate",
                "paragraph_organization": "logical",
                "content_flow": "smooth"
            },
            "target_audience": {
                "demographics": ["general audience"],
                "expertise_level": "intermediate",
                "industry_focus": "general",
                "geographic_focus": "global"
            },
            "content_type": {
                "primary_type": "article",
                "secondary_types": ["blog", "content"],
                "purpose": "inform",
                "call_to_action": "minimal"
            },
            "recommended_settings": {
                "writing_tone": tone,
                "target_audience": "general audience",
                "content_type": "article",
                "creativity_level": "medium",
                "geographic_location": "global"
            }
        }
def analyze_style_patterns(self, content: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze recurring patterns in the content style.
Args:
content (Dict): Content to analyze
Returns:
Dict: Pattern analysis results
"""
try:
logger.info("[StyleDetectionLogic.analyze_style_patterns] Starting pattern analysis")
main_content = content.get("main_content", "")
prompt = f"""Analyze the following content for recurring writing patterns and style characteristics.
Focus on identifying patterns in sentence structure, vocabulary usage, and writing techniques.
Content: {main_content[:3000]}
IMPORTANT: Respond ONLY with a JSON object in the following format:
{{
"patterns": {{
"sentence_length": "short/medium/long",
"vocabulary_patterns": ["list of patterns"],
"rhetorical_devices": ["list of devices used"],
"paragraph_structure": "description",
"transition_phrases": ["list of common transitions"]
}},
"style_consistency": "high/medium/low",
"unique_elements": ["list of unique style elements"]
}}
"""
analysis_text = llm_text_gen(prompt)
cleaned_json = self._clean_json_response(analysis_text)
try:
pattern_results = json.loads(cleaned_json)
return {
'success': True,
'patterns': pattern_results
}
except json.JSONDecodeError as e:
logger.error(f"[StyleDetectionLogic.analyze_style_patterns] Failed to parse JSON response: {e}")
return {
'success': False,
'error': 'Failed to parse pattern analysis response'
}
except Exception as e:
logger.error(f"[StyleDetectionLogic.analyze_style_patterns] Error during analysis: {str(e)}")
return {
'success': False,
'error': str(e)
}
def generate_style_guidelines(self, analysis_results: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate comprehensive content guidelines based on enhanced style analysis.
Args:
analysis_results (Dict): Results from enhanced style analysis
Returns:
Dict: Generated comprehensive guidelines
"""
try:
logger.info("[StyleDetectionLogic.generate_style_guidelines] Generating comprehensive style guidelines")
# Extract key information from analysis
writing_style = analysis_results.get('writing_style', {})
content_characteristics = analysis_results.get('content_characteristics', {})
target_audience = analysis_results.get('target_audience', {})
brand_analysis = analysis_results.get('brand_analysis', {})
content_strategy_insights = analysis_results.get('content_strategy_insights', {})
prompt = f"""Based on the following comprehensive style analysis, generate detailed content creation guidelines for AI-powered content generation.
ANALYSIS DATA:
Writing Style: {writing_style}
Content Characteristics: {content_characteristics}
Target Audience: {target_audience}
Brand Analysis: {brand_analysis}
Content Strategy Insights: {content_strategy_insights}
REQUIREMENTS:
1. Create actionable guidelines for AI content generation
2. Provide specific recommendations for maintaining brand voice
3. Include strategies for audience engagement
4. Address content gaps and opportunities
5. Consider competitive positioning
6. Provide technical writing recommendations
7. Include SEO and conversion optimization tips
8. Address content structure and formatting
IMPORTANT: Respond ONLY with a JSON object in the following format:
{{
"guidelines": {{
"tone_recommendations": [
"specific tone guidelines with examples",
"brand voice consistency tips",
"emotional appeal strategies"
],
"structure_guidelines": [
"content structure recommendations",
"formatting best practices",
"organization strategies"
],
"vocabulary_suggestions": [
"specific vocabulary recommendations",
"industry terminology guidance",
"language complexity advice"
],
"engagement_tips": [
"audience engagement strategies",
"interaction techniques",
"conversion optimization tips"
],
"audience_considerations": [
"specific audience targeting advice",
"pain point addressing strategies",
"motivation-based content tips"
],
"brand_alignment": [
"brand voice consistency guidelines",
"brand value integration tips",
"competitive differentiation strategies"
],
"seo_optimization": [
"keyword integration strategies",
"content optimization tips",
"search visibility recommendations"
],
"conversion_optimization": [
"call-to-action strategies",
"conversion funnel optimization",
"lead generation techniques"
]
}},
"best_practices": [
"comprehensive best practices list",
"industry-specific recommendations",
"quality assurance guidelines"
],
"avoid_elements": [
"elements to avoid with explanations",
"common pitfalls to prevent",
"brand-inappropriate content types"
],
"content_strategy": "comprehensive content strategy recommendation with specific action items",
"ai_generation_tips": [
"specific tips for AI content generation",
"prompt optimization strategies",
"quality control measures"
],
"competitive_advantages": [
"identified competitive advantages",
"differentiation strategies",
"market positioning recommendations"
],
"content_calendar_suggestions": [
"content frequency recommendations",
"topic planning strategies",
"seasonal content opportunities"
]
}}
"""
guidelines_text = llm_text_gen(prompt)
cleaned_json = self._clean_json_response(guidelines_text)
try:
guidelines = json.loads(cleaned_json)
return {
'success': True,
'guidelines': guidelines
}
except json.JSONDecodeError as e:
logger.error(f"[StyleDetectionLogic.generate_style_guidelines] Failed to parse JSON response: {e}")
return {
'success': False,
'error': 'Failed to parse guidelines response'
}
except Exception as e:
logger.error(f"[StyleDetectionLogic.generate_style_guidelines] Error generating guidelines: {str(e)}")
return {
'success': False,
'error': str(e)
}
def validate_style_analysis_request(self, request_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate style analysis request data.
Args:
request_data (Dict): Request data to validate
Returns:
Dict: Validation results
"""
errors = []
# Check that at least one input source is provided
if not request_data.get('content') and not request_data.get('url') and not request_data.get('text_sample'):
errors.append("Content is required for style analysis")
# The checks below apply only when a content dict is supplied, so that
# URL-only or text-sample-only requests are not rejected for lacking 'content'.
content = request_data.get('content') or {}
main_content = content.get('main_content') or ''
# Check content length
if content and len(main_content) < 50:
errors.append("Content must be at least 50 characters long for meaningful analysis")
# Check for required fields
if content and not content.get('title') and not main_content:
errors.append("Either title or main content must be provided")
return {
'valid': len(errors) == 0,
'errors': errors
}
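
Both `analyze_style_patterns` and `generate_style_guidelines` follow the same steps: call the LLM, clean the reply, parse it with `json.loads`, and wrap the outcome in a `success`/`error` dict. The sketch below isolates that pattern. The names `strip_code_fences` and `parse_llm_json` are hypothetical and are not part of this module; the real `_clean_json_response` is defined elsewhere in the file and may behave differently.

```python
import json
from typing import Any, Dict

_FENCE = "`" * 3  # Markdown code-fence marker that LLM replies often wrap JSON in


def strip_code_fences(raw: str) -> str:
    """Remove a leading fence line (with optional language tag) and a trailing fence."""
    text = raw.strip()
    if text.startswith(_FENCE):
        # Drop the whole opening fence line, e.g. the marker followed by "json".
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.rstrip().endswith(_FENCE):
        text = text.rstrip()[: -len(_FENCE)]
    return text.strip()


def parse_llm_json(raw: str, result_key: str) -> Dict[str, Any]:
    """Parse an LLM reply as JSON and wrap it in a success/error dict (hypothetical helper)."""
    try:
        return {"success": True, result_key: json.loads(strip_code_fences(raw))}
    except json.JSONDecodeError:
        return {"success": False, "error": "Failed to parse JSON response"}
```

Under these assumptions, each method body could reduce to building the prompt and returning `parse_llm_json(llm_text_gen(prompt), 'patterns')` or the `'guidelines'` equivalent, which would keep the two error paths identical.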


@@ -0,0 +1,584 @@
"""Web Crawler Logic Service for ALwrity Backend.
This service handles business logic for web crawling and content extraction,
migrated from the legacy web crawler functionality.
"""
from typing import Dict, Any, List, Optional
from loguru import logger
from datetime import datetime
import asyncio
import aiohttp
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import requests
import re
class WebCrawlerLogic:
"""Business logic for web crawling and content extraction."""
def __init__(self):
"""Initialize the Web Crawler Logic service."""
logger.info("[WebCrawlerLogic.__init__] Initializing web crawler service")
self.headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
self.timeout = 30
self.max_content_length = 10000
def _validate_url(self, url: str) -> bool:
"""
Validate URL format and fix common formatting issues.
Args:
url (str): URL to validate
Returns:
bool: True if URL is valid
"""
try:
# Clean and fix common URL issues
cleaned_url = self._fix_url_format(url)
result = urlparse(cleaned_url)
# Check if we have both scheme and netloc
if not all([result.scheme, result.netloc]):
return False
# Additional validation for domain format
domain = result.netloc
if '.' not in domain or len(domain.split('.')[-1]) < 2:
return False
return True
except Exception as e:
logger.error(f"[WebCrawlerLogic._validate_url] URL validation error: {str(e)}")
return False
def _fix_url_format(self, url: str) -> str:
"""
Fix common URL formatting issues.
Args:
url (str): URL to fix
Returns:
str: Fixed URL
"""
# Remove leading/trailing whitespace
url = url.strip()
# Check if URL already has a protocol but is missing slashes
if url.startswith('https:/') and not url.startswith('https://'):
url = url.replace('https:/', 'https://')
elif url.startswith('http:/') and not url.startswith('http://'):
url = url.replace('http:/', 'http://')
# Add protocol if missing
if not url.startswith(('http://', 'https://')):
url = 'https://' + url
# Collapse extra slashes after the protocol (e.g. "https:///example.com")
url = re.sub(r'^(https?:)/{3,}', r'\1//', url)
logger.debug(f"[WebCrawlerLogic._fix_url_format] Fixed URL: {url}")
return url
async def crawl_website(self, url: str) -> Dict[str, Any]:
"""
Crawl a website and extract its content asynchronously with enhanced data extraction.
Args:
url (str): The URL to crawl
Returns:
Dict: Extracted website content and metadata
"""
try:
logger.info(f"[WebCrawlerLogic.crawl_website] Starting enhanced crawl for URL: {url}")
# Fix URL format first
fixed_url = self._fix_url_format(url)
logger.info(f"[WebCrawlerLogic.crawl_website] Fixed URL: {fixed_url}")
# Validate URL
if not self._validate_url(fixed_url):
error_msg = f"Invalid URL format: {url}"
logger.error(f"[WebCrawlerLogic.crawl_website] {error_msg}")
return {
'success': False,
'error': error_msg
}
# Fetch the page content
try:
async with aiohttp.ClientSession(headers=self.headers, timeout=aiohttp.ClientTimeout(total=self.timeout)) as session:
async with session.get(fixed_url) as response:
if response.status == 200:
html_content = await response.text()
logger.debug("[WebCrawlerLogic.crawl_website] Successfully fetched HTML content")
else:
error_msg = f"Failed to fetch content: Status code {response.status}"
logger.error(f"[WebCrawlerLogic.crawl_website] {error_msg}")
return {
'success': False,
'error': error_msg
}
except Exception as e:
error_msg = f"Failed to fetch content from {fixed_url}: {str(e)}"
logger.error(f"[WebCrawlerLogic.crawl_website] {error_msg}")
return {
'success': False,
'error': error_msg
}
# Parse HTML with BeautifulSoup
logger.debug("[WebCrawlerLogic.crawl_website] Parsing HTML content")
soup = BeautifulSoup(html_content, 'html.parser')
# Extract domain information
domain_info = self._extract_domain_info(fixed_url, soup)
# Extract enhanced main content
main_content = self._extract_enhanced_content(soup)
# Extract social media and brand information
social_media = self._extract_social_media(soup)
brand_info = self._extract_brand_information(soup)
# Extract content structure and patterns
content_structure = self._extract_content_structure(soup)
# Extract content
content = {
'title': soup.title.string.strip() if soup.title and soup.title.string else '',
'description': soup.find('meta', {'name': 'description'}).get('content', '').strip() if soup.find('meta', {'name': 'description'}) else '',
'main_content': main_content,
'headings': [h.get_text(strip=True) for h in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])],
'links': [{'text': a.get_text(strip=True), 'href': urljoin(fixed_url, a.get('href', ''))} for a in soup.find_all('a', href=True)],
'images': [{'alt': img.get('alt', '').strip(), 'src': urljoin(fixed_url, img.get('src', ''))} for img in soup.find_all('img', src=True)],
'meta_tags': {
meta.get('name', meta.get('property', '')): meta.get('content', '').strip()
for meta in soup.find_all('meta')
if (meta.get('name') or meta.get('property')) and meta.get('content')
},
'domain_info': domain_info,
'social_media': social_media,
'brand_info': brand_info,
'content_structure': content_structure
}
logger.debug(f"[WebCrawlerLogic.crawl_website] Extracted {len(content['links'])} links, {len(content['images'])} images, and {len(social_media)} social media links")
logger.info("[WebCrawlerLogic.crawl_website] Successfully completed enhanced website crawl")
return {
'success': True,
'content': content,
'url': fixed_url,
'timestamp': datetime.now().isoformat()
}
except Exception as e:
error_msg = f"Error crawling {url}: {str(e)}"
logger.error(f"[WebCrawlerLogic.crawl_website] {error_msg}")
return {
'success': False,
'error': error_msg
}
def _extract_domain_info(self, url: str, soup: BeautifulSoup) -> Dict[str, Any]:
"""Extract domain-specific information."""
try:
domain = urlparse(url).netloc
return {
'domain': domain,
'domain_name': domain.replace('www.', ''),
'is_blog': any(keyword in domain.lower() for keyword in ['blog', 'medium', 'substack', 'wordpress']),
'is_ecommerce': any(keyword in domain.lower() for keyword in ['shop', 'store', 'cart', 'buy', 'amazon', 'ebay']),
'is_corporate': any(keyword in domain.lower() for keyword in ['corp', 'inc', 'llc', 'company', 'business']),
'has_blog_section': bool(soup.find('a', href=re.compile(r'blog|news|articles', re.I))),
'has_about_page': bool(soup.find('a', href=re.compile(r'about|company|team', re.I))),
'has_contact_page': bool(soup.find('a', href=re.compile(r'contact|support|help', re.I)))
}
except Exception as e:
logger.error(f"[WebCrawlerLogic._extract_domain_info] Error: {str(e)}")
return {}
def _extract_enhanced_content(self, soup: BeautifulSoup) -> str:
"""Extract enhanced main content with better structure detection."""
try:
# Try to find main content areas
main_content_elements = []
# Look for semantic content containers
semantic_selectors = [
'article', 'main', '[role="main"]',
'.content', '.main-content', '.article', '.post',
'.entry', '.page-content', '.site-content'
]
for selector in semantic_selectors:
elements = soup.select(selector)
if elements:
main_content_elements.extend(elements)
break
# If no semantic containers found, look for content-rich divs
if not main_content_elements:
content_divs = soup.find_all('div', class_=re.compile(r'content|main|article|post|entry', re.I))
main_content_elements = content_divs
# If still no content, get all paragraph text
if not main_content_elements:
main_content_elements = soup.find_all(['p', 'article', 'section'])
# Extract text with better formatting
content_parts = []
for elem in main_content_elements:
text = elem.get_text(separator=' ', strip=True)
if text and len(text) > 20: # Only include substantial text
content_parts.append(text)
main_content = ' '.join(content_parts)
# Limit content length
if len(main_content) > self.max_content_length:
main_content = main_content[:self.max_content_length] + "..."
return main_content
except Exception as e:
logger.error(f"[WebCrawlerLogic._extract_enhanced_content] Error: {str(e)}")
return ''
def _extract_social_media(self, soup: BeautifulSoup) -> Dict[str, str]:
"""Extract social media links and handles."""
social_media = {}
try:
# Common social media patterns
social_patterns = {
'facebook': r'facebook\.com|fb\.com',
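# The lookbehind keeps "x.com" from matching inside other domains such as "netflix.com"
'twitter': r'twitter\.com|(?<![a-z0-9-])x\.com',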
'linkedin': r'linkedin\.com',
'instagram': r'instagram\.com',
'youtube': r'youtube\.com|youtu\.be',
'tiktok': r'tiktok\.com',
'pinterest': r'pinterest\.com',
'github': r'github\.com'
}
# Find all links
links = soup.find_all('a', href=True)
for link in links:
href = link.get('href', '')
for platform, pattern in social_patterns.items():
# Match case-insensitively but store the href unchanged (URL paths can be case-sensitive)
if re.search(pattern, href, re.I):
social_media[platform] = href
break
# Also check for social media meta tags
meta_social = {
'og:site_name': 'site_name',
'twitter:site': 'twitter',
'twitter:creator': 'twitter_creator'
}
# Open Graph tags use the "property" attribute; Twitter card tags usually use "name"
for meta in soup.find_all('meta'):
prop = meta.get('property') or meta.get('name') or ''
if prop in meta_social:
social_media[meta_social[prop]] = meta.get('content', '')
return social_media
except Exception as e:
logger.error(f"[WebCrawlerLogic._extract_social_media] Error: {str(e)}")
return {}
def _extract_brand_information(self, soup: BeautifulSoup) -> Dict[str, Any]:
"""Extract brand and company information."""
brand_info = {}
try:
# Extract logo information
logos = soup.find_all('img', alt=re.compile(r'logo|brand', re.I))
if logos:
brand_info['logo_alt'] = [logo.get('alt', '') for logo in logos]
# Extract company name from various sources
company_name_selectors = [
'h1', '.logo', '.brand', '.company-name',
'[class*="logo"]', '[class*="brand"]'
]
for selector in company_name_selectors:
elements = soup.select(selector)
if elements:
brand_info['company_name'] = elements[0].get_text(strip=True)
break
# Extract taglines and slogans
tagline_selectors = [
'.tagline', '.slogan', '.motto',
'[class*="tagline"]', '[class*="slogan"]'
]
for selector in tagline_selectors:
elements = soup.select(selector)
if elements:
brand_info['tagline'] = elements[0].get_text(strip=True)
break
# Extract contact information
contact_info = {}
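contact_patterns = {
'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
# Require 7-15 digits so that arbitrary short numbers are not reported as phone numbers
'phone': r'\+?[1-9]\d{6,14}',
'address': r'\d+\s+[a-zA-Z\s]+(?:street|st|avenue|ave|road|rd|boulevard|blvd)'
}
for info_type, pattern in contact_patterns.items():
# Case-insensitive so that "Street", "Ave", etc. are matched
matches = re.findall(pattern, soup.get_text(), re.I)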
if matches:
contact_info[info_type] = matches[:3] # Limit to first 3 matches
brand_info['contact_info'] = contact_info
return brand_info
except Exception as e:
logger.error(f"[WebCrawlerLogic._extract_brand_information] Error: {str(e)}")
return {}
def _extract_content_structure(self, soup: BeautifulSoup) -> Dict[str, Any]:
"""Extract content structure and patterns."""
structure = {}
try:
# Count different content types
structure['headings'] = {
'h1': len(soup.find_all('h1')),
'h2': len(soup.find_all('h2')),
'h3': len(soup.find_all('h3')),
'h4': len(soup.find_all('h4')),
'h5': len(soup.find_all('h5')),
'h6': len(soup.find_all('h6'))
}
structure['paragraphs'] = len(soup.find_all('p'))
structure['lists'] = len(soup.find_all(['ul', 'ol']))
structure['images'] = len(soup.find_all('img'))
structure['links'] = len(soup.find_all('a'))
# Analyze content sections
sections = soup.find_all(['section', 'article', 'div'], class_=re.compile(r'section|article|content', re.I))
structure['content_sections'] = len(sections)
# Check for common content patterns
structure['has_navigation'] = bool(soup.find(['nav', 'header']))
structure['has_footer'] = bool(soup.find('footer'))
structure['has_sidebar'] = bool(soup.find(class_=re.compile(r'sidebar|aside', re.I)))
structure['has_call_to_action'] = bool(soup.find(string=re.compile(r'click|buy|sign|register|subscribe', re.I)))
return structure
except Exception as e:
logger.error(f"[WebCrawlerLogic._extract_content_structure] Error: {str(e)}")
return {}
def extract_content_from_text(self, text: str) -> Dict[str, Any]:
"""
Extract content from provided text sample.
Args:
text (str): Text content to process
Returns:
Dict: Processed content with metadata
"""
try:
logger.info("[WebCrawlerLogic.extract_content_from_text] Processing text content")
# Clean and process text
cleaned_text = re.sub(r'\s+', ' ', text.strip())
# Split into sentences for analysis
sentences = [s.strip() for s in cleaned_text.split('.') if s.strip()]
# Extract basic metrics
words = cleaned_text.split()
word_count = len(words)
sentence_count = len(sentences)
avg_sentence_length = word_count / max(sentence_count, 1)
content = {
'title': 'Text Sample',
'description': 'Content provided as text sample',
'main_content': cleaned_text,
'headings': [],
'links': [],
'images': [],
'meta_tags': {},
'metrics': {
'word_count': word_count,
'sentence_count': sentence_count,
'avg_sentence_length': avg_sentence_length,
'unique_words': len(set(words)),
'content_length': len(cleaned_text)
}
}
logger.info("[WebCrawlerLogic.extract_content_from_text] Successfully processed text content")
return {
'success': True,
'content': content,
'timestamp': datetime.now().isoformat()
}
except Exception as e:
error_msg = f"Error processing text content: {str(e)}"
logger.error(f"[WebCrawlerLogic.extract_content_from_text] {error_msg}")
return {
'success': False,
'error': error_msg
}
def validate_crawl_request(self, request_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate web crawl request data.
Args:
request_data (Dict): Request data to validate
Returns:
Dict: Validation results
"""
try:
logger.info("[WebCrawlerLogic.validate_crawl_request] Validating request")
errors = []
# Check for required fields
url = request_data.get('url', '')
text_sample = request_data.get('text_sample', '')
if not url and not text_sample:
errors.append("Either URL or text sample is required")
if url and not self._validate_url(url):
errors.append("Invalid URL format")
if text_sample and len(text_sample) < 50:
errors.append("Text sample must be at least 50 characters")
if text_sample and len(text_sample) > 10000:
errors.append("Text sample is too long (max 10,000 characters)")
if errors:
return {
'valid': False,
'errors': errors
}
logger.info("[WebCrawlerLogic.validate_crawl_request] Request validation successful")
return {
'valid': True,
'url': url,
'text_sample': text_sample
}
except Exception as e:
logger.error(f"[WebCrawlerLogic.validate_crawl_request] Validation error: {str(e)}")
return {
'valid': False,
'errors': [f"Validation error: {str(e)}"]
}
def get_crawl_metrics(self, content: Dict[str, Any]) -> Dict[str, Any]:
"""
Calculate metrics for crawled content.
Args:
content (Dict): Content to analyze
Returns:
Dict: Content metrics
"""
try:
logger.info("[WebCrawlerLogic.get_crawl_metrics] Calculating content metrics")
main_content = content.get('main_content', '')
title = content.get('title', '')
description = content.get('description', '')
headings = content.get('headings', [])
links = content.get('links', [])
images = content.get('images', [])
# Calculate metrics
words = main_content.split()
sentences = [s.strip() for s in main_content.split('.') if s.strip()]
metrics = {
'word_count': len(words),
'sentence_count': len(sentences),
'avg_sentence_length': len(words) / max(len(sentences), 1),
'unique_words': len(set(words)),
'content_length': len(main_content),
'title_length': len(title),
'description_length': len(description),
'heading_count': len(headings),
'link_count': len(links),
'image_count': len(images),
'readability_score': self._calculate_readability(main_content),
'content_density': len(set(words)) / max(len(words), 1)
}
logger.info("[WebCrawlerLogic.get_crawl_metrics] Metrics calculated successfully")
return {
'success': True,
'metrics': metrics
}
except Exception as e:
logger.error(f"[WebCrawlerLogic.get_crawl_metrics] Error calculating metrics: {str(e)}")
return {
'success': False,
'error': str(e)
}
def _calculate_readability(self, text: str) -> float:
"""
Calculate a simple readability score.
Args:
text (str): Text to analyze
Returns:
float: Readability score (0-1)
"""
try:
if not text:
return 0.0
words = text.split()
sentences = [s.strip() for s in text.split('.') if s.strip()]
if not sentences:
return 0.0
# Heuristic only: penalizes long sentences and long words.
# This is not the Flesch Reading Ease formula, which also uses syllable counts.
avg_sentence_length = len(words) / len(sentences)
avg_word_length = sum(len(word) for word in words) / len(words)
# Normalize to 0-1 scale
readability = max(0, min(1, (100 - avg_sentence_length - avg_word_length) / 100))
return round(readability, 2)
except Exception as e:
logger.error(f"[WebCrawlerLogic._calculate_readability] Error: {str(e)}")
return 0.5
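
The heuristic in `_calculate_readability` combines average sentence length and average word length. For comparison, the standard Flesch Reading Ease formula can be sketched as below. This is a minimal illustration, not the module's method: the function names are hypothetical, and the syllable counter is a rough vowel-group estimate rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Split on ., ! and ? so that questions and exclamations count as sentences.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def normalized_readability(text: str) -> float:
    """Clamp the Flesch score to 0-100 and rescale to 0-1, the range the module reports."""
    return round(max(0.0, min(100.0, flesch_reading_ease(text))) / 100.0, 2)
```

Clamping before rescaling keeps the result in the same 0-1 range as the existing `readability_score` metric, so a swap would not change the shape of `get_crawl_metrics` output.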


@@ -0,0 +1,836 @@
"""
AI Engine Service
Provides AI-powered insights and analysis for content planning.
"""
from typing import Dict, Any, List, Optional
from sqlalchemy.orm import Session
from loguru import logger
from datetime import datetime
import asyncio
import json
from collections import Counter, defaultdict
# Import AI providers
from llm_providers.main_text_generation import llm_text_gen
from llm_providers.gemini_provider import gemini_structured_json_response
# Import services
from services.ai_service_manager import AIServiceManager
# Import existing modules (will be updated to use FastAPI services)
from services.database import get_db_session
class AIEngineService:
"""AI engine for content planning insights and analysis."""
def __init__(self):
"""Initialize the AI engine service."""
self.ai_service_manager = AIServiceManager()
logger.info("AIEngineService initialized")
async def analyze_content_gaps(self, analysis_summary: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze content gaps using AI insights.
Args:
analysis_summary: Summary of content analysis
Returns:
AI-powered content gap insights
"""
try:
logger.info("🤖 Generating AI-powered content gap insights using centralized AI service")
# Use the centralized AI service manager for strategic analysis
result = await self.ai_service_manager.generate_content_gap_analysis(analysis_summary)
logger.info("✅ Advanced AI content gap analysis completed")
return result
except Exception as e:
logger.error(f"Error in AI content gap analysis: {str(e)}")
# Return fallback response if AI fails
return {
'strategic_insights': [
{
'type': 'content_strategy',
'insight': 'Focus on educational content to build authority',
'confidence': 0.85,
'priority': 'high',
'estimated_impact': 'Authority building'
}
],
'content_recommendations': [
{
'type': 'content_creation',
'recommendation': 'Create comprehensive guides for high-opportunity keywords',
'priority': 'high',
'estimated_traffic': '5K+ monthly',
'implementation_time': '2-3 weeks'
}
],
'performance_predictions': {
'estimated_traffic_increase': '25%',
'estimated_ranking_improvement': '15 positions',
'estimated_engagement_increase': '30%',
'estimated_conversion_increase': '20%',
'confidence_level': '85%'
},
'risk_assessment': {
'content_quality_risk': 'Low',
'competition_risk': 'Medium',
'implementation_risk': 'Low',
'timeline_risk': 'Medium',
'overall_risk': 'Low'
}
}
async def analyze_market_position(self, market_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze market position using AI insights.
Args:
market_data: Market analysis data
Returns:
AI-powered market position analysis
"""
try:
logger.info("🤖 Generating AI-powered market position analysis using centralized AI service")
# Use the centralized AI service manager for market position analysis
result = await self.ai_service_manager.generate_market_position_analysis(market_data)
logger.info("✅ Advanced AI market position analysis completed")
return result
except Exception as e:
logger.error(f"Error in AI market position analysis: {str(e)}")
# Return fallback response if AI fails
return {
'market_leader': 'competitor1.com',
'content_leader': 'competitor2.com',
'quality_leader': 'competitor3.com',
'market_gaps': [
'Video content',
'Interactive content',
'User-generated content',
'Expert interviews',
'Industry reports'
],
'opportunities': [
'Niche content development',
'Expert interviews',
'Industry reports',
'Case studies',
'Tutorial series'
],
'competitive_advantages': [
'Technical expertise',
'Comprehensive guides',
'Industry insights',
'Expert opinions'
],
'strategic_recommendations': [
{
'type': 'differentiation',
'recommendation': 'Focus on unique content angles',
'priority': 'high',
'estimated_impact': 'Brand differentiation'
},
{
'type': 'quality',
'recommendation': 'Improve content quality and depth',
'priority': 'high',
'estimated_impact': 'Authority building'
},
{
'type': 'innovation',
'recommendation': 'Develop innovative content formats',
'priority': 'medium',
'estimated_impact': 'Engagement improvement'
}
]
}
async def generate_content_recommendations(self, analysis_data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
Generate AI-powered content recommendations.
Args:
analysis_data: Content analysis data
Returns:
List of AI-generated content recommendations
"""
try:
logger.info("🤖 Generating AI-powered content recommendations")
# Create comprehensive prompt for content recommendations
prompt = f"""
Generate content recommendations based on the following analysis data:
Analysis Data: {json.dumps(analysis_data, indent=2, default=str)}
Provide detailed content recommendations including:
1. Content creation opportunities
2. Content optimization suggestions
3. Content series development
4. Content format recommendations
5. Implementation priorities
6. Estimated impact and timeline
Format as structured JSON with detailed recommendations.
"""
# Use structured JSON response for better parsing
response = gemini_structured_json_response(
prompt=prompt,
schema={
"type": "object",
"properties": {
"recommendations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"title": {"type": "string"},
"description": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"},
"ai_confidence": {"type": "number"},
"content_suggestions": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
}
)
# Parse and return the AI response
result = json.loads(response)
recommendations = result.get('recommendations', [])
logger.info(f"✅ Generated {len(recommendations)} AI content recommendations")
return recommendations
except Exception as e:
logger.error(f"Error generating AI content recommendations: {str(e)}")
# Return fallback response if AI fails
return [
{
'type': 'content_creation',
'title': 'Create comprehensive guide for target keyword',
'description': 'Develop in-depth guide covering all aspects of the topic',
'priority': 'high',
'estimated_impact': '5K+ monthly traffic',
'implementation_time': '2-3 weeks',
'ai_confidence': 0.92,
'content_suggestions': [
'Step-by-step tutorial',
'Best practices section',
'Common mistakes to avoid',
'Expert tips and insights'
]
},
{
'type': 'content_optimization',
'title': 'Optimize existing content for target keywords',
'description': 'Update current content to improve rankings',
'priority': 'medium',
'estimated_impact': '2K+ monthly traffic',
'implementation_time': '1-2 weeks',
'ai_confidence': 0.88,
'content_suggestions': [
'Add target keywords naturally',
'Improve meta descriptions',
'Enhance internal linking',
'Update outdated information'
]
},
{
'type': 'content_series',
'title': 'Develop content series around main topic',
'description': 'Create interconnected content pieces',
'priority': 'medium',
'estimated_impact': '3K+ monthly traffic',
'implementation_time': '4-6 weeks',
'ai_confidence': 0.85,
'content_suggestions': [
'Part 1: Introduction and basics',
'Part 2: Advanced techniques',
'Part 3: Expert-level insights',
'Part 4: Case studies and examples'
]
}
]
async def predict_content_performance(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Predict content performance using AI.
Args:
content_data: Content analysis data
Returns:
AI-powered performance predictions
"""
try:
logger.info("🤖 Generating AI-powered performance predictions")
# Create comprehensive prompt for performance prediction
prompt = f"""
Predict content performance based on the following data:
Content Data: {json.dumps(content_data, indent=2, default=str)}
Provide detailed performance predictions including:
1. Traffic predictions
2. Engagement predictions
3. Ranking predictions
4. Conversion predictions
5. Risk factors
6. Success factors
Format as structured JSON with confidence levels.
"""
# Use structured JSON response for better parsing
response = gemini_structured_json_response(
prompt=prompt,
schema={
"type": "object",
"properties": {
"traffic_predictions": {
"type": "object",
"properties": {
"estimated_monthly_traffic": {"type": "string"},
"traffic_growth_rate": {"type": "string"},
"peak_traffic_month": {"type": "string"},
"confidence_level": {"type": "string"}
}
},
"engagement_predictions": {
"type": "object",
"properties": {
"estimated_time_on_page": {"type": "string"},
"estimated_bounce_rate": {"type": "string"},
"estimated_social_shares": {"type": "string"},
"estimated_comments": {"type": "string"},
"confidence_level": {"type": "string"}
}
},
"ranking_predictions": {
"type": "object",
"properties": {
"estimated_ranking_position": {"type": "string"},
"estimated_ranking_time": {"type": "string"},
"ranking_confidence": {"type": "string"},
"competition_level": {"type": "string"}
}
},
"conversion_predictions": {
"type": "object",
"properties": {
"estimated_conversion_rate": {"type": "string"},
"estimated_lead_generation": {"type": "string"},
"estimated_revenue_impact": {"type": "string"},
"confidence_level": {"type": "string"}
}
},
"risk_factors": {
"type": "array",
"items": {"type": "string"}
},
"success_factors": {
"type": "array",
"items": {"type": "string"}
}
}
}
)
# Parse and return the AI response
predictions = json.loads(response)
logger.info("✅ AI performance predictions completed")
return predictions
except Exception as e:
logger.error(f"Error in AI performance prediction: {str(e)}")
# Return fallback response if AI fails
return {
'traffic_predictions': {
'estimated_monthly_traffic': '5K+',
'traffic_growth_rate': '25%',
'peak_traffic_month': 'Q4',
'confidence_level': '85%'
},
'engagement_predictions': {
'estimated_time_on_page': '3-5 minutes',
'estimated_bounce_rate': '35%',
'estimated_social_shares': '50+',
'estimated_comments': '15+',
'confidence_level': '80%'
},
'ranking_predictions': {
'estimated_ranking_position': 'Top 10',
'estimated_ranking_time': '2-3 months',
'ranking_confidence': '75%',
'competition_level': 'Medium'
},
'conversion_predictions': {
'estimated_conversion_rate': '3-5%',
'estimated_lead_generation': '100+ monthly',
'estimated_revenue_impact': '$10K+ monthly',
'confidence_level': '70%'
},
'risk_factors': [
'High competition for target keywords',
'Seasonal content performance variations',
'Content quality requirements',
'Implementation timeline constraints'
],
'success_factors': [
'Comprehensive content coverage',
'Expert-level insights',
'Engaging content format',
'Strong internal linking',
'Regular content updates'
]
}
async def analyze_competitive_intelligence(self, competitor_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze competitive intelligence using AI.
Args:
competitor_data: Competitor analysis data
Returns:
AI-powered competitive intelligence
"""
try:
logger.info("🤖 Generating AI-powered competitive intelligence")
# Create comprehensive prompt for competitive intelligence
prompt = f"""
Analyze competitive intelligence based on the following competitor data:
Competitor Data: {json.dumps(competitor_data, indent=2, default=str)}
Provide comprehensive competitive intelligence including:
1. Market analysis
2. Content strategy insights
3. Competitive advantages
4. Threat analysis
5. Opportunity analysis
Format as structured JSON with detailed analysis.
"""
# Use structured JSON response for better parsing
response = gemini_structured_json_response(
prompt=prompt,
schema={
"type": "object",
"properties": {
"market_analysis": {
"type": "object",
"properties": {
"market_leader": {"type": "string"},
"content_leader": {"type": "string"},
"innovation_leader": {"type": "string"},
"market_gaps": {
"type": "array",
"items": {"type": "string"}
}
}
},
"content_strategy_insights": {
"type": "array",
"items": {
"type": "object",
"properties": {
"insight": {"type": "string"},
"opportunity": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"}
}
}
},
"competitive_advantages": {
"type": "array",
"items": {"type": "string"}
},
"threat_analysis": {
"type": "array",
"items": {
"type": "object",
"properties": {
"threat": {"type": "string"},
"risk_level": {"type": "string"},
"mitigation": {"type": "string"}
}
}
},
"opportunity_analysis": {
"type": "array",
"items": {
"type": "object",
"properties": {
"opportunity": {"type": "string"},
"market_gap": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"}
}
}
}
}
}
)
# Parse and return the AI response
competitive_intelligence = json.loads(response)
logger.info("✅ AI competitive intelligence completed")
return competitive_intelligence
except Exception as e:
logger.error(f"Error in AI competitive intelligence: {str(e)}")
# Return fallback response if AI fails
return {
'market_analysis': {
'market_leader': 'competitor1.com',
'content_leader': 'competitor2.com',
'innovation_leader': 'competitor3.com',
'market_gaps': [
'Video tutorials',
'Interactive content',
'Expert interviews',
'Industry reports'
]
},
'content_strategy_insights': [
{
'insight': 'Competitors focus heavily on educational content',
'opportunity': 'Develop unique content angles',
'priority': 'high',
'estimated_impact': 'Differentiation'
},
{
'insight': 'Limited video content in the market',
'opportunity': 'Create video tutorials and guides',
'priority': 'medium',
'estimated_impact': 'Engagement improvement'
},
{
'insight': 'High demand for expert insights',
'opportunity': 'Develop expert interview series',
'priority': 'high',
'estimated_impact': 'Authority building'
}
],
'competitive_advantages': [
'Technical expertise',
'Comprehensive content coverage',
'Industry insights',
'Expert opinions',
'Practical examples'
],
'threat_analysis': [
{
'threat': 'Competitor content quality improvement',
'risk_level': 'Medium',
'mitigation': 'Focus on unique value propositions'
},
{
'threat': 'New competitors entering market',
'risk_level': 'Low',
'mitigation': 'Build strong brand authority'
},
{
'threat': 'Content saturation in key topics',
'risk_level': 'High',
'mitigation': 'Develop niche content areas'
}
],
'opportunity_analysis': [
{
'opportunity': 'Video content development',
'market_gap': 'Limited video tutorials',
'estimated_impact': 'High engagement',
'implementation_time': '3-6 months'
},
{
'opportunity': 'Expert interview series',
'market_gap': 'Lack of expert insights',
'estimated_impact': 'Authority building',
'implementation_time': '2-4 months'
},
{
'opportunity': 'Interactive content',
'market_gap': 'No interactive elements',
'estimated_impact': 'User engagement',
'implementation_time': '1-3 months'
}
]
}
async def generate_strategic_insights(self, analysis_data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
Generate strategic insights using AI.
Args:
analysis_data: Analysis data
Returns:
List of AI-generated strategic insights
"""
try:
logger.info("🤖 Generating AI-powered strategic insights")
# Create comprehensive prompt for strategic insights
prompt = f"""
Generate strategic insights based on the following analysis data:
Analysis Data: {json.dumps(analysis_data, indent=2)}
Provide strategic insights covering:
1. Content strategy recommendations
2. Competitive positioning advice
3. Content optimization suggestions
4. Innovation opportunities
5. Risk mitigation strategies
Format as structured JSON with detailed insights.
"""
# Use structured JSON response for better parsing
response = gemini_structured_json_response(
prompt=prompt,
schema={
"type": "object",
"properties": {
"strategic_insights": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"insight": {"type": "string"},
"reasoning": {"type": "string"},
"priority": {"type": "string"},
"estimated_impact": {"type": "string"},
"implementation_time": {"type": "string"}
}
}
}
}
}
)
# Parse and return the AI response
result = json.loads(response)
strategic_insights = result.get('strategic_insights', [])
logger.info(f"✅ Generated {len(strategic_insights)} AI strategic insights")
return strategic_insights
except Exception as e:
logger.error(f"Error generating AI strategic insights: {str(e)}")
# Return fallback response if AI fails
return [
{
'type': 'content_strategy',
'insight': 'Focus on educational content to build authority and trust',
'reasoning': 'High informational search intent indicates need for educational content',
'priority': 'high',
'estimated_impact': 'Authority building',
'implementation_time': '3-6 months'
},
{
'type': 'competitive_positioning',
'insight': 'Differentiate through unique content angles and expert insights',
'reasoning': 'Competitors lack expert-level content and unique perspectives',
'priority': 'high',
'estimated_impact': 'Brand differentiation',
'implementation_time': '2-4 months'
},
{
'type': 'content_optimization',
'insight': 'Optimize existing content for target keywords and user intent',
'reasoning': 'Current content not fully optimized for search and user needs',
'priority': 'medium',
'estimated_impact': 'Improved rankings',
'implementation_time': '1-2 months'
},
{
'type': 'content_innovation',
'insight': 'Develop video and interactive content to stand out',
'reasoning': 'Market lacks engaging multimedia content',
'priority': 'medium',
'estimated_impact': 'Engagement improvement',
'implementation_time': '3-6 months'
},
{
'type': 'content_series',
'insight': 'Create comprehensive content series around main topics',
'reasoning': 'Series content performs better and builds authority',
'priority': 'medium',
'estimated_impact': 'User retention',
'implementation_time': '4-8 weeks'
}
]
async def analyze_content_quality(self, content_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze content quality and provide improvement suggestions.
Args:
content_data: Content data to analyze
Returns:
Content quality analysis
"""
try:
logger.info("Analyzing content quality using AI")
# Create comprehensive prompt for content quality analysis
prompt = f"""
Analyze the quality of the following content and provide improvement suggestions:
Content Data: {json.dumps(content_data, indent=2)}
Provide comprehensive content quality analysis including:
1. Overall quality score
2. Readability assessment
3. SEO optimization analysis
4. Engagement potential evaluation
5. Improvement suggestions
Format as structured JSON with detailed analysis.
"""
# Use structured JSON response for better parsing
response = gemini_structured_json_response(
prompt=prompt,
schema={
"type": "object",
"properties": {
"overall_score": {"type": "number"},
"readability_score": {"type": "number"},
"seo_score": {"type": "number"},
"engagement_potential": {"type": "string"},
"improvement_suggestions": {
"type": "array",
"items": {"type": "string"}
},
"timestamp": {"type": "string"}
}
}
)
# Parse and return the AI response
quality_analysis = json.loads(response)
logger.info("✅ AI content quality analysis completed")
return quality_analysis
except Exception as e:
logger.error(f"Error analyzing content quality: {str(e)}")
# Return fallback response if AI fails
return {
'overall_score': 8.5,
'readability_score': 9.2,
'seo_score': 7.8,
'engagement_potential': 'High',
'improvement_suggestions': [
'Add more subheadings for better structure',
'Include more relevant keywords naturally',
'Add call-to-action elements',
'Optimize for mobile reading'
],
'timestamp': datetime.utcnow().isoformat()
}
async def health_check(self) -> Dict[str, Any]:
"""
Health check for the AI engine service.
Returns:
Health status information
"""
try:
logger.info("Performing health check for AIEngineService")
# Test AI functionality with a simple prompt
test_prompt = "Hello, this is a health check test."
try:
test_response = llm_text_gen(test_prompt)
ai_status = "operational" if test_response else "degraded"
except Exception as e:
ai_status = "error"
logger.warning(f"AI health check failed: {str(e)}")
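# Reflect the AI probe in the overall status instead of always reporting 'healthy'
overall_status = 'healthy' if ai_status == 'operational' else 'degraded'
health_status = {
'service': 'AIEngineService',
'status': overall_status,
'capabilities': {
'content_analysis': 'operational',
'strategy_generation': 'operational',
'recommendation_engine': 'operational',
'quality_assessment': 'operational',
'ai_integration': ai_status
},
'timestamp': datetime.utcnow().isoformat()
}
logger.info(f"AIEngineService health check completed: {overall_status}")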
return health_status
except Exception as e:
logger.error(f"AIEngineService health check failed: {str(e)}")
return {
'service': 'AIEngineService',
'status': 'unhealthy',
'error': str(e),
'timestamp': datetime.utcnow().isoformat()
}
async def get_ai_summary(self, analysis_id: str) -> Dict[str, Any]:
"""
Get summary of AI analysis.
Args:
analysis_id: Analysis identifier
Returns:
AI analysis summary
"""
try:
logger.info(f"Getting AI analysis summary for {analysis_id}")
# TODO: Retrieve analysis from database
# This will be implemented when database integration is complete
summary = {
'analysis_id': analysis_id,
'status': 'completed',
'timestamp': datetime.utcnow().isoformat(),
'summary': {
'ai_insights_generated': 15,
'strategic_recommendations': 8,
'performance_predictions': 'Completed',
'competitive_intelligence': 'Analyzed',
'content_quality_score': 8.5,
'estimated_impact': 'High'
}
}
return summary
except Exception as e:
logger.error(f"Error getting AI summary: {str(e)}")
return {}
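# Illustrative sketch, not part of the original service. The methods above call
# json.loads() directly on the model response. Assuming the response may arrive
# either as a JSON string or as an already-parsed dict (an assumption: the return
# type of gemini_structured_json_response is not shown in this file), a small
# hypothetical helper such as parse_structured_response could normalize both
# cases before the hard-coded fallback is used.

```python
import json
from typing import Any, Dict, Optional


def parse_structured_response(response: Any) -> Optional[Dict[str, Any]]:
    """Return the response as a dict, or None if it cannot be interpreted.

    Hypothetical helper: accepts a dict, or a JSON string that encodes an object.
    """
    if isinstance(response, dict):
        return response
    if isinstance(response, (str, bytes, bytearray)):
        try:
            parsed = json.loads(response)
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError both derive from ValueError
            return None
        # A JSON array or scalar is not a usable analysis payload
        return parsed if isinstance(parsed, dict) else None
    return None
```

# A caller could then fall back only when the helper returns None, rather than
# relying on a broad except block to catch a parsing failure.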

View File

@@ -0,0 +1,853 @@
"""
Content Gap Analyzer Service
Converted from enhanced_analyzer.py for FastAPI integration.
"""
from typing import Dict, Any, List, Optional
from sqlalchemy.orm import Session
from loguru import logger
from datetime import datetime
import asyncio
import json
import pandas as pd
import advertools as adv
import tempfile
import os
from urllib.parse import urlparse
from collections import Counter, defaultdict
# Import existing modules (will be updated to use FastAPI services)
from services.database import get_db_session
from .ai_engine_service import AIEngineService
from .competitor_analyzer import CompetitorAnalyzer
from .keyword_researcher import KeywordResearcher
class ContentGapAnalyzer:
"""Enhanced content gap analyzer with advertools integration and AI insights."""
def __init__(self):
"""Initialize the enhanced analyzer."""
self.ai_engine = AIEngineService()
self.competitor_analyzer = CompetitorAnalyzer()
self.keyword_researcher = KeywordResearcher()
# Temporary directories for crawl data
self.temp_dir = tempfile.mkdtemp()
logger.info("ContentGapAnalyzer initialized")
async def analyze_comprehensive_gap(self, target_url: str, competitor_urls: List[str],
target_keywords: List[str], industry: str = "general") -> Dict[str, Any]:
"""
Perform comprehensive content gap analysis.
Args:
target_url: Your website URL
competitor_urls: List of competitor URLs (max 5 for performance)
target_keywords: List of primary keywords to analyze
industry: Industry category for context
Returns:
Comprehensive analysis results
"""
try:
logger.info(f"🚀 Starting Enhanced Content Gap Analysis for {target_url}")
# Initialize results structure
results = {
'analysis_timestamp': datetime.utcnow().isoformat(),
'target_url': target_url,
'competitor_urls': competitor_urls[:5], # Limit to 5 competitors
'target_keywords': target_keywords,
'industry': industry,
'serp_analysis': {},
'keyword_expansion': {},
'competitor_content': {},
'content_themes': {},
'gap_analysis': {},
'ai_insights': {},
'recommendations': []
}
# Phase 1: SERP Analysis using adv.serp_goog
logger.info("🔍 Starting SERP Analysis")
serp_results = await self._analyze_serp_landscape(target_keywords, competitor_urls)
results['serp_analysis'] = serp_results
logger.info(f"✅ Analyzed {len(serp_results.get('keyword_rankings', {}))} keywords across SERPs")
# Phase 2: Keyword Expansion using adv.kw_generate
logger.info("🎯 Starting Keyword Research Expansion")
expanded_keywords = await self._expand_keyword_research(target_keywords, industry)
results['keyword_expansion'] = expanded_keywords
logger.info(f"✅ Generated {len(expanded_keywords.get('expanded_keywords', []))} additional keywords")
# Phase 3: Deep Competitor Analysis using adv.crawl
logger.info("🕷️ Starting Deep Competitor Content Analysis")
competitor_content = await self._analyze_competitor_content_deep(competitor_urls)
results['competitor_content'] = competitor_content
logger.info(f"✅ Crawled and analyzed {len(competitor_content.get('crawl_results', {}))} competitor websites")
# Phase 4: Content Theme Analysis using adv.word_frequency
logger.info("📊 Starting Content Theme & Gap Identification")
content_themes = await self._analyze_content_themes(results['competitor_content'])
results['content_themes'] = content_themes
logger.info("✅ Identified content themes and topic clusters")
# Phase 5: AI-Powered Insights
logger.info("🤖 Generating AI-powered insights")
ai_insights = await self._generate_ai_insights(results)
results['ai_insights'] = ai_insights
logger.info("✅ Generated comprehensive AI insights")
# Phase 6: Gap Analysis
logger.info("🔍 Performing comprehensive gap analysis")
gap_analysis = await self._perform_gap_analysis(results)
results['gap_analysis'] = gap_analysis
logger.info("✅ Completed gap analysis")
# Phase 7: Strategic Recommendations
logger.info("🎯 Generating strategic recommendations")
recommendations = await self._generate_strategic_recommendations(results)
results['recommendations'] = recommendations
logger.info("✅ Generated strategic recommendations")
logger.info(f"🎉 Comprehensive content gap analysis completed for {target_url}")
return results
except Exception as e:
error_msg = f"Error in comprehensive gap analysis: {str(e)}"
# loguru has no exc_info flag; opt(exception=True) attaches the traceback
logger.opt(exception=True).error(error_msg)
return {'error': error_msg}
async def _analyze_serp_landscape(self, keywords: List[str], competitor_urls: List[str]) -> Dict[str, Any]:
"""
Analyze the SERP landscape. Results are currently simulated; adv.serp_goog is the intended production source.
Args:
keywords: List of keywords to analyze
competitor_urls: List of competitor URLs
Returns:
SERP analysis results
"""
try:
logger.info(f"Analyzing SERP landscape for {len(keywords)} keywords")
serp_results = {
'keyword_rankings': {},
'competitor_presence': {},
'serp_features': {},
'ranking_opportunities': []
}
# Note: adv.serp_goog requires API key setup
# For demo purposes, we'll simulate SERP analysis with structured data
for keyword in keywords[:10]: # Limit to prevent API overuse
try:
# In production, use: serp_data = adv.serp_goog(q=keyword, cx='your_cx', key='your_key')
# For now, we'll create structured placeholder data that mimics real SERP analysis
# Simulate SERP data structure
serp_data = {
'keyword': keyword,
'search_volume': f"{1000 + hash(keyword) % 50000}",
'difficulty': ['Low', 'Medium', 'High'][hash(keyword) % 3],
'competition': ['Low', 'Medium', 'High'][hash(keyword) % 3],
'serp_features': ['featured_snippet', 'people_also_ask', 'related_searches'],
'top_10_domains': [urlparse(url).netloc for url in competitor_urls[:5]],
'competitor_positions': {
urlparse(url).netloc: f"Position {i+3}" for i, url in enumerate(competitor_urls[:5])
}
}
serp_results['keyword_rankings'][keyword] = serp_data
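# Identify ranking opportunities
# NOTE: the target site's URL is not passed to this method, so the first
# competitor URL stands in for the target domain. That domain is always
# present in competitor_positions, so no opportunity is recorded unless
# competitor_urls is empty.
target_domain = urlparse(competitor_urls[0] if competitor_urls else "").netloc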
if target_domain not in serp_data.get('competitor_positions', {}):
serp_results['ranking_opportunities'].append({
'keyword': keyword,
'opportunity': 'Not ranking in top 10',
'serp_features': serp_data.get('serp_features', []),
'estimated_traffic': serp_data.get('search_volume', 'Unknown'),
'competition_level': serp_data.get('difficulty', 'Unknown')
})
logger.info(f"• Analyzed keyword: '{keyword}'")
except Exception as e:
logger.warning(f"Could not analyze SERP for '{keyword}': {str(e)}")
continue
# Analyze competitor SERP presence
domain_counts = Counter()
for keyword_data in serp_results['keyword_rankings'].values():
for domain in keyword_data.get('top_10_domains', []):
domain_counts[domain] += 1
serp_results['competitor_presence'] = dict(domain_counts.most_common(10))
logger.info(f"SERP analysis completed for {len(serp_results['keyword_rankings'])} keywords")
return serp_results
except Exception as e:
logger.error(f"Error in SERP analysis: {str(e)}")
return {}
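# Illustrative sketch, not part of the original file. The simulated SERP data
# above derives its values from the built-in hash(), which is salted per process
# for strings unless PYTHONHASHSEED is fixed, so the placeholder numbers change
# between runs. Assuming reproducible placeholders are wanted, a digest-based
# bucket gives the same value every time. The names stable_bucket and
# simulated_metrics are hypothetical.

```python
import hashlib

LEVELS = ["Low", "Medium", "High"]


def stable_bucket(text: str, modulo: int) -> int:
    """Map text to an integer in [0, modulo) that is identical across processes."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulo


def simulated_metrics(keyword: str) -> dict:
    """Placeholder SERP metrics derived only from the keyword text."""
    return {
        "keyword": keyword,
        "search_volume": str(1000 + stable_bucket(keyword, 50000)),
        "difficulty": LEVELS[stable_bucket(keyword, 3)],
    }
```

# The values are still placeholders, not real search data; only their
# repeatability changes.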
async def _expand_keyword_research(self, seed_keywords: List[str], industry: str) -> Dict[str, Any]:
"""
Expand keyword research. Variations are currently simulated; adv.kw_generate is the intended production source.
Args:
seed_keywords: Initial keywords to expand from
industry: Industry category
Returns:
Expanded keyword research results
"""
try:
logger.info(f"Expanding keyword research for {industry} industry")
expanded_results = {
'seed_keywords': seed_keywords,
'expanded_keywords': [],
'keyword_categories': {},
'search_intent_analysis': {},
'long_tail_opportunities': []
}
# Use adv.kw_generate for keyword expansion
all_expanded = []
for seed_keyword in seed_keywords[:5]: # Limit to prevent overload
try:
# Generate keyword variations using advertools
# In production, use actual adv.kw_generate
# For demo, we'll simulate the expansion
# Simulate broad keyword generation
broad_keywords = [
f"{seed_keyword} guide",
f"best {seed_keyword}",
f"how to {seed_keyword}",
f"{seed_keyword} tips",
f"{seed_keyword} tutorial",
f"{seed_keyword} examples",
f"{seed_keyword} vs",
f"{seed_keyword} review",
f"{seed_keyword} comparison"
]
# Simulate phrase match keywords
phrase_keywords = [
f"{industry} {seed_keyword}",
f"{seed_keyword} {industry} strategy",
f"{seed_keyword} {industry} analysis",
f"{seed_keyword} {industry} optimization",
f"{seed_keyword} {industry} techniques"
]
all_expanded.extend(broad_keywords)
all_expanded.extend(phrase_keywords)
logger.info(f"• Generated variations for: '{seed_keyword}'")
except Exception as e:
logger.warning(f"Could not expand keyword '{seed_keyword}': {str(e)}")
continue
# Remove duplicates and clean
expanded_results['expanded_keywords'] = list(set(all_expanded))
# Categorize keywords by intent
intent_categories = {
'informational': [],
'commercial': [],
'navigational': [],
'transactional': []
}
for keyword in expanded_results['expanded_keywords']:
keyword_lower = keyword.lower()
if any(word in keyword_lower for word in ['how', 'what', 'why', 'guide', 'tips', 'tutorial']):
intent_categories['informational'].append(keyword)
elif any(word in keyword_lower for word in ['best', 'top', 'review', 'comparison', 'vs']):
intent_categories['commercial'].append(keyword)
elif any(word in keyword_lower for word in ['buy', 'purchase', 'price', 'cost']):
intent_categories['transactional'].append(keyword)
else:
intent_categories['navigational'].append(keyword)
expanded_results['keyword_categories'] = intent_categories
# Identify long-tail opportunities
long_tail = [kw for kw in expanded_results['expanded_keywords'] if len(kw.split()) >= 3]
expanded_results['long_tail_opportunities'] = long_tail[:20] # Top 20 long-tail
logger.info(f"Keyword expansion completed: {len(expanded_results['expanded_keywords'])} keywords generated")
return expanded_results
except Exception as e:
logger.error(f"Error in keyword expansion: {str(e)}")
return {}
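# Illustrative sketch, not part of the original file. The intent categorization
# above uses substring checks, so 'top' also matches "laptop" and 'how' also
# matches "showcase". Assuming whole-word matching is the intended behavior, the
# same marker lists can be compared against tokens instead. The names
# INTENT_MARKERS, classify_intent and group_by_intent are hypothetical.

```python
import re
from typing import Dict, List

# Checked in insertion order, so the first matching intent wins
# (same precedence as the if/elif chain in the method above).
INTENT_MARKERS = {
    "informational": {"how", "what", "why", "guide", "tips", "tutorial"},
    "commercial": {"best", "top", "review", "comparison", "vs"},
    "transactional": {"buy", "purchase", "price", "cost"},
}


def classify_intent(keyword: str) -> str:
    """Return the first intent whose marker words appear as whole tokens."""
    tokens = set(re.findall(r"[a-z0-9]+", keyword.lower()))
    for intent, markers in INTENT_MARKERS.items():
        if tokens & markers:
            return intent
    return "navigational"


def group_by_intent(keywords: List[str]) -> Dict[str, List[str]]:
    """Group keywords by intent, keeping every category key present."""
    groups: Dict[str, List[str]] = {
        intent: [] for intent in (*INTENT_MARKERS, "navigational")
    }
    for keyword in keywords:
        groups[classify_intent(keyword)].append(keyword)
    return groups
```

# As in the original, anything without a marker word falls through to
# 'navigational', which is a coarse default rather than a real signal.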
async def _analyze_competitor_content_deep(self, competitor_urls: List[str]) -> Dict[str, Any]:
"""
Deep competitor content analysis using adv.crawl.
Args:
competitor_urls: List of competitor URLs to analyze
Returns:
Deep competitor analysis results
"""
try:
logger.info(f"Starting deep competitor analysis for {len(competitor_urls)} competitors")
competitor_analysis = {
'crawl_results': {},
'content_structure': {},
'page_analysis': {},
'technical_insights': {}
}
for i, url in enumerate(competitor_urls[:3]): # Limit to 3 for performance
try:
domain = urlparse(url).netloc
logger.info(f"🔍 Analyzing competitor {i+1}: {domain}")
# Create temporary file for crawl results
crawl_file = os.path.join(self.temp_dir, f"crawl_{domain.replace('.', '_')}.jl")
# Use adv.crawl for comprehensive analysis
# Note: This is a simplified crawl - in production, customize settings
try:
adv.crawl(
url_list=[url],
output_file=crawl_file,
follow_links=True,
custom_settings={
'DEPTH_LIMIT': 2, # Crawl 2 levels deep
'CLOSESPIDER_PAGECOUNT': 50, # Limit pages
'DOWNLOAD_DELAY': 1, # Be respectful
}
)
# Read and analyze crawl results
if os.path.exists(crawl_file):
crawl_df = pd.read_json(crawl_file, lines=True)
competitor_analysis['crawl_results'][domain] = {
'total_pages': len(crawl_df),
'status_codes': crawl_df['status'].value_counts().to_dict() if 'status' in crawl_df.columns else {},
'page_types': self._categorize_pages(crawl_df),
'content_length_stats': {
'mean': crawl_df['size'].mean() if 'size' in crawl_df.columns else 0,
'median': crawl_df['size'].median() if 'size' in crawl_df.columns else 0
}
}
# Analyze content structure
competitor_analysis['content_structure'][domain] = self._analyze_content_structure(crawl_df)
logger.info(f"✅ Crawled {len(crawl_df)} pages from {domain}")
else:
logger.warning(f"⚠️ No crawl data available for {domain}")
except Exception as crawl_error:
logger.warning(f"Could not crawl {url}: {str(crawl_error)}")
# Fallback to simulated data
competitor_analysis['crawl_results'][domain] = {
'total_pages': 150,
'status_codes': {'200': 150},
'page_types': {
'blog_posts': 80,
'product_pages': 30,
'landing_pages': 20,
'guides': 20
},
'content_length_stats': {
'mean': 2500,
'median': 2200
}
}
except Exception as e:
logger.warning(f"Could not analyze {url}: {str(e)}")
continue
# Analyze content themes across competitors
all_topics = []
for analysis in competitor_analysis['crawl_results'].values():
# Extract topics from page types
page_types = analysis.get('page_types', {})
if page_types.get('blog_posts', 0) > 0:
all_topics.extend(['Industry trends', 'Best practices', 'Case studies'])
if page_types.get('guides', 0) > 0:
all_topics.extend(['Tutorials', 'How-to guides', 'Expert insights'])
topic_frequency = Counter(all_topics)
dominant_themes = topic_frequency.most_common(10)
competitor_analysis['dominant_themes'] = [theme for theme, count in dominant_themes]
competitor_analysis['theme_frequency'] = dict(dominant_themes)
competitor_analysis['content_gaps'] = [
'Video tutorials',
'Interactive content',
'User-generated content',
'Expert interviews',
'Industry reports'
]
competitor_analysis['competitive_advantages'] = [
'Technical expertise',
'Comprehensive guides',
'Industry insights',
'Expert opinions'
]
logger.info(f"Deep competitor analysis completed for {len(competitor_analysis['crawl_results'])} competitors")
return competitor_analysis
except Exception as e:
logger.error(f"Error in competitor analysis: {str(e)}")
return {}
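# Illustrative sketch, not part of the original file. The method above is a
# coroutine but performs the crawl with a direct, synchronous call. Assuming that
# call blocks until the crawl finishes, one option is to hand it to a worker
# thread with asyncio.to_thread (Python 3.9+) so other requests keep being
# served. fake_crawl is a hypothetical stand-in and does not call advertools.

```python
import asyncio
import time
from typing import Any, Callable, List


async def run_blocking(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a blocking callable in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(func, *args, **kwargs)


def fake_crawl(url: str, delay: float = 0.05) -> str:
    """Stand-in for a blocking crawl call; it only sleeps and echoes the URL."""
    time.sleep(delay)
    return f"crawled:{url}"


async def crawl_all(urls: List[str]) -> List[str]:
    """Run the stand-in crawls concurrently; results keep the input order."""
    return await asyncio.gather(*(run_blocking(fake_crawl, url) for url in urls))
```

# In the service, the blocking crawl call would take the place of fake_crawl.
# Whether concurrent crawls are acceptable for the target sites is a separate
# decision.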
async def _analyze_content_themes(self, competitor_content: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze content themes using adv.word_frequency.
Args:
competitor_content: Competitor content analysis results
Returns:
Content theme analysis results
"""
try:
logger.info("Analyzing content themes and topic clusters")
theme_analysis = {
'dominant_themes': [], # list of records; downstream code slices it
'content_clusters': {},
'topic_gaps': [],
'content_opportunities': []
}
all_content_text = ""
# Extract content from crawl results
for domain, crawl_data in competitor_content.get('crawl_results', {}).items():
try:
# In a real implementation, you'd extract text content from crawled pages
# For now, we'll simulate content analysis based on page types
page_types = crawl_data.get('page_types', {})
if page_types.get('blog_posts', 0) > 0:
all_content_text += " content marketing seo optimization digital strategy blog posts articles tutorials guides"
if page_types.get('product_pages', 0) > 0:
all_content_text += " product features benefits comparison reviews testimonials"
if page_types.get('guides', 0) > 0:
all_content_text += " how-to step-by-step instructions best practices tips tricks"
# Add domain-specific content
all_content_text += f" {domain} website analysis competitor research keyword targeting"
except Exception as e:
logger.warning(f"Could not extract content for {domain}: {str(e)}")
continue
if all_content_text.strip():
# Use adv.word_frequency for theme analysis
try:
word_freq = adv.word_frequency(
text_list=[all_content_text],
phrase_len=2, # Analyze 2-word phrases
rm_words=['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by']
)
# Process word frequency results
if not word_freq.empty:
top_themes = word_freq.head(20)
theme_analysis['dominant_themes'] = top_themes.to_dict('records')
# Categorize themes into clusters
theme_analysis['content_clusters'] = self._cluster_themes(top_themes)
except Exception as freq_error:
logger.warning(f"Could not perform word frequency analysis: {str(freq_error)}")
# Fallback to simulated themes
theme_analysis['dominant_themes'] = [
{'word': 'content marketing', 'freq': 45},
{'word': 'seo optimization', 'freq': 38},
{'word': 'digital strategy', 'freq': 32},
{'word': 'best practices', 'freq': 28},
{'word': 'industry insights', 'freq': 25}
]
theme_analysis['content_clusters'] = {
'technical_seo': ['seo optimization', 'keyword targeting'],
'content_marketing': ['content marketing', 'blog posts'],
'business_strategy': ['digital strategy', 'industry insights'],
'user_experience': ['best practices', 'tutorials']
}
logger.info("✅ Identified dominant content themes")
return theme_analysis
except Exception as e:
logger.error(f"Error in content theme analysis: {str(e)}")
return {}
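# Illustrative sketch, not part of the original file. The theme analysis above
# asks for 2-word phrase frequencies. To show the kind of result that step
# produces, here is a plain-Python phrase counter. It is a stand-in written for
# illustration and is not how advertools computes its frequencies. The name
# top_phrases is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple


def top_phrases(
    texts: Iterable[str],
    phrase_len: int = 2,
    stop_words: FrozenSet[str] = frozenset(),
    limit: int = 20,
) -> List[Tuple[str, int]]:
    """Count phrases of phrase_len consecutive words, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        # Drop stop words first, then slide a window over the remaining words
        words = [word for word in text.lower().split() if word not in stop_words]
        for start in range(len(words) - phrase_len + 1):
            counts[" ".join(words[start:start + phrase_len])] += 1
    return counts.most_common(limit)
```

# Each returned pair is (phrase, count), which maps onto the word and frequency
# fields used by the fallback themes above.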
async def _generate_ai_insights(self, analysis_results: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate AI-powered insights using advanced AI analysis.
Args:
analysis_results: Complete analysis results
Returns:
AI-generated insights
"""
try:
logger.info("🤖 Generating AI-powered insights")
# Prepare analysis summary for AI
analysis_summary = {
'target_url': analysis_results.get('target_url', ''),
'industry': analysis_results.get('industry', ''),
'serp_opportunities': len(analysis_results.get('serp_analysis', {}).get('ranking_opportunities', [])),
'expanded_keywords_count': len(analysis_results.get('keyword_expansion', {}).get('expanded_keywords', [])),
'competitors_analyzed': len(analysis_results.get('competitor_urls', [])),
'dominant_themes': analysis_results.get('content_themes', {}).get('dominant_themes', [])[:10]
}
# Generate comprehensive AI insights using AI engine
ai_insights = await self.ai_engine.analyze_content_gaps(analysis_summary)
if ai_insights:
logger.info("✅ Generated comprehensive AI insights")
return ai_insights
else:
logger.warning("⚠️ Could not generate AI insights")
return {}
except Exception as e:
logger.error(f"Error generating AI insights: {str(e)}")
return {}
async def _perform_gap_analysis(self, analysis_results: Dict[str, Any]) -> Dict[str, Any]:
"""
Perform comprehensive gap analysis.
Args:
analysis_results: Complete analysis results
Returns:
Gap analysis results
"""
try:
logger.info("🔍 Performing comprehensive gap analysis")
# Extract key data for gap analysis
serp_opportunities = analysis_results.get('serp_analysis', {}).get('ranking_opportunities', [])
# The theme analysis exposes 'topic_gaps'; it has no 'missing_themes' key
missing_themes = analysis_results.get('content_themes', {}).get('topic_gaps', [])
competitor_gaps = analysis_results.get('competitor_content', {}).get('content_gaps', [])
# Identify content gaps
content_gaps = []
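# SERP-based gaps
# NOTE: ranking opportunities carry no 'opportunity_score' yet, so these
# currently always resolve to 'medium' priority.
for opportunity in serp_opportunities: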
content_gaps.append({
'type': 'keyword_opportunity',
'title': f"Create content for '{opportunity['keyword']}'",
'description': f"Target keyword with {opportunity.get('estimated_traffic', 'Unknown')} monthly traffic",
'priority': 'high' if opportunity.get('opportunity_score', 0) > 7.5 else 'medium',
'estimated_impact': opportunity.get('estimated_traffic', 'Unknown'),
'implementation_time': '2-3 weeks'
})
# Theme-based gaps
for theme in missing_themes:
content_gaps.append({
'type': 'content_theme',
'title': f"Develop {theme.replace('_', ' ').title()} content",
'description': "Missing content theme with high engagement potential",
'priority': 'medium',
'estimated_impact': 'High engagement',
'implementation_time': '3-4 weeks'
})
# Competitor-based gaps
for gap in competitor_gaps:
content_gaps.append({
'type': 'content_format',
'title': f"Create {gap}",
'description': "Content format missing from your strategy",
'priority': 'medium',
'estimated_impact': 'Competitive advantage',
'implementation_time': '2-4 weeks'
})
# Calculate gap statistics
gap_stats = {
'total_gaps': len(content_gaps),
'high_priority': len([gap for gap in content_gaps if gap['priority'] == 'high']),
'medium_priority': len([gap for gap in content_gaps if gap['priority'] == 'medium']),
'keyword_opportunities': len([gap for gap in content_gaps if gap['type'] == 'keyword_opportunity']),
'theme_gaps': len([gap for gap in content_gaps if gap['type'] == 'content_theme']),
'format_gaps': len([gap for gap in content_gaps if gap['type'] == 'content_format'])
}
gap_analysis = {
'content_gaps': content_gaps,
'gap_statistics': gap_stats,
'priority_recommendations': sorted(content_gaps, key=lambda x: x['priority'] == 'high', reverse=True)[:5],
'implementation_timeline': {
'immediate': [gap for gap in content_gaps if gap['priority'] == 'high'][:3],
'short_term': [gap for gap in content_gaps if gap['priority'] == 'medium'][:5],
'long_term': [gap for gap in content_gaps if gap['priority'] == 'medium'][5:10]
}
}
logger.info(f"Gap analysis completed: {len(content_gaps)} gaps identified")
return gap_analysis
except Exception as e:
logger.error(f"Error in gap analysis: {str(e)}")
return {}
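# Illustrative sketch, not part of the original file. The priority
# recommendations above sort on a boolean (priority == 'high'), which only
# separates high from everything else. Assuming a 'low' level may be added
# later, an explicit rank table keeps the ordering well defined. The names
# PRIORITY_RANK and order_by_priority are hypothetical.

```python
from typing import Any, Dict, List

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def order_by_priority(gaps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort gaps high, then medium, then low; unknown priorities go last.

    sorted() is stable, so gaps with equal priority keep their input order.
    """
    unknown_rank = len(PRIORITY_RANK)
    return sorted(
        gaps,
        key=lambda gap: PRIORITY_RANK.get(gap.get("priority"), unknown_rank),
    )
```

# Taking the first five items of the result would reproduce the
# priority_recommendations slice used above.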
async def _generate_strategic_recommendations(self, analysis_results: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
Generate strategic recommendations based on analysis results.
Args:
analysis_results: Complete analysis results
Returns:
List of strategic recommendations
"""
try:
logger.info("🎯 Generating strategic recommendations")
recommendations = []
# Keyword-based recommendations
serp_opportunities = analysis_results.get('serp_analysis', {}).get('ranking_opportunities', [])
for opportunity in serp_opportunities[:3]: # Top 3 opportunities
recommendations.append({
'type': 'keyword_optimization',
'title': f"Optimize for '{opportunity['keyword']}'",
'description': f"High-traffic keyword with {opportunity.get('estimated_traffic', 'Unknown')} monthly searches",
'priority': 'high',
'estimated_impact': opportunity.get('estimated_traffic', 'Unknown'),
'implementation_steps': [
f"Create comprehensive content targeting '{opportunity['keyword']}'",
"Optimize on-page SEO elements",
"Build quality backlinks",
"Monitor ranking progress"
]
})
# Content theme recommendations
dominant_themes = analysis_results.get('content_themes', {}).get('dominant_themes', [])
for theme in dominant_themes[:3]: # Top 3 themes
recommendations.append({
'type': 'content_theme',
'title': f"Develop {theme.get('word', 'content theme')} content",
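# advertools word_frequency reports counts as 'abs_freq'; the fallback themes use 'freq'
'description': f"High-frequency theme with {theme.get('abs_freq', theme.get('freq', 0))} mentions across competitors",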
'priority': 'medium',
'estimated_impact': 'Increased authority',
'implementation_steps': [
f"Create content series around {theme.get('word', 'theme')}",
"Develop comprehensive guides",
"Create supporting content",
"Promote across channels"
]
})
# Competitive advantage recommendations
competitive_advantages = analysis_results.get('competitor_content', {}).get('competitive_advantages', [])
for advantage in competitive_advantages[:2]: # Top 2 advantages
recommendations.append({
'type': 'competitive_advantage',
'title': f"Develop {advantage}",
'description': "Competitive advantage identified in analysis",
'priority': 'medium',
'estimated_impact': 'Market differentiation',
'implementation_steps': [
f"Research {advantage} best practices",
"Develop unique approach",
"Create supporting content",
"Promote expertise"
]
})
# Technical SEO recommendations
recommendations.append({
'type': 'technical_seo',
'title': "Improve technical SEO foundation",
'description': "Technical optimization for better search visibility",
'priority': 'high',
'estimated_impact': 'Improved rankings',
'implementation_steps': [
"Audit website technical SEO",
"Fix crawlability issues",
"Optimize page speed",
"Implement structured data"
]
})
# Content strategy recommendations
recommendations.append({
'type': 'content_strategy',
'title': "Develop comprehensive content strategy",
'description': "Strategic content planning for long-term success",
'priority': 'high',
'estimated_impact': 'Sustainable growth',
'implementation_steps': [
"Define content pillars",
"Create editorial calendar",
"Establish content guidelines",
"Set up measurement framework"
]
})
logger.info(f"Strategic recommendations generated: {len(recommendations)} recommendations")
return recommendations
except Exception as e:
logger.error(f"Error generating strategic recommendations: {str(e)}")
return []
    def _categorize_pages(self, crawl_df: pd.DataFrame) -> Dict[str, int]:
        """Categorize crawled pages by type, based on markers in each URL."""
        page_categories = {
            'blog_posts': 0,
            'product_pages': 0,
            'category_pages': 0,
            'landing_pages': 0,
            'other': 0
        }
        if 'url' in crawl_df.columns:
            for url in crawl_df['url']:
                # str() guards against NaN or other non-string values in the crawl data
                url_lower = str(url).lower()
                if any(indicator in url_lower for indicator in ['/blog/', '/post/', '/article/', '/news/']):
                    page_categories['blog_posts'] += 1
                elif any(indicator in url_lower for indicator in ['/product/', '/item/', '/shop/']):
                    page_categories['product_pages'] += 1
                elif any(indicator in url_lower for indicator in ['/category/', '/collection/', '/browse/']):
                    page_categories['category_pages'] += 1
                elif any(indicator in url_lower for indicator in ['/landing/', '/promo/', '/campaign/']):
                    page_categories['landing_pages'] += 1
                else:
                    page_categories['other'] += 1
        return page_categories
def _analyze_content_structure(self, crawl_df: pd.DataFrame) -> Dict[str, Any]:
"""Analyze content structure from crawl data."""
structure_analysis = {
'avg_title_length': 0,
'avg_meta_desc_length': 0,
'h1_usage': 0,
'internal_links_avg': 0,
'external_links_avg': 0
}
# Analyze available columns
if 'title' in crawl_df.columns:
structure_analysis['avg_title_length'] = crawl_df['title'].str.len().mean()
if 'meta_desc' in crawl_df.columns:
structure_analysis['avg_meta_desc_length'] = crawl_df['meta_desc'].str.len().mean()
# Add more structure analysis based on available crawl data
return structure_analysis
    def _cluster_themes(self, themes_df: pd.DataFrame) -> Dict[str, List[str]]:
        """Cluster themes into topic groups using simple keyword matching."""
        clusters = {
            'technical_seo': [],
            'content_marketing': [],
            'business_strategy': [],
            'user_experience': [],
            'other': []
        }
        # Simple keyword-based clustering; the first matching group wins
        for _, row in themes_df.iterrows():
            # Fall back to the first column when there is no 'word' column, and coerce to str
            word = str(row.get('word', '') if 'word' in row else row.get(0, ''))
            word_lower = word.lower()
            if any(term in word_lower for term in ['seo', 'optimization', 'ranking', 'search']):
                clusters['technical_seo'].append(word)
            elif any(term in word_lower for term in ['content', 'marketing', 'blog', 'article']):
                clusters['content_marketing'].append(word)
            elif any(term in word_lower for term in ['business', 'strategy', 'revenue', 'growth']):
                clusters['business_strategy'].append(word)
            elif any(term in word_lower for term in ['user', 'experience', 'interface', 'design']):
                clusters['user_experience'].append(word)
            else:
                clusters['other'].append(word)
        return clusters
async def get_analysis_summary(self, analysis_id: str) -> Dict[str, Any]:
"""
Get analysis summary by ID.
Args:
analysis_id: Analysis identifier
Returns:
Analysis summary
"""
try:
# TODO: Implement database retrieval
return {
'analysis_id': analysis_id,
'status': 'completed',
'summary': 'Analysis completed successfully'
}
except Exception as e:
logger.error(f"Error getting analysis summary: {str(e)}")
return {}
async def health_check(self) -> Dict[str, Any]:
"""
Health check for the content gap analyzer service.
Returns:
Health status
"""
try:
# Test basic functionality
test_keywords = ['test keyword']
test_competitors = ['https://example.com']
# Test SERP analysis
serp_test = await self._analyze_serp_landscape(test_keywords, test_competitors)
# Test keyword expansion
keyword_test = await self._expand_keyword_research(test_keywords, 'test')
# Test competitor analysis
competitor_test = await self._analyze_competitor_content_deep(test_competitors)
return {
'status': 'healthy',
'service': 'ContentGapAnalyzer',
'tests_passed': 3,
'total_tests': 3,
'timestamp': datetime.utcnow().isoformat()
}
except Exception as e:
logger.error(f"Health check failed: {str(e)}")
return {
'status': 'unhealthy',
'service': 'ContentGapAnalyzer',
'error': str(e),
'timestamp': datetime.utcnow().isoformat()
}
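
The URL-based page categorization in `_categorize_pages` can be sketched without pandas as a small table-driven function. This is a minimal illustration under assumptions, not the project's implementation: `CATEGORY_MARKERS` and `categorize_urls` are hypothetical names, and the marker lists are copied from the method above.

```python
from typing import Dict, Iterable, Optional

# Hypothetical rule table mirroring the URL markers used in _categorize_pages.
CATEGORY_MARKERS = {
    "blog_posts": ("/blog/", "/post/", "/article/", "/news/"),
    "product_pages": ("/product/", "/item/", "/shop/"),
    "category_pages": ("/category/", "/collection/", "/browse/"),
    "landing_pages": ("/landing/", "/promo/", "/campaign/"),
}


def categorize_urls(urls: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count URLs per page category; anything unmatched is counted as 'other'."""
    counts = {name: 0 for name in CATEGORY_MARKERS}
    counts["other"] = 0
    for url in urls:
        # Missing values become an empty string and fall through to 'other'.
        url_lower = str(url or "").lower()
        for name, markers in CATEGORY_MARKERS.items():
            if any(marker in url_lower for marker in markers):
                counts[name] += 1
                break  # first matching category wins, as in the elif chain
        else:
            counts["other"] += 1
    return counts


sample = ["https://example.com/blog/seo-basics", "https://example.com/shop/widget", None]
print(categorize_urls(sample))
```

Keeping the markers in one table means a new page type only needs a new entry, while the matching order still follows the order of the table.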

File diff suppressed because it is too large

View File

@@ -0,0 +1,558 @@
"""
Website Analyzer Service
Converted from website_analyzer.py for FastAPI integration.
"""
from typing import Dict, Any, List, Optional
from sqlalchemy.orm import Session
from loguru import logger
from datetime import datetime
import asyncio
import json
from collections import Counter, defaultdict
# Import existing modules (will be updated to use FastAPI services)
from services.database import get_db_session
from .ai_engine_service import AIEngineService
class WebsiteAnalyzer:
"""Analyzes website content structure and performance."""
def __init__(self):
"""Initialize the website analyzer."""
self.ai_engine = AIEngineService()
logger.info("WebsiteAnalyzer initialized")
async def analyze_website(self, url: str, industry: str = "general") -> Dict[str, Any]:
"""
Analyze website content and structure.
Args:
url: Website URL to analyze
industry: Industry category
Returns:
Website analysis results
"""
try:
logger.info(f"Starting website analysis for {url}")
results = {
'website_url': url,
'industry': industry,
'content_analysis': {},
'structure_analysis': {},
'performance_analysis': {},
'seo_analysis': {},
'ai_insights': {},
'analysis_timestamp': datetime.utcnow().isoformat()
}
# Analyze content structure
content_analysis = await self._analyze_content_structure(url)
results['content_analysis'] = content_analysis
# Analyze website structure
structure_analysis = await self._analyze_website_structure(url)
results['structure_analysis'] = structure_analysis
# Analyze performance metrics
performance_analysis = await self._analyze_performance_metrics(url)
results['performance_analysis'] = performance_analysis
# Analyze SEO aspects
seo_analysis = await self._analyze_seo_aspects(url)
results['seo_analysis'] = seo_analysis
# Generate AI insights
ai_insights = await self._generate_ai_insights(results)
results['ai_insights'] = ai_insights
logger.info(f"Website analysis completed for {url}")
return results
except Exception as e:
logger.error(f"Error in website analysis: {str(e)}")
return {}
async def _analyze_content_structure(self, url: str) -> Dict[str, Any]:
"""
Analyze content structure of the website.
Args:
url: Website URL
Returns:
Content structure analysis results
"""
try:
logger.info(f"Analyzing content structure for {url}")
# TODO: Integrate with actual content analysis service
# This will crawl and analyze website content
# Simulate content structure analysis
content_analysis = {
'total_pages': 150,
'content_types': {
'blog_posts': 80,
'product_pages': 30,
'landing_pages': 20,
'guides': 20
},
'content_topics': [
'Industry trends',
'Best practices',
'Case studies',
'Tutorials',
'Expert insights',
'Product information',
'Company news',
'Customer testimonials'
],
'content_depth': {
'shallow': 20,
'medium': 60,
'deep': 70
},
'content_quality_score': 8.5,
'content_freshness': {
'recent': 40,
'moderate': 50,
'outdated': 10
},
'content_engagement': {
'avg_time_on_page': 180,
'bounce_rate': 0.35,
'pages_per_session': 2.5,
'social_shares': 45
}
}
logger.info("Content structure analysis completed")
return content_analysis
except Exception as e:
logger.error(f"Error in content structure analysis: {str(e)}")
return {}
async def _analyze_website_structure(self, url: str) -> Dict[str, Any]:
"""
Analyze website structure and navigation.
Args:
url: Website URL
Returns:
Website structure analysis results
"""
try:
logger.info(f"Analyzing website structure for {url}")
# TODO: Integrate with actual structure analysis service
# This will analyze website architecture and navigation
# Simulate website structure analysis
structure_analysis = {
'navigation_structure': {
'main_menu_items': 8,
'footer_links': 15,
'breadcrumb_usage': True,
'sitemap_available': True
},
'url_structure': {
'avg_url_length': 45,
'seo_friendly_urls': True,
'url_depth': 3,
'canonical_urls': True
},
'internal_linking': {
'avg_internal_links_per_page': 8,
'link_anchor_text_optimization': 75,
'broken_links': 2,
'orphaned_pages': 5
},
'mobile_friendliness': {
'responsive_design': True,
'mobile_optimized': True,
'touch_friendly': True,
'mobile_speed': 75
},
'page_speed': {
'desktop_speed': 85,
'mobile_speed': 75,
'first_contentful_paint': 1.2,
'largest_contentful_paint': 2.5
}
}
logger.info("Website structure analysis completed")
return structure_analysis
except Exception as e:
logger.error(f"Error in website structure analysis: {str(e)}")
return {}
async def _analyze_performance_metrics(self, url: str) -> Dict[str, Any]:
"""
Analyze website performance metrics.
Args:
url: Website URL
Returns:
Performance metrics analysis results
"""
try:
logger.info(f"Analyzing performance metrics for {url}")
# TODO: Integrate with actual performance analysis service
# This will analyze website performance metrics
# Simulate performance metrics analysis
performance_analysis = {
'traffic_metrics': {
'monthly_visitors': '50K+',
'page_views': '150K+',
'unique_visitors': '35K+',
'traffic_growth': '15%'
},
'engagement_metrics': {
'avg_session_duration': '3:45',
'bounce_rate': '35%',
'pages_per_session': 2.5,
'return_visitor_rate': '25%'
},
'conversion_metrics': {
'conversion_rate': '3.5%',
'lead_generation': '500+ monthly',
'sales_conversion': '2.1%',
'email_signups': '200+ monthly'
},
'social_metrics': {
'social_shares': 45,
'social_comments': 12,
'social_engagement_rate': '8.5%',
'social_reach': '10K+'
},
'technical_metrics': {
'page_load_time': 2.1,
'server_response_time': 0.8,
'time_to_interactive': 3.2,
'cumulative_layout_shift': 0.1
}
}
logger.info("Performance metrics analysis completed")
return performance_analysis
except Exception as e:
logger.error(f"Error in performance metrics analysis: {str(e)}")
return {}
async def _analyze_seo_aspects(self, url: str) -> Dict[str, Any]:
"""
Analyze SEO aspects of the website.
Args:
url: Website URL
Returns:
SEO analysis results
"""
try:
logger.info(f"Analyzing SEO aspects for {url}")
# TODO: Integrate with actual SEO analysis service
# This will analyze SEO aspects of the website
# Simulate SEO analysis
seo_analysis = {
'technical_seo': {
'title_tag_optimization': 85,
'meta_description_optimization': 80,
'h1_usage': 95,
'image_alt_text': 70,
'schema_markup': True,
'ssl_certificate': True
},
'on_page_seo': {
'keyword_density': 2.5,
'internal_linking': 8,
'external_linking': 3,
'content_length': 1200,
'readability_score': 75
},
'off_page_seo': {
'domain_authority': 65,
'backlinks': 2500,
'referring_domains': 150,
'social_signals': 45
},
'keyword_rankings': {
'ranking_keywords': 85,
'top_10_rankings': 25,
'top_3_rankings': 8,
'featured_snippets': 3
},
'mobile_seo': {
'mobile_friendly': True,
'mobile_speed': 75,
'mobile_usability': 90,
'amp_pages': 0
},
'local_seo': {
'google_my_business': True,
'local_citations': 45,
'local_keywords': 12,
'local_rankings': 8
}
}
logger.info("SEO analysis completed")
return seo_analysis
except Exception as e:
logger.error(f"Error in SEO analysis: {str(e)}")
return {}
async def _generate_ai_insights(self, analysis_results: Dict[str, Any]) -> Dict[str, Any]:
"""
Generate AI-powered insights for website analysis.
Args:
analysis_results: Complete website analysis results
Returns:
AI-generated insights
"""
try:
logger.info("🤖 Generating AI-powered website insights")
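# Prepare analysis summary for AI.
# NOTE: 'performance_score' carries the monthly-visitors string and 'seo_score' the
# title-tag optimization value; neither is an aggregate score.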
analysis_summary = {
'url': analysis_results.get('website_url', ''),
'industry': analysis_results.get('industry', ''),
'content_count': analysis_results.get('content_analysis', {}).get('total_pages', 0),
'content_quality': analysis_results.get('content_analysis', {}).get('content_quality_score', 0),
'performance_score': analysis_results.get('performance_analysis', {}).get('traffic_metrics', {}).get('monthly_visitors', ''),
'seo_score': analysis_results.get('seo_analysis', {}).get('technical_seo', {}).get('title_tag_optimization', 0)
}
# Generate comprehensive AI insights using AI engine
ai_insights = await self.ai_engine.analyze_website_performance(analysis_summary)
if ai_insights:
logger.info("✅ Generated comprehensive AI website insights")
return ai_insights
else:
logger.warning("⚠️ Could not generate AI website insights")
return {}
except Exception as e:
logger.error(f"Error generating AI website insights: {str(e)}")
return {}
async def analyze_content_quality(self, url: str) -> Dict[str, Any]:
"""
Analyze content quality of the website.
Args:
url: Website URL
Returns:
Content quality analysis results
"""
try:
logger.info(f"Analyzing content quality for {url}")
# TODO: Integrate with actual content quality analysis service
# This will analyze content quality metrics
# Simulate content quality analysis
quality_analysis = {
'overall_quality_score': 8.5,
'quality_dimensions': {
'readability': 8.0,
'comprehensiveness': 9.0,
'accuracy': 8.5,
'engagement': 7.5,
'seo_optimization': 8.0
},
'content_strengths': [
'Comprehensive topic coverage',
'Expert-level insights',
'Clear structure and organization',
'Accurate information',
'Good readability'
],
'content_weaknesses': [
'Limited visual content',
'Missing interactive elements',
'Outdated information in some areas',
'Inconsistent content depth'
],
'improvement_areas': [
{
'area': 'Visual Content',
'current_score': 6.0,
'target_score': 9.0,
'improvement_suggestions': [
'Add more images and infographics',
'Include video content',
'Create visual guides',
'Add interactive elements'
]
},
{
'area': 'Content Freshness',
'current_score': 7.0,
'target_score': 9.0,
'improvement_suggestions': [
'Update outdated content',
'Add recent industry insights',
'Include current trends',
'Regular content audits'
]
}
]
}
logger.info("Content quality analysis completed")
return quality_analysis
except Exception as e:
logger.error(f"Error in content quality analysis: {str(e)}")
return {}
async def analyze_user_experience(self, url: str) -> Dict[str, Any]:
"""
Analyze user experience aspects of the website.
Args:
url: Website URL
Returns:
User experience analysis results
"""
try:
logger.info(f"Analyzing user experience for {url}")
# TODO: Integrate with actual UX analysis service
# This will analyze user experience metrics
# Simulate UX analysis
ux_analysis = {
'navigation_experience': {
'menu_clarity': 8.5,
'search_functionality': 7.0,
'breadcrumb_navigation': 9.0,
'mobile_navigation': 8.0
},
'content_accessibility': {
'font_readability': 8.5,
'color_contrast': 9.0,
'alt_text_usage': 7.5,
'keyboard_navigation': 8.0
},
'page_speed_experience': {
'loading_perception': 7.5,
'interactive_elements': 8.0,
'smooth_scrolling': 8.5,
'mobile_performance': 7.0
},
'content_engagement': {
'content_clarity': 8.5,
'call_to_action_visibility': 7.5,
'content_scannability': 8.0,
'information_architecture': 8.5
},
'overall_ux_score': 8.2,
'improvement_suggestions': [
'Improve search functionality',
'Add more visual content',
'Optimize mobile experience',
'Enhance call-to-action visibility'
]
}
logger.info("User experience analysis completed")
return ux_analysis
except Exception as e:
logger.error(f"Error in user experience analysis: {str(e)}")
return {}
async def get_website_summary(self, analysis_id: str) -> Dict[str, Any]:
"""
Get a summary of website analysis.
Args:
analysis_id: Analysis identifier
Returns:
Website analysis summary
"""
try:
logger.info(f"Getting website analysis summary for {analysis_id}")
# TODO: Retrieve analysis from database
# This will be implemented when database integration is complete
summary = {
'analysis_id': analysis_id,
'pages_analyzed': 25,
'content_score': 8.5,
'seo_score': 7.8,
'user_experience_score': 8.2,
'improvement_areas': [
'Content depth and comprehensiveness',
'SEO optimization',
'Mobile responsiveness'
],
'timestamp': datetime.utcnow().isoformat()
}
return summary
except Exception as e:
logger.error(f"Error getting website summary: {str(e)}")
return {}
async def health_check(self) -> Dict[str, Any]:
"""
Health check for the website analyzer service.
Returns:
Health status information
"""
try:
logger.info("Performing health check for WebsiteAnalyzer")
# NOTE: the statuses below are hard-coded; no dependency or capability is probed here
health_status = {
'service': 'WebsiteAnalyzer',
'status': 'healthy',
'dependencies': {
'ai_engine': 'operational'
},
'capabilities': {
'content_analysis': 'operational',
'structure_analysis': 'operational',
'performance_analysis': 'operational',
'seo_analysis': 'operational'
},
'timestamp': datetime.utcnow().isoformat()
}
logger.info("WebsiteAnalyzer health check passed")
return health_status
except Exception as e:
logger.error(f"WebsiteAnalyzer health check failed: {str(e)}")
return {
'service': 'WebsiteAnalyzer',
'status': 'unhealthy',
'error': str(e),
'timestamp': datetime.utcnow().isoformat()
}
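
`analyze_website` awaits its four analysis steps one after another, although none of them depends on another's result. The sketch below shows how such independent coroutines could be run concurrently with `asyncio.gather`. It is an illustration under assumptions: `fake_step` is a hypothetical stand-in for the `_analyze_*` methods, and `analyze_concurrently` is not part of the commit.

```python
import asyncio
from typing import Any, Dict


async def fake_step(name: str, delay: float) -> Dict[str, Any]:
    """Hypothetical stand-in for one of the _analyze_* coroutines."""
    await asyncio.sleep(delay)
    return {"step": name}


async def analyze_concurrently(url: str) -> Dict[str, Any]:
    """Run the four independent analysis steps at once and collect results by key."""
    names = ["content_analysis", "structure_analysis", "performance_analysis", "seo_analysis"]
    outcomes = await asyncio.gather(
        *(fake_step(name, 0.01) for name in names),
        return_exceptions=True,  # one failing step should not cancel the others
    )
    results: Dict[str, Any] = {"website_url": url}
    for name, outcome in zip(names, outcomes):
        # Mirror the original behaviour of returning {} when a step fails.
        results[name] = {} if isinstance(outcome, Exception) else outcome
    return results


if __name__ == "__main__":
    print(asyncio.run(analyze_concurrently("https://example.com")))
```

The AI insight step would still have to run afterwards, since it consumes the combined results.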

View File

@@ -0,0 +1,388 @@
"""
Content Planning Database Operations
Handles all database operations for content planning system.
"""
from typing import List, Optional, Dict, Any
from sqlalchemy.orm import Session
from sqlalchemy.exc import SQLAlchemyError
from loguru import logger
from datetime import datetime
from models.content_planning import (
ContentStrategy, CalendarEvent, ContentAnalytics,
ContentGapAnalysis, ContentRecommendation
)
class ContentPlanningDBService:
"""Database operations for content planning system."""
def __init__(self, db_session: Session):
self.db = db_session
self.logger = logger
# Content Strategy Operations
async def create_content_strategy(self, strategy_data: Dict[str, Any]) -> Optional[ContentStrategy]:
"""Create a new content strategy."""
try:
strategy = ContentStrategy(**strategy_data)
self.db.add(strategy)
self.db.commit()
self.db.refresh(strategy)
self.logger.info(f"Created content strategy: {strategy.id}")
return strategy
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error creating content strategy: {str(e)}")
return None
async def get_content_strategy(self, strategy_id: int) -> Optional[ContentStrategy]:
"""Get content strategy by ID."""
try:
return self.db.query(ContentStrategy).filter(ContentStrategy.id == strategy_id).first()
except SQLAlchemyError as e:
self.logger.error(f"Error getting content strategy: {str(e)}")
return None
async def get_user_content_strategies(self, user_id: int) -> List[ContentStrategy]:
"""Get all content strategies for a user."""
try:
return self.db.query(ContentStrategy).filter(ContentStrategy.user_id == user_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting user content strategies: {str(e)}")
return []
async def update_content_strategy(self, strategy_id: int, update_data: Dict[str, Any]) -> Optional[ContentStrategy]:
"""Update content strategy."""
try:
strategy = await self.get_content_strategy(strategy_id)
if strategy:
for key, value in update_data.items():
setattr(strategy, key, value)
strategy.updated_at = datetime.utcnow()
self.db.commit()
self.logger.info(f"Updated content strategy: {strategy_id}")
return strategy
return None
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error updating content strategy: {str(e)}")
return None
async def delete_content_strategy(self, strategy_id: int) -> bool:
"""Delete content strategy."""
try:
strategy = await self.get_content_strategy(strategy_id)
if strategy:
self.db.delete(strategy)
self.db.commit()
self.logger.info(f"Deleted content strategy: {strategy_id}")
return True
return False
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error deleting content strategy: {str(e)}")
return False
# Calendar Event Operations
async def create_calendar_event(self, event_data: Dict[str, Any]) -> Optional[CalendarEvent]:
"""Create a new calendar event."""
try:
event = CalendarEvent(**event_data)
self.db.add(event)
self.db.commit()
self.db.refresh(event)
self.logger.info(f"Created calendar event: {event.id}")
return event
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error creating calendar event: {str(e)}")
return None
async def get_calendar_event(self, event_id: int) -> Optional[CalendarEvent]:
"""Get calendar event by ID."""
try:
return self.db.query(CalendarEvent).filter(CalendarEvent.id == event_id).first()
except SQLAlchemyError as e:
self.logger.error(f"Error getting calendar event: {str(e)}")
return None
async def get_strategy_calendar_events(self, strategy_id: int) -> List[CalendarEvent]:
"""Get all calendar events for a strategy."""
try:
return self.db.query(CalendarEvent).filter(CalendarEvent.strategy_id == strategy_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting strategy calendar events: {str(e)}")
return []
async def update_calendar_event(self, event_id: int, update_data: Dict[str, Any]) -> Optional[CalendarEvent]:
"""Update calendar event."""
try:
event = await self.get_calendar_event(event_id)
if event:
for key, value in update_data.items():
setattr(event, key, value)
event.updated_at = datetime.utcnow()
self.db.commit()
self.logger.info(f"Updated calendar event: {event_id}")
return event
return None
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error updating calendar event: {str(e)}")
return None
async def delete_calendar_event(self, event_id: int) -> bool:
"""Delete calendar event."""
try:
event = await self.get_calendar_event(event_id)
if event:
self.db.delete(event)
self.db.commit()
self.logger.info(f"Deleted calendar event: {event_id}")
return True
return False
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error deleting calendar event: {str(e)}")
return False
# Content Gap Analysis Operations
async def create_content_gap_analysis(self, analysis_data: Dict[str, Any]) -> Optional[ContentGapAnalysis]:
"""Create a new content gap analysis."""
try:
analysis = ContentGapAnalysis(**analysis_data)
self.db.add(analysis)
self.db.commit()
self.db.refresh(analysis)
self.logger.info(f"Created content gap analysis: {analysis.id}")
return analysis
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error creating content gap analysis: {str(e)}")
return None
async def get_content_gap_analysis(self, analysis_id: int) -> Optional[ContentGapAnalysis]:
"""Get content gap analysis by ID."""
try:
return self.db.query(ContentGapAnalysis).filter(ContentGapAnalysis.id == analysis_id).first()
except SQLAlchemyError as e:
self.logger.error(f"Error getting content gap analysis: {str(e)}")
return None
async def get_user_content_gap_analyses(self, user_id: int) -> List[ContentGapAnalysis]:
"""Get all content gap analyses for a user."""
try:
return self.db.query(ContentGapAnalysis).filter(ContentGapAnalysis.user_id == user_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting user content gap analyses: {str(e)}")
return []
async def update_content_gap_analysis(self, analysis_id: int, update_data: Dict[str, Any]) -> Optional[ContentGapAnalysis]:
"""Update content gap analysis."""
try:
analysis = await self.get_content_gap_analysis(analysis_id)
if analysis:
for key, value in update_data.items():
setattr(analysis, key, value)
analysis.updated_at = datetime.utcnow()
self.db.commit()
self.logger.info(f"Updated content gap analysis: {analysis_id}")
return analysis
return None
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error updating content gap analysis: {str(e)}")
return None
async def delete_content_gap_analysis(self, analysis_id: int) -> bool:
"""Delete content gap analysis."""
try:
analysis = await self.get_content_gap_analysis(analysis_id)
if analysis:
self.db.delete(analysis)
self.db.commit()
self.logger.info(f"Deleted content gap analysis: {analysis_id}")
return True
return False
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error deleting content gap analysis: {str(e)}")
return False
# Content Recommendation Operations
async def create_content_recommendation(self, recommendation_data: Dict[str, Any]) -> Optional[ContentRecommendation]:
"""Create a new content recommendation."""
try:
recommendation = ContentRecommendation(**recommendation_data)
self.db.add(recommendation)
self.db.commit()
self.db.refresh(recommendation)
self.logger.info(f"Created content recommendation: {recommendation.id}")
return recommendation
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error creating content recommendation: {str(e)}")
return None
async def get_content_recommendation(self, recommendation_id: int) -> Optional[ContentRecommendation]:
"""Get content recommendation by ID."""
try:
return self.db.query(ContentRecommendation).filter(ContentRecommendation.id == recommendation_id).first()
except SQLAlchemyError as e:
self.logger.error(f"Error getting content recommendation: {str(e)}")
return None
async def get_user_content_recommendations(self, user_id: int) -> List[ContentRecommendation]:
"""Get all content recommendations for a user."""
try:
return self.db.query(ContentRecommendation).filter(ContentRecommendation.user_id == user_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting user content recommendations: {str(e)}")
return []
async def update_content_recommendation(self, recommendation_id: int, update_data: Dict[str, Any]) -> Optional[ContentRecommendation]:
"""Update content recommendation."""
try:
recommendation = await self.get_content_recommendation(recommendation_id)
if recommendation:
for key, value in update_data.items():
setattr(recommendation, key, value)
recommendation.updated_at = datetime.utcnow()
self.db.commit()
self.logger.info(f"Updated content recommendation: {recommendation_id}")
return recommendation
return None
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error updating content recommendation: {str(e)}")
return None
async def delete_content_recommendation(self, recommendation_id: int) -> bool:
"""Delete content recommendation."""
try:
recommendation = await self.get_content_recommendation(recommendation_id)
if recommendation:
self.db.delete(recommendation)
self.db.commit()
self.logger.info(f"Deleted content recommendation: {recommendation_id}")
return True
return False
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error deleting content recommendation: {str(e)}")
return False
# Analytics Operations
async def create_content_analytics(self, analytics_data: Dict[str, Any]) -> Optional[ContentAnalytics]:
"""Create new content analytics."""
try:
analytics = ContentAnalytics(**analytics_data)
self.db.add(analytics)
self.db.commit()
self.db.refresh(analytics)
self.logger.info(f"Created content analytics: {analytics.id}")
return analytics
except SQLAlchemyError as e:
self.db.rollback()
self.logger.error(f"Error creating content analytics: {str(e)}")
return None
async def get_event_analytics(self, event_id: int) -> List[ContentAnalytics]:
"""Get analytics for a specific event."""
try:
return self.db.query(ContentAnalytics).filter(ContentAnalytics.event_id == event_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting event analytics: {str(e)}")
return []
async def get_strategy_analytics(self, strategy_id: int) -> List[ContentAnalytics]:
"""Get analytics for a specific strategy."""
try:
return self.db.query(ContentAnalytics).filter(ContentAnalytics.strategy_id == strategy_id).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting strategy analytics: {str(e)}")
return []
async def get_analytics_by_platform(self, platform: str) -> List[ContentAnalytics]:
"""Get analytics for a specific platform."""
try:
return self.db.query(ContentAnalytics).filter(ContentAnalytics.platform == platform).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting platform analytics: {str(e)}")
return []
# Advanced Query Operations
async def get_strategies_with_analytics(self, user_id: int) -> List[Dict[str, Any]]:
"""Get content strategies with their analytics summary."""
try:
strategies = await self.get_user_content_strategies(user_id)
result = []
for strategy in strategies:
analytics = await self.get_strategy_analytics(strategy.id)
# Missing scores count as 0; records without a timestamp are ignored for 'last_analytics'
avg_performance = sum(a.performance_score or 0 for a in analytics) / len(analytics) if analytics else 0
last_recorded = max((a.recorded_at for a in analytics if a.recorded_at), default=None)
result.append({
'strategy': strategy.to_dict(),
'analytics_count': len(analytics),
'average_performance': avg_performance,
'last_analytics': last_recorded.isoformat() if last_recorded else None
})
return result
except SQLAlchemyError as e:
self.logger.error(f"Error getting strategies with analytics: {str(e)}")
return []
async def get_events_by_status(self, strategy_id: int, status: str) -> List[CalendarEvent]:
"""Get calendar events by status for a strategy."""
try:
return self.db.query(CalendarEvent).filter(
CalendarEvent.strategy_id == strategy_id,
CalendarEvent.status == status
).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting events by status: {str(e)}")
return []
async def get_recommendations_by_priority(self, user_id: int, priority: str) -> List[ContentRecommendation]:
"""Get content recommendations by priority for a user."""
try:
return self.db.query(ContentRecommendation).filter(
ContentRecommendation.user_id == user_id,
ContentRecommendation.priority == priority
).all()
except SQLAlchemyError as e:
self.logger.error(f"Error getting recommendations by priority: {str(e)}")
return []
# Health Check
async def health_check(self) -> Dict[str, Any]:
"""Database health check."""
try:
# Test basic operations
strategy_count = self.db.query(ContentStrategy).count()
event_count = self.db.query(CalendarEvent).count()
analysis_count = self.db.query(ContentGapAnalysis).count()
recommendation_count = self.db.query(ContentRecommendation).count()
analytics_count = self.db.query(ContentAnalytics).count()
return {
'status': 'healthy',
'tables': {
'content_strategies': strategy_count,
'calendar_events': event_count,
'content_gap_analyses': analysis_count,
'content_recommendations': recommendation_count,
'content_analytics': analytics_count
},
'timestamp': datetime.utcnow().isoformat()
}
except SQLAlchemyError as e:
self.logger.error(f"Database health check failed: {str(e)}")
return {
'status': 'unhealthy',
'error': str(e),
'timestamp': datetime.utcnow().isoformat()
}
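
The per-strategy summary built in `get_strategies_with_analytics` (record count, mean score, latest timestamp) can be illustrated in isolation. This is a sketch under assumptions: `AnalyticsRow` is a hypothetical stand-in for a `ContentAnalytics` record, and `summarize_analytics` is not a function from the commit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class AnalyticsRow:
    """Hypothetical stand-in for a ContentAnalytics record."""
    performance_score: Optional[float]
    recorded_at: Optional[datetime]


def summarize_analytics(rows: List[AnalyticsRow]) -> Dict[str, Any]:
    """Return the count, the mean score (missing scores count as 0), and the latest timestamp."""
    count = len(rows)
    average = sum(row.performance_score or 0 for row in rows) / count if count else 0
    # Skip rows without a timestamp so max() never compares None with a datetime.
    timestamps = [row.recorded_at for row in rows if row.recorded_at is not None]
    return {
        "analytics_count": count,
        "average_performance": average,
        "last_analytics": max(timestamps).isoformat() if timestamps else None,
    }


rows = [
    AnalyticsRow(0.8, datetime(2025, 1, 1)),
    AnalyticsRow(None, None),
    AnalyticsRow(0.4, datetime(2025, 3, 1)),
]
print(summarize_analytics(rows))
```

Counting a missing score as 0 pulls the mean down; averaging only the rows that have a score would be an equally defensible choice.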

View File

@@ -0,0 +1,505 @@
"""
Content Planning Service
Handles content strategy development, calendar management, and gap analysis.
"""
from typing import Optional, List, Dict, Any
from sqlalchemy.orm import Session
from loguru import logger
from datetime import datetime
from services.database import get_db_session
from services.content_planning_db import ContentPlanningDBService
from services.ai_service_manager import AIServiceManager
from models.content_planning import ContentStrategy, CalendarEvent, ContentAnalytics
class ContentPlanningService:
"""Service for managing content planning operations with database integration."""
def __init__(self, db_session: Optional[Session] = None):
self.db_session = db_session
self.db_service = None
self.ai_manager = AIServiceManager()
if db_session:
self.db_service = ContentPlanningDBService(db_session)
    def _get_db_session(self) -> Optional[Session]:
        """Return the database session, creating it lazily; None if unavailable."""
        if not self.db_session:
            self.db_session = get_db_session()
            if self.db_session:
                self.db_service = ContentPlanningDBService(self.db_session)
        return self.db_session

    def _get_db_service(self) -> Optional[ContentPlanningDBService]:
        """Return the database service, creating it lazily; None if no session is available."""
        if not self.db_service:
            self._get_db_session()
        return self.db_service
async def analyze_content_strategy_with_ai(self, industry: str, target_audience: Dict[str, Any],
business_goals: List[str], content_preferences: Dict[str, Any],
user_id: int) -> Optional[ContentStrategy]:
"""
Analyze and create content strategy with AI recommendations and database storage.
Args:
industry: Target industry
target_audience: Audience demographics and preferences
business_goals: List of business objectives
content_preferences: Content type and platform preferences
user_id: User ID for database storage
Returns:
Created content strategy with AI recommendations
"""
try:
logger.info(f"Analyzing content strategy with AI for industry: {industry}")
# Generate AI recommendations using AI Service Manager
ai_analysis_data = {
'industry': industry,
'target_audience': target_audience,
'business_goals': business_goals,
'content_preferences': content_preferences
}
# Get AI recommendations
ai_recommendations = await self.ai_manager.generate_content_gap_analysis(ai_analysis_data)
# Prepare strategy data for database
strategy_data = {
'user_id': user_id,
'name': f"Content Strategy for {industry}",
'industry': industry,
'target_audience': target_audience,
'content_pillars': ai_recommendations.get('content_pillars', []),
'ai_recommendations': ai_recommendations
}
# Create strategy in database
db_service = self._get_db_service()
if db_service:
strategy = await db_service.create_content_strategy(strategy_data)
if strategy:
logger.info(f"Content strategy created with AI recommendations: {strategy.id}")
# Store AI analytics
await self._store_ai_analytics(strategy.id, ai_recommendations, 'strategy_analysis')
return strategy
else:
logger.error("Failed to create content strategy in database")
return None
else:
logger.error("Database service not available")
return None
except Exception as e:
logger.error(f"Error analyzing content strategy with AI: {str(e)}")
return None
async def create_content_strategy_with_ai(self, user_id: int, strategy_data: Dict[str, Any]) -> Optional[ContentStrategy]:
"""
Create content strategy with AI recommendations and database storage.
Args:
user_id: User ID
strategy_data: Strategy configuration data
Returns:
Created content strategy or None if failed
"""
try:
logger.info(f"Creating content strategy with AI for user: {user_id}")
# Generate AI recommendations
ai_recommendations = await self._generate_ai_recommendations(strategy_data)
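# Associate the strategy with the requesting user if the caller did not
strategy_data.setdefault('user_id', user_id)
strategy_data['ai_recommendations'] = ai_recommendations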
# Create strategy in database
db_service = self._get_db_service()
if db_service:
strategy = await db_service.create_content_strategy(strategy_data)
if strategy:
logger.info(f"Content strategy created with AI recommendations: {strategy.id}")
# Store AI analytics
await self._store_ai_analytics(strategy.id, ai_recommendations, 'strategy_creation')
return strategy
else:
logger.error("Failed to create content strategy in database")
return None
else:
logger.error("Database service not available")
return None
except Exception as e:
logger.error(f"Error creating content strategy with AI: {str(e)}")
return None
async def get_content_strategy(self, user_id: int, strategy_id: Optional[int] = None) -> Optional[ContentStrategy]:
"""
Get user's content strategy from database.
Args:
user_id: User ID
strategy_id: Optional specific strategy ID
Returns:
Content strategy or None if not found
"""
try:
logger.info(f"Getting content strategy for user: {user_id}")
db_service = self._get_db_service()
if db_service:
if strategy_id:
strategy = await db_service.get_content_strategy(strategy_id)
else:
strategies = await db_service.get_user_content_strategies(user_id)
strategy = strategies[0] if strategies else None
if strategy:
logger.info(f"Content strategy retrieved: {strategy.id}")
return strategy
else:
logger.info(f"No content strategy found for user: {user_id}")
return None
else:
logger.error("Database service not available")
return None
except Exception as e:
logger.error(f"Error getting content strategy: {str(e)}")
return None
async def create_calendar_event_with_ai(self, event_data: Dict[str, Any]) -> Optional[CalendarEvent]:
"""
Create calendar event with AI recommendations and database storage.
Args:
event_data: Event configuration data
Returns:
Created calendar event or None if failed
"""
try:
logger.info(f"Creating calendar event with AI: {event_data.get('title', 'Untitled')}")
# Generate AI recommendations for the event
ai_recommendations = await self._generate_event_ai_recommendations(event_data)
event_data['ai_recommendations'] = ai_recommendations
# Create event in database
db_service = self._get_db_service()
if db_service:
event = await db_service.create_calendar_event(event_data)
if event:
logger.info(f"Calendar event created with AI recommendations: {event.id}")
# Store AI analytics
await self._store_ai_analytics(event.strategy_id, ai_recommendations, 'event_creation', event.id)
return event
else:
logger.error("Failed to create calendar event in database")
return None
else:
logger.error("Database service not available")
return None
except Exception as e:
logger.error(f"Error creating calendar event with AI: {str(e)}")
return None
async def get_calendar_events(self, strategy_id: Optional[int] = None) -> List[CalendarEvent]:
"""
Get calendar events from database.
Args:
strategy_id: Optional strategy ID to filter events
Returns:
List of calendar events
"""
try:
logger.info("Getting calendar events from database")
db_service = self._get_db_service()
if db_service:
if strategy_id:
events = await db_service.get_strategy_calendar_events(strategy_id)
else:
# TODO: Implement get_all_calendar_events method
events = []
logger.info(f"Retrieved {len(events)} calendar events")
return events
else:
logger.error("Database service not available")
return []
except Exception as e:
logger.error(f"Error getting calendar events: {str(e)}")
return []
async def analyze_content_gaps_with_ai(self, website_url: str, competitor_urls: List[str],
user_id: int, target_keywords: Optional[List[str]] = None) -> Optional[Dict[str, Any]]:
"""
Analyze content gaps with AI and store results in database.
Args:
website_url: Target website URL
competitor_urls: List of competitor URLs
user_id: User ID for database storage
target_keywords: Optional target keywords
Returns:
Content gap analysis results
"""
try:
logger.info(f"Analyzing content gaps with AI for: {website_url}")
# Generate AI analysis
ai_analysis_data = {
'website_url': website_url,
'competitor_urls': competitor_urls,
'target_keywords': target_keywords or []
}
ai_analysis = await self.ai_manager.generate_content_gap_analysis(ai_analysis_data)
# Store analysis in database
analysis_data = {
'user_id': user_id,
'website_url': website_url,
'competitor_urls': competitor_urls,
'target_keywords': target_keywords,
'analysis_results': ai_analysis.get('analysis_results', {}),
'recommendations': ai_analysis.get('recommendations', {}),
'opportunities': ai_analysis.get('opportunities', {})
}
db_service = self._get_db_service()
if db_service:
analysis = await db_service.create_content_gap_analysis(analysis_data)
if analysis:
logger.info(f"Content gap analysis stored in database: {analysis.id}")
# Store AI analytics
await self._store_ai_analytics(user_id, ai_analysis, 'gap_analysis')
return {
'analysis_id': analysis.id,
'results': ai_analysis,
'stored_at': analysis.created_at.isoformat()
}
else:
logger.error("Failed to store content gap analysis in database")
return None
else:
logger.error("Database service not available")
return None
except Exception as e:
logger.error(f"Error analyzing content gaps with AI: {str(e)}")
return None
async def generate_content_recommendations_with_ai(self, strategy_id: int) -> List[Dict[str, Any]]:
"""
Generate content recommendations with AI and store in database.
Args:
strategy_id: Strategy ID
Returns:
List of content recommendations
"""
try:
logger.info(f"Generating content recommendations with AI for strategy: {strategy_id}")
# Get strategy data
db_service = self._get_db_service()
if not db_service:
logger.error("Database service not available")
return []
strategy = await db_service.get_content_strategy(strategy_id)
if not strategy:
logger.error(f"Strategy not found: {strategy_id}")
return []
# Generate AI recommendations
recommendation_data = {
'strategy_id': strategy_id,
'industry': strategy.industry,
'target_audience': strategy.target_audience,
'content_pillars': strategy.content_pillars
}
ai_recommendations = await self.ai_manager.generate_content_gap_analysis(recommendation_data)
# Store recommendations in database
for rec in ai_recommendations.get('recommendations', []):
rec_data = {
'user_id': strategy.user_id,
'strategy_id': strategy_id,
'recommendation_type': rec.get('type', 'content'),
'title': rec.get('title', ''),
'description': rec.get('description', ''),
'priority': rec.get('priority', 'medium'),
'estimated_impact': rec.get('estimated_impact', 'medium'),
'ai_recommendations': rec
}
await db_service.create_content_recommendation(rec_data)
# Store AI analytics
await self._store_ai_analytics(strategy_id, ai_recommendations, 'recommendation_generation')
logger.info(f"Generated and stored {len(ai_recommendations.get('recommendations', []))} recommendations")
return ai_recommendations.get('recommendations', [])
except Exception as e:
logger.error(f"Error generating content recommendations with AI: {str(e)}")
return []
async def track_content_performance_with_ai(self, event_id: int) -> Optional[Dict[str, Any]]:
"""
Track content performance with AI predictions and store in database.
Args:
event_id: Calendar event ID
Returns:
Performance tracking results
"""
try:
logger.info(f"Tracking content performance with AI for event: {event_id}")
# Get event data
db_service = self._get_db_service()
if not db_service:
logger.error("Database service not available")
return None
event = await db_service.get_calendar_event(event_id)
if not event:
logger.error(f"Event not found: {event_id}")
return None
# Generate AI performance prediction
performance_data = {
'event_id': event_id,
'title': event.title,
'content_type': event.content_type,
'platform': event.platform,
'ai_recommendations': event.ai_recommendations
}
ai_prediction = await self.ai_manager.generate_content_gap_analysis(performance_data)
# Store analytics in database
analytics_data = {
'event_id': event_id,
'strategy_id': event.strategy_id,
'platform': event.platform,
'content_type': event.content_type,
'performance_score': ai_prediction.get('performance_score', 0),
'engagement_prediction': ai_prediction.get('engagement_prediction', 'medium'),
'ai_insights': ai_prediction.get('insights', {}),
'recommendations': ai_prediction.get('optimization_recommendations', [])
}
analytics = await db_service.create_content_analytics(analytics_data)
if analytics:
logger.info(f"Performance tracking stored in database: {analytics.id}")
# Store AI analytics
await self._store_ai_analytics(event.strategy_id, ai_prediction, 'performance_tracking', event_id)
return {
'analytics_id': analytics.id,
'performance_score': analytics.performance_score,
'engagement_prediction': analytics.engagement_prediction,
'ai_insights': analytics.ai_insights,
'recommendations': analytics.recommendations
}
else:
logger.error("Failed to store performance tracking in database")
return None
except Exception as e:
logger.error(f"Error tracking content performance with AI: {str(e)}")
return None
async def _generate_ai_recommendations(self, strategy_data: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AI recommendations for content strategy."""
try:
ai_analysis_data = {
'industry': strategy_data.get('industry', ''),
'target_audience': strategy_data.get('target_audience', {}),
'content_preferences': strategy_data.get('content_preferences', {})
}
return await self.ai_manager.generate_content_gap_analysis(ai_analysis_data)
except Exception as e:
logger.error(f"Error generating AI recommendations: {str(e)}")
return {}
async def _generate_event_ai_recommendations(self, event_data: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AI recommendations for calendar event."""
try:
ai_analysis_data = {
'content_type': event_data.get('content_type', ''),
'platform': event_data.get('platform', ''),
'title': event_data.get('title', ''),
'description': event_data.get('description', '')
}
return await self.ai_manager.generate_content_gap_analysis(ai_analysis_data)
except Exception as e:
logger.error(f"Error generating event AI recommendations: {str(e)}")
return {}
async def _store_ai_analytics(self, strategy_id: int, ai_results: Dict[str, Any],
analysis_type: str, event_id: Optional[int] = None) -> None:
"""Store AI analytics results in database."""
try:
db_service = self._get_db_service()
if not db_service:
return
analytics_data = {
'strategy_id': strategy_id,
'event_id': event_id,
'analysis_type': analysis_type,
'ai_results': ai_results,
'performance_score': ai_results.get('performance_score', 0),
'confidence_score': ai_results.get('confidence_score', 0.5),
'recommendations': ai_results.get('recommendations', [])
}
await db_service.create_content_analytics(analytics_data)
logger.info(f"AI analytics stored for {analysis_type}")
except Exception as e:
logger.error(f"Error storing AI analytics: {str(e)}")
def __del__(self):
"""Cleanup database session."""
if self.db_session:
try:
self.db_session.close()
except Exception:
# Best-effort cleanup during garbage collection; ignore close failures
pass
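
The service above closes its session in `__del__`, which runs at an unpredictable time and may not run at all during interpreter shutdown. One possible alternative is an explicit scope that closes the session when a block exits. This is a minimal sketch, not the project's implementation: `FakeSession` and `session_scope` are hypothetical names, and the fake class only mimics the `close()` method of a real session.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def session_scope(factory: Callable[[], FakeSession]) -> Iterator[FakeSession]:
    """Open a session and close it when the block exits, even on error."""
    session = factory()
    try:
        yield session
    finally:
        session.close()


with session_scope(FakeSession) as session:
    in_block_closed = session.closed  # still open inside the block

print(in_block_closed, session.closed)
```

With this shape the caller decides exactly when the session is released, instead of leaving it to the garbage collector.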


@@ -0,0 +1,79 @@
"""
Database service for ALwrity backend.
Handles database connections and sessions.
"""
import os
from typing import Optional

from loguru import logger
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import sessionmaker, Session

# Import models
from models.onboarding import Base as OnboardingBase
from models.seo_analysis import Base as SEOAnalysisBase
from models.content_planning import Base as ContentPlanningBase

# Database configuration
DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///./alwrity.db')

# Create engine
engine = create_engine(
    DATABASE_URL,
    echo=False,  # Set to True for SQL debugging
    pool_pre_ping=True,
    pool_recycle=300,
)

# Create session factory
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


def get_db_session() -> Optional[Session]:
    """
    Get a database session.

    Returns:
        Database session or None if connection fails
    """
    try:
        db = SessionLocal()
        return db
    except SQLAlchemyError as e:
        logger.error(f"Error creating database session: {str(e)}")
        return None


def init_database():
    """
    Initialize the database by creating all tables.
    """
    try:
        # Create all tables for all models
        OnboardingBase.metadata.create_all(bind=engine)
        SEOAnalysisBase.metadata.create_all(bind=engine)
        ContentPlanningBase.metadata.create_all(bind=engine)
        logger.info("Database initialized successfully with all models")
    except SQLAlchemyError as e:
        logger.error(f"Error initializing database: {str(e)}")
        raise


def close_database():
    """
    Close database connections.
    """
    try:
        engine.dispose()
        logger.info("Database connections closed")
    except Exception as e:
        logger.error(f"Error closing database connections: {str(e)}")


# Database dependency for FastAPI
def get_db():
    """
    Database dependency for FastAPI endpoints.
    """
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
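
The `get_db` dependency above is a generator: the code before `yield` runs before the endpoint, and the `finally` block runs afterwards. The sketch below drives a generator of the same shape by hand to show that ordering. It is an illustration only: `StubSession` and `get_db_stub` are hypothetical, and the manual `next()`/`close()` calls only approximate what the framework does.

```python
from typing import Iterator, List

events: List[str] = []


class StubSession:
    """Hypothetical stand-in for a SQLAlchemy Session."""

    def close(self) -> None:
        events.append("closed")


def get_db_stub() -> Iterator[StubSession]:
    """Same shape as get_db: yield a session, always close it afterwards."""
    db = StubSession()
    try:
        events.append("yielded")
        yield db
    finally:
        db.close()


gen = get_db_stub()
db = next(gen)                 # runs up to the yield and hands out the session
events.append("endpoint ran")  # the endpoint body would use `db` here
gen.close()                    # finishing the generator triggers the finally block
print(events)
```

The recorded order is "yielded", "endpoint ran", "closed", which is why the session is released even when the endpoint raises.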


@@ -0,0 +1,416 @@
"""
Enhanced Strategy Database Service
Handles database operations for enhanced content strategy models.
"""
from typing import Dict, Any, List, Optional
from datetime import datetime
from loguru import logger
from sqlalchemy.orm import Session
from sqlalchemy import and_, or_, desc
# Import enhanced strategy models
from models.enhanced_strategy_models import EnhancedContentStrategy, EnhancedAIAnalysisResult, OnboardingDataIntegration
class EnhancedStrategyDBService:
"""Database service for enhanced content strategy operations."""
def __init__(self, db: Session):
self.db = db
async def create_enhanced_strategy(self, strategy_data: Dict[str, Any]) -> EnhancedContentStrategy:
"""Create a new enhanced content strategy."""
try:
logger.info(f"Creating enhanced strategy: {strategy_data.get('name', 'Unknown')}")
# Create the enhanced strategy
enhanced_strategy = EnhancedContentStrategy(**strategy_data)
# Calculate completion percentage
enhanced_strategy.calculate_completion_percentage()
# Add to database
self.db.add(enhanced_strategy)
self.db.commit()
self.db.refresh(enhanced_strategy)
logger.info(f"Enhanced strategy created successfully: {enhanced_strategy.id}")
return enhanced_strategy
except Exception as e:
logger.error(f"Error creating enhanced strategy: {str(e)}")
self.db.rollback()
raise
async def get_enhanced_strategy(self, strategy_id: int) -> Optional[EnhancedContentStrategy]:
"""Get an enhanced content strategy by ID."""
try:
strategy = self.db.query(EnhancedContentStrategy).filter(
EnhancedContentStrategy.id == strategy_id
).first()
if strategy:
strategy.calculate_completion_percentage()
return strategy
except Exception as e:
logger.error(f"Error getting enhanced strategy: {str(e)}")
raise
async def get_enhanced_strategies_by_user(self, user_id: int) -> List[EnhancedContentStrategy]:
"""Get all enhanced strategies for a user."""
try:
strategies = self.db.query(EnhancedContentStrategy).filter(
EnhancedContentStrategy.user_id == user_id
).order_by(desc(EnhancedContentStrategy.created_at)).all()
# Calculate completion percentage for each strategy
for strategy in strategies:
strategy.calculate_completion_percentage()
return strategies
except Exception as e:
logger.error(f"Error getting enhanced strategies for user: {str(e)}")
raise
async def update_enhanced_strategy(self, strategy_id: int, update_data: Dict[str, Any]) -> Optional[EnhancedContentStrategy]:
"""Update an enhanced content strategy."""
try:
strategy = await self.get_enhanced_strategy(strategy_id)
if not strategy:
return None
# Update fields
for field, value in update_data.items():
if hasattr(strategy, field):
setattr(strategy, field, value)
# Update timestamp
strategy.updated_at = datetime.utcnow()
# Recalculate completion percentage
strategy.calculate_completion_percentage()
self.db.commit()
self.db.refresh(strategy)
logger.info(f"Enhanced strategy updated successfully: {strategy_id}")
return strategy
except Exception as e:
logger.error(f"Error updating enhanced strategy: {str(e)}")
self.db.rollback()
raise
async def delete_enhanced_strategy(self, strategy_id: int) -> bool:
"""Delete an enhanced content strategy."""
try:
strategy = await self.get_enhanced_strategy(strategy_id)
if not strategy:
return False
self.db.delete(strategy)
self.db.commit()
logger.info(f"Enhanced strategy deleted successfully: {strategy_id}")
return True
except Exception as e:
logger.error(f"Error deleting enhanced strategy: {str(e)}")
self.db.rollback()
raise
async def get_enhanced_strategies_with_analytics(self, user_id: Optional[int] = None, strategy_id: Optional[int] = None) -> List[Dict[str, Any]]:
"""Get enhanced strategies with comprehensive analytics and AI analysis."""
try:
# Build base query
query = self.db.query(EnhancedContentStrategy)
if user_id:
query = query.filter(EnhancedContentStrategy.user_id == user_id)
if strategy_id:
query = query.filter(EnhancedContentStrategy.id == strategy_id)
strategies = query.order_by(desc(EnhancedContentStrategy.created_at)).all()
enhanced_strategies = []
for strategy in strategies:
# Calculate completion percentage
strategy.calculate_completion_percentage()
# Get latest AI analysis
latest_analysis = await self.get_latest_ai_analysis(strategy.id)
# Get onboarding integration
onboarding_integration = await self.get_onboarding_integration(strategy.id)
# Build comprehensive strategy data
strategy_data = strategy.to_dict()
strategy_data.update({
'ai_analysis': latest_analysis,
'onboarding_integration': onboarding_integration,
'completion_percentage': strategy.completion_percentage,
'strategic_insights': self._extract_strategic_insights(strategy),
'market_positioning': strategy.market_positioning,
'strategic_scores': strategy.strategic_scores,
'competitive_advantages': strategy.competitive_advantages,
'strategic_risks': strategy.strategic_risks,
'opportunity_analysis': strategy.opportunity_analysis
})
enhanced_strategies.append(strategy_data)
return enhanced_strategies
except Exception as e:
logger.error(f"Error getting enhanced strategies with analytics: {str(e)}")
raise
async def get_latest_ai_analysis(self, strategy_id: int) -> Optional[Dict[str, Any]]:
"""Get the latest AI analysis for a strategy."""
try:
analysis = self.db.query(EnhancedAIAnalysisResult).filter(
EnhancedAIAnalysisResult.strategy_id == strategy_id
).order_by(desc(EnhancedAIAnalysisResult.created_at)).first()
return analysis.to_dict() if analysis else None
except Exception as e:
logger.error(f"Error getting latest AI analysis: {str(e)}")
return None
async def get_onboarding_integration(self, strategy_id: int) -> Optional[Dict[str, Any]]:
"""Get onboarding data integration for a strategy."""
try:
integration = self.db.query(OnboardingDataIntegration).filter(
OnboardingDataIntegration.strategy_id == strategy_id
).first()
return integration.to_dict() if integration else None
except Exception as e:
logger.error(f"Error getting onboarding integration: {str(e)}")
return None
async def create_ai_analysis_result(self, analysis_data: Dict[str, Any]) -> EnhancedAIAnalysisResult:
"""Create a new AI analysis result."""
try:
analysis_result = EnhancedAIAnalysisResult(**analysis_data)
self.db.add(analysis_result)
self.db.commit()
self.db.refresh(analysis_result)
logger.info(f"AI analysis result created successfully: {analysis_result.id}")
return analysis_result
except Exception as e:
logger.error(f"Error creating AI analysis result: {str(e)}")
self.db.rollback()
raise
async def create_onboarding_integration(self, integration_data: Dict[str, Any]) -> OnboardingDataIntegration:
"""Create a new onboarding data integration."""
try:
integration = OnboardingDataIntegration(**integration_data)
self.db.add(integration)
self.db.commit()
self.db.refresh(integration)
logger.info(f"Onboarding integration created successfully: {integration.id}")
return integration
except Exception as e:
logger.error(f"Error creating onboarding integration: {str(e)}")
self.db.rollback()
raise
async def get_strategy_completion_stats(self, user_id: int) -> Dict[str, Any]:
"""Get completion statistics for a user's strategies."""
try:
strategies = await self.get_enhanced_strategies_by_user(user_id)
if not strategies:
return {
'total_strategies': 0,
'average_completion': 0.0,
'completion_distribution': {},
'recent_strategies': []
}
# Calculate statistics
total_strategies = len(strategies)
average_completion = sum(s.completion_percentage for s in strategies) / total_strategies
# Completion distribution
completion_distribution = {
'0-25%': len([s for s in strategies if s.completion_percentage <= 25]),
'26-50%': len([s for s in strategies if 25 < s.completion_percentage <= 50]),
'51-75%': len([s for s in strategies if 50 < s.completion_percentage <= 75]),
'76-100%': len([s for s in strategies if s.completion_percentage > 75])
}
# Recent strategies (last 5)
recent_strategies = [
{
'id': s.id,
'name': s.name,
'completion_percentage': s.completion_percentage,
'created_at': s.created_at.isoformat() if s.created_at else None
}
for s in strategies[:5]
]
return {
'total_strategies': total_strategies,
'average_completion': round(average_completion, 2),
'completion_distribution': completion_distribution,
'recent_strategies': recent_strategies
}
except Exception as e:
logger.error(f"Error getting strategy completion stats: {str(e)}")
raise
async def get_ai_analysis_history(self, strategy_id: int, limit: int = 10) -> List[Dict[str, Any]]:
"""Get AI analysis history for a strategy."""
try:
analyses = self.db.query(EnhancedAIAnalysisResult).filter(
EnhancedAIAnalysisResult.strategy_id == strategy_id
).order_by(desc(EnhancedAIAnalysisResult.created_at)).limit(limit).all()
return [analysis.to_dict() for analysis in analyses]
except Exception as e:
logger.error(f"Error getting AI analysis history: {str(e)}")
raise
async def update_strategy_ai_analysis(self, strategy_id: int, ai_analysis_data: Dict[str, Any]) -> bool:
"""Update strategy with new AI analysis data."""
try:
strategy = await self.get_enhanced_strategy(strategy_id)
if not strategy:
return False
# Update AI analysis fields
strategy.comprehensive_ai_analysis = ai_analysis_data.get('comprehensive_ai_analysis')
strategy.strategic_scores = ai_analysis_data.get('strategic_scores')
strategy.market_positioning = ai_analysis_data.get('market_positioning')
strategy.competitive_advantages = ai_analysis_data.get('competitive_advantages')
strategy.strategic_risks = ai_analysis_data.get('strategic_risks')
strategy.opportunity_analysis = ai_analysis_data.get('opportunity_analysis')
strategy.updated_at = datetime.utcnow()
self.db.commit()
logger.info(f"Strategy AI analysis updated successfully: {strategy_id}")
return True
except Exception as e:
logger.error(f"Error updating strategy AI analysis: {str(e)}")
self.db.rollback()
raise
def _extract_strategic_insights(self, strategy: EnhancedContentStrategy) -> List[str]:
"""Extract strategic insights from strategy data."""
insights = []
# Extract insights from business context
if strategy.business_objectives:
insights.append(f"Business objectives: {strategy.business_objectives}")
if strategy.target_metrics:
insights.append(f"Target metrics: {strategy.target_metrics}")
# Extract insights from audience intelligence
if strategy.content_preferences:
insights.append("Content preferences identified")
if strategy.audience_pain_points:
insights.append("Audience pain points mapped")
# Extract insights from competitive intelligence
if strategy.top_competitors:
insights.append("Competitor analysis completed")
if strategy.market_gaps:
insights.append("Market gaps identified")
# Extract insights from content strategy
if strategy.preferred_formats:
insights.append("Content formats selected")
if strategy.content_frequency:
insights.append("Publishing frequency defined")
# Extract insights from performance analytics
if strategy.traffic_sources:
insights.append("Traffic sources analyzed")
if strategy.conversion_rates:
insights.append("Conversion tracking established")
return insights
async def search_enhanced_strategies(self, user_id: int, search_term: str) -> List[EnhancedContentStrategy]:
"""Search enhanced strategies by name or content."""
try:
search_filter = or_(
EnhancedContentStrategy.name.ilike(f"%{search_term}%"),
EnhancedContentStrategy.industry.ilike(f"%{search_term}%")
)
strategies = self.db.query(EnhancedContentStrategy).filter(
and_(
EnhancedContentStrategy.user_id == user_id,
search_filter
)
).order_by(desc(EnhancedContentStrategy.created_at)).all()
# Calculate completion percentage for each strategy
for strategy in strategies:
strategy.calculate_completion_percentage()
return strategies
except Exception as e:
logger.error(f"Error searching enhanced strategies: {str(e)}")
raise
async def get_strategy_export_data(self, strategy_id: int) -> Dict[str, Any]:
"""Get comprehensive export data for a strategy."""
try:
strategy = await self.get_enhanced_strategy(strategy_id)
if not strategy:
return {}
# Get AI analysis history
ai_history = await self.get_ai_analysis_history(strategy_id)
# Get onboarding integration
onboarding_integration = await self.get_onboarding_integration(strategy_id)
export_data = {
'strategy': strategy.to_dict(),
'ai_analysis_history': ai_history,
'onboarding_integration': onboarding_integration,
'export_timestamp': datetime.utcnow().isoformat(),
'completion_percentage': strategy.completion_percentage,
'strategic_insights': self._extract_strategic_insights(strategy)
}
return export_data
except Exception as e:
logger.error(f"Error getting strategy export data: {str(e)}")
raise
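
The completion distribution computed in `get_strategy_completion_stats` can be read as a simple bucketing rule. Here is a small standalone sketch of that rule using the same boundaries; `completion_distribution` is a hypothetical helper name, and it works on plain numbers instead of strategy objects.

```python
from typing import Dict, Iterable


def completion_distribution(percentages: Iterable[float]) -> Dict[str, int]:
    """Count values per bucket using the same boundaries as the service."""
    buckets = {"0-25%": 0, "26-50%": 0, "51-75%": 0, "76-100%": 0}
    for pct in percentages:
        if pct <= 25:
            buckets["0-25%"] += 1
        elif pct <= 50:
            buckets["26-50%"] += 1
        elif pct <= 75:
            buckets["51-75%"] += 1
        else:
            buckets["76-100%"] += 1
    return buckets


distribution = completion_distribution([0, 25, 25.5, 50, 75, 76, 100])
print(distribution)
```

Note that a fractional value such as 25.5 lands in the "26-50%" bucket, because each bucket is defined by its upper bound and not by the label text.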


@@ -0,0 +1,22 @@
"""LLM Providers Service for ALwrity Backend.
This service handles all LLM (Language Model) provider integrations,
migrated from the legacy lib/gpt_providers functionality.
"""
from .main_text_generation import llm_text_gen
from .openai_provider import openai_chatgpt, test_openai_api_key
from .gemini_provider import gemini_text_response, gemini_structured_json_response, test_gemini_api_key
from .anthropic_provider import anthropic_text_response
from .deepseek_provider import deepseek_text_response
__all__ = [
"llm_text_gen",
"openai_chatgpt",
"test_openai_api_key",
"gemini_text_response",
"gemini_structured_json_response",
"test_gemini_api_key",
"anthropic_text_response",
"deepseek_text_response"
]


@@ -0,0 +1,98 @@
"""Anthropic Provider Service for ALwrity Backend.
This service handles Anthropic API integrations,
migrated from the legacy lib/gpt_providers/text_generation/anthropic_text_gen.py
"""
import os
import json
import time
from typing import Dict, Any, Optional, Tuple

from loguru import logger
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)

# Import APIKeyManager
from ..api_key_manager import APIKeyManager

try:
    import anthropic
except ImportError:
    anthropic = None
    logger.warning("Anthropic library not available. Install with: pip install anthropic")


async def test_anthropic_api_key(api_key: str) -> Tuple[bool, str]:
    """
    Test if the provided Anthropic API key is valid.

    Args:
        api_key (str): The Anthropic API key to test

    Returns:
        tuple[bool, str]: A tuple containing (is_valid, message)
    """
    if not anthropic:
        return False, "Anthropic library not available"
    try:
        # Create Anthropic client with the provided key
        client = anthropic.Anthropic(api_key=api_key)
        # Try to generate a simple response as a test; the reply itself is not needed
        client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=10,
            messages=[{"role": "user", "content": "Hello"}]
        )
        # If we get here, the key is valid
        return True, "Anthropic API key is valid"
    except anthropic.AuthenticationError:
        return False, "Invalid Anthropic API key"
    except anthropic.RateLimitError:
        return False, "Rate limit exceeded. Please try again later."
    except Exception as e:
        return False, f"Error testing Anthropic API key: {str(e)}"


@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def anthropic_text_response(prompt: str, model: str = "claude-3-5-sonnet-20241022",
                            temperature: float = 0.7, max_tokens: int = 4000,
                            system_prompt: Optional[str] = None) -> str:
    """Get response from Anthropic Claude."""
    if not anthropic:
        logger.error("Anthropic library not available")
        return "Anthropic library not available. Please install anthropic package."
    try:
        # Use APIKeyManager instead of direct environment variable access
        api_key_manager = APIKeyManager()
        api_key = api_key_manager.get_api_key("anthropic")
        if not api_key:
            raise ValueError("Anthropic API key not found. Please configure it in the onboarding process.")
        client = anthropic.Anthropic(api_key=api_key)
        # Prepare request arguments. The Messages API takes the system prompt as a
        # top-level `system` argument; "system" is not a valid message role.
        request_kwargs = {
            "model": model,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "messages": [{"role": "user", "content": prompt}],
        }
        if system_prompt:
            request_kwargs["system"] = system_prompt
        response = client.messages.create(**request_kwargs)
        logger.info(f"[anthropic_text_response] Generated response with {len(response.content[0].text)} characters")
        return response.content[0].text
    except Exception as err:
        logger.error(f"Failed to get response from Anthropic: {err}. Retrying.")
        raise
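
The Anthropic Messages API expects the system prompt as a top-level `system` argument, while `messages` carries only user and assistant turns. A minimal sketch of assembling the request arguments follows. `build_message_kwargs` is a hypothetical helper, not part of this provider, and the sketch makes no network call.

```python
from typing import Any, Dict, Optional


def build_message_kwargs(prompt: str,
                         system_prompt: Optional[str] = None,
                         model: str = "claude-3-5-sonnet-20241022",
                         max_tokens: int = 4000,
                         temperature: float = 0.7) -> Dict[str, Any]:
    """Build keyword arguments suitable for client.messages.create(**kwargs)."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only include `system` when a prompt was supplied.
    if system_prompt:
        kwargs["system"] = system_prompt
    return kwargs


with_system = build_message_kwargs("Summarize this post.", system_prompt="You are a concise editor.")
without_system = build_message_kwargs("Summarize this post.")
print(sorted(with_system), sorted(without_system))
```

Keeping the argument assembly in a pure function like this also makes it easy to unit test without an API key.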


@@ -0,0 +1,311 @@
"""
Gemini Audio Text Generation Module
This module provides a comprehensive interface for working with audio files using Google's Gemini API.
It supports various audio processing capabilities including transcription, summarization, and analysis.
Key Features:
------------
1. Audio Transcription: Convert speech in audio files to text
2. Audio Summarization: Generate concise summaries of audio content
3. Segment Analysis: Analyze specific time segments of audio files
4. Timestamped Transcription: Generate transcriptions with timestamps
5. Token Counting: Count tokens in audio files
6. Format Support: Information about supported audio formats
Supported Audio Formats:
----------------------
- WAV (audio/wav)
- MP3 (audio/mp3)
- AIFF (audio/aiff)
- AAC (audio/aac)
- OGG Vorbis (audio/ogg)
- FLAC (audio/flac)
Technical Details:
----------------
- Each second of audio is represented as 32 tokens
- Maximum supported length of audio data in a single prompt is 9.5 hours
- Audio files are downsampled to 16 Kbps data resolution
- Multi-channel audio is combined into a single channel
Usage:
------
```python
from lib.gpt_providers.audio_to_text_generation.gemini_audio_text import (
    transcribe_audio, summarize_audio, analyze_audio_segment
)
# Basic transcription
transcript = transcribe_audio("path/to/audio.mp3")
print(transcript)
# Summarization
summary = summarize_audio("path/to/audio.mp3")
print(summary)
# Analyze specific segment
segment_analysis = analyze_audio_segment("path/to/audio.mp3", "02:30", "03:29")
print(segment_analysis)
```
Requirements:
------------
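- Gemini API key configured through APIKeyManager (onboarding process)
- google-generativeai Python package
- python-dotenv for environment variable management
- loguru for logging
Dependencies:
------------
- google.generativeai
- dotenv
- loguru
- os, sys, typing
"""
import os
import sys
from typing import List, Optional
# The calls below (configure, GenerativeModel, upload_file) belong to the
# google-generativeai package, not to the newer google.genai SDK.
import google.generativeai as genai
from dotenv import load_dotenv
from loguru import logger
# Same relative import as the sibling speech-to-text module in this package.
from ...api_key_manager import APIKeyManager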
logger.remove()
logger.add(sys.stdout,
colorize=True,
format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}"
)
def load_environment():
"""Loads environment variables from a .env file."""
load_dotenv()
logger.info("Environment variables loaded successfully.")
def configure_google_api():
"""
Configures the Google Gemini API with the API key from environment variables.
Raises:
ValueError: If the GEMINI_API_KEY environment variable is not set.
"""
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("gemini")
if not api_key:
error_message = "Gemini API key not found. Please configure it in the onboarding process."
logger.error(error_message)
raise ValueError(error_message)
genai.configure(api_key=api_key)
logger.info("Google Gemini API configured successfully.")
def transcribe_audio(audio_file_path: str, prompt: str = "Transcribe the following audio:") -> Optional[str]:
"""
Transcribes audio using Google's Gemini model.
Args:
audio_file_path (str): The path to the audio file to be transcribed.
prompt (str, optional): The prompt to guide the transcription. Defaults to "Transcribe the following audio:".
Returns:
str: The transcribed text from the audio.
Returns None if transcription fails.
Note:
A missing file is logged and reported as None; the FileNotFoundError
raised internally is caught by the outer exception handler.
"""
try:
# Load environment variables and configure the Google API
load_environment()
configure_google_api()
logger.info(f"Attempting to transcribe audio file: {audio_file_path}")
# Check if file exists
if not os.path.exists(audio_file_path):
error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
logger.error(error_message)
raise FileNotFoundError(error_message)
# Initialize a Gemini model appropriate for audio understanding
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
# Upload the audio file
try:
audio_file = genai.upload_file(audio_file_path)
logger.info(f"Audio file uploaded successfully: {audio_file=}")
except FileNotFoundError:
error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
logger.error(error_message)
raise FileNotFoundError(error_message)
except Exception as e:
logger.error(f"Error uploading audio file: {e}")
return None
# Generate the transcription
try:
response = model.generate_content([
prompt,
audio_file
])
# Check for valid response and extract text
if response and hasattr(response, 'text'):
transcript = response.text
logger.info(f"Transcription successful:\n{transcript}")
return transcript
else:
logger.warning("Transcription failed: Invalid or empty response from API.")
return None
except Exception as e:
logger.error(f"Error during transcription: {e}")
return None
except Exception as e:
logger.error(f"An unexpected error occurred: {e}")
return None
def summarize_audio(audio_file_path: str) -> Optional[str]:
"""
Summarizes the content of an audio file using Google's Gemini model.
Args:
audio_file_path (str): The path to the audio file to be summarized.
Returns:
str: A summary of the audio content.
Returns None if summarization fails.
"""
return transcribe_audio(audio_file_path, prompt="Please summarize the audio content:")
def analyze_audio_segment(audio_file_path: str, start_time: str, end_time: str) -> Optional[str]:
"""
Analyzes a specific segment of an audio file using timestamps.
Args:
audio_file_path (str): The path to the audio file.
start_time (str): Start time in MM:SS format.
end_time (str): End time in MM:SS format.
Returns:
str: Analysis of the specified audio segment.
Returns None if analysis fails.
"""
prompt = f"Analyze the audio content from {start_time} to {end_time}."
return transcribe_audio(audio_file_path, prompt=prompt)
def transcribe_with_timestamps(audio_file_path: str) -> Optional[str]:
"""
Transcribes audio with timestamps for each segment.
Args:
audio_file_path (str): The path to the audio file.
Returns:
str: Transcription with timestamps.
Returns None if transcription fails.
"""
return transcribe_audio(audio_file_path, prompt="Transcribe the audio with timestamps for each segment:")
def count_tokens(audio_file_path: str) -> Optional[int]:
"""
Counts the number of tokens in an audio file.
Args:
audio_file_path (str): The path to the audio file.
Returns:
int: Number of tokens in the audio file.
Returns None if counting fails.
"""
try:
# Load environment variables and configure the Google API
load_environment()
configure_google_api()
logger.info(f"Attempting to count tokens in audio file: {audio_file_path}")
# Check if file exists
if not os.path.exists(audio_file_path):
error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
logger.error(error_message)
raise FileNotFoundError(error_message)
# Initialize a Gemini model
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
# Upload the audio file
try:
audio_file = genai.upload_file(audio_file_path)
logger.info(f"Audio file uploaded successfully: {audio_file=}")
except Exception as e:
logger.error(f"Error uploading audio file: {e}")
return None
# Count tokens
try:
response = model.count_tokens([audio_file])
token_count = response.total_tokens
logger.info(f"Token count: {token_count}")
return token_count
except Exception as e:
logger.error(f"Error counting tokens: {e}")
return None
except Exception as e:
logger.error(f"An unexpected error occurred: {e}")
return None
def get_supported_formats() -> List[str]:
"""
Returns a list of supported audio formats.
Returns:
List[str]: List of supported MIME types.
"""
return [
"audio/wav",
"audio/mp3",
"audio/aiff",
"audio/aac",
"audio/ogg",
"audio/flac"
]
# Example usage
if __name__ == "__main__":
# Example 1: Basic transcription
audio_path = "path/to/your/audio.mp3"
transcript = transcribe_audio(audio_path)
print(f"Transcript: {transcript}")
# Example 2: Summarization
summary = summarize_audio(audio_path)
print(f"Summary: {summary}")
# Example 3: Analyze specific segment
segment_analysis = analyze_audio_segment(audio_path, "02:30", "03:29")
print(f"Segment Analysis: {segment_analysis}")
# Example 4: Transcription with timestamps
timestamped_transcript = transcribe_with_timestamps(audio_path)
print(f"Timestamped Transcript: {timestamped_transcript}")
# Example 5: Count tokens
token_count = count_tokens(audio_path)
print(f"Token Count: {token_count}")
# Example 6: Get supported formats
formats = get_supported_formats()
print(f"Supported Formats: {formats}")
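
The module docstring states that each second of audio costs 32 tokens and that a single prompt accepts at most 9.5 hours of audio. A rough local estimate can therefore be made before uploading anything. The sketch below assumes those two figures hold exactly; the function name `estimate_audio_tokens` is hypothetical, and `count_tokens` remains the authoritative source.

```python
import math

TOKENS_PER_SECOND = 32           # figure taken from the module docstring
MAX_AUDIO_SECONDS = 9.5 * 3600   # 9.5 hours, also from the module docstring


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the token cost of an audio clip from its duration (hypothetical helper)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if duration_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio exceeds the 9.5 hour single-prompt limit")
    # Round up so that a partial second still counts as a full token block.
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


print(estimate_audio_tokens(60))  # one minute of audio
```

An estimate like this can help decide whether a file needs to be split before it is sent to the API.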


@@ -0,0 +1,218 @@
import os
import re
import sys
import tempfile
from pytubefix import YouTube
from loguru import logger
from openai import OpenAI
from tqdm import tqdm
import streamlit as st
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
) # for exponential backoff
from .gemini_audio_text import transcribe_audio
# Import APIKeyManager
from ...api_key_manager import APIKeyManager
def progress_function(stream, chunk, bytes_remaining):
# Calculate the percentage completion
current = ((stream.filesize - bytes_remaining) / stream.filesize)
progress_bar.update(current - progress_bar.n) # Update the progress bar
def rename_file_with_underscores(file_path):
"""Rename a file by replacing spaces and special characters with underscores.
Args:
file_path (str): The original file path.
Returns:
str: The new file path with underscores.
"""
# Extract the directory and the filename
dir_name, original_filename = os.path.split(file_path)
# Replace spaces and special characters with underscores in the filename
new_filename = re.sub(r'[^\w\-_\.]', '_', original_filename)
# Create the new file path
new_file_path = os.path.join(dir_name, new_filename)
# Rename the file
os.rename(file_path, new_file_path)
return new_file_path
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def speech_to_text(video_url):
"""
Transcribes speech to text from a YouTube video URL or a local audio file,
using OpenAI's Whisper model with Gemini as a fallback.
Args:
video_url (str): URL of the YouTube video, or path to a local audio file.
Returns:
tuple: (transcript, title), where transcript is the transcribed text.
Note:
Downloaded audio is saved in the directory named by the CONTENT_SAVE_DIR environment variable.
The process exits via sys.exit if the audio file exceeds the 24MB limit.
"""
output_path = os.getenv("CONTENT_SAVE_DIR")
yt = None
audio_file = None
with st.status("Started Writing..", expanded=False) as status:
try:
if video_url.startswith("https://www.youtube.com/") or video_url.startswith("http://www.youtube.com/"):
logger.info(f"Accessing YouTube URL: {video_url}")
status.update(label=f"Accessing YouTube URL: {video_url}")
try:
vid_id = video_url.split("=")[1]
yt = YouTube(video_url, on_progress_callback=progress_function)
except Exception as err:
logger.error(f"Failed to get pytube stream object: {err}")
st.stop()
logger.info(f"Fetching the highest quality audio stream:{yt.title}")
status.update(label=f"Fetching the highest quality audio stream: {yt.title}")
try:
audio_stream = yt.streams.filter(only_audio=True).first()
except Exception as err:
logger.error(f"Failed to Download Youtube Audio: {err}")
st.stop()
if audio_stream is None:
logger.warning("No audio stream found for this video.")
st.warning("No audio stream found for this video.")
st.stop()
logger.info(f"Downloading audio for: {yt.title}")
status.update(label=f"Downloading audio for: {yt.title}")
global progress_bar
progress_bar = tqdm(total=1.0, unit='iB', unit_scale=True, desc=yt.title)
try:
audio_filename = re.sub(r'[^\w\-_\.]', '_', yt.title) + '.mp4'
audio_file = audio_stream.download(
output_path=os.getenv("CONTENT_SAVE_DIR"),
filename=audio_filename)
#audio_file = rename_file_with_underscores(audio_file)
except Exception as err:
    # Log the actual error; audio_file is still None when the download fails
    logger.error(f"Failed to download audio file: {err}")
    raise
finally:
    progress_bar.close()
logger.info(f"Audio downloaded: {yt.title} to {audio_file}")
status.update(label=f"Audio downloaded: {yt.title} to {output_path}")
# Audio filepath from local directory.
elif os.path.exists(video_url):
audio_file = video_url
# Checking file size
max_file_size = 24 * 1024 * 1024 # 24MB
file_size = os.path.getsize(audio_file)
# Convert file size to MB for logging
file_size_MB = file_size / (1024 * 1024) # Convert bytes to MB
logger.info(f"Downloaded Audio Size is: {file_size_MB:.2f} MB")
status.update(label=f"Downloaded Audio Size is: {file_size_MB:.2f} MB")
if file_size > max_file_size:
logger.error("File size exceeds 24MB limit.")
# FIXME: We can chunk hour long videos, the code is not tested.
#long_video(audio_file)
# Show the error before exiting; anything after sys.exit never runs
st.error("Audio file size limit exceeded. Please file an issue on the ALwrity GitHub.")
sys.exit("File size limit exceeded.")
try:
# Try OpenAI Whisper first; Gemini is used as a fallback in the except branch below.
status.update(label=f"Initializing OpenAI client for transcription: {audio_file}")
logger.info(f"Initializing OpenAI client for transcription: {audio_file}")
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("openai")
if not api_key:
raise ValueError("OpenAI API key not found. Please configure it in the onboarding process.")
client = OpenAI(api_key=api_key)
logger.info("Transcribing using OpenAI's Whisper model.")
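# Open the file in a context manager so the handle is always closed
with open(audio_file, "rb") as audio_handle:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_handle,
        response_format="text"
    )
# yt is None when a local file path was supplied instead of a URL
title = yt.title if yt else os.path.basename(audio_file)
logger.info(f"\nTranscription:\n{title}\n{transcript}\n")
status.update(label=f"Transcription complete: {title}")
return transcript, title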
except Exception as e:
logger.error(f"Failed in Whisper transcription: {e}")
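st.warning(f"Failed in OpenAI Whisper transcription: {e}")
transcript = transcribe_audio(audio_file)
logger.info(f"Gemini fallback transcription:\n{transcript}")
# yt is None when a local file path was supplied instead of a URL
return transcript, (yt.title if yt else os.path.basename(audio_file))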
except Exception as e:
st.error(f"An error occurred during YouTube video processing: {e}")
finally:
try:
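# Only remove audio downloaded from YouTube; never delete a user-supplied local file.
if yt is not None and audio_file and os.path.exists(audio_file):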
os.remove(audio_file)
logger.info("Temporary audio file removed.")
except PermissionError:
st.error(f"Permission error: Cannot remove '{audio_file}'. Please make sure of necessary permissions.")
except Exception as e:
st.error(f"An error occurred removing audio file: {e}")
def long_video(temp_file_name):
    """
    Transcribes a long audio file with OpenAI's Whisper API by processing it in chunks.
    Audio that exceeds the Whisper upload limit is divided into 10-minute segments;
    each segment is transcribed individually and the results are concatenated.
    Note: this code path is untested (see the FIXME in speech_to_text).
    Args:
        temp_file_name (str): Path to the downloaded audio file.
    Returns:
        str: The combined transcript of all chunks.
    """
    # Assumes moviepy 1.x, which provides moviepy.editor and AudioFileClip.subclip.
    import moviepy.editor as mp
    api_key = APIKeyManager().get_api_key("openai")
    if not api_key:
        raise ValueError("OpenAI API key not found. Please configure it in the onboarding process.")
    client = OpenAI(api_key=api_key)
    logger.info(f"Processing the audio file: {temp_file_name}")
    full_audio = mp.AudioFileClip(temp_file_name)
    duration = full_audio.duration
    chunk_length = 600  # 10 minutes in seconds
    starts = range(0, int(duration), chunk_length)
    combined_transcript = ""
    try:
        for i, start in enumerate(starts):
            chunk = full_audio.subclip(start, min(start + chunk_length, duration))
            # Reserve a temp file name, then close it before moviepy writes to it
            with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
                chunk_path = tmp.name
            try:
                chunk.write_audiofile(chunk_path, codec="mp3")
                logger.info(f"Transcribing chunk {i + 1}/{len(starts)}")
                # Binary mode takes no encoding argument
                with open(chunk_path, "rb") as audio_file:
                    transcript = client.audio.transcriptions.create(
                        model="whisper-1", file=audio_file, response_format="text")
                combined_transcript += transcript + "\n\n"
            finally:
                # Remove the chunk audio file
                os.remove(chunk_path)
    finally:
        full_audio.close()
    return combined_transcript
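
The chunking step in `long_video` reduces to computing (start, end) boundaries over the clip duration, with the last chunk cut short at the end of the audio. The sketch below isolates that arithmetic without any audio library. The helper name `chunk_ranges` is hypothetical, and unlike the `range(0, int(duration), ...)` form it also covers a trailing fraction of a second.

```python
from typing import List, Tuple


def chunk_ranges(duration: float, chunk_length: int = 600) -> List[Tuple[float, float]]:
    """Return (start, end) pairs in seconds covering [0, duration)."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    ranges = []
    start = 0
    while start < duration:
        # The final chunk ends at the clip duration rather than a full chunk_length.
        ranges.append((start, min(start + chunk_length, duration)))
        start += chunk_length
    return ranges


print(chunk_ranges(1500))  # a 25-minute clip in 10-minute chunks
```

Keeping the boundary computation separate makes the chunking easy to test before any transcription call is involved.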


@@ -0,0 +1,105 @@
"""DeepSeek Provider Service for ALwrity Backend.
This service handles DeepSeek API integrations,
migrated from the legacy lib/gpt_providers/text_generation/deepseek_text_gen.py
"""
import os
import json
import time
from typing import Dict, Any, Tuple
from loguru import logger
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
)
# Import APIKeyManager
from ..api_key_manager import APIKeyManager
try:
import openai
except ImportError:
openai = None
logger.warning("OpenAI library not available. Install with: pip install openai")
async def test_deepseek_api_key(api_key: str) -> Tuple[bool, str]:
"""
Test if the provided DeepSeek API key is valid.
Args:
api_key (str): The DeepSeek API key to test
Returns:
tuple[bool, str]: A tuple containing (is_valid, message)
"""
if not openai:
return False, "OpenAI library not available"
try:
# Create DeepSeek client with the provided key
client = openai.OpenAI(
api_key=api_key,
base_url="https://api.deepseek.com/v1"
)
# Try to generate a simple response as a test
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=10,
temperature=0.1
)
# If we get here, the key is valid
return True, "DeepSeek API key is valid"
except openai.AuthenticationError:
return False, "Invalid DeepSeek API key"
except openai.RateLimitError:
return False, "Rate limit exceeded. Please try again later."
except Exception as e:
return False, f"Error testing DeepSeek API key: {str(e)}"
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def deepseek_text_response(prompt: str, model: str = "deepseek-chat",
temperature: float = 0.7, max_tokens: int = 4000,
system_prompt: str = None) -> str:
"""Get response from DeepSeek."""
if not openai:
logger.error("OpenAI library not available")
return "OpenAI library not available. Please install openai package."
try:
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("deepseek")
if not api_key:
raise ValueError("DeepSeek API key not found. Please configure it in the onboarding process.")
client = openai.OpenAI(
api_key=api_key,
base_url="https://api.deepseek.com/v1"
)
# Prepare messages
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=max_tokens,
temperature=temperature
)
logger.info(f"[deepseek_text_response] Generated response with {len(response.choices[0].message.content)} characters")
return response.choices[0].message.content
except Exception as err:
logger.error(f"Failed to get response from DeepSeek: {err}. Retrying.")
raise


@@ -0,0 +1,232 @@
# Using Gemini Pro LLM model
import os
import sys
from pathlib import Path
import google.genai as genai
from google.genai import types
from dotenv import load_dotenv
load_dotenv(Path('../../../.env'))
from loguru import logger
logger.remove()
logger.add(sys.stdout,
colorize=True,
format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}"
)
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
)
import asyncio
import json
import re
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def gemini_text_response(prompt, temperature, top_p, n, max_tokens, system_prompt):
    """Common function to get a text response from Gemini."""
    #FIXME: Include : https://github.com/google-gemini/cookbook/blob/main/quickstarts/rest/System_instructions_REST.ipynb
    try:
        client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
    except Exception as err:
        logger.error(f"Failed to configure Gemini: {err}")
        # Without a client the call below cannot proceed
        raise
logger.info(f"Temp: {temperature}, MaxTokens: {max_tokens}, TopP: {top_p}, N: {n}")
# Set up AI model config
generation_config = {
"temperature": temperature,
"top_p": top_p,
"top_k": n,
"max_output_tokens": max_tokens,
}
# FIXME: Expose model_name in main_config
try:
response = client.models.generate_content(
model='gemini-2.5-pro',
contents=prompt,
config=types.GenerateContentConfig(
system_instruction=system_prompt,
max_output_tokens=max_tokens,
temperature=temperature,
top_p=top_p,
top_k=n,
),
)
#logger.info(f"Number of Token in Prompt Sent: {model.count_tokens(prompt)}")
return response.text
except Exception as err:
    logger.error(f"Failed to get response from Gemini: {err}. Retrying.")
    # Re-raise so the @retry decorator can actually retry the call
    raise
#@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
#def gemini_blog_metadata_json(blog_content):
# """ Common functiont to get response from gemini pro Text. """
# prompt = f"I will provide you with the content of a blog post. Based on this content, you need to generate the following elements in JSON format:\n\n1. **Blog Title**: A compelling and relevant title that summarizes the blog content.\n2. **Meta Description**: A concise meta description (up to 160 characters) that captures the essence of the blog post and encourages clicks.\n3. **Tags**: A list of 5-10 relevant tags that represent the key topics covered in the blog post.\n4. **Categories**: A list of 1-3 appropriate categories that best describe the blog post's main themes.\n\nOutput your response in the following JSON format:\n\n```json\n{\n \"type\": \"object\",\n \"properties\": {\n \"blog_title\": {\n \"type\": \"string\"\n },\n \"meta_description\": {\n \"type\": \"string\"\n },\n \"tags\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\"\n }\n },\n \"categories\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\"\n }\n }\n }\n}\n\n. The Blog Content is given below: \n\n{blog_content}\n\n"
#
# try:
# genai.configure(api_key=os.getenv('GEMINI_API_KEY'))
# except Exception as err:
# logger.error(f"Failed to configure Gemini: {err}")
#
# # Create the model
# generation_config = {
# "temperature": 1,
# "top_p": 0.95,
# "top_k": 64,
# "max_output_tokens": 8192,
# "response_schema": content.Schema(
# type = content.Type.OBJECT,
# properties = {
# "response": content.Schema(
# type = content.Type.STRING,
# ),
# },
# ),
# "response_mime_type": "application/json",
# }
#
# model = genai.GenerativeModel(
# model_name="gemini-1.5-flash",
# generation_config=generation_config,
# # safety_settings = Adjust safety settings
# # See https://ai.google.dev/gemini-api/docs/safety-settings
# )
#
# try:
# # text_response = []
# response = model.generate_content(prompt)
# if response:
# logger.info(f"Number of Token in Prompt Sent: {model.count_tokens(prompt)}")
# return response.text
# except Exception as err:
# logger.error(f"Failed to get SEO METADATA from Gemini: {err}. Retrying.")
async def test_gemini_api_key(api_key: str) -> tuple[bool, str]:
"""
Test if the provided Gemini API key is valid.
Args:
api_key (str): The Gemini API key to test
Returns:
tuple[bool, str]: A tuple containing (is_valid, message)
"""
try:
# Create a client with the provided key (google.genai has no module-level configure())
client = genai.Client(api_key=api_key)
# Try to list models as a simple authenticated API test
models = list(client.models.list())
if models:
    return True, "Gemini API key is valid"
else:
    return False, "No Gemini models available with this API key"
except Exception as e:
return False, f"Error testing Gemini API key: {str(e)}"
def gemini_pro_text_gen(prompt, temperature=0.7, top_p=0.9, top_k=40, max_tokens=2048):
"""
Generate text using Google's Gemini Pro model.
Args:
prompt (str): The input text to generate completion for
temperature (float, optional): Controls randomness. Defaults to 0.7
top_p (float, optional): Controls diversity. Defaults to 0.9
top_k (int, optional): Controls vocabulary size. Defaults to 40
max_tokens (int, optional): Maximum number of tokens to generate. Defaults to 2048
Returns:
str: The generated text completion
"""
try:
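# google.genai exposes generation through a Client rather than GenerativeModel.
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
# Generate content. The model name matches gemini_text_response above;
# the original 'gemini-pro' identifier is assumed to be no longer served.
response = client.models.generate_content(
    model='gemini-2.5-pro',
    contents=prompt,
    config=types.GenerateContentConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        max_output_tokens=max_tokens,
    ),
)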
# Return the generated text
return response.text
except Exception as e:
logger.error(f"Error in Gemini Pro text generation: {e}")
return str(e)
def gemini_structured_json_response(prompt, schema, temperature=0.7, top_p=0.9, top_k=40, max_tokens=2048, system_prompt=None):
"""
Generate structured JSON response using Google's Gemini Pro model.
Args:
prompt (str): The input text to generate completion for
schema (dict): The JSON schema to follow for the response
temperature (float, optional): Controls randomness. Defaults to 0.7
top_p (float, optional): Controls diversity. Defaults to 0.9
top_k (int, optional): Controls vocabulary size. Defaults to 40
max_tokens (int, optional): Maximum number of tokens to generate. Defaults to 2048
system_prompt (str, optional): System instructions for the model
Returns:
dict: The generated structured JSON response
"""
try:
# Configure the model
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
# Set up generation config
generation_config = {
"temperature": temperature,
"top_p": top_p,
"top_k": top_k,
"max_output_tokens": max_tokens,
}
# Generate content with structured response
response = client.models.generate_content(
model='gemini-2.5-pro',
contents=prompt,
config=types.GenerateContentConfig(
system_instruction=system_prompt,
max_output_tokens=max_tokens,
temperature=temperature,
top_p=top_p,
top_k=top_k,
response_mime_type='application/json',
response_schema=schema
),
)
# Parse the response
try:
    # Prefer the SDK's parsed response; the attribute exists but may be None
    parsed = getattr(response, 'parsed', None)
    if parsed is not None:
        return parsed
    # If parsed is not available, try to parse the text
    response_text = response.text
    return json.loads(response_text)
except json.JSONDecodeError as e:
logger.error(f"Error parsing JSON response: {e}")
return {"error": f"Failed to parse JSON response: {e}", "raw_response": response_text}
except Exception as e:
logger.error(f"Error in Gemini Pro structured JSON generation: {e}")
return {"error": str(e)}
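
`gemini_structured_json_response` takes a `schema` dict, and the commented-out metadata prompt above names the fields such a schema might hold. The sketch below writes that schema out and checks a parsed response against it. The `required` list and the helper `missing_fields` are assumptions added for illustration; the source does not define either.

```python
import json
from typing import Any, Dict, List

# Blog metadata schema, mirroring the fields named in the commented-out prompt.
BLOG_METADATA_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "blog_title": {"type": "string"},
        "meta_description": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "categories": {"type": "array", "items": {"type": "string"}},
    },
    # Assumption: every field is treated as required.
    "required": ["blog_title", "meta_description", "tags", "categories"],
}


def missing_fields(response: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return required schema fields absent from a parsed response (hypothetical helper)."""
    return [name for name in schema.get("required", []) if name not in response]


# Made-up partial response, standing in for what the model might return.
sample = json.loads('{"blog_title": "Hello", "tags": ["ai"]}')
print(missing_fields(sample, BLOG_METADATA_SCHEMA))
```

A check like this lets a caller tell a well-formed but incomplete response apart from the `{"error": ...}` dict the function returns on failure.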


@@ -0,0 +1,125 @@
"""
Gemini Image Description Module
This module provides functionality to generate text descriptions of images using Google's Gemini API.
"""
import os
import sys
from pathlib import Path
import base64
from typing import Optional, Dict, Any, List, Union
from dotenv import load_dotenv
from PIL import Image
from loguru import logger
logger.remove()
logger.add(sys.stdout,
    colorize=True,
    format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}"
    )
# Import APIKeyManager
from ...api_key_manager import APIKeyManager
# describe_image uses genai.Client, which belongs to the google-genai SDK
# (google.genai), not to the older google-generativeai package.
try:
    import google.genai as genai
    from google.genai import types
except ImportError:
    genai = None
    types = None
    logger.warning("Google genai library not available. Install with: pip install google-genai")
def describe_image(image_path: str, prompt: str = "Describe this image in detail:") -> Optional[str]:
"""
Describe an image using Google's Gemini API.
Parameters:
image_path (str): Path to the image file.
prompt (str): Prompt for describing the image.
Returns:
Optional[str]: The generated description of the image, or None if an error occurs.
"""
try:
if not genai:
logger.error("Google genai library not available")
return None
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("gemini")
if not api_key:
error_message = "Gemini API key not found. Please configure it in the onboarding process."
logger.error(error_message)
raise ValueError(error_message)
# Check if image file exists
if not os.path.exists(image_path):
error_message = f"Image file not found: {image_path}"
logger.error(error_message)
raise FileNotFoundError(error_message)
# Initialize the Gemini client
client = genai.Client(api_key=api_key)
# Open and process the image
try:
image = Image.open(image_path)
logger.info(f"Successfully opened image: {image_path}")
except Exception as e:
error_message = f"Failed to open image: {e}"
logger.error(error_message)
return None
# Generate content description
try:
response = client.models.generate_content(
model='gemini-2.0-flash',
contents=[
prompt,
image
]
)
# Extract and return the text
description = response.text
logger.info(f"Successfully generated description for image: {image_path}")
return description
except Exception as e:
error_message = f"Failed to generate content: {e}"
logger.error(error_message)
return None
except Exception as e:
error_message = f"An unexpected error occurred: {e}"
logger.error(error_message)
return None
def analyze_image_with_prompt(image_path: str, prompt: str) -> Optional[str]:
"""
Analyze an image with a custom prompt using Google's Gemini API.
Parameters:
image_path (str): Path to the image file.
prompt (str): Custom prompt for analyzing the image.
Returns:
Optional[str]: The generated analysis of the image, or None if an error occurs.
"""
return describe_image(image_path, prompt)
# Example usage
if __name__ == "__main__":
# Example usage of the function
image_path = "path/to/your/image.jpg"
description = describe_image(image_path)
if description:
print(f"Image description: {description}")
else:
print("Failed to generate image description")


@@ -0,0 +1,79 @@
"""
This module provides functionality to analyze images using OpenAI's Vision API.
It encodes an image to a base64 string and sends a request to the OpenAI API
to interpret the contents of the image, returning a textual description.
"""
import requests
import sys
import re
import base64
def analyze_and_extract_details_from_image(image_path, api_key):
"""
Analyzes an image using OpenAI's Vision API and extracts Alt Text, Description, Title, and Caption.
Args:
image_path (str): Path to the image file.
api_key (str): Your OpenAI API key.
Returns:
dict: Extracted details including Alt Text, Description, Title, and Caption.
"""
def encode_image(path):
""" Encodes an image to a base64 string. """
with open(path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
base64_image = encode_image(image_path)
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}"
}
payload = {
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "The given image is used in blog content. Analyze the given image and suggest alternative (alt) text, description, title, and caption."
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
}
}
]
}
],
"max_tokens": 300
}
try:
response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()
assistant_message = response.json()['choices'][0]['message']['content']
# Extracting details using regular expressions
alt_text_match = re.search(r'Alt Text: "(.*?)"', assistant_message)
description_match = re.search(r'Description: (.*?)\n\n', assistant_message)
title_match = re.search(r'Title: "(.*?)"', assistant_message)
caption_match = re.search(r'Caption: "(.*?)"', assistant_message)
return {
'alt_text': alt_text_match.group(1) if alt_text_match else None,
'description': description_match.group(1) if description_match else None,
'title': title_match.group(1) if title_match else None,
'caption': caption_match.group(1) if caption_match else None
}
except requests.RequestException as e:
sys.exit(f"Error: Failed to communicate with OpenAI API. Error: {e}")
except Exception as e:
sys.exit(f"Error occurred: {e}")
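
The extraction step above depends on the reply following a labelled layout such as `Alt Text: "..."`, and any field that does not match comes back as None. The sketch below runs the same four patterns over a made-up reply to show that behaviour. The helper name `extract_image_details` and the sample text are hypothetical; a real reply may not follow this layout unless the prompt asks for it.

```python
import re
from typing import Dict, Optional


def extract_image_details(message: str) -> Dict[str, Optional[str]]:
    """Pull the labelled fields out of a model reply; missing fields become None."""
    patterns = {
        "alt_text": r'Alt Text: "(.*?)"',
        "description": r'Description: (.*?)\n\n',
        "title": r'Title: "(.*?)"',
        "caption": r'Caption: "(.*?)"',
    }
    details = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, message)
        details[field] = match.group(1) if match else None
    return details


# Made-up reply in the layout the patterns expect; it has no Caption line.
sample_reply = (
    'Alt Text: "A red bicycle leaning against a brick wall"\n'
    'Description: A red bicycle rests against a weathered wall.\n\n'
    'Title: "Red Bicycle"\n'
)
print(extract_image_details(sample_reply))
```

Note that the description pattern only matches when the description is followed by a blank line, so a reply that ends with the description would yield None for that field.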


@@ -0,0 +1,306 @@
"""Main Text Generation Service for ALwrity Backend.
This service provides the main LLM text generation functionality,
migrated from the legacy lib/gpt_providers/text_generation/main_text_generation.py
"""
import os
import json
from typing import Optional, Dict, Any
from loguru import logger
from ..api_key_manager import APIKeyManager
from .openai_provider import openai_chatgpt
from .gemini_provider import gemini_text_response, gemini_structured_json_response
from .anthropic_provider import anthropic_text_response
from .deepseek_provider import deepseek_text_response
def llm_text_gen(prompt: str, system_prompt: Optional[str] = None, json_struct: Optional[Dict[str, Any]] = None) -> str:
"""
Generate text using Language Model (LLM) based on the provided prompt.
Args:
prompt (str): The prompt to generate text from.
system_prompt (str, optional): Custom system prompt to use instead of the default one.
json_struct (dict, optional): JSON schema structure for structured responses.
Returns:
str: Generated text based on the prompt.
"""
try:
logger.info("[llm_text_gen] Starting text generation")
logger.debug(f"[llm_text_gen] Prompt length: {len(prompt)} characters")
# Initialize API key manager
api_key_manager = APIKeyManager()
# Set default values for LLM parameters
gpt_provider = "google" # Default to Google Gemini
model = "gemini-2.0-flash-001"
temperature = 0.7
max_tokens = 4000
top_p = 0.9
n = 1
fp = 16
frequency_penalty = 0.0
presence_penalty = 0.0
# Default blog characteristics
blog_tone = "Professional"
blog_demographic = "Professional"
blog_type = "Informational"
blog_language = "English"
blog_output_format = "markdown"
blog_length = 2000
# Try to get provider from environment or config
try:
# Check which providers have API keys available
available_providers = []
if api_key_manager.get_api_key("openai"):
available_providers.append("openai")
if api_key_manager.get_api_key("gemini"):
available_providers.append("google")
if api_key_manager.get_api_key("anthropic"):
available_providers.append("anthropic")
if api_key_manager.get_api_key("deepseek"):
available_providers.append("deepseek")
# Prefer Google Gemini if available, otherwise use first available
if "google" in available_providers:
gpt_provider = "google"
model = "gemini-2.0-flash-001"
elif available_providers:
gpt_provider = available_providers[0]
if gpt_provider == "openai":
model = "gpt-4o"
elif gpt_provider == "anthropic":
model = "claude-3-5-sonnet-20241022"
elif gpt_provider == "deepseek":
model = "deepseek-chat"
else:
logger.warning("[llm_text_gen] No API keys found, using mock response")
return _get_mock_response(prompt)
logger.debug(f"[llm_text_gen] Using provider: {gpt_provider}, model: {model}")
except Exception as err:
logger.warning(f"[llm_text_gen] Error determining provider, using defaults: {err}")
gpt_provider = "google"
model = "gemini-2.0-flash-001"
# Construct the system prompt if not provided
if system_prompt is None:
system_instructions = f"""You are a highly skilled content writer with a knack for creating engaging and informative content.
Your expertise spans various writing styles and formats.
Writing Style Guidelines:
- Tone: {blog_tone}
- Target Audience: {blog_demographic}
- Content Type: {blog_type}
- Language: {blog_language}
- Output Format: {blog_output_format}
- Target Length: {blog_length} words
Please provide responses that are:
- Well-structured and easy to read
- Engaging and informative
- Tailored to the specified tone and audience
- Professional yet accessible
- Optimized for the target content type
"""
else:
system_instructions = system_prompt
# Generate response based on provider
try:
if gpt_provider == "openai":
return openai_chatgpt(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
top_p=top_p,
n=n,
fp=frequency_penalty, # OpenAI only accepts a frequency penalty between -2.0 and 2.0
system_prompt=system_instructions
)
elif gpt_provider == "google":
if json_struct:
return gemini_structured_json_response(
prompt=prompt,
schema=json_struct,
temperature=temperature,
top_p=top_p,
top_k=n,
max_tokens=max_tokens,
system_prompt=system_instructions
)
else:
return gemini_text_response(
prompt=prompt,
temperature=temperature,
top_p=top_p,
n=n,
max_tokens=max_tokens,
system_prompt=system_instructions
)
elif gpt_provider == "anthropic":
return anthropic_text_response(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
system_prompt=system_instructions
)
elif gpt_provider == "deepseek":
return deepseek_text_response(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
system_prompt=system_instructions
)
else:
logger.error(f"[llm_text_gen] Unknown provider: {gpt_provider}")
return _get_mock_response(prompt)
except Exception as provider_error:
logger.error(f"[llm_text_gen] Provider {gpt_provider} failed: {str(provider_error)}")
# Try to fallback to another provider
fallback_providers = ["openai", "anthropic", "deepseek"]
for fallback_provider in fallback_providers:
if fallback_provider in available_providers and fallback_provider != gpt_provider:
try:
logger.info(f"[llm_text_gen] Trying fallback provider: {fallback_provider}")
if fallback_provider == "openai":
return openai_chatgpt(
prompt=prompt,
model="gpt-4o",
temperature=temperature,
max_tokens=max_tokens,
top_p=top_p,
n=n,
fp=frequency_penalty, # OpenAI only accepts a frequency penalty between -2.0 and 2.0
system_prompt=system_instructions
)
elif fallback_provider == "anthropic":
return anthropic_text_response(
prompt=prompt,
model="claude-3-5-sonnet-20241022",
temperature=temperature,
max_tokens=max_tokens,
system_prompt=system_instructions
)
elif fallback_provider == "deepseek":
return deepseek_text_response(
prompt=prompt,
model="deepseek-chat",
temperature=temperature,
max_tokens=max_tokens,
system_prompt=system_instructions
)
except Exception as fallback_error:
logger.error(f"[llm_text_gen] Fallback provider {fallback_provider} also failed: {str(fallback_error)}")
continue
# If all providers fail, return mock response
logger.warning("[llm_text_gen] All providers failed, using mock response")
return _get_mock_response(prompt)
except Exception as e:
logger.error(f"[llm_text_gen] Error during text generation: {str(e)}")
return _get_mock_response(prompt)
def _get_mock_response(prompt: str) -> str:
"""Get a mock response when no API keys are available."""
logger.warning("[llm_text_gen] Using mock response - no provider returned a result")
# Return a structured mock response for style detection
if "style analysis" in prompt.lower() or "writing style" in prompt.lower():
return json.dumps({
"writing_style": {
"tone": "professional",
"voice": "active",
"complexity": "moderate",
"engagement_level": "high"
},
"content_characteristics": {
"sentence_structure": "well-structured",
"vocabulary_level": "intermediate",
"paragraph_organization": "logical flow",
"content_flow": "smooth transitions"
},
"target_audience": {
"demographics": ["professionals", "business users"],
"expertise_level": "intermediate",
"industry_focus": "technology",
"geographic_focus": "global"
},
"content_type": {
"primary_type": "blog",
"secondary_types": ["article", "guide"],
"purpose": "inform",
"call_to_action": "moderate"
},
"recommended_settings": {
"writing_tone": "professional",
"target_audience": "business professionals",
"content_type": "blog",
"creativity_level": "medium",
"geographic_location": "global"
}
})
# Handle pattern analysis requests
if "pattern" in prompt.lower() or "recurring" in prompt.lower():
return json.dumps({
"patterns": {
"sentence_length": "medium",
"vocabulary_patterns": ["technical terms", "professional language"],
"rhetorical_devices": ["examples", "analogies"],
"paragraph_structure": "topic sentence followed by supporting details",
"transition_phrases": ["furthermore", "additionally", "however"]
},
"style_consistency": "high",
"unique_elements": ["clear structure", "professional tone", "evidence-based content"]
})
# Handle guidelines generation requests
if "guidelines" in prompt.lower() or "recommendations" in prompt.lower():
return json.dumps({
"guidelines": {
"tone_recommendations": ["maintain professional tone", "use clear language"],
"structure_guidelines": ["start with introduction", "use headings", "conclude with summary"],
"vocabulary_suggestions": ["avoid jargon", "use industry-specific terms appropriately"],
"engagement_tips": ["include examples", "use active voice", "ask questions"],
"audience_considerations": ["consider technical level", "provide context"]
},
"best_practices": ["research thoroughly", "cite sources", "update regularly"],
"avoid_elements": ["overly technical language", "long paragraphs", "passive voice"],
"content_strategy": "focus on providing value while maintaining professional credibility"
})
# Generic mock response for other content generation
return "This is a mock response. Please configure API keys for real content generation. To get started, visit the onboarding process and configure your AI provider API keys."
def check_gpt_provider(gpt_provider: str) -> bool:
"""Check if the specified GPT provider is supported."""
supported_providers = ["openai", "google", "anthropic", "deepseek"]
return gpt_provider in supported_providers
def get_api_key(gpt_provider: str) -> Optional[str]:
"""Get API key for the specified provider."""
try:
api_key_manager = APIKeyManager()
provider_mapping = {
"openai": "openai",
"google": "gemini",
"anthropic": "anthropic",
"deepseek": "deepseek"
}
mapped_provider = provider_mapping.get(gpt_provider, gpt_provider)
return api_key_manager.get_api_key(mapped_provider)
except Exception as e:
logger.error(f"[get_api_key] Error getting API key for {gpt_provider}: {str(e)}")
return None
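
The selection and fallback logic in `llm_text_gen` repeats each provider call twice. A minimal sketch of the same idea, assuming each provider is wrapped in a plain callable that takes a prompt (the names `pick_provider`, `generate_with_fallback`, and `PREFERRED_ORDER` are hypothetical, and the order simply mirrors the one used above):

```python
import logging
from typing import Callable, Dict, Optional, Sequence

# Assumed priority: Gemini first, then the order in which keys are checked above.
PREFERRED_ORDER = ("google", "openai", "anthropic", "deepseek")


def pick_provider(available: Sequence[str],
                  order: Sequence[str] = PREFERRED_ORDER) -> Optional[str]:
    """Return the highest-priority provider that has an API key, if any."""
    for name in order:
        if name in available:
            return name
    return None


def _mock(prompt: str) -> str:
    """Stand-in for _get_mock_response in this sketch."""
    return f"mock response for: {prompt}"


def generate_with_fallback(prompt: str,
                           providers: Dict[str, Callable[[str], str]],
                           order: Sequence[str] = PREFERRED_ORDER,
                           mock: Callable[[str], str] = _mock) -> str:
    """Try each configured provider in priority order; fall back to the mock."""
    for name in order:
        call = providers.get(name)
        if call is None:
            continue  # no API key configured for this provider
        try:
            return call(prompt)
        except Exception as err:  # a failing provider should not stop the chain
            logging.warning("provider %s failed: %s", name, err)
    return mock(prompt)
```

With this shape, adding a provider means adding one entry to the dictionary rather than another `elif` branch in both the primary and the fallback path.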

View File

@@ -0,0 +1,133 @@
"""OpenAI Provider Service for ALwrity Backend.
This service handles OpenAI API integrations,
migrated from the legacy lib/gpt_providers/text_generation/openai_text_gen.py
"""
import os
import time
import openai
import asyncio
from typing import Tuple
from loguru import logger
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
)
# Import APIKeyManager
from ..api_key_manager import APIKeyManager
async def test_openai_api_key(api_key: str) -> Tuple[bool, str]:
"""
Test if the provided OpenAI API key is valid.
Args:
api_key (str): The OpenAI API key to test
Returns:
tuple[bool, str]: A tuple containing (is_valid, message)
"""
try:
# Create OpenAI client with the provided key
client = openai.OpenAI(api_key=api_key)
# Try to list models as a simple API test
models = client.models.list()
# If we get here, the key is valid
return True, "OpenAI API key is valid"
except openai.AuthenticationError:
return False, "Invalid OpenAI API key"
except openai.RateLimitError:
return False, "Rate limit exceeded. Please try again later."
except Exception as e:
return False, f"Error testing OpenAI API key: {str(e)}"
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def openai_chatgpt(prompt: str, model: str = "gpt-4o", temperature: float = 0.7,
max_tokens: int = 4000, top_p: float = 0.9, n: int = 1,
fp: float = 0.0, system_prompt: str = None) -> str:
"""
Wrapper function for OpenAI's ChatGPT completion.
Args:
prompt (str): The input text to generate completion for.
model (str, optional): Model to be used for the completion. Defaults to "gpt-4o".
temperature (float, optional): Controls randomness. Lower values make responses more deterministic. Defaults to 0.7.
max_tokens (int, optional): Maximum number of tokens to generate. Defaults to 4000.
top_p (float, optional): Controls diversity. Defaults to 0.9.
n (int, optional): Number of completions to generate. Defaults to 1.
fp (float, optional): Frequency penalty, sent as `frequency_penalty`. The OpenAI API accepts values from -2.0 to 2.0. Defaults to 0.0.
system_prompt (str, optional): System prompt for the conversation. Defaults to None.
Returns:
str: The generated text completion.
Raises:
SystemExit: If an API error, connection error, or rate limit error occurs.
"""
# Wait for 5 seconds to comply with rate limits
time.sleep(5)
try:
# Create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
full_reply_content = None
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("openai")
if not api_key:
raise ValueError("OpenAI API key not found. Please configure it in the onboarding process.")
client = openai.OpenAI(api_key=api_key)
# Prepare messages
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=max_tokens,
n=n,
top_p=top_p,
stream=True,
frequency_penalty=fp,
temperature=temperature
)
# Iterate through the stream of events
for chunk in response:
collected_chunks.append(chunk) # save the event response
chunk_message = chunk.choices[0].delta.content # extract the message
collected_messages.append(chunk_message) # save the message
print(chunk_message or "", end="", flush=True) # role-only and final chunks carry None
# Clean None in collected_messages
collected_messages = [m for m in collected_messages if m is not None]
full_reply_content = ''.join([m for m in collected_messages])
logger.info(f"[openai_chatgpt] Generated response with {len(full_reply_content)} characters")
return full_reply_content
except openai.RateLimitError as e:
logger.error(f"OpenAI Rate Limit Error: {e}")
raise SystemExit from e
except openai.APIConnectionError as e:
logger.error(f"OpenAI API Connection Error: {e}")
raise SystemExit from e
except openai.APIError as e:
# APIError is the base class of the two errors above, so it must be handled last
logger.error(f"OpenAI API Error: {e}")
raise SystemExit from e
except Exception as e:
logger.error(f"Unexpected error in OpenAI API call: {e}")
raise SystemExit from e
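
The streaming loop in `openai_chatgpt` collects delta fragments and joins them at the end. As a small sketch of just that step (the helper `collect_stream_text` and the fake chunks built with `SimpleNamespace` are assumptions for illustration; real chunks come from the OpenAI client), fragments that are `None` and chunks with no choices can be skipped while iterating:

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream_text(chunks: Iterable[Any]) -> str:
    """Join the text fragments of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        fragment = chunk.choices[0].delta.content
        if fragment is not None:  # role-only and final chunks have no text
            parts.append(fragment)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    stream = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]
    print(collect_stream_text(stream))
```

Filtering while iterating avoids the second pass that strips `None` values from `collected_messages`.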

View File

@@ -0,0 +1,56 @@
import openai # needed for the openai.OpenAIError handler below
from openai import OpenAI
from loguru import logger
import sys
from .save_image import save_generated_image
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
) # for exponential backoff
@retry(wait=wait_random_exponential(min=1, max=120), stop=stop_after_attempt(6))
def generate_dalle3_images(img_prompt, image_dir, size="1024x1024", quality="hd", n=1):
"""
Generates images using the DALL-E 3 model based on a given text prompt.
Args:
img_prompt (str): Text prompt to generate the image.
image_dir (str): Directory where the generated image will be saved.
size (str, optional): Size of the generated images. Defaults to "1024x1024".
quality (str, optional): Quality of the generated images. Defaults to "hd".
n (int, optional): Number of images to generate. Defaults to 1.
Returns:
str: Path to the saved image.
Raises:
SystemExit: If an error occurs in image generation or saving.
"""
try:
logger.info("Generating Dall-e-3 image for the blog.")
client = OpenAI()
img_generation_response = client.images.generate(
model="dall-e-3",
prompt=img_prompt,
size=size,
quality=quality,
n=n
)
# Save the generated image locally.
try:
img_path = save_generated_image(img_generation_response, image_dir)
return img_path
except Exception as err:
logger.error(f"Failed to Save generated image: {err}")
except openai.OpenAIError as e:
logger.error(f"Dalle-3 image generation error: {e}")
sys.exit("Exiting due to Dalle-3 image generation error.")
except Exception as e:
logger.error(f"Failed to generate images with Dalle3: {e}")
sys.exit("Exiting due to a general error in image generation.")

View File

@@ -0,0 +1,53 @@
import openai # needed for the openai.OpenAIError handler below
from openai import OpenAI
from loguru import logger
import sys
from tenacity import (
retry,
stop_after_attempt,
wait_random_exponential,
) # for exponential backoff
from .save_image import save_generated_image
@retry(wait=wait_random_exponential(min=1, max=120), stop=stop_after_attempt(6))
def generate_dalle3_images(img_prompt, image_dir, size="1024x1024", quality="hd", n=1):
"""
Generates images using the DALL-E 3 model based on a given text prompt.
Args:
img_prompt (str): Text prompt to generate the image.
image_dir (str): Directory where the generated image will be saved.
size (str, optional): Size of the generated images. Defaults to "1024x1024".
quality (str, optional): Quality of the generated images. Defaults to "hd".
n (int, optional): Number of images to generate. Defaults to 1.
Returns:
str: Path to the saved image.
Raises:
SystemExit: If an error occurs in image generation or saving.
"""
try:
logger.info("Generating Dall-e-3 image for the blog.")
client = OpenAI()
img_generation_response = client.images.generate(
model="dall-e-3",
prompt=img_prompt,
size=size,
quality=quality,
n=n
)
img_path = save_generated_image(img_generation_response, image_dir)
return img_path
except openai.OpenAIError as e:
logger.error(f"Dalle-3 image generation error: {e}")
sys.exit("Exiting due to Dalle-3 image generation error.")
except Exception as e:
logger.error(f"Failed to generate images with Dalle3: {e}")
sys.exit("Exiting due to a general error in image generation.")
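
Both DALL-E modules wrap the call in a tenacity `@retry` with random exponential backoff, yet they end every failure with `sys.exit`. `SystemExit` is not a subclass of `Exception`, so a retry policy keyed on `Exception` (tenacity's default, as far as I can tell) would not retry those failures. The following standard-library sketch only approximates that wait strategy; `call_with_backoff` and its parameters are hypothetical names chosen for illustration:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 6,
    min_wait: float = 1.0,
    max_wait: float = 120.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with a randomised, exponentially growing wait."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # The ceiling doubles each attempt and is capped at max_wait.
            ceiling = min(max_wait, min_wait * 2 ** attempt)
            sleep(max(min_wait, random.uniform(0, ceiling)))
    raise AssertionError("unreachable")
```

Raising an ordinary exception from the image call, rather than exiting, is what lets a wrapper like this (or tenacity) see the failure and retry.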

View File

@@ -0,0 +1,421 @@
import os
import sys
import time
import base64
import random
import logging
import datetime
import streamlit as st
from PIL import Image
from io import BytesIO
from loguru import logger
from tenacity import retry, stop_after_attempt, wait_random_exponential
# Import APIKeyManager
from ...api_key_manager import APIKeyManager
try:
# genai.Client and types.GenerateContentConfig come from the google-genai SDK
from google import genai
from google.genai import types
except ImportError:
genai = None
types = None
logger.warning("Google GenAI SDK not available. Install with: pip install google-genai")
from .save_image import save_generated_image
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('gemini_image_generator')
# With image generation in Gemini, your imagination is the limit.
# If what you see doesn't quite match what you had in mind, try adding more details to the prompt.
# The more specific you are, the better Gemini can create images that reflect your vision.
# Generate images using Gemini
# Gemini 2.0 Flash Experimental supports the ability to output text and inline images.
# This lets you use Gemini to conversationally edit images or generate outputs with interwoven text (for example, generating a blog post with text and images in a single turn).
# Note: Make sure to include responseModalities: ["Text", "Image"] in your generation configuration for text and image output with gemini-2.0-flash-exp-image-generation. Image only is not allowed.
class AIPromptGenerator:
"""
Generates enhanced AI image prompts based on user keywords,
following the guidelines of the Imagen documentation.
"""
def __init__(self):
self.photography_styles = ["photo", "photograph"]
self.art_styles = ["painting", "sketch", "drawing", "illustration", "digital art", "render"]
self.art_techniques = ["technical pencil drawing", "charcoal drawing", "color pencil drawing", "pastel painting", "digital art", "art deco (poster)", "impressionist painting", "renaissance painting", "pop art"]
self.camera_proximity = ["close-up", "zoomed out", "taken from far away"]
self.camera_position = ["aerial", "from below"]
self.lighting = ["natural lighting", "dramatic lighting", "warm lighting", "cold lighting", "studio lighting", "golden hour lighting"]
self.camera_settings = ["motion blur", "soft focus", "bokeh", "portrait"]
self.lens_types = ["35mm lens", "50mm lens", "fisheye lens", "wide angle lens", "macro lens", "telephoto lens"]
self.film_types = ["black and white film", "polaroid"]
self.materials = ["made of cheese", "made of paper", "made of neon tubes", "metallic", "glass", "wooden", "stone"]
self.shapes = ["in the shape of a bird", "angular", "curved", "geometric"]
self.quality_modifiers_general = ["high-quality", "beautiful", "stylized", "detailed", "epic", "grand"]
self.quality_modifiers_photo = ["4K", "HDR", "studio photo", "professional photo", "photorealistic"]
self.quality_modifiers_art = ["by a professional artist", "intricate details", "masterpiece"]
self.aspect_ratios = ["1:1 aspect ratio", "4:3 aspect ratio", "3:4 aspect ratio", "16:9 aspect ratio", "9:16 aspect ratio"]
self.photorealistic_modifiers = {
"portraits": ["prime lens", "zoom lens", "24-35mm", "black and white film", "film noir", "shallow depth of field", "duotone (mention two colors)"],
"objects": ["macro lens", "60-105mm", "high detail", "precise focusing", "controlled lighting"],
"motion": ["telephoto zoom lens", "100-400mm", "fast shutter speed", "action shot", "movement tracking"],
"wide-angle": ["wide-angle lens", "10-24mm", "long exposure", "sharp focus", "smooth water or clouds", "astro photography"]
}
def generate_prompt(self, keywords):
"""
Generates an enhanced AI image prompt based on user-provided keywords.
Args:
keywords (list): A list of keywords describing the desired image.
Returns:
str: An enhanced AI image prompt.
"""
if not keywords:
return "A beautiful image."
prompt_parts = []
subject = " ".join(keywords)
prompt_parts.append(subject)
# Add context and background (optional)
context_options = ["in a detailed background", "outdoors", "indoors", "in a studio", "with a blurred background"]
if random.random() < 0.6: # Add context with a probability
prompt_parts.append(random.choice(context_options))
# Add style (optional)
style_options = self.photography_styles + [f"{art} of" for art in self.art_styles]
if random.random() < 0.7:
prompt_parts.insert(0, random.choice(style_options))
if prompt_parts[0].startswith("painting of") or prompt_parts[0].startswith("sketch of") or prompt_parts[0].startswith("drawing of"):
if random.random() < 0.5:
prompt_parts.append(f"in the style of {random.choice(self.art_techniques)}")
# Add photography modifiers (if photography style is chosen)
if any(style in prompt_parts[0] for style in self.photography_styles):
if random.random() < 0.4:
prompt_parts.append(random.choice(self.camera_proximity))
if random.random() < 0.3:
prompt_parts.append(random.choice(self.camera_position))
if random.random() < 0.5:
prompt_parts.append(random.choice(self.lighting))
if random.random() < 0.3:
prompt_parts.append(random.choice(self.camera_settings))
if random.random() < 0.2:
prompt_parts.append(random.choice(self.lens_types))
if random.random() < 0.1:
prompt_parts.append(random.choice(self.film_types))
# Add shapes and materials (optional)
if random.random() < 0.3:
prompt_parts.append(random.choice(self.materials))
if random.random() < 0.2:
prompt_parts.append(random.choice(self.shapes))
# Add quality modifiers (optional)
if random.random() < 0.6:
quality_options = list(self.quality_modifiers_general) # copy, so += below does not grow the instance list
if any(style in prompt_parts[0] for style in self.photography_styles):
quality_options += self.quality_modifiers_photo
else:
quality_options += self.quality_modifiers_art
prompt_parts.append(random.choice(list(set(quality_options)))) # Avoid duplicates
# Add aspect ratio (optional)
if random.random() < 0.2:
prompt_parts.append(random.choice(self.aspect_ratios))
return ", ".join(prompt_parts)
def generate_photorealistic_prompt(self, keywords, focus=""):
"""
Generates an enhanced AI image prompt specifically for photorealistic images.
Args:
keywords (list): A list of keywords describing the desired image.
focus (str, optional): The focus of the photorealistic image (e.g., "portraits", "objects", "motion", "wide-angle"). Defaults to "".
Returns:
str: An enhanced photorealistic AI image prompt.
"""
if not keywords:
return "A photorealistic image."
prompt_parts = ["A photo of", "photorealistic"]
prompt_parts.append(" ".join(keywords))
if focus and focus in self.photorealistic_modifiers:
modifiers = self.photorealistic_modifiers[focus]
if modifiers:
num_modifiers = random.randint(1, min(3, len(modifiers)))
selected_modifiers = random.sample(modifiers, num_modifiers)
prompt_parts.extend(selected_modifiers)
# Add general quality modifiers
if random.random() < 0.5:
prompt_parts.append(random.choice(self.quality_modifiers_photo))
# Add lighting
if random.random() < 0.4:
prompt_parts.append(random.choice(self.lighting))
return ", ".join(prompt_parts)
def generate_gemini_image(prompt, keywords=None, style=None, focus=None, enhance_prompt=True, max_retries=3, initial_retry_delay=2, aspect_ratio="16:9"):
"""
Generate an image using Gemini's image generation capabilities.
Args:
prompt (str): The text prompt for image generation
keywords (list, optional): Keywords to enhance the prompt
style (str, optional): Style of the image (photorealistic, artistic, etc.)
focus (str, optional): Focus area for photorealistic images
enhance_prompt (bool, optional): Whether to enhance the prompt with AI
max_retries (int, optional): Maximum number of retry attempts
initial_retry_delay (int, optional): Initial delay between retries
aspect_ratio (str, optional): Aspect ratio for the generated image
Returns:
str: The path to the generated image.
"""
logger.info(f"Generating image with prompt: '{prompt[:100]}...'")
# Use APIKeyManager instead of direct environment variable access
api_key_manager = APIKeyManager()
api_key = api_key_manager.get_api_key("gemini")
if not api_key:
error_msg = "Gemini API key not found. Please configure it in the onboarding process."
logger.error(error_msg)
st.error(f"🔑 {error_msg}")
return None
# Enhance the prompt if requested
if enhance_prompt and keywords:
prompt_generator = AIPromptGenerator()
if style == "photorealistic" and focus:
logger.info(f"Generating photorealistic prompt with focus: {focus}")
enhanced_prompt = prompt_generator.generate_photorealistic_prompt(keywords, focus)
else:
logger.info("Generating enhanced prompt")
enhanced_prompt = prompt_generator.generate_prompt(keywords)
# Combine the enhanced prompt with the original prompt
prompt = f"{prompt}\n\nEnhanced prompt: {enhanced_prompt}"
logger.info(f"Final prompt: '{prompt[:100]}...'")
# Add aspect ratio to the prompt
if aspect_ratio:
prompt += f"\n\nPlease generate the image with {aspect_ratio} aspect ratio."
retry_count = 0
retry_delay = initial_retry_delay
while retry_count <= max_retries:
try:
client = genai.Client(api_key=api_key)
contents = (prompt)
logger.info("Sending request to Gemini API")
response = client.models.generate_content(
model="gemini-2.0-flash-exp-image-generation",
contents=contents,
config=types.GenerateContentConfig(
response_modalities=['Text', 'Image']
)
)
logger.info("Received response from Gemini API")
img_name = None
for part in response.candidates[0].content.parts:
if part.text is not None:
logger.info(f"Received text response: '{part.text[:100]}...'")
print(part.text)
elif part.inline_data is not None:
logger.info("Received image data from Gemini")
image = Image.open(BytesIO((part.inline_data.data)))
# Crop (or pad with white) around the centre to match the aspect ratio if needed
if aspect_ratio:
current_width, current_height = image.size
target_width = current_width
target_height = current_height
# Calculate target dimensions based on aspect ratio
if aspect_ratio == "16:9":
target_height = int(current_width * 9/16)
elif aspect_ratio == "9:16":
target_width = int(current_height * 9/16)
elif aspect_ratio == "4:3":
target_height = int(current_width * 3/4)
elif aspect_ratio == "3:4":
target_width = int(current_height * 3/4)
elif aspect_ratio == "1:1":
target_size = min(current_width, current_height)
target_width = target_size
target_height = target_size
logger.info(f"Resizing image from {current_width}x{current_height} to {target_width}x{target_height}")
# Create a new image with the target dimensions
resized_image = Image.new('RGB', (target_width, target_height), (255, 255, 255))
# Calculate position to paste the original image
paste_x = (target_width - current_width) // 2
paste_y = (target_height - current_height) // 2
# Paste the original image onto the new canvas
resized_image.paste(image, (paste_x, paste_y))
image = resized_image
# part.text is always None in this branch, so name the file by timestamp
img_name = f'gemini-native-image-{datetime.datetime.now().strftime("%Y%m%d-%H%M%S")}.png'
try:
logger.info(f"Saving image to: {img_name}")
image.save(img_name)
# Create a dictionary with the expected format for save_generated_image
img_response = {
"artifacts": [
{
"base64": base64.b64encode(open(img_name, "rb").read()).decode('utf-8')
}
]
}
# Call save_generated_image with the correct format
save_generated_image(img_response)
except Exception as err:
logger.error(f"Failed to save image: {err}")
st.error(f"Failed to save image: {err}")
logger.info(f"Image generation completed. Image name: {img_name}")
return img_name
except Exception as err:
error_message = str(err)
logger.error(f"Error in generate_gemini_image: {err}")
# Check if this is a 503 UNAVAILABLE error
if "503 UNAVAILABLE" in error_message and retry_count < max_retries:
retry_count += 1
logger.info(f"Model is overloaded. Retrying in {retry_delay} seconds (attempt {retry_count}/{max_retries})")
st.warning(f"The image generation service is currently busy. Retrying in {retry_delay} seconds...")
time.sleep(retry_delay)
# Exponential backoff
retry_delay *= 2
else:
st.error(f"Error generating image: {err}")
return None
# If we've exhausted all retries
st.error("The image generation service is currently unavailable. Please try again later.")
return None
def edit_image(image_path, prompt, max_retries=3, initial_retry_delay=2):
"""
- Image editing (text and image to image)
Example prompt: "Edit this image to make it look like a cartoon"
Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
- Multi-turn image editing (chat)
Example prompts: [upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
Image editing with Gemini
To perform image editing, add an image as input.
The following example demonstrates uploading base64-encoded images.
For multiple images and larger payloads, check the image input section.
Args:
image_path (str): The path to the image to edit.
prompt (str): The prompt to edit the image with.
max_retries (int, optional): Maximum number of retry attempts for handling 503 errors. Defaults to 3.
initial_retry_delay (int, optional): Initial delay in seconds before retrying. Defaults to 2.
Returns:
str: The path to the edited image.
"""
import PIL.Image
image = PIL.Image.open(image_path)
retry_count = 0
retry_delay = initial_retry_delay
while retry_count <= max_retries:
try:
client = genai.Client()
text_input = (prompt)
logger.info("Sending request to Gemini API for image editing")
response = client.models.generate_content(
model="gemini-2.0-flash-exp-image-generation",
contents=[text_input, image],
config=types.GenerateContentConfig(
response_modalities=['Text', 'Image']
)
)
logger.info("Received response from Gemini API for image editing")
edited_img_name = None
for part in response.candidates[0].content.parts:
if part.text is not None:
logger.info(f"Received text response: '{part.text[:100]}...'")
st.write(part.text)
elif part.inline_data is not None:
logger.info("Received edited image data from Gemini")
edited_image = Image.open(BytesIO(part.inline_data.data))
edited_image.show()
# Save the edited image
edited_img_name = f'edited-{os.path.basename(image_path)}'
try:
logger.info(f"Saving edited image to: {edited_img_name}")
edited_image.save(edited_img_name)
# Create a dictionary with the expected format for save_generated_image
img_response = {
"artifacts": [
{
"base64": base64.b64encode(open(edited_img_name, "rb").read()).decode('utf-8')
}
]
}
# Call save_generated_image with the correct format
save_generated_image(img_response)
except Exception as err:
logger.error(f"Failed to save edited image: {err}")
st.error(f"Failed to save edited image: {err}")
logger.info(f"Image editing completed. Edited image name: {edited_img_name}")
return edited_img_name
except Exception as err:
error_message = str(err)
logger.error(f"Error in edit_image: {err}")
# Check if this is a 503 UNAVAILABLE error
if "503 UNAVAILABLE" in error_message and retry_count < max_retries:
retry_count += 1
logger.info(f"Model is overloaded. Retrying in {retry_delay} seconds (attempt {retry_count}/{max_retries})")
st.warning(f"The image editing service is currently busy. Retrying in {retry_delay} seconds...")
time.sleep(retry_delay)
# Exponential backoff
retry_delay *= 2
else:
st.error(f"Error editing image: {err}")
return None
# If we've exhausted all retries
st.error("The image editing service is currently unavailable. Please try again later.")
return None
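
The aspect-ratio branch in `generate_gemini_image` pastes the image onto a new canvas at a centred offset, which amounts to a centre crop when the target is smaller than the source. A sketch of the underlying arithmetic, assuming a ratio string such as "16:9" (the function name `center_crop_box` is hypothetical); the returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts:

```python
from typing import Tuple


def center_crop_box(width: int, height: int, aspect_ratio: str) -> Tuple[int, int, int, int]:
    """Return the largest centred box of the given ratio that fits the image."""
    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    if width * ratio_h > height * ratio_w:
        # Image is too wide for the ratio: keep the height, trim the sides.
        new_w, new_h = height * ratio_w // ratio_h, height
    else:
        # Image is too tall (or already matches): keep the width, trim top and bottom.
        new_w, new_h = width, width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


if __name__ == "__main__":
    print(center_crop_box(1024, 1024, "16:9"))
```

Integer arithmetic keeps the result exact, and because the box never exceeds the source image, no white padding is introduced.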

View File

@@ -0,0 +1,69 @@
# Ensure you sign up for an account to obtain an API key:
# https://platform.stability.ai/
# Your API key can be found here after account creation:
# https://platform.stability.ai/account/keys
import os
import requests
import base64
from PIL import Image
from io import BytesIO
import streamlit as st
from loguru import logger
# Import APIKeyManager
from ...api_key_manager import APIKeyManager

def save_generated_image(data):
    """Save the generated image to a file."""
    # Implementation for saving image
    pass

def generate_stable_diffusion_image(prompt):
    engine_id = "stable-diffusion-xl-1024-v1-0"
    api_host = os.getenv('API_HOST', 'https://api.stability.ai')
    # Use APIKeyManager instead of direct environment variable access
    api_key_manager = APIKeyManager()
    api_key = api_key_manager.get_api_key("stability")
    if api_key is None:
        st.warning("Missing Stability API key. Please configure it in the onboarding process.")
        return None
    response = requests.post(
        f"{api_host}/v1/generation/{engine_id}/text-to-image",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}"
        },
        json={
            "text_prompts": [
                {
                    "text": prompt
                }
            ],
            "cfg_scale": 7,
            "height": 1024,
            "width": 1024,
            "samples": 1,
            "steps": 30,
        },
    )
    if response.status_code != 200:
        raise Exception("Non-200 response: " + str(response.text))
    data = response.json()
    img_path = save_generated_image(data)
    for i, image in enumerate(data["artifacts"]):
        # Decode base64 image data
        img_data = base64.b64decode(image["base64"])
        # Open image using PIL
        img = Image.open(BytesIO(img_data))
        # Display the image
        img.show()
    return img_path

@@ -0,0 +1,51 @@
from loguru import logger
import sys
from PIL import Image
from openai import OpenAI

def gen_new_from_given_img(img_path, image_dir, num_img=1, img_size="1024x1024", response_format="url"):
    """
    Generates variations of a given image using OpenAI's image variation API.
    This function takes an existing image, processes it, and generates a specified number of new images based on it.
    These generated images are variations of the original, providing creative flexibility.
    Args:
        img_path (str): Path to the original image file.
        image_dir (str): Directory where the generated images will be saved.
        num_img (int, optional): Number of image variations to generate. Defaults to 1.
        img_size (str, optional): Size of the generated images. Defaults to "1024x1024".
        response_format (str, optional): Format in which the generated images are returned. Defaults to "url".
    Returns:
        str: Path to the saved image variation.
    Raises:
        SystemExit: If a critical error occurs that prevents successful execution.
    """
    try:
        logger.info(f"Starting image variation generation for: {img_path}")
        # Convert and prepare the image
        png = Image.open(img_path).convert('RGBA')
        background = Image.new('RGBA', png.size, (255, 255, 255))
        alpha_composite = Image.alpha_composite(background, png)
        alpha_composite.save(img_path, 'PNG', quality=80)
        logger.info("Image prepared for variation generation.")
        client = OpenAI()
        # Binary mode does not accept an encoding argument; the context
        # manager closes the file once the request has been sent.
        with open(img_path, "rb") as image_file:
            variation_response = client.images.create_variation(
                image=image_file,
                n=num_img,
                size=img_size,
                response_format=response_format
            )
        # Saving the generated image
        # FIXME: save_generated_image is not imported in this file.
        generated_image_path = save_generated_image(variation_response, image_dir)
        logger.info(f"Image variation generated and saved to: {generated_image_path}")
        return generated_image_path
    except Exception as e:
        logger.error(f"Error occurred during image variation generation: {e}")
        sys.exit(f"Exiting due to critical error: {e}")

@@ -0,0 +1,163 @@
#########################################################
#
# This module will generate images for the blogs using APIs
# from Dall-E and other free resources. Given a prompt, the
# images will be stored in local directory.
# Required: openai API key.
#
#########################################################
# imports
import os
import sys
import datetime
import streamlit as st
import openai # OpenAI Python library to make API calls
from loguru import logger
logger.remove()
logger.add(sys.stdout,
    colorize=True,
    format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}"
)
#from .gen_dali2_images
from .gen_dali3_images import generate_dalle3_images
from .gen_stabl_diff_img import generate_stable_diffusion_image
from ..text_generation.main_text_generation import llm_text_gen
from .gen_gemini_images import generate_gemini_image

def generate_image(user_prompt, title=None, description=None, tags=None, content=None, aspect_ratio="16:9"):
    """
    The generation API endpoint creates an image based on a text prompt.
    Required inputs:
    prompt (str): A text description of the desired image(s). The maximum length is 1000 characters.
    Optional inputs:
    --> image_engine: dalle2, dalle3, stable diffusion are supported.
    --> num_images (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.
    --> size (str): The size of the generated images. Must be one of "256x256", "512x512", or "1024x1024".
        Smaller images are faster. Defaults to "1024x1024".
    --> response_format (str): The format in which the generated images are returned.
        Must be one of "url" or "b64_json". Defaults to "url".
    --> user (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse.
    --> aspect_ratio (str): The aspect ratio for the generated image. Must be one of "16:9", "4:3", or "1:1". Defaults to "16:9".
    """
    # FIXME: Need to remove default value to match sidebar input.
    image_engine = 'Gemini-AI'
    image_stored_at = None
    if user_prompt:
        try:
            # Use enhanced prompt generator with all available parameters
            img_prompt = generate_enhanced_img_prompt(user_prompt, title, description, tags, content)
            # Add aspect ratio to the prompt
            if aspect_ratio:
                img_prompt += f"\n\nAspect ratio: {aspect_ratio}"
            if 'Dalle3' in image_engine:
                logger.info(f"Calling Dalle3 text-to-image with prompt: {img_prompt}")
                image_stored_at = generate_dalle3_images(img_prompt)
            elif 'Stability-AI' in image_engine:
                logger.info(f"Calling Stable diffusion text-to-image with prompt: \n{img_prompt}")
                image_stored_at = generate_stable_diffusion_image(img_prompt)
            elif 'Gemini-AI' in image_engine:
                logger.info(f"Calling Gemini text-to-image with prompt: \n{img_prompt}")
                image_stored_at = generate_gemini_image(img_prompt, aspect_ratio=aspect_ratio)
            return image_stored_at
        except Exception as err:
            logger.error(f"Failed to generate Image: {err}")
            st.warning(f"Failed to generate Image: {err}")
    else:
        logger.error("Skipping Image creation, No prompt provided.")
def generate_img_prompt(user_prompt):
    """
    Given a user prompt, this function generates a prompt for image generation.
    """
    prompt = f"""
    As an expert prompt generator for AI text to image models and artist, I will provide you with 'user text' for creating images.
    Your task is to create a prompt for a highly relevant image from given 'user text'.
    \n
    Choose from various art styles, utilize light & shadow effects etc.
    Make sure to avoid common image generation mistakes.
    Reply with only one answer, no description and in plaintext.
    Make sure your prompt contains detailed and creative descriptions that will inspire unique and interesting images from the AI.
    \n\nuser text:
    '''{user_prompt}'''"""
    response = llm_text_gen(prompt)
    return response
def generate_enhanced_img_prompt(user_prompt, title=None, description=None, tags=None, content=None):
    """
    Given user prompt and additional context (title, description, tags, content),
    this function generates an enhanced prompt for better image generation.
    Args:
        user_prompt (str): Base prompt from the user
        title (str, optional): Blog title or content title
        description (str, optional): Blog or content description/summary
        tags (list, optional): List of tags related to the content
        content (str, optional): Actual content or excerpt
    Returns:
        str: Enhanced prompt for image generation
    """
    # Start with the base prompt
    context_parts = [user_prompt]
    # Add relevant context if available
    if title:
        context_parts.append(f"Title: {title}")
    if description:
        context_parts.append(f"Description: {description}")
    if tags and len(tags) > 0:
        tag_text = ", ".join(tags[:5])  # Limit to 5 tags to avoid too much noise
        context_parts.append(f"Tags: {tag_text}")
    # Create a combined context
    combined_context = "\n".join(context_parts)
    # Add some content excerpt if available (limited to avoid token limits)
    content_excerpt = ""
    if content:
        # Just use the first few hundred characters as excerpt
        content_excerpt = content[:300] + "..." if len(content) > 300 else content
    # Create the prompt for LLM
    prompt = f"""
    As an expert prompt engineer for AI image generation models, create a detailed, creative prompt
    for generating a high-quality, relevant image based on the following context:
    {combined_context}
    Additional content excerpt:
    {content_excerpt}
    Your task is to:
    1. Analyze the context and content to understand the main theme and subject
    2. Create a rich, detailed prompt for image generation (50-75 words)
    3. Include specific visual details, art style, mood, lighting, composition
    4. Make sure the prompt is highly relevant to the original context
    5. Avoid prohibited content or anything that violates image generation guidelines
    Reply with ONLY the final prompt. No explanations or other text.
    """
    # Generate the enhanced prompt
    try:
        enhanced_prompt = llm_text_gen(prompt)
        logger.info(f"Generated enhanced image prompt: {enhanced_prompt[:100]}...")
        return enhanced_prompt
    except Exception as e:
        logger.error(f"Error generating enhanced prompt: {e}")
        # Fall back to the simple prompt generation if enhanced fails
        return generate_img_prompt(user_prompt)

@@ -0,0 +1,39 @@
import base64
import datetime
import os
import requests
from PIL import Image
import logging

def save_generated_image(img_generation_response):
    """
    Save generated images for blog, ensuring unique names for SEO.
    """
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    # Get image save directory with fallback to a local directory
    image_save_dir = os.getenv('IMG_SAVE_DIR', 'generated_images')
    # Create the directory if it doesn't exist
    if not os.path.exists(image_save_dir):
        logger.info(f"Creating image save directory: {image_save_dir}")
        os.makedirs(image_save_dir, exist_ok=True)
    generated_image_name = f"generated_image_{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}.webp"
    generated_image_filepath = os.path.join(image_save_dir, generated_image_name)
    try:
        for i, image in enumerate(img_generation_response["artifacts"]):
            with open(generated_image_filepath, "wb") as f:
                f.write(base64.b64decode(image["base64"]))
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to get generated image content: {e}")
        return None
    except Exception as e:
        logger.error(f"Error saving image: {e}")
        return None
    logger.info(f"Saved image at path: {generated_image_filepath}")
    return generated_image_filepath

@@ -0,0 +1,290 @@
"""
Onboarding Data Service
Extracts real user data from onboarding to personalize AI inputs
"""
from typing import Dict, Any, List, Optional
from sqlalchemy.orm import Session
from loguru import logger
from datetime import datetime
import json
from services.database import get_db_session
from models.onboarding import OnboardingSession, WebsiteAnalysis, ResearchPreferences

class OnboardingDataService:
    """Service to extract and use real onboarding data for AI personalization."""

    def __init__(self):
        """Initialize the onboarding data service."""
        logger.info("OnboardingDataService initialized")

    def get_user_website_analysis(self, user_id: int) -> Optional[Dict[str, Any]]:
        """
        Get website analysis data for a specific user.
        Args:
            user_id: User ID to get data for
        Returns:
            Website analysis data or None if not found
        """
        try:
            session = get_db_session()
            # Find onboarding session for user
            onboarding_session = session.query(OnboardingSession).filter(
                OnboardingSession.user_id == user_id
            ).first()
            if not onboarding_session:
                logger.warning(f"No onboarding session found for user {user_id}")
                return None
            # Get website analysis for this session
            website_analysis = session.query(WebsiteAnalysis).filter(
                WebsiteAnalysis.session_id == onboarding_session.id
            ).first()
            if not website_analysis:
                logger.warning(f"No website analysis found for user {user_id}")
                return None
            return website_analysis.to_dict()
        except Exception as e:
            logger.error(f"Error getting website analysis for user {user_id}: {str(e)}")
            return None

    def get_user_research_preferences(self, user_id: int) -> Optional[Dict[str, Any]]:
        """
        Get research preferences for a specific user.
        Args:
            user_id: User ID to get data for
        Returns:
            Research preferences data or None if not found
        """
        try:
            session = get_db_session()
            # Find onboarding session for user
            onboarding_session = session.query(OnboardingSession).filter(
                OnboardingSession.user_id == user_id
            ).first()
            if not onboarding_session:
                logger.warning(f"No onboarding session found for user {user_id}")
                return None
            # Get research preferences for this session
            research_prefs = session.query(ResearchPreferences).filter(
                ResearchPreferences.session_id == onboarding_session.id
            ).first()
            if not research_prefs:
                logger.warning(f"No research preferences found for user {user_id}")
                return None
            return research_prefs.to_dict()
        except Exception as e:
            logger.error(f"Error getting research preferences for user {user_id}: {str(e)}")
            return None

    def get_personalized_ai_inputs(self, user_id: int) -> Dict[str, Any]:
        """
        Get personalized AI inputs based on user's onboarding data.
        Args:
            user_id: User ID to get personalized data for
        Returns:
            Personalized data for AI analysis
        """
        try:
            logger.info(f"Getting personalized AI inputs for user {user_id}")
            # Get website analysis
            website_analysis = self.get_user_website_analysis(user_id)
            research_prefs = self.get_user_research_preferences(user_id)
            if not website_analysis:
                logger.warning(f"No onboarding data found for user {user_id}, using defaults")
                return self._get_default_ai_inputs()
            # Extract real data from website analysis
            writing_style = website_analysis.get('writing_style', {})
            target_audience = website_analysis.get('target_audience', {})
            content_type = website_analysis.get('content_type', {})
            recommended_settings = website_analysis.get('recommended_settings', {})
            # Build personalized AI inputs
            personalized_inputs = {
                "website_analysis": {
                    "website_url": website_analysis.get('website_url', ''),
                    "content_types": self._extract_content_types(content_type),
                    "writing_style": writing_style.get('tone', 'professional'),
                    "target_audience": target_audience.get('demographics', ['professionals']),
                    "industry_focus": target_audience.get('industry_focus', 'general'),
                    "expertise_level": target_audience.get('expertise_level', 'intermediate')
                },
                "competitor_analysis": {
                    "top_performers": self._generate_competitor_suggestions(target_audience),
                    "industry": target_audience.get('industry_focus', 'general'),
                    "target_demographics": target_audience.get('demographics', [])
                },
                "gap_analysis": {
                    "content_gaps": self._identify_content_gaps(content_type, writing_style),
                    "target_keywords": self._generate_target_keywords(target_audience),
                    "content_opportunities": self._identify_opportunities(content_type)
                },
                "keyword_analysis": {
                    "high_value_keywords": self._generate_high_value_keywords(target_audience),
                    "content_topics": self._generate_content_topics(content_type),
                    "search_intent": self._analyze_search_intent(target_audience)
                }
            }
            # Add research preferences if available
            if research_prefs:
                personalized_inputs["research_preferences"] = {
                    "research_depth": research_prefs.get('research_depth', 'Standard'),
                    "content_types": research_prefs.get('content_types', []),
                    "auto_research": research_prefs.get('auto_research', True),
                    "factual_content": research_prefs.get('factual_content', True)
                }
            logger.info(f"✅ Generated personalized AI inputs for user {user_id}")
            return personalized_inputs
        except Exception as e:
            logger.error(f"Error generating personalized AI inputs for user {user_id}: {str(e)}")
            return self._get_default_ai_inputs()

    def _extract_content_types(self, content_type: Dict[str, Any]) -> List[str]:
        """Extract content types from content type analysis."""
        types = []
        if content_type.get('primary_type'):
            types.append(content_type['primary_type'])
        if content_type.get('secondary_types'):
            types.extend(content_type['secondary_types'])
        return types if types else ['blog', 'article']

    def _generate_competitor_suggestions(self, target_audience: Dict[str, Any]) -> List[str]:
        """Generate competitor suggestions based on target audience."""
        industry = target_audience.get('industry_focus', 'general')
        demographics = target_audience.get('demographics', ['professionals'])
        # Generate industry-specific competitors
        if industry == 'technology':
            return ['techcrunch.com', 'wired.com', 'theverge.com']
        elif industry == 'marketing':
            return ['hubspot.com', 'marketingland.com', 'moz.com']
        else:
            return ['competitor1.com', 'competitor2.com', 'competitor3.com']

    def _identify_content_gaps(self, content_type: Dict[str, Any], writing_style: Dict[str, Any]) -> List[str]:
        """Identify content gaps based on current content type and style."""
        gaps = []
        primary_type = content_type.get('primary_type', 'blog')
        if primary_type == 'blog':
            gaps.extend(['Video tutorials', 'Case studies', 'Infographics'])
        elif primary_type == 'video':
            gaps.extend(['Blog posts', 'Whitepapers', 'Webinars'])
        # Add style-based gaps
        tone = writing_style.get('tone', 'professional')
        if tone == 'professional':
            gaps.append('Personal stories')
        elif tone == 'casual':
            gaps.append('Expert interviews')
        return gaps

    def _generate_target_keywords(self, target_audience: Dict[str, Any]) -> List[str]:
        """Generate target keywords based on audience analysis."""
        industry = target_audience.get('industry_focus', 'general')
        expertise = target_audience.get('expertise_level', 'intermediate')
        if industry == 'technology':
            return ['AI tools', 'Digital transformation', 'Tech trends']
        elif industry == 'marketing':
            return ['Content marketing', 'SEO strategies', 'Social media']
        else:
            return ['Industry insights', 'Best practices', 'Expert tips']

    def _identify_opportunities(self, content_type: Dict[str, Any]) -> List[str]:
        """Identify content opportunities based on current content type."""
        opportunities = []
        purpose = content_type.get('purpose', 'informational')
        if purpose == 'informational':
            opportunities.extend(['How-to guides', 'Tutorials', 'Educational content'])
        elif purpose == 'promotional':
            opportunities.extend(['Case studies', 'Testimonials', 'Success stories'])
        return opportunities

    def _generate_high_value_keywords(self, target_audience: Dict[str, Any]) -> List[str]:
        """Generate high-value keywords based on audience analysis."""
        industry = target_audience.get('industry_focus', 'general')
        if industry == 'technology':
            return ['AI marketing', 'Content automation', 'Digital strategy']
        elif industry == 'marketing':
            return ['Content marketing', 'SEO optimization', 'Social media strategy']
        else:
            return ['Industry trends', 'Best practices', 'Expert insights']

    def _generate_content_topics(self, content_type: Dict[str, Any]) -> List[str]:
        """Generate content topics based on content type analysis."""
        topics = []
        primary_type = content_type.get('primary_type', 'blog')
        if primary_type == 'blog':
            topics.extend(['Industry trends', 'How-to guides', 'Expert insights'])
        elif primary_type == 'video':
            topics.extend(['Tutorials', 'Product demos', 'Expert interviews'])
        return topics

    def _analyze_search_intent(self, target_audience: Dict[str, Any]) -> Dict[str, Any]:
        """Analyze search intent based on target audience."""
        expertise = target_audience.get('expertise_level', 'intermediate')
        if expertise == 'beginner':
            return {'intent': 'educational', 'focus': 'basic concepts'}
        elif expertise == 'intermediate':
            return {'intent': 'practical', 'focus': 'implementation'}
        else:
            return {'intent': 'advanced', 'focus': 'strategic insights'}

    def _get_default_ai_inputs(self) -> Dict[str, Any]:
        """Get default AI inputs when no onboarding data is available."""
        return {
            "website_analysis": {
                "content_types": ["blog", "video", "social"],
                "writing_style": "professional",
                "target_audience": ["professionals"],
                "industry_focus": "general",
                "expertise_level": "intermediate"
            },
            "competitor_analysis": {
                "top_performers": ["competitor1.com", "competitor2.com"],
                "industry": "general",
                "target_demographics": ["professionals"]
            },
            "gap_analysis": {
                "content_gaps": ["AI content", "Video tutorials", "Case studies"],
                "target_keywords": ["Industry insights", "Best practices"],
                "content_opportunities": ["How-to guides", "Tutorials"]
            },
            "keyword_analysis": {
                "high_value_keywords": ["AI marketing", "Content automation", "Digital strategy"],
                "content_topics": ["Industry trends", "Expert insights"],
                "search_intent": {"intent": "practical", "focus": "implementation"}
            }
        }

@@ -0,0 +1,202 @@
"""
Research Preferences Service for Onboarding Step 3
Handles storage and retrieval of research preferences and style detection data.
"""
from typing import Dict, Any, Optional
from sqlalchemy.orm import Session
from sqlalchemy.exc import SQLAlchemyError
from datetime import datetime
import json
from loguru import logger
from models.onboarding import ResearchPreferences, OnboardingSession, WebsiteAnalysis

class ResearchPreferencesService:
    """Service for managing research preferences data during onboarding."""

    def __init__(self, db_session: Session):
        """Initialize the service with database session."""
        self.db = db_session

    def save_research_preferences(self, session_id: int, preferences_data: Dict[str, Any], style_data: Optional[Dict[str, Any]] = None) -> Optional[int]:
        """
        Save research preferences to database.
        Args:
            session_id: Onboarding session ID
            preferences_data: Research preferences from step 3
            style_data: Style detection data from step 2 (optional)
        Returns:
            Preferences ID if successful, None otherwise
        """
        try:
            # Check if preferences already exist for this session
            existing_preferences = self.db.query(ResearchPreferences).filter_by(session_id=session_id).first()
            if existing_preferences:
                # Update existing preferences
                existing_preferences.research_depth = preferences_data.get('research_depth', 'Comprehensive')
                existing_preferences.content_types = preferences_data.get('content_types', [])
                existing_preferences.auto_research = preferences_data.get('auto_research', True)
                existing_preferences.factual_content = preferences_data.get('factual_content', True)
                # Update style data if provided
                if style_data:
                    existing_preferences.writing_style = style_data.get('writing_style')
                    existing_preferences.content_characteristics = style_data.get('content_characteristics')
                    existing_preferences.target_audience = style_data.get('target_audience')
                    existing_preferences.recommended_settings = style_data.get('recommended_settings')
                existing_preferences.updated_at = datetime.utcnow()
                self.db.commit()
                logger.info(f"Updated research preferences for session {session_id}")
                return existing_preferences.id
            else:
                # Create new preferences
                preferences = ResearchPreferences(
                    session_id=session_id,
                    research_depth=preferences_data.get('research_depth', 'Comprehensive'),
                    content_types=preferences_data.get('content_types', []),
                    auto_research=preferences_data.get('auto_research', True),
                    factual_content=preferences_data.get('factual_content', True),
                    writing_style=style_data.get('writing_style') if style_data else None,
                    content_characteristics=style_data.get('content_characteristics') if style_data else None,
                    target_audience=style_data.get('target_audience') if style_data else None,
                    recommended_settings=style_data.get('recommended_settings') if style_data else None
                )
                self.db.add(preferences)
                self.db.commit()
                logger.info(f"Created research preferences for session {session_id}")
                return preferences.id
        except SQLAlchemyError as e:
            self.db.rollback()
            logger.error(f"Database error saving research preferences: {e}")
            return None
        except Exception as e:
            self.db.rollback()
            logger.error(f"Error saving research preferences: {e}")
            return None

    def get_research_preferences(self, session_id: int) -> Optional[Dict[str, Any]]:
        """
        Get research preferences for a session.
        Args:
            session_id: Onboarding session ID
        Returns:
            Research preferences data or None if not found
        """
        try:
            preferences = self.db.query(ResearchPreferences).filter_by(session_id=session_id).first()
            if preferences:
                return preferences.to_dict()
            return None
        except Exception as e:
            logger.error(f"Error getting research preferences: {e}")
            return None

    def get_style_data_from_analysis(self, session_id: int) -> Optional[Dict[str, Any]]:
        """
        Get style detection data from website analysis for a session.
        Args:
            session_id: Onboarding session ID
        Returns:
            Style data from website analysis or None if not found
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).filter_by(session_id=session_id).first()
            if analysis:
                return {
                    'writing_style': analysis.writing_style,
                    'content_characteristics': analysis.content_characteristics,
                    'target_audience': analysis.target_audience,
                    'recommended_settings': analysis.recommended_settings
                }
            return None
        except Exception as e:
            logger.error(f"Error getting style data from analysis: {e}")
            return None

    def save_preferences_with_style_data(self, session_id: int, preferences_data: Dict[str, Any]) -> Optional[int]:
        """
        Save research preferences with style data from website analysis.
        Args:
            session_id: Onboarding session ID
            preferences_data: Research preferences from step 3
        Returns:
            Preferences ID if successful, None otherwise
        """
        # Get style data from website analysis
        style_data = self.get_style_data_from_analysis(session_id)
        # Save preferences with style data
        return self.save_research_preferences(session_id, preferences_data, style_data)

    def update_preferences(self, preferences_id: int, updates: Dict[str, Any]) -> bool:
        """
        Update existing research preferences.
        Args:
            preferences_id: Research preferences ID
            updates: Dictionary of fields to update
        Returns:
            True if successful, False otherwise
        """
        try:
            preferences = self.db.query(ResearchPreferences).filter_by(id=preferences_id).first()
            if not preferences:
                logger.warning(f"Research preferences {preferences_id} not found")
                return False
            # Update fields
            for field, value in updates.items():
                if hasattr(preferences, field):
                    setattr(preferences, field, value)
            preferences.updated_at = datetime.utcnow()
            self.db.commit()
            logger.info(f"Updated research preferences {preferences_id}")
            return True
        except SQLAlchemyError as e:
            self.db.rollback()
            logger.error(f"Database error updating research preferences: {e}")
            return False
        except Exception as e:
            self.db.rollback()
            logger.error(f"Error updating research preferences: {e}")
            return False

    def delete_preferences(self, session_id: int) -> bool:
        """
        Delete research preferences for a session.
        Args:
            session_id: Onboarding session ID
        Returns:
            True if successful, False otherwise
        """
        try:
            preferences = self.db.query(ResearchPreferences).filter_by(session_id=session_id).first()
            if preferences:
                self.db.delete(preferences)
                self.db.commit()
                logger.info(f"Deleted research preferences for session {session_id}")
                return True
            return False
        except Exception as e:
            self.db.rollback()
            logger.error(f"Error deleting research preferences: {e}")
            return False

@@ -0,0 +1,288 @@
# SEO Analyzer Module
A comprehensive, modular SEO analysis system for web applications that provides detailed insights and actionable recommendations for improving search engine optimization.
## 🚀 Features
### ✅ **Currently Implemented**
#### **Core Analysis Components**
- **URL Structure Analysis**: Checks URL length, HTTPS usage, special characters, and URL formatting
- **Meta Data Analysis**: Analyzes title tags, meta descriptions, viewport settings, and character encoding
- **Content Analysis**: Evaluates content quality, word count, heading structure, and readability
- **Technical SEO Analysis**: Checks robots.txt, sitemaps, structured data, and canonical URLs
- **Performance Analysis**: Measures page load speed, compression, caching, and optimization
- **Accessibility Analysis**: Checks alt text, form labels, heading structure, and color contrast
- **User Experience Analysis**: Checks mobile responsiveness, navigation, contact info, and social links
- **Security Headers Analysis**: Analyzes security headers for protection against common vulnerabilities
- **Keyword Analysis**: Evaluates keyword usage and optimization for target keywords
#### **AI-Powered Insights**
- **Intelligent Issue Detection**: Automatically identifies critical SEO problems
- **Actionable Recommendations**: Provides specific fixes with code examples
- **Priority-Based Suggestions**: Categorizes issues by severity and impact
- **Context-Aware Solutions**: Offers location-specific fixes and improvements
#### **Advanced Features**
- **Progressive Analysis**: Runs faster analyses first, then slower ones with graceful fallbacks
- **Timeout Handling**: Robust error handling for network issues and timeouts
- **Detailed Reporting**: Comprehensive analysis with scores, issues, warnings, and recommendations
- **Modular Architecture**: Reusable components for easy maintenance and extension
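The progressive analysis and graceful fallbacks listed above could be sketched roughly as follows. This is only an illustration under assumptions, not the module's actual implementation: `run_progressive`, the `(name, callable)` analyzer pairs, and the shape of the result dictionary are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type: an analyzer is a (name, callable) pair, not the module's real API.
Analyzer = Tuple[str, Callable[[str], Dict[str, Any]]]


def run_progressive(url: str, fast: List[Analyzer], slow: List[Analyzer]) -> Dict[str, Any]:
    """Run fast analyzers first, then slow ones; one failure never aborts the rest."""
    results: Dict[str, Any] = {}
    for name, func in fast + slow:
        try:
            results[name] = func(url)
        except Exception as err:  # includes timeouts raised by an analyzer
            # Graceful fallback: record the failure and continue with the next analyzer.
            results[name] = {"score": 0, "error": str(err)}
    return results


def _url_check(url: str) -> Dict[str, Any]:
    # Stand-in for a cheap, local check.
    return {"score": 100 if url.startswith("https://") else 50}


def _slow_check(url: str) -> Dict[str, Any]:
    # Stand-in for a network-bound check that times out.
    raise TimeoutError("request timed out")


report = run_progressive(
    "https://example.com",
    fast=[("url", _url_check)],
    slow=[("performance", _slow_check)],
)
```

Here the failed `performance` entry carries an `error` field instead of stopping the run, so the caller still receives the results of every analyzer that completed.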
### 🔄 **Coming Soon**
#### **Enhanced Analysis Features**
- **Core Web Vitals Analysis**: LCP, INP (which replaced FID as a Core Web Vital in March 2024), and CLS measurements
- **Mobile-First Analysis**: Comprehensive mobile optimization checks
- **Schema Markup Validation**: Advanced structured data analysis
- **Image Optimization Analysis**: Alt text, compression, and format recommendations
- **Internal Linking Analysis**: Site structure and internal link optimization
- **Social Media Optimization**: Open Graph and Twitter Card analysis
#### **AI-Powered Enhancements**
- **Natural Language Processing**: Advanced content analysis using NLP
- **Competitive Analysis**: Compare against competitor websites
- **Trend Analysis**: Identify SEO trends and opportunities
- **Predictive Insights**: Forecast potential ranking improvements
- **Automated Fix Generation**: AI-generated code fixes and optimizations
#### **Advanced Features**
- **Bulk Analysis**: Analyze multiple URLs simultaneously
- **Historical Tracking**: Monitor SEO improvements over time
- **Custom Rule Engine**: User-defined analysis rules and thresholds
- **API Integration**: Connect with Google Search Console, Analytics, and other tools
- **White-Label Support**: Customizable branding and reporting
#### **Enterprise Features**
- **Multi-User Support**: Team collaboration and role-based access
- **Advanced Reporting**: Custom dashboards and detailed analytics
- **API Rate Limiting**: Intelligent request management
- **Caching System**: Optimized performance for repeated analyses
- **Webhook Support**: Real-time notifications and integrations
## 📁 **Module Structure**
```
seo_analyzer/
├── __init__.py # Package initialization and exports
├── core.py # Main analyzer class and data structures
├── analyzers.py # Individual analysis components
├── utils.py # Utility classes (HTML fetcher, AI insights)
├── service.py # Database service for storing/retrieving results
└── README.md # This documentation
```
### **Core Components**
#### **`core.py`**
- `ComprehensiveSEOAnalyzer`: Main orchestrator class
- `SEOAnalysisResult`: Data structure for analysis results
- Progressive analysis with error handling
#### **`analyzers.py`**
- `BaseAnalyzer`: Base class for all analyzers
- `URLStructureAnalyzer`: URL analysis and security checks
- `MetaDataAnalyzer`: Meta tags and technical SEO
- `ContentAnalyzer`: Content quality and structure
- `TechnicalSEOAnalyzer`: Technical SEO elements
- `PerformanceAnalyzer`: Page speed and optimization
- `AccessibilityAnalyzer`: Accessibility compliance
- `UserExperienceAnalyzer`: UX and mobile optimization
- `SecurityHeadersAnalyzer`: Security header analysis
- `KeywordAnalyzer`: Keyword optimization
#### **`utils.py`**
- `HTMLFetcher`: Robust HTML content fetching
- `AIInsightGenerator`: AI-powered insights generation
#### **`service.py`**
- `SEOAnalysisService`: Database operations for storing and retrieving analysis results
- Analysis history tracking
- Statistics and reporting
- CRUD operations for analysis data
## 🛠 **Usage**
### **Basic Usage**
```python
from services.seo_analyzer import ComprehensiveSEOAnalyzer
# Initialize analyzer
analyzer = ComprehensiveSEOAnalyzer()
# Analyze a URL
result = analyzer.analyze_url_progressive(
url="https://example.com",
target_keywords=["seo", "optimization"]
)
# Access results
print(f"Overall Score: {result.overall_score}")
print(f"Health Status: {result.health_status}")
print(f"Critical Issues: {len(result.critical_issues)}")
```
### **Individual Analyzer Usage**
```python
from services.seo_analyzer import HTMLFetcher, MetaDataAnalyzer, URLStructureAnalyzer

url = "https://example.com"

# URL analysis needs only the URL string
url_analyzer = URLStructureAnalyzer()
url_result = url_analyzer.analyze(url)

# Meta data analysis needs the page HTML, fetched here with the package's HTMLFetcher
html_content = HTMLFetcher().fetch_html(url)
meta_analyzer = MetaDataAnalyzer()
meta_result = meta_analyzer.analyze(html_content, url)
```
## 📊 **Analysis Categories**
### **URL Structure & Security**
- URL length optimization
- HTTPS implementation
- Special character handling
- URL readability and formatting
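
As a rough illustration of these checks, the sketch below mirrors the rules applied by `URLStructureAnalyzer` using only the standard library. The helper name `check_url_structure` and the short finding labels are hypothetical and are not part of the package.

```python
import re
from urllib.parse import urlparse


def check_url_structure(url: str) -> dict:
    """Hypothetical helper: flags the URL problems listed above."""
    parsed = urlparse(url)
    findings = []
    if len(url) > 2000:
        findings.append("too_long")
    if parsed.scheme != "https":
        findings.append("not_https")
    # Underscores used as separators, with no hyphens anywhere in the path
    if "_" in parsed.path and "-" not in parsed.path:
        findings.append("underscores")
    # Anything outside letters, digits, hyphen, underscore and slash
    if re.search(r"[^a-zA-Z0-9\-_/]", parsed.path):
        findings.append("special_chars")
    return {"url": url, "findings": findings}


print(check_url_structure("http://example.com/my_page")["findings"])
```

Note that the character rule also flags dots, so a path such as `/page.html` would be reported as containing special characters.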
### **Meta Data & Technical SEO**
- Title tag optimization (30-60 characters)
- Meta description analysis (70-160 characters)
- Viewport meta tag presence
- Character encoding declaration
### **Content Analysis**
- Word count evaluation (minimum 300 words)
- Heading hierarchy (H1, H2, H3 structure)
- Image alt text compliance
- Internal linking analysis
- Spelling error detection (basic heuristic that flags unusually long words in the first 100 words)
### **Technical SEO**
- Robots.txt accessibility
- XML sitemap presence
- Structured data markup
- Canonical URL implementation
### **Performance**
- Page load time measurement
- GZIP compression detection
- Caching header analysis
- Resource optimization recommendations
### **Accessibility**
- Image alt text compliance
- Form label associations
- Heading hierarchy validation
- Color contrast recommendations
### **User Experience**
- Mobile responsiveness checks
- Navigation menu analysis
- Contact information presence
- Social media link integration
### **Security Headers**
- X-Frame-Options
- X-Content-Type-Options
- X-XSS-Protection
- Strict-Transport-Security
- Content-Security-Policy
- Referrer-Policy
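
The header check can be pictured as a simple split of the six names above into present and missing. This is a minimal sketch that takes an already-fetched header mapping; `split_security_headers` is a hypothetical name, and the real analyzer reads the headers from a live HTTP response.

```python
SECURITY_HEADERS = [
    "X-Frame-Options",
    "X-Content-Type-Options",
    "X-XSS-Protection",
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "Referrer-Policy",
]


def split_security_headers(response_headers: dict) -> tuple:
    """Return (present, missing) lists; header names are compared case-insensitively."""
    # Headers with an empty value are treated as missing
    sent = {name.lower() for name, value in response_headers.items() if value}
    present = [h for h in SECURITY_HEADERS if h.lower() in sent]
    missing = [h for h in SECURITY_HEADERS if h.lower() not in sent]
    return present, missing


present, missing = split_security_headers({"x-frame-options": "DENY"})
print(present, len(missing))
```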
### **Keyword Analysis**
- Title keyword presence
- Content keyword density
- Natural keyword integration
- Target keyword optimization
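
A minimal sketch of the keyword checks, assuming the title and body text have already been extracted from the page. `keyword_report` is a hypothetical helper. Like the analyzer, it counts case-insensitive substring matches rather than computing a true density ratio.

```python
def keyword_report(title: str, body_text: str, keywords: list) -> dict:
    """Hypothetical helper: per-keyword title presence and occurrence count."""
    title_lower = title.lower()
    body_lower = body_text.lower()
    report = {}
    for keyword in keywords:
        needle = keyword.lower()
        report[keyword] = {
            "in_title": needle in title_lower,
            "occurrences": body_lower.count(needle),
        }
    return report


print(keyword_report("SEO Guide", "seo tips and more SEO", ["SEO", "audit"]))
```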
## 🎯 **Scoring System**
### **Overall Health Status**
- **Excellent (80-100)**: Optimal SEO performance
- **Good (60-79)**: Good performance with minor improvements needed
- **Needs Improvement (40-59)**: Significant issues requiring attention
- **Poor (0-39)**: Critical issues requiring immediate action
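
The bands above amount to a small lookup. The sketch below assumes the labels are used verbatim; the exact strings stored in `health_status` by the package may differ, and the function name here is illustrative only.

```python
def health_status(overall_score: int) -> str:
    """Map a 0-100 score onto the four bands listed above."""
    if overall_score >= 80:
        return "Excellent"
    if overall_score >= 60:
        return "Good"
    if overall_score >= 40:
        return "Needs Improvement"
    return "Poor"


for score in (92, 60, 39):
    print(score, health_status(score))
```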
### **Issue Categories**
- **Critical Issues**: Major problems affecting rankings (25 points each)
- **Warnings**: Important improvements for better performance (10 points each)
- **Recommendations**: Optional enhancements for optimal results
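
Most analyzers derive their category score from these deductions, starting at 100 and never going below 0. A short sketch of that arithmetic follows; the function name `category_score` is hypothetical. The security headers analyzer is an exception, since it scores by the number of headers present.

```python
def category_score(critical_issues: int, warnings: int) -> int:
    """Start at 100, subtract 25 per critical issue and 10 per warning, floor at 0."""
    return max(0, 100 - critical_issues * 25 - warnings * 10)


print(category_score(critical_issues=1, warnings=2))  # 100 - 25 - 20 = 55
```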
## 🔧 **Configuration**
### **Timeout Settings**
- HTML Fetching: 30 seconds
- Security Headers: 15 seconds
- Performance Analysis: 20 seconds
- Progressive Analysis: Graceful fallbacks
### **Scoring Thresholds**
- URL Length: 2000 characters maximum
- Title Length: 30-60 characters optimal
- Meta Description: 70-160 characters optimal
- Content Length: 300 words minimum
- Load Time: 3 seconds maximum
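
In the module these thresholds are written directly into the analyzer code. If they were gathered in one place, it might look like the sketch below; the `Thresholds` dataclass and `length_verdict` helper are assumptions made for illustration, not existing package APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the limits listed above."""
    max_url_length: int = 2000
    title_range: tuple = (30, 60)
    description_range: tuple = (70, 160)
    min_words: int = 300
    max_load_seconds: float = 3.0


def length_verdict(text: str, allowed: tuple) -> str:
    """Classify a string's length against an inclusive (low, high) range."""
    low, high = allowed
    if len(text) < low:
        return "too_short"
    if len(text) > high:
        return "too_long"
    return "ok"


limits = Thresholds()
print(length_verdict("Short title", limits.title_range))
```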
## 🚀 **Performance Features**
### **Progressive Analysis**
1. **Fast Analyses**: URL structure, meta data, content, technical SEO, accessibility, UX
2. **Slower Analyses**: Security headers, performance (with timeout handling)
3. **Graceful Fallbacks**: Partial results when analyses fail
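
The three steps above can be sketched as follows. This is a simplified stand-in, not the orchestrator itself: `run_progressive` and the shape of the fallback dictionary are assumptions, and the steps are passed in as plain callables so the example runs without network access.

```python
from typing import Any, Callable, Dict


def run_progressive(
    fast: Dict[str, Callable[[], dict]],
    slow: Dict[str, Callable[[], dict]],
) -> Dict[str, Any]:
    """Run fast steps directly, then slow steps with a per-step fallback."""
    results = {name: step() for name, step in fast.items()}
    for name, step in slow.items():
        try:
            results[name] = step()
        except Exception as exc:
            # Keep the other results and record why this step failed
            results[name] = {
                "score": 0,
                "error": str(exc),
                "issues": [],
                "warnings": [],
                "recommendations": [],
            }
    return results


def failing_step() -> dict:
    raise TimeoutError("timed out")


report = run_progressive(
    fast={"url_structure": lambda: {"score": 100}},
    slow={"performance": failing_step},
)
print(report["performance"]["error"])
```

A failure in one slow step therefore costs only that category, and the caller still receives the fast results.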
### **Error Handling**
- Network timeout management
- Partial result generation
- Detailed error reporting
- Fallback recommendations
## 📈 **Future Roadmap**
### **Phase 1 (Q1 2024)**
- [ ] Core Web Vitals integration
- [ ] Enhanced mobile analysis
- [ ] Schema markup validation
- [ ] Image optimization analysis
### **Phase 2 (Q2 2024)**
- [ ] NLP-powered content analysis
- [ ] Competitive analysis features
- [ ] Bulk analysis capabilities
- [ ] Historical tracking
### **Phase 3 (Q3 2024)**
- [ ] Predictive insights
- [ ] Automated fix generation
- [ ] API integrations
- [ ] White-label support
### **Phase 4 (Q4 2024)**
- [ ] Enterprise features
- [ ] Advanced reporting
- [ ] Multi-user support
- [ ] Webhook integrations
## 🤝 **Contributing**
### **Adding New Analyzers**
1. Create a new analyzer class inheriting from `BaseAnalyzer`
2. Implement the `analyze()` method
3. Return standardized result format
4. Add to the main orchestrator in `core.py`
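
To make these steps concrete, here is a sketch of a small analyzer that follows the same result format. `LanguageAttributeAnalyzer` is a hypothetical example. The `BaseAnalyzer` defined here is only a stand-in so the snippet runs on its own, and it parses with the standard library instead of BeautifulSoup for the same reason.

```python
from html.parser import HTMLParser
from typing import Any, Dict


class BaseAnalyzer:
    """Stand-in for the package's BaseAnalyzer (assumption for this sketch)."""


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


class LanguageAttributeAnalyzer(BaseAnalyzer):
    """Hypothetical analyzer: warns when <html> has no lang attribute."""

    def analyze(self, html_content: str) -> Dict[str, Any]:
        finder = _LangFinder()
        finder.feed(html_content)
        issues, warnings, recommendations = [], [], []
        if not finder.lang:
            warnings.append({
                "type": "warning",
                "message": "Missing lang attribute on <html>",
                "location": "<html>",
                "fix": "Declare the page language",
                "code_example": '<html lang="en">',
                "action": "add_lang_attribute",
            })
        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            "score": score,
            "issues": issues,
            "warnings": warnings,
            "recommendations": recommendations,
            "has_lang": bool(finder.lang),
        }


print(LanguageAttributeAnalyzer().analyze("<html><body></body></html>")["score"])
```

Registering such a class in `core.py` would then follow step 4: create an instance in the orchestrator's `__init__` and add its result to the analysis data.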
### **Extending Existing Features**
1. Follow the modular architecture
2. Maintain backward compatibility
3. Add comprehensive error handling
4. Include detailed documentation
## 📝 **License**
This module is part of the AI-Writer project and follows the same licensing terms.
---
**Version**: 1.0.0
**Last Updated**: January 2024
**Maintainer**: AI-Writer Team


@@ -0,0 +1,52 @@
"""
SEO Analyzer Package
A comprehensive, modular SEO analysis system for web applications.
This package provides:
- URL structure analysis
- Meta data analysis
- Content analysis
- Technical SEO analysis
- Performance analysis
- Accessibility analysis
- User experience analysis
- Security headers analysis
- Keyword analysis
- AI-powered insights generation
- Database service for storing and retrieving analysis results
"""

from .core import ComprehensiveSEOAnalyzer, SEOAnalysisResult
from .analyzers import (
    URLStructureAnalyzer,
    MetaDataAnalyzer,
    ContentAnalyzer,
    TechnicalSEOAnalyzer,
    PerformanceAnalyzer,
    AccessibilityAnalyzer,
    UserExperienceAnalyzer,
    SecurityHeadersAnalyzer,
    KeywordAnalyzer
)
from .utils import HTMLFetcher, AIInsightGenerator
from .service import SEOAnalysisService

__version__ = "1.0.0"
__author__ = "AI-Writer Team"

__all__ = [
    'ComprehensiveSEOAnalyzer',
    'SEOAnalysisResult',
    'URLStructureAnalyzer',
    'MetaDataAnalyzer',
    'ContentAnalyzer',
    'TechnicalSEOAnalyzer',
    'PerformanceAnalyzer',
    'AccessibilityAnalyzer',
    'UserExperienceAnalyzer',
    'SecurityHeadersAnalyzer',
    'KeywordAnalyzer',
    'HTMLFetcher',
    'AIInsightGenerator',
    'SEOAnalysisService'
]


@@ -0,0 +1,796 @@
"""
SEO Analyzers Module
Contains all individual SEO analysis components.
"""
import re
import time
import requests
from urllib.parse import urlparse, urljoin
from typing import Dict, List, Any, Optional
from bs4 import BeautifulSoup
from loguru import logger


class BaseAnalyzer:
    """Base class for all SEO analyzers"""

    def __init__(self):
        # Shared HTTP session with a browser-like User-Agent
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        })


class URLStructureAnalyzer(BaseAnalyzer):
    """Analyzes URL structure and security"""

    def analyze(self, url: str) -> Dict[str, Any]:
        """Enhanced URL structure analysis with specific fixes"""
        parsed = urlparse(url)
        issues = []
        warnings = []
        recommendations = []

        # Check URL length
        if len(url) > 2000:
            issues.append({
                'type': 'critical',
                'message': f'URL is too long ({len(url)} characters)',
                'location': 'URL',
                'current_value': url,
                'fix': 'Shorten URL to under 2000 characters',
                'code_example': '<a href="/shorter-path">Link</a>',
                'action': 'shorten_url'
            })

        # Check for underscores used as word separators instead of hyphens
        if '_' in parsed.path and '-' not in parsed.path:
            issues.append({
                'type': 'critical',
                'message': 'URL uses underscores instead of hyphens',
                'location': 'URL',
                'current_value': parsed.path,
                'fix': 'Replace underscores with hyphens',
                'code_example': f'<a href="{parsed.path.replace("_", "-")}">Link</a>',
                'action': 'replace_underscores'
            })

        # Check for special characters (anything other than letters, digits, "-", "_" and "/")
        special_chars = re.findall(r'[^a-zA-Z0-9\-_/]', parsed.path)
        if special_chars:
            warnings.append({
                'type': 'warning',
                'message': f'URL contains special characters: {", ".join(set(special_chars))}',
                'location': 'URL',
                'current_value': parsed.path,
                'fix': 'Remove special characters from URL',
                'code_example': '<a href="/clean-url">Link</a>',
                'action': 'remove_special_chars'
            })

        # Check for HTTPS
        if parsed.scheme != 'https':
            issues.append({
                'type': 'critical',
                'message': 'URL is not using HTTPS',
                'location': 'URL',
                'current_value': parsed.scheme,
                'fix': 'Redirect to HTTPS',
                'code_example': 'RewriteEngine On\nRewriteCond %{HTTPS} off\nRewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]',
                'action': 'enable_https'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'url_length': len(url),
            'has_https': parsed.scheme == 'https',
            'has_hyphens': '-' in parsed.path,
            'special_chars_count': len(special_chars)
        }


class MetaDataAnalyzer(BaseAnalyzer):
    """Analyzes meta data and technical SEO elements"""

    def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
        """Enhanced meta data analysis with specific element locations"""
        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []

        # Title analysis
        title_tag = soup.find('title')
        if not title_tag:
            issues.append({
                'type': 'critical',
                'message': 'Missing title tag',
                'location': '<head>',
                'fix': 'Add title tag to head section',
                'code_example': '<title>Your Page Title</title>',
                'action': 'add_title_tag'
            })
        else:
            title_text = title_tag.get_text().strip()
            if len(title_text) < 30:
                warnings.append({
                    'type': 'warning',
                    'message': f'Title too short ({len(title_text)} characters)',
                    'location': '<title>',
                    'current_value': title_text,
                    'fix': 'Make title 30-60 characters',
                    'code_example': f'<title>{title_text} - Additional Context</title>',
                    'action': 'extend_title'
                })
            elif len(title_text) > 60:
                warnings.append({
                    'type': 'warning',
                    'message': f'Title too long ({len(title_text)} characters)',
                    'location': '<title>',
                    'current_value': title_text,
                    'fix': 'Shorten title to 30-60 characters',
                    'code_example': f'<title>{title_text[:55]}...</title>',
                    'action': 'shorten_title'
                })

        # Meta description analysis
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if not meta_desc:
            issues.append({
                'type': 'critical',
                'message': 'Missing meta description',
                'location': '<head>',
                'fix': 'Add meta description',
                'code_example': '<meta name="description" content="Your page description here">',
                'action': 'add_meta_description'
            })
        else:
            desc_content = meta_desc.get('content', '').strip()
            if len(desc_content) < 70:
                warnings.append({
                    'type': 'warning',
                    'message': f'Meta description too short ({len(desc_content)} characters)',
                    'location': '<meta name="description">',
                    'current_value': desc_content,
                    'fix': 'Extend description to 70-160 characters',
                    'code_example': f'<meta name="description" content="{desc_content} - Additional context about your page">',
                    'action': 'extend_meta_description'
                })
            elif len(desc_content) > 160:
                warnings.append({
                    'type': 'warning',
                    'message': f'Meta description too long ({len(desc_content)} characters)',
                    'location': '<meta name="description">',
                    'current_value': desc_content,
                    'fix': 'Shorten description to 70-160 characters',
                    'code_example': f'<meta name="description" content="{desc_content[:155]}...">',
                    'action': 'shorten_meta_description'
                })

        # Viewport meta tag
        viewport = soup.find('meta', attrs={'name': 'viewport'})
        if not viewport:
            issues.append({
                'type': 'critical',
                'message': 'Missing viewport meta tag',
                'location': '<head>',
                'fix': 'Add viewport meta tag for mobile optimization',
                'code_example': '<meta name="viewport" content="width=device-width, initial-scale=1.0">',
                'action': 'add_viewport_meta'
            })

        # Charset declaration
        charset = soup.find('meta', attrs={'charset': True}) or soup.find('meta', attrs={'http-equiv': 'Content-Type'})
        if not charset:
            warnings.append({
                'type': 'warning',
                'message': 'Missing charset declaration',
                'location': '<head>',
                'fix': 'Add charset meta tag',
                'code_example': '<meta charset="UTF-8">',
                'action': 'add_charset_meta'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'title_length': len(title_tag.get_text().strip()) if title_tag else 0,
            'description_length': len(meta_desc.get('content', '')) if meta_desc else 0,
            'has_viewport': bool(viewport),
            'has_charset': bool(charset)
        }


class ContentAnalyzer(BaseAnalyzer):
    """Analyzes content quality and structure"""

    def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
        """Enhanced content analysis with specific text locations"""
        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []

        # Get all text content
        text_content = soup.get_text()
        words = text_content.split()
        word_count = len(words)

        # Check word count
        if word_count < 300:
            issues.append({
                'type': 'critical',
                'message': f'Content too short ({word_count} words)',
                'location': 'Page content',
                'current_value': f'{word_count} words',
                'fix': 'Add more valuable content (minimum 300 words)',
                'code_example': 'Add relevant paragraphs with useful information',
                'action': 'add_more_content'
            })

        # Check for H1 tags
        h1_tags = soup.find_all('h1')
        if len(h1_tags) == 0:
            issues.append({
                'type': 'critical',
                'message': 'Missing H1 tag',
                'location': 'Page structure',
                'fix': 'Add one H1 tag per page',
                'code_example': '<h1>Your Main Page Title</h1>',
                'action': 'add_h1_tag'
            })
        elif len(h1_tags) > 1:
            warnings.append({
                'type': 'warning',
                'message': f'Multiple H1 tags found ({len(h1_tags)})',
                'location': 'Page structure',
                'current_value': f'{len(h1_tags)} H1 tags',
                'fix': 'Use only one H1 tag per page',
                'code_example': 'Keep only the main H1, change others to H2',
                'action': 'reduce_h1_tags'
            })

        # Check for images without alt text
        images = soup.find_all('img')
        images_without_alt = [img for img in images if not img.get('alt')]
        if images_without_alt:
            warnings.append({
                'type': 'warning',
                'message': f'Images without alt text ({len(images_without_alt)} found)',
                'location': 'Images',
                'current_value': f'{len(images_without_alt)} images without alt',
                'fix': 'Add descriptive alt text to all images',
                'code_example': '<img src="image.jpg" alt="Descriptive text about the image">',
                'action': 'add_alt_text'
            })

        # Check for internal links: hrefs that do not start with http:// or https://.
        # The previous pattern r'^[^http]' was a character class, so it skipped any
        # href beginning with "h", "t" or "p" (for example "products/" or "team/").
        internal_links = soup.find_all('a', href=re.compile(r'^(?!https?://)'))
        if len(internal_links) < 3:
            warnings.append({
                'type': 'warning',
                'message': f'Few internal links ({len(internal_links)} found)',
                'location': 'Page content',
                'current_value': f'{len(internal_links)} internal links',
                'fix': 'Add more internal links to improve site structure',
                'code_example': '<a href="/related-page">Related content</a>',
                'action': 'add_internal_links'
            })

        # Check for spelling errors (basic check)
        common_words = ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by']
        potential_errors = []
        for word in words[:100]:  # Check first 100 words
            if len(word) > 3 and word.lower() not in common_words:
                # Basic spell check (this is simplified - in production you'd use a proper spell checker)
                if re.search(r'[a-z]{15,}', word.lower()):  # Very long words might be misspelled
                    potential_errors.append(word)
        if potential_errors:
            issues.append({
                'type': 'critical',
                'message': f'Potential spelling errors found: {", ".join(potential_errors[:5])}',
                'location': 'Page content',
                'current_value': f'{len(potential_errors)} potential errors',
                'fix': 'Review and correct spelling errors',
                'code_example': 'Use spell checker or proofread content',
                'action': 'fix_spelling'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'word_count': word_count,
            'h1_count': len(h1_tags),
            'images_count': len(images),
            'images_without_alt': len(images_without_alt),
            'internal_links_count': len(internal_links),
            'potential_spelling_errors': len(potential_errors)
        }


class TechnicalSEOAnalyzer(BaseAnalyzer):
    """Analyzes technical SEO elements"""

    def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
        """Enhanced technical SEO analysis with specific fixes"""
        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []

        # Check for robots.txt
        robots_url = urljoin(url, '/robots.txt')
        try:
            robots_response = self.session.get(robots_url, timeout=5)
            if robots_response.status_code != 200:
                warnings.append({
                    'type': 'warning',
                    'message': 'Robots.txt not accessible',
                    'location': 'Server',
                    'fix': 'Create robots.txt file',
                    'code_example': 'User-agent: *\nAllow: /',
                    'action': 'create_robots_txt'
                })
        except requests.RequestException:
            # Network errors and timeouts are reported as a missing file
            warnings.append({
                'type': 'warning',
                'message': 'Robots.txt not found',
                'location': 'Server',
                'fix': 'Create robots.txt file',
                'code_example': 'User-agent: *\nAllow: /',
                'action': 'create_robots_txt'
            })

        # Check for sitemap
        sitemap_url = urljoin(url, '/sitemap.xml')
        try:
            sitemap_response = self.session.get(sitemap_url, timeout=5)
            if sitemap_response.status_code != 200:
                warnings.append({
                    'type': 'warning',
                    'message': 'Sitemap not accessible',
                    'location': 'Server',
                    'fix': 'Create XML sitemap',
                    'code_example': '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<url>\n<loc>https://example.com/</loc>\n</url>\n</urlset>',
                    'action': 'create_sitemap'
                })
        except requests.RequestException:
            warnings.append({
                'type': 'warning',
                'message': 'Sitemap not found',
                'location': 'Server',
                'fix': 'Create XML sitemap',
                'code_example': '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<url>\n<loc>https://example.com/</loc>\n</url>\n</urlset>',
                'action': 'create_sitemap'
            })

        # Check for structured data
        structured_data = soup.find_all('script', type='application/ld+json')
        if not structured_data:
            warnings.append({
                'type': 'warning',
                'message': 'No structured data found',
                'location': '<head> or <body>',
                'fix': 'Add structured data markup',
                'code_example': '<script type="application/ld+json">{"@context":"https://schema.org","@type":"WebPage","name":"Page Title"}</script>',
                'action': 'add_structured_data'
            })

        # Check for canonical URL
        canonical = soup.find('link', rel='canonical')
        if not canonical:
            issues.append({
                'type': 'critical',
                'message': 'Missing canonical URL',
                'location': '<head>',
                'fix': 'Add canonical URL',
                'code_example': '<link rel="canonical" href="https://example.com/page">',
                'action': 'add_canonical_url'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            # Compare in lowercase: the warning messages start with a capital letter,
            # so a case-sensitive match never found them and both flags were always True
            'has_robots_txt': not any('robots.txt' in w['message'].lower() for w in warnings),
            'has_sitemap': not any('sitemap' in w['message'].lower() for w in warnings),
            'has_structured_data': bool(structured_data),
            'has_canonical': bool(canonical)
        }


class PerformanceAnalyzer(BaseAnalyzer):
    """Analyzes page performance"""

    def analyze(self, url: str) -> Dict[str, Any]:
        """Enhanced performance analysis with specific fixes"""
        try:
            start_time = time.time()
            response = self.session.get(url, timeout=20)
            load_time = time.time() - start_time
            issues = []
            warnings = []
            recommendations = []

            # Check load time
            if load_time > 3:
                issues.append({
                    'type': 'critical',
                    'message': f'Page load time too slow ({load_time:.2f}s)',
                    'location': 'Page performance',
                    'current_value': f'{load_time:.2f}s',
                    'fix': 'Optimize page speed (target < 3 seconds)',
                    'code_example': 'Optimize images, minify CSS/JS, use CDN',
                    'action': 'optimize_page_speed'
                })
            elif load_time > 2:
                warnings.append({
                    'type': 'warning',
                    'message': f'Page load time could be improved ({load_time:.2f}s)',
                    'location': 'Page performance',
                    'current_value': f'{load_time:.2f}s',
                    'fix': 'Optimize for faster loading',
                    'code_example': 'Compress images, enable caching',
                    'action': 'improve_page_speed'
                })

            # Check for compression
            content_encoding = response.headers.get('Content-Encoding')
            if not content_encoding:
                warnings.append({
                    'type': 'warning',
                    'message': 'No compression detected',
                    'location': 'Server configuration',
                    'fix': 'Enable GZIP compression',
                    'code_example': 'Add to .htaccess: SetOutputFilter DEFLATE',
                    'action': 'enable_compression'
                })

            # Check for caching headers
            cache_headers = ['Cache-Control', 'Expires', 'ETag']
            has_cache = any(response.headers.get(header) for header in cache_headers)
            if not has_cache:
                warnings.append({
                    'type': 'warning',
                    'message': 'No caching headers found',
                    'location': 'Server configuration',
                    'fix': 'Add caching headers',
                    'code_example': 'Cache-Control: max-age=31536000',
                    'action': 'add_caching_headers'
                })

            score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
            return {
                'score': score,
                'load_time': load_time,
                'is_compressed': bool(content_encoding),
                'has_cache': has_cache,
                'issues': issues,
                'warnings': warnings,
                'recommendations': recommendations
            }
        except Exception as e:
            logger.warning(f"Performance analysis failed for {url}: {e}")
            return {
                'score': 0, 'error': f'Performance analysis failed: {str(e)}',
                'load_time': 0, 'is_compressed': False, 'has_cache': False,
                'issues': [{'type': 'critical', 'message': 'Performance analysis failed', 'location': 'Page', 'fix': 'Check page speed manually', 'action': 'manual_check'}],
                'warnings': [{'type': 'warning', 'message': 'Could not analyze performance', 'location': 'Page', 'fix': 'Use PageSpeed Insights', 'action': 'manual_check'}],
                'recommendations': [{'type': 'recommendation', 'message': 'Check page speed manually', 'priority': 'medium', 'action': 'manual_check'}]
            }


class AccessibilityAnalyzer(BaseAnalyzer):
    """Analyzes accessibility features"""

    def analyze(self, html_content: str) -> Dict[str, Any]:
        """Enhanced accessibility analysis with specific fixes"""
        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []

        # Check for alt text on images
        images = soup.find_all('img')
        images_without_alt = [img for img in images if not img.get('alt')]
        if images_without_alt:
            issues.append({
                'type': 'critical',
                'message': f'Images without alt text ({len(images_without_alt)} found)',
                'location': 'Images',
                'current_value': f'{len(images_without_alt)} images without alt',
                'fix': 'Add descriptive alt text to all images',
                'code_example': '<img src="image.jpg" alt="Descriptive text about the image">',
                'action': 'add_alt_text'
            })

        # Check for form labels (only inputs that have an id are checked)
        forms = soup.find_all('form')
        for form in forms:
            inputs = form.find_all(['input', 'textarea', 'select'])
            for input_elem in inputs:
                if input_elem.get('type') not in ['hidden', 'submit', 'button']:
                    input_id = input_elem.get('id')
                    if input_id:
                        label = soup.find('label', attrs={'for': input_id})
                        if not label:
                            warnings.append({
                                'type': 'warning',
                                'message': f'Input without label (ID: {input_id})',
                                'location': 'Form',
                                'current_value': f'Input ID: {input_id}',
                                'fix': 'Add label for input field',
                                'code_example': f'<label for="{input_id}">Field Label</label>',
                                'action': 'add_form_label'
                            })

        # Check for heading hierarchy
        headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
        if headings:
            h1_count = len([h for h in headings if h.name == 'h1'])
            if h1_count == 0:
                issues.append({
                    'type': 'critical',
                    'message': 'No H1 heading found',
                    'location': 'Page structure',
                    'fix': 'Add H1 heading for main content',
                    'code_example': '<h1>Main Page Heading</h1>',
                    'action': 'add_h1_heading'
                })

        # Check for color contrast (basic check)
        style_tags = soup.find_all('style')
        inline_styles = soup.find_all(style=True)
        if style_tags or inline_styles:
            warnings.append({
                'type': 'warning',
                'message': 'Custom styles found - check color contrast',
                'location': 'CSS',
                'fix': 'Ensure sufficient color contrast (4.5:1 for normal text)',
                'code_example': 'Use tools like WebAIM Contrast Checker',
                'action': 'check_color_contrast'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'images_count': len(images),
            'images_without_alt': len(images_without_alt),
            'forms_count': len(forms),
            'headings_count': len(headings)
        }


class UserExperienceAnalyzer(BaseAnalyzer):
    """Analyzes user experience elements"""

    def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
        """Enhanced user experience analysis with specific fixes"""
        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []

        # Check for mobile responsiveness indicators
        viewport = soup.find('meta', attrs={'name': 'viewport'})
        if not viewport:
            issues.append({
                'type': 'critical',
                'message': 'Missing viewport meta tag for mobile',
                'location': '<head>',
                'fix': 'Add viewport meta tag',
                'code_example': '<meta name="viewport" content="width=device-width, initial-scale=1.0">',
                'action': 'add_viewport_meta'
            })

        # Check for navigation menu
        nav_elements = soup.find_all(['nav', 'ul', 'ol'])
        if not nav_elements:
            warnings.append({
                'type': 'warning',
                'message': 'No navigation menu found',
                'location': 'Page structure',
                'fix': 'Add navigation menu',
                'code_example': '<nav><ul><li><a href="/">Home</a></li></ul></nav>',
                'action': 'add_navigation'
            })

        # Check for contact information
        contact_patterns = ['contact', 'phone', 'email', '@', 'tel:']
        page_text = soup.get_text().lower()
        has_contact = any(pattern in page_text for pattern in contact_patterns)
        if not has_contact:
            warnings.append({
                'type': 'warning',
                'message': 'No contact information found',
                'location': 'Page content',
                'fix': 'Add contact information',
                'code_example': '<p>Contact us: <a href="mailto:info@example.com">info@example.com</a></p>',
                'action': 'add_contact_info'
            })

        # Check for social media links (matches network names in the visible page text)
        social_patterns = ['facebook', 'twitter', 'linkedin', 'instagram']
        has_social = any(pattern in page_text for pattern in social_patterns)
        if not has_social:
            recommendations.append({
                'type': 'recommendation',
                'message': 'No social media links found',
                'location': 'Page content',
                'fix': 'Add social media links',
                'code_example': '<a href="https://facebook.com/yourpage">Facebook</a>',
                'action': 'add_social_links',
                'priority': 'low'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'has_viewport': bool(viewport),
            'has_navigation': bool(nav_elements),
            'has_contact': has_contact,
            'has_social': has_social
        }


class SecurityHeadersAnalyzer(BaseAnalyzer):
    """Analyzes security headers"""

    def analyze(self, url: str) -> Dict[str, Any]:
        """Enhanced security headers analysis with specific fixes"""
        try:
            response = self.session.get(url, timeout=15, allow_redirects=True)
            security_headers = {
                'X-Frame-Options': response.headers.get('X-Frame-Options'),
                'X-Content-Type-Options': response.headers.get('X-Content-Type-Options'),
                'X-XSS-Protection': response.headers.get('X-XSS-Protection'),
                'Strict-Transport-Security': response.headers.get('Strict-Transport-Security'),
                'Content-Security-Policy': response.headers.get('Content-Security-Policy'),
                'Referrer-Policy': response.headers.get('Referrer-Policy')
            }
            # Example values for the headers reported as warnings. Previously every
            # example used "max-age=31536000", which is only valid for
            # Strict-Transport-Security.
            example_values = {
                'X-XSS-Protection': '0',
                'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
                'Content-Security-Policy': "default-src 'self'",
                'Referrer-Policy': 'strict-origin-when-cross-origin'
            }
            issues = []
            warnings = []
            recommendations = []
            present_headers = []
            missing_headers = []

            for header_name, header_value in security_headers.items():
                if header_value:
                    present_headers.append(header_name)
                else:
                    missing_headers.append(header_name)
                    if header_name in ['X-Frame-Options', 'X-Content-Type-Options']:
                        issues.append({
                            'type': 'critical',
                            'message': f'Missing {header_name} header',
                            'location': 'Server configuration',
                            'fix': f'Add {header_name} header',
                            'code_example': f'{header_name}: DENY' if header_name == 'X-Frame-Options' else f'{header_name}: nosniff',
                            'action': f'add_{header_name.lower().replace("-", "_")}_header'
                        })
                    else:
                        warnings.append({
                            'type': 'warning',
                            'message': f'Missing {header_name} header',
                            'location': 'Server configuration',
                            'fix': f'Add {header_name} header for better security',
                            'code_example': f'{header_name}: {example_values[header_name]}',
                            'action': f'add_{header_name.lower().replace("-", "_")}_header'
                        })

            # 16 points per header present; with six headers the maximum is 96
            score = min(100, len(present_headers) * 16)
            return {
                'score': score,
                'present_headers': present_headers,
                'missing_headers': missing_headers,
                'total_headers': len(present_headers),
                'issues': issues,
                'warnings': warnings,
                'recommendations': recommendations
            }
        except Exception as e:
            logger.warning(f"Security headers analysis failed for {url}: {e}")
            return {
                'score': 0, 'error': f'Error analyzing headers: {str(e)}',
                'present_headers': [], 'missing_headers': ['All security headers'],
                'total_headers': 0, 'issues': [{'type': 'critical', 'message': 'Could not analyze security headers', 'location': 'Server', 'fix': 'Check security headers manually', 'action': 'manual_check'}],
                'warnings': [{'type': 'warning', 'message': 'Security headers analysis failed', 'location': 'Server', 'fix': 'Verify security headers manually', 'action': 'manual_check'}],
                'recommendations': [{'type': 'recommendation', 'message': 'Check security headers manually', 'priority': 'medium', 'action': 'manual_check'}]
            }


class KeywordAnalyzer(BaseAnalyzer):
    """Analyzes keyword usage and optimization"""

    def analyze(self, html_content: str, target_keywords: Optional[List[str]] = None) -> Dict[str, Any]:
        """Enhanced keyword analysis with specific locations"""
        if not target_keywords:
            return {'score': 0, 'issues': [], 'warnings': [], 'recommendations': []}

        soup = BeautifulSoup(html_content, 'html.parser')
        issues = []
        warnings = []
        recommendations = []
        page_text = soup.get_text().lower()
        title_text = soup.find('title')
        title_text = title_text.get_text().lower() if title_text else ""

        for keyword in target_keywords:
            keyword_lower = keyword.lower()

            # Check if keyword is in title
            if keyword_lower not in title_text:
                issues.append({
                    'type': 'critical',
                    'message': f'Target keyword "{keyword}" not in title',
                    'location': '<title>',
                    'current_value': title_text,
                    'fix': f'Include keyword "{keyword}" in title',
                    'code_example': f'<title>{keyword} - Your Page Title</title>',
                    'action': 'add_keyword_to_title'
                })

            # Check keyword density (number of occurrences in the page text)
            keyword_count = page_text.count(keyword_lower)
            if keyword_count == 0:
                issues.append({
                    'type': 'critical',
                    'message': f'Target keyword "{keyword}" not found in content',
                    'location': 'Page content',
                    'current_value': '0 occurrences',
                    'fix': f'Include keyword "{keyword}" naturally in content',
                    'code_example': f'Add "{keyword}" to your page content',
                    'action': 'add_keyword_to_content'
                })
            elif keyword_count < 2:
                warnings.append({
                    'type': 'warning',
                    'message': f'Target keyword "{keyword}" appears only {keyword_count} time(s)',
                    'location': 'Page content',
                    'current_value': f'{keyword_count} occurrence(s)',
                    'fix': f'Include keyword "{keyword}" more naturally',
                    'code_example': f'Add more instances of "{keyword}" to content',
                    'action': 'increase_keyword_density'
                })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'target_keywords': target_keywords,
            'keywords_found': [kw for kw in target_keywords if kw.lower() in page_text]
        }


@@ -0,0 +1,208 @@
"""
Core SEO Analyzer Module
Contains the main ComprehensiveSEOAnalyzer class and data structures.
"""
from datetime import datetime
from dataclasses import dataclass
from typing import Dict, List, Any, Optional
from loguru import logger
from .analyzers import (
URLStructureAnalyzer,
MetaDataAnalyzer,
ContentAnalyzer,
TechnicalSEOAnalyzer,
PerformanceAnalyzer,
AccessibilityAnalyzer,
UserExperienceAnalyzer,
SecurityHeadersAnalyzer,
KeywordAnalyzer
)
from .utils import HTMLFetcher, AIInsightGenerator


@dataclass
class SEOAnalysisResult:
    """Data class for SEO analysis results"""
    url: str
    timestamp: datetime
    overall_score: int
    health_status: str
    critical_issues: List[Dict[str, Any]]
    warnings: List[Dict[str, Any]]
    recommendations: List[Dict[str, Any]]
    data: Dict[str, Any]


class ComprehensiveSEOAnalyzer:
    """
    Comprehensive SEO Analyzer
    Orchestrates all individual analyzers to provide complete SEO analysis.
    """

    def __init__(self):
        """Initialize the comprehensive SEO analyzer with all sub-analyzers"""
        self.html_fetcher = HTMLFetcher()
        self.ai_insight_generator = AIInsightGenerator()

        # Initialize all analyzers
        self.url_analyzer = URLStructureAnalyzer()
        self.meta_analyzer = MetaDataAnalyzer()
        self.content_analyzer = ContentAnalyzer()
        self.technical_analyzer = TechnicalSEOAnalyzer()
        self.performance_analyzer = PerformanceAnalyzer()
        self.accessibility_analyzer = AccessibilityAnalyzer()
        self.ux_analyzer = UserExperienceAnalyzer()
        self.security_analyzer = SecurityHeadersAnalyzer()
        self.keyword_analyzer = KeywordAnalyzer()

    def analyze_url_progressive(self, url: str, target_keywords: Optional[List[str]] = None) -> SEOAnalysisResult:
        """
        Progressive analysis method that runs all analyses with enhanced AI insights
        """
        try:
            logger.info(f"Starting enhanced SEO analysis for URL: {url}")

            # Fetch HTML content
            html_content = self.html_fetcher.fetch_html(url)
            if not html_content:
                return self._create_error_result(url, "Failed to fetch HTML content")

            # Run all analyzers
            analysis_data = {}
            logger.info("Running enhanced analyses...")
            analysis_data.update({
                'url_structure': self.url_analyzer.analyze(url),
                'meta_data': self.meta_analyzer.analyze(html_content, url),
                'content_analysis': self.content_analyzer.analyze(html_content, url),
                'keyword_analysis': self.keyword_analyzer.analyze(html_content, target_keywords) if target_keywords else {},
                'technical_seo': self.technical_analyzer.analyze(html_content, url),
                'accessibility': self.accessibility_analyzer.analyze(html_content),
                'user_experience': self.ux_analyzer.analyze(html_content, url)
            })

            # Run potentially slower analyses with error handling
            logger.info("Running security headers analysis...")
            try:
                analysis_data['security_headers'] = self.security_analyzer.analyze(url)
            except Exception as e:
                logger.warning(f"Security headers analysis failed: {e}")
                analysis_data['security_headers'] = self._create_fallback_result('security_headers', str(e))

            logger.info("Running performance analysis...")
            try:
                analysis_data['performance'] = self.performance_analyzer.analyze(url)
            except Exception as e:
                logger.warning(f"Performance analysis failed: {e}")
                analysis_data['performance'] = self._create_fallback_result('performance', str(e))

            # Generate AI-powered insights
            ai_insights = self.ai_insight_generator.generate_insights(analysis_data, url)

            # Calculate overall health
            overall_score, health_status, critical_issues, warnings, recommendations = self._calculate_overall_health(analysis_data, ai_insights)

            result = SEOAnalysisResult(
                url=url,
                timestamp=datetime.now(),
                overall_score=overall_score,
                health_status=health_status,
                critical_issues=critical_issues,
                warnings=warnings,
                recommendations=recommendations,
                data=analysis_data
            )
            logger.info(f"Enhanced SEO analysis completed for {url}. Overall score: {overall_score}")
            return result
        except Exception as e:
logger.error(f"Error in enhanced SEO analysis for {url}: {str(e)}")
return self._create_error_result(url, str(e))
def _calculate_overall_health(self, analysis_data: Dict[str, Any], ai_insights: List[Dict[str, Any]]) -> tuple:
"""Average the category scores, derive a health status, and merge issues, warnings, and recommendations (including AI insights)"""
scores = []
all_critical_issues = []
all_warnings = []
all_recommendations = []
for category, data in analysis_data.items():
if isinstance(data, dict) and 'score' in data:
scores.append(data['score'])
all_critical_issues.extend(data.get('issues', []))
all_warnings.extend(data.get('warnings', []))
all_recommendations.extend(data.get('recommendations', []))
# Calculate overall score
overall_score = sum(scores) // len(scores) if scores else 0
# Determine health status
if overall_score >= 80:
health_status = 'excellent'
elif overall_score >= 60:
health_status = 'good'
elif overall_score >= 40:
health_status = 'needs_improvement'
else:
health_status = 'poor'
# Add AI insights to recommendations
for insight in ai_insights:
all_recommendations.append({
'type': 'ai_insight',
'message': insight['message'],
'priority': insight['priority'],
'action': insight['action'],
'description': insight['description']
})
return overall_score, health_status, all_critical_issues, all_warnings, all_recommendations
def _create_fallback_result(self, category: str, error_message: str) -> Dict[str, Any]:
"""Create a fallback result when analysis fails"""
return {
'score': 0,
'error': f'{category} analysis failed: {error_message}',
'issues': [{
'type': 'critical',
'message': f'{category} analysis failed',
'location': 'System',
'fix': f'Check {category} manually',
'action': 'manual_check'
}],
'warnings': [{
'type': 'warning',
'message': f'Could not analyze {category}',
'location': 'System',
'fix': f'Verify {category} manually',
'action': 'manual_check'
}],
'recommendations': [{
'type': 'recommendation',
'message': f'Check {category} manually',
'priority': 'medium',
'action': 'manual_check'
}]
}
def _create_error_result(self, url: str, error_message: str) -> SEOAnalysisResult:
"""Create a zero-score result with health status 'error' for an analysis that could not complete"""
return SEOAnalysisResult(
url=url,
timestamp=datetime.now(),
overall_score=0,
health_status='error',
critical_issues=[{
'type': 'critical',
'message': f'Analysis failed: {error_message}',
'location': 'System',
'fix': 'Check URL accessibility and try again',
'action': 'retry_analysis'
}],
warnings=[],
recommendations=[],
data={}
)
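
The scoring in `_calculate_overall_health` above can be exercised in isolation. The following is a minimal sketch and not part of the commit: `summarize_health` is a hypothetical helper name, and it assumes each category result is a dict with an optional integer `score` key, as in the analyzer output above.

```python
from typing import Any, Dict, Tuple


def summarize_health(analysis_data: Dict[str, Any]) -> Tuple[int, str]:
    """Average the per-category scores and map the result to a status label.

    Hypothetical helper mirroring the thresholds used in the analyzer above.
    """
    scores = [
        data["score"]
        for data in analysis_data.values()
        if isinstance(data, dict) and "score" in data
    ]
    # Integer average; categories without a score (e.g. an empty dict) are skipped
    overall = sum(scores) // len(scores) if scores else 0

    if overall >= 80:
        status = "excellent"
    elif overall >= 60:
        status = "good"
    elif overall >= 40:
        status = "needs_improvement"
    else:
        status = "poor"
    return overall, status


sample = {
    "meta_data": {"score": 90},
    "performance": {"score": 0, "error": "performance analysis failed"},
    "keyword_analysis": {},  # no target keywords given, so no score
}
print(summarize_health(sample))
```

Because a fallback result carries a score of 0, a failed security or performance check pulls the average down rather than being excluded from it.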

@@ -0,0 +1,268 @@
"""
SEO Analysis Service
Handles storing and retrieving SEO analysis data from the database.
"""
from typing import Optional, List, Dict, Any
from datetime import datetime
from sqlalchemy.orm import Session
from sqlalchemy import func
from loguru import logger
from models.seo_analysis import (
SEOAnalysis,
SEOIssue,
SEOWarning,
SEORecommendation,
SEOCategoryScore,
SEOAnalysisHistory,
create_analysis_from_result,
create_issues_from_result,
create_warnings_from_result,
create_recommendations_from_result,
create_category_scores_from_result
)
from .core import SEOAnalysisResult
class SEOAnalysisService:
"""Service for managing SEO analysis data in the database."""
def __init__(self, db_session: Session):
self.db = db_session
def store_analysis_result(self, result: SEOAnalysisResult) -> Optional[SEOAnalysis]:
"""
Store SEO analysis result in the database.
Args:
result: SEOAnalysisResult from the analyzer
Returns:
Stored SEOAnalysis record or None if failed
"""
try:
# Create main analysis record
analysis_record = create_analysis_from_result(result)
self.db.add(analysis_record)
self.db.flush() # Get the ID
# Create related records
issues = create_issues_from_result(analysis_record.id, result)
warnings = create_warnings_from_result(analysis_record.id, result)
recommendations = create_recommendations_from_result(analysis_record.id, result)
category_scores = create_category_scores_from_result(analysis_record.id, result)
# Add all related records
for issue in issues:
self.db.add(issue)
for warning in warnings:
self.db.add(warning)
for recommendation in recommendations:
self.db.add(recommendation)
for score in category_scores:
self.db.add(score)
# Create history record
history_record = SEOAnalysisHistory(
url=result.url,
analysis_date=result.timestamp,
overall_score=result.overall_score,
health_status=result.health_status,
score_change=0, # Will be calculated later
critical_issues_count=len(result.critical_issues),
warnings_count=len(result.warnings),
recommendations_count=len(result.recommendations)
)
# Add category scores to history
for category, data in result.data.items():
if isinstance(data, dict) and 'score' in data:
if category == 'url_structure':
history_record.url_structure_score = data['score']
elif category == 'meta_data':
history_record.meta_data_score = data['score']
elif category == 'content_analysis':
history_record.content_score = data['score']
elif category == 'technical_seo':
history_record.technical_score = data['score']
elif category == 'performance':
history_record.performance_score = data['score']
elif category == 'accessibility':
history_record.accessibility_score = data['score']
elif category == 'user_experience':
history_record.user_experience_score = data['score']
elif category == 'security_headers':
history_record.security_score = data['score']
self.db.add(history_record)
self.db.commit()
logger.info(f"Stored SEO analysis for {result.url} with score {result.overall_score}")
return analysis_record
except Exception as e:
logger.error(f"Error storing SEO analysis: {str(e)}")
self.db.rollback()
return None
def get_latest_analysis(self, url: str) -> Optional[SEOAnalysis]:
"""
Get the latest SEO analysis for a URL.
Args:
url: The URL to get analysis for
Returns:
Latest SEOAnalysis record or None
"""
try:
return self.db.query(SEOAnalysis).filter(
SEOAnalysis.url == url
).order_by(SEOAnalysis.timestamp.desc()).first()
except Exception as e:
logger.error(f"Error getting latest analysis for {url}: {str(e)}")
return None
def get_analysis_history(self, url: str, limit: int = 10) -> List[SEOAnalysisHistory]:
"""
Get analysis history for a URL.
Args:
url: The URL to get history for
limit: Maximum number of records to return
Returns:
List of SEOAnalysisHistory records
"""
try:
return self.db.query(SEOAnalysisHistory).filter(
SEOAnalysisHistory.url == url
).order_by(SEOAnalysisHistory.analysis_date.desc()).limit(limit).all()
except Exception as e:
logger.error(f"Error getting analysis history for {url}: {str(e)}")
return []
def get_analysis_by_id(self, analysis_id: int) -> Optional[SEOAnalysis]:
"""
Get SEO analysis by ID.
Args:
analysis_id: The analysis ID
Returns:
SEOAnalysis record or None
"""
try:
return self.db.query(SEOAnalysis).filter(
SEOAnalysis.id == analysis_id
).first()
except Exception as e:
logger.error(f"Error getting analysis by ID {analysis_id}: {str(e)}")
return None
def get_all_analyses(self, limit: int = 50) -> List[SEOAnalysis]:
"""
Get the most recent SEO analyses, newest first, capped at `limit`.
Args:
limit: Maximum number of records to return
Returns:
List of SEOAnalysis records
"""
try:
return self.db.query(SEOAnalysis).order_by(
SEOAnalysis.timestamp.desc()
).limit(limit).all()
except Exception as e:
logger.error(f"Error getting all analyses: {str(e)}")
return []
def delete_analysis(self, analysis_id: int) -> bool:
"""
Delete an SEO analysis.
Args:
analysis_id: The analysis ID to delete
Returns:
True if successful, False otherwise
"""
try:
analysis = self.db.query(SEOAnalysis).filter(
SEOAnalysis.id == analysis_id
).first()
if analysis:
self.db.delete(analysis)
self.db.commit()
logger.info(f"Deleted SEO analysis {analysis_id}")
return True
else:
logger.warning(f"Analysis {analysis_id} not found for deletion")
return False
except Exception as e:
logger.error(f"Error deleting analysis {analysis_id}: {str(e)}")
self.db.rollback()
return False
def get_analysis_statistics(self) -> Dict[str, Any]:
"""
Get overall statistics for SEO analyses.
Returns:
Dictionary with analysis statistics
"""
try:
total_analyses = self.db.query(SEOAnalysis).count()
total_urls = self.db.query(SEOAnalysis.url).distinct().count()
# Count analyses by health status
excellent_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'excellent'
).count()
good_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'good'
).count()
needs_improvement_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'needs_improvement'
).count()
poor_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'poor'
).count()
# Calculate average overall score
avg_score_result = self.db.query(
func.avg(SEOAnalysis.overall_score)
).scalar()
avg_score = float(avg_score_result) if avg_score_result else 0
return {
'total_analyses': total_analyses,
'total_urls': total_urls,
'average_score': round(avg_score, 2),
'health_distribution': {
'excellent': excellent_count,
'good': good_count,
'needs_improvement': needs_improvement_count,
'poor': poor_count
}
}
except Exception as e:
logger.error(f"Error getting analysis statistics: {str(e)}")
return {
'total_analyses': 0,
'total_urls': 0,
'average_score': 0,
'health_distribution': {
'excellent': 0,
'good': 0,
'needs_improvement': 0,
'poor': 0
}
}
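
The `if`/`elif` chain in `store_analysis_result` that copies category scores onto the history record could also be written as a lookup table. A minimal sketch under stated assumptions: `SCORE_COLUMNS` and `copy_category_scores` are hypothetical names, and `SimpleNamespace` stands in for the `SEOAnalysisHistory` model so the example runs without a database.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Category key in result.data -> attribute on the history record (taken from the chain above)
SCORE_COLUMNS = {
    "url_structure": "url_structure_score",
    "meta_data": "meta_data_score",
    "content_analysis": "content_score",
    "technical_seo": "technical_score",
    "performance": "performance_score",
    "accessibility": "accessibility_score",
    "user_experience": "user_experience_score",
    "security_headers": "security_score",
}


def copy_category_scores(history_record: Any, data: Dict[str, Any]) -> None:
    """Set one score attribute per known category that reports a score."""
    for category, result in data.items():
        column = SCORE_COLUMNS.get(category)
        if column and isinstance(result, dict) and "score" in result:
            setattr(history_record, column, result["score"])


record = SimpleNamespace()  # stand-in for SEOAnalysisHistory (assumption)
copy_category_scores(record, {
    "meta_data": {"score": 72},
    "keyword_analysis": {"score": 55},   # not in the table, so it is ignored
    "performance": {"error": "failed"},  # no score key, so it is ignored
})
print(vars(record))
```

As in the original chain, `keyword_analysis` has no history column, so its score is not recorded.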

@@ -0,0 +1,106 @@
"""
SEO Analyzer Utilities
Contains utility classes for HTML fetching and AI insight generation.
"""
import requests
from typing import Optional, Dict, List, Any
from loguru import logger
class HTMLFetcher:
    """Utility class for fetching HTML content from URLs"""

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        })

    def fetch_html(self, url: str) -> Optional[str]:
        """Fetch HTML content with error handling"""
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            return response.text
        except Exception as e:
            logger.error(f"Error fetching HTML from {url}: {e}")
            return None
class AIInsightGenerator:
"""Utility class for generating AI-powered insights from analysis data"""
def generate_insights(self, analysis_data: Dict[str, Any], url: str) -> List[Dict[str, Any]]:
"""Generate insights from analysis data using fixed, rule-based thresholds (no AI model is called here)"""
insights = []
# Analyze overall performance
total_issues = sum(len(data.get('issues', [])) for data in analysis_data.values() if isinstance(data, dict))
total_warnings = sum(len(data.get('warnings', [])) for data in analysis_data.values() if isinstance(data, dict))
if total_issues > 5:
insights.append({
'type': 'critical',
'message': f'High number of critical issues ({total_issues}) detected',
'priority': 'high',
'action': 'fix_critical_issues',
'description': 'Multiple critical SEO issues need immediate attention to improve search rankings.'
})
# Content quality insights
content_data = analysis_data.get('content_analysis', {})
if content_data.get('word_count', 0) < 300:
insights.append({
'type': 'warning',
'message': 'Content is too thin for good SEO',
'priority': 'medium',
'action': 'expand_content',
'description': 'Add more valuable, relevant content to improve search rankings and user engagement.'
})
# Technical SEO insights
technical_data = analysis_data.get('technical_seo', {})
if not technical_data.get('has_canonical', False):
insights.append({
'type': 'critical',
'message': 'Missing canonical URL can cause duplicate content issues',
'priority': 'high',
'action': 'add_canonical',
'description': 'Canonical URLs help prevent duplicate content penalties.'
})
# Security insights
security_data = analysis_data.get('security_headers', {})
if security_data.get('total_headers', 0) < 3:
insights.append({
'type': 'warning',
'message': 'Insufficient security headers',
'priority': 'medium',
'action': 'improve_security',
'description': 'Security headers protect against common web vulnerabilities.'
})
# Performance insights
performance_data = analysis_data.get('performance', {})
if performance_data.get('load_time', 0) > 3:
insights.append({
'type': 'critical',
'message': 'Page load time is too slow',
'priority': 'high',
'action': 'optimize_performance',
'description': 'Slow loading pages negatively impact user experience and search rankings.'
})
# URL structure insights
url_data = analysis_data.get('url_structure', {})
if not url_data.get('has_https', False):
insights.append({
'type': 'critical',
'message': 'Website is not using HTTPS',
'priority': 'high',
'action': 'enable_https',
'description': 'HTTPS is required for security and is a ranking factor for search engines.'
})
return insights
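
The per-category checks in `generate_insights` follow one pattern: read a value with a default, compare it to a threshold, and emit an insight. A minimal sketch of that pattern as a rule table, assuming the same keys and thresholds as above; `RULES` and `triggered_actions` are hypothetical names, and the `total_issues > 5` check is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Tuple

# (category, predicate on that category's data, action name); thresholds copied from above
RULES: List[Tuple[str, Callable[[Dict[str, Any]], bool], str]] = [
    ("content_analysis", lambda d: d.get("word_count", 0) < 300, "expand_content"),
    ("technical_seo", lambda d: not d.get("has_canonical", False), "add_canonical"),
    ("security_headers", lambda d: d.get("total_headers", 0) < 3, "improve_security"),
    ("performance", lambda d: d.get("load_time", 0) > 3, "optimize_performance"),
    ("url_structure", lambda d: not d.get("has_https", False), "enable_https"),
]


def triggered_actions(analysis_data: Dict[str, Any]) -> List[str]:
    """Return the action names whose rule fires for the given analysis data."""
    return [
        action
        for category, predicate, action in RULES
        if predicate(analysis_data.get(category, {}))
    ]


# With no data at all, every rule whose default counts as a failing value fires
print(triggered_actions({}))
```

The empty-input case shows a property of the defaults: a category that is missing from `analysis_data` is reported as if it had failed its check, except for load time, where the default of 0 passes.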

@@ -0,0 +1,143 @@
"""
User Data Service
Handles fetching user data from the onboarding database.
"""
from typing import Optional, List, Dict, Any
from sqlalchemy.orm import Session
from loguru import logger
from models.onboarding import OnboardingSession, WebsiteAnalysis, APIKey, ResearchPreferences
class UserDataService:
"""Service for managing user data from onboarding."""
def __init__(self, db_session: Session):
self.db = db_session
def get_user_website_url(self, user_id: int = 1) -> Optional[str]:
"""
Get the website URL for a user from their onboarding data.
Args:
user_id: The user ID (defaults to 1 for single-user setup)
Returns:
Website URL or None if not found
"""
try:
# Get the latest onboarding session for the user
session = self.db.query(OnboardingSession).filter(
OnboardingSession.user_id == user_id
).order_by(OnboardingSession.updated_at.desc()).first()
if not session:
logger.warning(f"No onboarding session found for user {user_id}")
return None
# Get the latest website analysis for this session
website_analysis = self.db.query(WebsiteAnalysis).filter(
WebsiteAnalysis.session_id == session.id
).order_by(WebsiteAnalysis.updated_at.desc()).first()
if not website_analysis:
logger.warning(f"No website analysis found for session {session.id}")
return None
logger.info(f"Found website URL: {website_analysis.website_url}")
return website_analysis.website_url
except Exception as e:
logger.error(f"Error getting user website URL: {str(e)}")
return None
def get_user_onboarding_data(self, user_id: int = 1) -> Optional[Dict[str, Any]]:
"""
Get comprehensive onboarding data for a user.
Args:
user_id: The user ID (defaults to 1 for single-user setup)
Returns:
Dictionary with onboarding data or None if not found
"""
try:
# Get the latest onboarding session
session = self.db.query(OnboardingSession).filter(
OnboardingSession.user_id == user_id
).order_by(OnboardingSession.updated_at.desc()).first()
if not session:
return None
# Get website analysis
website_analysis = self.db.query(WebsiteAnalysis).filter(
WebsiteAnalysis.session_id == session.id
).order_by(WebsiteAnalysis.updated_at.desc()).first()
# Get API keys
api_keys = self.db.query(APIKey).filter(
APIKey.session_id == session.id
).all()
# Get research preferences
research_preferences = self.db.query(ResearchPreferences).filter(
ResearchPreferences.session_id == session.id
).first()
return {
'session': {
'id': session.id,
'current_step': session.current_step,
'progress': session.progress,
'started_at': session.started_at.isoformat() if session.started_at else None,
'updated_at': session.updated_at.isoformat() if session.updated_at else None
},
'website_analysis': website_analysis.to_dict() if website_analysis else None,
'api_keys': [
{
'id': key.id,
'provider': key.provider,
'created_at': key.created_at.isoformat() if key.created_at else None
}
for key in api_keys
],
'research_preferences': research_preferences.to_dict() if research_preferences else None
}
except Exception as e:
logger.error(f"Error getting user onboarding data: {str(e)}")
return None
def get_user_website_analysis(self, user_id: int = 1) -> Optional[Dict[str, Any]]:
"""
Get website analysis data for a user.
Args:
user_id: The user ID (defaults to 1 for single-user setup)
Returns:
Website analysis data or None if not found
"""
try:
# Get the latest onboarding session
session = self.db.query(OnboardingSession).filter(
OnboardingSession.user_id == user_id
).order_by(OnboardingSession.updated_at.desc()).first()
if not session:
return None
# Get website analysis
website_analysis = self.db.query(WebsiteAnalysis).filter(
WebsiteAnalysis.session_id == session.id
).order_by(WebsiteAnalysis.updated_at.desc()).first()
if not website_analysis:
return None
return website_analysis.to_dict()
except Exception as e:
logger.error(f"Error getting user website analysis: {str(e)}")
return None

@@ -0,0 +1,376 @@
"""Enhanced validation service for ALwrity backend."""
import os
import re
from typing import Dict, Any, List, Tuple
from loguru import logger
from dotenv import load_dotenv
def check_all_api_keys(api_manager) -> Dict[str, Any]:
"""Validate configured API keys, the website URL, and the setup-completion flags read from the environment.
Args:
api_manager: The API key manager instance
Returns:
Dict[str, Any]: Comprehensive validation results
"""
try:
logger.info("Starting comprehensive API key validation process...")
# Load environment variables
current_dir = os.getcwd()
env_path = os.path.join(current_dir, '.env')
logger.info(f"Looking for .env file at: {env_path}")
# Check if .env file exists
if not os.path.exists(env_path):
logger.warning(f".env file not found at {env_path}")
# Continue without .env file for now
# Load environment variables if file exists
if os.path.exists(env_path):
load_dotenv(env_path, override=True)
logger.debug("Environment variables loaded")
# Log available environment variables
logger.debug("Available environment variables:")
for key in os.environ.keys():
if any(provider in key for provider in ['API_KEY', 'SERPAPI', 'TAVILY', 'METAPHOR', 'FIRECRAWL']):
logger.debug(f"Found environment variable: {key}")
# Step 1: Check for at least one AI provider
logger.info("Checking AI provider API keys...")
ai_providers = [
'OPENAI_API_KEY',
'GEMINI_API_KEY',
'ANTHROPIC_API_KEY',
'MISTRAL_API_KEY'
]
ai_provider_results = {}
has_ai_provider = False
for provider in ai_providers:
value = os.getenv(provider)
if value:
validation_result = validate_api_key(provider.lower().replace('_api_key', ''), value)
ai_provider_results[provider] = validation_result
if validation_result.get('valid', False):
has_ai_provider = True
logger.info(f"Found valid {provider} (length: {len(value)})")
else:
logger.warning(f"Found invalid {provider}: {validation_result.get('error', 'Unknown error')}")
else:
ai_provider_results[provider] = {
'valid': False,
'error': 'API key not configured'
}
logger.debug(f"Missing {provider}")
# Step 2: Check for at least one research provider
logger.info("Checking research provider API keys...")
research_providers = [
'SERPAPI_KEY',
'TAVILY_API_KEY',
'METAPHOR_API_KEY',
'FIRECRAWL_API_KEY'
]
research_provider_results = {}
has_research_provider = False
for provider in research_providers:
value = os.getenv(provider)
if value:
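# Strip '_api_key' before '_key' so that e.g. TAVILY_API_KEY maps to 'tavily', not 'tavily_api'
validation_result = validate_api_key(provider.lower().replace('_api_key', '').replace('_key', ''), value)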
research_provider_results[provider] = validation_result
if validation_result.get('valid', False):
has_research_provider = True
logger.info(f"Found valid {provider} (length: {len(value)})")
else:
logger.warning(f"Found invalid {provider}: {validation_result.get('error', 'Unknown error')}")
else:
research_provider_results[provider] = {
'valid': False,
'error': 'API key not configured'
}
logger.debug(f"Missing {provider}")
# Step 3: Check for website URL
logger.info("Checking website URL...")
website_url = os.getenv('WEBSITE_URL')
website_valid = False
if website_url:
website_valid = validate_website_url(website_url)
if website_valid:
logger.success(f"✓ Website URL found and valid: {website_url}")
else:
logger.warning(f"Website URL found but invalid: {website_url}")
else:
logger.warning("No website URL found in environment variables")
# Step 4: Check for personalization status
logger.info("Checking personalization status...")
personalization_done = os.getenv('PERSONALIZATION_DONE', 'false').lower() == 'true'
if personalization_done:
logger.success("✓ Personalization completed")
else:
logger.warning("Personalization not completed")
# Step 5: Check for integration status
logger.info("Checking integration status...")
integration_done = os.getenv('INTEGRATION_DONE', 'false').lower() == 'true'
if integration_done:
logger.success("✓ Integrations completed")
else:
logger.warning("Integrations not completed")
# Step 6: Check for final setup status
logger.info("Checking final setup status...")
final_setup_complete = os.getenv('FINAL_SETUP_COMPLETE', 'false').lower() == 'true'
if final_setup_complete:
logger.success("✓ Final setup completed successfully")
else:
logger.warning("Final setup not completed")
# Determine overall validation status
all_valid = (
has_ai_provider and
has_research_provider and
website_valid and
personalization_done and
integration_done and
final_setup_complete
)
if all_valid:
logger.success("All required API keys and setup steps validated successfully!")
else:
logger.warning("Some validation checks failed")
return {
'all_valid': all_valid,
'results': {
'ai_providers': ai_provider_results,
'research_providers': research_provider_results,
'website_url': {
'valid': website_valid,
'url': website_url,
'error': None if website_valid else 'Invalid or missing website URL'
},
'personalization': {
'valid': personalization_done,
'status': 'completed' if personalization_done else 'pending'
},
'integrations': {
'valid': integration_done,
'status': 'completed' if integration_done else 'pending'
},
'final_setup': {
'valid': final_setup_complete,
'status': 'completed' if final_setup_complete else 'pending'
}
},
'summary': {
'has_ai_provider': has_ai_provider,
'has_research_provider': has_research_provider,
'website_valid': website_valid,
'personalization_done': personalization_done,
'integration_done': integration_done,
'final_setup_complete': final_setup_complete
}
}
except Exception as e:
# loguru does not support exc_info; logger.exception logs at ERROR level with the traceback
logger.exception(f"Error checking API keys: {str(e)}")
return {
'all_valid': False,
'error': str(e),
'results': {}
}
def validate_api_key(provider: str, api_key: str) -> Dict[str, Any]:
"""Validate an API key's format (length and provider-specific prefix); no request is sent to the provider."""
try:
if not api_key or len(api_key.strip()) < 10:
return {'valid': False, 'error': 'API key too short or empty'}
# Provider-specific format validation
if provider == "openai":
if not api_key.startswith("sk-"):
return {'valid': False, 'error': 'OpenAI API key must start with "sk-"'}
if len(api_key) < 20:
return {'valid': False, 'error': 'OpenAI API key seems too short'}
elif provider == "gemini":
if not api_key.startswith("AIza"):
return {'valid': False, 'error': 'Google API key must start with "AIza"'}
if len(api_key) < 30:
return {'valid': False, 'error': 'Google API key seems too short'}
elif provider == "anthropic":
if not api_key.startswith("sk-ant-"):
return {'valid': False, 'error': 'Anthropic API key must start with "sk-ant-"'}
if len(api_key) < 20:
return {'valid': False, 'error': 'Anthropic API key seems too short'}
elif provider == "mistral":
if not api_key.startswith("mistral-"):
return {'valid': False, 'error': 'Mistral API key must start with "mistral-"'}
if len(api_key) < 20:
return {'valid': False, 'error': 'Mistral API key seems too short'}
elif provider == "tavily":
if len(api_key) < 10:
return {'valid': False, 'error': 'Tavily API key seems too short'}
elif provider == "serper":
if len(api_key) < 10:
return {'valid': False, 'error': 'Serper API key seems too short'}
elif provider == "metaphor":
if len(api_key) < 10:
return {'valid': False, 'error': 'Metaphor API key seems too short'}
elif provider == "firecrawl":
if len(api_key) < 10:
return {'valid': False, 'error': 'Firecrawl API key seems too short'}
else:
# Generic validation for unknown providers
if len(api_key) < 10:
return {'valid': False, 'error': 'API key seems too short'}
return {'valid': True, 'error': None}
except Exception as e:
logger.error(f"Error validating {provider} API key: {str(e)}")
return {'valid': False, 'error': f'Validation error: {str(e)}'}
def validate_website_url(url: str) -> bool:
    """Validate website URL format (no network request is made)."""
    try:
        if not url:
            return False
        # Basic URL format validation
        url_pattern = re.compile(
            r'^https?://'  # http:// or https://
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,63}\.?|'  # domain (alphabetic TLD, up to 63 chars)...
            r'localhost|'  # localhost...
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
            r'(?::\d+)?'  # optional port
            r'(?:/?|[/?]\S+)$', re.IGNORECASE)
        if not url_pattern.match(url):
            return False
        # Additional checks can be added here (accessibility, content, etc.)
        return True
    except Exception as e:
        logger.error(f"Error validating website URL: {str(e)}")
        return False
def validate_step_data(step_number: int, data: Dict[str, Any]) -> List[str]:
"""Validate the data submitted for a setup step and return a list of error messages (empty if valid)."""
errors = []
if step_number == 1: # AI LLM Providers
if not data or 'api_keys' not in data:
errors.append("At least one API key must be configured")
elif not data['api_keys']:
errors.append("At least one API key must be configured")
else:
# Validate each configured API key
for provider in data['api_keys']:
if provider not in ['openai', 'gemini', 'anthropic', 'mistral']:
errors.append(f"Unknown provider: {provider}")
elif step_number == 2: # Website Analysis
if not data or 'website_url' not in data:
errors.append("Website URL is required")
elif not validate_website_url(data['website_url']):
errors.append("Invalid website URL format")
elif step_number == 3: # AI Research
if not data or 'research_providers' not in data:
errors.append("At least one research provider must be configured")
elif not data['research_providers']:
errors.append("At least one research provider must be configured")
elif step_number == 4: # Personalization
# Optional step, no validation required
pass
elif step_number == 5: # Integrations
# Optional step, no validation required
pass
elif step_number == 6: # Complete Setup
# This step requires all previous steps to be completed
# Validation is handled by the progress tracking system
pass
return errors
def validate_environment_setup() -> Dict[str, Any]:
"""Validate the overall environment setup."""
issues = []
warnings = []
# Check for required directories
required_dirs = [
"lib/workspace/alwrity_content",
"lib/workspace/alwrity_web_research",
"lib/workspace/alwrity_prompts",
"lib/workspace/alwrity_config"
]
for dir_path in required_dirs:
if not os.path.exists(dir_path):
try:
os.makedirs(dir_path, exist_ok=True)
warnings.append(f"Created missing directory: {dir_path}")
except Exception as e:
issues.append(f"Cannot create directory {dir_path}: {str(e)}")
# Check for .env file
if not os.path.exists(".env"):
warnings.append(".env file not found. API keys will need to be configured.")
# Check for write permissions
try:
test_file = ".test_write_permission"
with open(test_file, 'w') as f:
f.write("test")
os.remove(test_file)
except Exception as e:
issues.append(f"Cannot write to current directory: {str(e)}")
return {
'valid': len(issues) == 0,
'issues': issues,
'warnings': warnings
}
def validate_api_key_format(provider: str, api_key: str) -> bool:
"""Quick format validation for API keys."""
if not api_key or len(api_key.strip()) < 10:
return False
# Provider-specific format checks
if provider == "openai" and not api_key.startswith("sk-"):
return False
if provider == "gemini" and not api_key.startswith("AIza"):
return False
if provider == "anthropic" and not api_key.startswith("sk-ant-"):
return False
if provider == "mistral" and not api_key.startswith("mistral-"):
return False
return True
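
Deriving a provider id from an environment variable name needs some care, because two suffixes are in use in this module (`_API_KEY` and `_KEY`). A minimal sketch of one way to normalize them; `provider_from_env_var` is a hypothetical helper and not part of the commit.

```python
def provider_from_env_var(name: str) -> str:
    """Map an environment variable name to a provider id, e.g. TAVILY_API_KEY -> tavily."""
    lowered = name.lower()
    # Try the longer suffix first so that "_api_key" is not left as a dangling "_api"
    for suffix in ("_api_key", "_key"):
        if lowered.endswith(suffix):
            return lowered[: -len(suffix)]
    return lowered


for env_var in ("OPENAI_API_KEY", "SERPAPI_KEY", "TAVILY_API_KEY"):
    print(env_var, "->", provider_from_env_var(env_var))

# For comparison, stripping only "_key" leaves the "_api" part behind
print("TAVILY_API_KEY".lower().replace("_key", ""))
```

Matching on the suffix with `endswith` rather than `str.replace` also avoids touching a `_key` that happens to occur in the middle of a name.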

@@ -0,0 +1,263 @@
"""
Website Analysis Service for Onboarding Step 2
Handles storage and retrieval of website analysis results.
"""
from typing import Dict, Any, Optional, List
from sqlalchemy.orm import Session
from sqlalchemy.exc import SQLAlchemyError
from datetime import datetime
import json
from loguru import logger
from models.onboarding import WebsiteAnalysis, OnboardingSession
class WebsiteAnalysisService:
"""Service for managing website analysis data during onboarding."""
def __init__(self, db_session: Session):
"""Initialize the service with database session."""
self.db = db_session

    def save_analysis(self, session_id: int, website_url: str, analysis_data: Dict[str, Any]) -> Optional[int]:
        """
        Save website analysis results to the database.

        If an analysis already exists for this session and URL, it is updated
        in place; otherwise a new record is created.

        Args:
            session_id: Onboarding session ID
            website_url: The analyzed website URL
            analysis_data: Complete analysis results from style detection

        Returns:
            Analysis ID if successful, None otherwise
        """
        try:
            # 'style_analysis' may be missing or explicitly None; fall back to an empty dict
            style_analysis = analysis_data.get('style_analysis') or {}

            # Check if analysis already exists for this URL and session
            existing_analysis = self.db.query(WebsiteAnalysis).filter_by(
                session_id=session_id,
                website_url=website_url
            ).first()

            if existing_analysis:
                # Update existing analysis
                existing_analysis.writing_style = style_analysis.get('writing_style')
                existing_analysis.content_characteristics = style_analysis.get('content_characteristics')
                existing_analysis.target_audience = style_analysis.get('target_audience')
                existing_analysis.content_type = style_analysis.get('content_type')
                existing_analysis.recommended_settings = style_analysis.get('recommended_settings')
                existing_analysis.crawl_result = analysis_data.get('crawl_result')
                existing_analysis.style_patterns = analysis_data.get('style_patterns')
                existing_analysis.style_guidelines = analysis_data.get('style_guidelines')
                existing_analysis.status = 'completed'
                existing_analysis.error_message = None
                existing_analysis.warning_message = analysis_data.get('warning')
                existing_analysis.updated_at = datetime.utcnow()

                self.db.commit()
                logger.info(f"Updated existing analysis for URL: {website_url}")
                return existing_analysis.id
            else:
                # Create new analysis
                analysis = WebsiteAnalysis(
                    session_id=session_id,
                    website_url=website_url,
                    writing_style=style_analysis.get('writing_style'),
                    content_characteristics=style_analysis.get('content_characteristics'),
                    target_audience=style_analysis.get('target_audience'),
                    content_type=style_analysis.get('content_type'),
                    recommended_settings=style_analysis.get('recommended_settings'),
                    crawl_result=analysis_data.get('crawl_result'),
                    style_patterns=analysis_data.get('style_patterns'),
                    style_guidelines=analysis_data.get('style_guidelines'),
                    status='completed',
                    warning_message=analysis_data.get('warning')
                )

                self.db.add(analysis)
                self.db.commit()
                logger.info(f"Saved new analysis for URL: {website_url}")
                return analysis.id

        except SQLAlchemyError as e:
            # Roll back so the session stays usable after a failed write
            self.db.rollback()
            logger.error(f"Error saving website analysis: {str(e)}")
            return None

    def get_analysis(self, analysis_id: int) -> Optional[Dict[str, Any]]:
        """
        Retrieve website analysis by ID.

        Args:
            analysis_id: Analysis ID

        Returns:
            Analysis data dictionary or None if not found
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).get(analysis_id)
            if analysis:
                return analysis.to_dict()
            return None

        except SQLAlchemyError as e:
            logger.error(f"Error retrieving analysis {analysis_id}: {str(e)}")
            return None

    def get_analysis_by_url(self, session_id: int, website_url: str) -> Optional[Dict[str, Any]]:
        """
        Get analysis for a specific URL in a session.

        Args:
            session_id: Onboarding session ID
            website_url: Website URL

        Returns:
            Analysis data dictionary or None if not found
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).filter_by(
                session_id=session_id,
                website_url=website_url
            ).first()

            if analysis:
                return analysis.to_dict()
            return None

        except SQLAlchemyError as e:
            logger.error(f"Error retrieving analysis for URL {website_url}: {str(e)}")
            return None

    def get_session_analyses(self, session_id: int) -> List[Dict[str, Any]]:
        """
        Get all analyses for a session.

        Args:
            session_id: Onboarding session ID

        Returns:
            List of analysis dictionaries, newest first; an empty list if
            none exist or a database error occurs
        """
        try:
            analyses = self.db.query(WebsiteAnalysis).filter_by(
                session_id=session_id
            ).order_by(WebsiteAnalysis.created_at.desc()).all()

            return [analysis.to_dict() for analysis in analyses]

        except SQLAlchemyError as e:
            logger.error(f"Error retrieving analyses for session {session_id}: {str(e)}")
            return []

    def get_analysis_by_session(self, session_id: int) -> Optional[Dict[str, Any]]:
        """
        Get the latest analysis for a session.

        Args:
            session_id: Onboarding session ID

        Returns:
            Latest analysis data or None if not found
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).filter_by(
                session_id=session_id
            ).order_by(WebsiteAnalysis.created_at.desc()).first()

            if analysis:
                return analysis.to_dict()
            return None

        except SQLAlchemyError as e:
            logger.error(f"Error retrieving latest analysis for session {session_id}: {str(e)}")
            return None

    def check_existing_analysis(self, session_id: int, website_url: str) -> Dict[str, Any]:
        """
        Check whether a completed analysis exists for a URL in a session.

        Used for the confirmation dialog in the frontend.

        Args:
            session_id: Onboarding session ID
            website_url: Website URL

        Returns:
            A dictionary that always contains 'exists'. When a completed
            analysis is found, it also contains 'analysis_date',
            'analysis_id', and a short 'summary'. On a database error,
            'exists' is False and 'error' holds the error message.
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).filter_by(
                session_id=session_id,
                website_url=website_url
            ).first()

            # Failed or in-progress records do not count as an existing analysis
            if analysis and analysis.status == 'completed':
                return {
                    'exists': True,
                    'analysis_date': analysis.analysis_date.isoformat() if analysis.analysis_date else None,
                    'analysis_id': analysis.id,
                    'summary': {
                        'writing_style': analysis.writing_style,
                        'target_audience': analysis.target_audience,
                        'content_type': analysis.content_type
                    }
                }
            return {'exists': False}

        except SQLAlchemyError as e:
            logger.error(f"Error checking existing analysis for URL {website_url}: {str(e)}")
            return {'exists': False, 'error': str(e)}

    def delete_analysis(self, analysis_id: int) -> bool:
        """
        Delete a website analysis.

        Args:
            analysis_id: Analysis ID

        Returns:
            True if successful, False otherwise
        """
        try:
            analysis = self.db.query(WebsiteAnalysis).get(analysis_id)
            if analysis:
                self.db.delete(analysis)
                self.db.commit()
                logger.info(f"Deleted analysis {analysis_id}")
                return True
            return False

        except SQLAlchemyError as e:
            self.db.rollback()
            logger.error(f"Error deleting analysis {analysis_id}: {str(e)}")
            return False

    def save_error_analysis(self, session_id: int, website_url: str, error_message: str) -> Optional[int]:
        """
        Save analysis record with error status.

        Args:
            session_id: Onboarding session ID
            website_url: Website URL
            error_message: Error message

        Returns:
            Analysis ID if successful, None otherwise
        """
        try:
            analysis = WebsiteAnalysis(
                session_id=session_id,
                website_url=website_url,
                status='failed',
                error_message=error_message
            )

            self.db.add(analysis)
            self.db.commit()
            logger.info(f"Saved error analysis for URL: {website_url}")
            return analysis.id

        except SQLAlchemyError as e:
            self.db.rollback()
            logger.error(f"Error saving error analysis: {str(e)}")
            return None