Base code

This commit is contained in:
Kunthawat Greethong
2026-01-08 22:39:53 +07:00
parent 697115c61a
commit c35fa52117
2169 changed files with 626670 additions and 0 deletions

View File

@@ -0,0 +1,288 @@
# SEO Analyzer Module
A comprehensive, modular SEO analysis system for web applications that provides detailed insights and actionable recommendations for improving search engine optimization.
## 🚀 Features
### ✅ **Currently Implemented**
#### **Core Analysis Components**
- **URL Structure Analysis**: Checks URL length, HTTPS usage, special characters, and URL formatting
- **Meta Data Analysis**: Analyzes title tags, meta descriptions, viewport settings, and character encoding
- **Content Analysis**: Evaluates content quality, word count, heading structure, and readability
- **Technical SEO Analysis**: Checks robots.txt, sitemaps, structured data, and canonical URLs
- **Performance Analysis**: Measures page load speed, compression, caching, and optimization
- **Accessibility Analysis**: Checks image alt text, form labels, heading structure, and color contrast
- **User Experience Analysis**: Checks mobile responsiveness, navigation, contact info, and social links
- **Security Headers Analysis**: Checks response headers that protect against common web vulnerabilities
- **Keyword Analysis**: Evaluates keyword usage and optimization for target keywords
#### **AI-Powered Insights**
- **Intelligent Issue Detection**: Automatically identifies critical SEO problems
- **Actionable Recommendations**: Provides specific fixes with code examples
- **Priority-Based Suggestions**: Categorizes issues by severity and impact
- **Context-Aware Solutions**: Offers location-specific fixes and improvements
#### **Advanced Features**
- **Progressive Analysis**: Runs faster analyses first, then slower ones with graceful fallbacks
- **Timeout Handling**: Robust error handling for network issues and timeouts
- **Detailed Reporting**: Comprehensive analysis with scores, issues, warnings, and recommendations
- **Modular Architecture**: Reusable components for easy maintenance and extension
### 🔄 **Coming Soon**
#### **Enhanced Analysis Features**
- **Core Web Vitals Analysis**: LCP, FID, CLS measurements
- **Mobile-First Analysis**: Comprehensive mobile optimization checks
- **Schema Markup Validation**: Advanced structured data analysis
- **Image Optimization Analysis**: Alt text, compression, and format recommendations
- **Internal Linking Analysis**: Site structure and internal link optimization
- **Social Media Optimization**: Open Graph and Twitter Card analysis
#### **AI-Powered Enhancements**
- **Natural Language Processing**: Advanced content analysis using NLP
- **Competitive Analysis**: Compare against competitor websites
- **Trend Analysis**: Identify SEO trends and opportunities
- **Predictive Insights**: Forecast potential ranking improvements
- **Automated Fix Generation**: AI-generated code fixes and optimizations
#### **Advanced Features**
- **Bulk Analysis**: Analyze multiple URLs simultaneously
- **Historical Tracking**: Monitor SEO improvements over time
- **Custom Rule Engine**: User-defined analysis rules and thresholds
- **API Integration**: Connect with Google Search Console, Analytics, and other tools
- **White-Label Support**: Customizable branding and reporting
#### **Enterprise Features**
- **Multi-User Support**: Team collaboration and role-based access
- **Advanced Reporting**: Custom dashboards and detailed analytics
- **API Rate Limiting**: Intelligent request management
- **Caching System**: Optimized performance for repeated analyses
- **Webhook Support**: Real-time notifications and integrations
## 📁 **Module Structure**
```
seo_analyzer/
├── __init__.py # Package initialization and exports
├── core.py # Main analyzer class and data structures
├── analyzers.py # Individual analysis components
├── utils.py # Utility classes (HTML fetcher, AI insights)
├── service.py # Database service for storing/retrieving results
└── README.md # This documentation
```
### **Core Components**
#### **`core.py`**
- `ComprehensiveSEOAnalyzer`: Main orchestrator class
- `SEOAnalysisResult`: Data structure for analysis results
- Progressive analysis with error handling
#### **`analyzers.py`**
- `BaseAnalyzer`: Base class for all analyzers
- `URLStructureAnalyzer`: URL analysis and security checks
- `MetaDataAnalyzer`: Meta tags and technical SEO
- `ContentAnalyzer`: Content quality and structure
- `TechnicalSEOAnalyzer`: Technical SEO elements
- `PerformanceAnalyzer`: Page speed and optimization
- `AccessibilityAnalyzer`: Accessibility compliance
- `UserExperienceAnalyzer`: UX and mobile optimization
- `SecurityHeadersAnalyzer`: Security header analysis
- `KeywordAnalyzer`: Keyword optimization
#### **`utils.py`**
- `HTMLFetcher`: Robust HTML content fetching
- `AIInsightGenerator`: AI-powered insights generation
#### **`service.py`**
- `SEOAnalysisService`: Database operations for storing and retrieving analysis results
- Analysis history tracking
- Statistics and reporting
- CRUD operations for analysis data
## 🛠 **Usage**
### **Basic Usage**
```python
from services.seo_analyzer import ComprehensiveSEOAnalyzer
# Initialize analyzer
analyzer = ComprehensiveSEOAnalyzer()
# Analyze a URL
result = analyzer.analyze_url_progressive(
    url="https://example.com",
    target_keywords=["seo", "optimization"]
)
# Access results
print(f"Overall Score: {result.overall_score}")
print(f"Health Status: {result.health_status}")
print(f"Critical Issues: {len(result.critical_issues)}")
```
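### **Accessing Category Results**
The detailed results for each category are available on `result.data`, keyed by category name (`url_structure`, `meta_data`, `content_analysis`, `technical_seo`, `performance`, `accessibility`, `user_experience`, `security_headers`, plus `keyword_analysis` when target keywords are supplied). A short sketch, continuing from the example above:
```python
# Inspect a single category
meta = result.data.get("meta_data", {})
print(f"Meta data score: {meta.get('score')}")
print(f"Title length: {meta.get('title_length')} characters")

# Iterate over every category that produced a score
for category, data in result.data.items():
    if isinstance(data, dict) and "score" in data:
        print(f"{category}: {data['score']}")
```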
### **Individual Analyzer Usage**
```python
from services.seo_analyzer import URLStructureAnalyzer, MetaDataAnalyzer, HTMLFetcher

# URL analysis
url_analyzer = URLStructureAnalyzer()
url_result = url_analyzer.analyze("https://example.com")

# Fetch the page HTML, then run meta data analysis
html_content = HTMLFetcher().fetch_html("https://example.com")
meta_analyzer = MetaDataAnalyzer()
meta_result = meta_analyzer.analyze(html_content, "https://example.com")
```
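### **Storing Results in the Database**
Results can be persisted through `SEOAnalysisService`, which wraps the database operations in `service.py`. A minimal sketch, assuming a configured SQLAlchemy session factory (the `SessionLocal` import below is a placeholder for your own database setup and is not part of this module):
```python
from services.seo_analyzer import ComprehensiveSEOAnalyzer, SEOAnalysisService
from database import SessionLocal  # hypothetical session factory from your application

analyzer = ComprehensiveSEOAnalyzer()
result = analyzer.analyze_url_progressive(url="https://example.com")

db = SessionLocal()
try:
    service = SEOAnalysisService(db)
    record = service.store_analysis_result(result)  # returns the stored record, or None on failure
    history = service.get_analysis_history("https://example.com", limit=5)
    stats = service.get_analysis_statistics()
finally:
    db.close()
```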
## 📊 **Analysis Categories**
### **URL Structure & Security**
- URL length optimization
- HTTPS implementation
- Special character handling
- URL readability and formatting
### **Meta Data & Technical SEO**
- Title tag optimization (30-60 characters)
- Meta description analysis (70-160 characters)
- Viewport meta tag presence
- Character encoding declaration
### **Content Analysis**
- Word count evaluation (minimum 300 words)
- Heading hierarchy (H1, H2, H3 structure)
- Image alt text compliance
- Internal linking analysis
- Spelling error detection
### **Technical SEO**
- Robots.txt accessibility
- XML sitemap presence
- Structured data markup
- Canonical URL implementation
### **Performance**
- Page load time measurement
- GZIP compression detection
- Caching header analysis
- Resource optimization recommendations
### **Accessibility**
- Image alt text compliance
- Form label associations
- Heading hierarchy validation
- Color contrast recommendations
### **User Experience**
- Mobile responsiveness checks
- Navigation menu analysis
- Contact information presence
- Social media link integration
### **Security Headers**
- X-Frame-Options
- X-Content-Type-Options
- X-XSS-Protection
- Strict-Transport-Security
- Content-Security-Policy
- Referrer-Policy
### **Keyword Analysis**
- Title keyword presence
- Content keyword density
- Natural keyword integration
- Target keyword optimization
## 🎯 **Scoring System**
### **Overall Health Status**
- **Excellent (80-100)**: Optimal SEO performance
- **Good (60-79)**: Solid performance with minor improvements needed
- **Needs Improvement (40-59)**: Significant issues requiring attention
- **Poor (0-39)**: Critical issues requiring immediate action
### **Issue Categories**
- **Critical Issues**: Major problems affecting rankings (25 points each)
- **Warnings**: Important improvements for better performance (10 points each)
- **Recommendations**: Optional enhancements for optimal results
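### **Example Score Calculation**
Most analyzers derive their category score from the deductions above, and `core.py` averages the category scores into the overall score. A minimal sketch of the arithmetic:
```python
# Per-category score: start at 100, deduct 25 per critical issue and 10 per warning
def category_score(critical_issues: int, warnings: int) -> int:
    return max(0, 100 - critical_issues * 25 - warnings * 10)

print(category_score(1, 2))  # 100 - 25 - 20 = 55 -> "Needs Improvement" (40-59)

# Overall score: integer average of all category scores
scores = [55, 90, 75]
overall = sum(scores) // len(scores)  # 73 -> "Good" (60-79)
```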
## 🔧 **Configuration**
### **Timeout Settings**
- HTML Fetching: 30 seconds
- Security Headers: 15 seconds
- Performance Analysis: 20 seconds
- Progressive Analysis: Graceful fallbacks
### **Scoring Thresholds**
- URL Length: 2000 characters maximum
- Title Length: 30-60 characters optimal
- Meta Description: 70-160 characters optimal
- Content Length: 300 words minimum
- Load Time: 3 seconds maximum
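The module does not yet expose a configuration object; these values are hard-coded inside the individual analyzers and in `HTMLFetcher`. Purely as an illustrative reference, the thresholds gathered in one place:
```python
# Illustrative only - these constants live inside the analyzers, not in a config file
SEO_THRESHOLDS = {
    "max_url_length": 2000,               # URLStructureAnalyzer
    "title_length_range": (30, 60),       # MetaDataAnalyzer, characters
    "meta_description_range": (70, 160),  # MetaDataAnalyzer, characters
    "min_word_count": 300,                # ContentAnalyzer
    "max_load_time_seconds": 3,           # PerformanceAnalyzer
    "html_fetch_timeout": 30,             # HTMLFetcher first attempt (retry uses 15)
    "security_headers_timeout": 15,       # SecurityHeadersAnalyzer
    "performance_timeout": 20,            # PerformanceAnalyzer request timeout
}
```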
## 🚀 **Performance Features**
### **Progressive Analysis**
1. **Fast Analyses**: URL structure, meta data, content, technical SEO, accessibility, UX
2. **Slower Analyses**: Security headers, performance (with timeout handling)
3. **Graceful Fallbacks**: Partial results when analyses fail
### **Error Handling**
- Network timeout management
- Partial result generation
- Detailed error reporting
- Fallback recommendations
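This behaviour is implemented in `ComprehensiveSEOAnalyzer`: the slower analyzers are wrapped in `try/except`, and a failure is converted into a partial, zero-score result instead of aborting the whole run. A condensed excerpt of the pattern used in `core.py`:
```python
try:
    analysis_data["performance"] = self.performance_analyzer.analyze(url)
except Exception as e:
    logger.warning(f"Performance analysis failed: {e}")
    # Substitute a zero-score result that recommends a manual check
    analysis_data["performance"] = self._create_fallback_result("performance", str(e))
```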
## 📈 **Future Roadmap**
### **Phase 1 (Q1 2024)**
- [ ] Core Web Vitals integration
- [ ] Enhanced mobile analysis
- [ ] Schema markup validation
- [ ] Image optimization analysis
### **Phase 2 (Q2 2024)**
- [ ] NLP-powered content analysis
- [ ] Competitive analysis features
- [ ] Bulk analysis capabilities
- [ ] Historical tracking
### **Phase 3 (Q3 2024)**
- [ ] Predictive insights
- [ ] Automated fix generation
- [ ] API integrations
- [ ] White-label support
### **Phase 4 (Q4 2024)**
- [ ] Enterprise features
- [ ] Advanced reporting
- [ ] Multi-user support
- [ ] Webhook integrations
## 🤝 **Contributing**
### **Adding New Analyzers**
1. Create a new analyzer class inheriting from `BaseAnalyzer`
2. Implement the `analyze()` method
3. Return standardized result format
4. Add to the main orchestrator in `core.py` (a sketch follows below)
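A minimal sketch of a custom analyzer that follows these steps (the class name and the favicon check are illustrative, not part of the module; the result keys mirror the format returned by the analyzers in `analyzers.py`):
```python
from typing import Any, Dict
from bs4 import BeautifulSoup

from services.seo_analyzer.analyzers import BaseAnalyzer


class FaviconAnalyzer(BaseAnalyzer):
    """Illustrative analyzer: checks that the page declares a favicon."""

    def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
        soup = BeautifulSoup(html_content, 'html.parser')
        issues, warnings, recommendations = [], [], []

        icon = soup.find('link', rel='icon')
        if not icon:
            warnings.append({
                'type': 'warning',
                'message': 'No favicon link found',
                'location': '<head>',
                'fix': 'Add a favicon link tag',
                'code_example': '<link rel="icon" href="/favicon.ico">',
                'action': 'add_favicon'
            })

        score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
        return {
            'score': score,
            'issues': issues,
            'warnings': warnings,
            'recommendations': recommendations,
            'has_favicon': bool(icon)
        }
```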
### **Extending Existing Features**
1. Follow the modular architecture
2. Maintain backward compatibility
3. Add comprehensive error handling
4. Include detailed documentation
## 📝 **License**
This module is part of the AI-Writer project and follows the same licensing terms.
---
**Version**: 1.0.0
**Last Updated**: January 2024
**Maintainer**: AI-Writer Team

View File

@@ -0,0 +1,52 @@
"""
SEO Analyzer Package
A comprehensive, modular SEO analysis system for web applications.
This package provides:
- URL structure analysis
- Meta data analysis
- Content analysis
- Technical SEO analysis
- Performance analysis
- Accessibility analysis
- User experience analysis
- Security headers analysis
- Keyword analysis
- AI-powered insights generation
- Database service for storing and retrieving analysis results
"""
from .core import ComprehensiveSEOAnalyzer, SEOAnalysisResult
from .analyzers import (
URLStructureAnalyzer,
MetaDataAnalyzer,
ContentAnalyzer,
TechnicalSEOAnalyzer,
PerformanceAnalyzer,
AccessibilityAnalyzer,
UserExperienceAnalyzer,
SecurityHeadersAnalyzer,
KeywordAnalyzer
)
from .utils import HTMLFetcher, AIInsightGenerator
from .service import SEOAnalysisService
__version__ = "1.0.0"
__author__ = "AI-Writer Team"
__all__ = [
'ComprehensiveSEOAnalyzer',
'SEOAnalysisResult',
'URLStructureAnalyzer',
'MetaDataAnalyzer',
'ContentAnalyzer',
'TechnicalSEOAnalyzer',
'PerformanceAnalyzer',
'AccessibilityAnalyzer',
'UserExperienceAnalyzer',
'SecurityHeadersAnalyzer',
'KeywordAnalyzer',
'HTMLFetcher',
'AIInsightGenerator',
'SEOAnalysisService'
]

View File

@@ -0,0 +1,796 @@
"""
SEO Analyzers Module
Contains all individual SEO analysis components.
"""
import re
import time
import requests
from urllib.parse import urlparse, urljoin
from typing import Dict, List, Any, Optional
from bs4 import BeautifulSoup
from loguru import logger
class BaseAnalyzer:
"""Base class for all SEO analyzers"""
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
})
class URLStructureAnalyzer(BaseAnalyzer):
"""Analyzes URL structure and security"""
def analyze(self, url: str) -> Dict[str, Any]:
"""Enhanced URL structure analysis with specific fixes"""
parsed = urlparse(url)
issues = []
warnings = []
recommendations = []
# Check URL length
if len(url) > 2000:
issues.append({
'type': 'critical',
'message': f'URL is too long ({len(url)} characters)',
'location': 'URL',
'current_value': url,
'fix': 'Shorten URL to under 2000 characters',
'code_example': f'<a href="/shorter-path">Link</a>',
'action': 'shorten_url'
})
# Check for underscores used instead of hyphens
if '_' in parsed.path and '-' not in parsed.path:
issues.append({
'type': 'critical',
'message': 'URL uses underscores instead of hyphens',
'location': 'URL',
'current_value': parsed.path,
'fix': 'Replace underscores with hyphens',
'code_example': f'<a href="{parsed.path.replace("_", "-")}">Link</a>',
'action': 'replace_underscores'
})
# Check for special characters
special_chars = re.findall(r'[^a-zA-Z0-9\-_/]', parsed.path)
if special_chars:
warnings.append({
'type': 'warning',
'message': f'URL contains special characters: {", ".join(set(special_chars))}',
'location': 'URL',
'current_value': parsed.path,
'fix': 'Remove special characters from URL',
'code_example': f'<a href="/clean-url">Link</a>',
'action': 'remove_special_chars'
})
# Check for HTTPS
if parsed.scheme != 'https':
issues.append({
'type': 'critical',
'message': 'URL is not using HTTPS',
'location': 'URL',
'current_value': parsed.scheme,
'fix': 'Redirect to HTTPS',
'code_example': 'RewriteEngine On\nRewriteCond %{HTTPS} off\nRewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]',
'action': 'enable_https'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'url_length': len(url),
'has_https': parsed.scheme == 'https',
'has_hyphens': '-' in parsed.path,
'special_chars_count': len(special_chars)
}
class MetaDataAnalyzer(BaseAnalyzer):
"""Analyzes meta data and technical SEO elements"""
def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
"""Enhanced meta data analysis with specific element locations"""
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
# Title analysis
title_tag = soup.find('title')
if not title_tag:
issues.append({
'type': 'critical',
'message': 'Missing title tag',
'location': '<head>',
'fix': 'Add title tag to head section',
'code_example': '<title>Your Page Title</title>',
'action': 'add_title_tag'
})
else:
title_text = title_tag.get_text().strip()
if len(title_text) < 30:
warnings.append({
'type': 'warning',
'message': f'Title too short ({len(title_text)} characters)',
'location': '<title>',
'current_value': title_text,
'fix': 'Make title 30-60 characters',
'code_example': f'<title>{title_text} - Additional Context</title>',
'action': 'extend_title'
})
elif len(title_text) > 60:
warnings.append({
'type': 'warning',
'message': f'Title too long ({len(title_text)} characters)',
'location': '<title>',
'current_value': title_text,
'fix': 'Shorten title to 30-60 characters',
'code_example': f'<title>{title_text[:55]}...</title>',
'action': 'shorten_title'
})
# Meta description analysis
meta_desc = soup.find('meta', attrs={'name': 'description'})
if not meta_desc:
issues.append({
'type': 'critical',
'message': 'Missing meta description',
'location': '<head>',
'fix': 'Add meta description',
'code_example': '<meta name="description" content="Your page description here">',
'action': 'add_meta_description'
})
else:
desc_content = meta_desc.get('content', '').strip()
if len(desc_content) < 70:
warnings.append({
'type': 'warning',
'message': f'Meta description too short ({len(desc_content)} characters)',
'location': '<meta name="description">',
'current_value': desc_content,
'fix': 'Extend description to 70-160 characters',
'code_example': f'<meta name="description" content="{desc_content} - Additional context about your page">',
'action': 'extend_meta_description'
})
elif len(desc_content) > 160:
warnings.append({
'type': 'warning',
'message': f'Meta description too long ({len(desc_content)} characters)',
'location': '<meta name="description">',
'current_value': desc_content,
'fix': 'Shorten description to 70-160 characters',
'code_example': f'<meta name="description" content="{desc_content[:155]}...">',
'action': 'shorten_meta_description'
})
# Viewport meta tag
viewport = soup.find('meta', attrs={'name': 'viewport'})
if not viewport:
issues.append({
'type': 'critical',
'message': 'Missing viewport meta tag',
'location': '<head>',
'fix': 'Add viewport meta tag for mobile optimization',
'code_example': '<meta name="viewport" content="width=device-width, initial-scale=1.0">',
'action': 'add_viewport_meta'
})
# Charset declaration
charset = soup.find('meta', attrs={'charset': True}) or soup.find('meta', attrs={'http-equiv': 'Content-Type'})
if not charset:
warnings.append({
'type': 'warning',
'message': 'Missing charset declaration',
'location': '<head>',
'fix': 'Add charset meta tag',
'code_example': '<meta charset="UTF-8">',
'action': 'add_charset_meta'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'title_length': len(title_tag.get_text().strip()) if title_tag else 0,
'description_length': len(meta_desc.get('content', '')) if meta_desc else 0,
'has_viewport': bool(viewport),
'has_charset': bool(charset)
}
class ContentAnalyzer(BaseAnalyzer):
"""Analyzes content quality and structure"""
def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
"""Enhanced content analysis with specific text locations"""
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
# Get all text content
text_content = soup.get_text()
words = text_content.split()
word_count = len(words)
# Check word count
if word_count < 300:
issues.append({
'type': 'critical',
'message': f'Content too short ({word_count} words)',
'location': 'Page content',
'current_value': f'{word_count} words',
'fix': 'Add more valuable content (minimum 300 words)',
'code_example': 'Add relevant paragraphs with useful information',
'action': 'add_more_content'
})
# Check for H1 tags
h1_tags = soup.find_all('h1')
if len(h1_tags) == 0:
issues.append({
'type': 'critical',
'message': 'Missing H1 tag',
'location': 'Page structure',
'fix': 'Add one H1 tag per page',
'code_example': '<h1>Your Main Page Title</h1>',
'action': 'add_h1_tag'
})
elif len(h1_tags) > 1:
warnings.append({
'type': 'warning',
'message': f'Multiple H1 tags found ({len(h1_tags)})',
'location': 'Page structure',
'current_value': f'{len(h1_tags)} H1 tags',
'fix': 'Use only one H1 tag per page',
'code_example': 'Keep only the main H1, change others to H2',
'action': 'reduce_h1_tags'
})
# Check for images without alt text
images = soup.find_all('img')
images_without_alt = [img for img in images if not img.get('alt')]
if images_without_alt:
warnings.append({
'type': 'warning',
'message': f'Images without alt text ({len(images_without_alt)} found)',
'location': 'Images',
'current_value': f'{len(images_without_alt)} images without alt',
'fix': 'Add descriptive alt text to all images',
'code_example': '<img src="image.jpg" alt="Descriptive text about the image">',
'action': 'add_alt_text'
})
# Check for internal links
internal_links = soup.find_all('a', href=re.compile(r'^(?!https?://)'))  # hrefs that are not absolute http(s) URLs
if len(internal_links) < 3:
warnings.append({
'type': 'warning',
'message': f'Few internal links ({len(internal_links)} found)',
'location': 'Page content',
'current_value': f'{len(internal_links)} internal links',
'fix': 'Add more internal links to improve site structure',
'code_example': '<a href="/related-page">Related content</a>',
'action': 'add_internal_links'
})
# Check for spelling errors (basic check)
common_words = ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by']
potential_errors = []
for word in words[:100]: # Check first 100 words
if len(word) > 3 and word.lower() not in common_words:
# Basic spell check (this is simplified - in production you'd use a proper spell checker)
if re.search(r'[a-z]{15,}', word.lower()): # Very long words might be misspelled
potential_errors.append(word)
if potential_errors:
issues.append({
'type': 'critical',
'message': f'Potential spelling errors found: {", ".join(potential_errors[:5])}',
'location': 'Page content',
'current_value': f'{len(potential_errors)} potential errors',
'fix': 'Review and correct spelling errors',
'code_example': 'Use spell checker or proofread content',
'action': 'fix_spelling'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'word_count': word_count,
'h1_count': len(h1_tags),
'images_count': len(images),
'images_without_alt': len(images_without_alt),
'internal_links_count': len(internal_links),
'potential_spelling_errors': len(potential_errors)
}
class TechnicalSEOAnalyzer(BaseAnalyzer):
"""Analyzes technical SEO elements"""
def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
"""Enhanced technical SEO analysis with specific fixes"""
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
# Check for robots.txt
robots_url = urljoin(url, '/robots.txt')
try:
robots_response = self.session.get(robots_url, timeout=5)
if robots_response.status_code != 200:
warnings.append({
'type': 'warning',
'message': 'Robots.txt not accessible',
'location': 'Server',
'fix': 'Create robots.txt file',
'code_example': 'User-agent: *\nAllow: /',
'action': 'create_robots_txt'
})
except Exception:
warnings.append({
'type': 'warning',
'message': 'Robots.txt not found',
'location': 'Server',
'fix': 'Create robots.txt file',
'code_example': 'User-agent: *\nAllow: /',
'action': 'create_robots_txt'
})
# Check for sitemap
sitemap_url = urljoin(url, '/sitemap.xml')
try:
sitemap_response = self.session.get(sitemap_url, timeout=5)
if sitemap_response.status_code != 200:
warnings.append({
'type': 'warning',
'message': 'Sitemap not accessible',
'location': 'Server',
'fix': 'Create XML sitemap',
'code_example': '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<url>\n<loc>https://example.com/</loc>\n</url>\n</urlset>',
'action': 'create_sitemap'
})
except Exception:
warnings.append({
'type': 'warning',
'message': 'Sitemap not found',
'location': 'Server',
'fix': 'Create XML sitemap',
'code_example': '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<url>\n<loc>https://example.com/</loc>\n</url>\n</urlset>',
'action': 'create_sitemap'
})
# Check for structured data
structured_data = soup.find_all('script', type='application/ld+json')
if not structured_data:
warnings.append({
'type': 'warning',
'message': 'No structured data found',
'location': '<head> or <body>',
'fix': 'Add structured data markup',
'code_example': '<script type="application/ld+json">{"@context":"https://schema.org","@type":"WebPage","name":"Page Title"}</script>',
'action': 'add_structured_data'
})
# Check for canonical URL
canonical = soup.find('link', rel='canonical')
if not canonical:
issues.append({
'type': 'critical',
'message': 'Missing canonical URL',
'location': '<head>',
'fix': 'Add canonical URL',
'code_example': '<link rel="canonical" href="https://example.com/page">',
'action': 'add_canonical_url'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'has_robots_txt': not any('robots.txt' in w['message'].lower() for w in warnings),
'has_sitemap': not any('sitemap' in w['message'].lower() for w in warnings),
'has_structured_data': bool(structured_data),
'has_canonical': bool(canonical)
}
class PerformanceAnalyzer(BaseAnalyzer):
"""Analyzes page performance"""
def analyze(self, url: str) -> Dict[str, Any]:
"""Enhanced performance analysis with specific fixes"""
try:
start_time = time.time()
response = self.session.get(url, timeout=20)
load_time = time.time() - start_time
issues = []
warnings = []
recommendations = []
# Check load time
if load_time > 3:
issues.append({
'type': 'critical',
'message': f'Page load time too slow ({load_time:.2f}s)',
'location': 'Page performance',
'current_value': f'{load_time:.2f}s',
'fix': 'Optimize page speed (target < 3 seconds)',
'code_example': 'Optimize images, minify CSS/JS, use CDN',
'action': 'optimize_page_speed'
})
elif load_time > 2:
warnings.append({
'type': 'warning',
'message': f'Page load time could be improved ({load_time:.2f}s)',
'location': 'Page performance',
'current_value': f'{load_time:.2f}s',
'fix': 'Optimize for faster loading',
'code_example': 'Compress images, enable caching',
'action': 'improve_page_speed'
})
# Check for compression
content_encoding = response.headers.get('Content-Encoding')
if not content_encoding:
warnings.append({
'type': 'warning',
'message': 'No compression detected',
'location': 'Server configuration',
'fix': 'Enable GZIP compression',
'code_example': 'Add to .htaccess: SetOutputFilter DEFLATE',
'action': 'enable_compression'
})
# Check for caching headers
cache_headers = ['Cache-Control', 'Expires', 'ETag']
has_cache = any(response.headers.get(header) for header in cache_headers)
if not has_cache:
warnings.append({
'type': 'warning',
'message': 'No caching headers found',
'location': 'Server configuration',
'fix': 'Add caching headers',
'code_example': 'Cache-Control: max-age=31536000',
'action': 'add_caching_headers'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'load_time': load_time,
'is_compressed': bool(content_encoding),
'has_cache': has_cache,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations
}
except Exception as e:
logger.warning(f"Performance analysis failed for {url}: {e}")
return {
'score': 0, 'error': f'Performance analysis failed: {str(e)}',
'load_time': 0, 'is_compressed': False, 'has_cache': False,
'issues': [{'type': 'critical', 'message': 'Performance analysis failed', 'location': 'Page', 'fix': 'Check page speed manually', 'action': 'manual_check'}],
'warnings': [{'type': 'warning', 'message': 'Could not analyze performance', 'location': 'Page', 'fix': 'Use PageSpeed Insights', 'action': 'manual_check'}],
'recommendations': [{'type': 'recommendation', 'message': 'Check page speed manually', 'priority': 'medium', 'action': 'manual_check'}]
}
class AccessibilityAnalyzer(BaseAnalyzer):
"""Analyzes accessibility features"""
def analyze(self, html_content: str) -> Dict[str, Any]:
"""Enhanced accessibility analysis with specific fixes"""
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
# Check for alt text on images
images = soup.find_all('img')
images_without_alt = [img for img in images if not img.get('alt')]
if images_without_alt:
issues.append({
'type': 'critical',
'message': f'Images without alt text ({len(images_without_alt)} found)',
'location': 'Images',
'current_value': f'{len(images_without_alt)} images without alt',
'fix': 'Add descriptive alt text to all images',
'code_example': '<img src="image.jpg" alt="Descriptive text about the image">',
'action': 'add_alt_text'
})
# Check for form labels
forms = soup.find_all('form')
for form in forms:
inputs = form.find_all(['input', 'textarea', 'select'])
for input_elem in inputs:
if input_elem.get('type') not in ['hidden', 'submit', 'button']:
input_id = input_elem.get('id')
if input_id:
label = soup.find('label', attrs={'for': input_id})
if not label:
warnings.append({
'type': 'warning',
'message': f'Input without label (ID: {input_id})',
'location': 'Form',
'current_value': f'Input ID: {input_id}',
'fix': 'Add label for input field',
'code_example': f'<label for="{input_id}">Field Label</label>',
'action': 'add_form_label'
})
# Check for heading hierarchy
headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
if headings:
h1_count = len([h for h in headings if h.name == 'h1'])
if h1_count == 0:
issues.append({
'type': 'critical',
'message': 'No H1 heading found',
'location': 'Page structure',
'fix': 'Add H1 heading for main content',
'code_example': '<h1>Main Page Heading</h1>',
'action': 'add_h1_heading'
})
# Check for color contrast (basic check)
style_tags = soup.find_all('style')
inline_styles = soup.find_all(style=True)
if style_tags or inline_styles:
warnings.append({
'type': 'warning',
'message': 'Custom styles found - check color contrast',
'location': 'CSS',
'fix': 'Ensure sufficient color contrast (4.5:1 for normal text)',
'code_example': 'Use tools like WebAIM Contrast Checker',
'action': 'check_color_contrast'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'images_count': len(images),
'images_without_alt': len(images_without_alt),
'forms_count': len(forms),
'headings_count': len(headings)
}
class UserExperienceAnalyzer(BaseAnalyzer):
"""Analyzes user experience elements"""
def analyze(self, html_content: str, url: str) -> Dict[str, Any]:
"""Enhanced user experience analysis with specific fixes"""
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
# Check for mobile responsiveness indicators
viewport = soup.find('meta', attrs={'name': 'viewport'})
if not viewport:
issues.append({
'type': 'critical',
'message': 'Missing viewport meta tag for mobile',
'location': '<head>',
'fix': 'Add viewport meta tag',
'code_example': '<meta name="viewport" content="width=device-width, initial-scale=1.0">',
'action': 'add_viewport_meta'
})
# Check for navigation menu
nav_elements = soup.find_all(['nav', 'ul', 'ol'])
if not nav_elements:
warnings.append({
'type': 'warning',
'message': 'No navigation menu found',
'location': 'Page structure',
'fix': 'Add navigation menu',
'code_example': '<nav><ul><li><a href="/">Home</a></li></ul></nav>',
'action': 'add_navigation'
})
# Check for contact information
contact_patterns = ['contact', 'phone', 'email', '@', 'tel:']
page_text = soup.get_text().lower()
has_contact = any(pattern in page_text for pattern in contact_patterns)
if not has_contact:
warnings.append({
'type': 'warning',
'message': 'No contact information found',
'location': 'Page content',
'fix': 'Add contact information',
'code_example': '<p>Contact us: <a href="mailto:info@example.com">info@example.com</a></p>',
'action': 'add_contact_info'
})
# Check for social media links
social_patterns = ['facebook', 'twitter', 'linkedin', 'instagram']
has_social = any(pattern in page_text for pattern in social_patterns)
if not has_social:
recommendations.append({
'type': 'recommendation',
'message': 'No social media links found',
'location': 'Page content',
'fix': 'Add social media links',
'code_example': '<a href="https://facebook.com/yourpage">Facebook</a>',
'action': 'add_social_links',
'priority': 'low'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'has_viewport': bool(viewport),
'has_navigation': bool(nav_elements),
'has_contact': has_contact,
'has_social': has_social
}
class SecurityHeadersAnalyzer(BaseAnalyzer):
"""Analyzes security headers"""
def analyze(self, url: str) -> Dict[str, Any]:
"""Enhanced security headers analysis with specific fixes"""
try:
response = self.session.get(url, timeout=15, allow_redirects=True)
security_headers = {
'X-Frame-Options': response.headers.get('X-Frame-Options'),
'X-Content-Type-Options': response.headers.get('X-Content-Type-Options'),
'X-XSS-Protection': response.headers.get('X-XSS-Protection'),
'Strict-Transport-Security': response.headers.get('Strict-Transport-Security'),
'Content-Security-Policy': response.headers.get('Content-Security-Policy'),
'Referrer-Policy': response.headers.get('Referrer-Policy')
}
issues = []
warnings = []
recommendations = []
present_headers = []
missing_headers = []
# Example values for the non-critical headers (max-age syntax applies only to Strict-Transport-Security)
header_examples = {
'X-XSS-Protection': '1; mode=block',
'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
'Content-Security-Policy': "default-src 'self'",
'Referrer-Policy': 'strict-origin-when-cross-origin'
}
for header_name, header_value in security_headers.items():
if header_value:
present_headers.append(header_name)
else:
missing_headers.append(header_name)
if header_name in ['X-Frame-Options', 'X-Content-Type-Options']:
issues.append({
'type': 'critical',
'message': f'Missing {header_name} header',
'location': 'Server configuration',
'fix': f'Add {header_name} header',
'code_example': f'{header_name}: DENY' if header_name == 'X-Frame-Options' else f'{header_name}: nosniff',
'action': f'add_{header_name.lower().replace("-", "_")}_header'
})
else:
warnings.append({
'type': 'warning',
'message': f'Missing {header_name} header',
'location': 'Server configuration',
'fix': f'Add {header_name} header for better security',
'code_example': f'{header_name}: {header_examples.get(header_name, "")}',
'action': f'add_{header_name.lower().replace("-", "_")}_header'
})
score = min(100, len(present_headers) * 16)
return {
'score': score,
'present_headers': present_headers,
'missing_headers': missing_headers,
'total_headers': len(present_headers),
'issues': issues,
'warnings': warnings,
'recommendations': recommendations
}
except Exception as e:
logger.warning(f"Security headers analysis failed for {url}: {e}")
return {
'score': 0, 'error': f'Error analyzing headers: {str(e)}',
'present_headers': [], 'missing_headers': ['All security headers'],
'total_headers': 0, 'issues': [{'type': 'critical', 'message': 'Could not analyze security headers', 'location': 'Server', 'fix': 'Check security headers manually', 'action': 'manual_check'}],
'warnings': [{'type': 'warning', 'message': 'Security headers analysis failed', 'location': 'Server', 'fix': 'Verify security headers manually', 'action': 'manual_check'}],
'recommendations': [{'type': 'recommendation', 'message': 'Check security headers manually', 'priority': 'medium', 'action': 'manual_check'}]
}
class KeywordAnalyzer(BaseAnalyzer):
"""Analyzes keyword usage and optimization"""
def analyze(self, html_content: str, target_keywords: Optional[List[str]] = None) -> Dict[str, Any]:
"""Enhanced keyword analysis with specific locations"""
if not target_keywords:
return {'score': 0, 'issues': [], 'warnings': [], 'recommendations': []}
soup = BeautifulSoup(html_content, 'html.parser')
issues = []
warnings = []
recommendations = []
page_text = soup.get_text().lower()
title_text = soup.find('title')
title_text = title_text.get_text().lower() if title_text else ""
for keyword in target_keywords:
keyword_lower = keyword.lower()
# Check if keyword is in title
if keyword_lower not in title_text:
issues.append({
'type': 'critical',
'message': f'Target keyword "{keyword}" not in title',
'location': '<title>',
'current_value': title_text,
'fix': f'Include keyword "{keyword}" in title',
'code_example': f'<title>{keyword} - Your Page Title</title>',
'action': 'add_keyword_to_title'
})
# Check keyword density
keyword_count = page_text.count(keyword_lower)
if keyword_count == 0:
issues.append({
'type': 'critical',
'message': f'Target keyword "{keyword}" not found in content',
'location': 'Page content',
'current_value': '0 occurrences',
'fix': f'Include keyword "{keyword}" naturally in content',
'code_example': f'Add "{keyword}" to your page content',
'action': 'add_keyword_to_content'
})
elif keyword_count < 2:
warnings.append({
'type': 'warning',
'message': f'Target keyword "{keyword}" appears only {keyword_count} time(s)',
'location': 'Page content',
'current_value': f'{keyword_count} occurrence(s)',
'fix': f'Include keyword "{keyword}" more naturally',
'code_example': f'Add more instances of "{keyword}" to content',
'action': 'increase_keyword_density'
})
score = max(0, 100 - len(issues) * 25 - len(warnings) * 10)
return {
'score': score,
'issues': issues,
'warnings': warnings,
'recommendations': recommendations,
'target_keywords': target_keywords,
'keywords_found': [kw for kw in target_keywords if kw.lower() in page_text]
}

View File

@@ -0,0 +1,208 @@
"""
Core SEO Analyzer Module
Contains the main ComprehensiveSEOAnalyzer class and data structures.
"""
from datetime import datetime
from dataclasses import dataclass
from typing import Dict, List, Any, Optional
from loguru import logger
from .analyzers import (
URLStructureAnalyzer,
MetaDataAnalyzer,
ContentAnalyzer,
TechnicalSEOAnalyzer,
PerformanceAnalyzer,
AccessibilityAnalyzer,
UserExperienceAnalyzer,
SecurityHeadersAnalyzer,
KeywordAnalyzer
)
from .utils import HTMLFetcher, AIInsightGenerator
@dataclass
class SEOAnalysisResult:
"""Data class for SEO analysis results"""
url: str
timestamp: datetime
overall_score: int
health_status: str
critical_issues: List[Dict[str, Any]]
warnings: List[Dict[str, Any]]
recommendations: List[Dict[str, Any]]
data: Dict[str, Any]
class ComprehensiveSEOAnalyzer:
"""
Comprehensive SEO Analyzer
Orchestrates all individual analyzers to provide complete SEO analysis.
"""
def __init__(self):
"""Initialize the comprehensive SEO analyzer with all sub-analyzers"""
self.html_fetcher = HTMLFetcher()
self.ai_insight_generator = AIInsightGenerator()
# Initialize all analyzers
self.url_analyzer = URLStructureAnalyzer()
self.meta_analyzer = MetaDataAnalyzer()
self.content_analyzer = ContentAnalyzer()
self.technical_analyzer = TechnicalSEOAnalyzer()
self.performance_analyzer = PerformanceAnalyzer()
self.accessibility_analyzer = AccessibilityAnalyzer()
self.ux_analyzer = UserExperienceAnalyzer()
self.security_analyzer = SecurityHeadersAnalyzer()
self.keyword_analyzer = KeywordAnalyzer()
def analyze_url_progressive(self, url: str, target_keywords: Optional[List[str]] = None) -> SEOAnalysisResult:
"""
Progressive analysis method that runs all analyses with enhanced AI insights
"""
try:
logger.info(f"Starting enhanced SEO analysis for URL: {url}")
# Fetch HTML content
html_content = self.html_fetcher.fetch_html(url)
if not html_content:
return self._create_error_result(url, "Failed to fetch HTML content")
# Run all analyzers
analysis_data = {}
logger.info("Running enhanced analyses...")
analysis_data.update({
'url_structure': self.url_analyzer.analyze(url),
'meta_data': self.meta_analyzer.analyze(html_content, url),
'content_analysis': self.content_analyzer.analyze(html_content, url),
'keyword_analysis': self.keyword_analyzer.analyze(html_content, target_keywords) if target_keywords else {},
'technical_seo': self.technical_analyzer.analyze(html_content, url),
'accessibility': self.accessibility_analyzer.analyze(html_content),
'user_experience': self.ux_analyzer.analyze(html_content, url)
})
# Run potentially slower analyses with error handling
logger.info("Running security headers analysis...")
try:
analysis_data['security_headers'] = self.security_analyzer.analyze(url)
except Exception as e:
logger.warning(f"Security headers analysis failed: {e}")
analysis_data['security_headers'] = self._create_fallback_result('security_headers', str(e))
logger.info("Running performance analysis...")
try:
analysis_data['performance'] = self.performance_analyzer.analyze(url)
except Exception as e:
logger.warning(f"Performance analysis failed: {e}")
analysis_data['performance'] = self._create_fallback_result('performance', str(e))
# Generate AI-powered insights
ai_insights = self.ai_insight_generator.generate_insights(analysis_data, url)
# Calculate overall health
overall_score, health_status, critical_issues, warnings, recommendations = self._calculate_overall_health(analysis_data, ai_insights)
result = SEOAnalysisResult(
url=url,
timestamp=datetime.now(),
overall_score=overall_score,
health_status=health_status,
critical_issues=critical_issues,
warnings=warnings,
recommendations=recommendations,
data=analysis_data
)
logger.info(f"Enhanced SEO analysis completed for {url}. Overall score: {overall_score}")
return result
except Exception as e:
logger.error(f"Error in enhanced SEO analysis for {url}: {str(e)}")
return self._create_error_result(url, str(e))
def _calculate_overall_health(self, analysis_data: Dict[str, Any], ai_insights: List[Dict[str, Any]]) -> tuple:
"""Calculate overall health with enhanced scoring"""
scores = []
all_critical_issues = []
all_warnings = []
all_recommendations = []
for category, data in analysis_data.items():
if isinstance(data, dict) and 'score' in data:
scores.append(data['score'])
all_critical_issues.extend(data.get('issues', []))
all_warnings.extend(data.get('warnings', []))
all_recommendations.extend(data.get('recommendations', []))
# Calculate overall score
overall_score = sum(scores) // len(scores) if scores else 0
# Determine health status
if overall_score >= 80:
health_status = 'excellent'
elif overall_score >= 60:
health_status = 'good'
elif overall_score >= 40:
health_status = 'needs_improvement'
else:
health_status = 'poor'
# Add AI insights to recommendations
for insight in ai_insights:
all_recommendations.append({
'type': 'ai_insight',
'message': insight['message'],
'priority': insight['priority'],
'action': insight['action'],
'description': insight['description']
})
return overall_score, health_status, all_critical_issues, all_warnings, all_recommendations
def _create_fallback_result(self, category: str, error_message: str) -> Dict[str, Any]:
"""Create a fallback result when analysis fails"""
return {
'score': 0,
'error': f'{category} analysis failed: {error_message}',
'issues': [{
'type': 'critical',
'message': f'{category} analysis timed out',
'location': 'System',
'fix': f'Check {category} manually',
'action': 'manual_check'
}],
'warnings': [{
'type': 'warning',
'message': f'Could not analyze {category}',
'location': 'System',
'fix': f'Verify {category} manually',
'action': 'manual_check'
}],
'recommendations': [{
'type': 'recommendation',
'message': f'Check {category} manually',
'priority': 'medium',
'action': 'manual_check'
}]
}
def _create_error_result(self, url: str, error_message: str) -> SEOAnalysisResult:
"""Create error result with enhanced structure"""
return SEOAnalysisResult(
url=url,
timestamp=datetime.now(),
overall_score=0,
health_status='error',
critical_issues=[{
'type': 'critical',
'message': f'Analysis failed: {error_message}',
'location': 'System',
'fix': 'Check URL accessibility and try again',
'action': 'retry_analysis'
}],
warnings=[],
recommendations=[],
data={}
)

View File

@@ -0,0 +1,268 @@
"""
SEO Analysis Service
Handles storing and retrieving SEO analysis data from the database.
"""
from typing import Optional, List, Dict, Any
from datetime import datetime
from sqlalchemy.orm import Session
from sqlalchemy import func
from loguru import logger
from models.seo_analysis import (
SEOAnalysis,
SEOIssue,
SEOWarning,
SEORecommendation,
SEOCategoryScore,
SEOAnalysisHistory,
create_analysis_from_result,
create_issues_from_result,
create_warnings_from_result,
create_recommendations_from_result,
create_category_scores_from_result
)
from .core import SEOAnalysisResult
class SEOAnalysisService:
"""Service for managing SEO analysis data in the database."""
def __init__(self, db_session: Session):
self.db = db_session
def store_analysis_result(self, result: SEOAnalysisResult) -> Optional[SEOAnalysis]:
"""
Store SEO analysis result in the database.
Args:
result: SEOAnalysisResult from the analyzer
Returns:
Stored SEOAnalysis record or None if failed
"""
try:
# Create main analysis record
analysis_record = create_analysis_from_result(result)
self.db.add(analysis_record)
self.db.flush() # Get the ID
# Create related records
issues = create_issues_from_result(analysis_record.id, result)
warnings = create_warnings_from_result(analysis_record.id, result)
recommendations = create_recommendations_from_result(analysis_record.id, result)
category_scores = create_category_scores_from_result(analysis_record.id, result)
# Add all related records
for issue in issues:
self.db.add(issue)
for warning in warnings:
self.db.add(warning)
for recommendation in recommendations:
self.db.add(recommendation)
for score in category_scores:
self.db.add(score)
# Create history record
history_record = SEOAnalysisHistory(
url=result.url,
analysis_date=result.timestamp,
overall_score=result.overall_score,
health_status=result.health_status,
score_change=0, # Will be calculated later
critical_issues_count=len(result.critical_issues),
warnings_count=len(result.warnings),
recommendations_count=len(result.recommendations)
)
# Add category scores to history
for category, data in result.data.items():
if isinstance(data, dict) and 'score' in data:
if category == 'url_structure':
history_record.url_structure_score = data['score']
elif category == 'meta_data':
history_record.meta_data_score = data['score']
elif category == 'content_analysis':
history_record.content_score = data['score']
elif category == 'technical_seo':
history_record.technical_score = data['score']
elif category == 'performance':
history_record.performance_score = data['score']
elif category == 'accessibility':
history_record.accessibility_score = data['score']
elif category == 'user_experience':
history_record.user_experience_score = data['score']
elif category == 'security_headers':
history_record.security_score = data['score']
self.db.add(history_record)
self.db.commit()
logger.info(f"Stored SEO analysis for {result.url} with score {result.overall_score}")
return analysis_record
except Exception as e:
logger.error(f"Error storing SEO analysis: {str(e)}")
self.db.rollback()
return None
def get_latest_analysis(self, url: str) -> Optional[SEOAnalysis]:
"""
Get the latest SEO analysis for a URL.
Args:
url: The URL to get analysis for
Returns:
Latest SEOAnalysis record or None
"""
try:
return self.db.query(SEOAnalysis).filter(
SEOAnalysis.url == url
).order_by(SEOAnalysis.timestamp.desc()).first()
except Exception as e:
logger.error(f"Error getting latest analysis for {url}: {str(e)}")
return None
def get_analysis_history(self, url: str, limit: int = 10) -> List[SEOAnalysisHistory]:
"""
Get analysis history for a URL.
Args:
url: The URL to get history for
limit: Maximum number of records to return
Returns:
List of SEOAnalysisHistory records
"""
try:
return self.db.query(SEOAnalysisHistory).filter(
SEOAnalysisHistory.url == url
).order_by(SEOAnalysisHistory.analysis_date.desc()).limit(limit).all()
except Exception as e:
logger.error(f"Error getting analysis history for {url}: {str(e)}")
return []
def get_analysis_by_id(self, analysis_id: int) -> Optional[SEOAnalysis]:
"""
Get SEO analysis by ID.
Args:
analysis_id: The analysis ID
Returns:
SEOAnalysis record or None
"""
try:
return self.db.query(SEOAnalysis).filter(
SEOAnalysis.id == analysis_id
).first()
except Exception as e:
logger.error(f"Error getting analysis by ID {analysis_id}: {str(e)}")
return None
def get_all_analyses(self, limit: int = 50) -> List[SEOAnalysis]:
"""
Get all SEO analyses with pagination.
Args:
limit: Maximum number of records to return
Returns:
List of SEOAnalysis records
"""
try:
return self.db.query(SEOAnalysis).order_by(
SEOAnalysis.timestamp.desc()
).limit(limit).all()
except Exception as e:
logger.error(f"Error getting all analyses: {str(e)}")
return []
def delete_analysis(self, analysis_id: int) -> bool:
"""
Delete an SEO analysis.
Args:
analysis_id: The analysis ID to delete
Returns:
True if successful, False otherwise
"""
try:
analysis = self.db.query(SEOAnalysis).filter(
SEOAnalysis.id == analysis_id
).first()
if analysis:
self.db.delete(analysis)
self.db.commit()
logger.info(f"Deleted SEO analysis {analysis_id}")
return True
else:
logger.warning(f"Analysis {analysis_id} not found for deletion")
return False
except Exception as e:
logger.error(f"Error deleting analysis {analysis_id}: {str(e)}")
self.db.rollback()
return False
def get_analysis_statistics(self) -> Dict[str, Any]:
"""
Get overall statistics for SEO analyses.
Returns:
Dictionary with analysis statistics
"""
try:
total_analyses = self.db.query(SEOAnalysis).count()
total_urls = self.db.query(SEOAnalysis.url).distinct().count()
# Get average scores by health status
excellent_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'excellent'
).count()
good_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'good'
).count()
needs_improvement_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'needs_improvement'
).count()
poor_count = self.db.query(SEOAnalysis).filter(
SEOAnalysis.health_status == 'poor'
).count()
# Calculate average overall score
avg_score_result = self.db.query(
func.avg(SEOAnalysis.overall_score)
).scalar()
avg_score = float(avg_score_result) if avg_score_result else 0
return {
'total_analyses': total_analyses,
'total_urls': total_urls,
'average_score': round(avg_score, 2),
'health_distribution': {
'excellent': excellent_count,
'good': good_count,
'needs_improvement': needs_improvement_count,
'poor': poor_count
}
}
except Exception as e:
logger.error(f"Error getting analysis statistics: {str(e)}")
return {
'total_analyses': 0,
'total_urls': 0,
'average_score': 0,
'health_distribution': {
'excellent': 0,
'good': 0,
'needs_improvement': 0,
'poor': 0
}
}

View File

@@ -0,0 +1,135 @@
"""
SEO Analyzer Utilities
Contains utility classes for HTML fetching and AI insight generation.
"""
import requests
from typing import Optional, Dict, List, Any
from loguru import logger
class HTMLFetcher:
"""Utility class for fetching HTML content from URLs"""
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
})
def fetch_html(self, url: str) -> Optional[str]:
"""Fetch HTML content with retries and protocol fallback."""
def _try_fetch(target_url: str, timeout_s: int = 30) -> Optional[str]:
try:
response = self.session.get(
target_url,
timeout=timeout_s,
allow_redirects=True,
)
response.raise_for_status()
return response.text
except Exception as inner_e:
logger.error(f"Error fetching HTML from {target_url}: {inner_e}")
return None
# First attempt
html = _try_fetch(url, timeout_s=30)
if html is not None:
return html
# Retry once (shorter timeout)
html = _try_fetch(url, timeout_s=15)
if html is not None:
return html
# If https fails due to resets, try http fallback once
try:
if url.startswith("https://"):
http_url = "http://" + url[len("https://"):]
logger.info(f"SEO Analyzer: Falling back to HTTP for {http_url}")
html = _try_fetch(http_url, timeout_s=15)
if html is not None:
return html
except Exception:
# Best-effort fallback; errors already logged in _try_fetch
pass
return None
class AIInsightGenerator:
"""Utility class for generating AI-powered insights from analysis data"""
def generate_insights(self, analysis_data: Dict[str, Any], url: str) -> List[Dict[str, Any]]:
"""Generate AI-powered insights based on analysis data"""
insights = []
# Analyze overall performance
total_issues = sum(len(data.get('issues', [])) for data in analysis_data.values() if isinstance(data, dict))
total_warnings = sum(len(data.get('warnings', [])) for data in analysis_data.values() if isinstance(data, dict))
if total_issues > 5:
insights.append({
'type': 'critical',
'message': f'High number of critical issues ({total_issues}) detected',
'priority': 'high',
'action': 'fix_critical_issues',
'description': 'Multiple critical SEO issues need immediate attention to improve search rankings.'
})
# Content quality insights
content_data = analysis_data.get('content_analysis', {})
if content_data.get('word_count', 0) < 300:
insights.append({
'type': 'warning',
'message': 'Content is too thin for good SEO',
'priority': 'medium',
'action': 'expand_content',
'description': 'Add more valuable, relevant content to improve search rankings and user engagement.'
})
# Technical SEO insights
technical_data = analysis_data.get('technical_seo', {})
if not technical_data.get('has_canonical', False):
insights.append({
'type': 'critical',
'message': 'Missing canonical URL can cause duplicate content issues',
'priority': 'high',
'action': 'add_canonical',
'description': 'Canonical URLs help prevent duplicate content penalties.'
})
# Security insights
security_data = analysis_data.get('security_headers', {})
if security_data.get('total_headers', 0) < 3:
insights.append({
'type': 'warning',
'message': 'Insufficient security headers',
'priority': 'medium',
'action': 'improve_security',
'description': 'Security headers protect against common web vulnerabilities.'
})
# Performance insights
performance_data = analysis_data.get('performance', {})
if performance_data.get('load_time', 0) > 3:
insights.append({
'type': 'critical',
'message': 'Page load time is too slow',
'priority': 'high',
'action': 'optimize_performance',
'description': 'Slow loading pages negatively impact user experience and search rankings.'
})
# URL structure insights
url_data = analysis_data.get('url_structure', {})
if not url_data.get('has_https', False):
insights.append({
'type': 'critical',
'message': 'Website is not using HTTPS',
'priority': 'high',
'action': 'enable_https',
'description': 'HTTPS is required for security and is a ranking factor for search engines.'
})
return insights