Content Calendar, Content Gap Analysis, and Content Optimization
249
lib/ai_seo_tools/content_gap_analysis/utils/README.md
Normal file
@@ -0,0 +1,249 @@
# Content Gap Analysis Utils

This directory contains utility modules that power the Content Gap Analysis tool. These modules provide core functionality for data collection, processing, analysis, and storage.

## Directory Structure

```
utils/
├── README.md
├── __init__.py          # Package exports
├── ai_processor.py      # AI-powered content analysis and processing
├── content_parser.py    # Content structure parsing and analysis
├── data_collector.py    # Website data collection and processing
├── seo_analyzer.py      # On-page SEO analysis
└── storage.py           # Analysis results storage and retrieval
```

## Module Descriptions

### 1. AI Processor (`ai_processor.py`)

The AI Processor module applies AI techniques to content analysis. It analyzes website content, competitor data, and keyword research.

#### Key Features:
- Content quality assessment
- Topic analysis and clustering
- Performance metrics analysis
- Strategic recommendations generation
- Progress tracking for analysis tasks

#### Main Components:
- `AIProcessor`: Main class for AI-powered analysis
- `ProgressTracker`: Tracks analysis progress and status

#### Usage Example:
```python
from utils.ai_processor import AIProcessor

processor = AIProcessor()
analysis = processor.analyze_content({
    'url': 'https://example.com',
    'industry': 'technology',
    'content': content_data
})
```

### 2. Content Parser (`content_parser.py`)

The Content Parser module parses website content and analyzes its structure. It reports text statistics, sections, topics, readability, and heading hierarchy.

#### Key Features:
- Content structure analysis
- Text statistics calculation
- Topic extraction
- Readability analysis
- Content hierarchy analysis

#### Main Components:
- `ContentParser`: Main class for content parsing and analysis

#### Usage Example:
```python
from utils.content_parser import ContentParser

parser = ContentParser()
structure = parser.parse_structure({
    'main_content': content,
    'html': html_content,
    'headings': headings_data
})
```
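
The readability analysis in `content_parser.py` is based on the Flesch Reading Ease formula. As a minimal standalone sketch (the function name `flesch_reading_ease` is hypothetical and not part of the module), the score can be computed from raw counts and clamped to the 0 to 100 range, as the parser does:

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch Reading Ease, clamped to [0, 100]. Hypothetical helper for illustration."""
    # Guard against division by zero for empty input
    words_per_sentence = total_words / max(total_sentences, 1)
    syllables_per_word = total_syllables / max(total_words, 1)
    score = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return max(0.0, min(100.0, score))


# 100 words in 5 sentences with 150 syllables
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3))
```

Higher scores indicate text that is easier to read. The syllable counts in the parser come from a vowel-group heuristic, so scores are approximate.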

### 3. Data Collector (`data_collector.py`)

The Data Collector module is responsible for gathering website data for analysis. It handles web scraping and data extraction.

#### Key Features:
- Website content collection
- Metadata extraction (title and meta description)
- Heading structure analysis
- Link and image extraction
- Error handling (failures are returned as a dictionary with an `error` key)

#### Main Components:
- `DataCollector`: Main class for data collection

#### Usage Example:
```python
from utils.data_collector import DataCollector

collector = DataCollector()
data = collector.collect('https://example.com')
```
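
`DataCollector.collect` returns link `href` values as they appear in the page, so relative links are not resolved. A minimal sketch of resolving and grouping them with the standard library follows. `split_links` is a hypothetical helper, not part of the module, and it assumes each link is a dictionary with a `url` key as in the collector output.

```python
from typing import Dict, List
from urllib.parse import urljoin, urlparse


def split_links(links: List[Dict[str, str]], base_url: str) -> Dict[str, List[str]]:
    """Resolve relative hrefs against base_url and group them by host."""
    base_host = urlparse(base_url).netloc
    grouped: Dict[str, List[str]] = {'internal': [], 'external': []}
    for link in links:
        absolute = urljoin(base_url, link['url'])
        parsed = urlparse(absolute)
        # Skip non-HTTP schemes such as mailto: or javascript:
        if parsed.scheme not in ('http', 'https'):
            continue
        key = 'internal' if parsed.netloc == base_host else 'external'
        grouped[key].append(absolute)
    return grouped


links = [
    {'url': '/about', 'text': 'About'},
    {'url': 'https://other.example.org/page', 'text': 'Elsewhere'},
    {'url': 'mailto:team@example.com', 'text': 'Mail'},
]
print(split_links(links, 'https://example.com/blog/'))
```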

### 4. Storage (`storage.py`)

The Storage module manages the persistence and retrieval of analysis results. It wraps a SQLAlchemy session and rolls back the transaction if a save fails.

#### Key Features:
- Analysis results storage
- Historical data management
- Recommendation tracking
- User-specific analysis storage
- Error handling and rollback support

#### Main Components:
- `ContentGapAnalysisStorage`: Main class for storage operations

#### Usage Example:
```python
from utils.storage import ContentGapAnalysisStorage

storage = ContentGapAnalysisStorage(db_session)
analysis_id = storage.save_analysis(
    user_id=1,
    website_url='https://example.com',
    industry='technology',
    results=analysis_results
)
```

## Integration Points

### 1. Website Analysis Integration
```python
from utils.data_collector import DataCollector
from utils.content_parser import ContentParser
from utils.ai_processor import AIProcessor

# Collect data
collector = DataCollector()
data = collector.collect(url)

# Parse content
parser = ContentParser()
structure = parser.parse_structure(data)

# Process with AI
processor = AIProcessor()
analysis = processor.analyze_content({
    'url': url,
    'content': structure
})
```
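
Note that, in the modules as shown in this commit, `DataCollector.collect` returns the page text under `content` and the headings as a dictionary of lists keyed by tag (`h1` to `h6`), while `ContentParser.parse_structure` reads `main_content`, `html`, and a list of headings that each carry a `level`. One way to bridge the two is a small adapter. The sketch below is an assumption about how that could look: `to_parser_input` is a hypothetical name, and the `text` key is invented for illustration. The collector does not return raw HTML, so `html` stays empty here.

```python
from typing import Any, Dict, List


def to_parser_input(collected: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape DataCollector-style output into the keys parse_structure reads."""
    headings: List[Dict[str, str]] = []
    # Flatten {'h1': ['Title'], 'h2': [...]} into [{'level': 'h1', 'text': 'Title'}, ...]
    for level, texts in collected.get('headings', {}).items():
        for text in texts:
            headings.append({'level': level, 'text': text})
    return {
        'main_content': collected.get('content', ''),
        'html': collected.get('html', ''),  # not provided by the collector as shown
        'headings': headings,
        'metadata': {
            'title': collected.get('title', ''),
            'meta_description': collected.get('meta_description', ''),
        },
    }


collected = {
    'url': 'https://example.com',
    'title': 'Example',
    'meta_description': 'An example page',
    'headings': {'h1': ['Welcome'], 'h2': ['Intro', 'Details']},
    'content': 'Welcome to the example page.',
}
parser_input = to_parser_input(collected)
print(parser_input['headings'])
```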

### 2. Storage Integration
```python
from utils.storage import ContentGapAnalysisStorage

# Store analysis results
storage = ContentGapAnalysisStorage(db_session)
analysis_id = storage.save_analysis(
    user_id=user_id,
    website_url=url,
    industry=industry,
    results=analysis_results
)

# Retrieve analysis
results = storage.get_analysis(analysis_id)
```

## Error Handling

The modules catch failures instead of propagating them: the collector and parser return a dictionary with an `error` key, and the storage layer rolls back the transaction and returns `None`. The main error categories are:

1. **Data Collection Errors**
   - Network timeouts
   - Invalid URLs
   - Access restrictions
   - Parsing errors

2. **Processing Errors**
   - Invalid data formats
   - AI processing failures
   - Resource limitations
   - Analysis timeouts

3. **Storage Errors**
   - Database connection issues
   - Transaction failures
   - Data validation errors
   - Concurrent access conflicts
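
Transient failures such as network timeouts are often handled by retrying with a growing delay. The collector shown in this commit does not retry, so the following is only a minimal sketch of how a retry wrapper could look. `with_retries` and its parameters are hypothetical names, and the `sleep` argument is injectable only to keep the example easy to test.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            # Re-raise once the final attempt has failed
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError('attempts must be at least 1')


calls = {'count': 0}


def flaky_fetch() -> str:
    """Fails twice with a simulated timeout, then succeeds."""
    calls['count'] += 1
    if calls['count'] < 3:
        raise TimeoutError('simulated timeout')
    return 'page body'


delays = []
result = with_retries(flaky_fetch, sleep=delays.append)
print(result, delays)
```

With `requests`, `retry_on` could be set to `(requests.Timeout, requests.ConnectionError)` so that only transient network errors trigger a retry.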

## Best Practices

1. **Data Collection**
   - Implement rate limiting
   - Use proper user agents
   - Handle redirects
   - Validate input data

2. **Content Processing**
   - Clean and normalize data
   - Handle encoding issues
   - Implement fallback strategies
   - Cache processed results

3. **Storage Management**
   - Use transactions
   - Implement data validation
   - Handle concurrent access
   - Maintain data integrity
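
As an illustration of the "use transactions" advice, the sketch below uses the standard library's `sqlite3` rather than SQLAlchemy so that it runs without a configured session. The table and column names are invented for the example. It shows the all-or-nothing behavior that `save_analysis` aims for with `flush`, `commit`, and `rollback`: if a child row fails, the parent row is rolled back too.

```python
import sqlite3


def save_report(conn: sqlite3.Connection, url: str, recommendations: list) -> bool:
    """Insert a parent row and its children atomically; return True on success."""
    try:
        # The connection context manager commits on success and rolls back on error
        with conn:
            cursor = conn.execute('INSERT INTO analysis (url) VALUES (?)', (url,))
            analysis_id = cursor.lastrowid
            for text in recommendations:
                conn.execute(
                    'INSERT INTO recommendation (analysis_id, body) VALUES (?, ?)',
                    (analysis_id, text),
                )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE analysis (id INTEGER PRIMARY KEY, url TEXT NOT NULL)')
conn.execute(
    'CREATE TABLE recommendation ('
    'id INTEGER PRIMARY KEY, analysis_id INTEGER NOT NULL, body TEXT NOT NULL)'
)

ok = save_report(conn, 'https://example.com', ['Add an FAQ page'])
failed = save_report(conn, 'https://example.org', ['Valid', None])  # None violates NOT NULL
rows = conn.execute('SELECT COUNT(*) FROM analysis').fetchone()[0]
print(ok, failed, rows)
```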

## Future Enhancements

1. **Performance Optimizations**
   - Implement parallel processing
   - Add caching layer
   - Optimize database queries
   - Enhance error recovery

2. **Feature Additions**
   - Content performance tracking
   - Automated content planning
   - Enhanced competitive intelligence
   - Advanced topic clustering

3. **Integration Improvements**
   - API endpoints
   - Export capabilities
   - Data visualization
   - Progress tracking

4. **UI/UX Enhancements**
   - Interactive visualizations
   - Real-time progress updates
   - Export interfaces
   - Customization options

## Contributing

When contributing to these utility modules:

1. Follow the existing code structure
2. Add comprehensive error handling
3. Include unit tests
4. Update documentation
5. Follow the PEP 8 style guide

## Dependencies

- BeautifulSoup4: HTML parsing
- NLTK: Natural language processing
- SQLAlchemy: Database operations
- Streamlit: UI components
- Requests: HTTP requests

## License

This project is licensed under the MIT License - see the LICENSE file for details.
13
lib/ai_seo_tools/content_gap_analysis/utils/__init__.py
Normal file
@@ -0,0 +1,13 @@
"""
Utility modules for content gap analysis.
"""

from .data_collector import DataCollector
from .content_parser import ContentParser
from .ai_processor import AIProcessor

__all__ = [
    'DataCollector',
    'ContentParser',
    'AIProcessor'
]
1134
lib/ai_seo_tools/content_gap_analysis/utils/ai_processor.py
Normal file
File diff suppressed because it is too large
236
lib/ai_seo_tools/content_gap_analysis/utils/content_parser.py
Normal file
@@ -0,0 +1,236 @@
"""
Content parser utility for analyzing website content structure.
"""

from typing import Dict, Any, List
import re
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import Counter

class ContentParser:
    """Parser for analyzing website content structure."""

    def __init__(self):
        """Initialize the content parser."""
        try:
            nltk.data.find('tokenizers/punkt')
        except LookupError:
            nltk.download('punkt')
        try:
            nltk.data.find('corpora/stopwords')
        except LookupError:
            nltk.download('stopwords')

        self.stop_words = set(stopwords.words('english'))

    def parse_structure(self, content: Dict[str, Any]) -> Dict[str, Any]:
        """
        Parse and analyze the structure of website content.

        Args:
            content: Dictionary containing website content

        Returns:
            Dictionary containing parsed content structure
        """
        try:
            # Parse main content
            main_content = content.get('main_content', '')
            soup = BeautifulSoup(content.get('html', ''), 'html.parser')

            # Extract text statistics
            text_stats = self._analyze_text(main_content)

            # Extract content sections
            sections = self._extract_sections(soup)

            # Extract topics
            topics = self._extract_topics(main_content)

            # Analyze readability
            readability = self._analyze_readability(main_content)

            # Analyze content hierarchy
            hierarchy = self._analyze_hierarchy(content.get('headings', []))

            return {
                'text_statistics': text_stats,
                'sections': sections,
                'topics': topics,
                'readability': readability,
                'hierarchy': hierarchy,
                'metadata': content.get('metadata', {})
            }

        except Exception as e:
            return {
                'error': str(e),
                'text_statistics': {},
                'sections': [],
                'topics': [],
                'readability': {},
                'hierarchy': {},
                'metadata': {}
            }

    def _analyze_text(self, text: str) -> Dict[str, Any]:
        """Analyze text statistics."""
        sentences = sent_tokenize(text)
        words = word_tokenize(text.lower())
        words = [w for w in words if w.isalnum() and w not in self.stop_words]

        return {
            'word_count': len(words),
            'sentence_count': len(sentences),
            'average_sentence_length': len(words) / max(len(sentences), 1),
            'unique_words': len(set(words)),
            'stop_words': len([w for w in word_tokenize(text.lower()) if w in self.stop_words]),
            'characters': len(text),
            'paragraphs': len(text.split('\n\n')),
            'sentences': sentences
        }

    def _extract_sections(self, soup: BeautifulSoup) -> List[Dict[str, Any]]:
        """Extract content sections."""
        sections = []

        # Find main content containers
        containers = soup.find_all(['article', 'section', 'div'], class_=re.compile(r'content|main|article|section'))

        for container in containers:
            # Get section heading
            heading = container.find(['h1', 'h2', 'h3'])
            heading_text = heading.get_text().strip() if heading else 'Untitled Section'

            # Get section content
            content = container.get_text().strip()

            # Get section type
            section_type = container.name
            if container.get('class'):
                section_type = ' '.join(container.get('class'))

            sections.append({
                'heading': heading_text,
                'content': content,
                'type': section_type,
                'word_count': len(word_tokenize(content)),
                'position': self._get_element_position(container)
            })

        return sections

    def _extract_topics(self, text: str) -> List[Dict[str, Any]]:
        """Extract main topics from content."""
        # Tokenize and clean text
        words = word_tokenize(text.lower())
        words = [w for w in words if w.isalnum() and w not in self.stop_words]

        # Get word frequencies
        word_freq = Counter(words)

        # Get top topics
        topics = []
        for word, freq in word_freq.most_common(10):
            topics.append({
                'topic': word,
                'frequency': freq,
                'percentage': freq / len(words) * 100
            })

        return topics

    def _analyze_readability(self, text: str) -> Dict[str, float]:
        """Analyze text readability."""
        sentences = sent_tokenize(text)
        words = word_tokenize(text.lower())
        words = [w for w in words if w.isalnum()]

        # Calculate average sentence length
        avg_sentence_length = len(words) / max(len(sentences), 1)

        # Calculate average word length
        avg_word_length = sum(len(w) for w in words) / max(len(words), 1)

        # Calculate Flesch Reading Ease score
        # Formula: 206.835 - 1.015(total words/total sentences) - 84.6(total syllables/total words)
        syllables = sum(self._count_syllables(w) for w in words)
        flesch_score = 206.835 - 1.015 * avg_sentence_length - 84.6 * (syllables / max(len(words), 1))

        return {
            'flesch_score': max(0, min(100, flesch_score)),
            'avg_sentence_length': avg_sentence_length,
            'avg_word_length': avg_word_length,
            'syllables_per_word': syllables / max(len(words), 1)
        }

    def _analyze_hierarchy(self, headings: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Analyze content hierarchy."""
        # Group headings by level
        heading_levels = {}
        for heading in headings:
            level = heading['level']
            if level not in heading_levels:
                heading_levels[level] = []
            heading_levels[level].append(heading)

        # Calculate hierarchy metrics
        total_headings = len(headings)
        max_depth = max(int(level[1]) for level in heading_levels.keys()) if heading_levels else 0

        return {
            'total_headings': total_headings,
            'max_depth': max_depth,
            'heading_distribution': {level: len(headings) for level, headings in heading_levels.items()},
            'has_proper_hierarchy': self._check_proper_hierarchy(heading_levels)
        }

    def _check_proper_hierarchy(self, heading_levels: Dict[str, List[Dict[str, Any]]]) -> bool:
        """Check if headings follow proper hierarchy."""
        if not heading_levels:
            return False

        # Check if h1 exists
        if 'h1' not in heading_levels:
            return False

        # Check if h1 is unique
        if len(heading_levels['h1']) > 1:
            return False

        # Check if levels are sequential
        levels = sorted(int(level[1]) for level in heading_levels.keys())
        return all(levels[i] - levels[i-1] <= 1 for i in range(1, len(levels)))

    def _count_syllables(self, word: str) -> int:
        """Estimate the syllables in a word with a vowel-group heuristic."""
        word = word.lower()
        if not word:
            return 0
        vowels = 'aeiouy'
        # Count each run of consecutive vowels as one syllable
        count = 1 if word[0] in vowels else 0
        for index in range(1, len(word)):
            if word[index] in vowels and word[index - 1] not in vowels:
                count += 1
        # A trailing 'e' is usually silent
        if word.endswith('e'):
            count -= 1
        # Every non-empty word has at least one syllable
        return max(count, 1)

    def _get_element_position(self, element) -> Dict[str, int]:
        """Get element position (source line and column) in the document."""
        # sourceline/sourcepos can be missing or None depending on the parser
        return {
            'top': getattr(element, 'sourceline', None) or 0,
            'left': getattr(element, 'sourcepos', None) or 0
        }
112
lib/ai_seo_tools/content_gap_analysis/utils/data_collector.py
Normal file
@@ -0,0 +1,112 @@
"""
Data collector utility for content gap analysis.
"""

import requests
from bs4 import BeautifulSoup
from typing import Dict, Any

class DataCollector:
    """
    Collects and processes website data for analysis.
    """

    def __init__(self):
        """Initialize the data collector."""
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }

    def collect(self, url: str) -> Dict[str, Any]:
        """
        Collect website data for analysis.

        Args:
            url (str): The URL to collect data from

        Returns:
            dict: Collected website data
        """
        try:
            # Fetch webpage content; the timeout keeps an unresponsive host from blocking forever
            response = requests.get(url, headers=self.headers, timeout=10)
            response.raise_for_status()

            # Parse HTML content
            soup = BeautifulSoup(response.text, 'html.parser')

            # Extract relevant data
            data = {
                'url': url,
                'title': self._extract_title(soup),
                'meta_description': self._extract_meta_description(soup),
                'headings': self._extract_headings(soup),
                'content': self._extract_content(soup),
                'links': self._extract_links(soup),
                'images': self._extract_images(soup)
            }

            return data

        except Exception as e:
            return {
                'error': str(e),
                'url': url
            }

    def _extract_title(self, soup: BeautifulSoup) -> str:
        """Extract page title."""
        title = soup.find('title')
        return title.text if title else ''

    def _extract_meta_description(self, soup: BeautifulSoup) -> str:
        """Extract meta description."""
        meta = soup.find('meta', attrs={'name': 'description'})
        return meta.get('content', '') if meta else ''

    def _extract_headings(self, soup: BeautifulSoup) -> Dict[str, list]:
        """Extract all headings."""
        headings = {}
        for i in range(1, 7):
            tags = soup.find_all(f'h{i}')
            headings[f'h{i}'] = [tag.text.strip() for tag in tags]
        return headings

    def _extract_content(self, soup: BeautifulSoup) -> str:
        """Extract main content."""
        # Remove script and style elements
        for script in soup(['script', 'style']):
            script.decompose()

        # Get text content
        text = soup.get_text()

        # Clean up text
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        text = ' '.join(chunk for chunk in chunks if chunk)

        return text

    def _extract_links(self, soup: BeautifulSoup) -> list:
        """Extract all links."""
        links = []
        for link in soup.find_all('a'):
            href = link.get('href')
            if href:
                links.append({
                    'url': href,
                    'text': link.text.strip()
                })
        return links

    def _extract_images(self, soup: BeautifulSoup) -> list:
        """Extract all images."""
        images = []
        for img in soup.find_all('img'):
            images.append({
                'src': img.get('src', ''),
                'alt': img.get('alt', ''),
                'title': img.get('title', '')
            })
        return images
237
lib/ai_seo_tools/content_gap_analysis/utils/seo_analyzer.py
Normal file
@@ -0,0 +1,237 @@
"""
SEO analyzer utility for content gap analysis.
"""

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin
import re
from typing import Dict, Any, List, Optional
from ....utils.website_analyzer.analyzer import WebsiteAnalyzer

def analyze_onpage_seo(url: str) -> Dict[str, Any]:
    """
    Analyze on-page SEO elements of a website.

    Args:
        url: The URL to analyze

    Returns:
        Dictionary containing SEO analysis results
    """
    try:
        # Use the combined website analyzer
        analyzer = WebsiteAnalyzer()
        analysis = analyzer.analyze_website(url)

        if not analysis.get('success', False):
            return {
                'error': analysis.get('error', 'Unknown error in SEO analysis'),
                'meta_title': '',
                'meta_description': '',
                'has_robots_txt': False,
                'has_sitemap': False,
                'mobile_friendly': False,
                'load_time': 0
            }

        # Extract relevant information from the analysis
        seo_info = analysis['data']['analysis']['seo_info']
        basic_info = analysis['data']['analysis']['basic_info']
        performance = analysis['data']['analysis']['performance']

        return {
            'meta_tags': seo_info.get('meta_tags', {}),
            'content': seo_info.get('content', {}),
            'meta_title': basic_info.get('title', ''),
            'meta_description': basic_info.get('meta_description', ''),
            'has_robots_txt': bool(basic_info.get('robots_txt')),
            'has_sitemap': bool(basic_info.get('sitemap')),
            'mobile_friendly': True,  # This would need to be implemented separately
            'load_time': performance.get('load_time', 0)
        }
    except Exception as e:
        return {
            'error': str(e),
            'meta_title': '',
            'meta_description': '',
            'has_robots_txt': False,
            'has_sitemap': False,
            'mobile_friendly': False,
            'load_time': 0
        }

def _analyze_meta_tags(soup: BeautifulSoup) -> Dict[str, Any]:
    """Analyze meta tags of the webpage."""
    meta_tags = {}

    # Title tag
    title_tag = soup.find('title')
    if title_tag:
        # get_text() also works when .string is None (empty or nested <title>)
        meta_tags['title'] = title_tag.get_text().strip()

    # Meta description
    meta_desc = soup.find('meta', {'name': 'description'})
    if meta_desc:
        meta_tags['description'] = meta_desc.get('content', '').strip()

    # Meta keywords
    meta_keywords = soup.find('meta', {'name': 'keywords'})
    if meta_keywords:
        meta_tags['keywords'] = meta_keywords.get('content', '').strip()

    # Open Graph tags
    og_tags = {}
    for tag in soup.find_all('meta', property=re.compile(r'^og:')):
        og_tags[tag['property']] = tag.get('content', '')
    meta_tags['og_tags'] = og_tags

    # Twitter Card tags
    twitter_tags = {}
    # The 'name' attribute must be passed via attrs: as a keyword argument it
    # collides with find_all's own tag-name parameter and raises a TypeError
    for tag in soup.find_all('meta', attrs={'name': re.compile(r'^twitter:')}):
        twitter_tags[tag['name']] = tag.get('content', '')
    meta_tags['twitter_tags'] = twitter_tags

    return meta_tags

def _analyze_headings(soup: BeautifulSoup) -> Dict[str, Any]:
    """Analyze heading structure of the webpage."""
    headings = {
        'h1': [],
        'h2': [],
        'h3': [],
        'h4': [],
        'h5': [],
        'h6': []
    }

    for tag in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
        for heading in soup.find_all(tag):
            headings[tag].append(heading.get_text().strip())

    return headings

def _analyze_content(soup: BeautifulSoup) -> Dict[str, Any]:
    """Analyze main content of the webpage."""
    # Find main content
    main_content = soup.find('main') or soup.find('article') or soup.find('div', class_=re.compile(r'content|main|article'))

    if not main_content:
        return {
            'word_count': 0,
            'paragraph_count': 0,
            'content': ''
        }

    # Get text content
    content = main_content.get_text()

    # Count words and paragraphs
    words = content.split()
    paragraphs = main_content.find_all('p')

    return {
        'word_count': len(words),
        'paragraph_count': len(paragraphs),
        'content': content
    }

def _analyze_links(soup: BeautifulSoup, base_url: str) -> Dict[str, Any]:
    """Analyze links on the webpage."""
    links = {
        'internal': [],
        'external': [],
        'broken': []  # Never populated here: links are categorized but not fetched
    }

    base_domain = urlparse(base_url).netloc

    for link in soup.find_all('a', href=True):
        href = link['href']

        # Handle relative URLs
        if not href.startswith(('http://', 'https://')):
            href = urljoin(base_url, href)

        # Categorize link
        if urlparse(href).netloc == base_domain:
            links['internal'].append({
                'url': href,
                'text': link.get_text().strip(),
                'title': link.get('title', '')
            })
        else:
            links['external'].append({
                'url': href,
                'text': link.get_text().strip(),
                'title': link.get('title', '')
            })

    return links

def _analyze_images(soup: BeautifulSoup) -> Dict[str, Any]:
    """Analyze images on the webpage."""
    images = []

    for img in soup.find_all('img'):
        image_data = {
            'src': img.get('src', ''),
            'alt': img.get('alt', ''),
            'title': img.get('title', ''),
            'width': img.get('width', ''),
            'height': img.get('height', ''),
            'has_alt': bool(img.get('alt')),
            'has_title': bool(img.get('title')),
            'has_dimensions': bool(img.get('width') and img.get('height'))
        }
        images.append(image_data)

    return {
        'total': len(images),
        'with_alt': sum(1 for img in images if img['has_alt']),
        'with_title': sum(1 for img in images if img['has_title']),
        'with_dimensions': sum(1 for img in images if img['has_dimensions']),
        'images': images
    }

def _check_technical_elements(soup: BeautifulSoup, url: str) -> Dict[str, Any]:
    """Check technical SEO elements."""
    base_url = urlparse(url)
    domain = base_url.netloc

    # Check robots.txt
    robots_url = f"{base_url.scheme}://{domain}/robots.txt"
    try:
        robots_response = requests.get(robots_url, timeout=5)
        has_robots_txt = robots_response.status_code == 200
    except requests.RequestException:
        has_robots_txt = False

    # Check sitemap
    sitemap_url = f"{base_url.scheme}://{domain}/sitemap.xml"
    try:
        sitemap_response = requests.get(sitemap_url, timeout=5)
        has_sitemap = sitemap_response.status_code == 200
    except requests.RequestException:
        has_sitemap = False

    # Check mobile friendliness (presence of a viewport meta tag only)
    viewport = soup.find('meta', {'name': 'viewport'})
    has_viewport = bool(viewport)

    # Check canonical URL
    canonical = soup.find('link', {'rel': 'canonical'})
    has_canonical = bool(canonical)

    # Check language; guard against documents without an <html> element
    html_tag = soup.find('html')
    html_lang = html_tag.get('lang', '') if html_tag is not None else ''
    has_language = bool(html_lang)

    return {
        'has_robots_txt': has_robots_txt,
        'has_sitemap': has_sitemap,
        'mobile_friendly': has_viewport,
        'has_canonical': has_canonical,
        'has_language': has_language,
        'language': html_lang
    }
270
lib/ai_seo_tools/content_gap_analysis/utils/storage.py
Normal file
@@ -0,0 +1,270 @@
"""
Storage module for content gap analysis results.
"""

from typing import Dict, Any, List, Optional
from datetime import datetime
from sqlalchemy.orm import Session
from sqlalchemy.exc import SQLAlchemyError
import streamlit as st

# TODO: import the ORM models used below (ContentGapAnalysis, WebsiteAnalysis,
# CompetitorAnalysis, KeywordAnalysis, ContentRecommendation, AnalysisHistory);
# this module references them without importing them.

class ContentGapAnalysisStorage:
    """Handles storage and retrieval of content gap analysis results."""

    def __init__(self, db_session: Session):
        """Initialize the storage handler."""
        self.db = db_session

    def save_analysis(self, user_id: int, website_url: str, industry: str, results: Dict[str, Any]) -> Optional[int]:
        """
        Save content gap analysis results.

        Args:
            user_id: User ID
            website_url: Target website URL
            industry: Industry category
            results: Analysis results dictionary

        Returns:
            Analysis ID if successful, None otherwise
        """
        try:
            # Create main analysis record
            analysis = ContentGapAnalysis(
                user_id=user_id,
                website_url=website_url,
                industry=industry,
                status='completed',
                metadata={'version': '1.0'}
            )
            self.db.add(analysis)
            self.db.flush()  # Get the ID without committing

            # Save website analysis
            website_analysis = WebsiteAnalysis(
                content_gap_analysis_id=analysis.id,
                content_score=results.get('website', {}).get('content_score', 0),
                seo_score=results.get('website', {}).get('seo_score', 0),
                structure_score=results.get('website', {}).get('structure_score', 0),
                content_metrics=results.get('website', {}).get('content_metrics', {}),
                seo_metrics=results.get('website', {}).get('seo_metrics', {}),
                technical_metrics=results.get('website', {}).get('technical_metrics', {}),
                ai_insights=results.get('website', {}).get('ai_insights', {})
            )
            self.db.add(website_analysis)

            # Save competitor analysis if available
            if 'competitors' in results:
                for competitor in results['competitors']:
                    competitor_analysis = CompetitorAnalysis(
                        content_gap_analysis_id=analysis.id,
                        competitor_url=competitor.get('url'),
                        market_position=competitor.get('market_position', {}),
                        content_gaps=competitor.get('content_gaps', []),
                        competitive_advantages=competitor.get('competitive_advantages', []),
                        trend_analysis=competitor.get('trend_analysis', {})
                    )
                    self.db.add(competitor_analysis)

            # Save keyword analysis
            keyword_analysis = KeywordAnalysis(
                content_gap_analysis_id=analysis.id,
                top_keywords=results.get('keywords', {}).get('top_keywords', []),
                search_intent=results.get('keywords', {}).get('search_intent', {}),
                opportunities=results.get('keywords', {}).get('opportunities', []),
                trend_analysis=results.get('keywords', {}).get('trend_analysis', {})
            )
            self.db.add(keyword_analysis)

            # Save recommendations
            for recommendation in results.get('recommendations', []):
                content_recommendation = ContentRecommendation(
                    content_gap_analysis_id=analysis.id,
                    recommendation_type=recommendation.get('type'),
                    priority_score=recommendation.get('priority_score', 0),
                    recommendation=recommendation.get('recommendation', ''),
                    implementation_steps=recommendation.get('implementation_steps', []),
                    expected_impact=recommendation.get('expected_impact', {}),
                    status='pending'
                )
                self.db.add(content_recommendation)

            # Save analysis history
            history = AnalysisHistory(
                content_gap_analysis_id=analysis.id,
                status='completed',
                metrics={'duration': results.get('duration', 0)}
            )
            self.db.add(history)

            # Commit all changes
            self.db.commit()
            return analysis.id

        except SQLAlchemyError as e:
            self.db.rollback()
            st.error(f"Error saving analysis results: {str(e)}")
            return None
|
||||
|
||||
    def get_analysis(self, analysis_id: int) -> Optional[Dict[str, Any]]:
        """
        Retrieve content gap analysis results.

        Args:
            analysis_id: Analysis ID

        Returns:
            Dictionary containing analysis results if found, None otherwise
        """
        try:
            analysis = self.db.query(ContentGapAnalysis).get(analysis_id)
            if not analysis:
                return None

            # Get website analysis
            website_analysis = self.db.query(WebsiteAnalysis).filter_by(
                content_gap_analysis_id=analysis_id
            ).first()

            # Get competitor analysis
            competitor_analyses = self.db.query(CompetitorAnalysis).filter_by(
                content_gap_analysis_id=analysis_id
            ).all()

            # Get keyword analysis
            keyword_analysis = self.db.query(KeywordAnalysis).filter_by(
                content_gap_analysis_id=analysis_id
            ).first()

            # Get recommendations
            recommendations = self.db.query(ContentRecommendation).filter_by(
                content_gap_analysis_id=analysis_id
            ).all()

            # Get analysis history
            history = self.db.query(AnalysisHistory).filter_by(
                content_gap_analysis_id=analysis_id
            ).order_by(AnalysisHistory.run_date.desc()).all()

            return {
                'id': analysis.id,
                'website_url': analysis.website_url,
                'industry': analysis.industry,
                'analysis_date': analysis.analysis_date,
                'status': analysis.status,
                'website': {
                    'content_score': website_analysis.content_score,
                    'seo_score': website_analysis.seo_score,
                    'structure_score': website_analysis.structure_score,
                    'content_metrics': website_analysis.content_metrics,
                    'seo_metrics': website_analysis.seo_metrics,
                    'technical_metrics': website_analysis.technical_metrics,
                    'ai_insights': website_analysis.ai_insights
                } if website_analysis else {},
                'competitors': [{
                    'url': ca.competitor_url,
                    'market_position': ca.market_position,
                    'content_gaps': ca.content_gaps,
                    'competitive_advantages': ca.competitive_advantages,
                    'trend_analysis': ca.trend_analysis
                } for ca in competitor_analyses],
                'keywords': {
                    'top_keywords': keyword_analysis.top_keywords,
                    'search_intent': keyword_analysis.search_intent,
                    'opportunities': keyword_analysis.opportunities,
                    'trend_analysis': keyword_analysis.trend_analysis
                } if keyword_analysis else {},
                'recommendations': [{
                    'type': r.recommendation_type,
                    'priority_score': r.priority_score,
                    'recommendation': r.recommendation,
                    'implementation_steps': r.implementation_steps,
                    'expected_impact': r.expected_impact,
                    'status': r.status
                } for r in recommendations],
                'history': [{
                    'run_date': h.run_date,
                    'status': h.status,
                    'metrics': h.metrics,
                    'error_log': h.error_log
                } for h in history]
            }

        except SQLAlchemyError as e:
            st.error(f"Error retrieving analysis results: {str(e)}")
            return None

    def get_user_analyses(self, user_id: int) -> List[Dict[str, Any]]:
        """
        Get all analyses for a user.

        Args:
            user_id: User ID

        Returns:
            List of analysis summaries
        """
        try:
            analyses = self.db.query(ContentGapAnalysis).filter_by(
                user_id=user_id
            ).order_by(ContentGapAnalysis.analysis_date.desc()).all()

            return [{
                'id': analysis.id,
                'website_url': analysis.website_url,
                'industry': analysis.industry,
                'analysis_date': analysis.analysis_date,
                'status': analysis.status
            } for analysis in analyses]

        except SQLAlchemyError as e:
            st.error(f"Error retrieving user analyses: {str(e)}")
            return []

    def update_recommendation_status(self, recommendation_id: int, status: str) -> bool:
        """
        Update the status of a recommendation.

        Args:
            recommendation_id: Recommendation ID
            status: New status

        Returns:
            True if successful, False otherwise
        """
        try:
            recommendation = self.db.query(ContentRecommendation).get(recommendation_id)
            if recommendation:
                recommendation.status = status
                recommendation.updated_at = datetime.utcnow()
                self.db.commit()
                return True
            return False

        except SQLAlchemyError as e:
            self.db.rollback()
            st.error(f"Error updating recommendation status: {str(e)}")
            return False

    def delete_analysis(self, analysis_id: int) -> bool:
        """
        Delete an analysis and all related data.

        Args:
            analysis_id: Analysis ID

        Returns:
            True if successful, False otherwise
        """
        try:
            analysis = self.db.query(ContentGapAnalysis).get(analysis_id)
            if analysis:
                self.db.delete(analysis)
                self.db.commit()
                return True
            return False

        except SQLAlchemyError as e:
            self.db.rollback()
            st.error(f"Error deleting analysis: {str(e)}")
            return False
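
Note on `delete_analysis`: the docstring promises that related data is deleted too, but `self.db.delete(analysis)` only removes child rows when the model relationships (not shown in this file) are configured to cascade. The following is a minimal sketch of that behavior, assuming SQLAlchemy 1.4 or later. The `Analysis` and `Recommendation` models and the `remaining_children_after_delete` helper are hypothetical stand-ins for illustration, not the project's actual models.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Analysis(Base):  # hypothetical stand-in for ContentGapAnalysis
    __tablename__ = "analysis"
    id = Column(Integer, primary_key=True)
    # ORM-level cascade: deleting the parent also deletes its children.
    recommendations = relationship("Recommendation", cascade="all, delete-orphan")


class Recommendation(Base):  # hypothetical stand-in for ContentRecommendation
    __tablename__ = "recommendation"
    id = Column(Integer, primary_key=True)
    analysis_id = Column(Integer, ForeignKey("analysis.id"), nullable=False)


def remaining_children_after_delete() -> int:
    """Create a parent with two children, delete the parent, count what is left."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    parent = Analysis(recommendations=[Recommendation(), Recommendation()])
    session.add(parent)
    session.commit()

    # Same call shape as delete_analysis: delete the parent, then commit.
    session.delete(parent)
    session.commit()

    count = session.query(Recommendation).count()
    session.close()
    return count
```

With the cascade option in place, the child rows are deleted along with the parent. Without it, SQLAlchemy's default is to try setting the children's foreign key to NULL, which fails here because the column is non-nullable, so it is worth checking how the real models declare these relationships.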