Content Calendar, Content Gap Analysis, and Content Optimization

2025-05-27 09:15:08 +05:30
parent 4049d19787
commit 889021c078
100 changed files with 18504 additions and 1251 deletions
--- a/lib/ai_seo_tools/content_gap_analysis/utils/README.md
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/README.md
@@ -0,0 +1,249 @@
+# Content Gap Analysis Utils
+
+This directory contains utility modules that power the Content Gap Analysis tool. These modules provide core functionality for data collection, processing, analysis, and storage.
+
+## Directory Structure
+
+```
+utils/
+├── README.md
+├── ai_processor.py      # AI-powered content analysis and processing
+├── content_parser.py    # Content structure parsing and analysis
+├── data_collector.py    # Website data collection and processing
+└── storage.py           # Analysis results storage and retrieval
+```
+
+## Module Descriptions
+
+### 1. AI Processor (`ai_processor.py`)
+
+The AI Processor module enhances content analysis using AI techniques. It provides intelligent analysis of website content, competitor data, and keyword research.
+
+#### Key Features:
+- Content quality assessment
+- Topic analysis and clustering
+- Performance metrics analysis
+- Strategic recommendations generation
+- Progress tracking for analysis tasks
+
+#### Main Components:
+- `AIProcessor`: Main class for AI-powered analysis
+- `ProgressTracker`: Tracks analysis progress and status
+
+#### Usage Example:
+```python
+from utils.ai_processor import AIProcessor
+
+processor = AIProcessor()
+analysis = processor.analyze_content({
+    'url': 'https://example.com',
+    'industry': 'technology',
+    'content': content_data
+})
+```
+
+### 2. Content Parser (`content_parser.py`)
+
+The Content Parser module handles the parsing and analysis of website content structure. It provides detailed insights into content organization and quality.
+
+#### Key Features:
+- Content structure analysis
+- Text statistics calculation
+- Topic extraction
+- Readability analysis
+- Content hierarchy analysis
+
+#### Main Components:
+- `ContentParser`: Main class for content parsing and analysis
+
+#### Usage Example:
+```python
+from utils.content_parser import ContentParser
+
+parser = ContentParser()
+structure = parser.parse_structure({
+    'main_content': content,
+    'html': html_content,
+    'headings': headings_data
+})
+```
+
+### 3. Data Collector (`data_collector.py`)
+
+The Data Collector module is responsible for gathering website data for analysis. It handles web scraping and data extraction.
+
+#### Key Features:
+- Website content collection
+- Metadata extraction
+- Heading structure analysis
+- Link and image extraction
+- Error handling (failures are returned as an `error` entry in the result)
+
+#### Main Components:
+- `DataCollector`: Main class for data collection
+
+#### Usage Example:
+```python
+from utils.data_collector import DataCollector
+
+collector = DataCollector()
+data = collector.collect('https://example.com')
+```
+
+### 4. Storage (`storage.py`)
+
+The Storage module manages the persistence and retrieval of analysis results. It wraps a SQLAlchemy session to store and read back analysis data.
+
+#### Key Features:
+- Analysis results storage
+- Historical data management
+- Recommendation tracking
+- User-specific analysis storage
+- Error handling and rollback support
+
+#### Main Components:
+- `ContentGapAnalysisStorage`: Main class for storage operations
+
+#### Usage Example:
+```python
+from utils.storage import ContentGapAnalysisStorage
+
+storage = ContentGapAnalysisStorage(db_session)
+analysis_id = storage.save_analysis(
+    user_id=1,
+    website_url='https://example.com',
+    industry='technology',
+    results=analysis_results
+)
+```
+
+## Integration Points
+
+### 1. Website Analysis Integration
+```python
+from utils.data_collector import DataCollector
+from utils.content_parser import ContentParser
+from utils.ai_processor import AIProcessor
+
+# Collect data
+collector = DataCollector()
+data = collector.collect(url)
+
+# Parse content
+parser = ContentParser()
+structure = parser.parse_structure(data)
+
+# Process with AI
+processor = AIProcessor()
+analysis = processor.analyze_content({
+    'url': url,
+    'content': structure
+})
+```
+
+### 2. Storage Integration
+```python
+from utils.storage import ContentGapAnalysisStorage
+
+# Store analysis results
+storage = ContentGapAnalysisStorage(db_session)
+analysis_id = storage.save_analysis(
+    user_id=user_id,
+    website_url=url,
+    industry=industry,
+    results=analysis_results
+)
+
+# Retrieve analysis
+results = storage.get_analysis(analysis_id)
+```
+
+## Error Handling
+
+All modules implement comprehensive error handling:
+
+1. **Data Collection Errors**
+   - Network timeouts
+   - Invalid URLs
+   - Access restrictions
+   - Parsing errors
+
+2. **Processing Errors**
+   - Invalid data formats
+   - AI processing failures
+   - Resource limitations
+   - Analysis timeouts
+
+3. **Storage Errors**
+   - Database connection issues
+   - Transaction failures
+   - Data validation errors
+   - Concurrent access conflicts
+
+## Best Practices
+
+1. **Data Collection**
+   - Implement rate limiting
+   - Use proper user agents
+   - Handle redirects
+   - Validate input data
+
+2. **Content Processing**
+   - Clean and normalize data
+   - Handle encoding issues
+   - Implement fallback strategies
+   - Cache processed results
+
+3. **Storage Management**
+   - Use transactions
+   - Implement data validation
+   - Handle concurrent access
+   - Maintain data integrity
+
+## Future Enhancements
+
+1. **Performance Optimizations**
+   - Implement parallel processing
+   - Add caching layer
+   - Optimize database queries
+   - Enhance error recovery
+
+2. **Feature Additions**
+   - Content performance tracking
+   - Automated content planning
+   - Enhanced competitive intelligence
+   - Advanced topic clustering
+
+3. **Integration Improvements**
+   - API endpoints
+   - Export capabilities
+   - Data visualization
+   - Progress tracking
+
+4. **UI/UX Enhancements**
+   - Interactive visualizations
+   - Real-time progress updates
+   - Export interfaces
+   - Customization options
+
+## Contributing
+
+When contributing to these utility modules:
+
+1. Follow the existing code structure
+2. Add comprehensive error handling
+3. Include unit tests
+4. Update documentation
+5. Follow PEP 8 style guide
+
+## Dependencies
+
+- BeautifulSoup4: HTML parsing
+- NLTK: Natural language processing
+- SQLAlchemy: Database operations
+- Streamlit: UI components
+- Requests: HTTP requests
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details. 
--- a/lib/ai_seo_tools/content_gap_analysis/utils/__init__.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/__init__.py
@@ -0,0 +1,13 @@
+"""
+Utility modules for content gap analysis.
+"""
+
+from .data_collector import DataCollector
+from .content_parser import ContentParser
+from .ai_processor import AIProcessor
+
+__all__ = [
+    'DataCollector',
+    'ContentParser',
+    'AIProcessor'
+] 
--- a/lib/ai_seo_tools/content_gap_analysis/utils/ai_processor.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/ai_processor.py
--- a/lib/ai_seo_tools/content_gap_analysis/utils/content_parser.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/content_parser.py
@@ -0,0 +1,236 @@
+"""
+Content parser utility for analyzing website content structure.
+"""
+
+from typing import Dict, Any, List
+import re
+from bs4 import BeautifulSoup
+import nltk
+from nltk.tokenize import sent_tokenize, word_tokenize
+from nltk.corpus import stopwords
+from collections import Counter
+
+class ContentParser:
+    """Parser for analyzing website content structure."""
+    
+    def __init__(self):
+        """Initialize the content parser."""
+        try:
+            nltk.data.find('tokenizers/punkt')
+        except LookupError:
+            nltk.download('punkt')
+        try:
+            nltk.data.find('corpora/stopwords')
+        except LookupError:
+            nltk.download('stopwords')
+        
+        self.stop_words = set(stopwords.words('english'))
+    
+    def parse_structure(self, content: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        Parse and analyze the structure of website content.
+        
+        Args:
+            content: Dictionary containing website content
+            
+        Returns:
+            Dictionary containing parsed content structure
+        """
+        try:
+            # Parse main content
+            main_content = content.get('main_content', '')
+            soup = BeautifulSoup(content.get('html', ''), 'html.parser')
+            
+            # Extract text statistics
+            text_stats = self._analyze_text(main_content)
+            
+            # Extract content sections
+            sections = self._extract_sections(soup)
+            
+            # Extract topics
+            topics = self._extract_topics(main_content)
+            
+            # Analyze readability
+            readability = self._analyze_readability(main_content)
+            
+            # Analyze content hierarchy
+            hierarchy = self._analyze_hierarchy(content.get('headings', []))
+            
+            return {
+                'text_statistics': text_stats,
+                'sections': sections,
+                'topics': topics,
+                'readability': readability,
+                'hierarchy': hierarchy,
+                'metadata': content.get('metadata', {})
+            }
+            
+        except Exception as e:
+            return {
+                'error': str(e),
+                'text_statistics': {},
+                'sections': [],
+                'topics': [],
+                'readability': {},
+                'hierarchy': {},
+                'metadata': {}
+            }
+    
+    def _analyze_text(self, text: str) -> Dict[str, Any]:
+        """Analyze text statistics."""
+        sentences = sent_tokenize(text)
+        # Keep all alphanumeric tokens so totals are not skewed by stop-word removal
+        tokens = [w for w in word_tokenize(text.lower()) if w.isalnum()]
+        content_words = [w for w in tokens if w not in self.stop_words]
+        return {
+            'word_count': len(tokens),
+            'sentence_count': len(sentences),
+            'average_sentence_length': len(tokens) / max(len(sentences), 1),
+            'unique_words': len(set(content_words)),
+            'stop_words': len(tokens) - len(content_words),
+            'characters': len(text),
+            'paragraphs': len([p for p in text.split('\n\n') if p.strip()]),
+            'sentences': sentences
+        }
+    
+    def _extract_sections(self, soup: BeautifulSoup) -> List[Dict[str, Any]]:
+        """Extract content sections."""
+        sections = []
+        
+        # Find main content containers
+        containers = soup.find_all(['article', 'section', 'div'], class_=re.compile(r'content|main|article|section'))
+        
+        for container in containers:
+            # Get section heading
+            heading = container.find(['h1', 'h2', 'h3'])
+            heading_text = heading.get_text().strip() if heading else 'Untitled Section'
+            
+            # Get section content
+            content = container.get_text().strip()
+            
+            # Get section type
+            section_type = container.name
+            if container.get('class'):
+                section_type = ' '.join(container.get('class'))
+            
+            sections.append({
+                'heading': heading_text,
+                'content': content,
+                'type': section_type,
+                'word_count': len(word_tokenize(content)),
+                'position': self._get_element_position(container)
+            })
+        
+        return sections
+    
+    def _extract_topics(self, text: str) -> List[Dict[str, Any]]:
+        """Extract main topics from content."""
+        # Tokenize and clean text
+        words = word_tokenize(text.lower())
+        words = [w for w in words if w.isalnum() and w not in self.stop_words]
+        
+        # Get word frequencies
+        word_freq = Counter(words)
+        
+        # Get top topics
+        topics = []
+        for word, freq in word_freq.most_common(10):
+            topics.append({
+                'topic': word,
+                'frequency': freq,
+                'percentage': freq / len(words) * 100
+            })
+        
+        return topics
+    
+    def _analyze_readability(self, text: str) -> Dict[str, float]:
+        """Analyze text readability."""
+        sentences = sent_tokenize(text)
+        words = word_tokenize(text.lower())
+        words = [w for w in words if w.isalnum()]
+        
+        # Calculate average sentence length
+        avg_sentence_length = len(words) / max(len(sentences), 1)
+        
+        # Calculate average word length
+        avg_word_length = sum(len(w) for w in words) / max(len(words), 1)
+        
+        # Calculate Flesch Reading Ease score
+        # Formula: 206.835 - 1.015(total words/total sentences) - 84.6(total syllables/total words)
+        syllables = sum(self._count_syllables(w) for w in words)
+        flesch_score = 206.835 - 1.015 * avg_sentence_length - 84.6 * (syllables / max(len(words), 1))
+        
+        return {
+            'flesch_score': max(0, min(100, flesch_score)),
+            'avg_sentence_length': avg_sentence_length,
+            'avg_word_length': avg_word_length,
+            'syllables_per_word': syllables / max(len(words), 1)
+        }
+    
+    def _analyze_hierarchy(self, headings: List[Dict[str, Any]]) -> Dict[str, Any]:
+        """Analyze content hierarchy."""
+        # Group headings by level
+        heading_levels = {}
+        for heading in headings:
+            level = heading['level']
+            if level not in heading_levels:
+                heading_levels[level] = []
+            heading_levels[level].append(heading)
+        
+        # Calculate hierarchy metrics
+        total_headings = len(headings)
+        max_depth = max(int(level[1]) for level in heading_levels.keys()) if heading_levels else 0
+        
+        return {
+            'total_headings': total_headings,
+            'max_depth': max_depth,
+            'heading_distribution': {level: len(items) for level, items in heading_levels.items()},
+            'has_proper_hierarchy': self._check_proper_hierarchy(heading_levels)
+        }
+    
+    def _check_proper_hierarchy(self, heading_levels: Dict[str, List[Dict[str, Any]]]) -> bool:
+        """Check if headings follow proper hierarchy."""
+        if not heading_levels:
+            return False
+        
+        # Check if h1 exists
+        if 'h1' not in heading_levels:
+            return False
+        
+        # Check if h1 is unique
+        if len(heading_levels['h1']) > 1:
+            return False
+        
+        # Check if levels are sequential
+        levels = sorted(int(level[1]) for level in heading_levels.keys())
+        return all(levels[i] - levels[i-1] <= 1 for i in range(1, len(levels)))
+    
+    def _count_syllables(self, word: str) -> int:
+        """Count syllables in a word."""
+        word = word.lower()
+        count = 0
+        vowels = 'aeiouy'
+        # Check for an empty string before indexing the first character
+        if word and word[0] in vowels:
+            count += 1
+        for index in range(1, len(word)):
+            if word[index] in vowels and word[index - 1] not in vowels:
+                count += 1
+        if word.endswith('e'):
+            count -= 1
+        if count == 0:
+            count += 1
+        return count
+    
+    def _get_element_position(self, element) -> Dict[str, int]:
+        """Get element position in the document."""
+        try:
+            return {
+                'top': element.sourceline,
+                'left': element.sourcepos
+            }
+        except Exception:
+            return {
+                'top': 0,
+                'left': 0
+            } 
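The readability step in `content_parser.py` above applies the Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), with a vowel-group heuristic for syllables. The following is a minimal standalone sketch of that calculation, not the committed code: the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the syllable counter is only an approximation.

```python
from typing import Sequence


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not a dictionary lookup)."""
    word = word.lower()
    vowels = "aeiouy"
    count = 0
    previous_was_vowel = False
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not previous_was_vowel:
            count += 1  # a new vowel group starts here
        previous_was_vowel = is_vowel
    if word.endswith("e"):
        count -= 1  # treat a trailing "e" as silent
    return max(count, 1)  # every word counts as at least one syllable


def flesch_reading_ease(words: Sequence[str], sentence_count: int) -> float:
    """Flesch Reading Ease, clamped to the 0-100 range; returns 0.0 for empty input."""
    if not words or sentence_count <= 0:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    score = (
        206.835
        - 1.015 * (len(words) / sentence_count)
        - 84.6 * (syllables / len(words))
    )
    return max(0.0, min(100.0, score))


if __name__ == "__main__":
    sample = ["cat", "reading"] * 5  # 10 words, 15 heuristic syllables
    print(round(flesch_reading_ease(sample, sentence_count=1), 3))
```

Returning early on empty input keeps the score well defined when a page yields no text, instead of relying on `max(..., 1)` guards in each division.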
--- a/lib/ai_seo_tools/content_gap_analysis/utils/data_collector.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/data_collector.py
@@ -0,0 +1,112 @@
+"""
+Data collector utility for content gap analysis.
+"""
+
+import requests
+from bs4 import BeautifulSoup
+from typing import Dict, Any
+
+class DataCollector:
+    """
+    Collects and processes website data for analysis.
+    """
+    
+    def __init__(self):
+        """Initialize the data collector."""
+        self.headers = {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+        }
+    
+    def collect(self, url: str) -> Dict[str, Any]:
+        """
+        Collect website data for analysis.
+        
+        Args:
+            url (str): The URL to collect data from
+            
+        Returns:
+            dict: Collected website data
+        """
+        try:
+            # Fetch webpage content (time out rather than hang on unresponsive hosts)
+            response = requests.get(url, headers=self.headers, timeout=10)
+            response.raise_for_status()
+            
+            # Parse HTML content
+            soup = BeautifulSoup(response.text, 'html.parser')
+            
+            # Extract relevant data
+            data = {
+                'url': url,
+                'title': self._extract_title(soup),
+                'meta_description': self._extract_meta_description(soup),
+                'headings': self._extract_headings(soup),
+                'content': self._extract_content(soup),
+                'links': self._extract_links(soup),
+                'images': self._extract_images(soup)
+            }
+            
+            return data
+            
+        except Exception as e:
+            return {
+                'error': str(e),
+                'url': url
+            }
+    
+    def _extract_title(self, soup: BeautifulSoup) -> str:
+        """Extract page title."""
+        title = soup.find('title')
+        return title.text if title else ''
+    
+    def _extract_meta_description(self, soup: BeautifulSoup) -> str:
+        """Extract meta description."""
+        meta = soup.find('meta', attrs={'name': 'description'})
+        return meta.get('content', '') if meta else ''
+    
+    def _extract_headings(self, soup: BeautifulSoup) -> Dict[str, list]:
+        """Extract all headings."""
+        headings = {}
+        for i in range(1, 7):
+            tags = soup.find_all(f'h{i}')
+            headings[f'h{i}'] = [tag.text.strip() for tag in tags]
+        return headings
+    
+    def _extract_content(self, soup: BeautifulSoup) -> str:
+        """Extract main content."""
+        # Remove script and style elements
+        for script in soup(['script', 'style']):
+            script.decompose()
+        
+        # Get text content
+        text = soup.get_text()
+        
+        # Clean up text
+        lines = (line.strip() for line in text.splitlines())
+        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
+        text = ' '.join(chunk for chunk in chunks if chunk)
+        
+        return text
+    
+    def _extract_links(self, soup: BeautifulSoup) -> list:
+        """Extract all links."""
+        links = []
+        for link in soup.find_all('a'):
+            href = link.get('href')
+            if href:
+                links.append({
+                    'url': href,
+                    'text': link.text.strip()
+                })
+        return links
+    
+    def _extract_images(self, soup: BeautifulSoup) -> list:
+        """Extract all images."""
+        images = []
+        for img in soup.find_all('img'):
+            images.append({
+                'src': img.get('src', ''),
+                'alt': img.get('alt', ''),
+                'title': img.get('title', '')
+            })
+        return images 
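`DataCollector.collect()` returns its text under `content` and its headings as a dict keyed by level (`'h1'` through `'h6'`), while `ContentParser.parse_structure()` reads `main_content`, `html`, and a list of heading dicts that each carry a `level` key. A minimal sketch of bridging the two shapes follows; `to_parser_input` and the `text` key are hypothetical names that do not appear in this commit, and because the collector does not return raw HTML, `html` stays empty unless the caller supplies it.

```python
from typing import Any, Dict, List


def to_parser_input(collected: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical adapter: reshape collector output into the parser's expected input."""
    headings: List[Dict[str, str]] = []
    # Flatten {'h1': ['Title'], 'h2': [...]} into [{'level': 'h1', 'text': 'Title'}, ...]
    for level, texts in collected.get("headings", {}).items():
        for text in texts:
            headings.append({"level": level, "text": text})

    return {
        "main_content": collected.get("content", ""),
        "html": collected.get("html", ""),  # absent from collector output, so usually ""
        "headings": headings,
        "metadata": {
            "title": collected.get("title", ""),
            "meta_description": collected.get("meta_description", ""),
        },
    }


if __name__ == "__main__":
    sample = {
        "content": "Hello world.",
        "title": "Example",
        "headings": {"h1": ["Main"], "h2": ["First", "Second"], "h3": []},
    }
    print(to_parser_input(sample)["headings"])
```

With this shape, the hierarchy analysis can read `heading['level']` for each entry instead of iterating over the dict's keys.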
--- a/lib/ai_seo_tools/content_gap_analysis/utils/seo_analyzer.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/seo_analyzer.py
@@ -0,0 +1,237 @@
+"""
+SEO analyzer utility for content gap analysis.
+"""
+
+import requests
+from bs4 import BeautifulSoup
+from urllib.parse import urlparse, urljoin
+import re
+from typing import Dict, Any, List, Optional
+from ....utils.website_analyzer.analyzer import WebsiteAnalyzer
+
+def analyze_onpage_seo(url: str) -> Dict[str, Any]:
+    """
+    Analyze on-page SEO elements of a website.
+    
+    Args:
+        url: The URL to analyze
+        
+    Returns:
+        Dictionary containing SEO analysis results
+    """
+    try:
+        # Use the combined website analyzer
+        analyzer = WebsiteAnalyzer()
+        analysis = analyzer.analyze_website(url)
+        
+        if not analysis.get('success', False):
+            return {
+                'error': analysis.get('error', 'Unknown error in SEO analysis'),
+                'meta_title': '',
+                'meta_description': '',
+                'has_robots_txt': False,
+                'has_sitemap': False,
+                'mobile_friendly': False,
+                'load_time': 0
+            }
+        
+        # Extract relevant information from the analysis
+        seo_info = analysis['data']['analysis']['seo_info']
+        basic_info = analysis['data']['analysis']['basic_info']
+        performance = analysis['data']['analysis']['performance']
+        
+        return {
+            'meta_tags': seo_info.get('meta_tags', {}),
+            'content': seo_info.get('content', {}),
+            'meta_title': basic_info.get('title', ''),
+            'meta_description': basic_info.get('meta_description', ''),
+            'has_robots_txt': bool(basic_info.get('robots_txt')),
+            'has_sitemap': bool(basic_info.get('sitemap')),
+            'mobile_friendly': True,  # Placeholder: always True, real mobile-friendliness detection is not implemented here
+            'load_time': performance.get('load_time', 0)
+        }
+    except Exception as e:
+        return {
+            'error': str(e),
+            'meta_title': '',
+            'meta_description': '',
+            'has_robots_txt': False,
+            'has_sitemap': False,
+            'mobile_friendly': False,
+            'load_time': 0
+        }
+
+def _analyze_meta_tags(soup: BeautifulSoup) -> Dict[str, Any]:
+    """Analyze meta tags of the webpage."""
+    meta_tags = {}
+    
+    # Title tag
+    title_tag = soup.find('title')
+    if title_tag:
+        meta_tags['title'] = title_tag.get_text().strip()
+    
+    # Meta description
+    meta_desc = soup.find('meta', {'name': 'description'})
+    if meta_desc:
+        meta_tags['description'] = meta_desc.get('content', '').strip()
+    
+    # Meta keywords
+    meta_keywords = soup.find('meta', {'name': 'keywords'})
+    if meta_keywords:
+        meta_tags['keywords'] = meta_keywords.get('content', '').strip()
+    
+    # Open Graph tags
+    og_tags = {}
+    for tag in soup.find_all('meta', property=re.compile(r'^og:')):
+        og_tags[tag['property']] = tag.get('content', '')
+    meta_tags['og_tags'] = og_tags
+    
+    # Twitter Card tags
+    twitter_tags = {}
+    for tag in soup.find_all('meta', attrs={'name': re.compile(r'^twitter:')}):
+        twitter_tags[tag['name']] = tag.get('content', '')
+    meta_tags['twitter_tags'] = twitter_tags
+    
+    return meta_tags
+
+def _analyze_headings(soup: BeautifulSoup) -> Dict[str, Any]:
+    """Analyze heading structure of the webpage."""
+    headings = {
+        'h1': [],
+        'h2': [],
+        'h3': [],
+        'h4': [],
+        'h5': [],
+        'h6': []
+    }
+    
+    for tag in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
+        for heading in soup.find_all(tag):
+            headings[tag].append(heading.get_text().strip())
+    
+    return headings
+
+def _analyze_content(soup: BeautifulSoup) -> Dict[str, Any]:
+    """Analyze main content of the webpage."""
+    # Find main content
+    main_content = soup.find('main') or soup.find('article') or soup.find('div', class_=re.compile(r'content|main|article'))
+    
+    if not main_content:
+        return {
+            'word_count': 0,
+            'paragraph_count': 0,
+            'content': ''
+        }
+    
+    # Get text content
+    content = main_content.get_text()
+    
+    # Count words and paragraphs
+    words = content.split()
+    paragraphs = main_content.find_all('p')
+    
+    return {
+        'word_count': len(words),
+        'paragraph_count': len(paragraphs),
+        'content': content
+    }
+
+def _analyze_links(soup: BeautifulSoup, base_url: str) -> Dict[str, Any]:
+    """Analyze links on the webpage."""
+    links = {
+        'internal': [],
+        'external': [],
+        'broken': []
+    }
+    
+    base_domain = urlparse(base_url).netloc
+    
+    for link in soup.find_all('a', href=True):
+        href = link['href']
+        
+        # Handle relative URLs
+        if not href.startswith(('http://', 'https://')):
+            href = urljoin(base_url, href)
+        
+        # Categorize link
+        if urlparse(href).netloc == base_domain:
+            links['internal'].append({
+                'url': href,
+                'text': link.get_text().strip(),
+                'title': link.get('title', '')
+            })
+        else:
+            links['external'].append({
+                'url': href,
+                'text': link.get_text().strip(),
+                'title': link.get('title', '')
+            })
+    
+    return links
+
+def _analyze_images(soup: BeautifulSoup) -> Dict[str, Any]:
+    """Analyze images on the webpage."""
+    images = []
+    
+    for img in soup.find_all('img'):
+        image_data = {
+            'src': img.get('src', ''),
+            'alt': img.get('alt', ''),
+            'title': img.get('title', ''),
+            'width': img.get('width', ''),
+            'height': img.get('height', ''),
+            'has_alt': bool(img.get('alt')),
+            'has_title': bool(img.get('title')),
+            'has_dimensions': bool(img.get('width') and img.get('height'))
+        }
+        images.append(image_data)
+    
+    return {
+        'total': len(images),
+        'with_alt': sum(1 for img in images if img['has_alt']),
+        'with_title': sum(1 for img in images if img['has_title']),
+        'with_dimensions': sum(1 for img in images if img['has_dimensions']),
+        'images': images
+    }
+
+def _check_technical_elements(soup: BeautifulSoup, url: str) -> Dict[str, Any]:
+    """Check technical SEO elements."""
+    base_url = urlparse(url)
+    domain = base_url.netloc
+    
+    # Check robots.txt
+    robots_url = f"{base_url.scheme}://{domain}/robots.txt"
+    try:
+        robots_response = requests.get(robots_url, timeout=5)
+        has_robots_txt = robots_response.status_code == 200
+    except requests.RequestException:
+        has_robots_txt = False
+    
+    # Check sitemap
+    sitemap_url = f"{base_url.scheme}://{domain}/sitemap.xml"
+    try:
+        sitemap_response = requests.get(sitemap_url, timeout=5)
+        has_sitemap = sitemap_response.status_code == 200
+    except requests.RequestException:
+        has_sitemap = False
+    
+    # Check mobile friendliness
+    viewport = soup.find('meta', {'name': 'viewport'})
+    has_viewport = bool(viewport)
+    
+    # Check canonical URL
+    canonical = soup.find('link', {'rel': 'canonical'})
+    has_canonical = bool(canonical)
+    
+    # Check language
+    html_lang = (soup.find('html') or {}).get('lang', '')
+    has_language = bool(html_lang)
+    
+    return {
+        'has_robots_txt': has_robots_txt,
+        'has_sitemap': has_sitemap,
+        'mobile_friendly': has_viewport,
+        'has_canonical': has_canonical,
+        'has_language': has_language,
+        'language': html_lang
+    } 
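`_analyze_links` in `seo_analyzer.py` above compares each link's host with the page's host, so `mailto:` and `javascript:` hrefs, which have no host, end up in the `external` bucket. The sketch below shows one way the classification could separate those cases using only the standard library; `classify_link` and the `"other"` category are assumptions for illustration and are not part of the commit.

```python
from urllib.parse import urljoin, urlparse


def classify_link(href: str, base_url: str) -> str:
    """Return 'internal', 'external', or 'other' for a link found on base_url."""
    # Resolve relative and protocol-relative hrefs against the page URL
    absolute = urljoin(base_url, href)
    parsed = urlparse(absolute)

    # Non-web schemes (mailto:, javascript:, tel:) are neither internal nor external
    if parsed.scheme not in ("http", "https"):
        return "other"

    base_host = urlparse(base_url).netloc
    return "internal" if parsed.netloc == base_host else "external"


if __name__ == "__main__":
    page = "https://example.com/blog/"
    for link in ["/about", "page.html", "https://other.org/x", "mailto:hi@example.com"]:
        print(link, "->", classify_link(link, page))
```

Host comparison here is exact, as in the original function, so `www.example.com` and `example.com` would still count as different sites.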
--- a/lib/ai_seo_tools/content_gap_analysis/utils/storage.py
+++ b/lib/ai_seo_tools/content_gap_analysis/utils/storage.py
@@ -0,0 +1,270 @@
+"""
+Storage module for content gap analysis results.
+"""
+
+from typing import Dict, Any, List, Optional
+from datetime import datetime
+from sqlalchemy.orm import Session
+from sqlalchemy.exc import SQLAlchemyError
+import streamlit as st
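+# NOTE: the ORM models used below (ContentGapAnalysis, WebsiteAnalysis, CompetitorAnalysis, KeywordAnalysis, ContentRecommendation, AnalysisHistory) are not imported in this file.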
+class ContentGapAnalysisStorage:
+    """Handles storage and retrieval of content gap analysis results."""
+    
+    def __init__(self, db_session: Session):
+        """Initialize the storage handler."""
+        self.db = db_session
+    
+    def save_analysis(self, user_id: int, website_url: str, industry: str, results: Dict[str, Any]) -> Optional[int]:
+        """
+        Save content gap analysis results.
+        
+        Args:
+            user_id: User ID
+            website_url: Target website URL
+            industry: Industry category
+            results: Analysis results dictionary
+            
+        Returns:
+            Analysis ID if successful, None otherwise
+        """
+        try:
+            # Create main analysis record
+            analysis = ContentGapAnalysis(
+                user_id=user_id,
+                website_url=website_url,
+                industry=industry,
+                status='completed',
+                metadata={'version': '1.0'}
+            )
+            self.db.add(analysis)
+            self.db.flush()  # Get the ID without committing
+            
+            # Save website analysis
+            website_analysis = WebsiteAnalysis(
+                content_gap_analysis_id=analysis.id,
+                content_score=results.get('website', {}).get('content_score', 0),
+                seo_score=results.get('website', {}).get('seo_score', 0),
+                structure_score=results.get('website', {}).get('structure_score', 0),
+                content_metrics=results.get('website', {}).get('content_metrics', {}),
+                seo_metrics=results.get('website', {}).get('seo_metrics', {}),
+                technical_metrics=results.get('website', {}).get('technical_metrics', {}),
+                ai_insights=results.get('website', {}).get('ai_insights', {})
+            )
+            self.db.add(website_analysis)
+            
+            # Save competitor analysis if available
+            if 'competitors' in results:
+                for competitor in results['competitors']:
+                    competitor_analysis = CompetitorAnalysis(
+                        content_gap_analysis_id=analysis.id,
+                        competitor_url=competitor.get('url'),
+                        market_position=competitor.get('market_position', {}),
+                        content_gaps=competitor.get('content_gaps', []),
+                        competitive_advantages=competitor.get('competitive_advantages', []),
+                        trend_analysis=competitor.get('trend_analysis', {})
+                    )
+                    self.db.add(competitor_analysis)
+            
+            # Save keyword analysis
+            keyword_analysis = KeywordAnalysis(
+                content_gap_analysis_id=analysis.id,
+                top_keywords=results.get('keywords', {}).get('top_keywords', []),
+                search_intent=results.get('keywords', {}).get('search_intent', {}),
+                opportunities=results.get('keywords', {}).get('opportunities', []),
+                trend_analysis=results.get('keywords', {}).get('trend_analysis', {})
+            )
+            self.db.add(keyword_analysis)
+            
+            # Save recommendations
+            for recommendation in results.get('recommendations', []):
+                content_recommendation = ContentRecommendation(
+                    content_gap_analysis_id=analysis.id,
+                    recommendation_type=recommendation.get('type'),
+                    priority_score=recommendation.get('priority_score', 0),
+                    recommendation=recommendation.get('recommendation', ''),
+                    implementation_steps=recommendation.get('implementation_steps', []),
+                    expected_impact=recommendation.get('expected_impact', {}),
+                    status='pending'
+                )
+                self.db.add(content_recommendation)
+            
+            # Save analysis history
+            history = AnalysisHistory(
+                content_gap_analysis_id=analysis.id,
+                status='completed',
+                metrics={'duration': results.get('duration', 0)}
+            )
+            self.db.add(history)
+            
+            # Commit all changes
+            self.db.commit()
+            return analysis.id
+            
+        except SQLAlchemyError as e:
+            self.db.rollback()
+            st.error(f"Error saving analysis results: {str(e)}")
+            return None
+    
+    def get_analysis(self, analysis_id: int) -> Optional[Dict[str, Any]]:
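+        """
+        Retrieve the stored results of a content gap analysis.
+
+        Args:
+            analysis_id: ID of the ContentGapAnalysis record
+
+        Returns:
+            Dictionary with the analysis fields plus 'website', 'competitors',
+            'keywords', 'recommendations', and 'history' sections (history is
+            ordered newest first). None if the analysis does not exist or a
+            database error occurs.
+        """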
+        try:
+            # Session.get() (SQLAlchemy 1.4+) replaces the legacy Query.get()
+            analysis = self.db.get(ContentGapAnalysis, analysis_id)
+            if not analysis:
+                return None
+            
+            # Get website analysis
+            website_analysis = self.db.query(WebsiteAnalysis).filter_by(
+                content_gap_analysis_id=analysis_id
+            ).first()
+            
+            # Get competitor analysis
+            competitor_analyses = self.db.query(CompetitorAnalysis).filter_by(
+                content_gap_analysis_id=analysis_id
+            ).all()
+            
+            # Get keyword analysis
+            keyword_analysis = self.db.query(KeywordAnalysis).filter_by(
+                content_gap_analysis_id=analysis_id
+            ).first()
+            
+            # Get recommendations
+            recommendations = self.db.query(ContentRecommendation).filter_by(
+                content_gap_analysis_id=analysis_id
+            ).all()
+            
+            # Get analysis history
+            history = self.db.query(AnalysisHistory).filter_by(
+                content_gap_analysis_id=analysis_id
+            ).order_by(AnalysisHistory.run_date.desc()).all()
+            
+            return {
+                'id': analysis.id,
+                'website_url': analysis.website_url,
+                'industry': analysis.industry,
+                'analysis_date': analysis.analysis_date,
+                'status': analysis.status,
+                'website': {
+                    'content_score': website_analysis.content_score,
+                    'seo_score': website_analysis.seo_score,
+                    'structure_score': website_analysis.structure_score,
+                    'content_metrics': website_analysis.content_metrics,
+                    'seo_metrics': website_analysis.seo_metrics,
+                    'technical_metrics': website_analysis.technical_metrics,
+                    'ai_insights': website_analysis.ai_insights
+                } if website_analysis else {},
+                'competitors': [{
+                    'url': ca.competitor_url,
+                    'market_position': ca.market_position,
+                    'content_gaps': ca.content_gaps,
+                    'competitive_advantages': ca.competitive_advantages,
+                    'trend_analysis': ca.trend_analysis
+                } for ca in competitor_analyses],
+                'keywords': {
+                    'top_keywords': keyword_analysis.top_keywords,
+                    'search_intent': keyword_analysis.search_intent,
+                    'opportunities': keyword_analysis.opportunities,
+                    'trend_analysis': keyword_analysis.trend_analysis
+                } if keyword_analysis else {},
+                'recommendations': [{
+                    'type': r.recommendation_type,
+                    'priority_score': r.priority_score,
+                    'recommendation': r.recommendation,
+                    'implementation_steps': r.implementation_steps,
+                    'expected_impact': r.expected_impact,
+                    'status': r.status
+                } for r in recommendations],
+                'history': [{
+                    'run_date': h.run_date,
+                    'status': h.status,
+                    'metrics': h.metrics,
+                    'error_log': h.error_log
+                } for h in history]
+            }
+            
+        except SQLAlchemyError as e:
+            # Roll back so the session is usable again after a failed query
+            self.db.rollback()
+            st.error(f"Error retrieving analysis results: {str(e)}")
+            return None
+    
+    def get_user_analyses(self, user_id: int) -> List[Dict[str, Any]]:
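+        """
+        Get summaries of all analyses belonging to a user.
+
+        Args:
+            user_id: User ID
+
+        Returns:
+            List of analysis summaries (id, website_url, industry,
+            analysis_date, status), newest first. Empty list if the user
+            has no analyses or a database error occurs.
+        """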
+        try:
+            analyses = self.db.query(ContentGapAnalysis).filter_by(
+                user_id=user_id
+            ).order_by(ContentGapAnalysis.analysis_date.desc()).all()
+            
+            return [{
+                'id': analysis.id,
+                'website_url': analysis.website_url,
+                'industry': analysis.industry,
+                'analysis_date': analysis.analysis_date,
+                'status': analysis.status
+            } for analysis in analyses]
+            
+        except SQLAlchemyError as e:
+            # Roll back so the session is usable again after a failed query
+            self.db.rollback()
+            st.error(f"Error retrieving user analyses: {str(e)}")
+            return []
+    
+    def update_recommendation_status(self, recommendation_id: int, status: str) -> bool:
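+        """
+        Update the status of a recommendation.
+
+        Args:
+            recommendation_id: Recommendation ID
+            status: New status (recommendations are created as 'pending')
+
+        Returns:
+            True if the recommendation was found and updated; False if it
+            does not exist or a database error occurs
+        """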
+        try:
+            # Session.get() (SQLAlchemy 1.4+) replaces the legacy Query.get()
+            recommendation = self.db.get(ContentRecommendation, recommendation_id)
+            if recommendation:
+                recommendation.status = status
+                recommendation.updated_at = datetime.utcnow()
+                self.db.commit()
+                return True
+            return False
+            
+        except SQLAlchemyError as e:
+            self.db.rollback()
+            st.error(f"Error updating recommendation status: {str(e)}")
+            return False
+    
+    def delete_analysis(self, analysis_id: int) -> bool:
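+        """
+        Delete an analysis and its related data.
+
+        Related rows (website, competitor, keyword, recommendation, and
+        history records) are removed only if the model relationships or
+        foreign keys are configured to cascade deletes.
+
+        Args:
+            analysis_id: Analysis ID
+
+        Returns:
+            True if the analysis was found and deleted; False if it does
+            not exist or a database error occurs
+        """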
+        try:
+            # Session.get() (SQLAlchemy 1.4+) replaces the legacy Query.get()
+            analysis = self.db.get(ContentGapAnalysis, analysis_id)
+            if analysis:
+                self.db.delete(analysis)
+                self.db.commit()
+                return True
+            return False
+            
+        except SQLAlchemyError as e:
+            self.db.rollback()
+            st.error(f"Error deleting analysis: {str(e)}")
+            return False