AI Image and Audio Generation Improvements.

AI Video Generation Pre-Flight Checklist. Cost Estimate Improvements.
2025-12-25 16:26:08 +05:30
parent 59913bffa9
commit 7512933c65
163 changed files with 8938 additions and 37401 deletions
--- a/ToBeMigrated/ai_writers/github_blogs/README.md
+++ b/ToBeMigrated/ai_writers/github_blogs/README.md
@@ -1,259 +0,0 @@
-# GitHub Blog Generator
-
-An AI-powered content generation system that automatically creates documentation, tutorials, and guides from GitHub repositories. This module turns GitHub repository data into several types of technical content.
-
-## Features
-
-### 1. Content Generation Types
-
-The system can generate the following types of content from GitHub repositories:
-
-- **Getting Started Guides**
-  - Introduction and Overview
-  - Prerequisites and Setup
-  - Installation Instructions
-  - Basic Usage Examples
-  - Common Use Cases
-  - Best Practices
-  - Next Steps and Resources
-
-- **Technical Documentation**
-  - Architecture Overview
-  - Core Components
-  - Technical Specifications
-  - Integration Points
-  - Performance Considerations
-  - Security Features
-  - API Documentation
-  - Configuration Options
-  - Deployment Guidelines
-  - Troubleshooting Guide
-
-- **Tutorial Series**
-  - Beginner Tutorials
-    - Basic concepts
-    - Simple examples
-    - Step-by-step instructions
-  - Intermediate Tutorials
-    - Advanced features
-    - Real-world examples
-    - Best practices
-  - Advanced Tutorials
-    - Complex use cases
-    - Performance optimization
-    - Integration patterns
-
-- **Comparison Analysis**
-  - Feature Comparison
-  - Performance Analysis
-  - Use Case Suitability
-  - Community and Support
-  - Learning Curve
-  - Integration Capabilities
-  - Future Prospects
-
-- **Case Studies**
-  - Problem Statement
-  - Solution Implementation
-  - Technical Challenges
-  - Results and Benefits
-  - Lessons Learned
-  - Future Improvements
-
-- **Contribution Guides**
-  - Development Setup
-  - Code Style Guidelines
-  - Testing Requirements
-  - Documentation Standards
-  - Pull Request Process
-  - Review Guidelines
-  - Community Guidelines
-
-- **Security Guides**
-  - Security Architecture
-  - Authentication & Authorization
-  - Data Protection
-  - Secure Configuration
-  - Vulnerability Management
-  - Incident Response
-  - Compliance Requirements
-
-- **Performance Guides**
-  - Performance Metrics
-  - Optimization Techniques
-  - Benchmarking Guidelines
-  - Resource Management
-  - Scaling Strategies
-  - Monitoring Setup
-  - Troubleshooting
-
-### 2. GitHub Content Scraping
-
-The module includes a sophisticated GitHub content scraper with the following capabilities:
-
-- **Rate Limiting**
-  - Configurable API call limits
-  - Automatic request throttling
-  - Concurrent request management
-
-- **Caching System**
-  - Configurable cache duration (TTL)
-  - Automatic cache invalidation
-  - Efficient storage of scraped content
-
-- **Content Extraction**
-  - Repository metadata
-  - README content
-  - File contents
-  - Repository topics
-  - Contributor information
-  - License information
-
-### 3. Content Enhancement
-
-- **Online Research Integration**
-  - Automatic topic research
-  - Related content discovery
-  - Industry trend analysis
-
-- **FAQ Generation**
-  - Automatic FAQ creation
-  - Common question identification
-  - Comprehensive answers
-
-- **Metadata Generation**
-  - SEO-optimized titles
-  - Meta descriptions
-  - Tags and categories
-  - Content structuring
-
-## Usage Examples
-
-### Basic Usage
-
-```python
-from lib.ai_writers.github_blogs import GitHubBlogGenerator
-
-# Initialize the generator
-generator = GitHubBlogGenerator()
-
-# Generate content for a GitHub repository
-content = await generator.generate_content(
-    github_url="https://github.com/owner/repo",
-    content_types=["getting_started", "technical_docs", "tutorials"]
-)
-
-# Save the generated content
-generator.save_content(content, "my_repository")
-```
-
-### Advanced Usage
-
-```python
-from lib.ai_writers.github_blogs import GitHubBlogGenerator
-
-# Initialize with custom settings
-generator = GitHubBlogGenerator(
-    cache_dir=".custom_cache",
-    ttl_hours=48
-)
-
-# Generate all content types
-content_types = [
-    "getting_started",
-    "technical_docs",
-    "tutorials",
-    "comparison",
-    "case_studies",
-    "contribution",
-    "security",
-    "performance"
-]
-
-# Generate content for multiple repositories
-urls = [
-    "https://github.com/owner/repo1",
-    "https://github.com/owner/repo2"
-]
-
-for url in urls:
-    content = await generator.generate_content(url, content_types)
-    generator.save_content(content, url.split("/")[-1])
-```
-
-## Configuration Options
-
-### GitHubBlogGenerator
-
-- `cache_dir` (str): Directory for caching scraped content (default: ".github_cache")
-- `ttl_hours` (int): Time-to-live for cached content in hours (default: 24)
-
-### Content Generation
-
-- `gpt_provider` (str): Choice of AI provider ("gemini" or "openai")
-- `content_types` (List[str]): Types of content to generate
-- `github_url` (str): URL of the GitHub repository
-
-## Output Format
-
-All generated content is saved in Markdown format with the following structure:
-
-```markdown
-# [Title]
-
-[Generated content based on content type]
-
-## Metadata
-- Title: [SEO-optimized title]
-- Description: [Meta description]
-- Tags: [Generated tags]
-- Categories: [Generated categories]
-```
-
-## Best Practices
-
-1. **Rate Limiting**
-   - Configure appropriate rate limits based on your GitHub API quota
-   - Use caching to minimize API calls
-   - Implement proper error handling for rate limit exceeded scenarios
-
-2. **Content Generation**
-   - Start with basic content types before generating advanced content
-   - Review generated content for accuracy and completeness
-   - Customize prompts for specific repository types
-
-3. **Caching**
-   - Set appropriate TTL based on repository update frequency
-   - Clear cache when repository content changes significantly
-   - Monitor cache size and performance
-
-4. **Error Handling**
-   - Implement proper error handling for API failures
-   - Log errors for debugging
-   - Provide fallback mechanisms for failed content generation
-
-## Dependencies
-
-- Python 3.8+
-- aiohttp
-- beautifulsoup4
-- loguru
-- pydantic
-- requests
-- pandas
-
-## Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Commit your changes
-4. Push to the branch
-5. Create a Pull Request
-
-## License
-
-[Your License Here]
-
-## Support
-
-For support, please [create an issue](https://github.com/your-repo/issues) or contact the maintainers. 
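The README's caching bullets (configurable TTL, automatic invalidation) can be sketched as a small file-backed cache. This is a minimal sketch, not the deleted module's code: `TTLFileCache` is a hypothetical name, and it stores `(timestamp, value)` pairs with `pickle` as the deleted `Cache` class did. One deliberate difference is that file names come from a SHA-256 digest of the key rather than the built-in `hash()`. By default, `hash()` of a `str` is randomized per interpreter process (see `PYTHONHASHSEED`), so a cache keyed that way misses on every new run.

```python
import hashlib
import pickle
import time
from pathlib import Path
from typing import Any, Optional


class TTLFileCache:
    """Hypothetical file-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, cache_dir: str = ".github_cache", ttl_seconds: float = 24 * 3600):
        self.cache_dir = Path(cache_dir)
        self.ttl_seconds = ttl_seconds
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # SHA-256 yields the same file name in every process;
        # the built-in hash() of a str does not.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.cache"

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        path = self._path(key)
        if not path.exists():
            return None
        with path.open("rb") as f:
            stored_at, value = pickle.load(f)
        if time.time() - stored_at > self.ttl_seconds:
            path.unlink()  # expired: invalidate the entry and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        """Store value together with the current timestamp."""
        with self._path(key).open("wb") as f:
            pickle.dump((time.time(), value), f)
```

Because entries are unpickled on read, `cache_dir` should point only at a directory this process controls.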
--- a/ToBeMigrated/ai_writers/github_blogs/github_getting_started.py
+++ b/ToBeMigrated/ai_writers/github_blogs/github_getting_started.py
@@ -1,254 +0,0 @@
-"""
-Enhanced GitHub Content Generator
-
-This module provides various content generation capabilities from GitHub repository data,
-including getting started guides, technical documentation, tutorials, and more.
-"""
-
-import sys
-from typing import Dict, List, Optional
-from loguru import logger
-
-from lib.gpt_providers.text_generation.main_text_generation import llm_text_gen
-
-logger.remove()
-logger.add(sys.stdout,
-          colorize=True,
-          format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}")
-
-def generate_technical_documentation(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate comprehensive technical documentation from repository data."""
-    prompt = f"""As an expert technical writer, create detailed technical documentation for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Please create comprehensive technical documentation that includes:
-1. Architecture Overview
-2. Core Components
-3. Technical Specifications
-4. Integration Points
-5. Performance Considerations
-6. Security Features
-7. API Documentation (if applicable)
-8. Configuration Options
-9. Deployment Guidelines
-10. Troubleshooting Guide
-
-Format the documentation in markdown with appropriate headers, code blocks, and diagrams.
-Include real-world examples and best practices.
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_getting_started_guide(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a beginner-friendly getting started guide."""
-    prompt = f"""As an expert programmer and teacher, create a comprehensive getting started guide for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a step-by-step guide that includes:
-1. Introduction and Overview
-2. Prerequisites and Setup
-3. Installation Instructions
-4. Basic Usage Examples
-5. Common Use Cases
-6. Best Practices
-7. Next Steps and Resources
-
-Make the guide:
-- Beginner-friendly with clear explanations
-- Include practical examples with code snippets
-- Add emojis for better readability
-- Include troubleshooting tips
-- Provide links to additional resources
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_tutorial_series(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a series of tutorials for different skill levels."""
-    prompt = f"""As an expert educator, create a series of tutorials for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a structured tutorial series that includes:
-1. Beginner Tutorial
-   - Basic concepts
-   - Simple examples
-   - Step-by-step instructions
-
-2. Intermediate Tutorial
-   - Advanced features
-   - Real-world examples
-   - Best practices
-
-3. Advanced Tutorial
-   - Complex use cases
-   - Performance optimization
-   - Integration patterns
-
-Each tutorial should:
-- Be self-contained
-- Include practical examples
-- Have clear learning objectives
-- Include exercises and challenges
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_comparison_analysis(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a comparison analysis with similar tools/frameworks."""
-    prompt = f"""As a technical analyst, create a comprehensive comparison analysis for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a detailed comparison that includes:
-1. Feature Comparison
-2. Performance Analysis
-3. Use Case Suitability
-4. Community and Support
-5. Learning Curve
-6. Integration Capabilities
-7. Future Prospects
-
-Include:
-- Pros and Cons
-- Real-world use cases
-- Industry adoption
-- Community feedback
-- Future roadmap
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_case_studies(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate real-world case studies and success stories."""
-    prompt = f"""As a technical writer, create compelling case studies for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create detailed case studies that include:
-1. Problem Statement
-2. Solution Implementation
-3. Technical Challenges
-4. Results and Benefits
-5. Lessons Learned
-6. Future Improvements
-
-Make the case studies:
-- Based on real-world scenarios
-- Include technical details
-- Show measurable results
-- Provide actionable insights
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_contribution_guide(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a comprehensive contribution guide."""
-    prompt = f"""As an open-source maintainer, create a detailed contribution guide for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a contribution guide that includes:
-1. Development Setup
-2. Code Style Guidelines
-3. Testing Requirements
-4. Documentation Standards
-5. Pull Request Process
-6. Review Guidelines
-7. Community Guidelines
-
-Make the guide:
-- Clear and concise
-- Include examples
-- Cover all contribution types
-- Provide templates
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_security_guide(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a security best practices guide."""
-    prompt = f"""As a security expert, create a comprehensive security guide for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a security guide that includes:
-1. Security Architecture
-2. Authentication & Authorization
-3. Data Protection
-4. Secure Configuration
-5. Vulnerability Management
-6. Incident Response
-7. Compliance Requirements
-
-Make the guide:
-- Practical and actionable
-- Include security checklists
-- Provide code examples
-- Cover common vulnerabilities
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def generate_performance_guide(repo_data: Dict, gpt_provider: str = "gemini") -> str:
-    """Generate a performance optimization guide."""
-    prompt = f"""As a performance optimization expert, create a detailed performance guide for the following GitHub repository:
-
-Repository Data:
-{repo_data}
-
-Create a performance guide that includes:
-1. Performance Metrics
-2. Optimization Techniques
-3. Benchmarking Guidelines
-4. Resource Management
-5. Scaling Strategies
-6. Monitoring Setup
-7. Troubleshooting
-
-Make the guide:
-- Data-driven
-- Include benchmarks
-- Provide optimization tips
-- Cover different scales
-"""
-    return _get_llm_response(prompt, gpt_provider)
-
-def _get_llm_response(prompt: str, gpt_provider: str) -> str:
-    """Get response from the specified LLM provider."""
-    system_prompt = """You are an expert technical writer and GitHub repository analyst with deep expertise in software development, documentation, and technical communication.
-
-  Your role is to create high-quality, accurate, and engaging content based on GitHub repository data. You should:
-
-  1. **Technical Accuracy**
-     - Ensure all technical information is precise and up-to-date
-     - Verify code examples and configurations
-     - Cross-reference documentation and source code
-     - Maintain consistency with repository standards
-
-  2. **Content Structure**
-     - Use clear hierarchical organization
-     - Include appropriate code blocks and examples
-     - Add relevant diagrams and visual aids
-     - Break complex topics into digestible sections
-
-  3. **Writing Style**
-     - Maintain a professional yet approachable tone
-     - Use active voice and clear language
-     - Include practical examples and use cases
-     - Add relevant emojis for better readability
-
-  4. **Best Practices**
-     - Follow industry-standard documentation practices
-     - Include troubleshooting sections
-     - Add performance considerations
-     - Address security implications
-"""
-    try:
-        llm_response = llm_text_gen(prompt, system_prompt=system_prompt)
-    except Exception as err:
-        logger.error(f"Failed to get response from {gpt_provider}: {err}")
-        raise
-    return llm_response
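The eight `generate_*` functions above share one prompt template and differ mainly in the persona, the numbered section list, and a few style bullets. A table-driven builder is one way to remove that duplication. This is a hypothetical sketch: `PROMPT_SPECS` and `build_prompt` are illustrative names, only two of the eight specs are filled in, and the style bullets are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical spec table: content type -> (persona, numbered sections).
# Only two of the original eight content types are shown.
PROMPT_SPECS: Dict[str, Tuple[str, List[str]]] = {
    "getting_started": (
        "an expert programmer and teacher",
        ["Introduction and Overview", "Prerequisites and Setup",
         "Installation Instructions", "Basic Usage Examples",
         "Common Use Cases", "Best Practices", "Next Steps and Resources"],
    ),
    "security": (
        "a security expert",
        ["Security Architecture", "Authentication & Authorization",
         "Data Protection", "Secure Configuration",
         "Vulnerability Management", "Incident Response",
         "Compliance Requirements"],
    ),
}


def build_prompt(content_type: str, repo_data: dict) -> str:
    """Assemble a prompt from the spec table; raises KeyError for unknown types."""
    persona, sections = PROMPT_SPECS[content_type]
    # Number the sections 1..N, one per line, as the hand-written prompts did.
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return (
        f"As {persona}, create content for the following GitHub repository:\n\n"
        f"Repository Data:\n{repo_data}\n\n"
        f"Include these sections:\n{numbered}\n"
    )
```

With this layout, adding a content type means adding one table entry, and an unknown type fails with a `KeyError` instead of falling through an `if`/`elif` chain.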
--- a/ToBeMigrated/ai_writers/github_blogs/main_getting_started_blogs.py
+++ b/ToBeMigrated/ai_writers/github_blogs/main_getting_started_blogs.py
@@ -1,157 +0,0 @@
-"""
-Enhanced GitHub Blog Generator
-
-This module provides comprehensive content generation from GitHub repositories,
-including technical documentation, tutorials, case studies, and more.
-"""
-
-import asyncio
-import datetime
-import json
-import sys
-from typing import Dict, List, Optional
-from pathlib import Path
-
-from loguru import logger
-logger.remove()
-logger.add(sys.stdout,
-          colorize=True,
-          format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}")
-
-from .scrape_github_readme import GitHubScraper, GitHubContent
-from .scrape_github_readme import get_gh_details_vision, get_readme_content
-from .scrape_github_readme import research_github_topics, check_if_already_written
-from .github_getting_started import (
-    generate_technical_documentation,
-    generate_getting_started_guide,
-    generate_tutorial_series,
-    generate_comparison_analysis,
-    generate_case_studies,
-    generate_contribution_guide,
-    generate_security_guide,
-    generate_performance_guide
-)
-
-
-class GitHubBlogGenerator:
-    """Generator for various types of GitHub-related content."""
-    
-    def __init__(self, cache_dir: str = ".github_cache", ttl_hours: int = 24):
-        """Initialize the blog generator."""
-        self.cache_dir = Path(cache_dir)
-        self.scraper = GitHubScraper(cache_dir, ttl_hours)
-        self.output_dir = Path("generated_content")
-        self.output_dir.mkdir(exist_ok=True)
-    
-    async def generate_content(self, github_url: str, content_types: List[str] = None) -> Dict[str, str]:
-        """Generate various types of content from a GitHub repository."""
-        if content_types is None:
-            content_types = ["getting_started", "technical_docs", "tutorials"]
-        
-        try:
-            # Scrape GitHub content
-            repo_content = await self.scraper.scrape_github_content(github_url)
-            
-            # Generate different types of content
-            generated_content = {}
-            
-            for content_type in content_types:
-                if content_type == "getting_started":
-                    content = generate_getting_started_guide(repo_content.dict())
-                elif content_type == "technical_docs":
-                    content = generate_technical_documentation(repo_content.dict())
-                elif content_type == "tutorials":
-                    content = generate_tutorial_series(repo_content.dict())
-                elif content_type == "comparison":
-                    content = generate_comparison_analysis(repo_content.dict())
-                elif content_type == "case_studies":
-                    content = generate_case_studies(repo_content.dict())
-                elif content_type == "contribution":
-                    content = generate_contribution_guide(repo_content.dict())
-                elif content_type == "security":
-                    content = generate_security_guide(repo_content.dict())
-                elif content_type == "performance":
-                    content = generate_performance_guide(repo_content.dict())
-                else:
-                    logger.warning(f"Unknown content type: {content_type}")
-                    continue
-                
-                generated_content[content_type] = content
-            
-            # Generate FAQs from online research
-            try:
-                research_report = do_online_research(repo_content.title, "gemini", github_url)
-                faqs = generate_blog_faq(research_report, "gemini")
-                generated_content["faqs"] = faqs
-            except Exception as err:
-                logger.error(f"Failed to generate FAQs: {err}")
-            
-            return generated_content
-            
-        except Exception as err:
-            logger.error(f"Failed to generate content: {err}")
-            raise
-    
-    def save_content(self, content: Dict[str, str], base_filename: str):
-        """Save generated content to files."""
-        try:
-            for content_type, content_text in content.items():
-                # Generate metadata for each content type
-                title, meta_desc, tags, categories = blog_metadata(content_text, "gemini")
-                
-                # Create filename with content type
-                filename = f"{base_filename}_{content_type}.md"
-                
-                # Save content to file
-                save_blog_to_file(
-                    content_text,
-                    title,
-                    meta_desc,
-                    tags,
-                    categories,
-                    None  # No image path for now
-                )
-                
-                logger.info(f"Saved {content_type} content to {filename}")
-                
-        except Exception as err:
-            logger.error(f"Failed to save content: {err}")
-            raise
-
-async def main():
-    """Example usage of the GitHub blog generator."""
-    generator = GitHubBlogGenerator()
-    
-    # Example GitHub URLs
-    urls = [
-        "https://github.com/owner/repo",
-        "https://github.com/owner/another-repo"
-    ]
-    
-    content_types = [
-        "getting_started",
-        "technical_docs",
-        "tutorials",
-        "comparison",
-        "case_studies",
-        "contribution",
-        "security",
-        "performance"
-    ]
-    
-    for url in urls:
-        try:
-            # Generate content
-            content = await generator.generate_content(url, content_types)
-            
-            # Create base filename from URL
-            base_filename = url.split("/")[-1]
-            
-            # Save content
-            generator.save_content(content, base_filename)
-            
-        except Exception as e:
-            logger.error(f"Error processing {url}: {e}")
-
-if __name__ == "__main__":
-    asyncio.run(main())
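`GitHubBlogGenerator.generate_content` is declared `async`, but it calls each `generate_*` function synchronously, so the event loop is blocked for the full duration of every LLM request. The sketch below shows one alternative, assuming the generators stay synchronous: look them up in a dict and run them on the loop's default thread pool with `loop.run_in_executor`. `GENERATORS`, `generate_all`, and the two `fake_*` functions are hypothetical stand-ins for the real generators.

```python
import asyncio
from typing import Callable, Dict, List


# Hypothetical stand-ins for the synchronous generate_* functions:
# each takes repository data and returns Markdown text.
def fake_getting_started(repo: dict) -> str:
    return f"# Getting started with {repo['title']}"


def fake_security(repo: dict) -> str:
    return f"# Security notes for {repo['title']}"


# Dispatch table replacing the if/elif chain on content_type.
GENERATORS: Dict[str, Callable[[dict], str]] = {
    "getting_started": fake_getting_started,
    "security": fake_security,
}


async def generate_all(repo: dict, content_types: List[str]) -> Dict[str, str]:
    """Run each known generator in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    known = [t for t in content_types if t in GENERATORS]  # unknown types are dropped
    # run_in_executor(None, ...) uses the loop's default ThreadPoolExecutor;
    # gather returns results in the same order as `known`.
    results = await asyncio.gather(
        *(loop.run_in_executor(None, GENERATORS[t], repo) for t in known)
    )
    return dict(zip(known, results))
```

Call it with `asyncio.run(generate_all({"title": "demo"}, ["getting_started", "security"]))`. Unlike the deleted code, this sketch skips unknown content types silently instead of logging them. Because the requests now run concurrently, they also hit the LLM provider's rate limit at the same time, so a production version may need to cap concurrency.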
--- a/ToBeMigrated/ai_writers/github_blogs/scrape_github_readme.py
+++ b/ToBeMigrated/ai_writers/github_blogs/scrape_github_readme.py
@@ -1,427 +0,0 @@
-"""
-Enhanced GitHub Content Scraper with Rate Limiting and Caching
-
-This module provides functionality to scrape GitHub repositories, READMEs, and code files
-for content marketing purposes. It includes async support, rate limiting, caching,
-and comprehensive metadata collection.
-"""
-
-import os
-import sys
-import json
-import asyncio
-import aiohttp
-from datetime import datetime, timedelta
-from typing import Dict, List, Optional, Union
-from urllib.parse import urljoin, urlparse
-import pandas as pd
-from bs4 import BeautifulSoup
-from loguru import logger
-import requests
-from pydantic import BaseModel, Field
-import time
-import pickle
-from pathlib import Path
-
-# Configure logging
-logger.remove()
-logger.add(sys.stdout,
-          colorize=True,
-          format="<level>{level}</level>|<green>{file}:{line}:{function}</green>| {message}")
-
-class RateLimiter:
-    """Rate limiter for GitHub API requests."""
-    
-    def __init__(self, calls_per_minute: int = 30):
-        self.calls_per_minute = calls_per_minute
-        self.interval = 60 / calls_per_minute  # seconds between calls
-        self.last_call_time = 0
-        self.lock = asyncio.Lock()
-    
-    async def acquire(self):
-        """Acquire rate limit token."""
-        async with self.lock:
-            current_time = time.time()
-            time_since_last_call = current_time - self.last_call_time
-            
-            if time_since_last_call < self.interval:
-                await asyncio.sleep(self.interval - time_since_last_call)
-            
-            self.last_call_time = time.time()
-
-class Cache:
-    """Cache for GitHub content."""
-    
-    def __init__(self, cache_dir: str = ".github_cache", ttl_hours: int = 24):
-        self.cache_dir = Path(cache_dir)
-        self.ttl = timedelta(hours=ttl_hours)
-        self.cache_dir.mkdir(exist_ok=True)
-    
-    def _get_cache_path(self, key: str) -> Path:
-        """Get cache file path for a key."""
-        return self.cache_dir / f"{hash(key)}.cache"
-    
-    def get(self, key: str) -> Optional[Dict]:
-        """Get cached value for key."""
-        cache_path = self._get_cache_path(key)
-        
-        if not cache_path.exists():
-            return None
-        
-        try:
-            with open(cache_path, 'rb') as f:
-                data = pickle.load(f)
-                if datetime.now() - data['timestamp'] > self.ttl:
-                    cache_path.unlink()
-                    return None
-                return data['value']
-        except Exception as e:
-            logger.warning(f"Cache read error for {key}: {e}")
-            return None
-    
-    def set(self, key: str, value: Dict):
-        """Set cache value for key."""
-        cache_path = self._get_cache_path(key)
-        
-        try:
-            with open(cache_path, 'wb') as f:
-                pickle.dump({
-                    'timestamp': datetime.now(),
-                    'value': value
-                }, f)
-        except Exception as e:
-            logger.warning(f"Cache write error for {key}: {e}")
-
-class GitHubContent(BaseModel):
-    """Model for GitHub content analysis."""
-    title: str = Field("", description="Title of the content")
-    description: str = Field("", description="Description of the content")
-    content: str = Field("", description="Main content")
-    language: str = Field("", description="Programming language")
-    stars: int = Field(0, description="Number of stars")
-    forks: int = Field(0, description="Number of forks")
-    watchers: int = Field(0, description="Number of watchers")
-    last_updated: str = Field("", description="Last update date")
-    topics: List[str] = Field([], description="Repository topics")
-    contributors: List[str] = Field([], description="Contributor usernames")
-    readme_url: str = Field("", description="URL of the README")
-    raw_content_url: str = Field("", description="URL for raw content")
-    license: str = Field("", description="Repository license")
-    dependencies: List[str] = Field([], description="Project dependencies")
-    metadata: Dict = Field({}, description="Additional metadata")
-
-class GitHubScraper:
-    """Service for scraping GitHub content with rate limiting and caching."""
-    
-    def __init__(self, cache_dir: str = ".github_cache", ttl_hours: int = 24, calls_per_minute: int = 30):
-        """Initialize the scraper service."""
-        self.session = None
-        self.headers = {
-            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
-            'Accept': 'application/vnd.github.v3+json'
-        }
-        self.rate_limiter = RateLimiter(calls_per_minute)
-        self.cache = Cache(cache_dir, ttl_hours)
-    
-    async def __aenter__(self):
-        """Create aiohttp session when entering context."""
-        self.session = aiohttp.ClientSession(headers=self.headers)
-        return self
-        
-    async def __aexit__(self, exc_type, exc_val, exc_tb):
-        """Close aiohttp session when exiting context."""
-        if self.session:
-            await self.session.close()
-    
-    async def fetch_url(self, url: str, use_cache: bool = True) -> str:
-        """Fetch URL content asynchronously with rate limiting and caching."""
-        if use_cache:
-            cached_content = self.cache.get(url)
-            if cached_content:
-                logger.debug(f"Cache hit for {url}")
-                return cached_content
-        
-        await self.rate_limiter.acquire()
-        
-        try:
-            async with self.session.get(url) as response:
-                if response.status == 200:
-                    content = await response.text()
-                    if use_cache:
-                        self.cache.set(url, content)
-                    return content
-                else:
-                    error_msg = f"Failed to fetch URL: Status code {response.status}"
-                    logger.error(error_msg)
-                    raise Exception(error_msg)
-        except Exception as e:
-            logger.error(f"Error fetching URL {url}: {e}")
-            raise
-    
-    def parse_github_url(self, url: str) -> Dict[str, str]:
-        """Parse GitHub URL to extract repository information."""
-        parsed = urlparse(url)
-        path_parts = parsed.path.strip('/').split('/')
-        
-        if len(path_parts) < 2:
-            raise ValueError("Invalid GitHub URL format")
-        
-        return {
-            'owner': path_parts[0],
-            'repo': path_parts[1],
-            'branch': path_parts[3] if len(path_parts) > 3 else 'main',
-            'path': '/'.join(path_parts[4:]) if len(path_parts) > 4 else ''
-        }
-    
-    async def get_repo_metadata(self, owner: str, repo: str) -> Dict:
-        """Get repository metadata from GitHub API with caching."""
-        cache_key = f"metadata_{owner}_{repo}"
-        cached_metadata = self.cache.get(cache_key)
-        if cached_metadata:
-            return cached_metadata
-        
-        await self.rate_limiter.acquire()
-        
-        api_url = f"https://api.github.com/repos/{owner}/{repo}"
-        try:
-            async with self.session.get(api_url) as response:
-                if response.status == 200:
-                    metadata = await response.json()
-                    self.cache.set(cache_key, metadata)
-                    return metadata
-                else:
-                    logger.error(f"Failed to fetch repo metadata: {response.status}")
-                    return {}
-        except Exception as e:
-            logger.error(f"Error fetching repo metadata: {e}")
-            return {}
-    
-    async def get_readme_content(self, owner: str, repo: str, branch: str = 'main') -> Dict:
-        """Get README content from GitHub with caching."""
-        cache_key = f"readme_{owner}_{repo}_{branch}"
-        cached_content = self.cache.get(cache_key)
-        if cached_content:
-            return cached_content
-        
-        try:
-            # Try to get README from API first
-            await self.rate_limiter.acquire()
-            api_url = f"https://api.github.com/repos/{owner}/{repo}/readme"
-            async with self.session.get(api_url) as response:
-                if response.status == 200:
-                    readme_data = await response.json()
-                    content = {
-                        'content': readme_data.get('content', ''),
-                        'encoding': readme_data.get('encoding', 'base64'),
-                        'url': readme_data.get('html_url', '')
-                    }
-                    self.cache.set(cache_key, content)
-                    return content
-            
-            # Fallback to scraping if API fails
-            readme_url = f"https://github.com/{owner}/{repo}/blob/{branch}/README.md"
-            html_content = await self.fetch_url(readme_url, use_cache=True)
-            soup = BeautifulSoup(html_content, 'html.parser')
-            
-            # Find the README content
-            readme_content = soup.find('div', {'class': 'markdown-body'})
-            if readme_content:
-                content = {
-                    'content': readme_content.get_text(),
-                    'encoding': 'text',
-                    'url': readme_url
-                }
-                self.cache.set(cache_key, content)
-                return content
-            
-            return {}
-        except Exception as e:
-            logger.error(f"Error fetching README: {e}")
-            return {}
-    
-    async def get_file_content(self, owner: str, repo: str, path: str, branch: str = 'main') -> Dict:
-        """Get content of a specific file from GitHub with caching."""
-        cache_key = f"file_{owner}_{repo}_{path}_{branch}"
-        cached_content = self.cache.get(cache_key)
-        if cached_content:
-            return cached_content
-        
-        try:
-            # Try to get file content from API first
-            await self.rate_limiter.acquire()
-            api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={branch}"
-            async with self.session.get(api_url) as response:
-                if response.status == 200:
-                    file_data = await response.json()
-                    content = {
-                        'content': file_data.get('content', ''),
-                        'encoding': file_data.get('encoding', 'base64'),
-                        'url': file_data.get('html_url', '')
-                    }
-                    self.cache.set(cache_key, content)
-                    return content
-            
-            # Fallback to scraping if API fails
-            file_url = f"https://github.com/{owner}/{repo}/blob/{branch}/{path}"
-            html_content = await self.fetch_url(file_url, use_cache=True)
-            soup = BeautifulSoup(html_content, 'html.parser')
-            
-            # Find the file content
-            file_content = soup.find('div', {'class': 'file-content'})
-            if file_content:
-                content = {
-                    'content': file_content.get_text(),
-                    'encoding': 'text',
-                    'url': file_url
-                }
-                self.cache.set(cache_key, content)
-                return content
-            
-            return {}
-        except Exception as e:
-            logger.error(f"Error fetching file content: {e}")
-            return {}
-    
-    async def get_repo_topics(self, owner: str, repo: str) -> List[str]:
-        """Get repository topics with caching."""
-        cache_key = f"topics_{owner}_{repo}"
-        cached_topics = self.cache.get(cache_key)
-        if cached_topics:
-            return cached_topics
-        
-        try:
-            await self.rate_limiter.acquire()
-            api_url = f"https://api.github.com/repos/{owner}/{repo}/topics"
-            async with self.session.get(api_url, headers={'Accept': 'application/vnd.github.mercy-preview+json'}) as response:
-                if response.status == 200:
-                    data = await response.json()
-                    topics = data.get('names', [])
-                    self.cache.set(cache_key, topics)
-                    return topics
-                return []
-        except Exception as e:
-            logger.error(f"Error fetching topics: {e}")
-            return []
-    
-    async def get_contributors(self, owner: str, repo: str) -> List[str]:
-        """Get repository contributors with caching."""
-        cache_key = f"contributors_{owner}_{repo}"
-        cached_contributors = self.cache.get(cache_key)
-        if cached_contributors:
-            return cached_contributors
-        
-        try:
-            await self.rate_limiter.acquire()
-            api_url = f"https://api.github.com/repos/{owner}/{repo}/contributors"
-            async with self.session.get(api_url) as response:
-                if response.status == 200:
-                    contributors = await response.json()
-                    contributor_list = [contributor['login'] for contributor in contributors]
-                    self.cache.set(cache_key, contributor_list)
-                    return contributor_list
-                return []
-        except Exception as e:
-            logger.error(f"Error fetching contributors: {e}")
-            return []
-    
-    async def scrape_github_content(self, url: str) -> GitHubContent:
-        """Main function to scrape GitHub content with caching."""
-        cache_key = f"content_{url}"
-        cached_content = self.cache.get(cache_key)
-        if cached_content:
-            return GitHubContent(**cached_content)
-        
-        try:
-            # Parse the GitHub URL
-            repo_info = self.parse_github_url(url)
-            
-            # Get repository metadata
-            metadata = await self.get_repo_metadata(repo_info['owner'], repo_info['repo'])
-            
-            # Get content based on URL type
-            if not repo_info['path'] or repo_info['path'].lower() == 'readme.md':
-                content_data = await self.get_readme_content(
-                    repo_info['owner'], 
-                    repo_info['repo'], 
-                    repo_info['branch']
-                )
-            else:
-                content_data = await self.get_file_content(
-                    repo_info['owner'], 
-                    repo_info['repo'], 
-                    repo_info['path'], 
-                    repo_info['branch']
-                )
-            
-            # Get additional metadata
-            topics = await self.get_repo_topics(repo_info['owner'], repo_info['repo'])
-            contributors = await self.get_contributors(repo_info['owner'], repo_info['repo'])
-            
-            # Create GitHubContent object
-            content = GitHubContent(
-                title=metadata.get('name', ''),
-                description=metadata.get('description', ''),
-                content=content_data.get('content', ''),
-                language=metadata.get('language', ''),
-                stars=metadata.get('stargazers_count', 0),
-                forks=metadata.get('forks_count', 0),
-                watchers=metadata.get('watchers_count', 0),
-                last_updated=metadata.get('updated_at', ''),
-                topics=topics,
-                contributors=contributors,
-                readme_url=content_data.get('url', ''),
-                raw_content_url=metadata.get('html_url', ''),
-                license=metadata.get('license', {}).get('name', ''),
-                metadata={
-                    'size': metadata.get('size', 0),
-                    'open_issues': metadata.get('open_issues_count', 0),
-                    'default_branch': metadata.get('default_branch', 'main'),
-                    'created_at': metadata.get('created_at', ''),
-                    'pushed_at': metadata.get('pushed_at', '')
-                }
-            )
-            
-            # Cache the complete content
-            self.cache.set(cache_key, content.dict())
-            
-            return content
-            
-        except Exception as e:
-            logger.error(f"Error scraping GitHub content: {e}")
-            raise
-
-async def main():
-    """Example usage of the GitHub scraper with rate limiting and caching."""
-    scraper = GitHubScraper(
-        cache_dir=".github_cache",
-        ttl_hours=24,
-        calls_per_minute=30
-    )
-    
-    async with scraper:
-        # Example URLs
-        urls = [
-            "https://github.com/owner/repo",
-            "https://github.com/owner/repo/blob/main/README.md",
-            "https://github.com/owner/repo/blob/main/src/main.py"
-        ]
-        
-        for url in urls:
-            try:
-                content = await scraper.scrape_github_content(url)
-                print(f"Scraped content from {url}:")
-                print(json.dumps(content.dict(), indent=2))
-            except Exception as e:
-                print(f"Error scraping {url}: {e}")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
-
-
-
-
-
-
-
-