2025-08-06 16:29:49 +05:30

# Website Analyzer Module
A comprehensive website analysis toolkit that provides detailed insights into website performance, SEO metrics, and content quality. This module combines traditional web analysis techniques with AI-powered content evaluation to deliver actionable recommendations.
## Features
### 1. Comprehensive Website Analysis
- Basic website information extraction
- SSL/TLS certificate validation
- DNS record analysis
- WHOIS information retrieval
- Content analysis and structure evaluation
- Performance metrics assessment
### 2. Advanced SEO Analysis
- Meta tag optimization analysis
- Content quality evaluation
- Keyword density analysis
- Readability scoring
- Heading structure analysis
- AI-powered content recommendations
### 3. Technical Infrastructure
- Asynchronous web crawling
- Multi-threaded analysis
- Robust error handling
- Comprehensive logging
- Type-safe data models
## Module Structure
### 1. `analyzer.py`
The main analysis engine that provides comprehensive website analysis.
#### Key Components:
- `WebsiteAnalyzer` class
- URL validation
- Basic website information extraction
- SSL/TLS certificate checking
- DNS record analysis
- WHOIS information retrieval
- Content analysis
- Performance metrics assessment
#### Features:
- Concurrent analysis using ThreadPoolExecutor
- Robust error handling and logging
- User-agent simulation for reliable scraping
- Timeout handling for requests
- Comprehensive result formatting
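
The concurrent analysis listed above could look roughly like the following sketch. The check functions (`check_ssl`, `check_dns`, `check_whois`), the `run_checks` helper, and the `dns_info`/`whois_info` keys are hypothetical stand-ins, not the module's actual API; each check returns placeholder data so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


# Hypothetical stand-ins for the real checks; each takes a domain and returns a dict.
def check_ssl(domain: str) -> dict:
    return {"valid": True}


def check_dns(domain: str) -> dict:
    return {"records": []}


def check_whois(domain: str) -> dict:
    raise RuntimeError("WHOIS lookup failed")  # simulate one failing check


def run_checks(domain: str, checks: Dict[str, Callable[[str], dict]], timeout: float = 10.0) -> dict:
    """Run independent checks in parallel; a failing check does not abort the others."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {pool.submit(fn, domain): name for name, fn in checks.items()}
        # as_completed raises TimeoutError if results are not all ready within `timeout` seconds.
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure under the check's own key instead of raising.
                results[name] = {"error": str(exc)}
    return results


analysis = run_checks(
    "example.com",
    {"ssl_info": check_ssl, "dns_info": check_dns, "whois_info": check_whois},
)
```

Recording each failure under its own key keeps one slow or broken lookup (WHOIS servers are a common culprit) from discarding the results of the other checks.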
### 2. `seo_analyzer.py`
Specialized SEO analysis module with AI integration.
#### Key Components:
- `extract_content()`: Fetches and parses webpage content
- `analyze_meta_tags()`: Evaluates meta tags and SEO elements
- `analyze_content_with_ai()`: AI-powered content analysis
- `analyze_seo()`: Main SEO analysis function
#### Features:
- Meta tag optimization analysis
- Content quality scoring
- Keyword density analysis
- Readability evaluation
- AI-powered recommendations
- Weighted scoring system
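
To make keyword density and weighted scoring concrete, here is a minimal sketch. The function names, the tokenization rule, and the weights are illustrative assumptions; the module's own formulas may differ (for example, it may filter stop words).

```python
import re
from collections import Counter
from typing import Dict


def keyword_density(text: str, top_n: int = 3) -> Dict[str, float]:
    """Return each of the most frequent words with its share of all words, in percent."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: round(100 * n / len(words), 2) for word, n in counts.most_common(top_n)}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-category scores (0-100) into one score, normalizing by the total weight."""
    total_weight = sum(weights[name] for name in scores)
    return round(sum(scores[name] * weights[name] for name in scores) / total_weight, 2)


density = keyword_density("SEO tools help SEO teams audit SEO content", top_n=1)
overall = weighted_score(
    {"meta_tags": 80, "content": 60},
    {"meta_tags": 0.4, "content": 0.6},  # illustrative weights, not the module's
)
```

Here "seo" accounts for 3 of 8 words, so its density is 37.5 percent, and the combined score is 80 × 0.4 + 60 × 0.6 = 68.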
### 3. `models.py`
Data models for structured analysis results.
#### Key Components:
- `SEORecommendation`: Individual SEO recommendations
- `MetaTagAnalysis`: Meta tag analysis results
- `ContentAnalysis`: Content analysis metrics
- `SEOAnalysisResult`: Complete analysis results
#### Features:
- Type-safe data structures
- Clear data organization
- Easy serialization/deserialization
- Comprehensive documentation
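
As a rough illustration of how these models might fit together, the sketch below uses standard-library dataclasses. The class names come from the list above and the top-level fields of `SEOAnalysisResult` mirror the attributes used in the usage examples; every field inside the other three classes is an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SEORecommendation:
    priority: str  # assumed field, e.g. "high", "medium", "low"
    message: str   # assumed field


@dataclass
class MetaTagAnalysis:
    title: Optional[str] = None        # assumed field
    description: Optional[str] = None  # assumed field


@dataclass
class ContentAnalysis:
    word_count: int = 0             # assumed field
    readability_score: float = 0.0  # assumed field


@dataclass
class SEOAnalysisResult:
    success: bool
    overall_score: float
    meta_tags: MetaTagAnalysis
    content: ContentAnalysis
    recommendations: List[SEORecommendation] = field(default_factory=list)


result = SEOAnalysisResult(
    success=True,
    overall_score=72.5,
    meta_tags=MetaTagAnalysis(title="Example Domain"),
    content=ContentAnalysis(word_count=120, readability_score=65.0),
    recommendations=[SEORecommendation("high", "Add a meta description")],
)

# asdict() converts nested dataclasses to plain dicts, which json.dumps can serialize.
as_dict = asdict(result)
```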
## Usage Examples
### Basic Website Analysis
```python
from website_analyzer import analyze_website

# Analyze a website
results = analyze_website("https://example.com")

# Access analysis results; check "success" before reading "data"
if results["success"]:
    data = results["data"]
    print(f"Domain: {data['domain']}")
    print(f"SSL Info: {data['analysis']['ssl_info']}")
    print(f"Content Info: {data['analysis']['content_info']}")
```
### SEO Analysis
```python
from website_analyzer.seo_analyzer import analyze_seo

# Perform SEO analysis (placeholder key; load the real one from an environment variable)
seo_results = analyze_seo("https://example.com", "your-openai-api-key")

# Access SEO results; check "success" before reading the other fields
if seo_results.success:
    print(f"Overall Score: {seo_results.overall_score}")
    print(f"Meta Tags: {seo_results.meta_tags}")
    print(f"Content Analysis: {seo_results.content}")
    print(f"Recommendations: {seo_results.recommendations}")
```
## Dependencies
- `requests`: HTTP requests
- `beautifulsoup4`: HTML parsing
- `python-whois`: WHOIS information
- `dnspython`: DNS record analysis
- `openai`: AI-powered analysis
- `loguru`: Logging
- `typing`: Type hints (Python standard library, no installation needed)
- `dataclasses`: Data models (Python standard library since Python 3.7)
## Error Handling
The module implements comprehensive error handling:
- URL validation
- Request timeouts
- Connection errors
- Parsing errors
- API errors
- DNS resolution errors
- SSL/TLS errors
All errors are logged and returned in a structured format for easy handling.
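
A minimal sketch of what a structured result could look like, using URL validation as the example. The `"success"` and `"data"` keys follow the usage examples above; the `"error"` key and the helper names `make_result` and `validate_url` are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def make_result(data: Optional[dict] = None, error: Optional[str] = None) -> dict:
    """Build a result dict; "success" is True only when no error message is given."""
    return {"success": error is None, "data": data, "error": error}


def validate_url(url: str) -> dict:
    """Accept only absolute http(s) URLs that include a host."""
    try:
        parsed = urlparse(url)
    except ValueError as exc:  # urlparse rejects some malformed inputs, e.g. bad IPv6 literals
        return make_result(error=f"Malformed URL: {exc}")
    if parsed.scheme not in ("http", "https"):
        return make_result(error=f"Unsupported or missing URL scheme: {url!r}")
    if not parsed.netloc:
        return make_result(error=f"URL has no host: {url!r}")
    return make_result(data={"domain": parsed.hostname})


ok = validate_url("https://example.com/page")
bad = validate_url("example.com")  # no scheme, so validation fails
```

Returning the same dict shape on success and failure lets callers branch on `result["success"]` without wrapping every call in `try`/`except`.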
## Logging
The module uses `loguru` for logging with the following features:
- File rotation (500 MB)
- 10-day retention
- Debug level logging
- Structured log format
- Both file and stdout output
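
A `loguru` configuration matching the features above might look like the following sketch. The log file name and the format string are assumptions; only the rotation size, retention period, log level, and the two sinks come from the list.

```python
import sys

from loguru import logger

# Remove loguru's default stderr handler so messages are not emitted twice.
logger.remove()

# Assumed format string; the module's actual format may differ.
LOG_FORMAT = "{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}"

# Console output at DEBUG level.
logger.add(sys.stdout, level="DEBUG", format=LOG_FORMAT)

# File output: start a new file at 500 MB and delete files older than 10 days.
logger.add(
    "website_analyzer.log",  # hypothetical file name
    rotation="500 MB",
    retention="10 days",
    level="DEBUG",
    format=LOG_FORMAT,
)

logger.debug("Logging configured")
```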
## Best Practices
1. **API Key Management**
   - Store API keys securely
   - Use environment variables
   - Implement rate limiting
2. **Error Handling**
   - Always check success status
   - Handle errors gracefully
   - Log errors appropriately
3. **Performance**
   - Use concurrent analysis
   - Implement timeouts
   - Cache results when possible
4. **Rate Limiting**
   - Respect website robots.txt
   - Implement delays between requests
   - Use appropriate user agents
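
The rate-limiting advice can be sketched with the standard library's `urllib.robotparser`. The user-agent name and the `allowed_urls` helper are hypothetical, and the robots.txt rules are parsed from an in-memory list so the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "WebsiteAnalyzerBot"  # hypothetical user-agent name

# Parse rules from an in-memory robots.txt so the sketch runs offline.
# Against a live site, call parser.set_url(".../robots.txt") and parser.read() instead.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
])


def allowed_urls(urls, delay_override=None):
    """Yield only URLs that robots.txt permits, sleeping between consecutive yields."""
    if delay_override is not None:
        delay = delay_override
    else:
        # Fall back to 1 second when robots.txt declares no Crawl-delay.
        delay = parser.crawl_delay(USER_AGENT) or 1
    first = True
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip disallowed paths
        if not first:
            time.sleep(delay)
        first = False
        yield url


pages = list(allowed_urls(
    [
        "https://example.com/",
        "https://example.com/private/report",
        "https://example.com/blog",
    ],
    delay_override=0,  # no real waiting in this demo
))
```

Here the `/private/report` URL is skipped because of the `Disallow: /private/` rule; without `delay_override`, the generator would wait the declared 2 seconds between requests.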
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This module is part of the ALwrity project and is licensed under the MIT License.