ALwrity AI Blog Writer - Added Google Grounding UI Implementation

ajaysi
2025-09-18 18:45:53 +05:30
parent 9f13daf443
commit 4d153b292d
72 changed files with 11944 additions and 1526 deletions


@@ -0,0 +1,256 @@
# ALwrity Billing & Subscription System Integration
## Overview
The ALwrity backend now includes a comprehensive billing and subscription system that automatically tracks API usage, calculates costs, and manages subscription limits. This system is fully integrated into the startup process and provides real-time monitoring capabilities.
## 🚀 Quick Start
### 1. Start the Backend with Billing System
```bash
# From the backend directory
python start_alwrity_backend.py
```
The startup script will automatically:
- ✅ Create billing and subscription database tables
- ✅ Initialize default pricing and subscription plans
- ✅ Set up usage tracking middleware
- ✅ Verify all billing components are working
- ✅ Start the server with billing endpoints enabled
### 2. Verify Installation
```bash
# Run the comprehensive verification script
python verify_billing_setup.py
```
### 3. Test API Endpoints
```bash
# Get subscription plans
curl http://localhost:8000/api/subscription/plans
# Get usage for a user (replace 'demo' with an actual user ID)
curl http://localhost:8000/api/subscription/usage/demo
# Get billing dashboard data
curl http://localhost:8000/api/subscription/dashboard/demo
# Get API pricing information
curl http://localhost:8000/api/subscription/pricing
```
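
The same endpoints can also be called from Python. The following is a minimal sketch using only the standard library. `BASE_URL`, `subscription_url`, and `fetch_json` are illustrative names assumed for this example, not part of the ALwrity codebase, and since the response schema is not described here the decoded JSON is returned unchanged.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed local development address, matching the curl examples above.
BASE_URL = "http://localhost:8000"


def subscription_url(endpoint: str, user_id: Optional[str] = None) -> str:
    """Build a URL under /api/subscription, optionally scoped to one user."""
    url = f"{BASE_URL}/api/subscription/{endpoint}"
    if user_id is not None:
        # Percent-encode the user ID so it stays a single path segment.
        url += f"/{quote(user_id, safe='')}"
    return url


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body (needs a running backend)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `fetch_json(subscription_url("usage", "demo"))` would request the same resource as the second curl command.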
## 📊 Database Tables
The billing system creates the following tables:
| Table Name | Purpose |
|------------|---------|
| `subscription_plans` | Available subscription tiers and pricing |
| `user_subscriptions` | User subscription assignments |
| `api_usage_logs` | Detailed API usage tracking |
| `usage_summaries` | Aggregated usage statistics |
| `api_provider_pricing` | Cost per token for each AI provider |
| `usage_alerts` | Usage limit warnings and notifications |
| `billing_history` | Historical billing records |
## 🔧 System Components
### 1. Database Models (`models/subscription_models.py`)
- **SubscriptionPlan**: Subscription tiers and pricing
- **UserSubscription**: User subscription assignments
- **APIUsageLog**: Detailed usage tracking
- **UsageSummary**: Aggregated statistics
- **APIProviderPricing**: Cost calculations
- **UsageAlert**: Limit notifications
### 2. Services
- **PricingService** (`services/pricing_service.py`): Cost calculations and plan management
- **UsageTrackingService** (`services/usage_tracking_service.py`): Usage monitoring and limits
- **SubscriptionExceptionHandler** (`services/subscription_exception_handler.py`): Error handling
### 3. API Endpoints (`api/subscription_api.py`)
- `GET /api/subscription/plans` - Available subscription plans
- `GET /api/subscription/usage/{user_id}` - User usage statistics
- `GET /api/subscription/dashboard/{user_id}` - Dashboard data
- `GET /api/subscription/pricing` - API pricing information
- `GET /api/subscription/trends/{user_id}` - Usage trends
### 4. Middleware Integration
- **Monitoring Middleware** (`middleware/monitoring_middleware.py`): Automatic usage tracking
- **Exception Handling**: Graceful error handling for billing issues
## 🎯 Frontend Integration
The billing system is fully integrated with the frontend dashboard:
### CompactBillingDashboard
- Real-time usage metrics
- Cost tracking
- System health monitoring
- Interactive tooltips and help text
### EnhancedBillingDashboard
- Detailed usage breakdowns
- Provider-specific costs
- Usage trends and analytics
- Alert management
## 📈 Usage Tracking
The system automatically tracks:
- **API Calls**: Number of requests to each provider
- **Token Usage**: Input and output tokens for each request
- **Costs**: Real-time cost calculations
- **Response Times**: Performance monitoring
- **Error Rates**: Failed request tracking
- **User Activity**: Per-user usage patterns
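
To illustrate how tracked calls could roll up into the metrics listed above, here is a small sketch. `UsageRecord` and `summarize_by_provider` are hypothetical names for this example only. The real `APIUsageLog` and `UsageSummary` models may use different fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class UsageRecord:
    """One tracked API call (assumed shape, not the real APIUsageLog)."""
    user_id: str
    provider: str
    input_tokens: int
    output_tokens: int
    response_time_ms: float
    success: bool


def summarize_by_provider(records: Iterable[UsageRecord]) -> Dict[str, dict]:
    """Aggregate calls, tokens, mean response time, and error rate per provider."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "latency": 0.0, "errors": 0})
    for record in records:
        bucket = totals[record.provider]
        bucket["calls"] += 1
        bucket["tokens"] += record.input_tokens + record.output_tokens
        bucket["latency"] += record.response_time_ms
        bucket["errors"] += 0 if record.success else 1
    return {
        provider: {
            "calls": bucket["calls"],
            "tokens": bucket["tokens"],
            "avg_response_time_ms": bucket["latency"] / bucket["calls"],
            "error_rate": bucket["errors"] / bucket["calls"],
        }
        for provider, bucket in totals.items()
    }
```

Grouping by `user_id` instead of `provider` would give the per-user view in the same way.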
## 💰 Pricing Configuration
### Default AI Provider Pricing (per token)
| Provider | Model | Input Cost | Output Cost |
|----------|-------|------------|-------------|
| OpenAI | GPT-4 | $0.00003 | $0.00006 |
| OpenAI | GPT-3.5-turbo | $0.0000015 | $0.000002 |
| Gemini | Gemini Pro | $0.0000005 | $0.0000015 |
| Anthropic | Claude-3 | $0.000008 | $0.000024 |
| Mistral | Mistral-7B | $0.0000002 | $0.0000006 |
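
As a worked example of per-token pricing: a GPT-4 request with 1,000 input tokens and 500 output tokens costs 1,000 × $0.00003 + 500 × $0.00006 = $0.03 + $0.03 = $0.06. The sketch below reproduces that arithmetic with the prices from the table. `DEFAULT_PRICING` and `request_cost` are names assumed for illustration and do not reflect the actual `PricingService` interface.

```python
from decimal import Decimal

# Per-token prices in USD from the table above, stored as (input, output).
# Decimal avoids binary floating-point rounding on very small amounts.
DEFAULT_PRICING = {
    ("OpenAI", "GPT-4"): (Decimal("0.00003"), Decimal("0.00006")),
    ("OpenAI", "GPT-3.5-turbo"): (Decimal("0.0000015"), Decimal("0.000002")),
    ("Gemini", "Gemini Pro"): (Decimal("0.0000005"), Decimal("0.0000015")),
    ("Anthropic", "Claude-3"): (Decimal("0.000008"), Decimal("0.000024")),
    ("Mistral", "Mistral-7B"): (Decimal("0.0000002"), Decimal("0.0000006")),
}


def request_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the default per-token prices."""
    input_price, output_price = DEFAULT_PRICING[(provider, model)]
    return input_price * input_tokens + output_price * output_tokens
```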
### Subscription Plans
| Plan | Monthly Price | Yearly Price | API Limits |
|------|---------------|--------------|------------|
| Free | $0 | $0 | 1,000 calls/month |
| Starter | $29 | $290 | 10,000 calls/month |
| Professional | $99 | $990 | 100,000 calls/month |
| Enterprise | $299 | $2,990 | Unlimited |
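
The plan table can be expressed as data for quick checks. In the sketch below, `PLANS`, `remaining_calls`, and `yearly_savings` are hypothetical helpers written for this example. Representing the Enterprise "Unlimited" limit as `None` is an assumption about how an unlimited plan might be modelled.

```python
from typing import Optional

# (monthly price USD, yearly price USD, monthly call limit; None = unlimited)
PLANS = {
    "Free": (0, 0, 1_000),
    "Starter": (29, 290, 10_000),
    "Professional": (99, 990, 100_000),
    "Enterprise": (299, 2_990, None),
}


def remaining_calls(plan: str, calls_used: int) -> Optional[int]:
    """Calls left this month, or None when the plan has no limit."""
    limit = PLANS[plan][2]
    if limit is None:
        return None
    return max(limit - calls_used, 0)


def yearly_savings(plan: str) -> int:
    """USD saved by paying yearly instead of twelve monthly payments."""
    monthly, yearly, _ = PLANS[plan]
    return monthly * 12 - yearly
```

For every paid plan the yearly price equals ten monthly payments, so yearly billing saves the cost of two months.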
## 🔍 Monitoring & Alerts
### Real-time Monitoring
- Usage tracking for all API calls
- Cost calculations in real-time
- Performance metrics
- Error rate monitoring
### Alert System
- Usage approaching limits (80% threshold)
- Cost overruns
- System health issues
- Provider-specific problems
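
The 80% usage threshold could be evaluated roughly as follows. This is a sketch under stated assumptions: the function name `usage_alert`, the alert labels, and the choice to trigger at exactly 80% (rather than strictly above it) are not taken from the ALwrity code.

```python
from typing import Optional

WARNING_PERCENT = 80  # the "80% threshold" mentioned above


def usage_alert(calls_used: int, call_limit: Optional[int]) -> Optional[str]:
    """Return an alert label for the current usage, or None if no alert applies."""
    if call_limit is None or call_limit <= 0:
        # Unlimited (or unset) limits never raise a usage alert in this sketch.
        return None
    if calls_used >= call_limit:
        return "limit_reached"
    # Integer comparison avoids floating-point edge cases at the boundary.
    if calls_used * 100 >= call_limit * WARNING_PERCENT:
        return "approaching_limit"
    return None
```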
## 🛠️ Development Mode
For development with auto-reload:
```bash
# Development mode with auto-reload
python start_alwrity_backend.py --dev
# Or with explicit reload flag
python start_alwrity_backend.py --reload
```
## 📝 Configuration
### Environment Variables
The system uses the following environment variables:
```bash
# Database
DATABASE_URL=sqlite:///./alwrity.db
# API Keys (configured through onboarding)
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
MISTRAL_API_KEY=your_key_here
# Server Configuration
HOST=0.0.0.0
PORT=8000
DEBUG=true
```
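
For illustration, these variables could be read in Python as shown below. The `Settings` dataclass and `load_settings` helper are assumptions made for this sketch. The backend may load its configuration differently. The defaults simply mirror the example values above, except `DEBUG`, which is assumed to default to off.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    host: str
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read server settings from an environment-like mapping."""
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./alwrity.db"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Treat common truthy spellings as enabled; anything else is off.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests.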
### Custom Pricing
To modify pricing, update the `PricingService.initialize_default_pricing()` method in `services/pricing_service.py`.
## 🧪 Testing
### Run Verification Script
```bash
python verify_billing_setup.py
```
### Test Individual Components
```bash
# Test subscription system
python test_subscription_system.py
# Test billing tables creation
python scripts/create_billing_tables.py
```
## 🚨 Troubleshooting
### Common Issues
1. **Tables not created**: Run `python scripts/create_billing_tables.py`
2. **Missing dependencies**: Run `pip install -r requirements.txt`
3. **Database errors**: Check that `DATABASE_URL` is set correctly in your environment
4. **API key issues**: Verify API keys are configured
### Debug Mode
Enable debug logging by setting `DEBUG=true` in your environment.
## 📚 API Documentation
Once the server is running, access the interactive API documentation:
- **Swagger UI**: http://localhost:8000/api/docs
- **ReDoc**: http://localhost:8000/api/redoc
## 🔄 Updates and Maintenance
### Adding New Providers
1. Add provider to `APIProvider` enum in `models/subscription_models.py`
2. Update pricing in `PricingService.initialize_default_pricing()`
3. Add provider detection in middleware
4. Update frontend provider chips
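
Steps 1 and 3 might look roughly like the sketch below. Everything here is hypothetical: the enum members and values, the `NEWPROVIDER` entry, and the substring-based `detect_provider` helper are stand-ins, and the real `APIProvider` enum and middleware detection logic may differ.

```python
from enum import Enum
from typing import Optional


class APIProvider(str, Enum):
    """Illustrative stand-in for the enum in models/subscription_models.py."""
    OPENAI = "openai"
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"
    MISTRAL = "mistral"
    NEWPROVIDER = "newprovider"  # step 1: register the provider being added


def detect_provider(text: str) -> Optional[APIProvider]:
    """Step 3 sketch: find a provider name inside a request path or model ID."""
    lowered = text.lower()
    for provider in APIProvider:
        if provider.value in lowered:
            return provider
    return None
```

Substring matching is the simplest possible rule. A production middleware would likely need something stricter to avoid false matches.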
### Modifying Plans
1. Update `PricingService.initialize_default_plans()`
2. Modify plan limits and pricing
3. Test with verification script
## 📞 Support
For issues or questions:
1. Check the verification script output
2. Review the startup logs
3. Test individual components
4. Check database table creation
## 🎉 Success Indicators
You'll know the billing system is working when:
- ✅ Startup script shows "Billing and subscription tables created successfully"
- ✅ Verification script passes all checks
- ✅ API endpoints return data
- ✅ Frontend dashboard shows usage metrics
- ✅ Usage tracking middleware is active
The billing system is now fully integrated and ready for production use!

backend/test/check_db.py Normal file

@@ -0,0 +1,159 @@
#!/usr/bin/env python3
"""
Database check and sample data creation script
"""
from services.database import get_db_session
from models.content_planning import ContentStrategy, ContentGapAnalysis, AIAnalysisResult
from sqlalchemy.orm import Session
import json


def check_database():
    """Check what data exists in the database"""
    db = get_db_session()
    try:
        # Check strategies
        strategies = db.query(ContentStrategy).all()
        print(f"Found {len(strategies)} strategies")
        for strategy in strategies:
            print(f" Strategy {strategy.id}: {strategy.name} - {strategy.industry}")

        # Check gap analyses
        gap_analyses = db.query(ContentGapAnalysis).all()
        print(f"Found {len(gap_analyses)} gap analyses")

        # Check AI analytics
        ai_analytics = db.query(AIAnalysisResult).all()
        print(f"Found {len(ai_analytics)} AI analytics")
    except Exception as e:
        print(f"Error checking database: {e}")
    finally:
        db.close()


def create_sample_data():
    """Create sample data for Strategic Intelligence and Keyword Research tabs"""
    db = get_db_session()
    try:
        # Create a sample strategy if none exists
        existing_strategies = db.query(ContentStrategy).all()
        if not existing_strategies:
            sample_strategy = ContentStrategy(
                name="Sample Content Strategy",
                industry="Digital Marketing",
                target_audience={"demographics": "Small to medium businesses", "interests": ["marketing", "technology"]},
                content_pillars=["Educational Content", "Thought Leadership", "Case Studies"],
                ai_recommendations={
                    "market_positioning": {
                        "score": 75,
                        "strengths": ["Strong brand voice", "Consistent content quality"],
                        "weaknesses": ["Limited video content", "Slow content production"]
                    },
                    "competitive_advantages": [
                        {"advantage": "AI-powered content creation", "impact": "High", "implementation": "In Progress"},
                        {"advantage": "Data-driven strategy", "impact": "Medium", "implementation": "Complete"}
                    ],
                    "strategic_risks": [
                        {"risk": "Content saturation in market", "probability": "Medium", "impact": "High"},
                        {"risk": "Algorithm changes affecting reach", "probability": "High", "impact": "Medium"}
                    ]
                },
                user_id=1
            )
            db.add(sample_strategy)
            db.commit()
            print("Created sample strategy")

        # Create sample gap analysis
        existing_gaps = db.query(ContentGapAnalysis).all()
        if not existing_gaps:
            sample_gap = ContentGapAnalysis(
                website_url="https://example.com",
                competitor_urls=["competitor1.com", "competitor2.com"],
                target_keywords=["content marketing", "digital strategy", "SEO"],
                analysis_results={
                    "gaps": ["Video content gap", "Local SEO opportunities"],
                    "opportunities": [
                        {"keyword": "AI content tools", "search_volume": "5K-10K", "competition": "Low", "cpc": "$2.50"},
                        {"keyword": "content marketing ROI", "search_volume": "1K-5K", "competition": "Medium", "cpc": "$4.20"}
                    ]
                },
                recommendations=[
                    {
                        "type": "content",
                        "title": "Create video tutorials",
                        "description": "Address the video content gap",
                        "priority": "high"
                    },
                    {
                        "type": "seo",
                        "title": "Optimize for local search",
                        "description": "Target local keywords",
                        "priority": "medium"
                    }
                ],
                user_id=1
            )
            db.add(sample_gap)
            db.commit()
            print("Created sample gap analysis")

        # Create sample AI analytics
        existing_ai = db.query(AIAnalysisResult).all()
        if not existing_ai:
            sample_ai = AIAnalysisResult(
                analysis_type="strategic_intelligence",
                insights=[
                    "Focus on video content to address market gap",
                    "Leverage AI tools for competitive advantage",
                    "Monitor algorithm changes closely"
                ],
                recommendations=[
                    {
                        "type": "content",
                        "title": "Increase video content production",
                        "description": "Address the video content gap identified in analysis",
                        "priority": "high"
                    },
                    {
                        "type": "strategy",
                        "title": "Implement AI-powered content creation",
                        "description": "Leverage AI tools for competitive advantage",
                        "priority": "medium"
                    }
                ],
                performance_metrics={
                    "content_engagement": 78.5,
                    "traffic_growth": 25.3,
                    "conversion_rate": 2.1
                },
                personalized_data_used={
                    "onboarding_data": True,
                    "user_preferences": True,
                    "historical_performance": True
                },
                processing_time=15.2,
                ai_service_status="operational",
                user_id=1
            )
            db.add(sample_ai)
            db.commit()
            print("Created sample AI analytics")
    except Exception as e:
        print(f"Error creating sample data: {e}")
        db.rollback()
    finally:
        db.close()


if __name__ == "__main__":
    print("Checking database...")
    check_database()
    print("\nCreating sample data...")
    create_sample_data()
    print("\nFinal database state:")
    check_database()


@@ -0,0 +1,104 @@
{
"test_summary": {
"total_duration": 52.56023073196411,
"total_tests": 4,
"successful_tests": 4,
"failed_tests": 0,
"total_api_calls": 4
},
"test_results": [
{
"test_name": "Single Phrase Test (Should be preserved as-is)",
"keyword_phrase": "ALwrity content generation",
"success": true,
"duration": 8.364419937133789,
"api_calls": 1,
"error": null,
"content_length": 44,
"sources_count": 0,
"citations_count": 0,
"grounding_status": {
"status": "success",
"sources_used": 0,
"citation_coverage": 0,
"quality_score": 0.0
},
"generation_metadata": {
"model_used": "gemini-2.0-flash-001",
"generation_time": 0.002626,
"research_time": 0.000537,
"grounding_enabled": true
}
},
{
"test_name": "Comma-Separated Test (Should be split by commas)",
"keyword_phrase": "AI tools, content creation, marketing automation",
"success": true,
"duration": 12.616755723953247,
"api_calls": 1,
"error": null,
"content_length": 44,
"sources_count": 5,
"citations_count": 3,
"grounding_status": {
"status": "success",
"sources_used": 5,
"citation_coverage": 0.6,
"quality_score": 0.359
},
"generation_metadata": {
"model_used": "gemini-2.0-flash-001",
"generation_time": 0.009273,
"research_time": 0.000285,
"grounding_enabled": true
}
},
{
"test_name": "Another Single Phrase Test",
"keyword_phrase": "LinkedIn content strategy",
"success": true,
"duration": 11.366000652313232,
"api_calls": 1,
"error": null,
"content_length": 44,
"sources_count": 4,
"citations_count": 3,
"grounding_status": {
"status": "success",
"sources_used": 4,
"citation_coverage": 0.75,
"quality_score": 0.359
},
"generation_metadata": {
"model_used": "gemini-2.0-flash-001",
"generation_time": 0.008166,
"research_time": 0.000473,
"grounding_enabled": true
}
},
{
"test_name": "Another Comma-Separated Test",
"keyword_phrase": "social media, digital marketing, brand awareness",
"success": true,
"duration": 12.107932806015015,
"api_calls": 1,
"error": null,
"content_length": 44,
"sources_count": 0,
"citations_count": 0,
"grounding_status": {
"status": "success",
"sources_used": 0,
"citation_coverage": 0,
"quality_score": 0.0
},
"generation_metadata": {
"model_used": "gemini-2.0-flash-001",
"generation_time": 0.004575,
"research_time": 0.000323,
"grounding_enabled": true
}
}
],
"timestamp": "2025-09-14T22:39:30.220518"
}


@@ -0,0 +1,495 @@
"""
Unit tests for GroundingContextEngine.
Tests the enhanced grounding metadata utilization functionality.
"""
import pytest
from typing import List
from models.blog_models import (
    GroundingMetadata,
    GroundingChunk,
    GroundingSupport,
    Citation,
    BlogOutlineSection,
    BlogResearchResponse,
    ResearchSource,
)
from services.blog_writer.outline.grounding_engine import GroundingContextEngine


class TestGroundingContextEngine:
    """Test cases for GroundingContextEngine."""

    def setup_method(self):
        """Set up test fixtures."""
        self.engine = GroundingContextEngine()

        # Create sample grounding chunks
        self.sample_chunks = [
            GroundingChunk(
                title="AI Research Study 2025: Machine Learning Breakthroughs",
                url="https://research.university.edu/ai-study-2025",
                confidence_score=0.95
            ),
            GroundingChunk(
                title="Enterprise AI Implementation Guide",
                url="https://techcorp.com/enterprise-ai-guide",
                confidence_score=0.88
            ),
            GroundingChunk(
                title="Machine Learning Algorithms Explained",
                url="https://blog.datascience.com/ml-algorithms",
                confidence_score=0.82
            ),
            GroundingChunk(
                title="AI Ethics and Responsible Development",
                url="https://ethics.org/ai-responsible-development",
                confidence_score=0.90
            ),
            GroundingChunk(
                title="Personal Opinion on AI Trends",
                url="https://personal-blog.com/ai-opinion",
                confidence_score=0.65
            )
        ]

        # Create sample grounding supports
        self.sample_supports = [
            GroundingSupport(
                confidence_scores=[0.92, 0.89],
                grounding_chunk_indices=[0, 1],
                segment_text="Recent research shows that artificial intelligence is transforming enterprise operations with significant improvements in efficiency and decision-making capabilities.",
                start_index=0,
                end_index=150
            ),
            GroundingSupport(
                confidence_scores=[0.85, 0.78],
                grounding_chunk_indices=[2, 3],
                segment_text="Machine learning algorithms are becoming more sophisticated, enabling better pattern recognition and predictive analytics in business applications.",
                start_index=151,
                end_index=300
            ),
            GroundingSupport(
                confidence_scores=[0.45, 0.52],
                grounding_chunk_indices=[4],
                segment_text="Some people think AI is overhyped and won't deliver on its promises.",
                start_index=301,
                end_index=400
            )
        ]

        # Create sample citations
        self.sample_citations = [
            Citation(
                citation_type="expert_opinion",
                start_index=0,
                end_index=50,
                text="AI research shows significant improvements in enterprise operations",
                source_indices=[0],
                reference="Source 1"
            ),
            Citation(
                citation_type="statistical_data",
                start_index=51,
                end_index=100,
                text="85% of enterprises report improved efficiency with AI implementation",
                source_indices=[1],
                reference="Source 2"
            ),
            Citation(
                citation_type="research_study",
                start_index=101,
                end_index=150,
                text="University study demonstrates 40% increase in decision-making accuracy",
                source_indices=[0],
                reference="Source 1"
            )
        ]

        # Create sample grounding metadata
        self.sample_grounding_metadata = GroundingMetadata(
            grounding_chunks=self.sample_chunks,
            grounding_supports=self.sample_supports,
            citations=self.sample_citations,
            search_entry_point="AI trends and enterprise implementation",
            web_search_queries=[
                "AI trends 2025 enterprise",
                "machine learning business applications",
                "AI implementation best practices"
            ]
        )

        # Create sample outline section
        self.sample_section = BlogOutlineSection(
            id="s1",
            heading="AI Implementation in Enterprise",
            subheadings=["Benefits of AI", "Implementation Challenges", "Best Practices"],
            key_points=["Improved efficiency", "Cost reduction", "Better decision making"],
            references=[],
            target_words=400,
            keywords=["AI", "enterprise", "implementation", "machine learning"]
        )

    def test_extract_contextual_insights(self):
        """Test extraction of contextual insights from grounding metadata."""
        insights = self.engine.extract_contextual_insights(self.sample_grounding_metadata)

        # Should have all insight categories
        expected_categories = [
            'confidence_analysis', 'authority_analysis', 'temporal_analysis',
            'content_relationships', 'citation_insights', 'search_intent_insights',
            'quality_indicators'
        ]
        for category in expected_categories:
            assert category in insights

        # Test confidence analysis
        confidence_analysis = insights['confidence_analysis']
        assert 'average_confidence' in confidence_analysis
        assert 'high_confidence_count' in confidence_analysis
        assert confidence_analysis['average_confidence'] > 0.0

        # Test authority analysis
        authority_analysis = insights['authority_analysis']
        assert 'average_authority' in authority_analysis
        assert 'high_authority_sources' in authority_analysis
        assert 'authority_distribution' in authority_analysis

    def test_extract_contextual_insights_empty_metadata(self):
        """Test extraction with empty grounding metadata."""
        insights = self.engine.extract_contextual_insights(None)

        # Should return empty insights structure
        assert insights['confidence_analysis']['average_confidence'] == 0.0
        assert insights['authority_analysis']['high_authority_sources'] == 0
        assert insights['temporal_analysis']['recent_content'] == 0

    def test_analyze_confidence_patterns(self):
        """Test confidence pattern analysis."""
        confidence_analysis = self.engine._analyze_confidence_patterns(self.sample_grounding_metadata)
        assert 'average_confidence' in confidence_analysis
        assert 'high_confidence_count' in confidence_analysis
        assert 'confidence_distribution' in confidence_analysis

        # Should have reasonable confidence values
        assert 0.0 <= confidence_analysis['average_confidence'] <= 1.0
        assert confidence_analysis['high_confidence_count'] >= 0

    def test_analyze_source_authority(self):
        """Test source authority analysis."""
        authority_analysis = self.engine._analyze_source_authority(self.sample_grounding_metadata)
        assert 'average_authority' in authority_analysis
        assert 'high_authority_sources' in authority_analysis
        assert 'authority_distribution' in authority_analysis

        # Should have reasonable authority values
        assert 0.0 <= authority_analysis['average_authority'] <= 1.0
        assert authority_analysis['high_authority_sources'] >= 0

    def test_analyze_temporal_relevance(self):
        """Test temporal relevance analysis."""
        temporal_analysis = self.engine._analyze_temporal_relevance(self.sample_grounding_metadata)
        assert 'recent_content' in temporal_analysis
        assert 'trending_topics' in temporal_analysis
        assert 'evergreen_content' in temporal_analysis
        assert 'temporal_balance' in temporal_analysis

        # Should have reasonable temporal values
        assert temporal_analysis['recent_content'] >= 0
        assert temporal_analysis['evergreen_content'] >= 0
        assert temporal_analysis['temporal_balance'] in ['recent_heavy', 'evergreen_heavy', 'balanced', 'unknown']

    def test_analyze_content_relationships(self):
        """Test content relationship analysis."""
        relationships = self.engine._analyze_content_relationships(self.sample_grounding_metadata)
        assert 'related_concepts' in relationships
        assert 'content_gaps' in relationships
        assert 'concept_coverage' in relationships
        assert 'gap_count' in relationships

        # Should have reasonable relationship values
        assert isinstance(relationships['related_concepts'], list)
        assert isinstance(relationships['content_gaps'], list)
        assert relationships['concept_coverage'] >= 0
        assert relationships['gap_count'] >= 0

    def test_analyze_citation_patterns(self):
        """Test citation pattern analysis."""
        citation_analysis = self.engine._analyze_citation_patterns(self.sample_grounding_metadata)
        assert 'citation_types' in citation_analysis
        assert 'total_citations' in citation_analysis
        assert 'citation_density' in citation_analysis
        assert 'citation_quality' in citation_analysis

        # Should have reasonable citation values
        assert citation_analysis['total_citations'] == len(self.sample_citations)
        assert citation_analysis['citation_density'] >= 0.0
        assert 0.0 <= citation_analysis['citation_quality'] <= 1.0

    def test_analyze_search_intent(self):
        """Test search intent analysis."""
        intent_analysis = self.engine._analyze_search_intent(self.sample_grounding_metadata)
        assert 'intent_signals' in intent_analysis
        assert 'user_questions' in intent_analysis
        assert 'primary_intent' in intent_analysis

        # Should have reasonable intent values
        assert isinstance(intent_analysis['intent_signals'], list)
        assert isinstance(intent_analysis['user_questions'], list)
        assert intent_analysis['primary_intent'] in ['informational', 'comparison', 'transactional']

    def test_assess_quality_indicators(self):
        """Test quality indicator assessment."""
        quality_indicators = self.engine._assess_quality_indicators(self.sample_grounding_metadata)
        assert 'overall_quality' in quality_indicators
        assert 'quality_factors' in quality_indicators
        assert 'quality_grade' in quality_indicators

        # Should have reasonable quality values
        assert 0.0 <= quality_indicators['overall_quality'] <= 1.0
        assert isinstance(quality_indicators['quality_factors'], list)
        assert quality_indicators['quality_grade'] in ['A', 'B', 'C', 'D', 'F']

    def test_calculate_chunk_authority(self):
        """Test chunk authority calculation."""
        # Test high authority chunk
        high_authority_chunk = self.sample_chunks[0]  # Research study
        authority_score = self.engine._calculate_chunk_authority(high_authority_chunk)
        assert 0.0 <= authority_score <= 1.0
        assert authority_score > 0.5  # Should be high authority

        # Test low authority chunk
        low_authority_chunk = self.sample_chunks[4]  # Personal opinion
        authority_score = self.engine._calculate_chunk_authority(low_authority_chunk)
        assert 0.0 <= authority_score <= 1.0
        assert authority_score < 0.7  # Should be lower authority

    def test_get_authority_sources(self):
        """Test getting high-authority sources."""
        authority_sources = self.engine.get_authority_sources(self.sample_grounding_metadata)

        # Should return list of tuples
        assert isinstance(authority_sources, list)

        # Each item should be (chunk, score) tuple
        for chunk, score in authority_sources:
            assert isinstance(chunk, GroundingChunk)
            assert isinstance(score, float)
            assert 0.0 <= score <= 1.0

        # Should be sorted by authority score (descending)
        if len(authority_sources) > 1:
            for i in range(len(authority_sources) - 1):
                assert authority_sources[i][1] >= authority_sources[i + 1][1]

    def test_get_high_confidence_insights(self):
        """Test getting high-confidence insights."""
        insights = self.engine.get_high_confidence_insights(self.sample_grounding_metadata)

        # Should return list of insights
        assert isinstance(insights, list)

        # Each insight should be a string
        for insight in insights:
            assert isinstance(insight, str)
            assert len(insight) > 0

    def test_enhance_sections_with_grounding(self):
        """Test section enhancement with grounding insights."""
        sections = [self.sample_section]
        insights = self.engine.extract_contextual_insights(self.sample_grounding_metadata)
        enhanced_sections = self.engine.enhance_sections_with_grounding(
            sections, self.sample_grounding_metadata, insights
        )

        # Should return same number of sections
        assert len(enhanced_sections) == len(sections)

        # Enhanced section should have same basic structure
        enhanced_section = enhanced_sections[0]
        assert enhanced_section.id == self.sample_section.id
        assert enhanced_section.heading == self.sample_section.heading

        # Should have enhanced content
        assert len(enhanced_section.subheadings) >= len(self.sample_section.subheadings)
        assert len(enhanced_section.key_points) >= len(self.sample_section.key_points)
        assert len(enhanced_section.keywords) >= len(self.sample_section.keywords)

    def test_enhance_sections_with_empty_grounding(self):
        """Test section enhancement with empty grounding metadata."""
        sections = [self.sample_section]
        enhanced_sections = self.engine.enhance_sections_with_grounding(
            sections, None, {}
        )

        # Should return original sections unchanged
        assert len(enhanced_sections) == len(sections)
        assert enhanced_sections[0].subheadings == self.sample_section.subheadings
        assert enhanced_sections[0].key_points == self.sample_section.key_points
        assert enhanced_sections[0].keywords == self.sample_section.keywords

    def test_find_relevant_chunks(self):
        """Test finding relevant chunks for a section."""
        relevant_chunks = self.engine._find_relevant_chunks(
            self.sample_section, self.sample_grounding_metadata
        )

        # Should return list of relevant chunks
        assert isinstance(relevant_chunks, list)

        # Each chunk should be a GroundingChunk
        for chunk in relevant_chunks:
            assert isinstance(chunk, GroundingChunk)

    def test_find_relevant_supports(self):
        """Test finding relevant supports for a section."""
        relevant_supports = self.engine._find_relevant_supports(
            self.sample_section, self.sample_grounding_metadata
        )

        # Should return list of relevant supports
        assert isinstance(relevant_supports, list)

        # Each support should be a GroundingSupport
        for support in relevant_supports:
            assert isinstance(support, GroundingSupport)

    def test_extract_insight_from_segment(self):
        """Test insight extraction from segment text."""
        # Test with valid segment
        segment = "This is a comprehensive analysis of AI trends in enterprise applications."
        insight = self.engine._extract_insight_from_segment(segment)
        assert insight == segment

        # Test with short segment
        short_segment = "Short"
        insight = self.engine._extract_insight_from_segment(short_segment)
        assert insight is None

        # Test with long segment
        long_segment = "This is a very long segment that exceeds the maximum length limit and should be truncated appropriately to ensure it fits within the expected constraints and provides comprehensive coverage of the topic while maintaining readability and clarity for the intended audience."
        insight = self.engine._extract_insight_from_segment(long_segment)
        assert insight is not None
        assert len(insight) <= 203  # 200 + "..."
        assert insight.endswith("...")

    def test_get_confidence_distribution(self):
        """Test confidence distribution calculation."""
        confidences = [0.95, 0.88, 0.82, 0.90, 0.65]
        distribution = self.engine._get_confidence_distribution(confidences)
        assert 'high' in distribution
        assert 'medium' in distribution
        assert 'low' in distribution

        # Should have reasonable distribution
        total = distribution['high'] + distribution['medium'] + distribution['low']
        assert total == len(confidences)

    def test_calculate_temporal_balance(self):
        """Test temporal balance calculation."""
        # Test recent heavy
        balance = self.engine._calculate_temporal_balance(8, 2)
        assert balance == 'recent_heavy'

        # Test evergreen heavy
        balance = self.engine._calculate_temporal_balance(2, 8)
        assert balance == 'evergreen_heavy'

        # Test balanced
        balance = self.engine._calculate_temporal_balance(5, 5)
        assert balance == 'balanced'

        # Test empty
        balance = self.engine._calculate_temporal_balance(0, 0)
        assert balance == 'unknown'

    def test_extract_related_concepts(self):
        """Test related concept extraction."""
        text_list = [
            "Artificial Intelligence is transforming Machine Learning applications",
            "Deep Learning algorithms are improving Neural Network performance",
            "Natural Language Processing is advancing AI capabilities"
        ]
        concepts = self.engine._extract_related_concepts(text_list)

        # Should extract capitalized concepts
        assert isinstance(concepts, list)
        assert len(concepts) > 0

        # Should contain expected concepts
        expected_concepts = ['Artificial', 'Intelligence', 'Machine', 'Learning', 'Deep', 'Neural', 'Network']
        for concept in expected_concepts:
            assert concept in concepts

    def test_identify_content_gaps(self):
        """Test content gap identification."""
        text_list = [
            "The research shows significant improvements in AI applications",
            "However, there is a lack of comprehensive studies on AI ethics",
            "The gap in understanding AI bias remains unexplored",
            "Current research does not cover all aspects of AI implementation"
        ]
        gaps = self.engine._identify_content_gaps(text_list)

        # Should identify gaps
        assert isinstance(gaps, list)
        assert len(gaps) > 0

    def test_assess_citation_quality(self):
        """Test citation quality assessment."""
        quality = self.engine._assess_citation_quality(self.sample_citations)

        # Should have reasonable quality score
        assert 0.0 <= quality <= 1.0
        assert quality > 0.0  # Should have some quality

    def test_determine_primary_intent(self):
        """Test primary intent determination."""
        # Test informational intent
        intent = self.engine._determine_primary_intent(['informational', 'informational', 'comparison'])
        assert intent == 'informational'

        # Test empty signals
        intent = self.engine._determine_primary_intent([])
        assert intent == 'informational'

    def test_get_quality_grade(self):
        """Test quality grade calculation."""
        # Test A grade
        grade = self.engine._get_quality_grade(0.95)
        assert grade == 'A'

        # Test B grade
        grade = self.engine._get_quality_grade(0.85)
        assert grade == 'B'

        # Test C grade
        grade = self.engine._get_quality_grade(0.75)
        assert grade == 'C'

        # Test D grade
        grade = self.engine._get_quality_grade(0.65)
        assert grade == 'D'

        # Test F grade
        grade = self.engine._get_quality_grade(0.45)
        assert grade == 'F'


if __name__ == '__main__':
    pytest.main([__file__])


@@ -0,0 +1,271 @@
#!/usr/bin/env python3
"""
Test Script for LinkedIn Content Generation Keyword Fix
This script tests the fixed keyword processing by calling the LinkedIn content generation
endpoint directly and capturing detailed logs to analyze API usage patterns.
"""
import asyncio
import json
import time
import logging
from datetime import datetime
from typing import Dict, Any
import sys
import os
# Add the backend directory to the Python path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
# Configure detailed logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(f'test_linkedin_keyword_fix_{datetime.now().strftime("%Y%m%d_%H%M%S")}.log'),
logging.StreamHandler(sys.stdout)
]
)
logger = logging.getLogger(__name__)
# Import the LinkedIn service
from services.linkedin_service import LinkedInService
from models.linkedin_models import LinkedInPostRequest, LinkedInPostType, LinkedInTone, GroundingLevel, SearchEngine


class LinkedInKeywordTest:
    """Test class for LinkedIn keyword processing fix."""

    def __init__(self):
        self.linkedin_service = LinkedInService()
        self.test_results = []
        self.api_call_count = 0
        self.start_time = None

    def log_api_call(self, endpoint: str, duration: float, success: bool):
        """Log API call details."""
        self.api_call_count += 1
        logger.info(f"API Call #{self.api_call_count}: {endpoint} - Duration: {duration:.2f}s - Success: {success}")

    async def test_keyword_phrase(self, phrase: str, test_name: str) -> Dict[str, Any]:
        """Test a specific keyword phrase."""
        logger.info(f"\n{'='*60}")
        logger.info(f"TESTING: {test_name}")
        logger.info(f"KEYWORD PHRASE: '{phrase}'")
        logger.info(f"{'='*60}")
        test_start = time.time()
        try:
            # Create the request
            request = LinkedInPostRequest(
                topic=phrase,
                industry="Technology",
                post_type=LinkedInPostType.PROFESSIONAL,
                tone=LinkedInTone.PROFESSIONAL,
                grounding_level=GroundingLevel.ENHANCED,
                search_engine=SearchEngine.GOOGLE,
                research_enabled=True,
                include_citations=True,
                max_length=1000
            )
            logger.info(f"Request created: {request.topic}")
            logger.info(f"Research enabled: {request.research_enabled}")
            logger.info(f"Search engine: {request.search_engine}")
            logger.info(f"Grounding level: {request.grounding_level}")

            # Call the LinkedIn service
            logger.info("Calling LinkedIn service...")
            response = await self.linkedin_service.generate_linkedin_post(request)
            test_duration = time.time() - test_start
            self.log_api_call("LinkedIn Post Generation", test_duration, response.success)

            # Analyze the response
            result = {
                "test_name": test_name,
                "keyword_phrase": phrase,
                "success": response.success,
                "duration": test_duration,
                "api_calls": self.api_call_count,
                "error": response.error if not response.success else None,
                "content_length": len(response.data.content) if response.success and response.data else 0,
                "sources_count": len(response.research_sources) if response.success and response.research_sources else 0,
                "citations_count": len(response.data.citations) if response.success and response.data and response.data.citations else 0,
                "grounding_status": response.grounding_status if response.success else None,
                "generation_metadata": response.generation_metadata if response.success else None
            }

            if response.success:
                logger.info(f"✅ SUCCESS: Generated {result['content_length']} characters")
                logger.info(f"📊 Sources: {result['sources_count']}, Citations: {result['citations_count']}")
                logger.info(f"⏱️ Total duration: {test_duration:.2f}s")
                logger.info(f"🔢 API calls made: {self.api_call_count}")

                # Log content preview
                if response.data and response.data.content:
                    content_preview = response.data.content[:200] + "..." if len(response.data.content) > 200 else response.data.content
                    logger.info(f"📝 Content preview: {content_preview}")

                # Log grounding status
                if response.grounding_status:
                    logger.info(f"🔍 Grounding status: {response.grounding_status}")
            else:
                logger.error(f"❌ FAILED: {response.error}")

            return result
        except Exception as e:
            test_duration = time.time() - test_start
            logger.error(f"❌ EXCEPTION in {test_name}: {str(e)}")
            self.log_api_call("LinkedIn Post Generation", test_duration, False)
            return {
                "test_name": test_name,
                "keyword_phrase": phrase,
                "success": False,
                "duration": test_duration,
                "api_calls": self.api_call_count,
                "error": str(e),
                "content_length": 0,
                "sources_count": 0,
                "citations_count": 0,
                "grounding_status": None,
                "generation_metadata": None
            }

    async def run_comprehensive_test(self):
        """Run comprehensive tests for keyword processing."""
        logger.info("🚀 Starting LinkedIn Keyword Processing Test Suite")
        logger.info(f"Test started at: {datetime.now()}")
        self.start_time = time.time()

        # Test cases
        test_cases = [
            {
                "phrase": "ALwrity content generation",
                "name": "Single Phrase Test (Should be preserved as-is)"
            },
            {
                "phrase": "AI tools, content creation, marketing automation",
                "name": "Comma-Separated Test (Should be split by commas)"
            },
            {
                "phrase": "LinkedIn content strategy",
                "name": "Another Single Phrase Test"
            },
            {
                "phrase": "social media, digital marketing, brand awareness",
                "name": "Another Comma-Separated Test"
            }
        ]

        # Run all tests
        for test_case in test_cases:
            result = await self.test_keyword_phrase(
                test_case["phrase"],
                test_case["name"]
            )
            self.test_results.append(result)
            # Reset API call counter for next test
            self.api_call_count = 0
            # Small delay between tests
            await asyncio.sleep(2)

        # Generate summary report
        self.generate_summary_report()

    def generate_summary_report(self):
        """Generate a comprehensive summary report."""
        total_time = time.time() - self.start_time
        logger.info(f"\n{'='*80}")
        logger.info("📊 COMPREHENSIVE TEST SUMMARY REPORT")
        logger.info(f"{'='*80}")
        logger.info(f"🕐 Total test duration: {total_time:.2f} seconds")
        logger.info(f"🧪 Total tests run: {len(self.test_results)}")

        successful_tests = [r for r in self.test_results if r["success"]]
        failed_tests = [r for r in self.test_results if not r["success"]]
        logger.info(f"✅ Successful tests: {len(successful_tests)}")
        logger.info(f"❌ Failed tests: {len(failed_tests)}")

        if successful_tests:
            avg_duration = sum(r["duration"] for r in successful_tests) / len(successful_tests)
            avg_content_length = sum(r["content_length"] for r in successful_tests) / len(successful_tests)
            avg_sources = sum(r["sources_count"] for r in successful_tests) / len(successful_tests)
            avg_citations = sum(r["citations_count"] for r in successful_tests) / len(successful_tests)
            logger.info(f"📈 Average generation time: {avg_duration:.2f}s")
            logger.info(f"📝 Average content length: {avg_content_length:.0f} characters")
            logger.info(f"🔍 Average sources found: {avg_sources:.1f}")
            logger.info(f"📚 Average citations: {avg_citations:.1f}")

        # Detailed results (plain strings: no placeholders, so no f-prefix needed)
        logger.info("\n📋 DETAILED TEST RESULTS:")
        for i, result in enumerate(self.test_results, 1):
            status = "✅ PASS" if result["success"] else "❌ FAIL"
            logger.info(f"{i}. {status} - {result['test_name']}")
            logger.info(f" Phrase: '{result['keyword_phrase']}'")
            logger.info(f" Duration: {result['duration']:.2f}s")
            if result["success"]:
                logger.info(f" Content: {result['content_length']} chars, Sources: {result['sources_count']}, Citations: {result['citations_count']}")
            else:
                logger.info(f" Error: {result['error']}")

        # API Usage Analysis
        logger.info("\n🔍 API USAGE ANALYSIS:")
        total_api_calls = sum(r["api_calls"] for r in self.test_results)
        logger.info(f"Total API calls across all tests: {total_api_calls}")
        if successful_tests:
            avg_api_calls = sum(r["api_calls"] for r in successful_tests) / len(successful_tests)
            logger.info(f"Average API calls per successful test: {avg_api_calls:.1f}")

        # Save detailed results to JSON file
        report_data = {
            "test_summary": {
                "total_duration": total_time,
                "total_tests": len(self.test_results),
                "successful_tests": len(successful_tests),
                "failed_tests": len(failed_tests),
                "total_api_calls": total_api_calls
            },
            "test_results": self.test_results,
            "timestamp": datetime.now().isoformat()
        }
        report_filename = f"linkedin_keyword_test_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        with open(report_filename, 'w') as f:
            json.dump(report_data, f, indent=2, default=str)
        logger.info(f"📄 Detailed report saved to: {report_filename}")
        logger.info(f"{'='*80}")


async def main():
    """Main test execution function."""
    try:
        test_suite = LinkedInKeywordTest()
        await test_suite.run_comprehensive_test()
    except Exception as e:
        logger.error(f"❌ Test suite failed: {str(e)}")
        raise


if __name__ == "__main__":
    print("🚀 Starting LinkedIn Keyword Processing Test Suite")
    print("This will test the keyword fix and analyze API usage patterns...")
    print("=" * 60)
    asyncio.run(main())
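
The test case names in the script above describe the intended keyword handling: a single phrase is preserved as-is, and a comma-separated phrase is split on commas. A minimal sketch of that rule follows, assuming whitespace around each part is trimmed and empty parts are dropped. `split_keyword_phrase` is a hypothetical name for illustration; the service's real keyword processing is not shown in this diff.

```python
from typing import List


def split_keyword_phrase(phrase: str) -> List[str]:
    """Hypothetical helper: split on commas only, never on spaces."""
    if ',' not in phrase:
        # A single phrase stays one keyword, spaces included
        stripped = phrase.strip()
        return [stripped] if stripped else []
    # Comma-separated input becomes one keyword per non-empty part
    return [part.strip() for part in phrase.split(',') if part.strip()]
```

Under these assumptions, "ALwrity content generation" stays a single keyword, while "AI tools, content creation, marketing automation" becomes three.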


@@ -0,0 +1,366 @@
"""
Unit tests for ResearchDataFilter.
Tests the filtering and cleaning functionality for research data.
"""

import pytest
from datetime import datetime, timedelta
from typing import List

from models.blog_models import (
    BlogResearchResponse,
    ResearchSource,
    GroundingMetadata,
    GroundingChunk,
    GroundingSupport,
    Citation,
)
from services.blog_writer.research.data_filter import ResearchDataFilter


class TestResearchDataFilter:
    """Test cases for ResearchDataFilter."""

    def setup_method(self):
        """Set up test fixtures."""
        self.filter = ResearchDataFilter()

        # Create sample research sources
        self.sample_sources = [
            ResearchSource(
                title="High Quality AI Article",
                url="https://example.com/ai-article",
                excerpt="This is a comprehensive article about artificial intelligence trends in 2024 with detailed analysis and expert insights.",
                credibility_score=0.95,
                published_at="2025-08-15",
                index=0,
                source_type="web"
            ),
            ResearchSource(
                title="Low Quality Source",
                url="https://example.com/low-quality",
                excerpt="This is a low quality source with very poor credibility score and outdated information from 2020.",
                credibility_score=0.3,
                published_at="2020-01-01",
                index=1,
                source_type="web"
            ),
            ResearchSource(
                title="PDF Document",
                url="https://example.com/document.pdf",
                excerpt="This is a PDF document with research data",
                credibility_score=0.8,
                published_at="2025-08-01",
                index=2,
                source_type="web"
            ),
            ResearchSource(
                title="Recent AI Study",
                url="https://example.com/ai-study",
                excerpt="A recent study on AI adoption shows significant growth in enterprise usage with detailed statistics and case studies.",
                credibility_score=0.9,
                published_at="2025-09-01",
                index=3,
                source_type="web"
            )
        ]

        # Create sample grounding metadata
        self.sample_grounding_metadata = GroundingMetadata(
            grounding_chunks=[
                GroundingChunk(
                    title="High Confidence Chunk",
                    url="https://example.com/chunk1",
                    confidence_score=0.95
                ),
                GroundingChunk(
                    title="Low Confidence Chunk",
                    url="https://example.com/chunk2",
                    confidence_score=0.5
                ),
                GroundingChunk(
                    title="Medium Confidence Chunk",
                    url="https://example.com/chunk3",
                    confidence_score=0.8
                )
            ],
            grounding_supports=[
                GroundingSupport(
                    confidence_scores=[0.9, 0.85],
                    grounding_chunk_indices=[0, 1],
                    segment_text="High confidence support text with expert insights"
                ),
                GroundingSupport(
                    confidence_scores=[0.4, 0.3],
                    grounding_chunk_indices=[2, 3],
                    segment_text="Low confidence support text"
                )
            ],
            citations=[
                Citation(
                    citation_type="expert_opinion",
                    start_index=0,
                    end_index=50,
                    text="Expert opinion on AI trends",
                    source_indices=[0],
                    reference="Source 1"
                ),
                Citation(
                    citation_type="statistical_data",
                    start_index=51,
                    end_index=100,
                    text="Statistical data showing AI adoption rates",
                    source_indices=[1],
                    reference="Source 2"
                ),
                Citation(
                    citation_type="inline",
                    start_index=101,
                    end_index=150,
                    text="Generic inline citation",
                    source_indices=[2],
                    reference="Source 3"
                )
            ]
        )

        # Create sample research response
        self.sample_research_response = BlogResearchResponse(
            success=True,
            sources=self.sample_sources,
            keyword_analysis={
                'primary': ['artificial intelligence', 'AI trends', 'machine learning'],
                'secondary': ['AI adoption', 'enterprise AI', 'AI technology'],
                'long_tail': ['AI trends 2024', 'enterprise AI adoption rates', 'AI technology benefits'],
                'semantic_keywords': ['artificial intelligence', 'machine learning', 'deep learning'],
                'trending_terms': ['AI 2024', 'generative AI', 'AI automation'],
                'content_gaps': [
                    'AI ethics in small businesses',
                    'AI implementation guide for startups',
                    'AI cost-benefit analysis for SMEs',
                    'general overview',  # Should be filtered out
                    'basics'  # Should be filtered out
                ],
                'search_intent': 'informational',
                'difficulty': 7
            },
            competitor_analysis={
                'top_competitors': ['Competitor A', 'Competitor B', 'Competitor C'],
                'opportunities': ['Market gap 1', 'Market gap 2'],
                'competitive_advantages': ['Advantage 1', 'Advantage 2'],
                'market_positioning': 'Premium positioning'
            },
            suggested_angles=[
                'AI trends in 2024',
                'Enterprise AI adoption',
                'AI implementation strategies'
            ],
            search_widget="<div>Search widget HTML</div>",
            search_queries=["AI trends 2024", "enterprise AI adoption"],
            grounding_metadata=self.sample_grounding_metadata
        )

    def test_filter_sources_quality_filtering(self):
        """Test that sources are filtered by quality criteria."""
        filtered_sources = self.filter.filter_sources(self.sample_sources)
        # Should filter out low quality source (credibility < 0.6) and PDF document
        assert len(filtered_sources) == 2  # Only high quality and recent AI study should pass
        assert all(source.credibility_score >= 0.6 for source in filtered_sources)
        # Should filter out sources with short excerpts
        assert all(len(source.excerpt) >= 50 for source in filtered_sources)

    def test_filter_sources_relevance_filtering(self):
        """Test that irrelevant sources are filtered out."""
        filtered_sources = self.filter.filter_sources(self.sample_sources)
        # Should filter out PDF document
        pdf_sources = [s for s in filtered_sources if s.url.endswith('.pdf')]
        assert len(pdf_sources) == 0

    def test_filter_sources_recency_filtering(self):
        """Test that old sources are filtered out."""
        filtered_sources = self.filter.filter_sources(self.sample_sources)
        # Should filter out old source (2020)
        old_sources = [s for s in filtered_sources if s.published_at == "2020-01-01"]
        assert len(old_sources) == 0

    def test_filter_sources_max_limit(self):
        """Test that sources are limited to max_sources."""
        # Create more sources than max_sources
        many_sources = self.sample_sources * 5  # 20 sources
        filtered_sources = self.filter.filter_sources(many_sources)
        assert len(filtered_sources) <= self.filter.max_sources

    def test_filter_grounding_metadata_confidence_filtering(self):
        """Test that grounding metadata is filtered by confidence."""
        filtered_metadata = self.filter.filter_grounding_metadata(self.sample_grounding_metadata)
        assert filtered_metadata is not None
        # Should filter out low confidence chunks
        assert len(filtered_metadata.grounding_chunks) == 2
        assert all(chunk.confidence_score >= 0.7 for chunk in filtered_metadata.grounding_chunks)
        # Should filter out low confidence supports
        assert len(filtered_metadata.grounding_supports) == 1
        assert all(max(support.confidence_scores) >= 0.7 for support in filtered_metadata.grounding_supports)
        # Should filter out irrelevant citations
        assert len(filtered_metadata.citations) == 2
        relevant_types = ['expert_opinion', 'statistical_data', 'recent_news', 'research_study']
        assert all(citation.citation_type in relevant_types for citation in filtered_metadata.citations)

    def test_clean_keyword_analysis(self):
        """Test that keyword analysis is cleaned and deduplicated."""
        keyword_analysis = {
            'primary': ['AI', 'artificial intelligence', 'AI', 'machine learning', ''],
            'secondary': ['AI adoption', 'enterprise AI', 'ai adoption'],  # Case duplicates
            'long_tail': ['AI trends 2024', 'ai trends 2024', 'AI TRENDS 2024'],  # Case duplicates
            'search_intent': 'informational',
            'difficulty': 7
        }
        cleaned_analysis = self.filter.clean_keyword_analysis(keyword_analysis)
        # Should remove duplicates and empty strings (keywords are converted to lowercase)
        assert len(cleaned_analysis['primary']) == 3
        assert 'ai' in cleaned_analysis['primary']
        assert 'artificial intelligence' in cleaned_analysis['primary']
        assert 'machine learning' in cleaned_analysis['primary']
        # Should handle case-insensitive deduplication
        assert len(cleaned_analysis['secondary']) == 2
        assert len(cleaned_analysis['long_tail']) == 1
        # Should preserve other fields
        assert cleaned_analysis['search_intent'] == 'informational'
        assert cleaned_analysis['difficulty'] == 7

    def test_filter_content_gaps(self):
        """Test that content gaps are filtered for quality and relevance."""
        content_gaps = [
            'AI ethics in small businesses',
            'AI implementation guide for startups',
            'general overview',  # Should be filtered out
            'basics',  # Should be filtered out
            'a',  # Too short, should be filtered out
            'AI cost-benefit analysis for SMEs'
        ]
        filtered_gaps = self.filter.filter_content_gaps(content_gaps, self.sample_research_response)
        # Should filter out generic and short gaps
        assert len(filtered_gaps) >= 3  # At least the good ones should pass
        assert 'AI ethics in small businesses' in filtered_gaps
        assert 'AI implementation guide for startups' in filtered_gaps
        assert 'AI cost-benefit analysis for SMEs' in filtered_gaps
        assert 'general overview' not in filtered_gaps
        assert 'basics' not in filtered_gaps

    def test_filter_research_data_integration(self):
        """Test the complete filtering pipeline."""
        filtered_research = self.filter.filter_research_data(self.sample_research_response)
        # Should maintain success status
        assert filtered_research.success is True
        # Should filter sources
        assert len(filtered_research.sources) < len(self.sample_research_response.sources)
        assert len(filtered_research.sources) >= 0  # May be 0 if all sources are filtered out
        # Should filter grounding metadata
        if filtered_research.grounding_metadata:
            assert len(filtered_research.grounding_metadata.grounding_chunks) < len(self.sample_grounding_metadata.grounding_chunks)
        # Should clean keyword analysis
        assert 'primary' in filtered_research.keyword_analysis
        assert len(filtered_research.keyword_analysis['primary']) <= self.filter.max_keywords_per_category
        # Should filter content gaps
        assert len(filtered_research.keyword_analysis['content_gaps']) < len(self.sample_research_response.keyword_analysis['content_gaps'])
        # Should preserve other fields
        assert filtered_research.suggested_angles == self.sample_research_response.suggested_angles
        assert filtered_research.search_widget == self.sample_research_response.search_widget
        assert filtered_research.search_queries == self.sample_research_response.search_queries

    def test_filter_with_empty_data(self):
        """Test filtering with empty or None data."""
        empty_research = BlogResearchResponse(
            success=True,
            sources=[],
            keyword_analysis={},
            competitor_analysis={},
            suggested_angles=[],
            search_widget="",
            search_queries=[],
            grounding_metadata=None
        )
        filtered_research = self.filter.filter_research_data(empty_research)
        assert filtered_research.success is True
        assert len(filtered_research.sources) == 0
        assert filtered_research.grounding_metadata is None
        # keyword_analysis may contain content_gaps even if empty
        assert 'content_gaps' in filtered_research.keyword_analysis

    def test_parse_date_functionality(self):
        """Test date parsing functionality."""
        # Test various date formats
        test_dates = [
            "2024-01-15",
            "2024-01-15T10:30:00",
            "2024-01-15T10:30:00Z",
            "January 15, 2024",
            "Jan 15, 2024",
            "15 January 2024",
            "01/15/2024",
            "15/01/2024"
        ]
        for date_str in test_dates:
            parsed_date = self.filter._parse_date(date_str)
            assert parsed_date is not None
            assert isinstance(parsed_date, datetime)

        # Test invalid date
        invalid_date = self.filter._parse_date("invalid date")
        assert invalid_date is None

        # Test None date
        none_date = self.filter._parse_date(None)
        assert none_date is None

    def test_clean_keyword_list_functionality(self):
        """Test keyword list cleaning functionality."""
        keywords = [
            'AI',
            'artificial intelligence',
            'AI',  # Duplicate
            'the',  # Stop word
            'machine learning',
            '',  # Empty
            ' ',  # Whitespace only
            'MACHINE LEARNING',  # Case duplicate
            'ai'  # Case duplicate
        ]
        cleaned_keywords = self.filter._clean_keyword_list(keywords)
        # Should remove duplicates, stop words, and empty strings
        assert len(cleaned_keywords) == 3
        assert 'ai' in cleaned_keywords
        assert 'artificial intelligence' in cleaned_keywords
        assert 'machine learning' in cleaned_keywords
        assert 'the' not in cleaned_keywords
        assert '' not in cleaned_keywords


if __name__ == '__main__':
    pytest.main([__file__])
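
The last two tests in this file describe two small helpers by their behavior: a date parser that accepts several formats and returns `None` on failure, and a keyword cleaner that lowercases, trims, drops stop words and removes duplicates. The sketch below is one way to satisfy those assertions. The function names, the format list and the stop-word set are assumptions for illustration, not the actual `ResearchDataFilter` code, and month-name parsing assumes an English locale.

```python
from datetime import datetime
from typing import List, Optional

# Assumed format list, tried in order; it covers the strings used in the test.
DATE_FORMATS = [
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%B %d, %Y",
    "%b %d, %Y",
    "%d %B %Y",
    "%m/%d/%Y",
    "%d/%m/%Y",
]

# Illustrative stop-word set; the real list is not shown in this diff.
STOP_WORDS = {"the", "a", "an", "and", "or", "of"}


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def clean_keyword_list(keywords: List[str]) -> List[str]:
    """Lowercase, trim, drop stop words and empties, dedupe preserving order."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        normalized = keyword.strip().lower()
        if not normalized or normalized in STOP_WORDS or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```

One consequence of trying formats in order: an ambiguous string such as "03/04/2024" is read month-first here, and day-first parsing only applies when the month-first attempt fails (as with "15/01/2024").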


@@ -0,0 +1,515 @@
"""
Unit tests for SourceToSectionMapper.
Tests the intelligent source-to-section mapping functionality.
"""

import pytest
from typing import List

from models.blog_models import (
    BlogOutlineSection,
    ResearchSource,
    BlogResearchResponse,
    GroundingMetadata,
)
from services.blog_writer.outline.source_mapper import SourceToSectionMapper


class TestSourceToSectionMapper:
    """Test cases for SourceToSectionMapper."""

    def setup_method(self):
        """Set up test fixtures."""
        self.mapper = SourceToSectionMapper()

        # Create sample research sources
        self.sample_sources = [
            ResearchSource(
                title="AI Trends in 2025: Machine Learning Revolution",
                url="https://example.com/ai-trends-2025",
                excerpt="Comprehensive analysis of artificial intelligence trends in 2025, focusing on machine learning advancements, deep learning breakthroughs, and AI automation in enterprise environments.",
                credibility_score=0.95,
                published_at="2025-08-15",
                index=0,
                source_type="web"
            ),
            ResearchSource(
                title="Enterprise AI Implementation Guide",
                url="https://example.com/enterprise-ai-guide",
                excerpt="Step-by-step guide for implementing artificial intelligence solutions in enterprise environments, including best practices, challenges, and success stories from leading companies.",
                credibility_score=0.9,
                published_at="2025-08-01",
                index=1,
                source_type="web"
            ),
            ResearchSource(
                title="Machine Learning Algorithms Explained",
                url="https://example.com/ml-algorithms",
                excerpt="Detailed explanation of various machine learning algorithms including supervised learning, unsupervised learning, and reinforcement learning techniques with practical examples.",
                credibility_score=0.85,
                published_at="2025-07-20",
                index=2,
                source_type="web"
            ),
            ResearchSource(
                title="AI Ethics and Responsible Development",
                url="https://example.com/ai-ethics",
                excerpt="Discussion of ethical considerations in artificial intelligence development, including bias mitigation, transparency, and responsible AI practices for developers and organizations.",
                credibility_score=0.88,
                published_at="2025-07-10",
                index=3,
                source_type="web"
            ),
            ResearchSource(
                title="Deep Learning Neural Networks Tutorial",
                url="https://example.com/deep-learning-tutorial",
                excerpt="Comprehensive tutorial on deep learning neural networks, covering convolutional neural networks, recurrent neural networks, and transformer architectures with code examples.",
                credibility_score=0.92,
                published_at="2025-06-15",
                index=4,
                source_type="web"
            )
        ]

        # Create sample outline sections
        self.sample_sections = [
            BlogOutlineSection(
                id="s1",
                heading="Introduction to AI and Machine Learning",
                subheadings=["What is AI?", "Types of Machine Learning", "AI Applications"],
                key_points=["AI definition and scope", "ML vs traditional programming", "Real-world AI examples"],
                references=[],
                target_words=300,
                keywords=["artificial intelligence", "machine learning", "AI basics", "introduction"]
            ),
            BlogOutlineSection(
                id="s2",
                heading="Enterprise AI Implementation Strategies",
                subheadings=["Planning Phase", "Implementation Steps", "Best Practices"],
                key_points=["Strategic planning", "Technology selection", "Change management", "ROI measurement"],
                references=[],
                target_words=400,
                keywords=["enterprise AI", "implementation", "strategies", "business"]
            ),
            BlogOutlineSection(
                id="s3",
                heading="Machine Learning Algorithms Deep Dive",
                subheadings=["Supervised Learning", "Unsupervised Learning", "Deep Learning"],
                key_points=["Algorithm types", "Use cases", "Performance metrics", "Model selection"],
                references=[],
                target_words=500,
                keywords=["machine learning algorithms", "supervised learning", "deep learning", "neural networks"]
            ),
            BlogOutlineSection(
                id="s4",
                heading="AI Ethics and Responsible Development",
                subheadings=["Ethical Considerations", "Bias and Fairness", "Transparency"],
                key_points=["Ethical frameworks", "Bias detection", "Explainable AI", "Regulatory compliance"],
                references=[],
                target_words=350,
                keywords=["AI ethics", "responsible AI", "bias", "transparency"]
            )
        ]

        # Create sample research response
        self.sample_research = BlogResearchResponse(
            success=True,
            sources=self.sample_sources,
            keyword_analysis={
                'primary': ['artificial intelligence', 'machine learning', 'AI implementation'],
                'secondary': ['enterprise AI', 'deep learning', 'AI ethics'],
                'long_tail': ['AI trends 2025', 'enterprise AI implementation guide', 'machine learning algorithms explained'],
                'semantic_keywords': ['AI', 'ML', 'neural networks', 'automation'],
                'trending_terms': ['AI 2025', 'generative AI', 'AI automation'],
                'search_intent': 'informational',
                'content_gaps': ['AI implementation challenges', 'ML algorithm comparison']
            },
            competitor_analysis={
                'top_competitors': ['TechCorp AI', 'DataScience Inc', 'AI Solutions Ltd'],
                'opportunities': ['Enterprise market gap', 'SME AI adoption'],
                'competitive_advantages': ['Comprehensive coverage', 'Practical examples']
            },
            suggested_angles=[
                'AI trends in 2025',
                'Enterprise AI implementation',
                'Machine learning fundamentals',
                'AI ethics and responsibility'
            ],
            search_widget="<div>Search widget HTML</div>",
            search_queries=["AI trends 2025", "enterprise AI implementation", "machine learning guide"],
            grounding_metadata=GroundingMetadata(
                grounding_chunks=[],
                grounding_supports=[],
                citations=[],
                search_entry_point="AI trends and implementation",
                web_search_queries=["AI trends 2025", "enterprise AI"]
            )
        )

    def test_semantic_similarity_calculation(self):
        """Test semantic similarity calculation between sections and sources."""
        section = self.sample_sections[0]  # AI Introduction section
        source = self.sample_sources[0]  # AI Trends source
        similarity = self.mapper._calculate_semantic_similarity(section, source)
        # Should have high similarity due to AI-related content
        assert 0.0 <= similarity <= 1.0
        assert similarity > 0.3  # Should be reasonably high for AI-related content

    def test_keyword_relevance_calculation(self):
        """Test keyword-based relevance calculation."""
        section = self.sample_sections[1]  # Enterprise AI section
        source = self.sample_sources[1]  # Enterprise AI Guide source
        relevance = self.mapper._calculate_keyword_relevance(section, source, self.sample_research)
        # Should have reasonable relevance due to enterprise AI keywords
        assert 0.0 <= relevance <= 1.0
        assert relevance > 0.1  # Should be reasonable for matching enterprise AI content

    def test_contextual_relevance_calculation(self):
        """Test contextual relevance calculation."""
        section = self.sample_sections[2]  # ML Algorithms section
        source = self.sample_sources[2]  # ML Algorithms source
        relevance = self.mapper._calculate_contextual_relevance(section, source, self.sample_research)
        # Should have high relevance due to matching content angles
        assert 0.0 <= relevance <= 1.0
        assert relevance > 0.2  # Should be reasonable for matching content

    def test_algorithmic_source_mapping(self):
        """Test the complete algorithmic mapping process."""
        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        # Should have mapping results for all sections
        assert len(mapping_results) == len(self.sample_sections)
        # Each section should have some mapped sources
        for section_id, sources in mapping_results.items():
            assert isinstance(sources, list)
            # Each source should be a tuple of (source, score)
            for source, score in sources:
                assert isinstance(source, ResearchSource)
                assert isinstance(score, float)
                assert 0.0 <= score <= 1.0

    def test_source_mapping_quality(self):
        """Test that sources are mapped to relevant sections."""
        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        # Enterprise AI section should have enterprise AI source
        enterprise_section = mapping_results["s2"]
        enterprise_source_titles = [source.title for source, score in enterprise_section]
        assert any("Enterprise" in title for title in enterprise_source_titles)
        # ML Algorithms section should have ML algorithms source
        ml_section = mapping_results["s3"]
        ml_source_titles = [source.title for source, score in ml_section]
        assert any("Machine Learning" in title or "Algorithms" in title for title in ml_source_titles)
        # AI Ethics section should have AI ethics source
        ethics_section = mapping_results["s4"]
        ethics_source_titles = [source.title for source, score in ethics_section]
        assert any("Ethics" in title for title in ethics_source_titles)

    def test_complete_mapping_pipeline(self):
        """Test the complete mapping pipeline from sections to mapped sections."""
        mapped_sections = self.mapper.map_sources_to_sections(self.sample_sections, self.sample_research)
        # Should return same number of sections
        assert len(mapped_sections) == len(self.sample_sections)
        # Each section should have mapped sources
        for section in mapped_sections:
            assert isinstance(section.references, list)
            assert len(section.references) <= self.mapper.max_sources_per_section
            # All references should be ResearchSource objects
            for source in section.references:
                assert isinstance(source, ResearchSource)

    def test_mapping_with_empty_sources(self):
        """Test mapping behavior with empty sources list."""
        empty_research = BlogResearchResponse(
            success=True,
            sources=[],
            keyword_analysis={},
            competitor_analysis={},
            suggested_angles=[],
            search_widget="",
            search_queries=[],
            grounding_metadata=None
        )
        mapped_sections = self.mapper.map_sources_to_sections(self.sample_sections, empty_research)
        # Should return sections with empty references
        for section in mapped_sections:
            assert section.references == []

    def test_mapping_with_empty_sections(self):
        """Test mapping behavior with empty sections list."""
        mapped_sections = self.mapper.map_sources_to_sections([], self.sample_research)
        # Should return empty list
        assert mapped_sections == []

    def test_meaningful_words_extraction(self):
        """Test extraction of meaningful words from text."""
        text = "Artificial Intelligence and Machine Learning are transforming the world of technology and business applications."
        words = self.mapper._extract_meaningful_words(text)
        # Should extract meaningful words and remove stop words
        assert "artificial" in words
        assert "intelligence" in words
        assert "machine" in words
        assert "learning" in words
        assert "the" not in words  # Stop word should be removed
        assert "and" not in words  # Stop word should be removed

    def test_phrase_similarity_calculation(self):
        """Test phrase similarity calculation."""
        text1 = "machine learning algorithms"
        text2 = "This article covers machine learning algorithms and their applications"
        similarity = self.mapper._calculate_phrase_similarity(text1, text2)
        # Should find phrase matches
        assert similarity > 0.0
        assert similarity <= 0.3  # Should be capped at 0.3

    def test_intent_keywords_extraction(self):
        """Test extraction of intent-specific keywords."""
        informational_keywords = self.mapper._get_intent_keywords("informational")
        transactional_keywords = self.mapper._get_intent_keywords("transactional")
        # Should return appropriate keywords for each intent
        assert "what" in informational_keywords
        assert "how" in informational_keywords
        assert "guide" in informational_keywords
        assert "buy" in transactional_keywords
        assert "purchase" in transactional_keywords
        assert "price" in transactional_keywords

    def test_mapping_statistics(self):
        """Test mapping statistics calculation."""
        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        stats = self.mapper.get_mapping_statistics(mapping_results)
        # Should have valid statistics
        assert stats['total_sections'] == len(self.sample_sections)
        assert stats['total_mappings'] > 0
        assert stats['sections_with_sources'] > 0
        assert 0.0 <= stats['average_score'] <= 1.0
        assert 0.0 <= stats['max_score'] <= 1.0
        assert 0.0 <= stats['min_score'] <= 1.0
        assert 0.0 <= stats['mapping_coverage'] <= 1.0

    def test_source_quality_filtering(self):
        """Test that low-quality sources are filtered out."""
        # Create a low-quality source
        low_quality_source = ResearchSource(
            title="Random Article",
            url="https://example.com/random",
            excerpt="This is a completely unrelated article about cooking recipes and gardening tips.",
            credibility_score=0.3,
            published_at="2025-08-01",
            index=5,
            source_type="web"
        )

        # Add to research data
        research_with_low_quality = BlogResearchResponse(
            success=True,
            sources=self.sample_sources + [low_quality_source],
            keyword_analysis=self.sample_research.keyword_analysis,
            competitor_analysis=self.sample_research.competitor_analysis,
            suggested_angles=self.sample_research.suggested_angles,
            search_widget=self.sample_research.search_widget,
            search_queries=self.sample_research.search_queries,
            grounding_metadata=self.sample_research.grounding_metadata
        )

        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, research_with_low_quality)

        # Low-quality source should not be mapped to any section
        all_mapped_sources = []
        for sources in mapping_results.values():
            all_mapped_sources.extend(source for source, _score in sources)
        assert low_quality_source not in all_mapped_sources

    def test_max_sources_per_section_limit(self):
        """Test that the maximum sources per section limit is enforced."""
        # Create many sources
        many_sources = self.sample_sources * 3  # 15 sources
        research_with_many_sources = BlogResearchResponse(
            success=True,
            sources=many_sources,
            keyword_analysis=self.sample_research.keyword_analysis,
            competitor_analysis=self.sample_research.competitor_analysis,
            suggested_angles=self.sample_research.suggested_angles,
            search_widget=self.sample_research.search_widget,
            search_queries=self.sample_research.search_queries,
            grounding_metadata=self.sample_research.grounding_metadata
        )

        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, research_with_many_sources)

        # Each section should have at most max_sources_per_section sources
        for section_id, sources in mapping_results.items():
            assert len(sources) <= self.mapper.max_sources_per_section

    def test_ai_validation_prompt_building(self):
        """Test AI validation prompt building."""
        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        prompt = self.mapper._build_validation_prompt(mapping_results, self.sample_research)

        # Should contain key elements
        assert "expert content strategist" in prompt
        assert "Research Topic:" in prompt
        assert "ALGORITHMIC MAPPING RESULTS" in prompt
        assert "AVAILABLE SOURCES" in prompt
        assert "VALIDATION TASK" in prompt
        assert "RESPONSE FORMAT" in prompt
        assert "overall_quality_score" in prompt
        assert "section_improvements" in prompt

    def test_ai_validation_response_parsing(self):
        """Test AI validation response parsing."""
        # Mock AI response
        mock_response = """
        Here's my analysis of the source-to-section mapping:

        ```json
        {
            "overall_quality_score": 8,
            "section_improvements": [
                {
                    "section_id": "s1",
                    "current_sources": ["AI Trends in 2025: Machine Learning Revolution"],
                    "recommended_sources": ["AI Trends in 2025: Machine Learning Revolution", "Machine Learning Algorithms Explained"],
                    "reasoning": "Adding ML algorithms source provides more technical depth",
                    "confidence": 0.9
                }
            ],
            "summary": "Good mapping overall, minor improvements suggested"
        }
        ```
        """

        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        parsed_mapping = self.mapper._parse_validation_response(mock_response, original_mapping, self.sample_research)

        # Should have improved mapping
        assert "s1" in parsed_mapping
        assert len(parsed_mapping["s1"]) > 0

        # Should maintain other sections
        assert len(parsed_mapping) == len(original_mapping)

    def test_ai_validation_fallback_handling(self):
        """Test AI validation fallback when parsing fails."""
        # Mock invalid AI response
        invalid_response = "This is not a valid JSON response"

        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        parsed_mapping = self.mapper._parse_validation_response(invalid_response, original_mapping, self.sample_research)

        # Should fall back to the original mapping
        assert parsed_mapping == original_mapping

    def test_ai_validation_with_missing_sources(self):
        """Test AI validation when recommended sources don't exist."""
        # Mock AI response with non-existent sources
        mock_response = """
        ```json
        {
            "overall_quality_score": 7,
            "section_improvements": [
                {
                    "section_id": "s1",
                    "current_sources": ["AI Trends in 2025: Machine Learning Revolution"],
                    "recommended_sources": ["Non-existent Source", "Another Fake Source"],
                    "reasoning": "These sources would be better",
                    "confidence": 0.8
                }
            ],
            "summary": "Suggested improvements"
        }
        ```
        """

        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
        parsed_mapping = self.mapper._parse_validation_response(mock_response, original_mapping, self.sample_research)

        # Should fall back to the original mapping for s1, since none of the
        # recommended sources exist in the research data
        assert parsed_mapping["s1"] == original_mapping["s1"]

    def test_ai_validation_integration(self):
        """Test that AI validation degrades gracefully when no LLM is available."""
        # A full integration test would require mocking the LLM provider.
        # For now, this only checks that the method fails in a controlled way.
        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)

        try:
            # In a real deployment this calls the actual LLM; in the test
            # environment no LLM is configured, so the call may raise.
            validated_mapping = self.mapper._ai_validate_mapping(mapping_results, self.sample_research)
        except Exception as e:
            # A failure is acceptable only if it is one of the handled error paths
            assert "AI validation failed" in str(e) or "Failed to get AI validation response" in str(e)
        else:
            # If the call succeeds, it should return the original mapping as a fallback.
            # Asserting here, outside the try block, keeps a failed comparison from
            # being caught by the except clause above.
            assert validated_mapping == mapping_results

    def test_format_sections_for_prompt(self):
        """Test formatting of sections for AI prompt."""
        sections_info = [
            {
                'id': 's1',
                'sources': [
                    {
                        'title': 'Test Source 1',
                        'algorithmic_score': 0.85
                    }
                ]
            }
        ]

        formatted = self.mapper._format_sections_for_prompt(sections_info)

        assert "Section s1:" in formatted
        assert "Test Source 1" in formatted
        assert "0.85" in formatted

    def test_format_sources_for_prompt(self):
        """Test formatting of sources for AI prompt."""
        sources = [
            {
                'title': 'Test Source',
                'url': 'https://example.com',
                'credibility_score': 0.9,
                'excerpt': 'This is a test excerpt for the source.'
            }
        ]

        formatted = self.mapper._format_sources_for_prompt(sources)

        assert "Test Source" in formatted
        assert "https://example.com" in formatted
        assert "0.9" in formatted
        assert "This is a test excerpt" in formatted


if __name__ == '__main__':
    pytest.main([__file__])