ALwrity AI Blog Writer - Added Google Grounding UI Implementation

2025-09-18 18:45:53 +05:30
parent 9f13daf443
commit 4d153b292d
72 changed files with 11944 additions and 1526 deletions
--- a/backend/test/BILLING_SYSTEM_INTEGRATION.md
+++ b/backend/test/BILLING_SYSTEM_INTEGRATION.md
@@ -0,0 +1,256 @@
+# ALwrity Billing & Subscription System Integration
+
+## Overview
+
+The ALwrity backend includes a billing and subscription system that tracks API usage, calculates costs, and manages subscription limits. It is set up automatically by the startup script and exposes usage and cost data for real-time monitoring.
+
+## 🚀 Quick Start
+
+### 1. Start the Backend with Billing System
+
+```bash
+# From the backend directory
+python start_alwrity_backend.py
+```
+
+The startup script will automatically:
+- ✅ Create billing and subscription database tables
+- ✅ Initialize default pricing and subscription plans
+- ✅ Set up usage tracking middleware
+- ✅ Verify all billing components are working
+- ✅ Start the server with billing endpoints enabled
+
+### 2. Verify Installation
+
+```bash
+# Run the comprehensive verification script
+python verify_billing_setup.py
+```
+
+### 3. Test API Endpoints
+
+```bash
+# Get subscription plans
+curl http://localhost:8000/api/subscription/plans
+
+# Get user usage (replace 'demo' with an actual user ID)
+curl http://localhost:8000/api/subscription/usage/demo
+
+# Get billing dashboard data
+curl http://localhost:8000/api/subscription/dashboard/demo
+
+# Get API pricing information
+curl http://localhost:8000/api/subscription/pricing
+```
+
+## 📊 Database Tables
+
+The billing system creates the following tables:
+
+| Table Name | Purpose |
+|------------|---------|
+| `subscription_plans` | Available subscription tiers and pricing |
+| `user_subscriptions` | User subscription assignments |
+| `api_usage_logs` | Detailed API usage tracking |
+| `usage_summaries` | Aggregated usage statistics |
+| `api_provider_pricing` | Cost per token for each AI provider |
+| `usage_alerts` | Usage limit warnings and notifications |
+| `billing_history` | Historical billing records |
+
+## 🔧 System Components
+
+### 1. Database Models (`models/subscription_models.py`)
+- **SubscriptionPlan**: Subscription tiers and pricing
+- **UserSubscription**: User subscription assignments
+- **APIUsageLog**: Detailed usage tracking
+- **UsageSummary**: Aggregated statistics
+- **APIProviderPricing**: Cost calculations
+- **UsageAlert**: Limit notifications
+
+### 2. Services
+- **PricingService** (`services/pricing_service.py`): Cost calculations and plan management
+- **UsageTrackingService** (`services/usage_tracking_service.py`): Usage monitoring and limits
+- **SubscriptionExceptionHandler** (`services/subscription_exception_handler.py`): Error handling
+
+### 3. API Endpoints (`api/subscription_api.py`)
+- `GET /api/subscription/plans` - Available subscription plans
+- `GET /api/subscription/usage/{user_id}` - User usage statistics
+- `GET /api/subscription/dashboard/{user_id}` - Dashboard data
+- `GET /api/subscription/pricing` - API pricing information
+- `GET /api/subscription/trends/{user_id}` - Usage trends
+
+### 4. Middleware Integration
+- **Monitoring Middleware** (`middleware/monitoring_middleware.py`): Automatic usage tracking
+- **Exception Handling**: Graceful error handling for billing issues
+
+## 🎯 Frontend Integration
+
+The billing system is fully integrated with the frontend dashboard:
+
+### CompactBillingDashboard
+- Real-time usage metrics
+- Cost tracking
+- System health monitoring
+- Interactive tooltips and help text
+
+### EnhancedBillingDashboard
+- Detailed usage breakdowns
+- Provider-specific costs
+- Usage trends and analytics
+- Alert management
+
+## 📈 Usage Tracking
+
+The system automatically tracks:
+
+- **API Calls**: Number of requests to each provider
+- **Token Usage**: Input and output tokens for each request
+- **Costs**: Real-time cost calculations
+- **Response Times**: Performance monitoring
+- **Error Rates**: Failed request tracking
+- **User Activity**: Per-user usage patterns
+
+## 💰 Pricing Configuration
+
+### Default AI Provider Pricing (per token)
+
+| Provider | Model | Input Cost (USD/token) | Output Cost (USD/token) |
+|----------|-------|------------|-------------|
+| OpenAI | GPT-4 | $0.00003 | $0.00006 |
+| OpenAI | GPT-3.5-turbo | $0.0000015 | $0.000002 |
+| Gemini | Gemini Pro | $0.0000005 | $0.0000015 |
+| Anthropic | Claude-3 | $0.000008 | $0.000024 |
+| Mistral | Mistral-7B | $0.0000002 | $0.0000006 |
+
+### Subscription Plans
+
+| Plan | Monthly Price | Yearly Price | API Limits |
+|------|---------------|--------------|------------|
+| Free | $0 | $0 | 1,000 calls/month |
+| Starter | $29 | $290 | 10,000 calls/month |
+| Professional | $99 | $990 | 100,000 calls/month |
+| Enterprise | $299 | $2,990 | Unlimited |
+
+## 🔍 Monitoring & Alerts
+
+### Real-time Monitoring
+- Usage tracking for all API calls
+- Cost calculations in real-time
+- Performance metrics
+- Error rate monitoring
+
+### Alert System
+- Usage approaching limits (80% threshold)
+- Cost overruns
+- System health issues
+- Provider-specific problems
+
+## 🛠️ Development Mode
+
+For development with auto-reload:
+
+```bash
+# Development mode with auto-reload
+python start_alwrity_backend.py --dev
+
+# Or with explicit reload flag
+python start_alwrity_backend.py --reload
+```
+
+## 📝 Configuration
+
+### Environment Variables
+
+The system uses the following environment variables:
+
+```bash
+# Database
+DATABASE_URL=sqlite:///./alwrity.db
+
+# API Keys (configured through onboarding)
+OPENAI_API_KEY=your_key_here
+GEMINI_API_KEY=your_key_here
+ANTHROPIC_API_KEY=your_key_here
+MISTRAL_API_KEY=your_key_here
+
+# Server Configuration
+HOST=0.0.0.0
+PORT=8000
+DEBUG=true
+```
+
+### Custom Pricing
+
+To modify pricing, update the `PricingService.initialize_default_pricing()` method in `services/pricing_service.py`.
+
+## 🧪 Testing
+
+### Run Verification Script
+```bash
+python verify_billing_setup.py
+```
+
+### Test Individual Components
+```bash
+# Test subscription system
+python test_subscription_system.py
+
+# Create the billing tables (the same script is used in Troubleshooting below)
+python scripts/create_billing_tables.py
+```
+
+## 🚨 Troubleshooting
+
+### Common Issues
+
+1. **Tables not created**: Run `python scripts/create_billing_tables.py`
+2. **Missing dependencies**: Run `pip install -r requirements.txt`
+3. **Database errors**: Check `DATABASE_URL` in environment
+4. **API key issues**: Verify API keys are configured
+
+### Debug Mode
+
+Enable debug logging by setting `DEBUG=true` in your environment.
+
+## 📚 API Documentation
+
+Once the server is running, access the interactive API documentation:
+
+- **Swagger UI**: http://localhost:8000/api/docs
+- **ReDoc**: http://localhost:8000/api/redoc
+
+## 🔄 Updates and Maintenance
+
+### Adding New Providers
+
+1. Add provider to `APIProvider` enum in `models/subscription_models.py`
+2. Update pricing in `PricingService.initialize_default_pricing()`
+3. Add provider detection in middleware
+4. Update frontend provider chips
+
+### Modifying Plans
+
+1. Update `PricingService.initialize_default_plans()`
+2. Modify plan limits and pricing
+3. Test with verification script
+
+## 📞 Support
+
+For issues or questions:
+
+1. Check the verification script output
+2. Review the startup logs
+3. Test individual components
+4. Check database table creation
+
+## 🎉 Success Indicators
+
+You'll know the billing system is working when:
+
+- ✅ Startup script shows "Billing and subscription tables created successfully"
+- ✅ Verification script passes all checks
+- ✅ API endpoints return data
+- ✅ Frontend dashboard shows usage metrics
+- ✅ Usage tracking middleware is active
+
+Once all of the checks above pass, the billing system is integrated and running.
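
Review note (BILLING_SYSTEM_INTEGRATION.md): the per-token prices, the plan limits, and the 80% alert threshold documented above can be sketched as follows. This is a minimal illustration, not the actual `PricingService` or `UsageTrackingService` code, which is not shown in this part of the diff. The names `PRICING`, `PLAN_CALL_LIMITS`, `estimate_cost`, and `usage_alert_due` are hypothetical, and treating "80% threshold" as "at or above 80% of the monthly call limit" is an assumption. Only the numbers are taken from the tables in the document.

```python
from decimal import Decimal

# Per-token prices in USD, copied from the "Default AI Provider Pricing" table.
# The (provider, model) key layout is an assumption made for this sketch.
PRICING = {
    ("openai", "gpt-4"): (Decimal("0.00003"), Decimal("0.00006")),
    ("openai", "gpt-3.5-turbo"): (Decimal("0.0000015"), Decimal("0.000002")),
    ("gemini", "gemini-pro"): (Decimal("0.0000005"), Decimal("0.0000015")),
    ("anthropic", "claude-3"): (Decimal("0.000008"), Decimal("0.000024")),
    ("mistral", "mistral-7b"): (Decimal("0.0000002"), Decimal("0.0000006")),
}

# Monthly call limits from the "Subscription Plans" table; None means unlimited.
PLAN_CALL_LIMITS = {
    "free": 1_000,
    "starter": 10_000,
    "professional": 100_000,
    "enterprise": None,
}

# Assumed reading of "Usage approaching limits (80% threshold)".
ALERT_THRESHOLD = Decimal("0.8")


def estimate_cost(provider, model, input_tokens, output_tokens):
    """Return the USD cost of one request as a Decimal."""
    input_price, output_price = PRICING[(provider, model)]
    return input_tokens * input_price + output_tokens * output_price


def usage_alert_due(plan, calls_this_month):
    """Return True once usage reaches the alert threshold for the plan."""
    limit = PLAN_CALL_LIMITS[plan]
    if limit is None:  # unlimited plans never trigger a call-limit alert
        return False
    return Decimal(calls_this_month) / limit >= ALERT_THRESHOLD
```

`Decimal` is used here because the per-token prices are very small and summing many of them as binary floats can accumulate rounding error. For example, `estimate_cost("openai", "gpt-4", 1000, 500)` evaluates to 0.06 USD under these table values.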
--- a/backend/test/check_db.py
+++ b/backend/test/check_db.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""
+Database check and sample data creation script
+"""
+
+from services.database import get_db_session
+from models.content_planning import ContentStrategy, ContentGapAnalysis, AIAnalysisResult
+from sqlalchemy.orm import Session
+import json
+
+def check_database():
+    """Check what data exists in the database"""
+    db = get_db_session()
+    
+    try:
+        # Check strategies
+        strategies = db.query(ContentStrategy).all()
+        print(f"Found {len(strategies)} strategies")
+        for strategy in strategies:
+            print(f"  Strategy {strategy.id}: {strategy.name} - {strategy.industry}")
+        
+        # Check gap analyses
+        gap_analyses = db.query(ContentGapAnalysis).all()
+        print(f"Found {len(gap_analyses)} gap analyses")
+        
+        # Check AI analytics
+        ai_analytics = db.query(AIAnalysisResult).all()
+        print(f"Found {len(ai_analytics)} AI analytics")
+        
+    except Exception as e:
+        print(f"Error checking database: {e}")
+    finally:
+        db.close()
+
+def create_sample_data():
+    """Create sample data for Strategic Intelligence and Keyword Research tabs"""
+    db = get_db_session()
+    
+    try:
+        # Create a sample strategy if none exists
+        existing_strategies = db.query(ContentStrategy).all()
+        if not existing_strategies:
+            sample_strategy = ContentStrategy(
+                name="Sample Content Strategy",
+                industry="Digital Marketing",
+                target_audience={"demographics": "Small to medium businesses", "interests": ["marketing", "technology"]},
+                content_pillars=["Educational Content", "Thought Leadership", "Case Studies"],
+                ai_recommendations={
+                    "market_positioning": {
+                        "score": 75,
+                        "strengths": ["Strong brand voice", "Consistent content quality"],
+                        "weaknesses": ["Limited video content", "Slow content production"]
+                    },
+                    "competitive_advantages": [
+                        {"advantage": "AI-powered content creation", "impact": "High", "implementation": "In Progress"},
+                        {"advantage": "Data-driven strategy", "impact": "Medium", "implementation": "Complete"}
+                    ],
+                    "strategic_risks": [
+                        {"risk": "Content saturation in market", "probability": "Medium", "impact": "High"},
+                        {"risk": "Algorithm changes affecting reach", "probability": "High", "impact": "Medium"}
+                    ]
+                },
+                user_id=1
+            )
+            db.add(sample_strategy)
+            db.commit()
+            print("Created sample strategy")
+        
+        # Create sample gap analysis
+        existing_gaps = db.query(ContentGapAnalysis).all()
+        if not existing_gaps:
+            sample_gap = ContentGapAnalysis(
+                website_url="https://example.com",
+                competitor_urls=["competitor1.com", "competitor2.com"],
+                target_keywords=["content marketing", "digital strategy", "SEO"],
+                analysis_results={
+                    "gaps": ["Video content gap", "Local SEO opportunities"],
+                    "opportunities": [
+                        {"keyword": "AI content tools", "search_volume": "5K-10K", "competition": "Low", "cpc": "$2.50"},
+                        {"keyword": "content marketing ROI", "search_volume": "1K-5K", "competition": "Medium", "cpc": "$4.20"}
+                    ]
+                },
+                recommendations=[
+                    {
+                        "type": "content",
+                        "title": "Create video tutorials",
+                        "description": "Address the video content gap",
+                        "priority": "high"
+                    },
+                    {
+                        "type": "seo",
+                        "title": "Optimize for local search",
+                        "description": "Target local keywords",
+                        "priority": "medium"
+                    }
+                ],
+                user_id=1
+            )
+            db.add(sample_gap)
+            db.commit()
+            print("Created sample gap analysis")
+        
+        # Create sample AI analytics
+        existing_ai = db.query(AIAnalysisResult).all()
+        if not existing_ai:
+            sample_ai = AIAnalysisResult(
+                analysis_type="strategic_intelligence",
+                insights=[
+                    "Focus on video content to address market gap",
+                    "Leverage AI tools for competitive advantage",
+                    "Monitor algorithm changes closely"
+                ],
+                recommendations=[
+                    {
+                        "type": "content",
+                        "title": "Increase video content production",
+                        "description": "Address the video content gap identified in analysis",
+                        "priority": "high"
+                    },
+                    {
+                        "type": "strategy",
+                        "title": "Implement AI-powered content creation",
+                        "description": "Leverage AI tools for competitive advantage",
+                        "priority": "medium"
+                    }
+                ],
+                performance_metrics={
+                    "content_engagement": 78.5,
+                    "traffic_growth": 25.3,
+                    "conversion_rate": 2.1
+                },
+                personalized_data_used={
+                    "onboarding_data": True,
+                    "user_preferences": True,
+                    "historical_performance": True
+                },
+                processing_time=15.2,
+                ai_service_status="operational",
+                user_id=1
+            )
+            db.add(sample_ai)
+            db.commit()
+            print("Created sample AI analytics")
+            
+    except Exception as e:
+        print(f"Error creating sample data: {e}")
+        db.rollback()
+    finally:
+        db.close()
+
+if __name__ == "__main__":
+    print("Checking database...")
+    check_database()
+    
+    print("\nCreating sample data...")
+    create_sample_data()
+    
+    print("\nFinal database state:")
+    check_database()
--- a/backend/test/linkedin_keyword_test_report_20250914_223930.json
+++ b/backend/test/linkedin_keyword_test_report_20250914_223930.json
@@ -0,0 +1,104 @@
+{
+  "test_summary": {
+    "total_duration": 52.56023073196411,
+    "total_tests": 4,
+    "successful_tests": 4,
+    "failed_tests": 0,
+    "total_api_calls": 4
+  },
+  "test_results": [
+    {
+      "test_name": "Single Phrase Test (Should be preserved as-is)",
+      "keyword_phrase": "ALwrity content generation",
+      "success": true,
+      "duration": 8.364419937133789,
+      "api_calls": 1,
+      "error": null,
+      "content_length": 44,
+      "sources_count": 0,
+      "citations_count": 0,
+      "grounding_status": {
+        "status": "success",
+        "sources_used": 0,
+        "citation_coverage": 0,
+        "quality_score": 0.0
+      },
+      "generation_metadata": {
+        "model_used": "gemini-2.0-flash-001",
+        "generation_time": 0.002626,
+        "research_time": 0.000537,
+        "grounding_enabled": true
+      }
+    },
+    {
+      "test_name": "Comma-Separated Test (Should be split by commas)",
+      "keyword_phrase": "AI tools, content creation, marketing automation",
+      "success": true,
+      "duration": 12.616755723953247,
+      "api_calls": 1,
+      "error": null,
+      "content_length": 44,
+      "sources_count": 5,
+      "citations_count": 3,
+      "grounding_status": {
+        "status": "success",
+        "sources_used": 5,
+        "citation_coverage": 0.6,
+        "quality_score": 0.359
+      },
+      "generation_metadata": {
+        "model_used": "gemini-2.0-flash-001",
+        "generation_time": 0.009273,
+        "research_time": 0.000285,
+        "grounding_enabled": true
+      }
+    },
+    {
+      "test_name": "Another Single Phrase Test",
+      "keyword_phrase": "LinkedIn content strategy",
+      "success": true,
+      "duration": 11.366000652313232,
+      "api_calls": 1,
+      "error": null,
+      "content_length": 44,
+      "sources_count": 4,
+      "citations_count": 3,
+      "grounding_status": {
+        "status": "success",
+        "sources_used": 4,
+        "citation_coverage": 0.75,
+        "quality_score": 0.359
+      },
+      "generation_metadata": {
+        "model_used": "gemini-2.0-flash-001",
+        "generation_time": 0.008166,
+        "research_time": 0.000473,
+        "grounding_enabled": true
+      }
+    },
+    {
+      "test_name": "Another Comma-Separated Test",
+      "keyword_phrase": "social media, digital marketing, brand awareness",
+      "success": true,
+      "duration": 12.107932806015015,
+      "api_calls": 1,
+      "error": null,
+      "content_length": 44,
+      "sources_count": 0,
+      "citations_count": 0,
+      "grounding_status": {
+        "status": "success",
+        "sources_used": 0,
+        "citation_coverage": 0,
+        "quality_score": 0.0
+      },
+      "generation_metadata": {
+        "model_used": "gemini-2.0-flash-001",
+        "generation_time": 0.004575,
+        "research_time": 0.000323,
+        "grounding_enabled": true
+      }
+    }
+  ],
+  "timestamp": "2025-09-14T22:39:30.220518"
+}
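
Review note (linkedin_keyword_test_report_20250914_223930.json): in the four results above, `citation_coverage` matches `citations_count / sources_count` (3/5 = 0.6, 3/4 = 0.75) and is 0 when there are no sources. The sketch below checks a report for that relationship and for agreement between the summary and the individual results. The formula is inferred from these four data points only and is an assumption about how the report is produced. `citation_coverage` and `check_report` are hypothetical helpers, and `REPORT_JSON` is a trimmed copy of the report that keeps only the fields the sketch reads.

```python
import json

# Trimmed copy of the report above: only the fields this sketch reads.
REPORT_JSON = """
{
  "test_summary": {"total_tests": 4, "successful_tests": 4, "failed_tests": 0},
  "test_results": [
    {"success": true, "sources_count": 0, "citations_count": 0,
     "grounding_status": {"citation_coverage": 0}},
    {"success": true, "sources_count": 5, "citations_count": 3,
     "grounding_status": {"citation_coverage": 0.6}},
    {"success": true, "sources_count": 4, "citations_count": 3,
     "grounding_status": {"citation_coverage": 0.75}},
    {"success": true, "sources_count": 0, "citations_count": 0,
     "grounding_status": {"citation_coverage": 0}}
  ]
}
"""


def citation_coverage(sources_count, citations_count):
    """Assumed formula: citations per source, 0.0 when there are no sources."""
    if sources_count == 0:
        return 0.0
    return citations_count / sources_count


def check_report(report):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    results = report["test_results"]
    summary = report["test_summary"]
    passed = sum(1 for r in results if r["success"])
    if summary["total_tests"] != len(results):
        problems.append("total_tests does not match the number of results")
    if summary["successful_tests"] != passed:
        problems.append("successful_tests does not match the results")
    if summary["failed_tests"] != len(results) - passed:
        problems.append("failed_tests does not match the results")
    for i, r in enumerate(results):
        expected = citation_coverage(r["sources_count"], r["citations_count"])
        actual = r["grounding_status"]["citation_coverage"]
        if abs(expected - actual) > 1e-9:
            problems.append(f"result {i}: coverage {actual} != {expected}")
    return problems


if __name__ == "__main__":
    print(check_report(json.loads(REPORT_JSON)))
```

Two of the four runs are marked successful with zero sources and zero citations, and every run reports the same `content_length` of 44. Whether that is intended is not clear from this diff, so a check like this one would not flag it.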
--- a/backend/test/test_grounding_engine.py
+++ b/backend/test/test_grounding_engine.py
@@ -0,0 +1,495 @@
+"""
+Unit tests for GroundingContextEngine.
+
+Tests the enhanced grounding metadata utilization functionality.
+"""
+
+import pytest
+from typing import List
+
+from models.blog_models import (
+    GroundingMetadata,
+    GroundingChunk,
+    GroundingSupport,
+    Citation,
+    BlogOutlineSection,
+    BlogResearchResponse,
+    ResearchSource,
+)
+from services.blog_writer.outline.grounding_engine import GroundingContextEngine
+
+
+class TestGroundingContextEngine:
+    """Test cases for GroundingContextEngine."""
+    
+    def setup_method(self):
+        """Set up test fixtures."""
+        self.engine = GroundingContextEngine()
+        
+        # Create sample grounding chunks
+        self.sample_chunks = [
+            GroundingChunk(
+                title="AI Research Study 2025: Machine Learning Breakthroughs",
+                url="https://research.university.edu/ai-study-2025",
+                confidence_score=0.95
+            ),
+            GroundingChunk(
+                title="Enterprise AI Implementation Guide",
+                url="https://techcorp.com/enterprise-ai-guide",
+                confidence_score=0.88
+            ),
+            GroundingChunk(
+                title="Machine Learning Algorithms Explained",
+                url="https://blog.datascience.com/ml-algorithms",
+                confidence_score=0.82
+            ),
+            GroundingChunk(
+                title="AI Ethics and Responsible Development",
+                url="https://ethics.org/ai-responsible-development",
+                confidence_score=0.90
+            ),
+            GroundingChunk(
+                title="Personal Opinion on AI Trends",
+                url="https://personal-blog.com/ai-opinion",
+                confidence_score=0.65
+            )
+        ]
+        
+        # Create sample grounding supports
+        self.sample_supports = [
+            GroundingSupport(
+                confidence_scores=[0.92, 0.89],
+                grounding_chunk_indices=[0, 1],
+                segment_text="Recent research shows that artificial intelligence is transforming enterprise operations with significant improvements in efficiency and decision-making capabilities.",
+                start_index=0,
+                end_index=150
+            ),
+            GroundingSupport(
+                confidence_scores=[0.85, 0.78],
+                grounding_chunk_indices=[2, 3],
+                segment_text="Machine learning algorithms are becoming more sophisticated, enabling better pattern recognition and predictive analytics in business applications.",
+                start_index=151,
+                end_index=300
+            ),
+            GroundingSupport(
+                confidence_scores=[0.45],
+                grounding_chunk_indices=[4],
+                segment_text="Some people think AI is overhyped and won't deliver on its promises.",
+                start_index=301,
+                end_index=400
+            )
+        ]
+        
+        # Create sample citations
+        self.sample_citations = [
+            Citation(
+                citation_type="expert_opinion",
+                start_index=0,
+                end_index=50,
+                text="AI research shows significant improvements in enterprise operations",
+                source_indices=[0],
+                reference="Source 1"
+            ),
+            Citation(
+                citation_type="statistical_data",
+                start_index=51,
+                end_index=100,
+                text="85% of enterprises report improved efficiency with AI implementation",
+                source_indices=[1],
+                reference="Source 2"
+            ),
+            Citation(
+                citation_type="research_study",
+                start_index=101,
+                end_index=150,
+                text="University study demonstrates 40% increase in decision-making accuracy",
+                source_indices=[0],
+                reference="Source 1"
+            )
+        ]
+        
+        # Create sample grounding metadata
+        self.sample_grounding_metadata = GroundingMetadata(
+            grounding_chunks=self.sample_chunks,
+            grounding_supports=self.sample_supports,
+            citations=self.sample_citations,
+            search_entry_point="AI trends and enterprise implementation",
+            web_search_queries=[
+                "AI trends 2025 enterprise",
+                "machine learning business applications",
+                "AI implementation best practices"
+            ]
+        )
+        
+        # Create sample outline section
+        self.sample_section = BlogOutlineSection(
+            id="s1",
+            heading="AI Implementation in Enterprise",
+            subheadings=["Benefits of AI", "Implementation Challenges", "Best Practices"],
+            key_points=["Improved efficiency", "Cost reduction", "Better decision making"],
+            references=[],
+            target_words=400,
+            keywords=["AI", "enterprise", "implementation", "machine learning"]
+        )
+    
+    def test_extract_contextual_insights(self):
+        """Test extraction of contextual insights from grounding metadata."""
+        insights = self.engine.extract_contextual_insights(self.sample_grounding_metadata)
+        
+        # Should have all insight categories
+        expected_categories = [
+            'confidence_analysis', 'authority_analysis', 'temporal_analysis',
+            'content_relationships', 'citation_insights', 'search_intent_insights',
+            'quality_indicators'
+        ]
+        
+        for category in expected_categories:
+            assert category in insights
+        
+        # Test confidence analysis
+        confidence_analysis = insights['confidence_analysis']
+        assert 'average_confidence' in confidence_analysis
+        assert 'high_confidence_count' in confidence_analysis
+        assert confidence_analysis['average_confidence'] > 0.0
+        
+        # Test authority analysis
+        authority_analysis = insights['authority_analysis']
+        assert 'average_authority' in authority_analysis
+        assert 'high_authority_sources' in authority_analysis
+        assert 'authority_distribution' in authority_analysis
+    
+    def test_extract_contextual_insights_empty_metadata(self):
+        """Test extraction when grounding metadata is missing (None)."""
+        insights = self.engine.extract_contextual_insights(None)
+        
+        # Should return empty insights structure
+        assert insights['confidence_analysis']['average_confidence'] == 0.0
+        assert insights['authority_analysis']['high_authority_sources'] == 0
+        assert insights['temporal_analysis']['recent_content'] == 0
+    
+    def test_analyze_confidence_patterns(self):
+        """Test confidence pattern analysis."""
+        confidence_analysis = self.engine._analyze_confidence_patterns(self.sample_grounding_metadata)
+        
+        assert 'average_confidence' in confidence_analysis
+        assert 'high_confidence_count' in confidence_analysis
+        assert 'confidence_distribution' in confidence_analysis
+        
+        # Should have reasonable confidence values
+        assert 0.0 <= confidence_analysis['average_confidence'] <= 1.0
+        assert confidence_analysis['high_confidence_count'] >= 0
+    
+    def test_analyze_source_authority(self):
+        """Test source authority analysis."""
+        authority_analysis = self.engine._analyze_source_authority(self.sample_grounding_metadata)
+        
+        assert 'average_authority' in authority_analysis
+        assert 'high_authority_sources' in authority_analysis
+        assert 'authority_distribution' in authority_analysis
+        
+        # Should have reasonable authority values
+        assert 0.0 <= authority_analysis['average_authority'] <= 1.0
+        assert authority_analysis['high_authority_sources'] >= 0
+    
+    def test_analyze_temporal_relevance(self):
+        """Test temporal relevance analysis."""
+        temporal_analysis = self.engine._analyze_temporal_relevance(self.sample_grounding_metadata)
+        
+        assert 'recent_content' in temporal_analysis
+        assert 'trending_topics' in temporal_analysis
+        assert 'evergreen_content' in temporal_analysis
+        assert 'temporal_balance' in temporal_analysis
+        
+        # Should have reasonable temporal values
+        assert temporal_analysis['recent_content'] >= 0
+        assert temporal_analysis['evergreen_content'] >= 0
+        assert temporal_analysis['temporal_balance'] in ['recent_heavy', 'evergreen_heavy', 'balanced', 'unknown']
+    
+    def test_analyze_content_relationships(self):
+        """Test content relationship analysis."""
+        relationships = self.engine._analyze_content_relationships(self.sample_grounding_metadata)
+        
+        assert 'related_concepts' in relationships
+        assert 'content_gaps' in relationships
+        assert 'concept_coverage' in relationships
+        assert 'gap_count' in relationships
+        
+        # Should have reasonable relationship values
+        assert isinstance(relationships['related_concepts'], list)
+        assert isinstance(relationships['content_gaps'], list)
+        assert relationships['concept_coverage'] >= 0
+        assert relationships['gap_count'] >= 0
+    
+    def test_analyze_citation_patterns(self):
+        """Test citation pattern analysis."""
+        citation_analysis = self.engine._analyze_citation_patterns(self.sample_grounding_metadata)
+        
+        assert 'citation_types' in citation_analysis
+        assert 'total_citations' in citation_analysis
+        assert 'citation_density' in citation_analysis
+        assert 'citation_quality' in citation_analysis
+        
+        # Should have reasonable citation values
+        assert citation_analysis['total_citations'] == len(self.sample_citations)
+        assert citation_analysis['citation_density'] >= 0.0
+        assert 0.0 <= citation_analysis['citation_quality'] <= 1.0
+    
+    def test_analyze_search_intent(self):
+        """Test search intent analysis."""
+        intent_analysis = self.engine._analyze_search_intent(self.sample_grounding_metadata)
+        
+        assert 'intent_signals' in intent_analysis
+        assert 'user_questions' in intent_analysis
+        assert 'primary_intent' in intent_analysis
+        
+        # Should have reasonable intent values
+        assert isinstance(intent_analysis['intent_signals'], list)
+        assert isinstance(intent_analysis['user_questions'], list)
+        assert intent_analysis['primary_intent'] in ['informational', 'comparison', 'transactional']
+    
+    def test_assess_quality_indicators(self):
+        """Test quality indicator assessment."""
+        quality_indicators = self.engine._assess_quality_indicators(self.sample_grounding_metadata)
+        
+        assert 'overall_quality' in quality_indicators
+        assert 'quality_factors' in quality_indicators
+        assert 'quality_grade' in quality_indicators
+        
+        # Should have reasonable quality values
+        assert 0.0 <= quality_indicators['overall_quality'] <= 1.0
+        assert isinstance(quality_indicators['quality_factors'], list)
+        assert quality_indicators['quality_grade'] in ['A', 'B', 'C', 'D', 'F']
+    
+    def test_calculate_chunk_authority(self):
+        """Test chunk authority calculation."""
+        # Test high authority chunk
+        high_authority_chunk = self.sample_chunks[0]  # Research study
+        authority_score = self.engine._calculate_chunk_authority(high_authority_chunk)
+        assert 0.0 <= authority_score <= 1.0
+        assert authority_score > 0.5  # Should be high authority
+        
+        # Test low authority chunk
+        low_authority_chunk = self.sample_chunks[4]  # Personal opinion
+        authority_score = self.engine._calculate_chunk_authority(low_authority_chunk)
+        assert 0.0 <= authority_score <= 1.0
+        assert authority_score < 0.7  # Should be lower authority
+    
+    def test_get_authority_sources(self):
+        """Test getting high-authority sources."""
+        authority_sources = self.engine.get_authority_sources(self.sample_grounding_metadata)
+        
+        # Should return list of tuples
+        assert isinstance(authority_sources, list)
+        
+        # Each item should be (chunk, score) tuple
+        for chunk, score in authority_sources:
+            assert isinstance(chunk, GroundingChunk)
+            assert isinstance(score, float)
+            assert 0.0 <= score <= 1.0
+        
+        # Should be sorted by authority score (descending)
+        if len(authority_sources) > 1:
+            for i in range(len(authority_sources) - 1):
+                assert authority_sources[i][1] >= authority_sources[i + 1][1]
+    
+    def test_get_high_confidence_insights(self):
+        """Test getting high-confidence insights."""
+        insights = self.engine.get_high_confidence_insights(self.sample_grounding_metadata)
+        
+        # Should return list of insights
+        assert isinstance(insights, list)
+        
+        # Each insight should be a string
+        for insight in insights:
+            assert isinstance(insight, str)
+            assert len(insight) > 0
+    
+    def test_enhance_sections_with_grounding(self):
+        """Test section enhancement with grounding insights."""
+        sections = [self.sample_section]
+        insights = self.engine.extract_contextual_insights(self.sample_grounding_metadata)
+        
+        enhanced_sections = self.engine.enhance_sections_with_grounding(
+            sections, self.sample_grounding_metadata, insights
+        )
+        
+        # Should return same number of sections
+        assert len(enhanced_sections) == len(sections)
+        
+        # Enhanced section should have same basic structure
+        enhanced_section = enhanced_sections[0]
+        assert enhanced_section.id == self.sample_section.id
+        assert enhanced_section.heading == self.sample_section.heading
+        
+        # Should have enhanced content
+        assert len(enhanced_section.subheadings) >= len(self.sample_section.subheadings)
+        assert len(enhanced_section.key_points) >= len(self.sample_section.key_points)
+        assert len(enhanced_section.keywords) >= len(self.sample_section.keywords)
+    
+    def test_enhance_sections_with_empty_grounding(self):
+        """Test section enhancement with empty grounding metadata."""
+        sections = [self.sample_section]
+        
+        enhanced_sections = self.engine.enhance_sections_with_grounding(
+            sections, None, {}
+        )
+        
+        # Should return original sections unchanged
+        assert len(enhanced_sections) == len(sections)
+        assert enhanced_sections[0].subheadings == self.sample_section.subheadings
+        assert enhanced_sections[0].key_points == self.sample_section.key_points
+        assert enhanced_sections[0].keywords == self.sample_section.keywords
+    
+    def test_find_relevant_chunks(self):
+        """Test finding relevant chunks for a section."""
+        relevant_chunks = self.engine._find_relevant_chunks(
+            self.sample_section, self.sample_grounding_metadata
+        )
+        
+        # Should return list of relevant chunks
+        assert isinstance(relevant_chunks, list)
+        
+        # Each chunk should be a GroundingChunk
+        for chunk in relevant_chunks:
+            assert isinstance(chunk, GroundingChunk)
+    
+    def test_find_relevant_supports(self):
+        """Test finding relevant supports for a section."""
+        relevant_supports = self.engine._find_relevant_supports(
+            self.sample_section, self.sample_grounding_metadata
+        )
+        
+        # Should return list of relevant supports
+        assert isinstance(relevant_supports, list)
+        
+        # Each support should be a GroundingSupport
+        for support in relevant_supports:
+            assert isinstance(support, GroundingSupport)
+    
+    def test_extract_insight_from_segment(self):
+        """Test insight extraction from segment text."""
+        # Test with valid segment
+        segment = "This is a comprehensive analysis of AI trends in enterprise applications."
+        insight = self.engine._extract_insight_from_segment(segment)
+        assert insight == segment
+        
+        # Test with short segment
+        short_segment = "Short"
+        insight = self.engine._extract_insight_from_segment(short_segment)
+        assert insight is None
+        
+        # Test with long segment
+        long_segment = "This is a very long segment that exceeds the maximum length limit and should be truncated appropriately to ensure it fits within the expected constraints and provides comprehensive coverage of the topic while maintaining readability and clarity for the intended audience."
+        insight = self.engine._extract_insight_from_segment(long_segment)
+        assert insight is not None
+        assert len(insight) <= 203  # 200 + "..."
+        assert insight.endswith("...")
+    
+    def test_get_confidence_distribution(self):
+        """Test confidence distribution calculation."""
+        confidences = [0.95, 0.88, 0.82, 0.90, 0.65]
+        distribution = self.engine._get_confidence_distribution(confidences)
+        
+        assert 'high' in distribution
+        assert 'medium' in distribution
+        assert 'low' in distribution
+        
+        # Should have reasonable distribution
+        total = distribution['high'] + distribution['medium'] + distribution['low']
+        assert total == len(confidences)
+    
+    def test_calculate_temporal_balance(self):
+        """Test temporal balance calculation."""
+        # Test recent heavy
+        balance = self.engine._calculate_temporal_balance(8, 2)
+        assert balance == 'recent_heavy'
+        
+        # Test evergreen heavy
+        balance = self.engine._calculate_temporal_balance(2, 8)
+        assert balance == 'evergreen_heavy'
+        
+        # Test balanced
+        balance = self.engine._calculate_temporal_balance(5, 5)
+        assert balance == 'balanced'
+        
+        # Test empty
+        balance = self.engine._calculate_temporal_balance(0, 0)
+        assert balance == 'unknown'
+    
+    def test_extract_related_concepts(self):
+        """Test related concept extraction."""
+        text_list = [
+            "Artificial Intelligence is transforming Machine Learning applications",
+            "Deep Learning algorithms are improving Neural Network performance",
+            "Natural Language Processing is advancing AI capabilities"
+        ]
+        
+        concepts = self.engine._extract_related_concepts(text_list)
+        
+        # Should extract capitalized concepts
+        assert isinstance(concepts, list)
+        assert len(concepts) > 0
+        
+        # Should contain expected concepts
+        expected_concepts = ['Artificial', 'Intelligence', 'Machine', 'Learning', 'Deep', 'Neural', 'Network']
+        for concept in expected_concepts:
+            assert concept in concepts
+    
+    def test_identify_content_gaps(self):
+        """Test content gap identification."""
+        text_list = [
+            "The research shows significant improvements in AI applications",
+            "However, there is a lack of comprehensive studies on AI ethics",
+            "The gap in understanding AI bias remains unexplored",
+            "Current research does not cover all aspects of AI implementation"
+        ]
+        
+        gaps = self.engine._identify_content_gaps(text_list)
+        
+        # Should identify gaps
+        assert isinstance(gaps, list)
+        assert len(gaps) > 0
+    
+    def test_assess_citation_quality(self):
+        """Test citation quality assessment."""
+        quality = self.engine._assess_citation_quality(self.sample_citations)
+        
+        # Should have reasonable quality score
+        assert 0.0 <= quality <= 1.0
+        assert quality > 0.0  # Should have some quality
+    
+    def test_determine_primary_intent(self):
+        """Test primary intent determination."""
+        # Test informational intent
+        intent = self.engine._determine_primary_intent(['informational', 'informational', 'comparison'])
+        assert intent == 'informational'
+        
+        # Test empty signals
+        intent = self.engine._determine_primary_intent([])
+        assert intent == 'informational'
+    
+    def test_get_quality_grade(self):
+        """Test quality grade calculation."""
+        # Test A grade
+        grade = self.engine._get_quality_grade(0.95)
+        assert grade == 'A'
+        
+        # Test B grade
+        grade = self.engine._get_quality_grade(0.85)
+        assert grade == 'B'
+        
+        # Test C grade
+        grade = self.engine._get_quality_grade(0.75)
+        assert grade == 'C'
+        
+        # Test D grade
+        grade = self.engine._get_quality_grade(0.65)
+        assert grade == 'D'
+        
+        # Test F grade
+        grade = self.engine._get_quality_grade(0.45)
+        assert grade == 'F'
+
+
+if __name__ == '__main__':
+    pytest.main([__file__])
--- a/backend/test/test_linkedin_keyword_fix.py
+++ b/backend/test/test_linkedin_keyword_fix.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+"""
+Test Script for LinkedIn Content Generation Keyword Fix
+
+This script tests the fixed keyword processing by calling the LinkedIn content generation
+service directly and capturing detailed logs to analyze API usage patterns.
+"""
+
+import asyncio
+import json
+import time
+import logging
+from datetime import datetime
+from typing import Dict, Any
+import sys
+import os
+
+# Add the backend directory (the parent of this test directory) to the Python path
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+# Configure detailed logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+    handlers=[
+        logging.FileHandler(f'test_linkedin_keyword_fix_{datetime.now().strftime("%Y%m%d_%H%M%S")}.log', encoding='utf-8'),
+        logging.StreamHandler(sys.stdout)
+    ]
+)
+
+logger = logging.getLogger(__name__)
+
+# Import the LinkedIn service
+from services.linkedin_service import LinkedInService
+from models.linkedin_models import LinkedInPostRequest, LinkedInPostType, LinkedInTone, GroundingLevel, SearchEngine
+
+
+class LinkedInKeywordTest:
+    """Test class for LinkedIn keyword processing fix."""
+    
+    def __init__(self):
+        self.linkedin_service = LinkedInService()
+        self.test_results = []
+        self.api_call_count = 0
+        self.start_time = None
+        
+    def log_api_call(self, endpoint: str, duration: float, success: bool):
+        """Log API call details."""
+        self.api_call_count += 1
+        logger.info(f"API Call #{self.api_call_count}: {endpoint} - Duration: {duration:.2f}s - Success: {success}")
+    
+    async def test_keyword_phrase(self, phrase: str, test_name: str) -> Dict[str, Any]:
+        """Test a specific keyword phrase."""
+        logger.info(f"\n{'='*60}")
+        logger.info(f"TESTING: {test_name}")
+        logger.info(f"KEYWORD PHRASE: '{phrase}'")
+        logger.info(f"{'='*60}")
+        
+        test_start = time.time()
+        
+        try:
+            # Create the request
+            request = LinkedInPostRequest(
+                topic=phrase,
+                industry="Technology",
+                post_type=LinkedInPostType.PROFESSIONAL,
+                tone=LinkedInTone.PROFESSIONAL,
+                grounding_level=GroundingLevel.ENHANCED,
+                search_engine=SearchEngine.GOOGLE,
+                research_enabled=True,
+                include_citations=True,
+                max_length=1000
+            )
+            
+            logger.info(f"Request created: {request.topic}")
+            logger.info(f"Research enabled: {request.research_enabled}")
+            logger.info(f"Search engine: {request.search_engine}")
+            logger.info(f"Grounding level: {request.grounding_level}")
+            
+            # Call the LinkedIn service
+            logger.info("Calling LinkedIn service...")
+            response = await self.linkedin_service.generate_linkedin_post(request)
+            
+            test_duration = time.time() - test_start
+            self.log_api_call("LinkedIn Post Generation", test_duration, response.success)
+            
+            # Analyze the response
+            result = {
+                "test_name": test_name,
+                "keyword_phrase": phrase,
+                "success": response.success,
+                "duration": test_duration,
+                "api_calls": self.api_call_count,
+                "error": response.error if not response.success else None,
+                "content_length": len(response.data.content) if response.success and response.data else 0,
+                "sources_count": len(response.research_sources) if response.success and response.research_sources else 0,
+                "citations_count": len(response.data.citations) if response.success and response.data and response.data.citations else 0,
+                "grounding_status": response.grounding_status if response.success else None,
+                "generation_metadata": response.generation_metadata if response.success else None
+            }
+            
+            if response.success:
+                logger.info(f"✅ SUCCESS: Generated {result['content_length']} characters")
+                logger.info(f"📊 Sources: {result['sources_count']}, Citations: {result['citations_count']}")
+                logger.info(f"⏱️ Total duration: {test_duration:.2f}s")
+                logger.info(f"🔢 API calls made: {self.api_call_count}")
+                
+                # Log content preview
+                if response.data and response.data.content:
+                    content_preview = response.data.content[:200] + "..." if len(response.data.content) > 200 else response.data.content
+                    logger.info(f"📝 Content preview: {content_preview}")
+                
+                # Log grounding status
+                if response.grounding_status:
+                    logger.info(f"🔍 Grounding status: {response.grounding_status}")
+                    
+            else:
+                logger.error(f"❌ FAILED: {response.error}")
+                
+            return result
+            
+        except Exception as e:
+            test_duration = time.time() - test_start
+            logger.error(f"❌ EXCEPTION in {test_name}: {str(e)}")
+            self.log_api_call("LinkedIn Post Generation", test_duration, False)
+            
+            return {
+                "test_name": test_name,
+                "keyword_phrase": phrase,
+                "success": False,
+                "duration": test_duration,
+                "api_calls": self.api_call_count,
+                "error": str(e),
+                "content_length": 0,
+                "sources_count": 0,
+                "citations_count": 0,
+                "grounding_status": None,
+                "generation_metadata": None
+            }
+    
+    async def run_comprehensive_test(self):
+        """Run comprehensive tests for keyword processing."""
+        logger.info("🚀 Starting LinkedIn Keyword Processing Test Suite")
+        logger.info(f"Test started at: {datetime.now()}")
+        
+        self.start_time = time.time()
+        
+        # Test cases
+        test_cases = [
+            {
+                "phrase": "ALwrity content generation",
+                "name": "Single Phrase Test (Should be preserved as-is)"
+            },
+            {
+                "phrase": "AI tools, content creation, marketing automation",
+                "name": "Comma-Separated Test (Should be split by commas)"
+            },
+            {
+                "phrase": "LinkedIn content strategy",
+                "name": "Another Single Phrase Test"
+            },
+            {
+                "phrase": "social media, digital marketing, brand awareness",
+                "name": "Another Comma-Separated Test"
+            }
+        ]
+        
+        # Run all tests
+        for test_case in test_cases:
+            result = await self.test_keyword_phrase(
+                test_case["phrase"], 
+                test_case["name"]
+            )
+            self.test_results.append(result)
+            
+            # Reset API call counter for next test
+            self.api_call_count = 0
+            
+            # Small delay between tests
+            await asyncio.sleep(2)
+        
+        # Generate summary report
+        self.generate_summary_report()
+    
+    def generate_summary_report(self):
+        """Generate a comprehensive summary report."""
+        total_time = time.time() - self.start_time
+        
+        logger.info(f"\n{'='*80}")
+        logger.info("📊 COMPREHENSIVE TEST SUMMARY REPORT")
+        logger.info(f"{'='*80}")
+        
+        logger.info(f"🕐 Total test duration: {total_time:.2f} seconds")
+        logger.info(f"🧪 Total tests run: {len(self.test_results)}")
+        
+        successful_tests = [r for r in self.test_results if r["success"]]
+        failed_tests = [r for r in self.test_results if not r["success"]]
+        
+        logger.info(f"✅ Successful tests: {len(successful_tests)}")
+        logger.info(f"❌ Failed tests: {len(failed_tests)}")
+        
+        if successful_tests:
+            avg_duration = sum(r["duration"] for r in successful_tests) / len(successful_tests)
+            avg_content_length = sum(r["content_length"] for r in successful_tests) / len(successful_tests)
+            avg_sources = sum(r["sources_count"] for r in successful_tests) / len(successful_tests)
+            avg_citations = sum(r["citations_count"] for r in successful_tests) / len(successful_tests)
+            
+            logger.info(f"📈 Average generation time: {avg_duration:.2f}s")
+            logger.info(f"📝 Average content length: {avg_content_length:.0f} characters")
+            logger.info(f"🔍 Average sources found: {avg_sources:.1f}")
+            logger.info(f"📚 Average citations: {avg_citations:.1f}")
+        
+        # Detailed results
+        logger.info("\n📋 DETAILED TEST RESULTS:")
+        for i, result in enumerate(self.test_results, 1):
+            status = "✅ PASS" if result["success"] else "❌ FAIL"
+            logger.info(f"{i}. {status} - {result['test_name']}")
+            logger.info(f"   Phrase: '{result['keyword_phrase']}'")
+            logger.info(f"   Duration: {result['duration']:.2f}s")
+            if result["success"]:
+                logger.info(f"   Content: {result['content_length']} chars, Sources: {result['sources_count']}, Citations: {result['citations_count']}")
+            else:
+                logger.info(f"   Error: {result['error']}")
+        
+        # API Usage Analysis
+        logger.info("\n🔍 API USAGE ANALYSIS:")
+        total_api_calls = sum(r["api_calls"] for r in self.test_results)
+        logger.info(f"Total API calls across all tests: {total_api_calls}")
+        
+        if successful_tests:
+            avg_api_calls = sum(r["api_calls"] for r in successful_tests) / len(successful_tests)
+            logger.info(f"Average API calls per successful test: {avg_api_calls:.1f}")
+        
+        # Save detailed results to JSON file
+        report_data = {
+            "test_summary": {
+                "total_duration": total_time,
+                "total_tests": len(self.test_results),
+                "successful_tests": len(successful_tests),
+                "failed_tests": len(failed_tests),
+                "total_api_calls": total_api_calls
+            },
+            "test_results": self.test_results,
+            "timestamp": datetime.now().isoformat()
+        }
+        
+        report_filename = f"linkedin_keyword_test_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
+        with open(report_filename, 'w') as f:
+            json.dump(report_data, f, indent=2, default=str)
+        
+        logger.info(f"📄 Detailed report saved to: {report_filename}")
+        logger.info(f"{'='*80}")
+
+
+async def main():
+    """Main test execution function."""
+    try:
+        test_suite = LinkedInKeywordTest()
+        await test_suite.run_comprehensive_test()
+        
+    except Exception as e:
+        logger.error(f"❌ Test suite failed: {str(e)}")
+        raise
+
+
+if __name__ == "__main__":
+    print("🚀 Starting LinkedIn Keyword Processing Test Suite")
+    print("This will test the keyword fix and analyze API usage patterns...")
+    print("=" * 60)
+    
+    asyncio.run(main())
--- a/backend/test/test_research_data_filter.py
+++ b/backend/test/test_research_data_filter.py
@@ -0,0 +1,366 @@
+"""
+Unit tests for ResearchDataFilter.
+
+Tests the filtering and cleaning functionality for research data.
+"""
+
+import pytest
+from datetime import datetime, timedelta
+from typing import List
+
+from models.blog_models import (
+    BlogResearchResponse,
+    ResearchSource,
+    GroundingMetadata,
+    GroundingChunk,
+    GroundingSupport,
+    Citation,
+)
+from services.blog_writer.research.data_filter import ResearchDataFilter
+
+
+class TestResearchDataFilter:
+    """Test cases for ResearchDataFilter."""
+    
+    def setup_method(self):
+        """Set up test fixtures."""
+        self.filter = ResearchDataFilter()
+        
+        # Create sample research sources
+        self.sample_sources = [
+            ResearchSource(
+                title="High Quality AI Article",
+                url="https://example.com/ai-article",
+                excerpt="This is a comprehensive article about artificial intelligence trends in 2024 with detailed analysis and expert insights.",
+                credibility_score=0.95,
+                published_at="2025-08-15",
+                index=0,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="Low Quality Source",
+                url="https://example.com/low-quality",
+                excerpt="This is a low quality source with very poor credibility score and outdated information from 2020.",
+                credibility_score=0.3,
+                published_at="2020-01-01",
+                index=1,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="PDF Document",
+                url="https://example.com/document.pdf",
+                excerpt="This is a PDF document with research data",
+                credibility_score=0.8,
+                published_at="2025-08-01",
+                index=2,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="Recent AI Study",
+                url="https://example.com/ai-study",
+                excerpt="A recent study on AI adoption shows significant growth in enterprise usage with detailed statistics and case studies.",
+                credibility_score=0.9,
+                published_at="2025-09-01",
+                index=3,
+                source_type="web"
+            )
+        ]
+        
+        # Create sample grounding metadata
+        self.sample_grounding_metadata = GroundingMetadata(
+            grounding_chunks=[
+                GroundingChunk(
+                    title="High Confidence Chunk",
+                    url="https://example.com/chunk1",
+                    confidence_score=0.95
+                ),
+                GroundingChunk(
+                    title="Low Confidence Chunk",
+                    url="https://example.com/chunk2",
+                    confidence_score=0.5
+                ),
+                GroundingChunk(
+                    title="Medium Confidence Chunk",
+                    url="https://example.com/chunk3",
+                    confidence_score=0.8
+                )
+            ],
+            grounding_supports=[
+                GroundingSupport(
+                    confidence_scores=[0.9, 0.85],
+                    grounding_chunk_indices=[0, 1],
+                    segment_text="High confidence support text with expert insights"
+                ),
+                GroundingSupport(
+                    confidence_scores=[0.4, 0.3],
+                    grounding_chunk_indices=[2, 3],
+                    segment_text="Low confidence support text"
+                )
+            ],
+            citations=[
+                Citation(
+                    citation_type="expert_opinion",
+                    start_index=0,
+                    end_index=50,
+                    text="Expert opinion on AI trends",
+                    source_indices=[0],
+                    reference="Source 1"
+                ),
+                Citation(
+                    citation_type="statistical_data",
+                    start_index=51,
+                    end_index=100,
+                    text="Statistical data showing AI adoption rates",
+                    source_indices=[1],
+                    reference="Source 2"
+                ),
+                Citation(
+                    citation_type="inline",
+                    start_index=101,
+                    end_index=150,
+                    text="Generic inline citation",
+                    source_indices=[2],
+                    reference="Source 3"
+                )
+            ]
+        )
+        
+        # Create sample research response
+        self.sample_research_response = BlogResearchResponse(
+            success=True,
+            sources=self.sample_sources,
+            keyword_analysis={
+                'primary': ['artificial intelligence', 'AI trends', 'machine learning'],
+                'secondary': ['AI adoption', 'enterprise AI', 'AI technology'],
+                'long_tail': ['AI trends 2024', 'enterprise AI adoption rates', 'AI technology benefits'],
+                'semantic_keywords': ['artificial intelligence', 'machine learning', 'deep learning'],
+                'trending_terms': ['AI 2024', 'generative AI', 'AI automation'],
+                'content_gaps': [
+                    'AI ethics in small businesses',
+                    'AI implementation guide for startups',
+                    'AI cost-benefit analysis for SMEs',
+                    'general overview',  # Should be filtered out
+                    'basics'  # Should be filtered out
+                ],
+                'search_intent': 'informational',
+                'difficulty': 7
+            },
+            competitor_analysis={
+                'top_competitors': ['Competitor A', 'Competitor B', 'Competitor C'],
+                'opportunities': ['Market gap 1', 'Market gap 2'],
+                'competitive_advantages': ['Advantage 1', 'Advantage 2'],
+                'market_positioning': 'Premium positioning'
+            },
+            suggested_angles=[
+                'AI trends in 2024',
+                'Enterprise AI adoption',
+                'AI implementation strategies'
+            ],
+            search_widget="<div>Search widget HTML</div>",
+            search_queries=["AI trends 2024", "enterprise AI adoption"],
+            grounding_metadata=self.sample_grounding_metadata
+        )
+    
+    def test_filter_sources_quality_filtering(self):
+        """Test that sources are filtered by quality criteria."""
+        filtered_sources = self.filter.filter_sources(self.sample_sources)
+        
+        # Should filter out low quality source (credibility < 0.6) and PDF document
+        assert len(filtered_sources) == 2  # Only high quality and recent AI study should pass
+        assert all(source.credibility_score >= 0.6 for source in filtered_sources)
+        
+        # Should filter out sources with short excerpts
+        assert all(len(source.excerpt) >= 50 for source in filtered_sources)
+    
+    def test_filter_sources_relevance_filtering(self):
+        """Test that irrelevant sources are filtered out."""
+        filtered_sources = self.filter.filter_sources(self.sample_sources)
+        
+        # Should filter out PDF document
+        pdf_sources = [s for s in filtered_sources if s.url.endswith('.pdf')]
+        assert len(pdf_sources) == 0
+    
+    def test_filter_sources_recency_filtering(self):
+        """Test that old sources are filtered out."""
+        filtered_sources = self.filter.filter_sources(self.sample_sources)
+        
+        # Should filter out old source (2020)
+        old_sources = [s for s in filtered_sources if s.published_at == "2020-01-01"]
+        assert len(old_sources) == 0
+    
+    def test_filter_sources_max_limit(self):
+        """Test that sources are limited to max_sources."""
+        # Create more sources than max_sources
+        many_sources = self.sample_sources * 5  # 20 sources
+        filtered_sources = self.filter.filter_sources(many_sources)
+        
+        assert len(filtered_sources) <= self.filter.max_sources
+    
+    def test_filter_grounding_metadata_confidence_filtering(self):
+        """Test that grounding metadata is filtered by confidence."""
+        filtered_metadata = self.filter.filter_grounding_metadata(self.sample_grounding_metadata)
+        
+        assert filtered_metadata is not None
+        
+        # Should filter out low confidence chunks
+        assert len(filtered_metadata.grounding_chunks) == 2
+        assert all(chunk.confidence_score >= 0.7 for chunk in filtered_metadata.grounding_chunks)
+        
+        # Should filter out low confidence supports
+        assert len(filtered_metadata.grounding_supports) == 1
+        assert all(max(support.confidence_scores) >= 0.7 for support in filtered_metadata.grounding_supports)
+        
+        # Should filter out irrelevant citations
+        assert len(filtered_metadata.citations) == 2
+        relevant_types = ['expert_opinion', 'statistical_data', 'recent_news', 'research_study']
+        assert all(citation.citation_type in relevant_types for citation in filtered_metadata.citations)
+    
+    def test_clean_keyword_analysis(self):
+        """Test that keyword analysis is cleaned and deduplicated."""
+        keyword_analysis = {
+            'primary': ['AI', 'artificial intelligence', 'AI', 'machine learning', ''],
+            'secondary': ['AI adoption', 'enterprise AI', 'ai adoption'],  # Case duplicates
+            'long_tail': ['AI trends 2024', 'ai trends 2024', 'AI TRENDS 2024'],  # Case duplicates
+            'search_intent': 'informational',
+            'difficulty': 7
+        }
+        
+        cleaned_analysis = self.filter.clean_keyword_analysis(keyword_analysis)
+        
+        # Should remove duplicates and empty strings (keywords are converted to lowercase)
+        assert len(cleaned_analysis['primary']) == 3
+        assert 'ai' in cleaned_analysis['primary']
+        assert 'artificial intelligence' in cleaned_analysis['primary']
+        assert 'machine learning' in cleaned_analysis['primary']
+        
+        # Should handle case-insensitive deduplication
+        assert len(cleaned_analysis['secondary']) == 2
+        assert len(cleaned_analysis['long_tail']) == 1
+        
+        # Should preserve other fields
+        assert cleaned_analysis['search_intent'] == 'informational'
+        assert cleaned_analysis['difficulty'] == 7
+    
+    def test_filter_content_gaps(self):
+        """Test that content gaps are filtered for quality and relevance."""
+        content_gaps = [
+            'AI ethics in small businesses',
+            'AI implementation guide for startups',
+            'general overview',  # Should be filtered out
+            'basics',  # Should be filtered out
+            'a',  # Too short, should be filtered out
+            'AI cost-benefit analysis for SMEs'
+        ]
+        
+        filtered_gaps = self.filter.filter_content_gaps(content_gaps, self.sample_research_response)
+        
+        # Should filter out generic and short gaps
+        assert len(filtered_gaps) >= 3  # At least the good ones should pass
+        assert 'AI ethics in small businesses' in filtered_gaps
+        assert 'AI implementation guide for startups' in filtered_gaps
+        assert 'AI cost-benefit analysis for SMEs' in filtered_gaps
+        assert 'general overview' not in filtered_gaps
+        assert 'basics' not in filtered_gaps
+    
+    def test_filter_research_data_integration(self):
+        """Test the complete filtering pipeline."""
+        filtered_research = self.filter.filter_research_data(self.sample_research_response)
+        
+        # Should maintain success status
+        assert filtered_research.success is True
+        
+        # Should filter sources
+        assert len(filtered_research.sources) < len(self.sample_research_response.sources)
+        assert len(filtered_research.sources) >= 0  # May be 0 if all sources are filtered out
+        
+        # Should filter grounding metadata
+        if filtered_research.grounding_metadata:
+            assert len(filtered_research.grounding_metadata.grounding_chunks) < len(self.sample_grounding_metadata.grounding_chunks)
+        
+        # Should clean keyword analysis
+        assert 'primary' in filtered_research.keyword_analysis
+        assert len(filtered_research.keyword_analysis['primary']) <= self.filter.max_keywords_per_category
+        
+        # Should filter content gaps
+        assert len(filtered_research.keyword_analysis['content_gaps']) < len(self.sample_research_response.keyword_analysis['content_gaps'])
+        
+        # Should preserve other fields
+        assert filtered_research.suggested_angles == self.sample_research_response.suggested_angles
+        assert filtered_research.search_widget == self.sample_research_response.search_widget
+        assert filtered_research.search_queries == self.sample_research_response.search_queries
+    
+    def test_filter_with_empty_data(self):
+        """Test filtering with empty or None data."""
+        empty_research = BlogResearchResponse(
+            success=True,
+            sources=[],
+            keyword_analysis={},
+            competitor_analysis={},
+            suggested_angles=[],
+            search_widget="",
+            search_queries=[],
+            grounding_metadata=None
+        )
+        
+        filtered_research = self.filter.filter_research_data(empty_research)
+        
+        assert filtered_research.success is True
+        assert len(filtered_research.sources) == 0
+        assert filtered_research.grounding_metadata is None
+        # keyword_analysis should still expose a content_gaps key, even for empty input
+        assert 'content_gaps' in filtered_research.keyword_analysis
+    
+    def test_parse_date_functionality(self):
+        """Test date parsing functionality."""
+        # Test various date formats
+        test_dates = [
+            "2024-01-15",
+            "2024-01-15T10:30:00",
+            "2024-01-15T10:30:00Z",
+            "January 15, 2024",
+            "Jan 15, 2024",
+            "15 January 2024",
+            "01/15/2024",
+            "15/01/2024"
+        ]
+        
+        for date_str in test_dates:
+            parsed_date = self.filter._parse_date(date_str)
+            assert parsed_date is not None
+            assert isinstance(parsed_date, datetime)
+        
+        # Test invalid date
+        invalid_date = self.filter._parse_date("invalid date")
+        assert invalid_date is None
+        
+        # Test None date
+        none_date = self.filter._parse_date(None)
+        assert none_date is None
+    
+    def test_clean_keyword_list_functionality(self):
+        """Test keyword list cleaning functionality."""
+        keywords = [
+            'AI',
+            'artificial intelligence',
+            'AI',  # Duplicate
+            'the',  # Stop word
+            'machine learning',
+            '',  # Empty
+            '   ',  # Whitespace only
+            'MACHINE LEARNING',  # Case duplicate
+            'ai'  # Case duplicate
+        ]
+        
+        cleaned_keywords = self.filter._clean_keyword_list(keywords)
+        
+        # Should remove duplicates, stop words, and empty strings
+        assert len(cleaned_keywords) == 3
+        assert 'ai' in cleaned_keywords
+        assert 'artificial intelligence' in cleaned_keywords
+        assert 'machine learning' in cleaned_keywords
+        assert 'the' not in cleaned_keywords
+        assert '' not in cleaned_keywords
+
+
+if __name__ == '__main__':
+    pytest.main([__file__])
--- a/backend/test/test_source_mapper.py
+++ b/backend/test/test_source_mapper.py
@@ -0,0 +1,515 @@
+"""
+Unit tests for SourceToSectionMapper.
+
+Tests the intelligent source-to-section mapping functionality.
+"""
+
+import pytest
+from typing import List
+
+from models.blog_models import (
+    BlogOutlineSection,
+    ResearchSource,
+    BlogResearchResponse,
+    GroundingMetadata,
+)
+from services.blog_writer.outline.source_mapper import SourceToSectionMapper
+
+
+class TestSourceToSectionMapper:
+    """Test cases for SourceToSectionMapper."""
+    
+    def setup_method(self):
+        """Set up test fixtures."""
+        self.mapper = SourceToSectionMapper()
+        
+        # Create sample research sources
+        self.sample_sources = [
+            ResearchSource(
+                title="AI Trends in 2025: Machine Learning Revolution",
+                url="https://example.com/ai-trends-2025",
+                excerpt="Comprehensive analysis of artificial intelligence trends in 2025, focusing on machine learning advancements, deep learning breakthroughs, and AI automation in enterprise environments.",
+                credibility_score=0.95,
+                published_at="2025-08-15",
+                index=0,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="Enterprise AI Implementation Guide",
+                url="https://example.com/enterprise-ai-guide",
+                excerpt="Step-by-step guide for implementing artificial intelligence solutions in enterprise environments, including best practices, challenges, and success stories from leading companies.",
+                credibility_score=0.9,
+                published_at="2025-08-01",
+                index=1,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="Machine Learning Algorithms Explained",
+                url="https://example.com/ml-algorithms",
+                excerpt="Detailed explanation of various machine learning algorithms including supervised learning, unsupervised learning, and reinforcement learning techniques with practical examples.",
+                credibility_score=0.85,
+                published_at="2025-07-20",
+                index=2,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="AI Ethics and Responsible Development",
+                url="https://example.com/ai-ethics",
+                excerpt="Discussion of ethical considerations in artificial intelligence development, including bias mitigation, transparency, and responsible AI practices for developers and organizations.",
+                credibility_score=0.88,
+                published_at="2025-07-10",
+                index=3,
+                source_type="web"
+            ),
+            ResearchSource(
+                title="Deep Learning Neural Networks Tutorial",
+                url="https://example.com/deep-learning-tutorial",
+                excerpt="Comprehensive tutorial on deep learning neural networks, covering convolutional neural networks, recurrent neural networks, and transformer architectures with code examples.",
+                credibility_score=0.92,
+                published_at="2025-06-15",
+                index=4,
+                source_type="web"
+            )
+        ]
+        
+        # Create sample outline sections
+        self.sample_sections = [
+            BlogOutlineSection(
+                id="s1",
+                heading="Introduction to AI and Machine Learning",
+                subheadings=["What is AI?", "Types of Machine Learning", "AI Applications"],
+                key_points=["AI definition and scope", "ML vs traditional programming", "Real-world AI examples"],
+                references=[],
+                target_words=300,
+                keywords=["artificial intelligence", "machine learning", "AI basics", "introduction"]
+            ),
+            BlogOutlineSection(
+                id="s2",
+                heading="Enterprise AI Implementation Strategies",
+                subheadings=["Planning Phase", "Implementation Steps", "Best Practices"],
+                key_points=["Strategic planning", "Technology selection", "Change management", "ROI measurement"],
+                references=[],
+                target_words=400,
+                keywords=["enterprise AI", "implementation", "strategies", "business"]
+            ),
+            BlogOutlineSection(
+                id="s3",
+                heading="Machine Learning Algorithms Deep Dive",
+                subheadings=["Supervised Learning", "Unsupervised Learning", "Deep Learning"],
+                key_points=["Algorithm types", "Use cases", "Performance metrics", "Model selection"],
+                references=[],
+                target_words=500,
+                keywords=["machine learning algorithms", "supervised learning", "deep learning", "neural networks"]
+            ),
+            BlogOutlineSection(
+                id="s4",
+                heading="AI Ethics and Responsible Development",
+                subheadings=["Ethical Considerations", "Bias and Fairness", "Transparency"],
+                key_points=["Ethical frameworks", "Bias detection", "Explainable AI", "Regulatory compliance"],
+                references=[],
+                target_words=350,
+                keywords=["AI ethics", "responsible AI", "bias", "transparency"]
+            )
+        ]
+        
+        # Create sample research response
+        self.sample_research = BlogResearchResponse(
+            success=True,
+            sources=self.sample_sources,
+            keyword_analysis={
+                'primary': ['artificial intelligence', 'machine learning', 'AI implementation'],
+                'secondary': ['enterprise AI', 'deep learning', 'AI ethics'],
+                'long_tail': ['AI trends 2025', 'enterprise AI implementation guide', 'machine learning algorithms explained'],
+                'semantic_keywords': ['AI', 'ML', 'neural networks', 'automation'],
+                'trending_terms': ['AI 2025', 'generative AI', 'AI automation'],
+                'search_intent': 'informational',
+                'content_gaps': ['AI implementation challenges', 'ML algorithm comparison']
+            },
+            competitor_analysis={
+                'top_competitors': ['TechCorp AI', 'DataScience Inc', 'AI Solutions Ltd'],
+                'opportunities': ['Enterprise market gap', 'SME AI adoption'],
+                'competitive_advantages': ['Comprehensive coverage', 'Practical examples']
+            },
+            suggested_angles=[
+                'AI trends in 2025',
+                'Enterprise AI implementation',
+                'Machine learning fundamentals',
+                'AI ethics and responsibility'
+            ],
+            search_widget="<div>Search widget HTML</div>",
+            search_queries=["AI trends 2025", "enterprise AI implementation", "machine learning guide"],
+            grounding_metadata=GroundingMetadata(
+                grounding_chunks=[],
+                grounding_supports=[],
+                citations=[],
+                search_entry_point="AI trends and implementation",
+                web_search_queries=["AI trends 2025", "enterprise AI"]
+            )
+        )
+    
+    def test_semantic_similarity_calculation(self):
+        """Test semantic similarity calculation between sections and sources."""
+        section = self.sample_sections[0]  # AI Introduction section
+        source = self.sample_sources[0]    # AI Trends source
+        
+        similarity = self.mapper._calculate_semantic_similarity(section, source)
+        
+        # Should be a normalized score with meaningful overlap due to AI-related content
+        assert 0.0 <= similarity <= 1.0
+        assert similarity > 0.3  # Should be reasonably high for AI-related content
+    
+    def test_keyword_relevance_calculation(self):
+        """Test keyword-based relevance calculation."""
+        section = self.sample_sections[1]  # Enterprise AI section
+        source = self.sample_sources[1]    # Enterprise AI Guide source
+        
+        relevance = self.mapper._calculate_keyword_relevance(section, source, self.sample_research)
+        
+        # Should have reasonable relevance due to enterprise AI keywords
+        assert 0.0 <= relevance <= 1.0
+        assert relevance > 0.1  # Should be reasonable for matching enterprise AI content
+    
+    def test_contextual_relevance_calculation(self):
+        """Test contextual relevance calculation."""
+        section = self.sample_sections[2]  # ML Algorithms section
+        source = self.sample_sources[2]    # ML Algorithms source
+        
+        relevance = self.mapper._calculate_contextual_relevance(section, source, self.sample_research)
+        
+        # Should have reasonable relevance due to matching content angles
+        assert 0.0 <= relevance <= 1.0
+        assert relevance > 0.2  # Should be reasonable for matching content
+    
+    def test_algorithmic_source_mapping(self):
+        """Test the complete algorithmic mapping process."""
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        # Should have mapping results for all sections
+        assert len(mapping_results) == len(self.sample_sections)
+        
+        # Each section should map to a (possibly empty) list of scored sources
+        for section_id, sources in mapping_results.items():
+            assert isinstance(sources, list)
+            # Each source should be a tuple of (source, score)
+            for source, score in sources:
+                assert isinstance(source, ResearchSource)
+                assert isinstance(score, float)
+                assert 0.0 <= score <= 1.0
+    
+    def test_source_mapping_quality(self):
+        """Test that sources are mapped to relevant sections."""
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        # Enterprise AI section should have enterprise AI source
+        enterprise_section = mapping_results["s2"]
+        enterprise_source_titles = [source.title for source, score in enterprise_section]
+        assert any("Enterprise" in title for title in enterprise_source_titles)
+        
+        # ML Algorithms section should have ML algorithms source
+        ml_section = mapping_results["s3"]
+        ml_source_titles = [source.title for source, score in ml_section]
+        assert any("Machine Learning" in title or "Algorithms" in title for title in ml_source_titles)
+        
+        # AI Ethics section should have AI ethics source
+        ethics_section = mapping_results["s4"]
+        ethics_source_titles = [source.title for source, score in ethics_section]
+        assert any("Ethics" in title for title in ethics_source_titles)
+    
+    def test_complete_mapping_pipeline(self):
+        """Test the complete mapping pipeline from sections to mapped sections."""
+        mapped_sections = self.mapper.map_sources_to_sections(self.sample_sections, self.sample_research)
+        
+        # Should return same number of sections
+        assert len(mapped_sections) == len(self.sample_sections)
+        
+        # Each section should carry a references list within the per-section limit
+        for section in mapped_sections:
+            assert isinstance(section.references, list)
+            assert len(section.references) <= self.mapper.max_sources_per_section
+            
+            # All references should be ResearchSource objects
+            for source in section.references:
+                assert isinstance(source, ResearchSource)
+    
+    def test_mapping_with_empty_sources(self):
+        """Test mapping behavior with empty sources list."""
+        empty_research = BlogResearchResponse(
+            success=True,
+            sources=[],
+            keyword_analysis={},
+            competitor_analysis={},
+            suggested_angles=[],
+            search_widget="",
+            search_queries=[],
+            grounding_metadata=None
+        )
+        
+        mapped_sections = self.mapper.map_sources_to_sections(self.sample_sections, empty_research)
+        
+        # Should return sections with empty references
+        for section in mapped_sections:
+            assert section.references == []
+    
+    def test_mapping_with_empty_sections(self):
+        """Test mapping behavior with empty sections list."""
+        mapped_sections = self.mapper.map_sources_to_sections([], self.sample_research)
+        
+        # Should return empty list
+        assert mapped_sections == []
+    
+    def test_meaningful_words_extraction(self):
+        """Test extraction of meaningful words from text."""
+        text = "Artificial Intelligence and Machine Learning are transforming the world of technology and business applications."
+        words = self.mapper._extract_meaningful_words(text)
+        
+        # Should extract meaningful words and remove stop words
+        assert "artificial" in words
+        assert "intelligence" in words
+        assert "machine" in words
+        assert "learning" in words
+        assert "the" not in words  # Stop word should be removed
+        assert "and" not in words  # Stop word should be removed
+    
+    def test_phrase_similarity_calculation(self):
+        """Test phrase similarity calculation."""
+        text1 = "machine learning algorithms"
+        text2 = "This article covers machine learning algorithms and their applications"
+        
+        similarity = self.mapper._calculate_phrase_similarity(text1, text2)
+        
+        # Should find phrase matches
+        assert similarity > 0.0
+        assert similarity <= 0.3  # Should be capped at 0.3
+    
+    def test_intent_keywords_extraction(self):
+        """Test extraction of intent-specific keywords."""
+        informational_keywords = self.mapper._get_intent_keywords("informational")
+        transactional_keywords = self.mapper._get_intent_keywords("transactional")
+        
+        # Should return appropriate keywords for each intent
+        assert "what" in informational_keywords
+        assert "how" in informational_keywords
+        assert "guide" in informational_keywords
+        
+        assert "buy" in transactional_keywords
+        assert "purchase" in transactional_keywords
+        assert "price" in transactional_keywords
+    
+    def test_mapping_statistics(self):
+        """Test mapping statistics calculation."""
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        stats = self.mapper.get_mapping_statistics(mapping_results)
+        
+        # Should have valid statistics
+        assert stats['total_sections'] == len(self.sample_sections)
+        assert stats['total_mappings'] > 0
+        assert stats['sections_with_sources'] > 0
+        assert 0.0 <= stats['average_score'] <= 1.0
+        assert 0.0 <= stats['max_score'] <= 1.0
+        assert 0.0 <= stats['min_score'] <= 1.0
+        assert 0.0 <= stats['mapping_coverage'] <= 1.0
+    
+    def test_source_quality_filtering(self):
+        """Test that low-quality sources are filtered out."""
+        # Create a low-quality source
+        low_quality_source = ResearchSource(
+            title="Random Article",
+            url="https://example.com/random",
+            excerpt="This is a completely unrelated article about cooking recipes and gardening tips.",
+            credibility_score=0.3,
+            published_at="2025-08-01",
+            index=5,
+            source_type="web"
+        )
+        
+        # Add to research data
+        research_with_low_quality = BlogResearchResponse(
+            success=True,
+            sources=self.sample_sources + [low_quality_source],
+            keyword_analysis=self.sample_research.keyword_analysis,
+            competitor_analysis=self.sample_research.competitor_analysis,
+            suggested_angles=self.sample_research.suggested_angles,
+            search_widget=self.sample_research.search_widget,
+            search_queries=self.sample_research.search_queries,
+            grounding_metadata=self.sample_research.grounding_metadata
+        )
+        
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, research_with_low_quality)
+        
+        # Low-quality source should not be mapped to any section
+        all_mapped_sources = []
+        for sources in mapping_results.values():
+            all_mapped_sources.extend([source for source, score in sources])
+        
+        assert low_quality_source not in all_mapped_sources
+    
+    def test_max_sources_per_section_limit(self):
+        """Test that the maximum sources per section limit is enforced."""
+        # Create many sources
+        many_sources = self.sample_sources * 3  # 15 sources
+        
+        research_with_many_sources = BlogResearchResponse(
+            success=True,
+            sources=many_sources,
+            keyword_analysis=self.sample_research.keyword_analysis,
+            competitor_analysis=self.sample_research.competitor_analysis,
+            suggested_angles=self.sample_research.suggested_angles,
+            search_widget=self.sample_research.search_widget,
+            search_queries=self.sample_research.search_queries,
+            grounding_metadata=self.sample_research.grounding_metadata
+        )
+        
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, research_with_many_sources)
+        
+        # Each section should have at most max_sources_per_section sources
+        for section_id, sources in mapping_results.items():
+            assert len(sources) <= self.mapper.max_sources_per_section
+    
+    def test_ai_validation_prompt_building(self):
+        """Test AI validation prompt building."""
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        prompt = self.mapper._build_validation_prompt(mapping_results, self.sample_research)
+        
+        # Should contain key elements
+        assert "expert content strategist" in prompt
+        assert "Research Topic:" in prompt
+        assert "ALGORITHMIC MAPPING RESULTS" in prompt
+        assert "AVAILABLE SOURCES" in prompt
+        assert "VALIDATION TASK" in prompt
+        assert "RESPONSE FORMAT" in prompt
+        assert "overall_quality_score" in prompt
+        assert "section_improvements" in prompt
+    
+    def test_ai_validation_response_parsing(self):
+        """Test AI validation response parsing."""
+        # Mock AI response
+        mock_response = """
+        Here's my analysis of the source-to-section mapping:
+
+        ```json
+        {
+            "overall_quality_score": 8,
+            "section_improvements": [
+                {
+                    "section_id": "s1",
+                    "current_sources": ["AI Trends in 2025: Machine Learning Revolution"],
+                    "recommended_sources": ["AI Trends in 2025: Machine Learning Revolution", "Machine Learning Algorithms Explained"],
+                    "reasoning": "Adding ML algorithms source provides more technical depth",
+                    "confidence": 0.9
+                }
+            ],
+            "summary": "Good mapping overall, minor improvements suggested"
+        }
+        ```
+        """
+        
+        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        parsed_mapping = self.mapper._parse_validation_response(mock_response, original_mapping, self.sample_research)
+        
+        # Should have improved mapping
+        assert "s1" in parsed_mapping
+        assert len(parsed_mapping["s1"]) > 0
+        
+        # Should maintain other sections
+        assert len(parsed_mapping) == len(original_mapping)
+    
+    def test_ai_validation_fallback_handling(self):
+        """Test AI validation fallback when parsing fails."""
+        # Mock invalid AI response
+        invalid_response = "This is not a valid JSON response"
+        
+        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        parsed_mapping = self.mapper._parse_validation_response(invalid_response, original_mapping, self.sample_research)
+        
+        # Should fallback to original mapping
+        assert parsed_mapping == original_mapping
+    
+    def test_ai_validation_with_missing_sources(self):
+        """Test AI validation when recommended sources don't exist."""
+        # Mock AI response with non-existent source
+        mock_response = """
+        ```json
+        {
+            "overall_quality_score": 7,
+            "section_improvements": [
+                {
+                    "section_id": "s1",
+                    "current_sources": ["AI Trends in 2025: Machine Learning Revolution"],
+                    "recommended_sources": ["Non-existent Source", "Another Fake Source"],
+                    "reasoning": "These sources would be better",
+                    "confidence": 0.8
+                }
+            ],
+            "summary": "Suggested improvements"
+        }
+        ```
+        """
+        
+        original_mapping = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        parsed_mapping = self.mapper._parse_validation_response(mock_response, original_mapping, self.sample_research)
+        
+        # Should fallback to original mapping for s1 since no valid sources found
+        assert parsed_mapping["s1"] == original_mapping["s1"]
+    
+    def test_ai_validation_integration(self):
+        """Test complete AI validation integration (LLM provider not mocked yet)."""
+        # A full test would require mocking the LLM provider
+        # For now, we only check that the method fails gracefully or falls back
+        mapping_results = self.mapper._algorithmic_source_mapping(self.sample_sections, self.sample_research)
+        
+        # Test that AI validation method exists and can be called
+        # (In real implementation, this would call the actual LLM)
+        try:
+            validated_mapping = self.mapper._ai_validate_mapping(mapping_results, self.sample_research)
+        except Exception as e:
+            # Expected to fail in test environment, but should be handled gracefully
+            assert "AI validation failed" in str(e) or "Failed to get AI validation response" in str(e)
+        else:
+            # If it doesn't raise, it should return the original mapping as fallback
+            assert validated_mapping == mapping_results
+    
+    def test_format_sections_for_prompt(self):
+        """Test formatting of sections for AI prompt."""
+        sections_info = [
+            {
+                'id': 's1',
+                'sources': [
+                    {
+                        'title': 'Test Source 1',
+                        'algorithmic_score': 0.85
+                    }
+                ]
+            }
+        ]
+        
+        formatted = self.mapper._format_sections_for_prompt(sections_info)
+        
+        assert "Section s1:" in formatted
+        assert "Test Source 1" in formatted
+        assert "0.85" in formatted
+    
+    def test_format_sources_for_prompt(self):
+        """Test formatting of sources for AI prompt."""
+        sources = [
+            {
+                'title': 'Test Source',
+                'url': 'https://example.com',
+                'credibility_score': 0.9,
+                'excerpt': 'This is a test excerpt for the source.'
+            }
+        ]
+        
+        formatted = self.mapper._format_sources_for_prompt(sources)
+        
+        assert "Test Source" in formatted
+        assert "https://example.com" in formatted
+        assert "0.9" in formatted
+        assert "This is a test excerpt" in formatted
+
+
+if __name__ == '__main__':
+    pytest.main([__file__])