Files
ALwrity/backend/services/calendar_generation_datasource_framework/data_processing

Data Processing Modules for 12-Step Calendar Generation

📋 Overview

This directory contains the data processing modules that provide real data exclusively to the 12-step calendar generation process. These modules connect to actual services and databases to retrieve comprehensive user data, strategy information, and analysis results.

NO MOCK DATA - Only real data sources allowed.

🎯 12-Step Calendar Generation Data Flow

Phase 1: Foundation (Steps 1-3)

Step 1: Content Strategy Analysis

Data Processing Module: strategy_data.py
Function: StrategyDataProcessor.get_strategy_data(strategy_id)
Real Data Sources:

  • ContentPlanningDBService.get_content_strategy(strategy_id) - Real strategy data from database
  • EnhancedStrategyDBService.get_enhanced_strategy(strategy_id) - Real enhanced strategy fields
  • StrategyQualityAssessor.analyze_strategy_completeness() - Real strategy analysis

Expected Data Points (from prompt chaining document):

  • Content pillars and target audience preferences
  • Business goals and success metrics
  • Market positioning and competitive landscape
  • KPI mapping and alignment validation
  • Brand voice and editorial guidelines

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase1/phase1_steps.py
Class: ContentStrategyAnalysisStep

Step 2: Gap Analysis and Opportunity Identification

Data Processing Module: gap_analysis_data.py
Function: GapAnalysisDataProcessor.get_gap_analysis_data(user_id)
Real Data Sources:

  • ContentPlanningDBService.get_user_content_gap_analyses(user_id) - Real gap analysis results
  • ContentGapAnalyzer.analyze_content_gaps() - Real content gap analysis
  • CompetitorAnalyzer.analyze_competitors() - Real competitor insights

Expected Data Points (from prompt chaining document):

  • Prioritized content gaps with impact scores
  • High-value keyword opportunities
  • Competitor differentiation strategies
  • Opportunity implementation timeline
  • Keyword distribution and uniqueness validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase1/phase1_steps.py
Class: GapAnalysisStep

Step 3: Audience and Platform Strategy

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • OnboardingDataService.get_personalized_ai_inputs(user_id) - Real onboarding data
  • ActiveStrategyService.get_active_strategy(user_id) - Real active strategy
  • AIAnalyticsService.generate_strategic_intelligence(strategy_id) - Real AI analysis

Expected Data Points (from prompt chaining document):

  • Audience personas and preferences
  • Platform performance analysis
  • Content mix recommendations
  • Optimal timing strategies
  • Enterprise-level strategy validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase1/phase1_steps.py
Class: AudiencePlatformStrategyStep

Phase 2: Structure (Steps 4-6)

Step 4: Calendar Framework and Timeline

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • Phase 1 outputs (real strategy analysis, gap analysis, audience strategy)
  • strategy_data from comprehensive user data
  • gap_analysis from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Calendar framework and timeline
  • Content frequency and distribution
  • Theme structure and focus areas
  • Timeline optimization recommendations
  • Duration accuracy validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase2/step4_implementation.py
Class: CalendarFrameworkStep

Step 5: Content Pillar Distribution

Data Processing Module: strategy_data.py
Function: StrategyDataProcessor.get_strategy_data(strategy_id)
Real Data Sources:

  • strategy_data.content_pillars from comprehensive user data
  • strategy_analysis from enhanced strategy data
  • Phase 1 outputs (real strategy analysis)

Expected Data Points (from prompt chaining document):

  • Content pillar distribution plan
  • Theme variations and content types
  • Engagement level balancing
  • Strategic alignment validation
  • Content diversity and uniqueness validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase2/step5_implementation.py
Class: ContentPillarDistributionStep

Step 6: Platform-Specific Strategy

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • onboarding_data from comprehensive user data
  • performance_data from comprehensive user data
  • competitor_analysis from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Platform-specific content strategies
  • Content adaptation guidelines
  • Platform timing optimization
  • Cross-platform coordination plan
  • Platform uniqueness validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase2/step6_implementation.py
Class: PlatformSpecificStrategyStep

Phase 3: Content (Steps 7-9)

Step 7: Weekly Theme Development

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • Phase 2 outputs (real calendar framework, content pillars)
  • gap_analysis from comprehensive user data
  • strategy_data from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Weekly theme structure
  • Content opportunity integration
  • Strategic alignment validation
  • Engagement level planning
  • Theme uniqueness and progression validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase3/step7_implementation.py
Class: WeeklyThemeDevelopmentStep

Step 8: Daily Content Planning

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • Phase 3 outputs (real weekly themes)
  • performance_data from comprehensive user data
  • keyword_analysis from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Daily content schedule
  • Timing optimization
  • Keyword integration plan
  • Content variety strategy
  • Content uniqueness and keyword distribution validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase3/step8_implementation.py
Class: DailyContentPlanningStep

Step 9: Content Recommendations

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • recommendations_data from comprehensive user data
  • gap_analysis from comprehensive user data
  • strategy_data from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Specific content recommendations
  • Gap-filling content ideas
  • Implementation guidance
  • Quality assurance metrics
  • Enterprise-level content validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase3/step9_implementation.py
Class: ContentRecommendationsStep

Phase 4: Optimization (Steps 10-12)

Step 10: Performance Optimization

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • All previous phase outputs
  • performance_data from comprehensive user data
  • ai_analysis_results from comprehensive user data

Expected Data Points (from prompt chaining document):

  • Performance optimization recommendations
  • Quality improvement suggestions
  • Strategic alignment validation
  • Performance metric validation
  • KPI achievement and ROI validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase4/step10_implementation.py
Class: PerformanceOptimizationStep

Step 11: Strategy Alignment Validation

Data Processing Module: strategy_data.py
Function: StrategyDataProcessor.get_strategy_data(strategy_id)
Real Data Sources:

  • All previous phase outputs
  • strategy_data from comprehensive user data
  • strategy_analysis from enhanced strategy data

Expected Data Points (from prompt chaining document):

  • Strategy alignment validation
  • Goal achievement assessment
  • Content pillar verification
  • Audience targeting confirmation
  • Strategic objective achievement validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase4/step11_implementation.py
Class: StrategyAlignmentValidationStep

Step 12: Final Calendar Assembly

Data Processing Module: comprehensive_user_data.py
Function: ComprehensiveUserDataProcessor.get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • All previous phase outputs
  • Complete comprehensive user data
  • All data sources summary

Expected Data Points (from prompt chaining document):

  • Complete content calendar
  • Quality assurance report
  • Data utilization summary
  • Final recommendations and insights
  • Enterprise-level quality validation

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase4/step12_implementation.py
Class: FinalCalendarAssemblyStep

📊 Data Processing Modules Details

1. comprehensive_user_data.py

Purpose: Central data aggregator for all real user data
Main Function: get_comprehensive_user_data(user_id, strategy_id)
Real Data Sources:

  • OnboardingDataService.get_personalized_ai_inputs(user_id) - Real onboarding data
  • AIAnalyticsService.generate_strategic_intelligence(strategy_id) - Real AI analysis
  • AIEngineService.generate_content_recommendations(onboarding_data) - Real AI recommendations
  • ActiveStrategyService.get_active_strategy(user_id) - Real active strategy

Data Structure:

{
    "user_id": user_id,
    "onboarding_data": onboarding_data,  # Real onboarding data
    "ai_analysis_results": ai_analysis_results,  # Real AI analysis
    "gap_analysis": {
        "content_gaps": gap_analysis_data,  # Real gap analysis
        "keyword_opportunities": onboarding_data.get("keyword_analysis", {}).get("high_value_keywords", []),
        "competitor_insights": onboarding_data.get("competitor_analysis", {}).get("top_performers", []),
        "recommendations": gap_analysis_data,
        "opportunities": onboarding_data.get("gap_analysis", {}).get("content_opportunities", [])
    },
    "strategy_data": strategy_data,  # Real strategy data
    "recommendations_data": recommendations_data,
    "performance_data": performance_data,
    "industry": strategy_data.get("industry") or onboarding_data.get("website_analysis", {}).get("industry_focus", "technology"),
    "target_audience": strategy_data.get("target_audience") or onboarding_data.get("website_analysis", {}).get("target_audience", []),
    "business_goals": strategy_data.get("business_objectives") or ["Increase brand awareness", "Generate leads", "Establish thought leadership"],
    "website_analysis": onboarding_data.get("website_analysis", {}),
    "competitor_analysis": onboarding_data.get("competitor_analysis", {}),
    "keyword_analysis": onboarding_data.get("keyword_analysis", {}),
    "strategy_analysis": strategy_data.get("strategy_analysis", {}),
    "quality_indicators": strategy_data.get("quality_indicators", {})
}

2. strategy_data.py

Purpose: Process and enhance real strategy data
Main Function: get_strategy_data(strategy_id)
Real Data Sources:

  • ContentPlanningDBService.get_content_strategy(strategy_id) - Real database strategy
  • EnhancedStrategyDBService.get_enhanced_strategy(strategy_id) - Real enhanced strategy
  • StrategyQualityAssessor.analyze_strategy_completeness() - Real quality assessment

Data Structure:

{
    "strategy_id": strategy_dict.get("id"),
    "strategy_name": strategy_dict.get("name"),
    "industry": strategy_dict.get("industry", "technology"),
    "target_audience": strategy_dict.get("target_audience", {}),
    "content_pillars": strategy_dict.get("content_pillars", []),
    "ai_recommendations": strategy_dict.get("ai_recommendations", {}),
    "strategy_analysis": await quality_assessor.analyze_strategy_completeness(strategy_dict, enhanced_strategy_data),
    "quality_indicators": await quality_assessor.calculate_strategy_quality_indicators(strategy_dict, enhanced_strategy_data),
    "data_completeness": await quality_assessor.calculate_data_completeness(strategy_dict, enhanced_strategy_data),
    "strategic_alignment": await quality_assessor.assess_strategic_alignment(strategy_dict, enhanced_strategy_data)
}

3. gap_analysis_data.py

Purpose: Process real gap analysis data
Main Function: get_gap_analysis_data(user_id)
Real Data Sources:

  • ContentPlanningDBService.get_user_content_gap_analyses(user_id) - Real database gap analysis

Data Structure:

{
    "content_gaps": latest_analysis.get("analysis_results", {}).get("content_gaps", []),
    "keyword_opportunities": latest_analysis.get("analysis_results", {}).get("keyword_opportunities", []),
    "competitor_insights": latest_analysis.get("analysis_results", {}).get("competitor_insights", []),
    "recommendations": latest_analysis.get("recommendations", []),
    "opportunities": latest_analysis.get("opportunities", [])
}

🔗 Integration Points

Orchestrator Integration

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/orchestrator.py
Function: _get_comprehensive_user_data(user_id, strategy_id)
Usage:

# Line 35: Import
from calendar_generation_datasource_framework.data_processing import ComprehensiveUserDataProcessor

# Line 220+: Usage
user_data = await self.comprehensive_user_processor.get_comprehensive_user_data(user_id, strategy_id)

Step Integration

File: backend/services/calendar_generation_datasource_framework/prompt_chaining/steps/phase1/phase1_steps.py
Usage:

# Line 27-30: Imports
from calendar_generation_datasource_framework.data_processing import (
    ComprehensiveUserDataProcessor,
    StrategyDataProcessor,
    GapAnalysisDataProcessor
)

# Usage in steps
strategy_processor = StrategyDataProcessor()
processed_strategy = await strategy_processor.get_strategy_data(strategy_id)

Real Data Source Validation

Real Data Sources Confirmed

  • OnboardingDataService - Real onboarding data
  • AIAnalyticsService - Real AI analysis
  • AIEngineService - Real AI engine
  • ActiveStrategyService - Real active strategy
  • ContentPlanningDBService - Real database service
  • EnhancedStrategyDBService - Real enhanced strategy
  • StrategyQualityAssessor - Real quality assessment

No Mock Data Policy

  • No hardcoded mock data in data_processing modules
  • No fallback mock responses when services fail
  • No silent failures that mask real issues
  • All data comes from real services and databases
  • Proper error handling for missing data
  • Clear error messages when services are unavailable

🚀 Usage in 12-Step Process

Step Execution Flow

  1. Orchestrator calls ComprehensiveUserDataProcessor.get_comprehensive_user_data()
  2. Individual Steps receive real data through context from orchestrator
  3. Step-specific processors (StrategyDataProcessor, GapAnalysisDataProcessor) provide additional real data
  4. All data is real - no mock data used in the 12-step process

Data Flow by Phase

  • Phase 1: Uses ComprehensiveUserDataProcessor + StrategyDataProcessor + GapAnalysisDataProcessor
  • Phase 2: Uses Phase 1 outputs + ComprehensiveUserDataProcessor
  • Phase 3: Uses Phase 2 outputs + ComprehensiveUserDataProcessor
  • Phase 4: Uses all previous outputs + ComprehensiveUserDataProcessor

🛡️ Error Handling & Quality Assurance

Real Data Error Handling

  • Service Unavailable: Clear error messages with service name
  • Data Validation Failed: Specific field validation errors
  • Quality Gate Failed: Detailed quality score breakdown
  • No Silent Failures: All failures are explicit and traceable

Quality Validation

  • Data Completeness: All required fields present and valid
  • Service Availability: All required services responding
  • Data Quality: Real data meets quality thresholds
  • Strategic Alignment: Output aligns with business goals

📝 Notes

  • All data processing modules use real services - no mock data
  • Comprehensive error handling for missing or invalid data
  • Proper validation mechanisms that fail gracefully
  • Data validation ensures data quality and completeness
  • Integration with 12-step orchestrator is clean and efficient
  • Real data integrity maintained throughout the pipeline

Last Updated: January 2025
Status: Production Ready - Real Data Only
Quality: Enterprise Grade - No Mock Data