# Writing Persona System Documentation

## Overview

The Writing Persona System is an AI-powered feature that analyzes user onboarding data to create highly specific, platform-optimized writing personas. These personas serve as "unbreakable, high-fidelity persona replication engines" that keep the brand voice consistent across all content creation.

## System Architecture

### Database Schema

The persona system uses four main database tables:

#### 1. `writing_personas` (Core Persona Table)

- **Purpose**: Stores the main persona profile derived from onboarding analysis
- **Key Fields**:
  - `persona_name`: Human-readable persona name (e.g., "Professional Tech Voice")
  - `archetype`: Persona archetype (e.g., "The Pragmatic Futurist")
  - `core_belief`: Central philosophy driving the writing style
  - `linguistic_fingerprint`: Quantitative linguistic analysis (JSON)
  - `onboarding_session_id`: Links to source onboarding data

#### 2. `platform_personas` (Platform Adaptations)

- **Purpose**: Stores platform-specific adaptations of the core persona
- **Key Fields**:
  - `platform_type`: Target platform (twitter, linkedin, instagram, etc.)
  - `sentence_metrics`: Platform-optimized sentence structure
  - `lexical_features`: Platform-specific vocabulary and hashtags
  - `content_format_rules`: Character limits, formatting guidelines
  - `engagement_patterns`: Optimal posting frequency and timing

#### 3. `persona_analysis_results` (AI Analysis Tracking)

- **Purpose**: Stores the AI analysis process and results
- **Key Fields**:
  - `analysis_prompt`: The prompt used for persona generation
  - `linguistic_analysis`: Detailed linguistic fingerprint
  - `platform_recommendations`: AI recommendations for each platform
  - `confidence_score`: AI confidence in the analysis

#### 4. `persona_validation_results` (Quality Assurance)

- **Purpose**: Stores validation metrics and improvement feedback
- **Key Fields**:
  - `stylometric_accuracy`: How well the persona matches the original style
  - `consistency_score`: Consistency across generated content
  - `platform_compliance`: Platform optimization effectiveness

### AI Analysis Pipeline

#### Phase 1: Onboarding Data Collection

The system extracts data from the 6-step onboarding process:

1. **Step 1 - API Keys**: Determines available AI providers
2. **Step 2 - Website Analysis**: Core style analysis data
   - Writing style (tone, voice, complexity)
   - Content characteristics (sentence structure, vocabulary)
   - Target audience (demographics, expertise level)
   - Style patterns (common phrases, rhetorical devices)
3. **Step 3 - Research Preferences**: Content type preferences
4. **Step 4 - Personalization**: Additional style preferences
5. **Step 5 - Integrations**: Platform preferences
6. **Step 6 - Final**: Triggers persona generation

#### Phase 2: Core Persona Generation

Uses Gemini structured responses to analyze the collected data. The output has the following shape:

```json
{
  "identity": {
    "persona_name": "Generated from analysis",
    "archetype": "The [Adjective] [Role]",
    "core_belief": "Central philosophy",
    "brand_voice_description": "Detailed description"
  },
  "linguistic_fingerprint": {
    "sentence_metrics": {
      "average_sentence_length_words": 14.2,
      "preferred_sentence_type": "simple_and_compound",
      "active_to_passive_ratio": "90:10"
    },
    "lexical_features": {
      "go_to_words": ["leverage", "unlock", "framework"],
      "go_to_phrases": ["Let's get into it", "Here's the thing"],
      "avoid_words": ["utilize", "synergize"],
      "contractions": "required",
      "vocabulary_level": "professional"
    },
    "rhetorical_devices": {
      "metaphors": "common_tech_mechanics",
      "analogies": "everyday_to_tech",
      "rhetorical_questions": "for_engagement"
    }
  },
  "tonal_range": {
    "default_tone": "informed_casual",
    "permissible_tones": ["emphatic", "optimistic"],
    "forbidden_tones": ["academic", "salesy"]
  }
}
```

#### Phase 3: Platform Adaptations

Generates platform-specific optimizations:

- **Twitter**: Character limits, hashtag strategy, engagement tactics
- **LinkedIn**: Professional tone, long-form capability, networking focus
- **Instagram**: Visual-first approach, emoji usage, story optimization
- **Blog**: SEO optimization, header structure, readability scores
- **Medium**: Storytelling focus, publication strategy, engagement optimization
- **Substack**: Newsletter format, subscription focus, email optimization

## API Endpoints

### Core Endpoints

#### `POST /api/personas/generate`

Generates a new writing persona from onboarding data.

**Request**:

```json
{
  "onboarding_session_id": 1,
  "force_regenerate": false
}
```

**Response**:

```json
{
  "success": true,
  "persona_id": 123,
  "confidence_score": 85.5,
  "data_sufficiency": 78.0,
  "platforms_generated": ["twitter", "linkedin", "blog"]
}
```

#### `GET /api/personas/user/{user_id}`

Gets all personas for a user.

#### `GET /api/personas/{persona_id}/platform/{platform}`

Gets the platform-specific adaptation of a persona.

#### `GET /api/personas/preview/{user_id}`

Generates a preview without saving it to the database.

### Integration Endpoints

#### `GET /api/onboarding/persona-readiness`

Checks whether sufficient onboarding data exists for persona generation.

#### `POST /api/onboarding/generate-persona`

Generates a persona as part of onboarding completion.

## Gemini Structured Response Implementation

### Core Persona Analysis Prompt

The system uses a comprehensive prompt that analyzes:

1. **Website Analysis Data**: Extracted writing patterns, style characteristics
2. **Research Preferences**: Content type preferences, research depth
3. **Target Audience**: Demographics, expertise level, industry focus

### Structured Schema Design

The Gemini responses follow strict JSON schemas that ensure:

- **Quantitative Analysis**: Measurable writing characteristics
- **Platform Optimization**: Specific adaptations for each platform
- **Actionable Guidelines**: Concrete rules for content generation
- **Quality Metrics**: Confidence scores and validation data

### Example Gemini Prompt Structure

```
PERSONA GENERATION TASK:
Create a comprehensive writing persona based on user onboarding data.

ONBOARDING DATA ANALYSIS:
[Detailed website analysis, research preferences, and style data]

PERSONA GENERATION REQUIREMENTS:
1. IDENTITY CREATION: Create memorable persona name and archetype
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
4. TONAL RANGE: Default tone and permissible variations
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences

Generate a comprehensive persona profile that can replicate this writing style across platforms.
```

## Platform-Specific Optimizations

### Twitter/X Optimization

- **Character Limit**: 280 characters
- **Optimal Length**: 120-150 characters
- **Hashtag Strategy**: Maximum 3 hashtags
- **Engagement**: Thread support, retweet optimization

### LinkedIn Optimization

- **Character Limit**: 3000 characters
- **Optimal Length**: 150-300 words
- **Professional Tone**: Maintained throughout
- **Features**: Rich media support, long-form content

### Blog Optimization

- **Word Count**: 800-2000 words
- **SEO Focus**: Header structure, meta descriptions
- **Readability**: Optimized for target audience expertise level
- **Internal Linking**: Strategic link placement

### Instagram Optimization

- **Caption Limit**: 2200 characters
- **Optimal Length**: 125-150 words
- **Visual Focus**: Caption complements imagery
- **Hashtag Strategy**: Up to 30 hashtags, strategic placement

## Data Flow

```
Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage
```

### Data Sources

1. **Website Analysis** (Step 2):
   - Writing style analysis
   - Content characteristics
   - Target audience identification
   - Style pattern recognition
2. **Research Preferences** (Step 3):
   - Content type preferences
   - Research depth settings
   - Factual content requirements
3. **Personalization Settings** (Step 4):
   - Brand voice preferences
   - Tone specifications
   - Style customizations

### Quality Assurance

#### Data Sufficiency Scoring

- **Website Analysis**: 70% of score
  - Writing style: 25%
  - Content characteristics: 20%
  - Target audience: 15%
  - Style patterns: 10%
- **Research Preferences**: 30% of score
  - Research depth: 10%
  - Content types: 10%
  - Writing style data: 10%

#### Confidence Scoring

- AI-generated confidence based on data quality
- Minimum 50% data sufficiency required for generation
- Platform-specific confidence scores

## Usage Examples

### 1. Generate Persona During Onboarding

```python
# Automatically triggered during onboarding completion.
# PersonaAnalysisService must be imported from the project's persona service module.
persona_service = PersonaAnalysisService()
result = persona_service.generate_persona_from_onboarding(user_id=1)
```

### 2. Get Platform-Specific Persona

```python
# Get the LinkedIn-optimized persona (reuses persona_service from example 1)
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")
```

### 3. Generate Content with Persona

```python
# Use the persona for content generation.
# user_id and prompt are supplied by the caller; generate_content_with_persona
# stands for the application's content generation function.
persona = persona_service.get_persona_for_platform(user_id=user_id, platform="twitter")
content = generate_content_with_persona(prompt, persona)
```

## Implementation Notes

### Gemini Integration

- Uses the `gemini-2.5-flash` model
- Low temperature (0.2) for consistent analysis
- High token limit (8192) for comprehensive output
- Structured JSON schema validation

### Error Handling

- Graceful degradation when data is insufficient
- Fallback to default personas when generation fails
- Comprehensive logging for debugging

### Performance Considerations

- Persona generation is asynchronous
- Results are cached in the database for fast retrieval
- Platform adaptations are generated in parallel

## Future Enhancements

1. **Validation System**: Automated testing of generated content against persona
2. **Learning System**: Persona refinement based on content performance
3. **Multi-User Support**: User-specific persona management
4. **Advanced Analytics**: Persona effectiveness tracking
5. **Content Templates**: Platform-specific content templates using personas

## Troubleshooting

### Common Issues

1. **Insufficient Onboarding Data**
   - **Solution**: Ensure steps 2 and 3 are completed with quality data
   - **Check**: Data sufficiency score is at least 50%
2. **Gemini API Errors**
   - **Solution**: Verify API key configuration
   - **Check**: Network connectivity and rate limits
3. **Platform Adaptation Failures**
   - **Solution**: Check platform-specific constraints
   - **Check**: Schema validation and token limits

### Debugging

1. **Enable Debug Logging**: Set log level to DEBUG
2. **Check Database**: Verify table creation and data integrity
3. **Test API**: Use test script to validate functionality
4. **Monitor Performance**: Track generation times and success rates
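
When diagnosing an "Insufficient Onboarding Data" failure, it can help to recompute the data sufficiency score by hand. The sketch below applies the weights listed under Data Sufficiency Scoring. The dictionary keys, the function names `data_sufficiency_score` and `is_ready_for_generation`, and the rule that a component earns its full weight whenever any data is present are assumptions made for illustration, not the service's actual implementation. It also treats the documented 50% minimum as inclusive.

```python
# Hypothetical sketch of the documented data sufficiency weights.
# Key names and the all-or-nothing credit rule are assumptions.
WEIGHTS = {
    "website_analysis": {
        "writing_style": 25,
        "content_characteristics": 20,
        "target_audience": 15,
        "style_patterns": 10,
    },
    "research_preferences": {
        "research_depth": 10,
        "content_types": 10,
        "writing_style": 10,
    },
}

MIN_SUFFICIENCY = 50.0  # documented minimum for persona generation


def data_sufficiency_score(onboarding: dict) -> float:
    """Sum the weight of every component that has non-empty data."""
    score = 0.0
    for section, components in WEIGHTS.items():
        section_data = onboarding.get(section) or {}
        for name, weight in components.items():
            if section_data.get(name):
                score += weight
    return score


def is_ready_for_generation(onboarding: dict) -> bool:
    """Assumed check: the score must reach the 50% minimum (inclusive)."""
    return data_sufficiency_score(onboarding) >= MIN_SUFFICIENCY


example = {
    "website_analysis": {
        "writing_style": {"tone": "informed_casual"},
        "content_characteristics": {"sentence_structure": "simple"},
    },
    "research_preferences": {"research_depth": "standard"},
}

print(data_sufficiency_score(example))   # 25 + 20 + 10 = 55.0
print(is_ready_for_generation(example))  # True
```

A score below the minimum usually points at missing website analysis data, since that section carries 70% of the total weight.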
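
For "Platform Adaptation Failures", a quick local check of generated text against the documented platform limits can narrow down the cause. The following is a minimal sketch that uses the character and hashtag limits from the Platform-Specific Optimizations section. The `PLATFORM_LIMITS` layout and the `check_platform_constraints` helper are hypothetical names introduced here, and the check counts characters with Python's `len`, which may differ from how each platform counts.

```python
import re

# Limits copied from the Platform-Specific Optimizations section.
# The dictionary layout and function name are hypothetical.
PLATFORM_LIMITS = {
    "twitter": {"max_chars": 280, "max_hashtags": 3},
    "linkedin": {"max_chars": 3000, "max_hashtags": None},
    "instagram": {"max_chars": 2200, "max_hashtags": 30},
}

HASHTAG_PATTERN = re.compile(r"#\w+")


def check_platform_constraints(text: str, platform: str) -> list[str]:
    """Return a list of violations; an empty list means the text complies."""
    limits = PLATFORM_LIMITS.get(platform)
    if limits is None:
        return [f"no limits documented for platform '{platform}'"]

    problems = []
    if len(text) > limits["max_chars"]:
        problems.append(
            f"{len(text)} characters exceeds limit of {limits['max_chars']}"
        )

    max_tags = limits["max_hashtags"]
    hashtag_count = len(HASHTAG_PATTERN.findall(text))
    if max_tags is not None and hashtag_count > max_tags:
        problems.append(
            f"{hashtag_count} hashtags exceeds limit of {max_tags}"
        )
    return problems


draft = "Here's the thing: frameworks unlock speed. #ai #writing #tech #brand"
for problem in check_platform_constraints(draft, "twitter"):
    print(problem)  # reports the hashtag count, since 4 is over the limit of 3
```

Returning a list of messages rather than raising keeps the helper usable in logging output when debug logging is enabled.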