ALwrity/docs/persona/PERSONA_SYSTEM_DOCUMENTATION.md
2025-09-05 15:22:43 +05:30

Writing Persona System Documentation

Overview

The Writing Persona System is an AI-powered feature that analyzes user onboarding data to create platform-optimized writing personas. Each persona is designed to act as a "high-fidelity persona replication engine" that keeps brand voice consistent across all content creation.

System Architecture

Database Schema

The persona system uses four main database tables:

1. writing_personas (Core Persona Table)

  • Purpose: Stores the main persona profile derived from onboarding analysis
  • Key Fields:
    • persona_name: Human-readable persona name (e.g., "Professional Tech Voice")
    • archetype: Persona archetype (e.g., "The Pragmatic Futurist")
    • core_belief: Central philosophy driving the writing style
    • linguistic_fingerprint: Quantitative linguistic analysis (JSON)
    • onboarding_session_id: Links to source onboarding data

2. platform_personas (Platform Adaptations)

  • Purpose: Stores platform-specific adaptations of the core persona
  • Key Fields:
    • platform_type: Target platform (twitter, linkedin, instagram, etc.)
    • sentence_metrics: Platform-optimized sentence structure
    • lexical_features: Platform-specific vocabulary and hashtags
    • content_format_rules: Character limits, formatting guidelines
    • engagement_patterns: Optimal posting frequency and timing

3. persona_analysis_results (AI Analysis Tracking)

  • Purpose: Stores the AI analysis process and results
  • Key Fields:
    • analysis_prompt: The prompt used for persona generation
    • linguistic_analysis: Detailed linguistic fingerprint
    • platform_recommendations: AI recommendations for each platform
    • confidence_score: AI confidence in the analysis

4. persona_validation_results (Quality Assurance)

  • Purpose: Stores validation metrics and improvement feedback
  • Key Fields:
    • stylometric_accuracy: How well persona matches original style
    • consistency_score: Consistency across generated content
    • platform_compliance: Platform optimization effectiveness
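
As a rough illustration of how the four tables relate, the sketch below creates them in an in-memory SQLite database. Only the table names and the field names listed above come from this document. The column types, the `id` primary keys, and the `writing_persona_id` foreign key are assumptions made for the example, and the project's actual models may differ.

```python
import json
import sqlite3

# Hypothetical DDL: only the table and field names listed above come from the
# docs; types, primary keys, and the writing_persona_id link are assumed.
SCHEMA = """
CREATE TABLE writing_personas (
    id INTEGER PRIMARY KEY,
    persona_name TEXT NOT NULL,
    archetype TEXT,
    core_belief TEXT,
    linguistic_fingerprint TEXT,      -- JSON stored as text
    onboarding_session_id INTEGER
);
CREATE TABLE platform_personas (
    id INTEGER PRIMARY KEY,
    writing_persona_id INTEGER REFERENCES writing_personas(id),
    platform_type TEXT NOT NULL,
    sentence_metrics TEXT,
    lexical_features TEXT,
    content_format_rules TEXT,
    engagement_patterns TEXT
);
CREATE TABLE persona_analysis_results (
    id INTEGER PRIMARY KEY,
    writing_persona_id INTEGER REFERENCES writing_personas(id),
    analysis_prompt TEXT,
    linguistic_analysis TEXT,
    platform_recommendations TEXT,
    confidence_score REAL
);
CREATE TABLE persona_validation_results (
    id INTEGER PRIMARY KEY,
    writing_persona_id INTEGER REFERENCES writing_personas(id),
    stylometric_accuracy REAL,
    consistency_score REAL,
    platform_compliance REAL
);
"""

# Create the tables, then store one core persona and one platform adaptation.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cur = conn.execute(
    "INSERT INTO writing_personas (persona_name, archetype, linguistic_fingerprint) "
    "VALUES (?, ?, ?)",
    ("Professional Tech Voice", "The Pragmatic Futurist",
     json.dumps({"contractions": "required"})),
)
persona_id = cur.lastrowid
conn.execute(
    "INSERT INTO platform_personas (writing_persona_id, platform_type) VALUES (?, ?)",
    (persona_id, "linkedin"),
)
```

In this sketch, one core persona row can have many platform rows, which matches the description of platform personas as adaptations of the core persona.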

AI Analysis Pipeline

Phase 1: Onboarding Data Collection

The system extracts data from the 6-step onboarding process:

  1. Step 1 - API Keys: Determines available AI providers

  2. Step 2 - Website Analysis: Core style analysis data

    • Writing style (tone, voice, complexity)
    • Content characteristics (sentence structure, vocabulary)
    • Target audience (demographics, expertise level)
    • Style patterns (common phrases, rhetorical devices)
  3. Step 3 - Research Preferences: Content type preferences

  4. Step 4 - Personalization: Additional style preferences

  5. Step 5 - Integrations: Platform preferences

  6. Step 6 - Final: Triggers persona generation

Phase 2: Core Persona Generation

The system uses Gemini structured responses to analyze the collected data and return a core persona in the following shape:

{
  "identity": {
    "persona_name": "Generated from analysis",
    "archetype": "The [Adjective] [Role]",
    "core_belief": "Central philosophy",
    "brand_voice_description": "Detailed description"
  },
  "linguistic_fingerprint": {
    "sentence_metrics": {
      "average_sentence_length_words": 14.2,
      "preferred_sentence_type": "simple_and_compound",
      "active_to_passive_ratio": "90:10"
    },
    "lexical_features": {
      "go_to_words": ["leverage", "unlock", "framework"],
      "go_to_phrases": ["Let's get into it", "Here's the thing"],
      "avoid_words": ["utilize", "synergize"],
      "contractions": "required",
      "vocabulary_level": "professional"
    },
    "rhetorical_devices": {
      "metaphors": "common_tech_mechanics",
      "analogies": "everyday_to_tech",
      "rhetorical_questions": "for_engagement"
    }
  },
  "tonal_range": {
    "default_tone": "informed_casual",
    "permissible_tones": ["emphatic", "optimistic"],
    "forbidden_tones": ["academic", "salesy"]
  }
}
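
A consumer of this response may want to confirm that the expected sections are present before storing it. The following is a minimal sketch of such a check. The helper names `missing_persona_keys` and `parse_ratio` are hypothetical, and the required keys are simply those shown in the example above.

```python
from typing import Any

# Required sections and keys, taken from the example response above.
REQUIRED_KEYS = {
    "identity": {"persona_name", "archetype", "core_belief", "brand_voice_description"},
    "linguistic_fingerprint": {"sentence_metrics", "lexical_features", "rhetorical_devices"},
    "tonal_range": {"default_tone", "permissible_tones", "forbidden_tones"},
}


def missing_persona_keys(persona: dict[str, Any]) -> list[str]:
    """Return dotted paths of required keys that are absent from a persona dict."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in persona:
            missing.append(section)
            continue
        present = set(persona[section])
        missing.extend(f"{section}.{key}" for key in sorted(keys - present))
    return missing


def parse_ratio(ratio: str) -> float:
    """Convert an 'active:passive' string such as '90:10' to the active share (0 to 1)."""
    active, passive = (float(part) for part in ratio.split(":"))
    return active / (active + passive)
```

An empty list from `missing_persona_keys` means every section shown in the example is present. `parse_ratio("90:10")` yields an active-voice share of 0.9.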

Phase 3: Platform Adaptations

Generates platform-specific optimizations:

  • Twitter: Character limits, hashtag strategy, engagement tactics
  • LinkedIn: Professional tone, long-form capability, networking focus
  • Instagram: Visual-first approach, emoji usage, story optimization
  • Blog: SEO optimization, header structure, readability scores
  • Medium: Storytelling focus, publication strategy, engagement optimization
  • Substack: Newsletter format, subscription focus, email optimization

API Endpoints

Core Endpoints

POST /api/personas/generate

Generates a new writing persona from onboarding data.

Request:

{
  "onboarding_session_id": 1,
  "force_regenerate": false
}

Response:

{
  "success": true,
  "persona_id": 123,
  "confidence_score": 85.5,
  "data_sufficiency": 78.0,
  "platforms_generated": ["twitter", "linkedin", "blog"]
}
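
The request and response above could be exercised from Python roughly as follows. This is a sketch using only the standard library. `BASE_URL` and the helper names are assumptions, and any authentication the API requires is not covered in this document, so it is omitted.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed; replace with your deployment's address


def build_generate_request(onboarding_session_id: int,
                           force_regenerate: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/personas/generate."""
    body = json.dumps({
        "onboarding_session_id": onboarding_session_id,
        "force_regenerate": force_regenerate,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/personas/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload: dict) -> str:
    """Turn the documented response fields into a one-line summary."""
    if not payload.get("success"):
        return "persona generation failed"
    platforms = ", ".join(payload.get("platforms_generated", []))
    return (f"persona {payload['persona_id']} "
            f"(confidence {payload['confidence_score']}) for: {platforms}")
```

To send the request against a running server, pass the result of `build_generate_request(1)` to `urllib.request.urlopen` and decode the JSON body before calling `summarize_response`.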

GET /api/personas/user/{user_id}

Gets all personas for a user.

GET /api/personas/{persona_id}/platform/{platform}

Gets the platform-specific adaptation of a persona.

GET /api/personas/preview/{user_id}

Generates a persona preview without saving it to the database.

Integration Endpoints

GET /api/onboarding/persona-readiness

Checks if sufficient onboarding data exists for persona generation.

POST /api/onboarding/generate-persona

Generates a persona as part of onboarding completion.

Gemini Structured Response Implementation

Core Persona Analysis Prompt

The system uses a comprehensive prompt that analyzes:

  1. Website Analysis Data: Extracted writing patterns, style characteristics
  2. Research Preferences: Content type preferences, research depth
  3. Target Audience: Demographics, expertise level, industry focus

Structured Schema Design

The Gemini responses follow strict JSON schemas that ensure:

  • Quantitative Analysis: Measurable writing characteristics
  • Platform Optimization: Specific adaptations for each platform
  • Actionable Guidelines: Concrete rules for content generation
  • Quality Metrics: Confidence scores and validation data

Example Gemini Prompt Structure

PERSONA GENERATION TASK: Create a comprehensive writing persona based on user onboarding data.

ONBOARDING DATA ANALYSIS:
[Detailed website analysis, research preferences, and style data]

PERSONA GENERATION REQUIREMENTS:
1. IDENTITY CREATION: Create memorable persona name and archetype
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
4. TONAL RANGE: Default tone and permissible variations
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences

Generate a comprehensive persona profile that can replicate this writing style across platforms.
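
One way to assemble a prompt with this structure is sketched below. The function name `build_persona_prompt` and the choice to serialize the onboarding data as JSON are assumptions. The fixed text is copied from the structure above.

```python
import json

# Requirement titles and descriptions copied from the prompt structure above.
REQUIREMENTS = [
    ("IDENTITY CREATION", "Create memorable persona name and archetype"),
    ("LINGUISTIC FINGERPRINT", "Quantitative analysis of writing patterns"),
    ("RHETORICAL ANALYSIS", "Metaphor patterns, storytelling approach"),
    ("TONAL RANGE", "Default tone and permissible variations"),
    ("STYLISTIC CONSTRAINTS", "Punctuation, formatting preferences"),
]


def build_persona_prompt(onboarding_data: dict) -> str:
    """Render the persona generation prompt for the given onboarding data."""
    requirement_lines = [
        f"{number}. {title}: {detail}"
        for number, (title, detail) in enumerate(REQUIREMENTS, start=1)
    ]
    return "\n".join([
        "PERSONA GENERATION TASK: Create a comprehensive writing persona "
        "based on user onboarding data.",
        "",
        "ONBOARDING DATA ANALYSIS:",
        json.dumps(onboarding_data, indent=2, sort_keys=True),
        "",
        "PERSONA GENERATION REQUIREMENTS:",
        *requirement_lines,
        "",
        "Generate a comprehensive persona profile that can replicate "
        "this writing style across platforms.",
    ])
```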

Platform-Specific Optimizations

Twitter/X Optimization

  • Character Limit: 280 characters
  • Optimal Length: 120-150 characters
  • Hashtag Strategy: Maximum 3 hashtags
  • Engagement: Thread support, retweet optimization

LinkedIn Optimization

  • Character Limit: 3000 characters
  • Optimal Length: 150-300 words
  • Professional Tone: Maintained throughout
  • Features: Rich media support, long-form content

Blog Optimization

  • Word Count: 800-2000 words
  • SEO Focus: Header structure, meta descriptions
  • Readability: Optimized for target audience expertise level
  • Internal Linking: Strategic link placement

Instagram Optimization

  • Caption Limit: 2200 characters
  • Optimal Length: 125-150 words
  • Visual Focus: Caption complements imagery
  • Hashtag Strategy: Up to 30 hashtags, strategic placement
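
The hard limits listed above lend themselves to a simple pre-publish check. The sketch below is illustrative only: `PlatformLimits` and `check_post` are hypothetical names, and `len(text)` counts Python code points, which is a simplification of how each platform actually counts characters.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlatformLimits:
    max_chars: Optional[int]     # hard character limit, if one is listed
    max_hashtags: Optional[int]  # hashtag cap, if one is listed


# Values copied from the lists above; blog has no listed character or hashtag cap.
PLATFORM_LIMITS = {
    "twitter": PlatformLimits(max_chars=280, max_hashtags=3),
    "linkedin": PlatformLimits(max_chars=3000, max_hashtags=None),
    "instagram": PlatformLimits(max_chars=2200, max_hashtags=30),
    "blog": PlatformLimits(max_chars=None, max_hashtags=None),
}

HASHTAG = re.compile(r"#\w+")


def check_post(platform: str, text: str) -> list[str]:
    """Return a list of limit violations for a draft post (empty if none)."""
    limits = PLATFORM_LIMITS[platform]
    problems = []
    if limits.max_chars is not None and len(text) > limits.max_chars:
        problems.append(
            f"{len(text)} characters exceeds the {limits.max_chars}-character limit")
    hashtag_count = len(HASHTAG.findall(text))
    if limits.max_hashtags is not None and hashtag_count > limits.max_hashtags:
        problems.append(
            f"{hashtag_count} hashtags exceeds the maximum of {limits.max_hashtags}")
    return problems
```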

Data Flow

Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage

Data Sources

  1. Website Analysis (Step 2):

    • Writing style analysis
    • Content characteristics
    • Target audience identification
    • Style pattern recognition
  2. Research Preferences (Step 3):

    • Content type preferences
    • Research depth settings
    • Factual content requirements
  3. Personalization Settings (Step 4):

    • Brand voice preferences
    • Tone specifications
    • Style customizations

Quality Assurance

Data Sufficiency Scoring

  • Website Analysis: 70% of score
    • Writing style: 25%
    • Content characteristics: 20%
    • Target audience: 15%
    • Style patterns: 10%
  • Research Preferences: 30% of score
    • Research depth: 10%
    • Content types: 10%
    • Writing style data: 10%
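
The weighting above can be read as a simple additive score. The sketch below assumes that each component earns its full weight when present and non-empty. The real scorer may award partial credit, and the dictionary key names here are hypothetical.

```python
# Weights in percentage points, copied from the breakdown above.
WEIGHTS = {
    "website_analysis": {
        "writing_style": 25,
        "content_characteristics": 20,
        "target_audience": 15,
        "style_patterns": 10,
    },
    "research_preferences": {
        "research_depth": 10,
        "content_types": 10,
        "writing_style_data": 10,
    },
}


def data_sufficiency(onboarding: dict) -> float:
    """Sum the weight of every component that is present and non-empty (0 to 100)."""
    score = 0.0
    for source, components in WEIGHTS.items():
        provided = onboarding.get(source) or {}
        for name, weight in components.items():
            if provided.get(name):
                score += weight
    return score


example = {
    "website_analysis": {
        "writing_style": {"tone": "informed_casual"},
        "content_characteristics": {"sentence_structure": "simple"},
        "target_audience": {"expertise_level": "intermediate"},
    },
    "research_preferences": {
        "research_depth": "standard",
        "content_types": ["blog"],
    },
}
score = data_sufficiency(example)  # 25 + 20 + 15 + 10 + 10 = 80.0
```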

Confidence Scoring

  • AI-generated confidence based on data quality
  • Minimum 50% data sufficiency required for generation
  • Platform-specific confidence scores

Usage Examples

1. Generate Persona During Onboarding

# Automatically triggered during onboarding completion
persona_service = PersonaAnalysisService()
result = persona_service.generate_persona_from_onboarding(user_id=1)

2. Get Platform-Specific Persona

# Get LinkedIn-optimized persona
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")

3. Generate Content with Persona

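# Use persona for content generation
# (prompt is the caller's content request; generate_content_with_persona is the content-generation helper)
persona = persona_service.get_persona_for_platform(user_id=1, platform="twitter")
content = generate_content_with_persona(prompt, persona)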

Implementation Notes

Gemini Integration

  • Uses gemini-2.5-flash model for optimal performance
  • Low temperature (0.2) for consistent analysis
  • High token limit (8192) for comprehensive output
  • Structured JSON schema validation
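
For orientation, the settings above might translate into a request body like the one sketched below. The field names follow the Gemini REST API's `generationConfig` as commonly documented and should be checked against the current API reference. The helper name `build_gemini_request` and the small schema are illustrative, not the project's actual schema.

```python
import json

MODEL = "gemini-2.5-flash"  # model named in the notes above


def build_gemini_request(prompt: str, response_schema: dict) -> dict:
    """Build a generateContent-style request body with the documented settings."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": 0.2,        # low temperature for consistent analysis
            "maxOutputTokens": 8192,   # high token limit for comprehensive output
            "responseMimeType": "application/json",
            "responseSchema": response_schema,
        },
    }


# Illustrative fragment only; the real schema covers the full persona structure.
IDENTITY_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "persona_name": {"type": "STRING"},
        "archetype": {"type": "STRING"},
        "core_belief": {"type": "STRING"},
    },
    "required": ["persona_name", "archetype", "core_belief"],
}

request_body = build_gemini_request("PERSONA GENERATION TASK: ...", IDENTITY_SCHEMA)
payload = json.dumps(request_body)  # ready to send as the HTTP request body
```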

Error Handling

  • Graceful degradation when data is insufficient
  • Fallback to default personas when generation fails
  • Comprehensive logging for debugging
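
The three behaviors above can be combined in a small wrapper, sketched here under stated assumptions: `generate_with_fallback`, `DEFAULT_PERSONA`, and the `persona` logger name are hypothetical, and the 50% threshold is the minimum data sufficiency given in the Confidence Scoring section.

```python
import logging
from typing import Callable

logger = logging.getLogger("persona")  # assumed logger name

MIN_DATA_SUFFICIENCY = 50.0  # minimum from the Confidence Scoring section

# Placeholder default; the project's real default personas are not shown here.
DEFAULT_PERSONA = {"persona_name": "Default Persona", "is_fallback": True}


def generate_with_fallback(onboarding: dict,
                           sufficiency: float,
                           generate: Callable[[dict], dict]) -> dict:
    """Run persona generation, falling back to a default persona on problems."""
    if sufficiency < MIN_DATA_SUFFICIENCY:
        # Graceful degradation: skip generation when data is insufficient.
        logger.warning("Data sufficiency %.1f is below %.1f; using default persona",
                       sufficiency, MIN_DATA_SUFFICIENCY)
        return dict(DEFAULT_PERSONA)
    try:
        return generate(onboarding)
    except Exception:
        # Log the full traceback for debugging, then fall back.
        logger.exception("Persona generation failed; using default persona")
        return dict(DEFAULT_PERSONA)
```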

Performance Considerations

  • Persona generation is asynchronous
  • Results cached in database for fast retrieval
  • Platform adaptations generated in parallel
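
Parallel generation of platform adaptations could be structured with `asyncio.gather`, as in the sketch below. `adapt_for_platform` is a stand-in for the real model call, and both function names are hypothetical.

```python
import asyncio


async def adapt_for_platform(core_persona: dict, platform: str) -> dict:
    """Stand-in for the real per-platform model call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"platform_type": platform, "persona_name": core_persona["persona_name"]}


async def generate_platform_adaptations(core_persona: dict,
                                        platforms: list[str]) -> dict[str, dict]:
    """Generate all platform adaptations concurrently, skipping any that fail."""
    results = await asyncio.gather(
        *(adapt_for_platform(core_persona, platform) for platform in platforms),
        return_exceptions=True,  # one failing platform should not cancel the rest
    )
    adaptations = {}
    for platform, result in zip(platforms, results):
        if isinstance(result, BaseException):
            continue  # real code would log the failure here
        adaptations[platform] = result
    return adaptations
```

Calling `asyncio.run(generate_platform_adaptations(core_persona, ["twitter", "linkedin", "blog"]))` returns a dictionary keyed by platform name.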

Future Enhancements

  1. Validation System: Automated testing of generated content against persona
  2. Learning System: Persona refinement based on content performance
  3. Multi-User Support: User-specific persona management
  4. Advanced Analytics: Persona effectiveness tracking
  5. Content Templates: Platform-specific content templates using personas

Troubleshooting

Common Issues

  1. Insufficient Onboarding Data

    • Solution: Ensure steps 2 and 3 are completed with quality data
    • Check: Data sufficiency score is at least 50% (the minimum required for generation)
  2. Gemini API Errors

    • Solution: Verify API key configuration
    • Check: Network connectivity and rate limits
  3. Platform Adaptation Failures

    • Solution: Check platform-specific constraints
    • Check: Schema validation and token limits

Debugging

  1. Enable Debug Logging: Set log level to DEBUG
  2. Check Database: Verify table creation and data integrity
  3. Test API: Use test script to validate functionality
  4. Monitor Performance: Track generation times and success rates
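
Debug logging (step 1 above) can be enabled with the standard `logging` module. The logger name `persona` below is an assumption; substitute whichever logger name the persona services actually use.

```python
import logging


def enable_debug_logging(logger_name: str = "persona") -> logging.Logger:
    """Set the named logger to DEBUG and attach a console handler once."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate output on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```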