11 KiB
Writing Persona System Documentation
Overview
The Writing Persona System is an advanced AI-powered feature that analyzes user onboarding data to create highly specific, platform-optimized writing personas. These personas serve as "unbreakable, high-fidelity persona replication engines" that ensure consistent brand voice across all content creation.
System Architecture
Database Schema
The persona system uses four main database tables:
1. writing_personas (Core Persona Table)
- Purpose: Stores the main persona profile derived from onboarding analysis
- Key Fields:
persona_name: Human-readable persona name (e.g., "Professional Tech Voice")archetype: Persona archetype (e.g., "The Pragmatic Futurist")core_belief: Central philosophy driving the writing stylelinguistic_fingerprint: Quantitative linguistic analysis (JSON)onboarding_session_id: Links to source onboarding data
2. platform_personas (Platform Adaptations)
- Purpose: Stores platform-specific adaptations of the core persona
- Key Fields:
platform_type: Target platform (twitter, linkedin, instagram, etc.)sentence_metrics: Platform-optimized sentence structurelexical_features: Platform-specific vocabulary and hashtagscontent_format_rules: Character limits, formatting guidelinesengagement_patterns: Optimal posting frequency and timing
3. persona_analysis_results (AI Analysis Tracking)
- Purpose: Stores the AI analysis process and results
- Key Fields:
analysis_prompt: The prompt used for persona generationlinguistic_analysis: Detailed linguistic fingerprintplatform_recommendations: AI recommendations for each platformconfidence_score: AI confidence in the analysis
4. persona_validation_results (Quality Assurance)
- Purpose: Stores validation metrics and improvement feedback
- Key Fields:
stylometric_accuracy: How well persona matches original styleconsistency_score: Consistency across generated contentplatform_compliance: Platform optimization effectiveness
AI Analysis Pipeline
Phase 1: Onboarding Data Collection
The system extracts data from the 6-step onboarding process:
-
Step 1 - API Keys: Determines available AI providers
-
Step 2 - Website Analysis: Core style analysis data
- Writing style (tone, voice, complexity)
- Content characteristics (sentence structure, vocabulary)
- Target audience (demographics, expertise level)
- Style patterns (common phrases, rhetorical devices)
-
Step 3 - Research Preferences: Content type preferences
-
Step 4 - Personalization: Additional style preferences
-
Step 5 - Integrations: Platform preferences
-
Step 6 - Final: Trigger persona generation
Phase 2: Core Persona Generation
Uses Gemini structured responses to analyze collected data:
{
"identity": {
"persona_name": "Generated from analysis",
"archetype": "The [Adjective] [Role]",
"core_belief": "Central philosophy",
"brand_voice_description": "Detailed description"
},
"linguistic_fingerprint": {
"sentence_metrics": {
"average_sentence_length_words": 14.2,
"preferred_sentence_type": "simple_and_compound",
"active_to_passive_ratio": "90:10"
},
"lexical_features": {
"go_to_words": ["leverage", "unlock", "framework"],
"go_to_phrases": ["Let's get into it", "Here's the thing"],
"avoid_words": ["utilize", "synergize"],
"contractions": "required",
"vocabulary_level": "professional"
},
"rhetorical_devices": {
"metaphors": "common_tech_mechanics",
"analogies": "everyday_to_tech",
"rhetorical_questions": "for_engagement"
}
},
"tonal_range": {
"default_tone": "informed_casual",
"permissible_tones": ["emphatic", "optimistic"],
"forbidden_tones": ["academic", "salesy"]
}
}
Phase 3: Platform Adaptations
Generates platform-specific optimizations:
- Twitter: Character limits, hashtag strategy, engagement tactics
- LinkedIn: Professional tone, long-form capability, networking focus
- Instagram: Visual-first approach, emoji usage, story optimization
- Blog: SEO optimization, header structure, readability scores
- Medium: Storytelling focus, publication strategy, engagement optimization
- Substack: Newsletter format, subscription focus, email optimization
API Endpoints
Core Endpoints
POST /api/personas/generate
Generates a new writing persona from onboarding data.
Request:
{
"onboarding_session_id": 1,
"force_regenerate": false
}
Response:
{
"success": true,
"persona_id": 123,
"confidence_score": 85.5,
"data_sufficiency": 78.0,
"platforms_generated": ["twitter", "linkedin", "blog"]
}
GET /api/personas/user/{user_id}
Gets all personas for a user.
GET /api/personas/{persona_id}/platform/{platform}
Gets platform-specific persona adaptation.
GET /api/personas/preview/{user_id}
Generates a preview without saving to database.
Integration Endpoints
GET /api/onboarding/persona-readiness
Checks if sufficient onboarding data exists for persona generation.
POST /api/onboarding/generate-persona
Generates persona as part of onboarding completion.
Gemini Structured Response Implementation
Core Persona Analysis Prompt
The system uses a comprehensive prompt that analyzes:
- Website Analysis Data: Extracted writing patterns, style characteristics
- Research Preferences: Content type preferences, research depth
- Target Audience: Demographics, expertise level, industry focus
Structured Schema Design
The Gemini responses follow strict JSON schemas that ensure:
- Quantitative Analysis: Measurable writing characteristics
- Platform Optimization: Specific adaptations for each platform
- Actionable Guidelines: Concrete rules for content generation
- Quality Metrics: Confidence scores and validation data
Example Gemini Prompt Structure
PERSONA GENERATION TASK: Create a comprehensive writing persona based on user onboarding data.
ONBOARDING DATA ANALYSIS:
[Detailed website analysis, research preferences, and style data]
PERSONA GENERATION REQUIREMENTS:
1. IDENTITY CREATION: Create memorable persona name and archetype
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
4. TONAL RANGE: Default tone and permissible variations
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences
Generate a comprehensive persona profile that can replicate this writing style across platforms.
Platform-Specific Optimizations
Twitter/X Optimization
- Character Limit: 280 characters
- Optimal Length: 120-150 characters
- Hashtag Strategy: Maximum 3 hashtags
- Engagement: Thread support, retweet optimization
LinkedIn Optimization
- Character Limit: 3000 characters
- Optimal Length: 150-300 words
- Professional Tone: Maintained throughout
- Features: Rich media support, long-form content
Blog Optimization
- Word Count: 800-2000 words
- SEO Focus: Header structure, meta descriptions
- Readability: Optimized for target audience expertise level
- Internal Linking: Strategic link placement
Instagram Optimization
- Caption Limit: 2200 characters
- Optimal Length: 125-150 words
- Visual Focus: Caption complements imagery
- Hashtag Strategy: Up to 30 hashtags, strategic placement
Data Flow
Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage
Data Sources
-
Website Analysis (Step 2):
- Writing style analysis
- Content characteristics
- Target audience identification
- Style pattern recognition
-
Research Preferences (Step 3):
- Content type preferences
- Research depth settings
- Factual content requirements
-
Personalization Settings (Step 4):
- Brand voice preferences
- Tone specifications
- Style customizations
Quality Assurance
Data Sufficiency Scoring
- Website Analysis: 70% of score
- Writing style: 25%
- Content characteristics: 20%
- Target audience: 15%
- Style patterns: 10%
- Research Preferences: 30% of score
- Research depth: 10%
- Content types: 10%
- Writing style data: 10%
Confidence Scoring
- AI-generated confidence based on data quality
- Minimum 50% data sufficiency required for generation
- Platform-specific confidence scores
Usage Examples
1. Generate Persona During Onboarding
# Automatically triggered during onboarding completion
persona_service = PersonaAnalysisService()
result = persona_service.generate_persona_from_onboarding(user_id=1)
2. Get Platform-Specific Persona
# Get LinkedIn-optimized persona
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")
3. Generate Content with Persona
# Use persona for content generation
persona = get_persona_for_platform(user_id, "twitter")
content = generate_content_with_persona(prompt, persona)
Implementation Notes
Gemini Integration
- Uses
gemini-2.5-flashmodel for optimal performance - Low temperature (0.2) for consistent analysis
- High token limit (8192) for comprehensive output
- Structured JSON schema validation
Error Handling
- Graceful degradation when data is insufficient
- Fallback to default personas when generation fails
- Comprehensive logging for debugging
Performance Considerations
- Persona generation is asynchronous
- Results cached in database for fast retrieval
- Platform adaptations generated in parallel
Future Enhancements
- Validation System: Automated testing of generated content against persona
- Learning System: Persona refinement based on content performance
- Multi-User Support: User-specific persona management
- Advanced Analytics: Persona effectiveness tracking
- Content Templates: Platform-specific content templates using personas
Troubleshooting
Common Issues
-
Insufficient Onboarding Data
- Solution: Ensure steps 2 and 3 are completed with quality data
- Check: Data sufficiency score > 50%
-
Gemini API Errors
- Solution: Verify API key configuration
- Check: Network connectivity and rate limits
-
Platform Adaptation Failures
- Solution: Check platform-specific constraints
- Check: Schema validation and token limits
Debugging
- Enable Debug Logging: Set log level to DEBUG
- Check Database: Verify table creation and data integrity
- Test API: Use test script to validate functionality
- Monitor Performance: Track generation times and success rates