ALwrity/docs/PERSONA_SYSTEM_DOCUMENTATION.md
2025-08-31 08:26:51 +00:00


# Writing Persona System Documentation
## Overview
The Writing Persona System is an AI-powered feature that analyzes user onboarding data to create specific, platform-optimized writing personas. These personas are designed to act as "unbreakable, high-fidelity persona replication engines" that keep brand voice consistent across all content creation.
## System Architecture
### Database Schema
The persona system uses four main database tables:
#### 1. `writing_personas` (Core Persona Table)
- **Purpose**: Stores the main persona profile derived from onboarding analysis
- **Key Fields**:
  - `persona_name`: Human-readable persona name (e.g., "Professional Tech Voice")
  - `archetype`: Persona archetype (e.g., "The Pragmatic Futurist")
  - `core_belief`: Central philosophy driving the writing style
  - `linguistic_fingerprint`: Quantitative linguistic analysis (JSON)
  - `onboarding_session_id`: Links to source onboarding data
#### 2. `platform_personas` (Platform Adaptations)
- **Purpose**: Stores platform-specific adaptations of the core persona
- **Key Fields**:
  - `platform_type`: Target platform (twitter, linkedin, instagram, etc.)
  - `sentence_metrics`: Platform-optimized sentence structure
  - `lexical_features`: Platform-specific vocabulary and hashtags
  - `content_format_rules`: Character limits, formatting guidelines
  - `engagement_patterns`: Optimal posting frequency and timing
#### 3. `persona_analysis_results` (AI Analysis Tracking)
- **Purpose**: Stores the AI analysis process and results
- **Key Fields**:
  - `analysis_prompt`: The prompt used for persona generation
  - `linguistic_analysis`: Detailed linguistic fingerprint
  - `platform_recommendations`: AI recommendations for each platform
  - `confidence_score`: AI confidence in the analysis
#### 4. `persona_validation_results` (Quality Assurance)
- **Purpose**: Stores validation metrics and improvement feedback
- **Key Fields**:
  - `stylometric_accuracy`: How well persona matches original style
  - `consistency_score`: Consistency across generated content
  - `platform_compliance`: Platform optimization effectiveness
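
As a rough illustration of how two of these tables relate, the sketch below creates `writing_personas` and `platform_personas` in an in-memory SQLite database. Only the table and field names come from the lists above; the column types, the `id` primary keys, the `writing_persona_id` foreign key, and the choice of SQLite are assumptions made for the example.

```python
import json
import sqlite3

# Assumptions: column types, `id` keys, and `writing_persona_id` are illustrative.
# Only the table names and the other field names come from the schema description.
SCHEMA = """
CREATE TABLE writing_personas (
    id INTEGER PRIMARY KEY,
    persona_name TEXT NOT NULL,
    archetype TEXT,
    core_belief TEXT,
    linguistic_fingerprint TEXT,      -- JSON stored as text
    onboarding_session_id INTEGER
);
CREATE TABLE platform_personas (
    id INTEGER PRIMARY KEY,
    writing_persona_id INTEGER NOT NULL REFERENCES writing_personas(id),
    platform_type TEXT NOT NULL,
    sentence_metrics TEXT,            -- JSON stored as text
    lexical_features TEXT,
    content_format_rules TEXT,
    engagement_patterns TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert one core persona and one platform adaptation that points at it.
cur = conn.execute(
    "INSERT INTO writing_personas (persona_name, archetype, onboarding_session_id) "
    "VALUES (?, ?, ?)",
    ("Professional Tech Voice", "The Pragmatic Futurist", 1),
)
persona_id = cur.lastrowid
conn.execute(
    "INSERT INTO platform_personas (writing_persona_id, platform_type, content_format_rules) "
    "VALUES (?, ?, ?)",
    (persona_id, "twitter", json.dumps({"character_limit": 280})),
)

# Join the adaptation back to its core persona.
row = conn.execute(
    "SELECT p.persona_name, pp.platform_type, pp.content_format_rules "
    "FROM platform_personas pp "
    "JOIN writing_personas p ON p.id = pp.writing_persona_id"
).fetchone()
```

Storing the JSON-valued fields as text keeps the sketch portable; a production schema might use a native JSON column type instead.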
### AI Analysis Pipeline
#### Phase 1: Onboarding Data Collection
The system extracts data from the 6-step onboarding process:
1. **Step 1 - API Keys**: Determines available AI providers
2. **Step 2 - Website Analysis**: Core style analysis data
   - Writing style (tone, voice, complexity)
   - Content characteristics (sentence structure, vocabulary)
   - Target audience (demographics, expertise level)
   - Style patterns (common phrases, rhetorical devices)
3. **Step 3 - Research Preferences**: Content type preferences
4. **Step 4 - Personalization**: Additional style preferences
5. **Step 5 - Integrations**: Platform preferences
6. **Step 6 - Final**: Trigger persona generation
#### Phase 2: Core Persona Generation
The system uses Gemini structured responses to analyze the collected data and return a persona profile in the following shape:
```json
{
  "identity": {
    "persona_name": "Generated from analysis",
    "archetype": "The [Adjective] [Role]",
    "core_belief": "Central philosophy",
    "brand_voice_description": "Detailed description"
  },
  "linguistic_fingerprint": {
    "sentence_metrics": {
      "average_sentence_length_words": 14.2,
      "preferred_sentence_type": "simple_and_compound",
      "active_to_passive_ratio": "90:10"
    },
    "lexical_features": {
      "go_to_words": ["leverage", "unlock", "framework"],
      "go_to_phrases": ["Let's get into it", "Here's the thing"],
      "avoid_words": ["utilize", "synergize"],
      "contractions": "required",
      "vocabulary_level": "professional"
    },
    "rhetorical_devices": {
      "metaphors": "common_tech_mechanics",
      "analogies": "everyday_to_tech",
      "rhetorical_questions": "for_engagement"
    }
  },
  "tonal_range": {
    "default_tone": "informed_casual",
    "permissible_tones": ["emphatic", "optimistic"],
    "forbidden_tones": ["academic", "salesy"]
  }
}
```
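
One way downstream code might consume a profile of this shape is to flatten it into plain-language style rules for a generation prompt. The following is a minimal sketch: `persona_to_guidelines` is a hypothetical helper, not part of the documented API, and it reads only a handful of the fields shown above.

```python
def persona_to_guidelines(persona: dict) -> list[str]:
    """Turn a persona profile dict into a list of plain-language style rules."""
    fingerprint = persona["linguistic_fingerprint"]
    metrics = fingerprint["sentence_metrics"]
    lexical = fingerprint["lexical_features"]
    tones = persona["tonal_range"]

    # Parse a ratio string such as "90:10" into a percentage of active sentences.
    active, passive = (int(part) for part in metrics["active_to_passive_ratio"].split(":"))
    active_share = round(100 * active / (active + passive))

    return [
        f"Aim for about {metrics['average_sentence_length_words']} words per sentence.",
        f"Use active voice in roughly {active_share}% of sentences.",
        "Prefer these words: " + ", ".join(lexical["go_to_words"]) + ".",
        "Never use: " + ", ".join(lexical["avoid_words"]) + ".",
        f"Default tone: {tones['default_tone']}; avoid: " + ", ".join(tones["forbidden_tones"]) + ".",
    ]


# Trimmed-down copy of the example profile above.
example_persona = {
    "linguistic_fingerprint": {
        "sentence_metrics": {
            "average_sentence_length_words": 14.2,
            "active_to_passive_ratio": "90:10",
        },
        "lexical_features": {
            "go_to_words": ["leverage", "unlock", "framework"],
            "avoid_words": ["utilize", "synergize"],
        },
    },
    "tonal_range": {
        "default_tone": "informed_casual",
        "forbidden_tones": ["academic", "salesy"],
    },
}

guidelines = persona_to_guidelines(example_persona)
```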
#### Phase 3: Platform Adaptations
The system then generates platform-specific adaptations of the core persona:
- **Twitter**: Character limits, hashtag strategy, engagement tactics
- **LinkedIn**: Professional tone, long-form capability, networking focus
- **Instagram**: Visual-first approach, emoji usage, story optimization
- **Blog**: SEO optimization, header structure, readability scores
- **Medium**: Storytelling focus, publication strategy, engagement optimization
- **Substack**: Newsletter format, subscription focus, email optimization
## API Endpoints
### Core Endpoints
#### `POST /api/personas/generate`
Generates a new writing persona from onboarding data.
**Request**:
```json
{
  "onboarding_session_id": 1,
  "force_regenerate": false
}
```
**Response**:
```json
{
  "success": true,
  "persona_id": 123,
  "confidence_score": 85.5,
  "data_sufficiency": 78.0,
  "platforms_generated": ["twitter", "linkedin", "blog"]
}
```
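
A client call to this endpoint could look roughly like the sketch below, which uses only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_generate_request` and `parse_generate_response` are assumptions; authentication, if the deployment requires it, is not shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_generate_request(onboarding_session_id: int,
                           force_regenerate: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/personas/generate."""
    payload = {
        "onboarding_session_id": onboarding_session_id,
        "force_regenerate": force_regenerate,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/personas/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> dict:
    """Decode the JSON response and raise if the API reported failure."""
    result = json.loads(body)
    if not result.get("success"):
        raise RuntimeError(f"Persona generation failed: {result}")
    return result


request = build_generate_request(onboarding_session_id=1)

# To actually send the request against a running server:
# with urllib.request.urlopen(request) as resp:
#     result = parse_generate_response(resp.read())
```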
#### `GET /api/personas/user/{user_id}`
Gets all personas for a user.
#### `GET /api/personas/{persona_id}/platform/{platform}`
Gets platform-specific persona adaptation.
#### `GET /api/personas/preview/{user_id}`
Generates a preview without saving to database.
### Integration Endpoints
#### `GET /api/onboarding/persona-readiness`
Checks if sufficient onboarding data exists for persona generation.
#### `POST /api/onboarding/generate-persona`
Generates persona as part of onboarding completion.
## Gemini Structured Response Implementation
### Core Persona Analysis Prompt
The system uses a comprehensive prompt that analyzes:
1. **Website Analysis Data**: Extracted writing patterns, style characteristics
2. **Research Preferences**: Content type preferences, research depth
3. **Target Audience**: Demographics, expertise level, industry focus
### Structured Schema Design
The Gemini responses follow strict JSON schemas that ensure:
- **Quantitative Analysis**: Measurable writing characteristics
- **Platform Optimization**: Specific adaptations for each platform
- **Actionable Guidelines**: Concrete rules for content generation
- **Quality Metrics**: Confidence scores and validation data
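
The schemas themselves are not reproduced in this document. As an illustration only, a schema for the identity portion of the persona might be written as a JSON-Schema-style dictionary and checked with a few lines of standard-library code. `IDENTITY_SCHEMA` and `schema_errors` are hypothetical names, and the choice of which fields are required is an assumption.

```python
# Hypothetical schema for the "identity" object; the required list is an assumption.
IDENTITY_SCHEMA = {
    "type": "object",
    "properties": {
        "persona_name": {"type": "string"},
        "archetype": {"type": "string"},
        "core_belief": {"type": "string"},
        "brand_voice_description": {"type": "string"},
    },
    "required": ["persona_name", "archetype", "core_belief"],
}

# Map schema type names to the Python types accepted for them.
_PY_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}


def schema_errors(data: dict, schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the data passed."""
    errors = [f"missing field: {name}" for name in schema["required"] if name not in data]
    for name, spec in schema["properties"].items():
        if name in data and not isinstance(data[name], _PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


valid = {"persona_name": "Professional Tech Voice",
         "archetype": "The Pragmatic Futurist",
         "core_belief": "Central philosophy"}
print(schema_errors(valid, IDENTITY_SCHEMA))
print(schema_errors({"persona_name": 5}, IDENTITY_SCHEMA))
```

This checker covers only flat objects; nested schemas would need a recursive validator or a dedicated JSON Schema library.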
### Example Gemini Prompt Structure
```
PERSONA GENERATION TASK: Create a comprehensive writing persona based on user onboarding data.
ONBOARDING DATA ANALYSIS:
[Detailed website analysis, research preferences, and style data]
PERSONA GENERATION REQUIREMENTS:
1. IDENTITY CREATION: Create memorable persona name and archetype
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
4. TONAL RANGE: Default tone and permissible variations
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences
Generate a comprehensive persona profile that can replicate this writing style across platforms.
```
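
A prompt with this structure could be assembled from a template, as in the sketch below. The wording is taken from the example above; the function name `build_persona_prompt` and the decision to embed the onboarding data as pretty-printed JSON are assumptions, since the document does not show how the data is serialized.

```python
import json

PROMPT_TEMPLATE = (
    "PERSONA GENERATION TASK: Create a comprehensive writing persona "
    "based on user onboarding data.\n"
    "ONBOARDING DATA ANALYSIS:\n"
    "{onboarding_json}\n"
    "PERSONA GENERATION REQUIREMENTS:\n"
    "{requirements}\n"
    "Generate a comprehensive persona profile that can replicate this "
    "writing style across platforms."
)

REQUIREMENTS = [
    "IDENTITY CREATION: Create memorable persona name and archetype",
    "LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns",
    "RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach",
    "TONAL RANGE: Default tone and permissible variations",
    "STYLISTIC CONSTRAINTS: Punctuation, formatting preferences",
]


def build_persona_prompt(onboarding_data: dict) -> str:
    """Fill the template with serialized onboarding data and numbered requirements."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(REQUIREMENTS, start=1))
    return PROMPT_TEMPLATE.format(
        onboarding_json=json.dumps(onboarding_data, indent=2, sort_keys=True),
        requirements=numbered,
    )


prompt = build_persona_prompt({"website_analysis": {"tone": "informed_casual"}})
```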
## Platform-Specific Optimizations
### Twitter/X Optimization
- **Character Limit**: 280 characters
- **Optimal Length**: 120-150 characters
- **Hashtag Strategy**: Maximum 3 hashtags
- **Engagement**: Thread support, retweet optimization
### LinkedIn Optimization
- **Character Limit**: 3000 characters
- **Optimal Length**: 150-300 words
- **Professional Tone**: Maintained throughout
- **Features**: Rich media support, long-form content
### Blog Optimization
- **Word Count**: 800-2000 words
- **SEO Focus**: Header structure, meta descriptions
- **Readability**: Optimized for target audience expertise level
- **Internal Linking**: Strategic link placement
### Instagram Optimization
- **Caption Limit**: 2200 characters
- **Optimal Length**: 125-150 words
- **Visual Focus**: Caption complements imagery
- **Hashtag Strategy**: Up to 30 hashtags, strategic placement
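
The hard limits listed above lend themselves to a simple pre-publish check. The sketch below is one possible shape for it: the limits are copied from this section, while the `PLATFORM_LIMITS` layout and the `check_post` helper are assumptions. It counts Python characters, which may differ from how a platform counts URLs or emoji.

```python
import re

# Limits copied from the lists above; the dict layout and key names are assumptions.
# None means this document states no hashtag maximum for the platform.
PLATFORM_LIMITS = {
    "twitter": {"max_chars": 280, "max_hashtags": 3},
    "linkedin": {"max_chars": 3000, "max_hashtags": None},
    "instagram": {"max_chars": 2200, "max_hashtags": 30},
}


def check_post(platform: str, text: str) -> list[str]:
    """Return a list of limit violations; an empty list means the post fits."""
    limits = PLATFORM_LIMITS[platform]
    problems = []
    if len(text) > limits["max_chars"]:
        problems.append(
            f"{len(text)} characters exceeds the {limits['max_chars']}-character limit"
        )
    hashtags = re.findall(r"#\w+", text)
    if limits["max_hashtags"] is not None and len(hashtags) > limits["max_hashtags"]:
        problems.append(
            f"{len(hashtags)} hashtags exceeds the maximum of {limits['max_hashtags']}"
        )
    return problems


print(check_post("twitter", "Shipping our new dashboard today #analytics #launch"))
print(check_post("twitter", "Too many tags #a #b #c #d"))
```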
## Data Flow
```
Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage
```
### Data Sources
1. **Website Analysis** (Step 2):
   - Writing style analysis
   - Content characteristics
   - Target audience identification
   - Style pattern recognition
2. **Research Preferences** (Step 3):
   - Content type preferences
   - Research depth settings
   - Factual content requirements
3. **Personalization Settings** (Step 4):
   - Brand voice preferences
   - Tone specifications
   - Style customizations
### Quality Assurance
#### Data Sufficiency Scoring
- **Website Analysis**: 70% of score
  - Writing style: 25%
  - Content characteristics: 20%
  - Target audience: 15%
  - Style patterns: 10%
- **Research Preferences**: 30% of score
  - Research depth: 10%
  - Content types: 10%
  - Writing style data: 10%
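
The weights above can be read as a simple additive score. The sketch below assumes each component is all-or-nothing (full weight if the field is present and non-empty, zero otherwise); the real scoring may award partial credit. The dictionary keys and the function name `data_sufficiency` are likewise assumptions.

```python
# Weights copied from the list above; key names are illustrative assumptions.
WEIGHTS = {
    "website_analysis": {
        "writing_style": 25,
        "content_characteristics": 20,
        "target_audience": 15,
        "style_patterns": 10,
    },
    "research_preferences": {
        "research_depth": 10,
        "content_types": 10,
        "writing_style": 10,
    },
}


def data_sufficiency(onboarding: dict) -> float:
    """Sum the weights of every component that is present and non-empty."""
    score = 0.0
    for section, fields in WEIGHTS.items():
        section_data = onboarding.get(section) or {}
        for field, weight in fields.items():
            if section_data.get(field):
                score += weight
    return score


partial = {
    "website_analysis": {"writing_style": "informed_casual", "target_audience": "developers"},
    "research_preferences": {"research_depth": "deep"},
}
print(data_sufficiency(partial))  # 25 + 15 + 10
```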
#### Confidence Scoring
- AI-generated confidence based on data quality
- Minimum 50% data sufficiency required for generation
- Platform-specific confidence scores
## Usage Examples
### 1. Generate Persona During Onboarding
```python
# Automatically triggered during onboarding completion
persona_service = PersonaAnalysisService()
result = persona_service.generate_persona_from_onboarding(user_id=1)
```
### 2. Get Platform-Specific Persona
```python
# Get LinkedIn-optimized persona
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")
```
### 3. Generate Content with Persona
```python
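# Use the platform persona for content generation.
# `prompt` is the caller's content brief, shown here with a placeholder value.
prompt = "Announce our new analytics dashboard"
persona = persona_service.get_persona_for_platform(user_id=1, platform="twitter")
content = generate_content_with_persona(prompt, persona)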
```
## Implementation Notes
### Gemini Integration
- Uses `gemini-2.5-flash` model for optimal performance
- Low temperature (0.2) for consistent analysis
- High token limit (8192) for comprehensive output
- Structured JSON schema validation
### Error Handling
- Graceful degradation when data is insufficient
- Fallback to default personas when generation fails
- Comprehensive logging for debugging
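
The fallback behaviour described above might be structured as in the sketch below. Everything here is illustrative: `generate_with_fallback`, the `DEFAULT_PERSONA` contents, the `is_fallback` flag, and the logger name `persona` are assumptions, and treating insufficient data the same way as a failed generation is one interpretation of "graceful degradation". The 50% threshold is the one stated under Confidence Scoring.

```python
import logging

logger = logging.getLogger("persona")  # logger name is an assumption

MIN_DATA_SUFFICIENCY = 50.0  # minimum stated under Confidence Scoring

# Placeholder default; the real default personas are not described in this document.
DEFAULT_PERSONA = {"persona_name": "Default Voice", "is_fallback": True}


def generate_with_fallback(generate, onboarding_data: dict, sufficiency: float) -> dict:
    """Call `generate`, returning a copy of the default persona on any failure."""
    if sufficiency < MIN_DATA_SUFFICIENCY:
        logger.warning("Data sufficiency %.1f is below %.1f; using default persona",
                       sufficiency, MIN_DATA_SUFFICIENCY)
        return dict(DEFAULT_PERSONA)
    try:
        return generate(onboarding_data)
    except Exception:
        # logger.exception records the traceback for later debugging.
        logger.exception("Persona generation failed; using default persona")
        return dict(DEFAULT_PERSONA)


def failing_generator(_data: dict) -> dict:
    """Stand-in for a generation call that hits an API error."""
    raise RuntimeError("simulated API error")


fallback = generate_with_fallback(failing_generator, {}, sufficiency=80.0)
```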
### Performance Considerations
- Persona generation is asynchronous
- Results cached in database for fast retrieval
- Platform adaptations generated in parallel
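
Parallel generation of platform adaptations could be expressed with `asyncio.gather`, as in the following sketch. `adapt_for_platform` is a hypothetical stand-in for the per-platform Gemini call, and the returned dictionary shape is an assumption.

```python
import asyncio


async def adapt_for_platform(core_persona: dict, platform: str) -> dict:
    """Stand-in for a per-platform model call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"platform_type": platform, "persona_name": core_persona["persona_name"]}


async def adapt_all(core_persona: dict, platforms: list[str]) -> dict:
    """Run all platform adaptations concurrently and keep the ones that succeed."""
    tasks = [adapt_for_platform(core_persona, p) for p in platforms]
    # return_exceptions=True returns failures as values instead of raising,
    # so results for the platforms that succeeded are still collected.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {p: r for p, r in zip(platforms, results) if not isinstance(r, Exception)}


adaptations = asyncio.run(
    adapt_all({"persona_name": "Professional Tech Voice"}, ["twitter", "linkedin", "blog"])
)
```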
## Future Enhancements
1. **Validation System**: Automated testing of generated content against persona
2. **Learning System**: Persona refinement based on content performance
3. **Multi-User Support**: User-specific persona management
4. **Advanced Analytics**: Persona effectiveness tracking
5. **Content Templates**: Platform-specific content templates using personas
## Troubleshooting
### Common Issues
1. **Insufficient Onboarding Data**
   - **Solution**: Ensure steps 2 and 3 are completed with quality data
   - **Check**: Data sufficiency score > 50%
2. **Gemini API Errors**
   - **Solution**: Verify API key configuration
   - **Check**: Network connectivity and rate limits
3. **Platform Adaptation Failures**
   - **Solution**: Check platform-specific constraints
   - **Check**: Schema validation and token limits
### Debugging
1. **Enable Debug Logging**: Set log level to DEBUG
2. **Check Database**: Verify table creation and data integrity
3. **Test API**: Use test script to validate functionality
4. **Monitor Performance**: Track generation times and success rates
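
For steps 1 and 4, a minimal setup with Python's standard `logging` module might look like this. The logger name `persona` and the `timed` helper are assumptions; the project may configure logging and timing differently.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
persona_logger = logging.getLogger("persona")  # assumed logger name
persona_logger.setLevel(logging.DEBUG)


@contextmanager
def timed(label: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        persona_logger.debug("%s took %.3fs", label, time.perf_counter() - start)


with timed("persona generation"):
    time.sleep(0.01)  # placeholder for the real generation call
```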