328 lines
11 KiB
Markdown
328 lines
11 KiB
Markdown
# Writing Persona System Documentation
|
|
|
|
## Overview
|
|
|
|
The Writing Persona System is an advanced AI-powered feature that analyzes user onboarding data to create highly specific, platform-optimized writing personas. These personas serve as "unbreakable, high-fidelity persona replication engines" that ensure consistent brand voice across all content creation.
|
|
|
|
## System Architecture
|
|
|
|
### Database Schema
|
|
|
|
The persona system uses four main database tables:
|
|
|
|
#### 1. `writing_personas` (Core Persona Table)
|
|
- **Purpose**: Stores the main persona profile derived from onboarding analysis
|
|
- **Key Fields**:
|
|
- `persona_name`: Human-readable persona name (e.g., "Professional Tech Voice")
|
|
- `archetype`: Persona archetype (e.g., "The Pragmatic Futurist")
|
|
- `core_belief`: Central philosophy driving the writing style
|
|
- `linguistic_fingerprint`: Quantitative linguistic analysis (JSON)
|
|
- `onboarding_session_id`: Links to source onboarding data
|
|
|
|
#### 2. `platform_personas` (Platform Adaptations)
|
|
- **Purpose**: Stores platform-specific adaptations of the core persona
|
|
- **Key Fields**:
|
|
- `platform_type`: Target platform (twitter, linkedin, instagram, etc.)
|
|
- `sentence_metrics`: Platform-optimized sentence structure
|
|
- `lexical_features`: Platform-specific vocabulary and hashtags
|
|
- `content_format_rules`: Character limits, formatting guidelines
|
|
- `engagement_patterns`: Optimal posting frequency and timing
|
|
|
|
#### 3. `persona_analysis_results` (AI Analysis Tracking)
|
|
- **Purpose**: Stores the AI analysis process and results
|
|
- **Key Fields**:
|
|
- `analysis_prompt`: The prompt used for persona generation
|
|
- `linguistic_analysis`: Detailed linguistic fingerprint
|
|
- `platform_recommendations`: AI recommendations for each platform
|
|
- `confidence_score`: AI confidence in the analysis
|
|
|
|
#### 4. `persona_validation_results` (Quality Assurance)
|
|
- **Purpose**: Stores validation metrics and improvement feedback
|
|
- **Key Fields**:
|
|
- `stylometric_accuracy`: How well persona matches original style
|
|
- `consistency_score`: Consistency across generated content
|
|
- `platform_compliance`: Platform optimization effectiveness
|
|
|
|
### AI Analysis Pipeline
|
|
|
|
#### Phase 1: Onboarding Data Collection
|
|
The system extracts data from the 6-step onboarding process:
|
|
|
|
1. **Step 1 - API Keys**: Determines available AI providers
|
|
2. **Step 2 - Website Analysis**: Core style analysis data
|
|
- Writing style (tone, voice, complexity)
|
|
- Content characteristics (sentence structure, vocabulary)
|
|
- Target audience (demographics, expertise level)
|
|
- Style patterns (common phrases, rhetorical devices)
|
|
|
|
3. **Step 3 - Research Preferences**: Content type preferences
|
|
4. **Step 4 - Personalization**: Additional style preferences
|
|
5. **Step 5 - Integrations**: Platform preferences
|
|
6. **Step 6 - Final**: Trigger persona generation
|
|
|
|
#### Phase 2: Core Persona Generation
|
|
Uses Gemini structured responses to analyze collected data:
|
|
|
|
```json
|
|
{
|
|
"identity": {
|
|
"persona_name": "Generated from analysis",
|
|
"archetype": "The [Adjective] [Role]",
|
|
"core_belief": "Central philosophy",
|
|
"brand_voice_description": "Detailed description"
|
|
},
|
|
"linguistic_fingerprint": {
|
|
"sentence_metrics": {
|
|
"average_sentence_length_words": 14.2,
|
|
"preferred_sentence_type": "simple_and_compound",
|
|
"active_to_passive_ratio": "90:10"
|
|
},
|
|
"lexical_features": {
|
|
"go_to_words": ["leverage", "unlock", "framework"],
|
|
"go_to_phrases": ["Let's get into it", "Here's the thing"],
|
|
"avoid_words": ["utilize", "synergize"],
|
|
"contractions": "required",
|
|
"vocabulary_level": "professional"
|
|
},
|
|
"rhetorical_devices": {
|
|
"metaphors": "common_tech_mechanics",
|
|
"analogies": "everyday_to_tech",
|
|
"rhetorical_questions": "for_engagement"
|
|
}
|
|
},
|
|
"tonal_range": {
|
|
"default_tone": "informed_casual",
|
|
"permissible_tones": ["emphatic", "optimistic"],
|
|
"forbidden_tones": ["academic", "salesy"]
|
|
}
|
|
}
|
|
```
|
|
|
|
#### Phase 3: Platform Adaptations
|
|
Generates platform-specific optimizations:
|
|
|
|
- **Twitter**: Character limits, hashtag strategy, engagement tactics
|
|
- **LinkedIn**: Professional tone, long-form capability, networking focus
|
|
- **Instagram**: Visual-first approach, emoji usage, story optimization
|
|
- **Blog**: SEO optimization, header structure, readability scores
|
|
- **Medium**: Storytelling focus, publication strategy, engagement optimization
|
|
- **Substack**: Newsletter format, subscription focus, email optimization
|
|
|
|
## API Endpoints
|
|
|
|
### Core Endpoints
|
|
|
|
#### `POST /api/personas/generate`
|
|
Generates a new writing persona from onboarding data.
|
|
|
|
**Request**:
|
|
```json
|
|
{
|
|
"onboarding_session_id": 1,
|
|
"force_regenerate": false
|
|
}
|
|
```
|
|
|
|
**Response**:
|
|
```json
|
|
{
|
|
"success": true,
|
|
"persona_id": 123,
|
|
"confidence_score": 85.5,
|
|
"data_sufficiency": 78.0,
|
|
"platforms_generated": ["twitter", "linkedin", "blog"]
|
|
}
|
|
```
|
|
|
|
#### `GET /api/personas/user/{user_id}`
|
|
Gets all personas for a user.
|
|
|
|
#### `GET /api/personas/{persona_id}/platform/{platform}`
|
|
Gets platform-specific persona adaptation.
|
|
|
|
#### `GET /api/personas/preview/{user_id}`
|
|
Generates a preview without saving to database.
|
|
|
|
### Integration Endpoints
|
|
|
|
#### `GET /api/onboarding/persona-readiness`
|
|
Checks if sufficient onboarding data exists for persona generation.
|
|
|
|
#### `POST /api/onboarding/generate-persona`
|
|
Generates persona as part of onboarding completion.
|
|
|
|
## Gemini Structured Response Implementation
|
|
|
|
### Core Persona Analysis Prompt
|
|
|
|
The system uses a comprehensive prompt that analyzes:
|
|
|
|
1. **Website Analysis Data**: Extracted writing patterns, style characteristics
|
|
2. **Research Preferences**: Content type preferences, research depth
|
|
3. **Target Audience**: Demographics, expertise level, industry focus
|
|
|
|
### Structured Schema Design
|
|
|
|
The Gemini responses follow strict JSON schemas that ensure:
|
|
|
|
- **Quantitative Analysis**: Measurable writing characteristics
|
|
- **Platform Optimization**: Specific adaptations for each platform
|
|
- **Actionable Guidelines**: Concrete rules for content generation
|
|
- **Quality Metrics**: Confidence scores and validation data
|
|
|
|
### Example Gemini Prompt Structure
|
|
|
|
```
|
|
PERSONA GENERATION TASK: Create a comprehensive writing persona based on user onboarding data.
|
|
|
|
ONBOARDING DATA ANALYSIS:
|
|
[Detailed website analysis, research preferences, and style data]
|
|
|
|
PERSONA GENERATION REQUIREMENTS:
|
|
1. IDENTITY CREATION: Create memorable persona name and archetype
|
|
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
|
|
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
|
|
4. TONAL RANGE: Default tone and permissible variations
|
|
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences
|
|
|
|
Generate a comprehensive persona profile that can replicate this writing style across platforms.
|
|
```
|
|
|
|
## Platform-Specific Optimizations
|
|
|
|
### Twitter/X Optimization
|
|
- **Character Limit**: 280 characters
|
|
- **Optimal Length**: 120-150 characters
|
|
- **Hashtag Strategy**: Maximum 3 hashtags
|
|
- **Engagement**: Thread support, retweet optimization
|
|
|
|
### LinkedIn Optimization
|
|
- **Character Limit**: 3000 characters
|
|
- **Optimal Length**: 150-300 words
|
|
- **Professional Tone**: Maintained throughout
|
|
- **Features**: Rich media support, long-form content
|
|
|
|
### Blog Optimization
|
|
- **Word Count**: 800-2000 words
|
|
- **SEO Focus**: Header structure, meta descriptions
|
|
- **Readability**: Optimized for target audience expertise level
|
|
- **Internal Linking**: Strategic link placement
|
|
|
|
### Instagram Optimization
|
|
- **Caption Limit**: 2200 characters
|
|
- **Optimal Length**: 125-150 words
|
|
- **Visual Focus**: Caption complements imagery
|
|
- **Hashtag Strategy**: Up to 30 hashtags, strategic placement
|
|
|
|
## Data Flow
|
|
|
|
```
|
|
Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage
|
|
```
|
|
|
|
### Data Sources
|
|
|
|
1. **Website Analysis** (Step 2):
|
|
- Writing style analysis
|
|
- Content characteristics
|
|
- Target audience identification
|
|
- Style pattern recognition
|
|
|
|
2. **Research Preferences** (Step 3):
|
|
- Content type preferences
|
|
- Research depth settings
|
|
- Factual content requirements
|
|
|
|
3. **Personalization Settings** (Step 4):
|
|
- Brand voice preferences
|
|
- Tone specifications
|
|
- Style customizations
|
|
|
|
### Quality Assurance
|
|
|
|
#### Data Sufficiency Scoring
|
|
- **Website Analysis**: 70% of score
|
|
- Writing style: 25%
|
|
- Content characteristics: 20%
|
|
- Target audience: 15%
|
|
- Style patterns: 10%
|
|
- **Research Preferences**: 30% of score
|
|
- Research depth: 10%
|
|
- Content types: 10%
|
|
- Writing style data: 10%
|
|
|
|
#### Confidence Scoring
|
|
- AI-generated confidence based on data quality
|
|
- Minimum 50% data sufficiency required for generation
|
|
- Platform-specific confidence scores
|
|
|
|
## Usage Examples
|
|
|
|
### 1. Generate Persona During Onboarding
|
|
```python
|
|
# Automatically triggered during onboarding completion
|
|
persona_service = PersonaAnalysisService()
|
|
result = persona_service.generate_persona_from_onboarding(user_id=1)
|
|
```
|
|
|
|
### 2. Get Platform-Specific Persona
|
|
```python
|
|
# Get LinkedIn-optimized persona
|
|
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")
|
|
```
|
|
|
|
### 3. Generate Content with Persona
|
|
```python
|
|
# Use persona for content generation
|
|
persona = get_persona_for_platform(user_id, "twitter")
|
|
content = generate_content_with_persona(prompt, persona)
|
|
```
|
|
|
|
## Implementation Notes
|
|
|
|
### Gemini Integration
|
|
- Uses `gemini-2.5-flash` model for optimal performance
|
|
- Low temperature (0.2) for consistent analysis
|
|
- High token limit (8192) for comprehensive output
|
|
- Structured JSON schema validation
|
|
|
|
### Error Handling
|
|
- Graceful degradation when data is insufficient
|
|
- Fallback to default personas when generation fails
|
|
- Comprehensive logging for debugging
|
|
|
|
### Performance Considerations
|
|
- Persona generation is asynchronous
|
|
- Results cached in database for fast retrieval
|
|
- Platform adaptations generated in parallel
|
|
|
|
## Future Enhancements
|
|
|
|
1. **Validation System**: Automated testing of generated content against persona
|
|
2. **Learning System**: Persona refinement based on content performance
|
|
3. **Multi-User Support**: User-specific persona management
|
|
4. **Advanced Analytics**: Persona effectiveness tracking
|
|
5. **Content Templates**: Platform-specific content templates using personas
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
1. **Insufficient Onboarding Data**
|
|
- **Solution**: Ensure steps 2 and 3 are completed with quality data
|
|
- **Check**: Data sufficiency score > 50%
|
|
|
|
2. **Gemini API Errors**
|
|
- **Solution**: Verify API key configuration
|
|
- **Check**: Network connectivity and rate limits
|
|
|
|
3. **Platform Adaptation Failures**
|
|
- **Solution**: Check platform-specific constraints
|
|
- **Check**: Schema validation and token limits
|
|
|
|
### Debugging
|
|
|
|
1. **Enable Debug Logging**: Set log level to DEBUG
|
|
2. **Check Database**: Verify table creation and data integrity
|
|
3. **Test API**: Use test script to validate functionality
|
|
4. **Monitor Performance**: Track generation times and success rates |