Implement persona generation system with platform-specific adaptations
Co-authored-by: ajay.calsoft <ajay.calsoft@gmail.com>
This commit is contained in:
328
docs/PERSONA_SYSTEM_DOCUMENTATION.md
Normal file
328
docs/PERSONA_SYSTEM_DOCUMENTATION.md
Normal file
@@ -0,0 +1,328 @@
|
||||
# Writing Persona System Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
The Writing Persona System is an advanced AI-powered feature that analyzes user onboarding data to create highly specific, platform-optimized writing personas. These personas serve as "unbreakable, high-fidelity persona replication engines" that ensure consistent brand voice across all content creation.
|
||||
|
||||
## System Architecture
|
||||
|
||||
### Database Schema
|
||||
|
||||
The persona system uses four main database tables:
|
||||
|
||||
#### 1. `writing_personas` (Core Persona Table)
|
||||
- **Purpose**: Stores the main persona profile derived from onboarding analysis
|
||||
- **Key Fields**:
|
||||
- `persona_name`: Human-readable persona name (e.g., "Professional Tech Voice")
|
||||
- `archetype`: Persona archetype (e.g., "The Pragmatic Futurist")
|
||||
- `core_belief`: Central philosophy driving the writing style
|
||||
- `linguistic_fingerprint`: Quantitative linguistic analysis (JSON)
|
||||
- `onboarding_session_id`: Links to source onboarding data
|
||||
|
||||
#### 2. `platform_personas` (Platform Adaptations)
|
||||
- **Purpose**: Stores platform-specific adaptations of the core persona
|
||||
- **Key Fields**:
|
||||
- `platform_type`: Target platform (twitter, linkedin, instagram, etc.)
|
||||
- `sentence_metrics`: Platform-optimized sentence structure
|
||||
- `lexical_features`: Platform-specific vocabulary and hashtags
|
||||
- `content_format_rules`: Character limits, formatting guidelines
|
||||
- `engagement_patterns`: Optimal posting frequency and timing
|
||||
|
||||
#### 3. `persona_analysis_results` (AI Analysis Tracking)
|
||||
- **Purpose**: Stores the AI analysis process and results
|
||||
- **Key Fields**:
|
||||
- `analysis_prompt`: The prompt used for persona generation
|
||||
- `linguistic_analysis`: Detailed linguistic fingerprint
|
||||
- `platform_recommendations`: AI recommendations for each platform
|
||||
- `confidence_score`: AI confidence in the analysis
|
||||
|
||||
#### 4. `persona_validation_results` (Quality Assurance)
|
||||
- **Purpose**: Stores validation metrics and improvement feedback
|
||||
- **Key Fields**:
|
||||
- `stylometric_accuracy`: How well persona matches original style
|
||||
- `consistency_score`: Consistency across generated content
|
||||
- `platform_compliance`: Platform optimization effectiveness
|
||||
|
||||
### AI Analysis Pipeline
|
||||
|
||||
#### Phase 1: Onboarding Data Collection
|
||||
The system extracts data from the 6-step onboarding process:
|
||||
|
||||
1. **Step 1 - API Keys**: Determines available AI providers
|
||||
2. **Step 2 - Website Analysis**: Core style analysis data
|
||||
- Writing style (tone, voice, complexity)
|
||||
- Content characteristics (sentence structure, vocabulary)
|
||||
- Target audience (demographics, expertise level)
|
||||
- Style patterns (common phrases, rhetorical devices)
|
||||
|
||||
3. **Step 3 - Research Preferences**: Content type preferences
|
||||
4. **Step 4 - Personalization**: Additional style preferences
|
||||
5. **Step 5 - Integrations**: Platform preferences
|
||||
6. **Step 6 - Final**: Trigger persona generation
|
||||
|
||||
#### Phase 2: Core Persona Generation
|
||||
Uses Gemini structured responses to analyze collected data:
|
||||
|
||||
```json
|
||||
{
|
||||
"identity": {
|
||||
"persona_name": "Generated from analysis",
|
||||
"archetype": "The [Adjective] [Role]",
|
||||
"core_belief": "Central philosophy",
|
||||
"brand_voice_description": "Detailed description"
|
||||
},
|
||||
"linguistic_fingerprint": {
|
||||
"sentence_metrics": {
|
||||
"average_sentence_length_words": 14.2,
|
||||
"preferred_sentence_type": "simple_and_compound",
|
||||
"active_to_passive_ratio": "90:10"
|
||||
},
|
||||
"lexical_features": {
|
||||
"go_to_words": ["leverage", "unlock", "framework"],
|
||||
"go_to_phrases": ["Let's get into it", "Here's the thing"],
|
||||
"avoid_words": ["utilize", "synergize"],
|
||||
"contractions": "required",
|
||||
"vocabulary_level": "professional"
|
||||
},
|
||||
"rhetorical_devices": {
|
||||
"metaphors": "common_tech_mechanics",
|
||||
"analogies": "everyday_to_tech",
|
||||
"rhetorical_questions": "for_engagement"
|
||||
}
|
||||
},
|
||||
"tonal_range": {
|
||||
"default_tone": "informed_casual",
|
||||
"permissible_tones": ["emphatic", "optimistic"],
|
||||
"forbidden_tones": ["academic", "salesy"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Phase 3: Platform Adaptations
|
||||
Generates platform-specific optimizations:
|
||||
|
||||
- **Twitter**: Character limits, hashtag strategy, engagement tactics
|
||||
- **LinkedIn**: Professional tone, long-form capability, networking focus
|
||||
- **Instagram**: Visual-first approach, emoji usage, story optimization
|
||||
- **Blog**: SEO optimization, header structure, readability scores
|
||||
- **Medium**: Storytelling focus, publication strategy, engagement optimization
|
||||
- **Substack**: Newsletter format, subscription focus, email optimization
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Core Endpoints
|
||||
|
||||
#### `POST /api/personas/generate`
|
||||
Generates a new writing persona from onboarding data.
|
||||
|
||||
**Request**:
|
||||
```json
|
||||
{
|
||||
"onboarding_session_id": 1,
|
||||
"force_regenerate": false
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"persona_id": 123,
|
||||
"confidence_score": 85.5,
|
||||
"data_sufficiency": 78.0,
|
||||
"platforms_generated": ["twitter", "linkedin", "blog"]
|
||||
}
|
||||
```
|
||||
|
||||
#### `GET /api/personas/user/{user_id}`
|
||||
Gets all personas for a user.
|
||||
|
||||
#### `GET /api/personas/{persona_id}/platform/{platform}`
|
||||
Gets platform-specific persona adaptation.
|
||||
|
||||
#### `GET /api/personas/preview/{user_id}`
|
||||
Generates a preview without saving to database.
|
||||
|
||||
### Integration Endpoints
|
||||
|
||||
#### `GET /api/onboarding/persona-readiness`
|
||||
Checks if sufficient onboarding data exists for persona generation.
|
||||
|
||||
#### `POST /api/onboarding/generate-persona`
|
||||
Generates persona as part of onboarding completion.
|
||||
|
||||
## Gemini Structured Response Implementation
|
||||
|
||||
### Core Persona Analysis Prompt
|
||||
|
||||
The system uses a comprehensive prompt that analyzes:
|
||||
|
||||
1. **Website Analysis Data**: Extracted writing patterns, style characteristics
|
||||
2. **Research Preferences**: Content type preferences, research depth
|
||||
3. **Target Audience**: Demographics, expertise level, industry focus
|
||||
|
||||
### Structured Schema Design
|
||||
|
||||
The Gemini responses follow strict JSON schemas that ensure:
|
||||
|
||||
- **Quantitative Analysis**: Measurable writing characteristics
|
||||
- **Platform Optimization**: Specific adaptations for each platform
|
||||
- **Actionable Guidelines**: Concrete rules for content generation
|
||||
- **Quality Metrics**: Confidence scores and validation data
|
||||
|
||||
### Example Gemini Prompt Structure
|
||||
|
||||
```
|
||||
PERSONA GENERATION TASK: Create a comprehensive writing persona based on user onboarding data.
|
||||
|
||||
ONBOARDING DATA ANALYSIS:
|
||||
[Detailed website analysis, research preferences, and style data]
|
||||
|
||||
PERSONA GENERATION REQUIREMENTS:
|
||||
1. IDENTITY CREATION: Create memorable persona name and archetype
|
||||
2. LINGUISTIC FINGERPRINT: Quantitative analysis of writing patterns
|
||||
3. RHETORICAL ANALYSIS: Metaphor patterns, storytelling approach
|
||||
4. TONAL RANGE: Default tone and permissible variations
|
||||
5. STYLISTIC CONSTRAINTS: Punctuation, formatting preferences
|
||||
|
||||
Generate a comprehensive persona profile that can replicate this writing style across platforms.
|
||||
```
|
||||
|
||||
## Platform-Specific Optimizations
|
||||
|
||||
### Twitter/X Optimization
|
||||
- **Character Limit**: 280 characters
|
||||
- **Optimal Length**: 120-150 characters
|
||||
- **Hashtag Strategy**: Maximum 3 hashtags
|
||||
- **Engagement**: Thread support, retweet optimization
|
||||
|
||||
### LinkedIn Optimization
|
||||
- **Character Limit**: 3000 characters
|
||||
- **Optimal Length**: 150-300 words
|
||||
- **Professional Tone**: Maintained throughout
|
||||
- **Features**: Rich media support, long-form content
|
||||
|
||||
### Blog Optimization
|
||||
- **Word Count**: 800-2000 words
|
||||
- **SEO Focus**: Header structure, meta descriptions
|
||||
- **Readability**: Optimized for target audience expertise level
|
||||
- **Internal Linking**: Strategic link placement
|
||||
|
||||
### Instagram Optimization
|
||||
- **Caption Limit**: 2200 characters
|
||||
- **Optimal Length**: 125-150 words
|
||||
- **Visual Focus**: Caption complements imagery
|
||||
- **Hashtag Strategy**: Up to 30 hashtags, strategic placement
|
||||
|
||||
## Data Flow
|
||||
|
||||
```
|
||||
Onboarding Steps 1-6 → Data Collection → Gemini Analysis → Core Persona → Platform Adaptations → Database Storage
|
||||
```
|
||||
|
||||
### Data Sources
|
||||
|
||||
1. **Website Analysis** (Step 2):
|
||||
- Writing style analysis
|
||||
- Content characteristics
|
||||
- Target audience identification
|
||||
- Style pattern recognition
|
||||
|
||||
2. **Research Preferences** (Step 3):
|
||||
- Content type preferences
|
||||
- Research depth settings
|
||||
- Factual content requirements
|
||||
|
||||
3. **Personalization Settings** (Step 4):
|
||||
- Brand voice preferences
|
||||
- Tone specifications
|
||||
- Style customizations
|
||||
|
||||
### Quality Assurance
|
||||
|
||||
#### Data Sufficiency Scoring
|
||||
- **Website Analysis**: 70% of score
|
||||
- Writing style: 25%
|
||||
- Content characteristics: 20%
|
||||
- Target audience: 15%
|
||||
- Style patterns: 10%
|
||||
- **Research Preferences**: 30% of score
|
||||
- Research depth: 10%
|
||||
- Content types: 10%
|
||||
- Writing style data: 10%
|
||||
|
||||
#### Confidence Scoring
|
||||
- AI-generated confidence based on data quality
|
||||
- Minimum 50% data sufficiency required for generation
|
||||
- Platform-specific confidence scores
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### 1. Generate Persona During Onboarding
|
||||
```python
|
||||
# Automatically triggered during onboarding completion
|
||||
persona_service = PersonaAnalysisService()
|
||||
result = persona_service.generate_persona_from_onboarding(user_id=1)
|
||||
```
|
||||
|
||||
### 2. Get Platform-Specific Persona
|
||||
```python
|
||||
# Get LinkedIn-optimized persona
|
||||
platform_persona = persona_service.get_persona_for_platform(user_id=1, platform="linkedin")
|
||||
```
|
||||
|
||||
### 3. Generate Content with Persona
|
||||
```python
|
||||
# Use persona for content generation
|
||||
persona = get_persona_for_platform(user_id, "twitter")
|
||||
content = generate_content_with_persona(prompt, persona)
|
||||
```
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Gemini Integration
|
||||
- Uses `gemini-2.5-flash` model for optimal performance
|
||||
- Low temperature (0.2) for consistent analysis
|
||||
- High token limit (8192) for comprehensive output
|
||||
- Structured JSON schema validation
|
||||
|
||||
### Error Handling
|
||||
- Graceful degradation when data is insufficient
|
||||
- Fallback to default personas when generation fails
|
||||
- Comprehensive logging for debugging
|
||||
|
||||
### Performance Considerations
|
||||
- Persona generation is asynchronous
|
||||
- Results cached in database for fast retrieval
|
||||
- Platform adaptations generated in parallel
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
1. **Validation System**: Automated testing of generated content against persona
|
||||
2. **Learning System**: Persona refinement based on content performance
|
||||
3. **Multi-User Support**: User-specific persona management
|
||||
4. **Advanced Analytics**: Persona effectiveness tracking
|
||||
5. **Content Templates**: Platform-specific content templates using personas
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Insufficient Onboarding Data**
|
||||
- **Solution**: Ensure steps 2 and 3 are completed with quality data
|
||||
- **Check**: Data sufficiency score > 50%
|
||||
|
||||
2. **Gemini API Errors**
|
||||
- **Solution**: Verify API key configuration
|
||||
- **Check**: Network connectivity and rate limits
|
||||
|
||||
3. **Platform Adaptation Failures**
|
||||
- **Solution**: Check platform-specific constraints
|
||||
- **Check**: Schema validation and token limits
|
||||
|
||||
### Debugging
|
||||
|
||||
1. **Enable Debug Logging**: Set log level to DEBUG
|
||||
2. **Check Database**: Verify table creation and data integrity
|
||||
3. **Test API**: Use test script to validate functionality
|
||||
4. **Monitor Performance**: Track generation times and success rates
|
||||
Reference in New Issue
Block a user