# Persona System: Voice Cloning & Avatar Hyper-Personalization

## Executive Summary

This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then use these across all content generation (LinkedIn, Blog, Story Writer, etc.) for consistent brand identity.

---

## Vision: AI Hyper-Personalization

**Goal**: Every piece of content generated by ALwrity should feel authentically "you" - not just in writing style, but in voice and visual presence.

**Current State**: Persona system handles writing style only
**Target State**: Persona system handles writing style + voice + avatar = complete brand identity

---

## Current Persona System Analysis

### Existing Capabilities
- **Writing Style Analysis**: Tone, voice, complexity, engagement level
- **Platform Adaptation**: LinkedIn, Facebook, Blog optimizations
- **Content Characteristics**: Sentence structure, vocabulary, patterns
- **Onboarding Integration**: Automatically generated from onboarding data

### Current Limitations
- No voice/personality in audio content
- No visual representation
- Limited to text-based personalization
- Cannot create video content with user's presence

### Persona System Architecture
**Location**: `backend/services/persona_analysis_service.py`

**Current Flow**:
1. User completes onboarding (6 steps)
2. System analyzes website content and writing style
3. Core persona generated
4. Platform-specific adaptations created
5. Persona saved to database

**Database Model**: `backend/models/persona_models.py` - `WritingPersona` table

---

## Proposed Enhancements

### 1. Voice Cloning Integration

#### 1.1 Voice Training During Onboarding

**Integration Point**: Onboarding Step 6 (Persona Generation)

**New Onboarding Flow**:
```
Step 1-5: Existing onboarding steps
Step 6: Persona Generation
  ├─ Writing Style Analysis (existing)
  ├─ Voice Training (NEW)
  │   ├─ Audio sample upload (1-3 minutes)
  │   ├─ Voice clone training (~2-5 minutes)
  │   └─ Voice preview and approval
  └─ Avatar Creation (NEW)
      ├─ Photo upload
      ├─ Avatar generation
      └─ Avatar preview and approval
```

**Implementation**:

**Backend**: `backend/services/persona/voice_persona_service.py` (NEW)
```python
from typing import Any, Dict


class VoicePersonaService:
    """
    Manages voice cloning for persona system.
    Integrates with Minimax voice clone API.
    """

    def train_voice_from_audio(
        self,
        user_id: str,
        audio_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Train voice clone from user's audio sample.
        Links voice to persona.
        """
        # 1. Validate audio file (format, length, quality)
        # 2. Upload to Minimax
        # 3. Train voice clone
        # 4. Store voice_id in persona
        # 5. Return training status
        pass

    def generate_audio_with_persona_voice(
        self,
        text: str,
        persona_id: int,
        emotion: str = "neutral",
        speed: float = 1.0,
    ) -> bytes:
        """
        Generate audio using persona's cloned voice.
        """
        # 1. Get voice_id from persona
        # 2. Call Minimax voice generation
        # 3. Return audio bytes
        pass
```
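
Step 1 of `train_voice_from_audio` (validating the sample) could look roughly like the sketch below. It is not the project's implementation: it assumes uploads arrive as WAV files so that the standard-library `wave` module can read them, and it takes the 1-3 minute window from the onboarding flow above. The names `validate_voice_sample`, `SampleCheck`, `MIN_SECONDS`, and `MAX_SECONDS` are hypothetical, and the formats Minimax actually accepts are not stated in this document.

```python
import wave
from dataclasses import dataclass

# Assumed limits, taken from the "1-3 minutes" guidance in the onboarding flow.
MIN_SECONDS = 60.0
MAX_SECONDS = 180.0


@dataclass
class SampleCheck:
    ok: bool
    duration_seconds: float
    reason: str = ""


def validate_voice_sample(path: str) -> SampleCheck:
    """Check that a WAV file is readable and 1-3 minutes long (hypothetical helper)."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError, OSError) as exc:
        # Missing file, truncated file, or a format the wave module cannot parse.
        return SampleCheck(False, 0.0, f"unreadable audio: {exc}")

    duration = frames / float(rate) if rate else 0.0
    if duration < MIN_SECONDS:
        return SampleCheck(False, duration, "sample shorter than 1 minute")
    if duration > MAX_SECONDS:
        return SampleCheck(False, duration, "sample longer than 3 minutes")
    return SampleCheck(True, duration)
```

Returning a result object instead of raising lets the onboarding UI show a specific message and offer a retry, in line with the "clear error messages, retry option" mitigation listed later in this document.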

**Database Schema Update**: `backend/models/persona_models.py`
```python
from sqlalchemy import Column, DateTime, String


class WritingPersona(Base):
    # Existing fields...

    # Plain Column attributes are used here; bare (non-Mapped) type annotations
    # on declarative columns can be rejected by SQLAlchemy 2.x.

    # NEW: Voice cloning fields
    voice_id = Column(String(255), nullable=True)
    # One of: 'not_trained', 'training', 'ready', 'failed'
    voice_training_status = Column(String(50), nullable=True)
    voice_training_audio_url = Column(String(500), nullable=True)
    voice_trained_at = Column(DateTime, nullable=True)

    # NEW: Avatar fields
    avatar_id = Column(String(255), nullable=True)
    avatar_image_url = Column(String(500), nullable=True)
    avatar_training_status = Column(String(50), nullable=True)
    avatar_created_at = Column(DateTime, nullable=True)
```

**Frontend**: `frontend/src/components/Onboarding/PersonaGenerationStep.tsx` (ENHANCED)
```typescript
import React from "react";

interface PersonaGenerationStepProps {
  onboardingData: OnboardingData;
  onComplete: (persona: Persona) => void;
}

const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
  onboardingData,
  onComplete,
}) => {
  // 1. Show writing style analysis progress
  // 2. Show voice training section
  // 3. Show avatar creation section
  // 4. Preview complete persona
  // 5. Allow approval/modification
  return null; // Placeholder until the sections above are implemented
};
```

#### 1.2 Voice Usage Across Platform

**Integration Points**:
- **Story Writer**: Use persona voice for audio narration
- **LinkedIn**: Voice-over for video posts
- **Blog**: Audio narration for blog posts
- **Email**: Personalized voice messages
- **Social Media**: Video content with user's voice

**Implementation Pattern**:
```python
# In any content generation service
def generate_content_with_persona(user_id: str, content_type: str):
    # 1. Get user's persona
    persona = get_persona(user_id)

    # 2. Generate text content (existing)
    text_content = generate_text(persona)

    # 3. Generate audio with persona voice (NEW)
    audio_content = None  # Stays None when no trained voice is available
    if persona.voice_id and persona.voice_training_status == 'ready':
        audio_content = voice_service.generate_audio_with_persona_voice(
            text=text_content,
            persona_id=persona.id,
        )

    # 4. Generate video with persona avatar (NEW); needs the audio from step 3
    video_content = None  # Stays None when avatar or audio is missing
    if persona.avatar_id and audio_content is not None:
        video_content = avatar_service.generate_video_with_persona_avatar(
            text=text_content,
            audio_bytes=audio_content,
            persona_id=persona.id,
        )

    return {
        'text': text_content,
        'audio': audio_content,
        'video': video_content,
    }
```

---

### 2. Avatar Creation Integration

#### 2.1 Avatar Training During Onboarding

**Integration Point**: Onboarding Step 6 (Persona Generation)

**Avatar Options**:
1. **Hunyuan Avatar**: Talking avatar from photo + audio
2. **InfiniteTalk**: Long-form avatar videos
3. **Custom Avatar**: User's photo as avatar base

**Implementation**:

**Backend**: `backend/services/persona/avatar_persona_service.py` (NEW)
```python
from typing import Any, Dict


class AvatarPersonaService:
    """
    Manages avatar creation for persona system.
    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
    """

    def create_avatar_from_photo(
        self,
        user_id: str,
        photo_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Create avatar from user's photo.
        Uses Hunyuan Avatar for initial creation.
        """
        # 1. Validate photo (format, size, quality)
        # 2. Upload to WaveSpeed
        # 3. Create avatar
        # 4. Store avatar_id in persona
        # 5. Return avatar preview
        pass

    def generate_video_with_persona_avatar(
        self,
        text: str,
        audio_bytes: bytes,
        persona_id: int,
        duration: int = 60,  # seconds
    ) -> bytes:
        """
        Generate video with persona's avatar speaking.
        Uses InfiniteTalk for long-form, Hunyuan for short.
        """
        # 1. Get avatar_id from persona
        # 2. Get voice_id from persona (for audio)
        # 3. Call WaveSpeed API
        # 4. Return video bytes
        pass
```
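
The docstring above says InfiniteTalk handles long-form video and Hunyuan handles short clips, but it does not say where the boundary lies. The sketch below shows one way that routing decision could be isolated. The 30-second threshold is purely an assumption for illustration, and `choose_avatar_provider` and `AvatarProvider` are hypothetical names; only the 10-minute ceiling comes from this document (see the cost section).

```python
from enum import Enum

# ASSUMPTION: the document does not define "short"; 30 seconds is a placeholder.
SHORT_FORM_MAX_SECONDS = 30
# From the cost section: InfiniteTalk supports videos of up to 10 minutes.
INFINITETALK_MAX_SECONDS = 10 * 60


class AvatarProvider(str, Enum):
    HUNYUAN = "hunyuan_avatar"
    INFINITETALK = "infinitetalk"


def choose_avatar_provider(duration_seconds: int) -> AvatarProvider:
    """Pick a provider from the requested video length (hypothetical helper)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > INFINITETALK_MAX_SECONDS:
        raise ValueError("duration exceeds the 10-minute InfiniteTalk limit")
    if duration_seconds <= SHORT_FORM_MAX_SECONDS:
        return AvatarProvider.HUNYUAN
    return AvatarProvider.INFINITETALK
```

Keeping the threshold in one constant makes it easy to tune once real quality and cost data are available.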

#### 2.2 Avatar Usage Across Platform

**Use Cases**:
- **LinkedIn Video Posts**: User's avatar presenting content
- **Story Writer**: Avatar narrating story scenes
- **Blog Videos**: Avatar explaining blog content
- **Email Campaigns**: Personalized video messages
- **Social Media**: Consistent avatar across platforms

---

### 3. Enhanced Persona Management

#### 3.1 Persona Dashboard

**New UI Component**: `frontend/src/components/Persona/PersonaDashboard.tsx`

**Features**:
- Persona overview (writing style, voice, avatar)
- Voice training status and preview
- Avatar preview and management
- Usage statistics (where persona is used)
- Edit/update options

#### 3.2 Persona Settings

**New UI Component**: `frontend/src/components/Persona/PersonaSettings.tsx`

**Settings**:
- Voice parameters (emotion, speed, tone)
- Avatar appearance (clothing, background, style)
- Platform-specific adaptations
- Content type preferences

---

## Implementation Phases

### Phase 1: Voice Cloning Integration (Week 1-3)

**Priority**: HIGH - Core hyper-personalization feature

**Tasks**:
1. ✅ Create `VoicePersonaService`
2. ✅ Integrate Minimax voice clone API
3. ✅ Add voice fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with voice training
5. ✅ Create voice training UI component
6. ✅ Add voice preview and testing
7. ✅ Integrate voice into Story Writer
8. ✅ Add voice usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/voice_persona_service.py`
- `frontend/src/components/Onboarding/VoiceTrainingSection.tsx`
- `frontend/src/components/Persona/VoiceManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `backend/api/onboarding_utils/` (onboarding routes)
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/audio_generation_service.py`

**Success Criteria**:
- Users can train voice during onboarding
- Voice used automatically in Story Writer
- Voice quality significantly better than gTTS
- Voice linked to persona
- Cost tracking accurate

---

### Phase 2: Avatar Creation Integration (Week 4-6)

**Priority**: HIGH - Visual personalization

**Tasks**:
1. ✅ Create `AvatarPersonaService`
2. ✅ Integrate Hunyuan Avatar API
3. ✅ Add avatar fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with avatar creation
5. ✅ Create avatar creation UI component
6. ✅ Add avatar preview and testing
7. ✅ Integrate avatar into content generation
8. ✅ Add avatar usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/avatar_persona_service.py`
- `frontend/src/components/Onboarding/AvatarCreationSection.tsx`
- `frontend/src/components/Persona/AvatarManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/video_generation_service.py`

**Success Criteria**:
- Users can create avatar during onboarding
- Avatar used in video content generation
- Avatar quality good
- Avatar linked to persona
- Cost tracking accurate

---

### Phase 3: Cross-Platform Integration (Week 7-8)

**Priority**: MEDIUM - Complete hyper-personalization

**Tasks**:
1. ✅ Integrate persona voice into LinkedIn Writer
2. ✅ Integrate persona avatar into LinkedIn Writer
3. ✅ Integrate persona voice into Blog Writer
4. ✅ Integrate persona avatar into Blog Writer
5. ✅ Add persona usage analytics
6. ✅ Update all content generation services
7. ✅ Create persona usage dashboard
8. ✅ Documentation and user guides

**Success Criteria**:
- Persona voice/avatar used across all platforms
- Consistent brand identity
- Good user experience
- Analytics working

---

## Cost Management

### Voice Cloning Costs

**One-Time Training**: $0.75 per voice
**Per-Minute Generation**: $0.02 per minute

**Cost Optimization**:
- Train voice once during onboarding (included in Pro/Enterprise)
- Free tier: gTTS only
- Basic tier: Voice training available ($0.75 one-time)
- Pro/Enterprise: Voice training included

### Avatar Creation Costs

**Hunyuan Avatar**: $0.15-0.30 per 5 seconds
**InfiniteTalk**: $0.15-0.30 per 5 seconds (up to 10 minutes)

**Cost Optimization**:
- Avatar creation: One-time during onboarding
- Video generation: Pay-per-use
- Default to shorter videos (5 seconds)
- Allow longer videos for premium users
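
To make the pricing above concrete, the sketch below turns the listed rates into a small cost estimator. The rates are the ones quoted in this section; everything else is an assumption made for illustration: the function names are hypothetical, voice generation is assumed to be billed pro rata per minute, and avatar video is assumed to be billed per started 5-second block. Actual provider billing rules may differ.

```python
import math
from decimal import Decimal
from typing import Tuple

# Rates quoted in this section (USD).
VOICE_TRAINING_USD = Decimal("0.75")      # one-time, per voice
VOICE_PER_MINUTE_USD = Decimal("0.02")
AVATAR_PER_5S_LOW_USD = Decimal("0.15")
AVATAR_PER_5S_HIGH_USD = Decimal("0.30")


def voice_cost(minutes: Decimal, include_training: bool = False) -> Decimal:
    """Estimate voice generation cost; optionally add the one-time training fee."""
    cost = VOICE_PER_MINUTE_USD * minutes
    if include_training:
        cost += VOICE_TRAINING_USD
    return cost


def avatar_cost_range(seconds: int) -> Tuple[Decimal, Decimal]:
    """Return the (low, high) estimate for an avatar video of the given length."""
    # ASSUMPTION: billing is per started 5-second block.
    blocks = math.ceil(seconds / 5)
    return AVATAR_PER_5S_LOW_USD * blocks, AVATAR_PER_5S_HIGH_USD * blocks
```

Under these assumptions a 60-second avatar video spans 12 blocks and costs between $1.80 and $3.60, while 10 minutes of narration costs $0.20 on top of the one-time $0.75 training fee. That gap is why defaulting to short videos matters far more for cost than limiting audio length.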

### Subscription Integration

**Update Subscription Tiers**:
- **Free**: Writing persona only, no voice/avatar
- **Basic**: Writing persona + voice training ($0.75 one-time)
- **Pro**: Writing persona + voice + avatar creation included
- **Enterprise**: All features + unlimited usage
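
The tier list above can be expressed as a lookup table that services consult before starting voice training or avatar creation. This is a minimal sketch, not the project's subscription code: `PersonaEntitlements`, `TIER_ENTITLEMENTS`, and `can_use` are hypothetical names, and the lowercase tier keys are an assumption about how tiers are stored.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class PersonaEntitlements:
    voice_training: bool
    avatar_creation: bool
    voice_training_fee_usd: Decimal  # 0 when training is included in the plan
    unlimited_usage: bool = False


# Mirrors the tier list in this section.
TIER_ENTITLEMENTS: Dict[str, PersonaEntitlements] = {
    "free": PersonaEntitlements(False, False, Decimal("0")),
    "basic": PersonaEntitlements(True, False, Decimal("0.75")),
    "pro": PersonaEntitlements(True, True, Decimal("0")),
    "enterprise": PersonaEntitlements(True, True, Decimal("0"), unlimited_usage=True),
}

_FEATURES = ("voice_training", "avatar_creation")


def can_use(tier: str, feature: str) -> bool:
    """Return True if the tier includes the feature; unknown tiers get nothing."""
    if feature not in _FEATURES:
        raise ValueError(f"unknown feature: {feature}")
    entitlements = TIER_ENTITLEMENTS.get(tier.lower())
    if entitlements is None:
        return False
    return bool(getattr(entitlements, feature))
```

Denying access for unknown tiers is a deliberately conservative default, since both features incur per-use provider costs.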

---

## User Experience Flow

### Onboarding Flow (Enhanced)

```
Step 1-5: Existing onboarding steps
  ↓
Step 6: Persona Generation
  ├─ Writing Style Analysis
  │   └─ [Progress: Analyzing your writing style...]
  │
  ├─ Voice Training (NEW)
  │   ├─ Upload audio sample (1-3 minutes)
  │   ├─ [Training your voice...] (~2-5 minutes)
  │   ├─ Preview generated voice
  │   └─ Approve or retrain
  │
  └─ Avatar Creation (NEW)
      ├─ Upload photo
      ├─ [Creating your avatar...] (~1-2 minutes)
      ├─ Preview avatar
      └─ Approve or recreate
  ↓
Step 7: Persona Preview
  ├─ Writing Style Summary
  ├─ Voice Preview
  ├─ Avatar Preview
  └─ Approve Complete Persona
```

### Content Generation Flow (Enhanced)

```
User creates content (LinkedIn/Blog/Story)
  ↓
System loads user's persona
  ├─ Writing style → Text generation
  ├─ Voice ID → Audio generation (if available)
  └─ Avatar ID → Video generation (if available)
  ↓
Content generated with full personalization
  ├─ Text matches writing style
  ├─ Audio uses user's voice
  └─ Video shows user's avatar
```

---

## Technical Architecture

### Backend Services

```
backend/services/
├── persona/
│   ├── __init__.py
│   ├── voice_persona_service.py      # NEW: Voice cloning
│   ├── avatar_persona_service.py     # NEW: Avatar creation
│   └── persona_analysis_service.py   # Enhanced
├── minimax/
│   └── voice_clone.py                # Shared with Story Writer
└── wavespeed/
    └── avatar_generation.py          # Shared with Story Writer
```

### Frontend Components

```
frontend/src/components/
├── Onboarding/
│   ├── PersonaGenerationStep.tsx     # Enhanced
│   ├── VoiceTrainingSection.tsx      # NEW
│   └── AvatarCreationSection.tsx     # NEW
└── Persona/
    ├── PersonaDashboard.tsx          # NEW
    ├── VoiceManagement.tsx           # NEW
    ├── AvatarManagement.tsx          # NEW
    └── PersonaSettings.tsx           # NEW
```

### Database Schema

```sql
-- Enhanced WritingPersona table
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;

ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
```
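
Running the `ALTER TABLE` statements twice fails on most databases because the column already exists. The sketch below shows one way to apply them idempotently. It assumes a SQLite database and uses only the standard-library `sqlite3` module; the document does not say which database or migration tool the project uses, so treat `add_persona_media_columns` as a hypothetical helper rather than the project's migration path.

```python
import sqlite3
from typing import List

# Column names and types copied from the ALTER TABLE statements above.
NEW_COLUMNS = {
    "voice_id": "VARCHAR(255)",
    "voice_training_status": "VARCHAR(50)",
    "voice_training_audio_url": "VARCHAR(500)",
    "voice_trained_at": "TIMESTAMP",
    "avatar_id": "VARCHAR(255)",
    "avatar_image_url": "VARCHAR(500)",
    "avatar_training_status": "VARCHAR(50)",
    "avatar_created_at": "TIMESTAMP",
}


def add_persona_media_columns(
    conn: sqlite3.Connection, table: str = "writing_persona"
) -> List[str]:
    """Add any missing voice/avatar columns and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            # Identifiers come from the fixed mapping above, never from user input.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Because all new columns are nullable, existing persona rows remain valid after the migration and simply read as "no voice or avatar yet".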

---

## Integration with Existing Systems

### Story Writer Integration

**Location**: `backend/services/story_writer/audio_generation_service.py`

**Enhancement**:
```python
from typing import Any, Dict


# Method of the Story Writer audio generation service
def generate_scene_audio(
    self,
    scene: Dict[str, Any],
    user_id: str,
    use_persona_voice: bool = True,  # NEW: Use persona voice
) -> Dict[str, Any]:
    if use_persona_voice:
        # Get user's persona (guard against users without one)
        persona = get_persona(user_id)
        if persona and persona.voice_id and persona.voice_training_status == 'ready':
            # Use persona voice
            return self._generate_with_persona_voice(scene, persona)

    # Fallback to default provider
    return self._generate_with_gtts(scene)
```

### LinkedIn Writer Integration

**Enhancement**: Add video generation with persona avatar
- LinkedIn video posts with user's avatar
- Voice-over with user's voice
- Consistent brand presence

### Blog Writer Integration

**Enhancement**: Add audio/video options
- Audio narration with persona voice
- Video explanations with persona avatar
- Enhanced blog content

---

## Success Metrics

### Adoption Metrics
- Voice training completion rate (target: >60% of Pro users)
- Avatar creation completion rate (target: >50% of Pro users)
- Persona usage across platforms (target: >80% of content uses persona)

### Quality Metrics
- Voice quality satisfaction (target: >4.5/5)
- Avatar quality satisfaction (target: >4.5/5)
- Brand consistency score (target: >90%)

### Business Metrics
- User retention (persona users vs. non-persona)
- Content engagement (persona content vs. generic)
- Premium tier conversion (persona as differentiator)

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| Voice training failure | Quality checks, clear error messages, retry option |
| Avatar quality issues | Preview before approval, regeneration option |
| Cost concerns | Clear pricing, tier-based access, cost estimates |
| User privacy | Secure storage, opt-in consent, data encryption |
| API reliability | Fallback options, retry logic, error handling |
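
The "API reliability" row combines retry logic with a fallback option. A minimal sketch of that pattern follows, assuming the primary call is a cloned-voice request and the fallback is the existing gTTS path. `call_with_retry_and_fallback` and its parameters are hypothetical names, and the retry count and delays are placeholder values, not tuned recommendations.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_retry_and_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, str]:
    """Try `primary` up to `attempts` times, then use `fallback`.

    Returns the result together with the source that produced it, so callers
    can record whether the persona voice or the default provider was used.
    """
    for attempt in range(attempts):
        try:
            return primary(), "primary"
        except Exception:
            # Real code should catch the provider's specific error types instead.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    return fallback(), "fallback"
```

Returning the source label alongside the result supports the usage and cost tracking called for in the implementation phases, since fallback audio should not be billed as cloned-voice generation.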

---

## Privacy & Security

### Data Storage
- Voice samples: Encrypted storage, deleted after training
- Avatar photos: Encrypted storage, user can delete
- Voice/Avatar IDs: Secure API keys, no raw data stored
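
The "deleted after training" guarantee for voice samples is easiest to keep if deletion does not depend on training succeeding. The sketch below shows one way to enforce that with a context manager. `ephemeral_sample` is a hypothetical helper; it only removes the local copy of the file and does not cover encryption at rest or any copy held by the voice provider.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def ephemeral_sample(path: str) -> Iterator[str]:
    """Yield the sample path, then delete the file even if training raises."""
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            # Already gone (for example, removed by the caller); nothing to do.
            pass
```

A training call would then be wrapped as `with ephemeral_sample(audio_file_path) as sample: ...`, so a failed or interrupted training run cannot leave the raw recording on disk.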

### User Control
- Users can delete voice/avatar anytime
- Users can retrain voice/avatar
- Users can opt-out of voice/avatar features
- Clear privacy policy

---

## Next Steps

1. **Week 1**: Set up Minimax API access
2. **Week 1-2**: Implement voice persona service
3. **Week 2-3**: Integrate voice training into onboarding
4. **Week 3-4**: Integrate persona voice into Story Writer
5. **Week 4-5**: Set up WaveSpeed avatar API
6. **Week 5-6**: Implement avatar persona service
7. **Week 6-7**: Integrate avatar creation into onboarding
8. **Week 7-8**: Cross-platform integration

---

*Document Version: 1.0*
*Last Updated: January 2025*
*Priority: HIGH - Core Hyper-Personalization Feature*