Persona System: Voice Cloning & Avatar Hyper-Personalization
Executive Summary
This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then use these across all content generation (LinkedIn, Blog, Story Writer, etc.) for consistent brand identity.
Vision: AI Hyper-Personalization
Goal: Every piece of content generated by ALwrity should feel authentically "you" - not just in writing style, but in voice and visual presence.
Current State: Persona system handles writing style only
Target State: Persona system handles writing style + voice + avatar = complete brand identity
Current Persona System Analysis
Existing Capabilities
- Writing Style Analysis: Tone, voice, complexity, engagement level
- Platform Adaptation: LinkedIn, Facebook, Blog optimizations
- Content Characteristics: Sentence structure, vocabulary, patterns
- Onboarding Integration: Automatically generated from onboarding data
Current Limitations
- No voice/personality in audio content
- No visual representation
- Limited to text-based personalization
- Cannot create video content with user's presence
Persona System Architecture
Location: backend/services/persona_analysis_service.py
Current Flow:
- User completes onboarding (6 steps)
- System analyzes website content and writing style
- Core persona generated
- Platform-specific adaptations created
- Persona saved to database
Database Model: backend/models/persona_models.py - WritingPersona table
Proposed Enhancements
1. Voice Cloning Integration
1.1 Voice Training During Onboarding
Integration Point: Onboarding Step 6 (Persona Generation)
New Onboarding Flow:
Step 1-5: Existing onboarding steps
Step 6: Persona Generation
├─ Writing Style Analysis (existing)
├─ Voice Training (NEW)
│ ├─ Audio sample upload (1-3 minutes)
│ ├─ Voice clone training (~2-5 minutes)
│ └─ Voice preview and approval
└─ Avatar Creation (NEW)
  ├─ Photo upload
  ├─ Avatar generation
  └─ Avatar preview and approval
Implementation:
Backend: backend/services/persona/voice_persona_service.py (NEW)
from typing import Any, Dict


class VoicePersonaService:
    """
    Manages voice cloning for persona system.
    Integrates with Minimax voice clone API.
    """

    def train_voice_from_audio(
        self,
        user_id: str,
        audio_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Train voice clone from user's audio sample.
        Links voice to persona.
        """
        # 1. Validate audio file (format, length, quality)
        # 2. Upload to Minimax
        # 3. Train voice clone
        # 4. Store voice_id in persona
        # 5. Return training status
        pass

    def generate_audio_with_persona_voice(
        self,
        text: str,
        persona_id: int,
        emotion: str = "neutral",
        speed: float = 1.0,
    ) -> bytes:
        """
        Generate audio using persona's cloned voice.
        """
        # 1. Get voice_id from persona
        # 2. Call Minimax voice generation
        # 3. Return audio bytes
        pass
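Step 1 of `train_voice_from_audio` (validating the sample) could look roughly like the sketch below. It only checks readability and the 1-3 minute length for WAV files using the standard library; the helper name `check_wav_sample`, the `AudioCheck` result type, and the restriction to WAV are assumptions for illustration, not part of the existing codebase or of the Minimax API.

```python
import wave
from dataclasses import dataclass

# Assumed bounds, taken from the "1-3 minutes" guidance in the onboarding flow.
MIN_SECONDS = 60.0
MAX_SECONDS = 180.0


@dataclass
class AudioCheck:
    """Result of a sample check (hypothetical helper type)."""
    ok: bool
    duration_seconds: float
    reason: str


def check_wav_sample(path: str) -> AudioCheck:
    """Check that a WAV file is readable and between 1 and 3 minutes long."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError, OSError) as exc:
        return AudioCheck(False, 0.0, f"unreadable WAV file: {exc}")

    duration = frames / float(rate) if rate else 0.0
    if duration < MIN_SECONDS:
        return AudioCheck(False, duration, "sample is shorter than 1 minute")
    if duration > MAX_SECONDS:
        return AudioCheck(False, duration, "sample is longer than 3 minutes")
    return AudioCheck(True, duration, "ok")
```

Returning a result object rather than raising keeps the reason available for the "clear error messages" the onboarding UI needs; quality checks (noise, clipping) would need an audio library and are left out here.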
Database Schema Update: backend/models/persona_models.py
from sqlalchemy import Column, DateTime, String


class WritingPersona(Base):
    # Existing fields...

    # NEW: Voice cloning fields
    voice_id = Column(String(255), nullable=True)
    voice_training_status = Column(String(50), nullable=True)  # 'not_trained', 'training', 'ready', 'failed'
    voice_training_audio_url = Column(String(500), nullable=True)
    voice_trained_at = Column(DateTime, nullable=True)

    # NEW: Avatar fields
    avatar_id = Column(String(255), nullable=True)
    avatar_image_url = Column(String(500), nullable=True)
    avatar_training_status = Column(String(50), nullable=True)
    avatar_created_at = Column(DateTime, nullable=True)
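Because `voice_training_status` is stored as a free-form string, it may help to pin down the allowed values and the moves between them. The sketch below is one possible way to do that; the `TrainingStatus` enum, the transition table, and `next_status` are hypothetical names, and the transition rules themselves (for example, that `ready` may go back to `training` for a retrain) are assumptions rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class TrainingStatus(str, Enum):
    """The four status values listed for voice_training_status."""
    NOT_TRAINED = "not_trained"
    TRAINING = "training"
    READY = "ready"
    FAILED = "failed"


# Assumed transition rules: which status may follow which.
ALLOWED_TRANSITIONS = {
    TrainingStatus.NOT_TRAINED: {TrainingStatus.TRAINING},
    TrainingStatus.TRAINING: {TrainingStatus.READY, TrainingStatus.FAILED},
    TrainingStatus.READY: {TrainingStatus.TRAINING},   # user retrains
    TrainingStatus.FAILED: {TrainingStatus.TRAINING},  # user retries
}


def next_status(current: Optional[str], target: str) -> str:
    """Return the new status string, or raise ValueError for an invalid move.

    A NULL column value (None) is treated as 'not_trained'.
    """
    current_status = TrainingStatus(current) if current else TrainingStatus.NOT_TRAINED
    target_status = TrainingStatus(target)
    if target_status not in ALLOWED_TRANSITIONS[current_status]:
        raise ValueError(
            f"cannot move from {current_status.value} to {target_status.value}"
        )
    return target_status.value
```

The same pattern would apply to `avatar_training_status` if it uses the same four values.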
Frontend: frontend/src/components/Onboarding/PersonaGenerationStep.tsx (NEW)
import React from "react";

interface PersonaGenerationStepProps {
  onboardingData: OnboardingData;
  onComplete: (persona: Persona) => void;
}

const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
  onboardingData,
  onComplete,
}) => {
  // 1. Show writing style analysis progress
  // 2. Show voice training section
  // 3. Show avatar creation section
  // 4. Preview complete persona
  // 5. Allow approval/modification
  return null; // Placeholder until the sections above are implemented
};
1.2 Voice Usage Across Platform
Integration Points:
- Story Writer: Use persona voice for audio narration
- LinkedIn: Voice-over for video posts
- Blog: Audio narration for blog posts
- Email: Personalized voice messages
- Social Media: Video content with user's voice
Implementation Pattern:
# In any content generation service
def generate_content_with_persona(user_id: str, content_type: str):
    # 1. Get user's persona
    persona = get_persona(user_id)

    # 2. Generate text content (existing)
    text_content = generate_text(persona)

    # 3. Generate audio with persona voice (NEW)
    audio_content = None  # stays None when no trained voice is available
    if persona.voice_id and persona.voice_training_status == 'ready':
        audio_content = voice_service.generate_audio_with_persona_voice(
            text=text_content,
            persona_id=persona.id,
        )

    # 4. Generate video with persona avatar (NEW); the avatar needs audio to speak
    video_content = None
    if persona.avatar_id and audio_content is not None:
        video_content = avatar_service.generate_video_with_persona_avatar(
            text=text_content,
            audio_bytes=audio_content,
            persona_id=persona.id,
        )

    return {
        'text': text_content,
        'audio': audio_content,
        'video': video_content,
    }
2. Avatar Creation Integration
2.1 Avatar Training During Onboarding
Integration Point: Onboarding Step 6 (Persona Generation)
Avatar Options:
- Hunyuan Avatar: Talking avatar from photo + audio
- InfiniteTalk: Long-form avatar videos
- Custom Avatar: User's photo as avatar base
Implementation:
Backend: backend/services/persona/avatar_persona_service.py (NEW)
from typing import Any, Dict


class AvatarPersonaService:
    """
    Manages avatar creation for persona system.
    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
    """

    def create_avatar_from_photo(
        self,
        user_id: str,
        photo_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Create avatar from user's photo.
        Uses Hunyuan Avatar for initial creation.
        """
        # 1. Validate photo (format, size, quality)
        # 2. Upload to WaveSpeed
        # 3. Create avatar
        # 4. Store avatar_id in persona
        # 5. Return avatar preview
        pass

    def generate_video_with_persona_avatar(
        self,
        text: str,
        audio_bytes: bytes,
        persona_id: int,
        duration: int = 60,  # seconds
    ) -> bytes:
        """
        Generate video with persona's avatar speaking.
        Uses InfiniteTalk for long-form, Hunyuan for short.
        """
        # 1. Get avatar_id from persona
        # 2. Get voice_id from persona (for audio)
        # 3. Call WaveSpeed API
        # 4. Return video bytes
        pass
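The docstring above says InfiniteTalk handles long-form video and Hunyuan handles short clips. A minimal sketch of that routing decision follows. The 30-second cutoff for "short" is an assumption made only for illustration, the 600-second ceiling comes from the "up to 10 minutes" note in the cost section, and `choose_avatar_provider` is a hypothetical helper name.

```python
HUNYUAN_MAX_SECONDS = 30        # assumption: cutoff between "short" and "long-form"
INFINITETALK_MAX_SECONDS = 600  # "up to 10 minutes" per the avatar cost notes


def choose_avatar_provider(duration_seconds: int) -> str:
    """Pick a provider label for the requested video duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > INFINITETALK_MAX_SECONDS:
        raise ValueError("duration exceeds the 10-minute InfiniteTalk limit")
    if duration_seconds <= HUNYUAN_MAX_SECONDS:
        return "hunyuan"
    return "infinitetalk"
```

With the default `duration=60` from the method signature, this sketch would route to InfiniteTalk.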
2.2 Avatar Usage Across Platform
Use Cases:
- LinkedIn Video Posts: User's avatar presenting content
- Story Writer: Avatar narrating story scenes
- Blog Videos: Avatar explaining blog content
- Email Campaigns: Personalized video messages
- Social Media: Consistent avatar across platforms
3. Enhanced Persona Management
3.1 Persona Dashboard
New UI Component: frontend/src/components/Persona/PersonaDashboard.tsx
Features:
- Persona overview (writing style, voice, avatar)
- Voice training status and preview
- Avatar preview and management
- Usage statistics (where persona is used)
- Edit/update options
3.2 Persona Settings
New UI Component: frontend/src/components/Persona/PersonaSettings.tsx
Settings:
- Voice parameters (emotion, speed, tone)
- Avatar appearance (clothing, background, style)
- Platform-specific adaptations
- Content type preferences
Implementation Phases
Phase 1: Voice Cloning Integration (Week 1-3)
Priority: HIGH - Core hyper-personalization feature
Tasks:
- ✅ Create VoicePersonaService
- ✅ Integrate Minimax voice clone API
- ✅ Add voice fields to WritingPersona model
- ✅ Update onboarding Step 6 with voice training
- ✅ Create voice training UI component
- ✅ Add voice preview and testing
- ✅ Integrate voice into Story Writer
- ✅ Add voice usage tracking
- ✅ Update persona dashboard
- ✅ Testing and optimization
Files to Create:
- backend/services/persona/voice_persona_service.py
- frontend/src/components/Onboarding/VoiceTrainingSection.tsx
- frontend/src/components/Persona/VoiceManagement.tsx
Files to Modify:
- backend/models/persona_models.py
- backend/services/persona_analysis_service.py
- backend/api/onboarding_utils/ (onboarding routes)
- frontend/src/components/Onboarding/PersonaGenerationStep.tsx
- backend/services/story_writer/audio_generation_service.py
Success Criteria:
- Users can train voice during onboarding
- Voice used automatically in Story Writer
- Voice quality rated higher by users than the gTTS baseline (satisfaction target: >4.5/5)
- Voice linked to persona
- Cost tracking accurate
Phase 2: Avatar Creation Integration (Week 4-6)
Priority: HIGH - Visual personalization
Tasks:
- ✅ Create AvatarPersonaService
- ✅ Integrate Hunyuan Avatar API
- ✅ Add avatar fields to WritingPersona model
- ✅ Update onboarding Step 6 with avatar creation
- ✅ Create avatar creation UI component
- ✅ Add avatar preview and testing
- ✅ Integrate avatar into content generation
- ✅ Add avatar usage tracking
- ✅ Update persona dashboard
- ✅ Testing and optimization
Files to Create:
- backend/services/persona/avatar_persona_service.py
- frontend/src/components/Onboarding/AvatarCreationSection.tsx
- frontend/src/components/Persona/AvatarManagement.tsx
Files to Modify:
- backend/models/persona_models.py
- backend/services/persona_analysis_service.py
- frontend/src/components/Onboarding/PersonaGenerationStep.tsx
- backend/services/story_writer/video_generation_service.py
Success Criteria:
- Users can create avatar during onboarding
- Avatar used in video content generation
- Avatar quality meets the user satisfaction target (>4.5/5)
- Avatar linked to persona
- Cost tracking accurate
Phase 3: Cross-Platform Integration (Week 7-8)
Priority: MEDIUM - Complete hyper-personalization
Tasks:
- ✅ Integrate persona voice into LinkedIn Writer
- ✅ Integrate persona avatar into LinkedIn Writer
- ✅ Integrate persona voice into Blog Writer
- ✅ Integrate persona avatar into Blog Writer
- ✅ Add persona usage analytics
- ✅ Update all content generation services
- ✅ Create persona usage dashboard
- ✅ Documentation and user guides
Success Criteria:
- Persona voice/avatar used across all platforms
- Consistent brand identity
- Good user experience
- Analytics working
Cost Management
Voice Cloning Costs
- One-Time Training: $0.75 per voice
- Per-Minute Generation: $0.02 per minute
Cost Optimization:
- Train voice once during onboarding (included in Pro/Enterprise)
- Free tier: gTTS only
- Basic tier: Voice training available ($0.75 one-time)
- Pro/Enterprise: Voice training included
Avatar Creation Costs
- Hunyuan Avatar: $0.15-0.30 per 5 seconds
- InfiniteTalk: $0.15-0.30 per 5 seconds (up to 10 minutes)
Cost Optimization:
- Avatar creation: One-time during onboarding
- Video generation: Pay-per-use
- Default to shorter videos (5 seconds)
- Allow longer videos for premium users
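To make the per-use prices concrete, the arithmetic can be sketched as below using the figures quoted in this section. The function names are hypothetical, and billing per started 5-second block (rounding up) is an assumption; the provider's actual rounding rule is not stated here.

```python
import math
from decimal import Decimal
from typing import Tuple

# Prices quoted in the cost sections of this document (USD).
VOICE_TRAINING_USD = Decimal("0.75")
VOICE_PER_MINUTE_USD = Decimal("0.02")
AVATAR_LOW_PER_5S_USD = Decimal("0.15")
AVATAR_HIGH_PER_5S_USD = Decimal("0.30")


def voice_cost(minutes: float, include_training: bool = False) -> Decimal:
    """Cost of generating `minutes` of audio, optionally plus one-time training."""
    cost = VOICE_PER_MINUTE_USD * Decimal(str(minutes))
    if include_training:
        cost += VOICE_TRAINING_USD
    return cost


def avatar_video_cost_range(seconds: int) -> Tuple[Decimal, Decimal]:
    """Low/high cost estimate for an avatar video of the given length.

    Assumes each started 5-second block is billed in full.
    """
    blocks = math.ceil(seconds / 5)
    return (AVATAR_LOW_PER_5S_USD * blocks, AVATAR_HIGH_PER_5S_USD * blocks)
```

Under these assumptions a 60-second avatar video is 12 blocks, or $1.80 to $3.60, while the default 5-second video is $0.15 to $0.30. That gap is the reason for defaulting to short videos.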
Subscription Integration
Update Subscription Tiers:
- Free: Writing persona only, no voice/avatar
- Basic: Writing persona + voice training ($0.75 one-time)
- Pro: Writing persona + voice + avatar creation included
- Enterprise: All features + unlimited usage
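The tier list above maps naturally onto a small lookup that services can consult before offering voice or avatar features. The sketch below is illustrative only: the tier keys, feature labels, and helper names are assumptions, and it models feature access and the voice training fee, not usage limits.

```python
from decimal import Decimal
from typing import Optional

# Feature access per tier, as listed in the subscription tiers above.
TIER_FEATURES = {
    "free": frozenset({"writing"}),
    "basic": frozenset({"writing", "voice"}),
    "pro": frozenset({"writing", "voice", "avatar"}),
    "enterprise": frozenset({"writing", "voice", "avatar"}),
}


def can_use(tier: str, feature: str) -> bool:
    """Return True if the tier includes the feature; unknown tiers get nothing."""
    return feature in TIER_FEATURES.get(tier.lower(), frozenset())


def voice_training_fee(tier: str) -> Optional[Decimal]:
    """One-time voice training fee in USD, or None if voice is unavailable."""
    if not can_use(tier, "voice"):
        return None
    # Basic pays the one-time fee; Pro and Enterprise have training included.
    return Decimal("0.75") if tier.lower() == "basic" else Decimal("0")
```

Keeping the mapping in one place means the onboarding UI and the backend services can share a single source of truth for what each tier may do.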
User Experience Flow
Onboarding Flow (Enhanced)
Step 1-5: Existing onboarding steps
↓
Step 6: Persona Generation
├─ Writing Style Analysis
│ └─ [Progress: Analyzing your writing style...]
│
├─ Voice Training (NEW)
│ ├─ Upload audio sample (1-3 minutes)
│ ├─ [Training your voice...] (~2-5 minutes)
│ ├─ Preview generated voice
│ └─ Approve or retrain
│
└─ Avatar Creation (NEW)
  ├─ Upload photo
  ├─ [Creating your avatar...] (~1-2 minutes)
  ├─ Preview avatar
  └─ Approve or recreate
↓
Step 7: Persona Preview
├─ Writing Style Summary
├─ Voice Preview
├─ Avatar Preview
└─ Approve Complete Persona
Content Generation Flow (Enhanced)
User creates content (LinkedIn/Blog/Story)
↓
System loads user's persona
├─ Writing style → Text generation
├─ Voice ID → Audio generation (if available)
└─ Avatar ID → Video generation (if available)
↓
Content generated with full personalization
├─ Text matches writing style
├─ Audio uses user's voice
└─ Video shows user's avatar
Technical Architecture
Backend Services
backend/services/
├── persona/
│ ├── __init__.py
│ ├── voice_persona_service.py # NEW: Voice cloning
│ ├── avatar_persona_service.py # NEW: Avatar creation
│ └── persona_analysis_service.py # Enhanced
├── minimax/
│ └── voice_clone.py # Shared with Story Writer
└── wavespeed/
  └── avatar_generation.py # Shared with Story Writer
Frontend Components
frontend/src/components/
├── Onboarding/
│ ├── PersonaGenerationStep.tsx # Enhanced
│ ├── VoiceTrainingSection.tsx # NEW
│ └── AvatarCreationSection.tsx # NEW
└── Persona/
  ├── PersonaDashboard.tsx # NEW
  ├── VoiceManagement.tsx # NEW
  ├── AvatarManagement.tsx # NEW
  └── PersonaSettings.tsx # NEW
Database Schema
-- Enhanced WritingPersona table
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;
ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
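Running the `ALTER TABLE` statements twice fails on most databases because the columns already exist, so a migration script may want to add only what is missing. The sketch below shows that idea using SQLite from the Python standard library purely for illustration; the production database and migration tool are not specified in this document, and `add_missing_columns` is a hypothetical helper.

```python
import sqlite3
from typing import List

# Column names and types copied from the ALTER TABLE statements above.
NEW_COLUMNS = {
    "voice_id": "VARCHAR(255)",
    "voice_training_status": "VARCHAR(50)",
    "voice_training_audio_url": "VARCHAR(500)",
    "voice_trained_at": "TIMESTAMP",
    "avatar_id": "VARCHAR(255)",
    "avatar_image_url": "VARCHAR(500)",
    "avatar_training_status": "VARCHAR(50)",
    "avatar_created_at": "TIMESTAMP",
}


def add_missing_columns(conn: sqlite3.Connection,
                        table: str = "writing_persona") -> List[str]:
    """Add any of NEW_COLUMNS not yet present; return the names that were added.

    `table` is interpolated into SQL, so it must be a trusted constant.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

On other databases the existence check would use that engine's catalog (for example `information_schema.columns`) instead of the SQLite-specific `PRAGMA`.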
Integration with Existing Systems
Story Writer Integration
Location: backend/services/story_writer/audio_generation_service.py
Enhancement:
def generate_scene_audio(
    self,
    scene: Dict[str, Any],
    user_id: str,
    use_persona_voice: bool = True,  # NEW: Use persona voice
) -> Dict[str, Any]:
    if use_persona_voice:
        # Get user's persona (may be missing for users who skipped onboarding)
        persona = get_persona(user_id)
        if persona and persona.voice_id and persona.voice_training_status == 'ready':
            # Use persona voice
            return self._generate_with_persona_voice(scene, persona)

    # Fallback to default provider
    return self._generate_with_gtts(scene)
LinkedIn Writer Integration
Enhancement: Add video generation with persona avatar
- LinkedIn video posts with user's avatar
- Voice-over with user's voice
- Consistent brand presence
Blog Writer Integration
Enhancement: Add audio/video options
- Audio narration with persona voice
- Video explanations with persona avatar
- Enhanced blog content
Success Metrics
Adoption Metrics
- Voice training completion rate (target: >60% of Pro users)
- Avatar creation completion rate (target: >50% of Pro users)
- Persona usage across platforms (target: >80% of content uses persona)
Quality Metrics
- Voice quality satisfaction (target: >4.5/5)
- Avatar quality satisfaction (target: >4.5/5)
- Brand consistency score (target: >90%)
Business Metrics
- User retention (persona users vs. non-persona)
- Content engagement (persona content vs. generic)
- Premium tier conversion (persona as differentiator)
Risk Mitigation
| Risk | Mitigation |
|---|---|
| Voice training failure | Quality checks, clear error messages, retry option |
| Avatar quality issues | Preview before approval, regeneration option |
| Cost concerns | Clear pricing, tier-based access, cost estimates |
| User privacy | Secure storage, opt-in consent, data encryption |
| API reliability | Fallback options, retry logic, error handling |
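The "fallback options, retry logic" mitigation for API reliability could be sketched as a small wrapper like the one below. It retries a primary call with exponential backoff and then falls back to a second callable, for example persona voice first and gTTS second. The function name, attempt count, and delays are assumptions, and catching every `Exception` is a simplification; real code would likely catch only the provider's transient errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` up to `attempts` times, then return `fallback()`.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            # Back off before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return fallback()
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.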
Privacy & Security
Data Storage
- Voice samples: Encrypted storage, deleted after training
- Avatar photos: Encrypted storage, user can delete
- Voice/Avatar IDs: Only provider-issued IDs are stored; API keys are kept secure, and no raw voice or image data is retained
User Control
- Users can delete voice/avatar anytime
- Users can retrain voice/avatar
- Users can opt-out of voice/avatar features
- Clear privacy policy
Next Steps
- Week 1: Set up Minimax API access
- Week 1-2: Implement voice persona service
- Week 2-3: Integrate voice training into onboarding
- Week 3-4: Integrate into Story Writer
- Week 4-5: Set up WaveSpeed avatar API
- Week 5-6: Implement avatar persona service
- Week 6-7: Integrate avatar creation into onboarding
- Week 7-8: Cross-platform integration
Document Version: 1.0
Last Updated: January 2025
Priority: HIGH - Core Hyper-Personalization Feature