# Persona System: Voice Cloning & Avatar Hyper-Personalization

## Executive Summary

This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then use both across all content generation (LinkedIn, Blog, Story Writer, etc.) for a consistent brand identity.

---

## Vision: AI Hyper-Personalization

**Goal**: Every piece of content generated by ALwrity should feel authentically "you" - not just in writing style, but in voice and visual presence.

**Current State**: Persona system handles writing style only

**Target State**: Persona system handles writing style + voice + avatar = complete brand identity

---

## Current Persona System Analysis

### Existing Capabilities

- **Writing Style Analysis**: Tone, voice, complexity, engagement level
- **Platform Adaptation**: LinkedIn, Facebook, Blog optimizations
- **Content Characteristics**: Sentence structure, vocabulary, patterns
- **Onboarding Integration**: Automatically generated from onboarding data

### Current Limitations

- No voice/personality in audio content
- No visual representation
- Limited to text-based personalization
- Cannot create video content with user's presence

### Persona System Architecture

**Location**: `backend/services/persona_analysis_service.py`

**Current Flow**:
1. User completes onboarding (6 steps)
2. System analyzes website content and writing style
3. Core persona generated
4. Platform-specific adaptations created
5. Persona saved to database

**Database Model**: `backend/models/persona_models.py` - `WritingPersona` table

---

## Proposed Enhancements

### 1. Voice Cloning Integration

#### 1.1 Voice Training During Onboarding

**Integration Point**: Onboarding Step 6 (Persona Generation)

**New Onboarding Flow**:
```
Step 1-5: Existing onboarding steps
Step 6: Persona Generation
  ├─ Writing Style Analysis (existing)
  ├─ Voice Training (NEW)
  │   ├─ Audio sample upload (1-3 minutes)
  │   ├─ Voice clone training (~2-5 minutes)
  │   └─ Voice preview and approval
  └─ Avatar Creation (NEW)
      ├─ Photo upload
      ├─ Avatar generation
      └─ Avatar preview and approval
```

**Implementation**:

**Backend**: `backend/services/persona/voice_persona_service.py` (NEW)
```python
from typing import Any, Dict


class VoicePersonaService:
    """
    Manages voice cloning for persona system.
    Integrates with Minimax voice clone API.
    """

    def train_voice_from_audio(
        self,
        user_id: str,
        audio_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Train voice clone from user's audio sample.
        Links voice to persona.
        """
        # 1. Validate audio file (format, length, quality)
        # 2. Upload to Minimax
        # 3. Train voice clone
        # 4. Store voice_id in persona
        # 5. Return training status
        pass

    def generate_audio_with_persona_voice(
        self,
        text: str,
        persona_id: int,
        emotion: str = "neutral",
        speed: float = 1.0,
    ) -> bytes:
        """
        Generate audio using persona's cloned voice.
        """
        # 1. Get voice_id from persona
        # 2. Call Minimax voice generation
        # 3. Return audio bytes
        pass
```

**Database Schema Update**: `backend/models/persona_models.py`
```python
from datetime import datetime
from typing import Optional

from sqlalchemy import Column, DateTime, String


class WritingPersona(Base):
    # Existing fields...

    # NEW: Voice cloning fields
    voice_id: Optional[str] = Column(String(255), nullable=True)
    # One of: 'not_trained', 'training', 'ready', 'failed'
    voice_training_status: Optional[str] = Column(String(50), nullable=True)
    voice_training_audio_url: Optional[str] = Column(String(500), nullable=True)
    voice_trained_at: Optional[datetime] = Column(DateTime, nullable=True)

    # NEW: Avatar fields
    avatar_id: Optional[str] = Column(String(255), nullable=True)
    avatar_image_url: Optional[str] = Column(String(500), nullable=True)
    avatar_training_status: Optional[str] = Column(String(50), nullable=True)
    avatar_created_at: Optional[datetime] = Column(DateTime, nullable=True)
```

**Frontend**: `frontend/src/components/Onboarding/PersonaGenerationStep.tsx` (NEW)
```typescript
interface PersonaGenerationStepProps {
  onboardingData: OnboardingData;
  onComplete: (persona: Persona) => void;
}

const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
  onboardingData,
  onComplete,
}) => {
  // 1. Show writing style analysis progress
  // 2. Show voice training section
  // 3. Show avatar creation section
  // 4. Preview complete persona
  // 5. Allow approval/modification
  return null; // Placeholder until the sections above are implemented
};
```

#### 1.2 Voice Usage Across Platform

**Integration Points**:
- **Story Writer**: Use persona voice for audio narration
- **LinkedIn**: Voice-over for video posts
- **Blog**: Audio narration for blog posts
- **Email**: Personalized voice messages
- **Social Media**: Video content with user's voice

**Implementation Pattern**:
```python
# In any content generation service
def generate_content_with_persona(user_id: str, content_type: str):
    # 1. Get user's persona
    persona = get_persona(user_id)

    # 2. Generate text content (existing)
    text_content = generate_text(persona)

    # Audio and video stay None when the persona has no trained voice or avatar
    audio_content = None
    video_content = None

    # 3. Generate audio with persona voice (NEW)
    if persona.voice_id and persona.voice_training_status == 'ready':
        audio_content = voice_service.generate_audio_with_persona_voice(
            text=text_content,
            persona_id=persona.id,
        )

    # 4. Generate video with persona avatar (NEW); needs the audio from step 3
    if persona.avatar_id and audio_content is not None:
        video_content = avatar_service.generate_video_with_persona_avatar(
            text=text_content,
            audio_bytes=audio_content,
            persona_id=persona.id,
        )

    return {
        'text': text_content,
        'audio': audio_content,
        'video': video_content,
    }
```

---

### 2. Avatar Creation Integration

#### 2.1 Avatar Training During Onboarding

**Integration Point**: Onboarding Step 6 (Persona Generation)

**Avatar Options**:
1. **Hunyuan Avatar**: Talking avatar from photo + audio
2. **InfiniteTalk**: Long-form avatar videos
3. **Custom Avatar**: User's photo as avatar base

**Implementation**:

**Backend**: `backend/services/persona/avatar_persona_service.py` (NEW)
```python
from typing import Any, Dict


class AvatarPersonaService:
    """
    Manages avatar creation for persona system.
    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
    """

    def create_avatar_from_photo(
        self,
        user_id: str,
        photo_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Create avatar from user's photo.
        Uses Hunyuan Avatar for initial creation.
        """
        # 1. Validate photo (format, size, quality)
        # 2. Upload to WaveSpeed
        # 3. Create avatar
        # 4. Store avatar_id in persona
        # 5. Return avatar preview
        pass

    def generate_video_with_persona_avatar(
        self,
        text: str,
        audio_bytes: bytes,
        persona_id: int,
        duration: int = 60,  # seconds
    ) -> bytes:
        """
        Generate video with persona's avatar speaking.
        Uses InfiniteTalk for long-form, Hunyuan for short.
        """
        # 1. Get avatar_id from persona
        # 2. Get voice_id from persona (for audio)
        # 3. Call WaveSpeed API
        # 4. Return video bytes
        pass
```

#### 2.2 Avatar Usage Across Platform

**Use Cases**:
- **LinkedIn Video Posts**: User's avatar presenting content
- **Story Writer**: Avatar narrating story scenes
- **Blog Videos**: Avatar explaining blog content
- **Email Campaigns**: Personalized video messages
- **Social Media**: Consistent avatar across platforms

---

### 3. Enhanced Persona Management

#### 3.1 Persona Dashboard

**New UI Component**: `frontend/src/components/Persona/PersonaDashboard.tsx`

**Features**:
- Persona overview (writing style, voice, avatar)
- Voice training status and preview
- Avatar preview and management
- Usage statistics (where persona is used)
- Edit/update options

#### 3.2 Persona Settings

**New UI Component**: `frontend/src/components/Persona/PersonaSettings.tsx`

**Settings**:
- Voice parameters (emotion, speed, tone)
- Avatar appearance (clothing, background, style)
- Platform-specific adaptations
- Content type preferences

---

## Implementation Phases

### Phase 1: Voice Cloning Integration (Week 1-3)

**Priority**: HIGH - Core hyper-personalization feature

**Tasks**:
1. ✅ Create `VoicePersonaService`
2. ✅ Integrate Minimax voice clone API
3. ✅ Add voice fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with voice training
5. ✅ Create voice training UI component
6. ✅ Add voice preview and testing
7. ✅ Integrate voice into Story Writer
8. ✅ Add voice usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/voice_persona_service.py`
- `frontend/src/components/Onboarding/VoiceTrainingSection.tsx`
- `frontend/src/components/Persona/VoiceManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `backend/api/onboarding_utils/` (onboarding routes)
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/audio_generation_service.py`

**Success Criteria**:
- Users can train voice during onboarding
- Voice used automatically in Story Writer
- Voice quality significantly better than gTTS
- Voice linked to persona
- Cost tracking accurate

---

### Phase 2: Avatar Creation Integration (Week 4-6)

**Priority**: HIGH - Visual personalization

**Tasks**:
1. ✅ Create `AvatarPersonaService`
2. ✅ Integrate Hunyuan Avatar API
3. ✅ Add avatar fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with avatar creation
5. ✅ Create avatar creation UI component
6. ✅ Add avatar preview and testing
7. ✅ Integrate avatar into content generation
8. ✅ Add avatar usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/avatar_persona_service.py`
- `frontend/src/components/Onboarding/AvatarCreationSection.tsx`
- `frontend/src/components/Persona/AvatarManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/video_generation_service.py`

**Success Criteria**:
- Users can create avatar during onboarding
- Avatar used in video content generation
- Avatar quality good
- Avatar linked to persona
- Cost tracking accurate

---

### Phase 3: Cross-Platform Integration (Week 7-8)

**Priority**: MEDIUM - Complete hyper-personalization

**Tasks**:
1. ✅ Integrate persona voice into LinkedIn Writer
2. ✅ Integrate persona avatar into LinkedIn Writer
3. ✅ Integrate persona voice into Blog Writer
4. ✅ Integrate persona avatar into Blog Writer
5. ✅ Add persona usage analytics
6. ✅ Update all content generation services
7. ✅ Create persona usage dashboard
8. ✅ Documentation and user guides

**Success Criteria**:
- Persona voice/avatar used across all platforms
- Consistent brand identity
- Good user experience
- Analytics working

---

## Cost Management

### Voice Cloning Costs

**One-Time Training**: $0.75 per voice

**Per-Minute Generation**: $0.02 per minute

**Cost Optimization**:
- Train voice once during onboarding (included in Pro/Enterprise)
- Free tier: gTTS only
- Basic tier: Voice training available ($0.75 one-time)
- Pro/Enterprise: Voice training included

### Avatar Creation Costs

**Hunyuan Avatar**: $0.15-0.30 per 5 seconds

**InfiniteTalk**: $0.15-0.30 per 5 seconds (up to 10 minutes)

**Cost Optimization**:
- Avatar creation: One-time during onboarding
- Video generation: Pay-per-use
- Default to shorter videos (5 seconds)
- Allow longer videos for premium users

### Subscription Integration

**Update Subscription Tiers**:
- **Free**: Writing persona only, no voice/avatar
- **Basic**: Writing persona + voice training ($0.75 one-time)
- **Pro**: Writing persona + voice + avatar creation included
- **Enterprise**: All features + unlimited usage

---

## User Experience Flow

### Onboarding Flow (Enhanced)

```
Step 1-5: Existing onboarding steps
  ↓
Step 6: Persona Generation
  ├─ Writing Style Analysis
  │   └─ [Progress: Analyzing your writing style...]
  │
  ├─ Voice Training (NEW)
  │   ├─ Upload audio sample (1-3 minutes)
  │   ├─ [Training your voice...] (~2-5 minutes)
  │   ├─ Preview generated voice
  │   └─ Approve or retrain
  │
  └─ Avatar Creation (NEW)
      ├─ Upload photo
      ├─ [Creating your avatar...] (~1-2 minutes)
      ├─ Preview avatar
      └─ Approve or recreate
  ↓
Step 7: Persona Preview
  ├─ Writing Style Summary
  ├─ Voice Preview
  ├─ Avatar Preview
  └─ Approve Complete Persona
```

### Content Generation Flow (Enhanced)

```
User creates content (LinkedIn/Blog/Story)
  ↓
System loads user's persona
  ├─ Writing style → Text generation
  ├─ Voice ID → Audio generation (if available)
  └─ Avatar ID → Video generation (if available)
  ↓
Content generated with full personalization
  ├─ Text matches writing style
  ├─ Audio uses user's voice
  └─ Video shows user's avatar
```

---

## Technical Architecture

### Backend Services

```
backend/services/
├── persona/
│   ├── __init__.py
│   ├── voice_persona_service.py      # NEW: Voice cloning
│   ├── avatar_persona_service.py     # NEW: Avatar creation
│   └── persona_analysis_service.py   # Enhanced
├── minimax/
│   └── voice_clone.py                # Shared with Story Writer
└── wavespeed/
    └── avatar_generation.py          # Shared with Story Writer
```

### Frontend Components

```
frontend/src/components/
├── Onboarding/
│   ├── PersonaGenerationStep.tsx     # Enhanced
│   ├── VoiceTrainingSection.tsx      # NEW
│   └── AvatarCreationSection.tsx     # NEW
└── Persona/
    ├── PersonaDashboard.tsx          # NEW
    ├── VoiceManagement.tsx           # NEW
    ├── AvatarManagement.tsx          # NEW
    └── PersonaSettings.tsx           # NEW
```

### Database Schema

```sql
-- Enhanced WritingPersona table
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;
ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
```

---

## Integration with Existing Systems

### Story Writer Integration

**Location**: `backend/services/story_writer/audio_generation_service.py`

**Enhancement**:
```python
def generate_scene_audio(
    self,
    scene: Dict[str, Any],
    user_id: str,
    use_persona_voice: bool = True,  # NEW: Use persona voice
) -> Dict[str, Any]:
    if use_persona_voice:
        # Get user's persona
        persona = get_persona(user_id)
        if persona.voice_id and persona.voice_training_status == 'ready':
            # Use persona voice
            return self._generate_with_persona_voice(scene, persona)

    # Fallback to default provider
    return self._generate_with_gtts(scene)
```

### LinkedIn Writer Integration

**Enhancement**: Add video generation with persona avatar
- LinkedIn video posts with user's avatar
- Voice-over with user's voice
- Consistent brand presence

### Blog Writer Integration

**Enhancement**: Add audio/video options
- Audio narration with persona voice
- Video explanations with persona avatar
- Enhanced blog content

---

## Success Metrics

### Adoption Metrics
- Voice training completion rate (target: >60% of Pro users)
- Avatar creation completion rate (target: >50% of Pro users)
- Persona usage across platforms (target: >80% of content uses persona)

### Quality Metrics
- Voice quality satisfaction (target: >4.5/5)
- Avatar quality satisfaction (target: >4.5/5)
- Brand consistency score (target: >90%)

### Business Metrics
- User retention (persona users vs. non-persona)
- Content engagement (persona content vs. generic)
- Premium tier conversion (persona as differentiator)

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| Voice training failure | Quality checks, clear error messages, retry option |
| Avatar quality issues | Preview before approval, regeneration option |
| Cost concerns | Clear pricing, tier-based access, cost estimates |
| User privacy | Secure storage, opt-in consent, data encryption |
| API reliability | Fallback options, retry logic, error handling |

---

## Privacy & Security

### Data Storage
- Voice samples: Encrypted storage, deleted after training
- Avatar photos: Encrypted storage, user can delete
- Voice/Avatar IDs: Secure API keys, no raw data stored

### User Control
- Users can delete voice/avatar anytime
- Users can retrain voice/avatar
- Users can opt-out of voice/avatar features
- Clear privacy policy

---

## Next Steps

1. **Week 1**: Set up Minimax API access
2. **Week 1-2**: Implement voice persona service
3. **Week 2-3**: Integrate into onboarding
4. **Week 3-4**: Integrate into Story Writer
5. **Week 4-5**: Set up WaveSpeed avatar API
6. **Week 5-6**: Implement avatar persona service
7. **Week 6-7**: Integrate into onboarding
8. **Week 7-8**: Cross-platform integration

---

*Document Version: 1.0*
*Last Updated: January 2025*
*Priority: HIGH - Core Hyper-Personalization Feature*
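
---

## Appendix: Cost Estimate Sketch

The rates listed under Cost Management are simple enough to turn into a quick estimator, which may help when building the "cost estimates" mitigation from the risk table. The following is a minimal sketch and not part of the ALwrity codebase: the function names and the `CostEstimate` type are hypothetical, and rounding partial 5-second blocks up to a whole block is an assumption about how video billing works rather than something stated above.

```python
import math
from dataclasses import dataclass

# Rates copied from the Cost Management section of this document.
VOICE_TRAINING_USD = 0.75         # one-time, per trained voice
VOICE_PER_MINUTE_USD = 0.02       # per minute of generated audio
AVATAR_PER_5S_USD = (0.15, 0.30)  # (low, high) price per 5 seconds of video


@dataclass(frozen=True)
class CostEstimate:
    """Low/high bounds in USD (hypothetical helper type)."""
    low_usd: float
    high_usd: float


def estimate_voice_cost(minutes: float, include_training: bool = False) -> float:
    """Estimate the cost of generating `minutes` of audio with a cloned voice."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    cost = minutes * VOICE_PER_MINUTE_USD
    if include_training:
        cost += VOICE_TRAINING_USD
    return round(cost, 2)


def estimate_avatar_video_cost(seconds: float) -> CostEstimate:
    """Estimate the low/high cost of an avatar video.

    Assumption: video is billed in whole 5-second blocks, rounded up.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    blocks = math.ceil(seconds / 5)
    low, high = AVATAR_PER_5S_USD
    return CostEstimate(round(blocks * low, 2), round(blocks * high, 2))


if __name__ == "__main__":
    # A 10-minute narration including the one-time training fee
    print(estimate_voice_cost(10, include_training=True))
    # A 60-second avatar video (12 blocks of 5 seconds)
    print(estimate_avatar_video_cost(60))
```

Returning a low/high range for video mirrors the "$0.15-0.30 per 5 seconds" pricing, so a UI could show users a range before they commit to generating a longer clip.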