moreminimore-marketing/docs/PERSONA_VOICE_AVATAR_HYPERPERSONALIZATION.md
Kunthawat Greethong c35fa52117 Base code
2026-01-08 22:39:53 +07:00

Persona System: Voice Cloning & Avatar Hyper-Personalization

Executive Summary

This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then use these across all content generation (LinkedIn, Blog, Story Writer, etc.) for consistent brand identity.


Vision: AI Hyper-Personalization

Goal: Every piece of content generated by ALwrity should feel authentically "you" - not just in writing style, but in voice and visual presence.

Current State: Persona system handles writing style only
Target State: Persona system handles writing style + voice + avatar = complete brand identity


Current Persona System Analysis

Existing Capabilities

  • Writing Style Analysis: Tone, voice, complexity, engagement level
  • Platform Adaptation: LinkedIn, Facebook, Blog optimizations
  • Content Characteristics: Sentence structure, vocabulary, patterns
  • Onboarding Integration: Automatically generated from onboarding data

Current Limitations

  • No voice/personality in audio content
  • No visual representation
  • Limited to text-based personalization
  • Cannot create video content with user's presence

Persona System Architecture

Location: backend/services/persona_analysis_service.py

Current Flow:

  1. User completes onboarding (6 steps)
  2. System analyzes website content and writing style
  3. Core persona generated
  4. Platform-specific adaptations created
  5. Persona saved to database

Database Model: backend/models/persona_models.py - WritingPersona table


Proposed Enhancements

1. Voice Cloning Integration

1.1 Voice Training During Onboarding

Integration Point: Onboarding Step 6 (Persona Generation)

New Onboarding Flow:

Step 1-5: Existing onboarding steps
Step 6: Persona Generation
  ├─ Writing Style Analysis (existing)
  ├─ Voice Training (NEW)
  │   ├─ Audio sample upload (1-3 minutes)
  │   ├─ Voice clone training (~2-5 minutes)
  │   └─ Voice preview and approval
  └─ Avatar Creation (NEW)
      ├─ Photo upload
      ├─ Avatar generation
      └─ Avatar preview and approval

Implementation:

Backend: backend/services/persona/voice_persona_service.py (NEW)

from typing import Any, Dict


class VoicePersonaService:
    """
    Manages voice cloning for persona system.
    Integrates with Minimax voice clone API.
    """

    def train_voice_from_audio(
        self,
        user_id: str,
        audio_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Train voice clone from user's audio sample.
        Links voice to persona.
        """
        # 1. Validate audio file (format, length, quality)
        # 2. Upload to Minimax
        # 3. Train voice clone
        # 4. Store voice_id in persona
        # 5. Return training status
        raise NotImplementedError  # Stub: steps above are planned, not yet built

    def generate_audio_with_persona_voice(
        self,
        text: str,
        persona_id: int,
        emotion: str = "neutral",
        speed: float = 1.0,
    ) -> bytes:
        """
        Generate audio using persona's cloned voice.
        """
        # 1. Get voice_id from persona
        # 2. Call Minimax voice generation
        # 3. Return audio bytes
        raise NotImplementedError  # Stub: steps above are planned, not yet built
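
Step 1 of train_voice_from_audio (validating format and length) could be sketched as below. This is a minimal illustration only: it assumes WAV input and takes the 1-3 minute window from the onboarding flow. The function name validate_voice_sample and the two limit constants are hypothetical, and audio quality checks are not covered.

```python
import wave

MIN_SECONDS = 60    # assumption: lower bound of the 1-3 minute sample window
MAX_SECONDS = 180   # assumption: upper bound of the 1-3 minute sample window


def validate_voice_sample(path: str) -> float:
    """Return the sample duration in seconds, or raise ValueError if unusable."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave.Error: not a valid WAV container; EOFError: empty or truncated file
        raise ValueError(f"not a readable WAV file: {exc}") from exc

    if rate <= 0:
        raise ValueError("invalid sample rate")

    duration = frames / rate
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"sample is {duration:.1f}s; expected {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return duration
```

Rejecting bad samples before the upload step avoids paying for a training run that is likely to fail.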

Database Schema Update: backend/models/persona_models.py

from sqlalchemy import Column, DateTime, String


class WritingPersona(Base):
    # Existing fields...

    # NEW: Voice cloning fields
    voice_id = Column(String(255), nullable=True)
    # One of: 'not_trained', 'training', 'ready', 'failed'
    voice_training_status = Column(String(50), nullable=True)
    voice_training_audio_url = Column(String(500), nullable=True)
    voice_trained_at = Column(DateTime, nullable=True)

    # NEW: Avatar fields
    avatar_id = Column(String(255), nullable=True)
    avatar_image_url = Column(String(500), nullable=True)
    avatar_training_status = Column(String(50), nullable=True)
    avatar_created_at = Column(DateTime, nullable=True)
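
The four status values listed for voice_training_status imply a small state machine. The sketch below shows one way to guard updates to that column. The transition table is an assumption (it allows retraining from 'ready' and 'failed', in line with the retrain option in the onboarding flow), and TrainingStatus and next_status are hypothetical names.

```python
from enum import Enum


class TrainingStatus(str, Enum):
    NOT_TRAINED = "not_trained"
    TRAINING = "training"
    READY = "ready"
    FAILED = "failed"


# Assumed transitions: a new training run may start from any non-running state.
ALLOWED_TRANSITIONS = {
    TrainingStatus.NOT_TRAINED: {TrainingStatus.TRAINING},
    TrainingStatus.TRAINING: {TrainingStatus.READY, TrainingStatus.FAILED},
    TrainingStatus.READY: {TrainingStatus.TRAINING},
    TrainingStatus.FAILED: {TrainingStatus.TRAINING},
}


def next_status(current: str, new: str) -> str:
    """Return the new status string if the transition is allowed, else raise."""
    src, dst = TrainingStatus(current), TrainingStatus(new)  # unknown values raise ValueError
    if dst not in ALLOWED_TRANSITIONS[src]:
        raise ValueError(f"illegal transition: {src.value} -> {dst.value}")
    return dst.value
```

The same guard could apply to avatar_training_status if it uses the same four values, which the schema does not state.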

Frontend: frontend/src/components/Onboarding/PersonaGenerationStep.tsx (ENHANCED)

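import React from "react";

// OnboardingData and Persona are assumed to be defined elsewhere in the frontend.
interface PersonaGenerationStepProps {
  onboardingData: OnboardingData;
  onComplete: (persona: Persona) => void;
}

const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
  onboardingData,
  onComplete,
}) => {
  // 1. Show writing style analysis progress
  // 2. Show voice training section
  // 3. Show avatar creation section
  // 4. Preview complete persona
  // 5. Allow approval/modification
  return null; // Placeholder until the sections above are implemented
};

export default PersonaGenerationStep;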

1.2 Voice Usage Across Platform

Integration Points:

  • Story Writer: Use persona voice for audio narration
  • LinkedIn: Voice-over for video posts
  • Blog: Audio narration for blog posts
  • Email: Personalized voice messages
  • Social Media: Video content with user's voice

Implementation Pattern:

# In any content generation service
def generate_content_with_persona(user_id: str, content_type: str):
    # 1. Get user's persona
    persona = get_persona(user_id)

    # 2. Generate text content (existing)
    text_content = generate_text(persona)

    # Audio and video are optional; they stay None when the persona lacks them
    audio_content = None
    video_content = None

    # 3. Generate audio with persona voice (NEW)
    if persona.voice_id and persona.voice_training_status == 'ready':
        audio_content = voice_service.generate_audio_with_persona_voice(
            text=text_content,
            persona_id=persona.id,
        )

    # 4. Generate video with persona avatar (NEW); needs the audio from step 3
    if persona.avatar_id and audio_content is not None:
        video_content = avatar_service.generate_video_with_persona_avatar(
            text=text_content,
            audio_bytes=audio_content,
            persona_id=persona.id,
        )

    return {
        'text': text_content,
        'audio': audio_content,
        'video': video_content,
    }

2. Avatar Creation Integration

2.1 Avatar Training During Onboarding

Integration Point: Onboarding Step 6 (Persona Generation)

Avatar Options:

  1. Hunyuan Avatar: Talking avatar from photo + audio
  2. InfiniteTalk: Long-form avatar videos
  3. Custom Avatar: User's photo as avatar base

Implementation:

Backend: backend/services/persona/avatar_persona_service.py (NEW)

from typing import Any, Dict


class AvatarPersonaService:
    """
    Manages avatar creation for persona system.
    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
    """

    def create_avatar_from_photo(
        self,
        user_id: str,
        photo_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Create avatar from user's photo.
        Uses Hunyuan Avatar for initial creation.
        """
        # 1. Validate photo (format, size, quality)
        # 2. Upload to WaveSpeed
        # 3. Create avatar
        # 4. Store avatar_id in persona
        # 5. Return avatar preview
        raise NotImplementedError  # Stub: steps above are planned, not yet built

    def generate_video_with_persona_avatar(
        self,
        text: str,
        audio_bytes: bytes,
        persona_id: int,
        duration: int = 60,  # seconds
    ) -> bytes:
        """
        Generate video with persona's avatar speaking.
        Uses InfiniteTalk for long-form, Hunyuan for short.
        """
        # 1. Get avatar_id from persona
        # 2. Get voice_id from persona (for audio)
        # 3. Call WaveSpeed API
        # 4. Return video bytes
        raise NotImplementedError  # Stub: steps above are planned, not yet built
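
The docstring above routes long-form video to InfiniteTalk and short video to Hunyuan but does not say where the boundary lies. The sketch below illustrates that routing under an assumed 30-second cut-off. SHORT_FORM_LIMIT_S, the engine identifiers, and select_avatar_engine are all hypothetical; only the 10-minute ceiling comes from this document's cost section.

```python
HUNYUAN = "hunyuan_avatar"    # hypothetical engine identifier
INFINITETALK = "infinitetalk"  # hypothetical engine identifier

SHORT_FORM_LIMIT_S = 30   # assumption: cut-off between short and long form
INFINITETALK_MAX_S = 600  # InfiniteTalk is listed as supporting up to 10 minutes


def select_avatar_engine(duration_s: int) -> str:
    """Pick the avatar engine for a requested video duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s > INFINITETALK_MAX_S:
        raise ValueError(f"duration exceeds the {INFINITETALK_MAX_S}s maximum")
    return HUNYUAN if duration_s <= SHORT_FORM_LIMIT_S else INFINITETALK
```

With this cut-off, the default duration of 60 seconds would be routed to InfiniteTalk.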

2.2 Avatar Usage Across Platform

Use Cases:

  • LinkedIn Video Posts: User's avatar presenting content
  • Story Writer: Avatar narrating story scenes
  • Blog Videos: Avatar explaining blog content
  • Email Campaigns: Personalized video messages
  • Social Media: Consistent avatar across platforms

3. Enhanced Persona Management

3.1 Persona Dashboard

New UI Component: frontend/src/components/Persona/PersonaDashboard.tsx

Features:

  • Persona overview (writing style, voice, avatar)
  • Voice training status and preview
  • Avatar preview and management
  • Usage statistics (where persona is used)
  • Edit/update options

3.2 Persona Settings

New UI Component: frontend/src/components/Persona/PersonaSettings.tsx

Settings:

  • Voice parameters (emotion, speed, tone)
  • Avatar appearance (clothing, background, style)
  • Platform-specific adaptations
  • Content type preferences

Implementation Phases

Phase 1: Voice Cloning Integration (Week 1-3)

Priority: HIGH - Core hyper-personalization feature

Tasks:

  1. Create VoicePersonaService
  2. Integrate Minimax voice clone API
  3. Add voice fields to WritingPersona model
  4. Update onboarding Step 6 with voice training
  5. Create voice training UI component
  6. Add voice preview and testing
  7. Integrate voice into Story Writer
  8. Add voice usage tracking
  9. Update persona dashboard
  10. Testing and optimization

Files to Create:

  • backend/services/persona/voice_persona_service.py
  • frontend/src/components/Onboarding/VoiceTrainingSection.tsx
  • frontend/src/components/Persona/VoiceManagement.tsx

Files to Modify:

  • backend/models/persona_models.py
  • backend/services/persona_analysis_service.py
  • backend/api/onboarding_utils/ (onboarding routes)
  • frontend/src/components/Onboarding/PersonaGenerationStep.tsx
  • backend/services/story_writer/audio_generation_service.py

Success Criteria:

  • Users can train voice during onboarding
  • Voice used automatically in Story Writer
  • Voice quality rated above the gTTS baseline by users (satisfaction target: >4.5/5)
  • Voice linked to persona
  • Cost tracking accurate

Phase 2: Avatar Creation Integration (Week 4-6)

Priority: HIGH - Visual personalization

Tasks:

  1. Create AvatarPersonaService
  2. Integrate Hunyuan Avatar API
  3. Add avatar fields to WritingPersona model
  4. Update onboarding Step 6 with avatar creation
  5. Create avatar creation UI component
  6. Add avatar preview and testing
  7. Integrate avatar into content generation
  8. Add avatar usage tracking
  9. Update persona dashboard
  10. Testing and optimization

Files to Create:

  • backend/services/persona/avatar_persona_service.py
  • frontend/src/components/Onboarding/AvatarCreationSection.tsx
  • frontend/src/components/Persona/AvatarManagement.tsx

Files to Modify:

  • backend/models/persona_models.py
  • backend/services/persona_analysis_service.py
  • frontend/src/components/Onboarding/PersonaGenerationStep.tsx
  • backend/services/story_writer/video_generation_service.py

Success Criteria:

  • Users can create avatar during onboarding
  • Avatar used in video content generation
  • Avatar quality meets the satisfaction target (>4.5/5)
  • Avatar linked to persona
  • Cost tracking accurate

Phase 3: Cross-Platform Integration (Week 7-8)

Priority: MEDIUM - Complete hyper-personalization

Tasks:

  1. Integrate persona voice into LinkedIn Writer
  2. Integrate persona avatar into LinkedIn Writer
  3. Integrate persona voice into Blog Writer
  4. Integrate persona avatar into Blog Writer
  5. Add persona usage analytics
  6. Update all content generation services
  7. Create persona usage dashboard
  8. Documentation and user guides

Success Criteria:

  • Persona voice/avatar used across all platforms
  • Consistent brand identity
  • Good user experience
  • Analytics working

Cost Management

Voice Cloning Costs

One-Time Training: $0.75 per voice
Per-Minute Generation: $0.02 per minute

Cost Optimization:

  • Train voice once during onboarding (included in Pro/Enterprise)
  • Free tier: gTTS only
  • Basic tier: Voice training available ($0.75 one-time)
  • Pro/Enterprise: Voice training included
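
The voice pricing above can be turned into a simple estimator for cost tracking. This is a sketch using the two listed rates. How the provider bills partial minutes is not stated here, so fractional minutes are priced pro rata as an assumption, and estimate_voice_cost is a hypothetical helper name.

```python
from decimal import Decimal
from typing import Union

VOICE_TRAINING_FEE = Decimal("0.75")     # one-time, per voice
VOICE_RATE_PER_MINUTE = Decimal("0.02")  # per generated minute


def estimate_voice_cost(
    minutes: Union[int, Decimal],
    include_training: bool = True,
) -> Decimal:
    """Estimate cost in USD for generating `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    cost = VOICE_RATE_PER_MINUTE * minutes
    if include_training:
        cost += VOICE_TRAINING_FEE
    return cost.quantize(Decimal("0.01"))  # round to whole cents
```

For example, training plus 10 minutes of narration comes to $0.75 + 10 × $0.02 = $0.95.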

Avatar Creation Costs

Hunyuan Avatar: $0.15-0.30 per 5 seconds
InfiniteTalk: $0.15-0.30 per 5 seconds (up to 10 minutes)

Cost Optimization:

  • Avatar creation: One-time during onboarding
  • Video generation: Pay-per-use
  • Default to shorter videos (5 seconds)
  • Allow longer videos for premium users
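
Because video is priced per 5-second block, cost scales quickly with duration, which is the reason for defaulting to short clips. The sketch below works out the range from the listed rates. It assumes partial blocks are rounded up, which this document does not confirm, and estimate_video_cost_cents is a hypothetical helper.

```python
from typing import Tuple

BLOCK_SECONDS = 5
LOW_CENTS_PER_BLOCK = 15   # $0.15 per 5 seconds
HIGH_CENTS_PER_BLOCK = 30  # $0.30 per 5 seconds


def estimate_video_cost_cents(duration_s: int) -> Tuple[int, int]:
    """Return the (low, high) cost in cents for a video of `duration_s` seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Integer ceiling division: assumes a started block is billed in full.
    blocks = (duration_s + BLOCK_SECONDS - 1) // BLOCK_SECONDS
    return blocks * LOW_CENTS_PER_BLOCK, blocks * HIGH_CENTS_PER_BLOCK
```

A 60-second video is 12 blocks, so $1.80-3.60; a full 10-minute video is 120 blocks, so $18.00-36.00.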

Subscription Integration

Update Subscription Tiers:

  • Free: Writing persona only, no voice/avatar
  • Basic: Writing persona + voice training ($0.75 one-time)
  • Pro: Writing persona + voice + avatar creation included
  • Enterprise: All features + unlimited usage
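
The tier list above maps naturally onto a feature-gating table that services can consult before starting voice training or avatar creation. The sketch below is one possible encoding. The feature keys and the tier_allows helper are hypothetical names, and unknown tiers are treated as having no features.

```python
from typing import Dict, FrozenSet

# Feature keys are illustrative; the tier contents follow the list above.
TIER_FEATURES: Dict[str, FrozenSet[str]] = {
    "free": frozenset({"writing_persona"}),
    "basic": frozenset({"writing_persona", "voice_training"}),
    "pro": frozenset({"writing_persona", "voice_training", "avatar_creation"}),
    "enterprise": frozenset(
        {"writing_persona", "voice_training", "avatar_creation", "unlimited_usage"}
    ),
}


def tier_allows(tier: str, feature: str) -> bool:
    """Return True if the subscription tier includes the feature."""
    return feature in TIER_FEATURES.get(tier.lower(), frozenset())
```

The one-time $0.75 charge for Basic-tier voice training is a billing concern and is not modelled here.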

User Experience Flow

Onboarding Flow (Enhanced)

Step 1-5: Existing onboarding steps
         ↓
Step 6: Persona Generation
         ├─ Writing Style Analysis
         │   └─ [Progress: Analyzing your writing style...]
         │
         ├─ Voice Training (NEW)
         │   ├─ Upload audio sample (1-3 minutes)
         │   ├─ [Training your voice...] (~2-5 minutes)
         │   ├─ Preview generated voice
         │   └─ Approve or retrain
         │
         └─ Avatar Creation (NEW)
             ├─ Upload photo
             ├─ [Creating your avatar...] (~1-2 minutes)
             ├─ Preview avatar
             └─ Approve or recreate
         ↓
Step 7: Persona Preview
         ├─ Writing Style Summary
         ├─ Voice Preview
         ├─ Avatar Preview
         └─ Approve Complete Persona

Content Generation Flow (Enhanced)

User creates content (LinkedIn/Blog/Story)
         ↓
System loads user's persona
         ├─ Writing style → Text generation
         ├─ Voice ID → Audio generation (if available)
         └─ Avatar ID → Video generation (if available)
         ↓
Content generated with full personalization
         ├─ Text matches writing style
         ├─ Audio uses user's voice
         └─ Video shows user's avatar

Technical Architecture

Backend Services

backend/services/
├── persona/
│   ├── __init__.py
│   ├── voice_persona_service.py      # NEW: Voice cloning
│   ├── avatar_persona_service.py     # NEW: Avatar creation
│   └── persona_analysis_service.py    # Enhanced
├── minimax/
│   └── voice_clone.py                 # Shared with Story Writer
└── wavespeed/
    └── avatar_generation.py           # Shared with Story Writer

Frontend Components

frontend/src/components/
├── Onboarding/
│   ├── PersonaGenerationStep.tsx       # Enhanced
│   ├── VoiceTrainingSection.tsx       # NEW
│   └── AvatarCreationSection.tsx       # NEW
└── Persona/
    ├── PersonaDashboard.tsx            # NEW
    ├── VoiceManagement.tsx             # NEW
    ├── AvatarManagement.tsx            # NEW
    └── PersonaSettings.tsx             # NEW

Database Schema

-- Enhanced WritingPersona table
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;

ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
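
Re-running the ALTER TABLE statements above fails once the columns exist. The sketch below shows an idempotent variant that adds only the missing columns. It uses SQLite from the Python standard library purely for illustration, since this document does not state the production database or migration tool, and add_missing_columns is a hypothetical helper.

```python
import sqlite3
from typing import List

NEW_COLUMNS = {
    "voice_id": "VARCHAR(255)",
    "voice_training_status": "VARCHAR(50)",
    "voice_training_audio_url": "VARCHAR(500)",
    "voice_trained_at": "TIMESTAMP",
    "avatar_id": "VARCHAR(255)",
    "avatar_image_url": "VARCHAR(500)",
    "avatar_training_status": "VARCHAR(50)",
    "avatar_created_at": "TIMESTAMP",
}


def add_missing_columns(
    conn: sqlite3.Connection, table: str = "writing_persona"
) -> List[str]:
    """Add any of NEW_COLUMNS not yet present; return the names added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # `table` is a trusted constant here, never user input.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Calling the function a second time returns an empty list, so it is safe to run on every deploy.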

Integration with Existing Systems

Story Writer Integration

Location: backend/services/story_writer/audio_generation_service.py

Enhancement:

def generate_scene_audio(
    self,
    scene: Dict[str, Any],
    user_id: str,
    use_persona_voice: bool = True,  # NEW: Use persona voice
) -> Dict[str, Any]:
    if use_persona_voice:
        # Get user's persona (may be missing for users who skipped onboarding)
        persona = get_persona(user_id)
        if persona and persona.voice_id and persona.voice_training_status == 'ready':
            # Use persona voice
            return self._generate_with_persona_voice(scene, persona)

    # Fallback to the default provider (gTTS)
    return self._generate_with_gtts(scene)

LinkedIn Writer Integration

Enhancement: Add video generation with persona avatar

  • LinkedIn video posts with user's avatar
  • Voice-over with user's voice
  • Consistent brand presence

Blog Writer Integration

Enhancement: Add audio/video options

  • Audio narration with persona voice
  • Video explanations with persona avatar
  • Enhanced blog content

Success Metrics

Adoption Metrics

  • Voice training completion rate (target: >60% of Pro users)
  • Avatar creation completion rate (target: >50% of Pro users)
  • Persona usage across platforms (target: >80% of content uses persona)

Quality Metrics

  • Voice quality satisfaction (target: >4.5/5)
  • Avatar quality satisfaction (target: >4.5/5)
  • Brand consistency score (target: >90%)

Business Metrics

  • User retention (persona users vs. non-persona)
  • Content engagement (persona content vs. generic)
  • Premium tier conversion (persona as differentiator)

Risk Mitigation

Risk | Mitigation
Voice training failure | Quality checks, clear error messages, retry option
Avatar quality issues | Preview before approval, regeneration option
Cost concerns | Clear pricing, tier-based access, cost estimates
User privacy | Secure storage, opt-in consent, data encryption
API reliability | Fallback options, retry logic, error handling
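
The API reliability mitigation (fallback options plus retry logic) can be sketched as a small wrapper around any provider call. This is an illustration only. call_with_retry_and_fallback and its parameters are hypothetical, the backoff schedule is an assumption, and production code should catch the provider's specific error types rather than every Exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry_and_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
) -> T:
    """Try `primary` up to `attempts` times, then fall back."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            # Exponential backoff between attempts; no wait after the last one.
            if attempt < attempts - 1:
                time.sleep(base_delay_s * (2 ** attempt))
    return fallback()
```

In the Story Writer case, primary would wrap the persona voice call and fallback the gTTS path.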

Privacy & Security

Data Storage

  • Voice samples: Encrypted storage, deleted after training
  • Avatar photos: Encrypted storage, user can delete
  • Voice/Avatar IDs: Stored as opaque provider references; provider API keys kept secure, no raw audio or image data stored with the IDs

User Control

  • Users can delete voice/avatar anytime
  • Users can retrain voice/avatar
  • Users can opt-out of voice/avatar features
  • Clear privacy policy

Next Steps

  1. Week 1: Set up Minimax API access
  2. Week 1-2: Implement voice persona service
  3. Week 2-3: Integrate voice training into onboarding
  4. Week 3-4: Integrate into Story Writer
  5. Week 4-5: Set up WaveSpeed avatar API
  6. Week 5-6: Implement avatar persona service
  7. Week 6-7: Integrate avatar creation into onboarding
  8. Week 7-8: Cross-platform integration

Document Version: 1.0
Last Updated: January 2025
Priority: HIGH - Core Hyper-Personalization Feature