Base code

2026-01-08 22:39:53 +07:00
parent 697115c61a
commit c35fa52117
2169 changed files with 626670 additions and 0 deletions
--- a/docs/PERSONA_VOICE_AVATAR_HYPERPERSONALIZATION.md
+++ b/docs/PERSONA_VOICE_AVATAR_HYPERPERSONALIZATION.md
@@ -0,0 +1,615 @@
+# Persona System: Voice Cloning & Avatar Hyper-Personalization
+
+## Executive Summary
+
+This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then use these across all content generation (LinkedIn, Blog, Story Writer, etc.) for consistent brand identity.
+
+---
+
+## Vision: AI Hyper-Personalization
+
+**Goal**: Every piece of content generated by ALwrity should feel authentically "you" - not just in writing style, but in voice and visual presence.
+
+**Current State**: Persona system handles writing style only  
+**Target State**: Persona system handles writing style + voice + avatar = complete brand identity
+
+---
+
+## Current Persona System Analysis
+
+### Existing Capabilities
+- **Writing Style Analysis**: Tone, voice, complexity, engagement level
+- **Platform Adaptation**: LinkedIn, Facebook, Blog optimizations
+- **Content Characteristics**: Sentence structure, vocabulary, patterns
+- **Onboarding Integration**: Automatically generated from onboarding data
+
+### Current Limitations
+- No voice/personality in audio content
+- No visual representation
+- Limited to text-based personalization
+- Cannot create video content with user's presence
+
+### Persona System Architecture
+**Location**: `backend/services/persona_analysis_service.py`
+
+**Current Flow**:
+1. User completes onboarding (6 steps)
+2. System analyzes website content and writing style
+3. Core persona generated
+4. Platform-specific adaptations created
+5. Persona saved to database
+
+**Database Model**: `backend/models/persona_models.py` - `WritingPersona` table
+
+---
+
+## Proposed Enhancements
+
+### 1. Voice Cloning Integration
+
+#### 1.1 Voice Training During Onboarding
+
+**Integration Point**: Onboarding Step 6 (Persona Generation)
+
+**New Onboarding Flow**:
+```
+Step 1-5: Existing onboarding steps
+Step 6: Persona Generation
+  ├─ Writing Style Analysis (existing)
+  ├─ Voice Training (NEW)
+  │   ├─ Audio sample upload (1-3 minutes)
+  │   ├─ Voice clone training (~2-5 minutes)
+  │   └─ Voice preview and approval
+  └─ Avatar Creation (NEW)
+      ├─ Photo upload
+      ├─ Avatar generation
+      └─ Avatar preview and approval
+```
+
+**Implementation**:
+
+**Backend**: `backend/services/persona/voice_persona_service.py` (NEW)
+```python
+from typing import Any, Dict
+
+
+class VoicePersonaService:
+    """
+    Manages voice cloning for the persona system.
+    Integrates with the Minimax voice clone API.
+    """
+
+    def train_voice_from_audio(
+        self,
+        user_id: str,
+        audio_file_path: str,
+        persona_id: int,
+    ) -> Dict[str, Any]:
+        """
+        Train a voice clone from the user's audio sample and link it to the persona.
+        """
+        # 1. Validate the audio file (format, length, quality)
+        # 2. Upload the sample to Minimax
+        # 3. Train the voice clone
+        # 4. Store the returned voice_id on the persona
+        # 5. Return the training status
+        raise NotImplementedError
+
+    def generate_audio_with_persona_voice(
+        self,
+        text: str,
+        persona_id: int,
+        emotion: str = "neutral",
+        speed: float = 1.0,
+    ) -> bytes:
+        """
+        Generate audio using the persona's cloned voice.
+        """
+        # 1. Look up voice_id on the persona
+        # 2. Call Minimax voice generation
+        # 3. Return the audio bytes
+        raise NotImplementedError
+```
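Step 1 of `train_voice_from_audio` validates the sample. The following minimal sketch shows one way to do that. It assumes WAV uploads and enforces the 1-3 minute length from the onboarding flow, using only Python's standard `wave` module. `validate_voice_sample` and the bounds are illustrative names. Quality checks such as noise or clipping detection are not covered.

```python
import io
import wave

# Bounds assumed from the "1-3 minutes" sample length in the onboarding flow.
MIN_SAMPLE_SECONDS = 60
MAX_SAMPLE_SECONDS = 180


def validate_voice_sample(wav_bytes: bytes) -> float:
    """Return the sample duration in seconds or raise ValueError.

    Assumes a WAV upload; MP3/M4A would need a separate decoder.
    """
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the bytes are not a RIFF/WAVE file or are truncated.
        raise ValueError(f"not a readable WAV file: {exc}") from exc

    if rate <= 0:
        raise ValueError("invalid sample rate")
    duration = frames / rate
    if not MIN_SAMPLE_SECONDS <= duration <= MAX_SAMPLE_SECONDS:
        raise ValueError(
            f"sample is {duration:.1f}s; expected "
            f"{MIN_SAMPLE_SECONDS}-{MAX_SAMPLE_SECONDS}s"
        )
    return duration
```

Rejecting a bad sample before uploading avoids paying for a training run that is likely to fail.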
+
+**Database Schema Update**: `backend/models/persona_models.py`
+```python
+from sqlalchemy import Column, DateTime, String
+
+# Base is the declarative base already defined in persona_models.py.
+class WritingPersona(Base):
+    # Existing fields...
+
+    # NEW: Voice cloning fields
+    voice_id = Column(String(255), nullable=True)
+    # One of: 'not_trained', 'training', 'ready', 'failed'
+    voice_training_status = Column(String(50), nullable=True)
+    voice_training_audio_url = Column(String(500), nullable=True)
+    voice_trained_at = Column(DateTime, nullable=True)
+
+    # NEW: Avatar fields
+    avatar_id = Column(String(255), nullable=True)
+    avatar_image_url = Column(String(500), nullable=True)
+    avatar_training_status = Column(String(50), nullable=True)
+    avatar_created_at = Column(DateTime, nullable=True)
+```
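The four `voice_training_status` values above form a small state machine. Below is a hedged sketch of enforcing transitions in application code. The transition rules are assumptions, not part of the schema: retraining is allowed from `ready` or `failed`, and a persona returns to `not_trained` when a user deletes their voice.

```python
from enum import Enum
from typing import Optional


class TrainingStatus(str, Enum):
    """Values listed for voice_training_status in the schema above."""

    NOT_TRAINED = "not_trained"
    TRAINING = "training"
    READY = "ready"
    FAILED = "failed"


# Assumed rules: training can start from any idle state, only a running job
# can end as ready/failed, and deleting a voice returns it to not_trained.
ALLOWED_TRANSITIONS = {
    TrainingStatus.NOT_TRAINED: {TrainingStatus.TRAINING},
    TrainingStatus.TRAINING: {TrainingStatus.READY, TrainingStatus.FAILED},
    TrainingStatus.READY: {TrainingStatus.TRAINING, TrainingStatus.NOT_TRAINED},
    TrainingStatus.FAILED: {TrainingStatus.TRAINING, TrainingStatus.NOT_TRAINED},
}


def next_status(current: Optional[str], new: str) -> str:
    """Validate a status change and return the value to store."""
    # The column is nullable; treat NULL the same as "not_trained".
    cur = TrainingStatus(current) if current else TrainingStatus.NOT_TRAINED
    nxt = TrainingStatus(new)  # raises ValueError for unknown values
    if nxt not in ALLOWED_TRANSITIONS[cur]:
        raise ValueError(f"cannot move from {cur.value} to {nxt.value}")
    return nxt.value
```

The same pattern could cover `avatar_training_status` once its values are defined.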
+
+**Frontend**: `frontend/src/components/Onboarding/PersonaGenerationStep.tsx` (NEW)
+```typescript
+import React from 'react';
+
+// OnboardingData and Persona are existing app types (imports omitted here).
+interface PersonaGenerationStepProps {
+  onboardingData: OnboardingData;
+  onComplete: (persona: Persona) => void;
+}
+
+const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
+  onboardingData,
+  onComplete,
+}) => {
+  // 1. Show writing style analysis progress
+  // 2. Show voice training section
+  // 3. Show avatar creation section
+  // 4. Preview complete persona
+  // 5. Allow approval/modification
+  return null; // placeholder until the sections above are implemented
+};
+```
+
+#### 1.2 Voice Usage Across Platform
+
+**Integration Points**:
+- **Story Writer**: Use persona voice for audio narration
+- **LinkedIn**: Voice-over for video posts
+- **Blog**: Audio narration for blog posts
+- **Email**: Personalized voice messages
+- **Social Media**: Video content with user's voice
+
+**Implementation Pattern**:
+```python
+# In any content generation service
+def generate_content_with_persona(user_id: str, content_type: str):
+    # 1. Get user's persona
+    persona = get_persona(user_id)
+    
+    # 2. Generate text content (existing)
+    text_content = generate_text(persona)
+    
+    # 3. Generate audio with persona voice (NEW)
+    if persona.voice_id and persona.voice_training_status == 'ready':
+        audio_content = voice_service.generate_audio_with_persona_voice(
+            text=text_content,
+            persona_id=persona.id,
+        )
+    
+    # 4. Generate video with persona avatar (NEW)
+    if persona.avatar_id:
+        video_content = avatar_service.generate_video_with_persona_avatar(
+            text=text_content,
+            audio=audio_content,
+            persona_id=persona.id,
+        )
+    
+    return {
+        'text': text_content,
+        'audio': audio_content,
+        'video': video_content,
+    }
+```
+
+---
+
+### 2. Avatar Creation Integration
+
+#### 2.1 Avatar Training During Onboarding
+
+**Integration Point**: Onboarding Step 6 (Persona Generation)
+
+**Avatar Options**:
+1. **Hunyuan Avatar**: Talking avatar from photo + audio
+2. **InfiniteTalk**: Long-form avatar videos
+3. **Custom Avatar**: User's photo as avatar base
+
+**Implementation**:
+
+**Backend**: `backend/services/persona/avatar_persona_service.py` (NEW)
+```python
+from typing import Any, Dict
+
+
+class AvatarPersonaService:
+    """
+    Manages avatar creation for the persona system.
+    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
+    """
+
+    def create_avatar_from_photo(
+        self,
+        user_id: str,
+        photo_file_path: str,
+        persona_id: int,
+    ) -> Dict[str, Any]:
+        """
+        Create an avatar from the user's photo.
+        Uses Hunyuan Avatar for initial creation.
+        """
+        # 1. Validate the photo (format, size, quality)
+        # 2. Upload to WaveSpeed
+        # 3. Create the avatar
+        # 4. Store avatar_id on the persona
+        # 5. Return the avatar preview
+        raise NotImplementedError
+
+    def generate_video_with_persona_avatar(
+        self,
+        text: str,
+        audio_bytes: bytes,
+        persona_id: int,
+        duration: int = 60,  # seconds
+    ) -> bytes:
+        """
+        Generate a video of the persona's avatar speaking.
+        Uses InfiniteTalk for long-form video and Hunyuan for short clips.
+        """
+        # 1. Look up avatar_id on the persona
+        # 2. Use audio_bytes (already generated with the persona voice) as the speech track
+        # 3. Call the WaveSpeed API
+        # 4. Return the video bytes
+        raise NotImplementedError
+```
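`generate_video_with_persona_avatar` routes short videos to Hunyuan and long-form videos to InfiniteTalk. The document does not define the cutoff between the two, so the 30-second threshold below is a hypothetical placeholder. The 10-minute ceiling comes from the InfiniteTalk limit in Cost Management. The returned strings are illustrative labels, not WaveSpeed model identifiers.

```python
# Hypothetical cutoff between "short" and "long-form"; not defined in this doc.
SHORT_FORM_MAX_SECONDS = 30
# InfiniteTalk supports videos up to 10 minutes (see Cost Management).
LONG_FORM_MAX_SECONDS = 600


def select_avatar_model(duration_seconds: int) -> str:
    """Pick an avatar model label for the requested video length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds <= SHORT_FORM_MAX_SECONDS:
        return "hunyuan_avatar"  # short clips
    if duration_seconds <= LONG_FORM_MAX_SECONDS:
        return "infinitetalk"  # long-form narration
    raise ValueError(
        f"{duration_seconds}s exceeds the {LONG_FORM_MAX_SECONDS}s InfiniteTalk limit"
    )
```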
+
+#### 2.2 Avatar Usage Across Platform
+
+**Use Cases**:
+- **LinkedIn Video Posts**: User's avatar presenting content
+- **Story Writer**: Avatar narrating story scenes
+- **Blog Videos**: Avatar explaining blog content
+- **Email Campaigns**: Personalized video messages
+- **Social Media**: Consistent avatar across platforms
+
+---
+
+### 3. Enhanced Persona Management
+
+#### 3.1 Persona Dashboard
+
+**New UI Component**: `frontend/src/components/Persona/PersonaDashboard.tsx`
+
+**Features**:
+- Persona overview (writing style, voice, avatar)
+- Voice training status and preview
+- Avatar preview and management
+- Usage statistics (where persona is used)
+- Edit/update options
+
+#### 3.2 Persona Settings
+
+**New UI Component**: `frontend/src/components/Persona/PersonaSettings.tsx`
+
+**Settings**:
+- Voice parameters (emotion, speed, tone)
+- Avatar appearance (clothing, background, style)
+- Platform-specific adaptations
+- Content type preferences
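The voice parameters in Persona Settings could be validated before they reach the provider. This sketch covers emotion and speed only, with defaults matching `generate_audio_with_persona_voice` (`"neutral"`, `1.0`). The allowed emotion set and speed range are hypothetical and would need to match what the Minimax API actually accepts.

```python
from dataclasses import dataclass

# Hypothetical values; the real set must match what the voice provider accepts.
ALLOWED_EMOTIONS = frozenset({"neutral", "happy", "sad", "angry", "surprised"})
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass(frozen=True)
class VoiceSettings:
    """Per-persona voice parameters; defaults mirror generate_audio_with_persona_voice."""

    emotion: str = "neutral"
    speed: float = 1.0

    def __post_init__(self) -> None:
        # Reject bad values when settings are saved, not at generation time.
        if self.emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"unsupported emotion: {self.emotion!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
```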
+
+---
+
+## Implementation Phases
+
+### Phase 1: Voice Cloning Integration (Week 1-3)
+
+**Priority**: HIGH - Core hyper-personalization feature
+
+**Tasks**:
+1. ✅ Create `VoicePersonaService`
+2. ✅ Integrate Minimax voice clone API
+3. ✅ Add voice fields to `WritingPersona` model
+4. ✅ Update onboarding Step 6 with voice training
+5. ✅ Create voice training UI component
+6. ✅ Add voice preview and testing
+7. ✅ Integrate voice into Story Writer
+8. ✅ Add voice usage tracking
+9. ✅ Update persona dashboard
+10. ✅ Testing and optimization
+
+**Files to Create**:
+- `backend/services/persona/voice_persona_service.py`
+- `frontend/src/components/Onboarding/VoiceTrainingSection.tsx`
+- `frontend/src/components/Persona/VoiceManagement.tsx`
+
+**Files to Modify**:
+- `backend/models/persona_models.py`
+- `backend/services/persona_analysis_service.py`
+- `backend/api/onboarding_utils/` (onboarding routes)
+- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
+- `backend/services/story_writer/audio_generation_service.py`
+
+**Success Criteria**:
+- Users can train voice during onboarding
+- Voice used automatically in Story Writer
+- Voice quality significantly better than gTTS
+- Voice linked to persona
+- Cost tracking accurate
+
+---
+
+### Phase 2: Avatar Creation Integration (Week 4-6)
+
+**Priority**: HIGH - Visual personalization
+
+**Tasks**:
+1. ✅ Create `AvatarPersonaService`
+2. ✅ Integrate Hunyuan Avatar API
+3. ✅ Add avatar fields to `WritingPersona` model
+4. ✅ Update onboarding Step 6 with avatar creation
+5. ✅ Create avatar creation UI component
+6. ✅ Add avatar preview and testing
+7. ✅ Integrate avatar into content generation
+8. ✅ Add avatar usage tracking
+9. ✅ Update persona dashboard
+10. ✅ Testing and optimization
+
+**Files to Create**:
+- `backend/services/persona/avatar_persona_service.py`
+- `frontend/src/components/Onboarding/AvatarCreationSection.tsx`
+- `frontend/src/components/Persona/AvatarManagement.tsx`
+
+**Files to Modify**:
+- `backend/models/persona_models.py`
+- `backend/services/persona_analysis_service.py`
+- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
+- `backend/services/story_writer/video_generation_service.py`
+
+**Success Criteria**:
+- Users can create avatar during onboarding
+- Avatar used in video content generation
+- Avatar quality meets the >4.5/5 satisfaction target in Success Metrics
+- Avatar linked to persona
+- Cost tracking accurate
+
+---
+
+### Phase 3: Cross-Platform Integration (Week 7-8)
+
+**Priority**: MEDIUM - Complete hyper-personalization
+
+**Tasks**:
+1. ✅ Integrate persona voice into LinkedIn Writer
+2. ✅ Integrate persona avatar into LinkedIn Writer
+3. ✅ Integrate persona voice into Blog Writer
+4. ✅ Integrate persona avatar into Blog Writer
+5. ✅ Add persona usage analytics
+6. ✅ Update all content generation services
+7. ✅ Create persona usage dashboard
+8. ✅ Documentation and user guides
+
+**Success Criteria**:
+- Persona voice/avatar used across all platforms
+- Consistent brand identity
+- Good user experience
+- Analytics working
+
+---
+
+## Cost Management
+
+### Voice Cloning Costs
+
+- **One-Time Training**: $0.75 per voice
+- **Generation**: $0.02 per generated minute
+
+**Cost Optimization**:
+- Train voice once during onboarding (included in Pro/Enterprise)
+- Free tier: gTTS only
+- Basic tier: Voice training available ($0.75 one-time)
+- Pro/Enterprise: Voice training included
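At these rates, a user who trains a voice and generates 30 minutes of narration costs $0.75 + 30 × $0.02 = $1.35. Below is a small estimator. It assumes fractional minutes are billed linearly, since the actual rounding rule is not specified here. It uses `Decimal` so that money values do not pick up binary float error.

```python
from decimal import Decimal

VOICE_TRAINING_FEE = Decimal("0.75")      # one-time, per voice
VOICE_PRICE_PER_MINUTE = Decimal("0.02")  # per generated minute


def estimate_voice_cost(generated_minutes: float, include_training: bool = True) -> Decimal:
    """Estimate voice spend in USD; assumes linear billing of fractional minutes."""
    if generated_minutes < 0:
        raise ValueError("generated_minutes must be non-negative")
    # Decimal(str(...)) avoids binary float artifacts in money math.
    cost = VOICE_PRICE_PER_MINUTE * Decimal(str(generated_minutes))
    if include_training:
        cost += VOICE_TRAINING_FEE
    return cost.quantize(Decimal("0.01"))
```

Pass `include_training=False` for Pro/Enterprise users, whose training fee is included.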
+
+### Avatar Creation Costs
+
+- **Hunyuan Avatar**: $0.15-0.30 per 5 seconds of video
+- **InfiniteTalk**: $0.15-0.30 per 5 seconds of video (up to 10 minutes)
+
+**Cost Optimization**:
+- Avatar creation: One-time during onboarding
+- Video generation: Pay-per-use
+- Default to shorter videos (5 seconds)
+- Allow longer videos for premium users
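Avatar video is priced per 5-second unit, so a 60-second video is 12 units, or $1.80-$3.60 at the quoted range. The sketch below assumes a partial unit is billed as a full one, which the pricing above does not confirm. It also rejects durations beyond InfiniteTalk's 10-minute limit.

```python
import math
from decimal import Decimal
from typing import Tuple

SEGMENT_SECONDS = 5
PRICE_PER_SEGMENT_LOW = Decimal("0.15")
PRICE_PER_SEGMENT_HIGH = Decimal("0.30")
INFINITETALK_MAX_SECONDS = 600  # "up to 10 minutes"


def estimate_video_cost_range(duration_seconds: int) -> Tuple[Decimal, Decimal]:
    """Return (low, high) USD estimates for an avatar video of this length."""
    if not 0 < duration_seconds <= INFINITETALK_MAX_SECONDS:
        raise ValueError("duration must be between 1 and 600 seconds")
    # Assumption: a partial 5-second unit is billed as a full unit.
    segments = math.ceil(duration_seconds / SEGMENT_SECONDS)
    return PRICE_PER_SEGMENT_LOW * segments, PRICE_PER_SEGMENT_HIGH * segments
```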
+
+### Subscription Integration
+
+**Update Subscription Tiers**:
+- **Free**: Writing persona only, no voice/avatar
+- **Basic**: Writing persona + voice training ($0.75 one-time)
+- **Pro**: Writing persona + voice + avatar creation included
+- **Enterprise**: All features + unlimited usage
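One option is to encode the tier list as a feature matrix, so every service checks access the same way. This is a sketch: the feature keys and function names are hypothetical, and Enterprise's "unlimited usage" is not modeled.

```python
from decimal import Decimal
from enum import Enum


class Tier(str, Enum):
    FREE = "free"
    BASIC = "basic"
    PRO = "pro"
    ENTERPRISE = "enterprise"


# Feature matrix transcribed from the tier list above (keys are hypothetical).
TIER_FEATURES = {
    Tier.FREE: {"writing_persona"},
    Tier.BASIC: {"writing_persona", "voice_training"},
    Tier.PRO: {"writing_persona", "voice_training", "avatar_creation"},
    Tier.ENTERPRISE: {"writing_persona", "voice_training", "avatar_creation"},
}
# Basic pays the one-time training fee; Pro and Enterprise include it.
TRAINING_INCLUDED = {Tier.PRO, Tier.ENTERPRISE}
VOICE_TRAINING_FEE = Decimal("0.75")


def can_use(tier: Tier, feature: str) -> bool:
    return feature in TIER_FEATURES[tier]


def voice_training_charge(tier: Tier) -> Decimal:
    """Amount to charge when a user on this tier trains a voice."""
    if not can_use(tier, "voice_training"):
        raise PermissionError(f"voice training is not available on {tier.value}")
    return Decimal("0") if tier in TRAINING_INCLUDED else VOICE_TRAINING_FEE
```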
+
+---
+
+## User Experience Flow
+
+### Onboarding Flow (Enhanced)
+
+```
+Step 1-5: Existing onboarding steps
+         ↓
+Step 6: Persona Generation
+         ├─ Writing Style Analysis
+         │   └─ [Progress: Analyzing your writing style...]
+         │
+         ├─ Voice Training (NEW)
+         │   ├─ Upload audio sample (1-3 minutes)
+         │   ├─ [Training your voice...] (~2-5 minutes)
+         │   ├─ Preview generated voice
+         │   └─ Approve or retrain
+         │
+         └─ Avatar Creation (NEW)
+             ├─ Upload photo
+             ├─ [Creating your avatar...] (~1-2 minutes)
+             ├─ Preview avatar
+             └─ Approve or recreate
+         ↓
+Step 7: Persona Preview (NEW)
+         ├─ Writing Style Summary
+         ├─ Voice Preview
+         ├─ Avatar Preview
+         └─ Approve Complete Persona
+```
+
+### Content Generation Flow (Enhanced)
+
+```
+User creates content (LinkedIn/Blog/Story)
+         ↓
+System loads user's persona
+         ├─ Writing style → Text generation
+         ├─ Voice ID → Audio generation (if available)
+         └─ Avatar ID → Video generation (if available)
+         ↓
+Content generated with full personalization
+         ├─ Text matches writing style
+         ├─ Audio uses user's voice
+         └─ Video shows user's avatar
+```
+
+---
+
+## Technical Architecture
+
+### Backend Services
+
+```
+backend/services/
+├── persona_analysis_service.py         # Enhanced (existing file)
+├── persona/
+│   ├── __init__.py
+│   ├── voice_persona_service.py        # NEW: Voice cloning
+│   └── avatar_persona_service.py       # NEW: Avatar creation
+├── minimax/
+│   └── voice_clone.py                  # Shared with Story Writer
+└── wavespeed/
+    └── avatar_generation.py            # Shared with Story Writer
+```
+
+### Frontend Components
+
+```
+frontend/src/components/
+├── Onboarding/
+│   ├── PersonaGenerationStep.tsx       # Enhanced
+│   ├── VoiceTrainingSection.tsx        # NEW
+│   └── AvatarCreationSection.tsx       # NEW
+└── Persona/
+    ├── PersonaDashboard.tsx            # NEW
+    ├── VoiceManagement.tsx             # NEW
+    ├── AvatarManagement.tsx            # NEW
+    └── PersonaSettings.tsx             # NEW
+```
+
+### Database Schema
+
+```sql
+-- Enhanced WritingPersona table
+ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
+ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
+ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
+ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;
+
+ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
+ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
+ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
+ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
+```
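The `ALTER TABLE` statements above fail if they are re-run on a database that already has the columns. One hedged way to make the migration re-runnable is to add only the missing columns. The sketch uses SQLite's `PRAGMA table_info` so it runs standalone. A PostgreSQL deployment could use `ADD COLUMN IF NOT EXISTS` or the project's migration tool instead.

```python
import sqlite3
from typing import List

# Column definitions transcribed from the ALTER TABLE statements above.
PERSONA_MEDIA_COLUMNS = {
    "voice_id": "VARCHAR(255)",
    "voice_training_status": "VARCHAR(50)",
    "voice_training_audio_url": "VARCHAR(500)",
    "voice_trained_at": "TIMESTAMP",
    "avatar_id": "VARCHAR(255)",
    "avatar_image_url": "VARCHAR(500)",
    "avatar_training_status": "VARCHAR(50)",
    "avatar_created_at": "TIMESTAMP",
}


def add_missing_columns(conn: sqlite3.Connection, table: str = "writing_persona") -> List[str]:
    """Add only the columns the table lacks; safe to run repeatedly."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so it must be a trusted constant.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in PERSONA_MEDIA_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

The first call on an existing `writing_persona` table adds all eight columns. A second call returns an empty list and changes nothing.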
+
+---
+
+## Integration with Existing Systems
+
+### Story Writer Integration
+
+**Location**: `backend/services/story_writer/audio_generation_service.py`
+
+**Enhancement**:
+```python
+def generate_scene_audio(
+    self,
+    scene: Dict[str, Any],
+    user_id: str,
+    use_persona_voice: bool = True,  # NEW: Use persona voice
+) -> Dict[str, Any]:
+    if use_persona_voice:
+        # Get the user's persona (may be None if none exists yet)
+        persona = get_persona(user_id)
+        if persona and persona.voice_id and persona.voice_training_status == 'ready':
+            # Use the persona's cloned voice
+            return self._generate_with_persona_voice(scene, persona)
+
+    # Fallback to the default provider (gTTS)
+    return self._generate_with_gtts(scene)
+```
+
+### LinkedIn Writer Integration
+
+**Enhancement**: Add video generation with persona avatar
+- LinkedIn video posts with user's avatar
+- Voice-over with user's voice
+- Consistent brand presence
+
+### Blog Writer Integration
+
+**Enhancement**: Add audio/video options
+- Audio narration with persona voice
+- Video explanations with persona avatar
+- Enhanced blog content
+
+---
+
+## Success Metrics
+
+### Adoption Metrics
+- Voice training completion rate (target: >60% of Pro users)
+- Avatar creation completion rate (target: >50% of Pro users)
+- Persona usage across platforms (target: >80% of content uses persona)
+
+### Quality Metrics
+- Voice quality satisfaction (target: >4.5/5)
+- Avatar quality satisfaction (target: >4.5/5)
+- Brand consistency score (target: >90%)
+
+### Business Metrics
+- User retention (persona users vs. non-persona)
+- Content engagement (persona content vs. generic)
+- Premium tier conversion (persona as differentiator)
+
+---
+
+## Risk Mitigation
+
+| Risk | Mitigation |
+|------|------------|
+| Voice training failure | Quality checks, clear error messages, retry option |
+| Avatar quality issues | Preview before approval, regeneration option |
+| Cost concerns | Clear pricing, tier-based access, cost estimates |
+| User privacy | Secure storage, opt-in consent, data encryption |
+| API reliability | Fallback options, retry logic, error handling |
+
+---
+
+## Privacy & Security
+
+### Data Storage
+- Voice samples: Encrypted storage, deleted after training
+- Avatar photos: Encrypted storage, user can delete
+- Voice/Avatar IDs: only provider-side IDs are stored (no raw data); provider API keys are kept secure
+
+### User Control
+- Users can delete voice/avatar anytime
+- Users can retrain voice/avatar
+- Users can opt-out of voice/avatar features
+- Clear privacy policy
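Deleting a voice can be sketched as two steps. First, remove any local copy of the training sample. Second, reset the persona's voice fields so content generation falls back to the default provider. `PersonaVoiceFields` is a hypothetical stand-in for the `WritingPersona` columns. Deleting the clone on the provider side is not shown, because this document does not specify that API.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PersonaVoiceFields:
    """Hypothetical stand-in for the voice columns on WritingPersona."""

    voice_id: Optional[str] = None
    voice_training_status: Optional[str] = None
    voice_training_audio_url: Optional[str] = None
    voice_trained_at: Optional[datetime] = None


def delete_local_sample(path: str) -> bool:
    """Delete a local training sample; return True if a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def clear_persona_voice(fields: PersonaVoiceFields) -> None:
    """Reset voice fields so generation falls back to the default provider."""
    fields.voice_id = None
    fields.voice_training_status = "not_trained"
    fields.voice_training_audio_url = None
    fields.voice_trained_at = None
```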
+
+---
+
+## Next Steps
+
+1. **Week 1**: Set up Minimax API access
+2. **Week 1-2**: Implement voice persona service
+3. **Week 2-3**: Integrate into onboarding
+4. **Week 3-4**: Integrate into Story Writer
+5. **Week 4-5**: Set up WaveSpeed avatar API
+6. **Week 5-6**: Implement avatar persona service
+7. **Week 6-7**: Integrate into onboarding
+8. **Week 7-8**: Cross-platform integration
+
+---
+
+*Document Version: 1.0*  
+*Last Updated: January 2025*  
+*Priority: HIGH - Core Hyper-Personalization Feature*
+